Bug 78268 - [HSW Bisected]Many subtests of igt/kms_flip fail
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: highest normal
Assignee: Jesse Barnes
QA Contact: Intel GFX Bugs mailing list
Reported: 2014-05-05 02:41 UTC by Guo Jinxian
Modified: 2016-10-13 08:22 UTC

Attachments
dmesg (125.46 KB, text/plain)
2014-05-05 02:41 UTC, Guo Jinxian

Description Guo Jinxian 2014-05-05 02:41:33 UTC
Created attachment 98440: dmesg

*System Environment:
--------------------------
Platform: HSW
kernel: 
-nightly: 08ce6614d07dd1e426109672a5e323317c8d6ec7 (fails)
-queued: e5c03ca362819ba8ffbe5674340b61b9cd75de8f (fails)
-fixes: 9bbfd20abe5025adbb0ac75160bd2e41158a9e83 (works)


 *Bug detailed description:
-----------------------------
Many subtests of igt/kms_flip below fail

igt/kms_flip/blocking-wf_vblank
igt/kms_flip/dpms-off-confusion
igt/kms_flip/dpms-off-confusion-interruptible
igt/kms_flip/flip-vs-absolute-wf_vblank
igt/kms_flip/flip-vs-absolute-wf_vblank-interruptible
igt/kms_flip/flip-vs-blocking-wf-vblank
igt/kms_flip/flip-vs-expired-vblank
igt/kms_flip/flip-vs-expired-vblank-interruptible
igt/kms_flip/flip-vs-fences
igt/kms_flip/flip-vs-fences-interruptible
igt/kms_flip/flip-vs-panning
igt/kms_flip/flip-vs-panning-interruptible
igt/kms_flip/flip-vs-wf_vblank
igt/kms_flip/flip-vs-wf_vblank-interruptible
igt/kms_flip/modeset-vs-vblank-race-interruptible
igt/kms_flip/plain-flip
igt/kms_flip/plain-flip-fb-recreate
igt/kms_flip/plain-flip-fb-recreate-interruptible
igt/kms_flip/plain-flip-interruptible
igt/kms_flip/plain-flip-ts-check
igt/kms_flip/plain-flip-ts-check-interruptible
igt/kms_flip/wf_vblank-ts-check
igt/kms_flip/wf_vblank-ts-check-interruptible



This is a regression.
Good commit: b7c0d9df97c10ec5693a838df2fd53058f8e9e96
Bad commit: e5c03ca362819ba8ffbe5674340b61b9cd75de8f
We will bisect it later.

Output:
./kms_flip --run-subtest flip-vs-panning
IGT-Version: 1.6-gc864279 (x86_64) (Linux: 3.15.0-rc2_drm-intel-nightly_08ce66_20140504_debug+ x86_64)
Using monotonic timestamps
Beginning flip-vs-panning on crtc 5, connector 24
  1920x1200 60 1920 1968 2000 2080 1200 1203 1209 1235 0x9 0x48 154000
.............................................................................................................unexpected flip seq 26431, should be >= 26432
Subtest flip-vs-panning: FAIL
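
When many runs have to be triaged, the failure line in the output above can be picked out mechanically. The following is a minimal sketch, assuming the message wording is exactly as shown in this report; the helper name `parse_flip_seq_failure` is hypothetical and is not part of igt.

```python
import re

# Pattern for the failure message seen in the output above.
# Assumption: the wording "unexpected flip seq N, should be >= M" is stable.
_SEQ_RE = re.compile(r"unexpected flip seq (\d+), should be >= (\d+)")


def parse_flip_seq_failure(output):
    """Return (got, expected_min) from kms_flip output, or None if absent."""
    match = _SEQ_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Sample line shortened from the output in this report.
sample = "....unexpected flip seq 26431, should be >= 26432"
got, expected_min = parse_flip_seq_failure(sample)
# How far the reported sequence number fell short of the expected minimum.
print(expected_min - got)
```

Collecting the shortfall over repeated runs may help show whether the failure is always off by the same amount or varies from run to run.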

 *Reproduce steps:
---------------------------- 
1. ./kms_flip --run-subtest flip-vs-panning
Comment 1 Daniel Vetter 2014-05-15 22:18:41 UTC
Ping for bisect result ...
Comment 2 Guo Jinxian 2014-05-21 04:54:58 UTC
(In reply to comment #1)
> Ping for bisect result ...

The bad commit e5c03ca362819ba8ffbe5674340b61b9cd75de8f passes now, and the test also passes on the latest -next-queued (bc76e320f21f8bd790a72bd5dc06909617432352).
Comment 3 Daniel Vetter 2014-05-22 06:48:01 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > Ping for bisect result ...
> 
> The Bad commit: e5c03ca362819ba8ffbe5674340b61b9cd75de8f was passed now. and
> the result was passed on latest
> -next-queued(bc76e320f21f8bd790a72bd5dc06909617432352)

I'm sorry, but I don't understand what you're trying to tell me here :(

What's the bisect?

Does latest -nightly work now?

I'm really confused.
Comment 4 Guo Jinxian 2014-05-23 03:23:27 UTC
(In reply to comment #3)

I ran more tests and found that this issue can only be reproduced on one HSW device. The other HSW devices pass.
Comment 5 Guo Jinxian 2014-05-23 03:24:59 UTC
Bisected on the bad HSW device.
33e8465e8759077106474cbf4f3d87612c41411d is the first bad commit

commit 33e8465e8759077106474cbf4f3d87612c41411d
Author:     Imre Deak <imre.deak@intel.com>
AuthorDate: Fri Apr 25 13:19:05 2014 +0300
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Wed Apr 30 10:06:25 2014 +0200

    drm/i915: vlv: init only needed state during early power well enabling

    During the initial power well enabling on the driver init/resume path
    we can avoid initialzing part of the HW/SW state that will be
    initialized anyway by the subsequent init/resume code. For some steps
    like HPD initialization this redundancy is not only an overhead but an
    actual problem, since they can't be run this early in the overall init
    sequence.

    Add a flag marking the init phase and skip reinitialzing state that is
    not strictly necessary based on that.

    This is also needed by the upcoming HPD init restructuring by Thierry
    and Daniel.

    Signed-off-by: Imre Deak <imre.deak@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

With this commit reverted, the test passes.
Comment 6 Imre Deak 2014-05-26 12:47:00 UTC
(In reply to comment #5)

This commit doesn't affect HSW, as it changes only VLV-specific parts. Maybe the problem you're seeing doesn't always happen, resulting in an incorrect bisect result. Could you try testing each commit multiple times to make sure you don't already hit the problem before the above commit in the bisect sequence?
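
Repeating the test at each bisect step can be scripted. The following is a rough sketch, assuming the test binary exits non-zero when a subtest fails; the function name `bisect_verdict` and the stand-in commands are hypothetical. The return values follow the `git bisect run` convention (0 for good, 1 for bad).

```python
import subprocess
import sys

GOOD, BAD = 0, 1  # exit codes understood by `git bisect run`


def bisect_verdict(cmd, runs=20):
    """Run cmd up to `runs` times; report BAD on the first failing run."""
    for _ in range(runs):
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            return BAD  # one failure is enough to mark the commit bad
    return GOOD  # every run passed; good only as far as `runs` can tell


# Stand-in commands so the sketch runs anywhere. On the test machine, cmd
# would be ["./kms_flip", "--run-subtest", "flip-vs-panning"] from the report.
always_pass = [sys.executable, "-c", "raise SystemExit(0)"]
always_fail = [sys.executable, "-c", "raise SystemExit(1)"]
print(bisect_verdict(always_pass, runs=3), bisect_verdict(always_fail, runs=3))
```

Because each kernel under test has to be built and booted first, the verdict would in practice be fed to `git bisect good` or `git bisect bad` by hand after each boot, unless reboots are automated.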
Comment 7 Guo Jinxian 2014-05-27 06:56:43 UTC
(In reply to comment #6)
> This commit doesn't affect HSW, as it changes only VLV specific parts. Maybe
> the problem you're seeing doesn't happen always resulting in an incorrect
> bisect result. Could you try testing each commit multiple times to make sure
> you don't hit the problem already before the above commit in the bisect
> sequence?

Yes, the problem doesn't always happen. I ran the test 20 times for each commit and found the first bad commit below.

5bb0c2fd8c41dbee8ada1124d625c05765cd3f02 is the first bad commit
commit 5bb0c2fd8c41dbee8ada1124d625c05765cd3f02
Author:     Ben Widawsky <benjamin.widawsky@intel.com>
AuthorDate: Fri Apr 18 18:04:29 2014 -0300
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Wed Apr 30 10:06:19 2014 +0200

    drm/i915/bdw: Disable idle DOP clock gating

    It seems we need this at least for the current platforms we have, but
    probably not later. In any event, it should cause too much harm as we do
    the same thing on several other platforms.

    Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
    Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
    Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>


However, after reverting this commit and rerunning the test, the issue still reproduces.
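
Twenty passing runs only rule out an intermittent failure if the per-run failure rate is high enough. As a rough sketch, assuming the runs are independent and the failure rate p is constant (neither is established in this report), the chance of mislabeling a bad commit as good after n runs is (1 - p)^n. The failure rates below are hypothetical.

```python
import math


def false_good_probability(p_fail, runs):
    """Chance that a bad commit passes every one of `runs` independent runs."""
    return (1.0 - p_fail) ** runs


def runs_needed(p_fail, max_false_good):
    """Smallest run count keeping the false-good chance at or below the bound."""
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_fail))


# Hypothetical failure rates; the report does not state the real one.
for p in (0.5, 0.1, 0.05):
    print(p, round(false_good_probability(p, 20), 4), runs_needed(p, 0.01))
```

Under these assumptions, a 10% per-run failure rate would let a bad commit slip through 20 runs about 12% of the time, which is enough to send a bisect to an unrelated commit.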
Comment 8 Jesse Barnes 2014-06-05 20:54:13 UTC
So you say this is a regression, but you've bisected to two commits that aren't the problem...  Which is it?  regression or just an intermittent problem?

Does it still only affect one device?  Is it an early stepping by chance?  Is the CPU seated correctly?  I'm tempted to just mark this one invalid...
Comment 9 Guo Jinxian 2014-06-16 01:21:34 UTC
(In reply to comment #8)
> So you say this is a regression, but you've bisected to two commits that
> aren't the problem...  Which is it?  regression or just an intermittent
> problem?
> 
> Does it still only affect one device?  Is it an early stepping by chance? 
> Is the CPU seated correctly?  I'm tempted to just mark this one invalid...

It is an intermittent problem, but I ran the test 20 times on each commit during the bisect.

The test is currently blocked by Bug 73640.

Please check the system information below:
lspci -nnn
00:00.0 Host bridge [0600]: Intel Corporation Haswell DRAM Controller [8086:0c00] (rev 06)
00:02.0 VGA compatible controller [0300]: Intel Corporation Haswell Integrated Graphics Controller [8086:0412] (rev 06)
00:03.0 Audio device [0403]: Intel Corporation Haswell HD Audio Controller [8086:0c0c] (rev 06)
00:14.0 USB controller [0c03]: Intel Corporation Lynx Point USB xHCI Host Controller [8086:8c31] (rev 04)
00:16.0 Communication controller [0780]: Intel Corporation Lynx Point MEI Controller #1 [8086:8c3a] (rev 04)
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection I217-V [8086:153b] (rev 04)
00:1a.0 USB controller [0c03]: Intel Corporation Lynx Point USB Enhanced Host Controller #2 [8086:8c2d] (rev 04)
00:1b.0 Audio device [0403]: Intel Corporation Lynx Point High Definition Audio Controller [8086:8c20] (rev 04)
00:1c.0 PCI bridge [0604]: Intel Corporation Lynx Point PCI Express Root Port #1 [8086:8c10] (rev d4)
00:1c.2 PCI bridge [0604]: Intel Corporation Lynx Point PCI Express Root Port #3 [8086:8c14] (rev d4)
00:1c.3 PCI bridge [0604]: Intel Corporation Lynx Point PCI Express Root Port #4 [8086:8c16] (rev d4)
00:1c.4 PCI bridge [0604]: Intel Corporation Lynx Point PCI Express Root Port #5 [8086:8c18] (rev d4)
00:1d.0 USB controller [0c03]: Intel Corporation Lynx Point USB Enhanced Host Controller #1 [8086:8c26] (rev 04)
00:1f.0 ISA bridge [0601]: Intel Corporation Lynx Point LPC Controller [8086:8c44] (rev 04)
00:1f.2 SATA controller [0106]: Intel Corporation Lynx Point 6-port SATA Controller 1 [AHCI mode] [8086:8c02] (rev 04)
00:1f.3 SMBus [0c05]: Intel Corporation Lynx Point SMBus Controller [8086:8c22] (rev 04)
01:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 01)
02:00.0 PCI bridge [0604]: Integrated Technology Express, Inc. Device [1283:8892] (rev 41)
03:02.0 FireWire (IEEE 1394) [0c00]: Texas Instruments TSB43AB22A IEEE-1394a-2000 Controller (PHY/Link) [iOHCI-Lynx] [104c:8023]
04:00.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch [10b5:8606] (rev ba)
05:07.0 PCI bridge [0604]: PLX Technology, Inc. PEX 8606 6 Lane, 6 Port PCI Express Gen 2 (5.0 GT/s) Switch [10b5:8606] (rev ba)
06:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
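
For the question about an early stepping, the graphics device ID and PCI revision can be read from the listing above. The following is a minimal sketch, assuming `lspci -nnn` lines end in `[vendor:device] (rev NN)` as they do here; the helper name `gpu_ids` is hypothetical, and mapping the PCI revision to a silicon stepping is outside this sketch.

```python
import re

# Matches the "[vendor:device] (rev NN)" tail of an `lspci -nnn` line.
# The revision part is optional because some devices list none.
_ID_RE = re.compile(
    r"\[([0-9a-f]{4}):([0-9a-f]{4})\](?: \(rev ([0-9a-f]{2})\))?$")


def gpu_ids(lspci_output):
    """Return (vendor, device, revision) of the first VGA controller, or None."""
    for line in lspci_output.splitlines():
        if "VGA compatible controller" not in line:
            continue
        match = _ID_RE.search(line.strip())
        if match:
            return match.groups()
    return None


# The VGA line from the listing in this report.
sample = ("00:02.0 VGA compatible controller [0300]: Intel Corporation Haswell "
          "Integrated Graphics Controller [8086:0412] (rev 06)")
print(gpu_ids(sample))
```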
Comment 10 Jesse Barnes 2014-06-25 20:05:51 UTC
I'm not seeing the failure here.  Can you post your lspci output?
Comment 11 Jesse Barnes 2014-06-25 21:02:24 UTC
I went through 100 iterations on my local HSW.  Please re-open if you see this with production systems (with soldered down CPUs).
Comment 12 Guo Jinxian 2014-06-27 06:25:51 UTC
The test passes on the latest -nightly (1087d4bf01e79523898c6c31615bf0c369e0039a).

Output:
[root@x-hsw27 tests]# ./kms_flip --run-subtest flip-vs-panning
IGT-Version: 1.7-g7ef5372 (x86_64) (Linux: 3.16.0-rc2_drm-intel-nightly_1087d4_20140627+ x86_64)
Using monotonic timestamps
Beginning flip-vs-panning on crtc 6, connector 16
  1280x1024 60 1280 1328 1440 1688 1024 1025 1028 1066 0x5 0x48 108000
.........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
flip-vs-panning on crtc 6, connector 16: PASSED

Beginning flip-vs-panning on crtc 10, connector 16
  1280x1024 60 1280 1328 1440 1688 1024 1025 1028 1066 0x5 0x48 108000
.........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
flip-vs-panning on crtc 10, connector 16: PASSED

Beginning flip-vs-panning on crtc 14, connector 16
  1280x1024 60 1280 1328 1440 1688 1024 1025 1028 1066 0x5 0x48 108000
.........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
flip-vs-panning on crtc 14, connector 16: PASSED

Subtest flip-vs-panning: SUCCESS
Comment 13 Jari Tahvanainen 2016-10-13 08:22:29 UTC
Closing verified+fixed.