Bug 66808

Summary: [HSW Bisected]igt/kms_flip/blocking-absolute-wf_vblank due to power well enabled by default
Product: DRI
Reporter: lu hua <huax.lu>
Component: DRM/Intel
Assignee: Paulo Zanoni <przanoni>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: high
CC: przanoni, xunx.fang, yangweix.shui
Version: unspecified   
Hardware: All   
OS: Linux (All)   
Whiteboard:
i915 platform:
i915 features:
Attachments (description, flags):
  dmesg (flags: none)
  dmesg with drm.debug=7 (flags: none)
  possible fix (flags: none)

Description lu hua 2013-07-11 03:21:45 UTC
Created attachment 82308 [details]
dmesg

System Environment:
--------------------------
Platform:       Haswell
Kernel:         drm-intel-fixes d4eead50eb206b875f54f66cc0f6ec7d54122c28

Bug detailed description:
-----------------------------
The test aborts on Haswell/Pineview with the drm-intel-fixes kernel. It does not happen on the drm-intel-next-queued kernel.

Bisect shows that the first bad commit could be any of:
035dc1e0f9008b48630e02bf0eaa7cc547416d1d
446f8d81ca2d9cefb614e87f2fabcc996a9e4e7e
bf51d5e2cda5d36d98e4b46ac7fca9461e512c41

output:
Using monotonic timestamps
running testcase: blocking-absolute-wf_vblank
Beginning blocking-absolute-wf_vblank on crtc 3, connector 20
  1920x1080 60 1920 1966 1996 2080 1080 1082 1086 1112 0xa 0x48 138780
..............................................................
blocking-absolute-wf_vblank on crtc 3, connector 20: PASSED

Beginning blocking-absolute-wf_vblank on crtc 5, connector 20
  1920x1080 60 1920 1966 1996 2080 1080 1082 1086 1112 0xa 0x48 138780
..............................................................
blocking-absolute-wf_vblank on crtc 5, connector 20: PASSED

Beginning blocking-absolute-wf_vblank on crtc 7, connector 20
  1920x1080 60 1920 1966 1996 2080 1080 1082 1086 1112 0xa 0x48 138780
.............................................................
blocking-absolute-wf_vblank on crtc 7, connector 20: PASSED

Beginning blocking-absolute-wf_vblank on crtc 3, connector 9
  1920x1200 60 1920 1968 2000 2080 1200 1203 1209 1235 0x9 0x48 154000
.............................................................
blocking-absolute-wf_vblank on crtc 3, connector 9: PASSED

Beginning blocking-absolute-wf_vblank on crtc 5, connector 9
  1920x1200 60 1920 1968 2000 2080 1200 1203 1209 1235 0x9 0x48 154000
.run_test_step:749 failed, ret=-16, errno=16
Aborted (core dumped)

Reproduce steps:
----------------------------
1. ./kms_flip --run-subtest blocking-absolute-wf_vblank
Comment 1 Daniel Vetter 2013-07-11 09:56:16 UTC
(In reply to comment #0)
> Bisect shows:The first bad commit could be any of:
> 035dc1e0f9008b48630e02bf0eaa7cc547416d1d
> 446f8d81ca2d9cefb614e87f2fabcc996a9e4e7e
> bf51d5e2cda5d36d98e4b46ac7fca9461e512c41

Have you used git bisect skip, or why exactly could bisect not narrow down the commit further?

The first two patches should be unrelated, but the 3rd one changes the default for power wells on haswell. Can you please retest latest -nightly with i915.disable_power_well=0 added to your kernel cmdline?

Also is the failure exactly reproducible, i.e. when running this subtest it always fails with this output?


Beginning blocking-absolute-wf_vblank on crtc 5, connector 9
  1920x1200 60 1920 1968 2000 2080 1200 1203 1209 1235 0x9 0x48 154000
.run_test_step:749 failed, ret=-16, errno=16
Aborted (core dumped)

The important part is "crtc 5, connector 9". If the failure also happens when testing other crtcs/connectors, that would be very important to know. (Connector numbering is specific to this machine and depends on the exact board configuration.)

Note to self: Connector 9 here is VGA on the LPT pch.
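
To confirm that a parameter such as i915.disable_power_well=0 actually reached the kernel, the booted command line (readable from /proc/cmdline on Linux) can be inspected. The following is a minimal sketch, not part of IGT or the kernel: cmdline_param is a hypothetical helper that looks up a key=value token in a command-line string passed in by the caller. It returns the first match, which is a simplification.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: look up "key=value" in a whitespace-separated kernel
 * command line. On success, copies the value into out (NUL-terminated,
 * truncated to outlen - 1 bytes) and returns 1. Returns 0 if the key is
 * absent. Only the first occurrence of the key is considered.
 */
static int cmdline_param(const char *cmdline, const char *key,
                         char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *p = cmdline;

    if (outlen == 0)
        return 0;

    while (*p) {
        /* Skip separators, then measure the next token. */
        p += strspn(p, " \t\n");
        size_t tlen = strcspn(p, " \t\n");

        /* The token must be exactly "<key>=<value>". */
        if (tlen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            size_t vlen = tlen - klen - 1;
            if (vlen >= outlen)
                vlen = outlen - 1;
            memcpy(out, p + klen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p += tlen;
    }
    return 0;
}
```

A caller would read /proc/cmdline into a buffer and pass it in with the key "i915.disable_power_well"; a result of "0" means the parameter was set as requested.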
Comment 2 Chris Wilson 2013-07-11 11:43:17 UTC
The likely cause of EBUSY here requires drm.debug=7 for the error message to be printed to dmesg. Can you please do a run with drm.debug=7?
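
For reference, the "ret=-16, errno=16" in the test output corresponds to EBUSY on Linux. A minimal sketch of decoding such a value follows; errno_from_ret and describe_ret are hypothetical helper names used only for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Convert a kernel-style negative return value (e.g. -16) to a positive errno. */
static int errno_from_ret(int ret)
{
    return ret < 0 ? -ret : ret;
}

/*
 * Format a human-readable description of ret into buf, using the C library's
 * strerror() for the message text. Returns the snprintf() result.
 */
static int describe_ret(int ret, char *buf, size_t len)
{
    int err = errno_from_ret(ret);

    return snprintf(buf, len, "ret=%d -> errno %d (%s)",
                    ret, err, strerror(err));
}
```

Calling describe_ret(-16, buf, sizeof buf) yields a string containing "errno 16" followed by the platform's message text for EBUSY.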
Comment 3 lu hua 2013-07-12 05:45:39 UTC
Created attachment 82357 [details]
dmesg with drm.debug=7

Tested on the latest nightly kernel with i915.disable_power_well=0: the issue goes away.

Reverting bf51d5e2cda5d36d98e4b46ac7fca9461e512c41 also makes the issue go away.
commit bf51d5e2cda5d36d98e4b46ac7fca9461e512c41
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date:   Wed Jul 3 17:12:13 2013 -0300

    drm/i915: switch disable_power_well default value to 1

    Now that the audio driver is using our power well API, everything
    should be working correctly, so let's give it a try.

    Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Comment 4 Daniel Vetter 2013-07-12 12:49:02 UTC
One for Paulo to figure out. Worst case we need to disable the power well again in 3.11.
Comment 5 Paulo Zanoni 2013-07-15 19:28:01 UTC
But why is this also a regression on Pineview? Makes no sense to me.
Comment 6 Daniel Vetter 2013-07-15 20:30:24 UTC
(In reply to comment #5)
> But why is this also a regression on Pineview? Makes no sense to me.

IIRC we already have a pnv bug for something similar. If not, please file a new one, since the issue here is clearly Haswell related.
Comment 7 lu hua 2013-07-16 01:43:44 UTC
The PNV failure is already tracked as Bug 60002.
Comment 8 Paulo Zanoni 2013-07-19 22:32:16 UTC
Problem also happens if I use eDP+DP instead of eDP+VGA. When it tries DP1 on CRTC 1, it fails. I'll keep investigating.
Comment 9 Paulo Zanoni 2013-07-22 16:31:51 UTC
The very first time we run the test on pipe B, it works. After the power well is reset for the first time, the test on pipe B starts to fail and keeps failing forever.

Considering that the problem doesn't happen with i915.disable_power_well=0, I wonder if the bug is related to the fact that the pipe counters go back to 0 after we disable the power well (PIPE_FRMCNT, etc). Or maybe it's related to other registers that get reset.
Comment 10 Paulo Zanoni 2013-07-22 16:49:01 UTC
(In reply to comment #9)
> For the very first time we run the test on pipe B, it works. After the power
> well is reset for the first time, the test on pipe B starts to fail and
> keeps failing forever.
> 
> Considering that the problem doesn't happen with i915.disable_power_well=0,
> I wonder if the bug is related to the fact that the pipe counters go back to
> 0 after we disable the power well (PIPE_FRMCNT, etc). Or maybe it's related
> to other registers that get reset.

Yeah, if I set dev->last_vblank[1] = 0; and dev->last_vblank[2] = 0; in the !enable case inside __intel_set_power_well, the problem goes away. Now we need to find the correct way to reset these counters without messing with the drm helpers or the vblank locking.
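
The effect of a stale saved counter can be illustrated with a small userspace model. This is only a sketch under stated assumptions: it assumes the DRM core of that era advances its software vblank count by the difference between the current hardware frame counter and the saved last_vblank value, and treats "current < saved" as a hardware counter wraparound. The function name vblank_delta and the 24-bit maximum used in the example are illustrative, not taken from the kernel source or the attached patch.

```c
#include <stdint.h>

/*
 * Model of the delta added to the software vblank count.
 *   last      - hardware counter value saved before the power well went off
 *   cur       - hardware counter value read after the pipe is running again
 *   max_count - assumed largest value the hardware counter can hold
 * When cur < last, the model assumes the hardware counter wrapped and
 * compensates by adding max_count (unsigned arithmetic wraps modulo 2^32).
 */
static uint32_t vblank_delta(uint32_t last, uint32_t cur, uint32_t max_count)
{
    uint32_t diff = cur - last;

    if (cur < last)
        diff += max_count; /* misread: the counter was reset, not wrapped */

    return diff;
}
```

With an assumed 24-bit counter (max_count = 0xffffff), a saved value of 1000 and a post-reset reading of 5 give a delta of 16776220 instead of 5. Clearing the saved value to 0 before the next update, as in the experiment above, gives a delta of 5 for the same reading.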
Comment 11 Paulo Zanoni 2013-07-22 21:57:14 UTC
Created attachment 82845 [details] [review]
possible fix

Hi

Can you please try this patch and report the result? It fixes the problem on my machine.

Thanks,
Paulo
Comment 12 lu hua 2013-07-23 06:49:19 UTC
(In reply to comment #11)
> Created attachment 82845 [details] [review]
> possible fix
> 
> Hi
> 
> Can you please try this patch and report the result? It fixes the problem on
> my machine.
> 
> Thanks,
> Paulo

Fixed by this patch.
Comment 14 lu hua 2013-08-13 03:27:51 UTC
Verified. Fixed.
Comment 15 Elizabeth 2017-10-06 14:45:11 UTC
Closing old verified.
