Bug 76368 - [BDW] GPU hang/reset on first X startup with current drm-intel-next
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Ben Widawsky
QA Contact: Intel GFX Bugs mailing list
Reported: 2014-03-19 17:29 UTC by Timo Aaltonen
Modified: 2017-07-24 22:55 UTC

Attachments
error state (350.14 KB, text/plain)
2014-03-19 17:29 UTC, Timo Aaltonen
error state v2 (351.84 KB, application/gzip)
2014-03-21 13:15 UTC, Timo Aaltonen
error state v3 (351.83 KB, application/octet-stream)
2014-04-14 07:47 UTC, Timo Aaltonen
error state with a patch on top of bdw-backports (351.68 KB, application/octet-stream)
2014-05-09 17:32 UTC, Timo Aaltonen

Description Timo Aaltonen 2014-03-19 17:29:53 UTC
Created attachment 96065
error state

Unlike with bug 75736, the system here works mostly fine after the initial hesitation (GPU reset?); the only visible symptom is corrupt or missing text on the lightdm login screen. Restarting lightdm makes it look fine, and there is no new hang either.

drm-intel-next at e19b91371429
Comment 1 Chris Wilson 2014-03-19 21:37:34 UTC
The bug looks familiar. ACTHD is garbage, nowhere near where MI_BATCH_BUFFER_START should have sent it, which is very much like bug 75477. Maybe try i915.enable_ppgtt=0 (Ben says it currently is in his bdw branch).
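
When comparing results across kernels, it helps to confirm that i915.enable_ppgtt=0 was actually passed at boot. A minimal sketch, assuming the option is given on the kernel command line (on Linux, readable from /proc/cmdline); the helper name `kernel_param` is made up for this illustration, and it does not handle quoted values or the dash/underscore equivalence in parameter names:

```python
def kernel_param(cmdline, name):
    """Return the value given for `name` on a kernel command line, or None.

    If the parameter appears more than once, the last occurrence is returned
    (an assumption of this sketch about which one takes effect).
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val
    return value


# Hypothetical command line, for illustration only.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 i915.enable_ppgtt=0 quiet"
print(kernel_param(sample, "i915.enable_ppgtt"))  # prints 0
```

On a running system the same function can be fed the contents of /proc/cmdline instead of the sample string.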
Comment 2 Timo Aaltonen 2014-03-20 08:02:07 UTC
With i915.enable_ppgtt=0 I don't get a hang, and the text on the lightdm greeter looks fine.

The weird thing is that a backport module for 3.13 based on the same branch doesn't see this issue. I need to double-check the kernels used.
Comment 3 Timo Aaltonen 2014-03-20 08:26:14 UTC
verified that it happens with a mainline build from
http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-next/2014-03-19-trusty/

and doesn't with i915.enable_ppgtt=0
Comment 4 Chris Wilson 2014-03-20 08:31:47 UTC
You can also try

Section "Device"
  Option "VSync" "off"
EndSection

to test whether it is the use of secure-batches.
Comment 5 Timo Aaltonen 2014-03-20 08:52:30 UTC
With that option, and without i915.enable_ppgtt=0, I get a GPU reset, but the text on lightdm looks good.
Comment 6 Timo Aaltonen 2014-03-21 13:03:02 UTC
Applying the patch

 drm/i915: Broadwell expands ACTHD to 64bit

on top of dinq fixes the text corruption.
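
The patch title refers to ACTHD (the active head pointer) being wider than 32 bits on Broadwell. As a rough illustration only, assuming the value is exposed as two 32-bit halves, the sketch below shows how the halves combine and what is lost if only the lower half is kept. The function names and the sample address are invented here and are not taken from the driver or from the attached error states:

```python
MASK32 = 0xFFFFFFFF


def combine_dwords(upper, lower):
    """Build a 64-bit value from its upper and lower 32-bit halves."""
    return ((upper & MASK32) << 32) | (lower & MASK32)


def truncate_to_32(value):
    """Keep only the lower 32 bits, as a 32-bit-only read would."""
    return value & MASK32


# Hypothetical head pointer located above the 4 GiB boundary.
head = combine_dwords(0x1, 0x00A04000)
print(hex(head))                  # 0x100a04000
print(hex(truncate_to_32(head)))  # 0xa04000
```

The truncated value points at a different address than the full one, which is why a 32-bit view of a 64-bit pointer can be misleading when reading an error state.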
Comment 7 Timo Aaltonen 2014-03-21 13:14:29 UTC
So I'm just confused about the corruption; I guess it got fixed in dinq at some point. I don't have a 'clean' build of it available, but with the RC6 patches plus the ACTHD patch applied, the corruption is not there.

Anyway, I will attach a new error state from this kernel.
Comment 8 Timo Aaltonen 2014-03-21 13:15:15 UTC
Created attachment 96164
error state v2
Comment 9 Ben Widawsky 2014-03-24 21:26:04 UTC
Chris, does the junk after the MI_BATCH_BUFFER_END make sense to you?
Comment 10 Chris Wilson 2014-03-25 07:33:53 UTC
Yup, it's just an embedded vertex buffer and surface state, all checks out as being self-consistent.
Comment 11 Ben Widawsky 2014-03-25 14:58:28 UTC
(In reply to comment #10)
> Yup, it's just an embedded vertex buffer and surface state, all checks out
> as being self-consistent.

So where is the final MI_NOOP? I thought we always need one of those.
Comment 12 Chris Wilson 2014-03-25 15:03:05 UTC
No.
Comment 13 Ben Widawsky 2014-03-26 02:46:58 UTC
Timo, is this with RC6 patches?
Comment 14 Timo Aaltonen 2014-03-26 11:52:57 UTC
It happens with or without them; the trace is from a kernel with the RC6 patches, I believe.
Comment 15 Ben Widawsky 2014-03-27 04:17:43 UTC
Can you please get an error state without RC6, just to make certain. There are some oddities here which I want to make sure aren't related to RC6.

Also, Damien posted a fix today that can prevent hangs. It should be merged to -next with Cc: stable, if you want to try it.
Comment 16 Timo Aaltonen 2014-03-27 10:52:58 UTC
it's fine on bdw-backports at least, no gpu hang.

I'll test dinq again later..
Comment 17 Timo Aaltonen 2014-04-14 07:47:31 UTC
Created attachment 97334
error state v3

this is from 3.15-rc1, default boot options
Comment 18 Timo Aaltonen 2014-04-28 14:23:34 UTC
Same with rc3, and logging in to the Unity session doesn't finish because compiz fails to start due to this.
Comment 19 Timo Aaltonen 2014-05-09 17:32:03 UTC
Created attachment 98771
error state with a patch on top of bdw-backports

This is with the latest patch to fix bug 77587 on bdw-backports.
Comment 20 Timo Aaltonen 2014-05-15 06:47:21 UTC
I don't get that hang with v3.15-rc4 if ppgtt is disabled (=0).

Also, Ben's 'broadwell' branch doesn't suffer from this.
Comment 21 Timo Aaltonen 2014-05-15 06:50:08 UTC
disabling ppgtt doesn't work for bdw-backports though
Comment 22 Timo Aaltonen 2014-05-15 22:06:54 UTC
So this works fine on current drm-intel-next-queued, and it is resolved on 3.15-rc5 after cherry-picking 9d0a6fa6c5e618bd978d625a215dc4a240ba3b3c. The same commit doesn't do the trick on bdw-backports though, so I'll just give up on that and rebase.
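
To tell whether a given kernel tree already carries that commit, one option is `git merge-base --is-ancestor`. A minimal sketch, assuming a local git checkout; the helper names are invented for this illustration. It only detects the commit under its original hash, so a tree where the fix was cherry-picked (and therefore has a new hash) will report False:

```python
import subprocess

FIX = "9d0a6fa6c5e618bd978d625a215dc4a240ba3b3c"


def is_ancestor_cmd(commit, ref="HEAD"):
    """Build the git command that tests whether `commit` is reachable from `ref`."""
    return ["git", "merge-base", "--is-ancestor", commit, ref]


def contains_commit(repo, commit, ref="HEAD"):
    """Return True if `commit` is an ancestor of `ref` in the checkout at `repo`."""
    result = subprocess.run(is_ancestor_cmd(commit, ref), cwd=repo)
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other exit status means git itself failed (e.g. unknown commit).
    raise RuntimeError("git merge-base failed with status %d" % result.returncode)
```

For example, `contains_commit("/path/to/linux", FIX)` would check the currently checked-out branch; the path is a placeholder.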