Summary: | [drm] stuck on bsd ring (bisected) | |
---|---|---|---
Product: | DRI | Reporter: | Rainer Fiebig <mymailclone>
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | minor | |
Priority: | medium | CC: | intel-gfx-bugs
Version: | XOrg git | |
Hardware: | x86-64 (AMD64) | |
OS: | Linux (All) | |
Whiteboard: | | |
i915 platform: | ILK | i915 features: | GPU hang
Description
Rainer Fiebig
2016-08-18 16:16:31 UTC
Jay, can you attach the GPU crash dump saved to /sys/class/drm/card0/error as well as the kernel log (i.e. dmesg)? Thanks.

Created attachment 125985 [details]
/sys/class/drm/card0/error
Created attachment 125986 [details]
dmesg (part of)
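
For anyone asked for the same two files, the collection step can be sketched as below. This is a minimal sketch, not an official tool: the function names and the output file names `i915_error_state.txt` and `dmesg.txt` are made up for illustration. The only things taken from this report are the sysfs path /sys/class/drm/card0/error and the `dmesg` command. Reading either source may require root, in which case the functions return None instead of raising.

```python
import subprocess
from pathlib import Path


def save_error_state(out_dir, error_path="/sys/class/drm/card0/error"):
    """Copy the GPU error state into out_dir.

    Returns the path of the copy, or None if the source cannot be read
    (missing file, no permission). The output name is hypothetical.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    try:
        data = Path(error_path).read_bytes()
    except OSError:
        return None
    dest = out / "i915_error_state.txt"
    dest.write_bytes(data)
    return dest


def save_dmesg(out_dir, command=("dmesg",)):
    """Run `command` and store its standard output in out_dir.

    Returns the path of the saved log, or None if the command is missing
    or exits with a non-zero status. The output name is hypothetical.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    try:
        result = subprocess.run(
            list(command), capture_output=True, text=True, check=False
        )
    except OSError:
        return None
    if result.returncode != 0:
        return None
    dest = out / "dmesg.txt"
    dest.write_text(result.stdout)
    return dest
```

Calling `save_error_state("gpu-debug")` and `save_dmesg("gpu-debug")` right after a hang leaves two files that can be attached to a report. The `error_path` and `command` parameters exist so the sketch can be pointed at a different card or exercised without real hardware.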
(In reply to yann from comment #1)
> Jay, can you attach GPU crash dump saved to /sys/class/drm/card0/error as
> well as kernel log (ie dmesg) ?
> thanks

"error" and "dmesg" (part of) attached; the kernel was 4.7.1.

It seems quite a few changes were made to the i915 driver between kernels 4.2 and 4.3, so I guess it won't be too easy to figure out where the problem really is. At least the bisected commit offers a starting point.

845g with a bsd ring is impressive ;)

This should be fixed on drm-intel-nightly already, don't know if that made it into the 4.8 cut.

(In reply to Chris Wilson from comment #5)
> 845g with a bsd ring is impressive ;)

Oops, you are right, not sure why I clicked on I854G since the PCI ID is 0x0042 (i.e. Ironlake (Clarkdale))... my bad. Thanks Chris for correcting it :)

(In reply to Chris Wilson from comment #6)
> This should be fixed on drm-intel-nightly already, don't know if that made
> it into the 4.8 cut.

Sounds good. Did it have something to do with the incriminated commit, or could I have wasted my time in a better way? ;)

There were certainly bugs in how requests + contexts operated on Ironlake that required fixing. But there have also been changes to make hibernation more reliable (hopefully!). And since everything touching the GPU has a request, everything may be related back to the bisect result ;)

(In reply to Chris Wilson from comment #9)
> There were certainly bugs in how requests + contexts operated on Ironlake
> that required fixing. But there have also been changes to make hibernation
> more reliable (hopefully!). And since everything touching the GPU has a
> request, everything may be related back to the bisect result ;)

Thanks! I read this as: "Time perhaps not completely wasted." Alright with me. ;)

I just experienced a problem after resuming from s2disk: frozen windows, missing titlebars, only partly rendered windows, keyboard not working. I could solve it by logging out and back in, but had to shoot a VM. Kernel was 4.4.18.
Among the new messages in dmesg (see new attachment):

...
[drm] stuck on render ring
[drm] GPU HANG: ecode 5:0:0xfdffffff, in Xorg [1183], reason: Ring hung, action: reset
i915 0000:00:02.0: GEM idle failed, resume might fail
pci_pm_freeze(): i915_pm_suspend+0x0/0x40 [i915] returns -11
dpm_run_callback(): pci_pm_freeze+0x0/0xe0 returns -11
PM: Device 0000:00:02.0 failed to freeze async: error -11
drm/i915: Resetting chip after gpu hang
...

Seems more serious than I first thought.

@Chris: Could you give me a hint concerning the patch that you mentioned in your post? I could give it a try. Or is it a whole bunch? In that case I'd rather wait until they are integrated in 4.8 (in git I have 4.8-rc2 and that doesn't seem to have the patches for this problem).

Created attachment 126016 [details]
dmesg_4.4.18_problem
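
The messages quoted above can be picked out of a saved dmesg log with a small script. This is a sketch under stated assumptions: the patterns are derived only from the excerpt in this report, not from a complete list of i915 messages, and the function names are made up for illustration. The return value -11 is decoded through Python's `errno` table; on Linux, errno 11 is EAGAIN ("Resource temporarily unavailable"), so the script should be run on a Linux host for the names to match the kernel's.

```python
import errno
import re

# Patterns taken from the dmesg excerpt in this report; an assumption,
# not an exhaustive list of hang-related kernel messages.
HANG_PATTERNS = (
    re.compile(r"\[drm\] stuck on \w+ ring"),
    re.compile(r"\[drm\] GPU HANG: "),
    re.compile(r"GEM idle failed"),
    re.compile(r"failed to freeze"),
)
STUCK_RING_RE = re.compile(r"\[drm\] stuck on (\w+) ring")
RETURNS_RE = re.compile(r"returns (-\d+)\b")


def find_hang_lines(log_text):
    """Return the log lines that match any of the hang-related patterns."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in HANG_PATTERNS)
    ]


def stuck_rings(log_text):
    """Return the ring names reported as stuck, in order of appearance."""
    return STUCK_RING_RE.findall(log_text)


def decode_return_codes(log_text):
    """Map negative return codes such as -11 to the host's errno names."""
    names = {}
    for match in RETURNS_RE.finditer(log_text):
        code = int(match.group(1))
        names[code] = errno.errorcode.get(-code, "unknown")
    return names
```

Run against the excerpt above, `stuck_rings` reports the render ring, while the bug summary refers to the bsd ring, so the same check helps to tell the two hangs apart.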
I realise I was thinking of some context bugs that don't affect you, since Ironlake hasn't enabled HW contexts. Hmm, the hibernate bug I was thinking of also related to using HW contexts. Double red herring, sorry.

Gut feeling from the error state is that the GPU is being clobbered by the framebuffer upon resume. Could I just tempt you into trying an rc or nightly? Even if just to grab an error state? :)

(In reply to Chris Wilson from comment #13)
...
> thinking of also related to using HW context. Double red herring, sorry.

Eat it! ;)

>
> Gut feeling from the error state is that the GPU is being clobbered by the
> framebuffer upon resume. Could I just tempt you into trying an rc or nightly?
> Even if just to grab an error state? :)

I'll try 4.8-rc3 and report back. How was your red herring? ;)

4.8-rc3 looks promising so far: the messages are gone and /sys/class/drm/card0/error was clean. I did several s2disk/resume cycles and all were good, with no problems afterwards. If it stays like this, perhaps you should suggest "Double Red Herring" as the name for 4.8? ;)

Created attachment 126019 [details]
dmesg_4.8-rc3_part
I've used 4.8-rc3 for some more hours, doing several s2disk/s2both resumes, and everything was fine, as described in my previous post. So I think the problem is indeed solved in 4.8.

What should perhaps be done now is to identify the relevant commits (if you do not know them already) so that they can soon be backported to the longterm kernel 4.4 and to 4.7. If you then need help with testing, let me know.

Thanks and bye for now!

Thanks Jay, closing this bug then.