Created attachment 92111 [details]
Kernel log showing the backtrace of i915 driver crash

After some random time in X the display turns black and cannot be brought back. Only putting the machine to sleep and waking it again restores the display.

The problem seems to occur when switching workspaces, though I'm not sure this is always the case.

Reverting to driver version xf86_video_intel 2.21.15 resolves the problem.

See also this bug report: https://bugs.archlinux.org/task/38518

The kernel log including backtrace is attached.
Note that the dmesg here is from a lid-event, which is not spectacularly random. Perhaps you have more than one bug here? Can you please attach your Xorg.0.log as well?
Created attachment 92157 [details]
earlier dmesg output showing drm and i915 errors before backtrace dump

I updated to version 2.99.907 again, and now I'm waiting for the random moment to happen. I will attach Xorg.0.log when I get it.

I had tried a lid event to get the screen back to life (which did not help). I have attached the dmesg output from just before the lid event backtrace dump. It contains related messages.
Created attachment 92165 [details]
Xorg.0.log for a session where display crashes

I fear there is little useful output in the Xorg.0.log. Is there a way I can increase verbosity?

This crash happened completely at random, out of the blue, with no workspace switches or other activity involved. I was reading a webpage.
I can also confirm that you (Chris) are right that the backtrace dump (attachment 1) does not appear in dmesg at the time of the display crash. What does appear is this:

Jan 15 10:49:15 idefix kernel: [drm] stuck on render ring
Jan 15 10:49:15 idefix kernel: [drm] capturing error event; look for more information in /sys/class/drm/card0/error
Jan 15 10:49:15 idefix kernel: [drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x3702000 ctx 0) at 0x370373c
Jan 15 10:49:15 idefix kernel: [drm:i915_reset] *ERROR* Failed to reset chip.
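For anyone wanting to check their own logs for the same signature, here is a minimal sketch in Python. The two patterns are copied from the messages quoted in this report and may not match the wording used by other kernel versions; the function name `summarize_gpu_log` is made up for illustration.

```python
import re

# Patterns taken from the log lines quoted in this bug report.
# Assumption: other kernel versions may word these messages differently.
HANG_RE = re.compile(r"\[drm:i915_set_reset_status\] \*ERROR\* (\w+) ring hung")
RESET_FAILED_RE = re.compile(r"\[drm:i915_reset\] \*ERROR\* Failed to reset chip")


def summarize_gpu_log(lines):
    """Return (rings_reported_hung, reset_failed) for an iterable of log lines."""
    hung_rings = []
    reset_failed = False
    for line in lines:
        match = HANG_RE.search(line)
        if match:
            # Record which ring the kernel reported as hung, e.g. "render".
            hung_rings.append(match.group(1))
        if RESET_FAILED_RE.search(line):
            reset_failed = True
    return hung_rings, reset_failed
```

Any iterable of strings works as input, for example the lines of a saved dmesg dump. A hang followed by a failed reset is the combination reported here.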
The hang is bug 73348, fixed by

commit 9d8473c5d9489db439aca73f470bda29a22ebab6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jan 7 13:43:35 2014 +0000

    sna/gen4: Check for available batch space before restoring state after CA pass

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73348
    References: https://bugs.freedesktop.org/show_bug.cgi?id=55500
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

But the subsequent reset failure and display failure are unexpected.
Created attachment 92166 [details] /sys/class/drm/card0/error
*** Bug 73662 has been marked as a duplicate of this bug. ***
I've been using UXA for some time and have not seen this problem so far. Is that because the GPU is not reset in UXA, or is the reset method different?
(In reply to comment #8)
> I've been using UXA for some time and not seen this problem so far. Is that
> because the GPU is not reset in UXA or is the reset method different?

The reset is due to a known issue in SNA, see bug 73348. That the reset fails is unusual and a separate issue.
I have this issue as well when testing the latest mainline kernel on my Squeeze machine. With the stock 2.6.32 kernel the system can run almost forever.

I confirm that this bug happens pretty randomly. The system can run for a couple of days without an issue, or sometimes only a couple of minutes. The reason is not clear to me either. Sometimes I am just reading something on the web and doing nothing special, sometimes it happens when I am starting an app.

I will attach my dmesg and crash dump below. I hope this will help.

[drm] stuck on render ring
[drm:i915_set_reset_status] *ERROR* render ring hung flushing bo (0x4c3f000 ctx 0) at 0x2ce03e14
[drm:i915_reset] *ERROR* Failed to reset chip
Created attachment 92503 [details] dmesg log (2) dmesg from system with the latest mainline kernel: Linux version 3.13.0-rc8-0105--00005-ga6da83f-dirty
Created attachment 92504 [details] /sys/class/drm/card0/error (2)
I don't know whether this is about a failed GPU reset; I just have the issue described in Arch Linux bug #38518. I have this problem with a system installed freshly on 22.01.2013.

What is noticeable is that only the video blanks out. The system runs normally, and I can close all windows using Alt-F4 and even shut the system down cleanly by blindly clicking the power-down icon.

Linux xxxx 3.12.8-1-ARCH #1 SMP PREEMPT Thu Jan 16 09:16:34 CET 2014 x86_64 GNU/Linux
xorg-server 1.15.0-5
xf86-video-intel 2.99.907-2

00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (primary) (rev 0c)
00:02.1 Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (secondary) (rev 0c)
Created attachment 92919 [details] Xorg.0.log from session with a crash
I also have a problem where the display goes blank out of nowhere (I noted at least twice that it coincided with clicking on some button in Qt/KDE programs).

VT switching restores it, though.

The X.log gives this every time it happens:

[ 48895.399] (EE) intel(0): sna_mode_redisplay: page flipping failed, disabling CRTC:3 (pipe=0)

SNA compiled from 2.99.907-54-g294180b
(In reply to comment #15)
> I also have a problem where the display goes blank out of nowhere (I noted
> at least twice it coincided with clicking on some button on qt/kde programs).
>
> VT switching restores it though.
> The X.log gives this:
>
> [ 48895.399] (EE) intel(0): sna_mode_redisplay: page flipping failed,
> disabling CRTC:3 (pipe=0)
>
> For each and every time that happens.
>
> SNA compiled from 2.99.907-54-g294180b

See bug 70905, in particular 2.99.907-61-g4b73a0e.
I haven't run into the blank screen problem "[drm:i915_reset] *ERROR* Failed to reset chip" with versions 908 and 909. I'm not sure whether that is because "[drm:i915_set_reset_status] *ERROR* render ring hung inside bo (0x3702000 ctx 0) at 0x370373c" no longer happens or because the GPU reset now works.

Is there a way I can test whether the GPU reset works on [gen4]?
If you don't see the error message that a hang was detected, we haven't attempted to reset the GPU. The simplest test for GPU reset is "echo 1 > /sys/kernel/debug/dri/0/i915_wedged". However, when the GPU hangs for real it is more likely for the reset to fail.
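The same test can be scripted. The following is only a sketch in Python of the `echo 1 > ...` command above: it assumes debugfs is mounted at /sys/kernel/debug, that the Intel GPU is DRM device 0, and that the script runs as root. The function name `request_gpu_reset` is made up for illustration.

```python
# Path taken from the comment above.
# Assumptions: debugfs is mounted at /sys/kernel/debug and the Intel GPU
# is DRM device 0; adjust the path if your system differs.
DEFAULT_WEDGED_PATH = "/sys/kernel/debug/dri/0/i915_wedged"


def request_gpu_reset(path=DEFAULT_WEDGED_PATH):
    """Write "1" to the i915_wedged debugfs file, as `echo 1 > path` would.

    Needs root. This simulates a hang so that the driver attempts a reset.
    """
    with open(path, "w") as wedged_file:
        wedged_file.write("1\n")
```

Check dmesg afterwards for "Failed to reset chip". Be aware that if the reset fails, the display may stay blank until the machine is suspended and resumed, so save your work first.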
Just did this test. It produces exactly the crash that I experienced with version 907:

[ 1540.594567] [drm] Manually setting wedged to 1
[ 1540.594580] [drm] capturing error event; look for more information in /sys/class/drm/card0/error
[ 1541.099920] [drm:i915_reset] *ERROR* Failed to reset chip.

I can revive the GPU by putting the whole system to sleep and waking it up again.

I guess that means this bug is not resolved yet, but as long as the GPU doesn't hang, the driver doesn't try to reset and this bug is not triggered.
Ville has just posted a patch set and is looking for victims^W volunteers.
(In reply to comment #22)
> Ville has just posted a patch set and is looking for victims^W volunteers.

Patches pushed here for easier consumption:
git://gitorious.org/vsyrjala/linux.git gpu_reset_fixes_2
Ville, what's the status of the patches? Upstreamed, forgotten, what? drrossum, testing the patches helps in getting them upstreamed...
(In reply to comment #24)
> Ville, what's the status of the patches? Upstreamed, forgotten, what?

I didn't really spend much time on them, so they might have some issues, but at least my 946gz seemed to work with them. If someone wants to play around with them or improve them, go ahead. I don't have time atm.
(In reply to comment #24)
> drrossum, testing the patches helps in getting them upstreamed...

I have not experienced any more random crashes since versions 908 and 909, as noted in comment #18. I'm now on 2.99.914.

I just tried the "i915_wedged" test that Chris suggested in comment #19. It does NOT crash the driver anymore. I have attached /sys/class/drm/card0/error and the tail of dmesg in case anyone is interested.

I am marking this bug as resolved.
Created attachment 105803 [details]
dmesg after wedge test

See comment #26
Created attachment 105804 [details] /sys/class/drm/card0/error after wedge test See comment #26