Created attachment 79658 [details]
dmesg after X freeze

When watching a flash video (opera + flash-11.2.202.262), the kernel log starts filling up with

[ 7009.603310] radeon 0000:01:00.0: GPU fault detected: 146 0x0e677004
[ 7009.603313] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[ 7009.603316] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000

and the GPU sometimes eventually hangs. Once these messages start appearing, there is graphical corruption everywhere.

I also managed to reproduce this when playing back a video using mplayer with the vdpau backend. (EnableLinuxHWVideoDecode=1 is commented out in /etc/adobe/mms.cfg.)

I managed to get dmesg output after the GPU hang (attached). I then tried to save the contents of /sys/kernel/debug/dri/0, but not knowing which files were important, I read them all in a bash for loop and got a complete hang.

linux-3.10-rc1; mesa, libdrm and the radeon ddx from git.

This might be related to https://bugs.freedesktop.org/show_bug.cgi?id=62959 , but piglit just finished fine (fine = didn't hang in this case; there are still a bunch of "radeon_gem_object_create:69 alloc size 1365Mb bigger than 256Mb limit" messages in the logs).

Additional info: I have a dual screen setup. Anything more you need, I'll be happy to provide.
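For anyone else trying to capture /sys/kernel/debug/dri/0 around a hang, here is a minimal sketch of reading the files one at a time while recording progress. It cannot prevent a kernel-level hang; it only makes it more likely that the name of the file being read survives on disk. The function name `dump_debugfs` and the `progress.log` file name are made up for this illustration.

```python
import os
from pathlib import Path


def dump_debugfs(src_dir, dst_dir, progress_name="progress.log"):
    """Copy each regular file in src_dir to dst_dir, one at a time.

    Before each read, the file name is appended to a progress log and
    fsync'ed, so the last name in that log is the file that was being
    read if the machine hangs.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    with open(dst / progress_name, "a") as progress:
        for entry in sorted(src.iterdir()):
            if not entry.is_file():
                continue  # skip subdirectories
            progress.write(entry.name + "\n")
            progress.flush()
            os.fsync(progress.fileno())
            try:
                data = entry.read_bytes()
            except OSError as exc:
                # Some entries may refuse reads; keep the error text instead.
                data = f"read failed: {exc}\n".encode()
            (dst / entry.name).write_bytes(data)
            copied.append(entry.name)
    return copied
```

It would be called as `dump_debugfs("/sys/kernel/debug/dri/0", "/root/dri0-dump")`, run as root with debugfs mounted; the destination path is only an example.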
Created attachment 79661 [details]
output after hang.

The output I got when trying to cat /sys/kernel/debug/dri/0/*
Sorry for the bad quality.
I too get system hangs when watching a flash video in firefox. linux-3.8.13; libdrm, mesa etc. from git. The screen simply goes black (no signal) and the machine is dead, leaving a hard reset as the only option. The dmesg is flooded with the following lines:

radeon 0000:01:00.0: GPU fault detected: 147 0x0d859002
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000012D8
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05090002
[...] repeated a hundred times with only the first line changing a bit [...]

then:

radeon 0000:01:00.0: GPU fault detected: 146 0x07151004
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[...] repeated a hundred times with only the first line changing a bit [...]

The timestamps indicate this goes on for approximately two minutes before the hang.
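Since these floods run to hundreds of lines, a short script can condense them before attaching a log. The following is a rough sketch that groups each "GPU fault detected" line with the ADDR and STATUS lines that follow it and counts distinct combinations. The field labels `code` and `data` are my own names for the two numbers on the first line, not terms taken from the driver.

```python
import re
from collections import Counter

FAULT_RE = re.compile(r"GPU fault detected: (\d+) (0x[0-9a-fA-F]+)")
REG_RE = re.compile(
    r"VM_CONTEXT1_PROTECTION_FAULT_(ADDR|STATUS)\s+(0x[0-9a-fA-F]+)"
)


def parse_faults(lines):
    """Return one dict per 'GPU fault detected' line, with the ADDR and
    STATUS values from the register lines that follow it."""
    faults = []
    current = None
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            current = {
                "code": int(m.group(1)),   # label chosen for this sketch
                "data": m.group(2).lower(),
                "ADDR": None,
                "STATUS": None,
            }
            faults.append(current)
            continue
        m = REG_RE.search(line)
        if m and current is not None:
            current[m.group(1)] = m.group(2).lower()
    return faults


def summarize(faults):
    """Count how often each (code, ADDR, STATUS) combination occurs."""
    return Counter((f["code"], f["ADDR"], f["STATUS"]) for f in faults)
```

Feeding it the output of `dmesg` split into lines and printing `summarize(...)` gives a compact table of which fault address and status pairs dominate.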
(In reply to comment #2)
> I too get system hangs when watching a flash video in firefox. linux-3.8.13,
> libdrm, mesa etc. git. Screen simply becomes black (no signal) and machine
> is dead, leaving a hard reset as the only option. The dmesg is flooded with
> the following lines:
>
> radeon 0000:01:00.0: GPU fault detected: 147 0x0d859002
> radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000012D8
> radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05090002
> [...] repeated a hundred times with only the first line changing a bit [...]

Something in the mesa drivers is emitting a command buffer without a proper virtual address for CB5.
Hoping that it would be a workaround, I've applied the following patch from another bug report:

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 5407459..959e7cf 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -477,6 +477,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
         if (r) {
                 goto out;
         }
+        radeon_fence_wait(vm->fence, false);
         radeon_cs_sync_rings(parser);
         radeon_cs_sync_to(parser, vm->fence);
         radeon_cs_sync_to(parser, radeon_vm_grab_id(rdev, vm, parser->ring));

The hang happened again while playing a flash video (I'll see whether I can reproduce it reliably), but this time I was able to switch VTs. X was killed, and the following additional lines were appended to dmesg:

[30243.510949] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[30243.510951] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000005c90bd last fence id 0x00000000005c90b8)
[30243.510952] radeon 0000:01:00.0: couldn't schedule ib
[30243.510973] radeon 0000:01:00.0: Trying to sync to a disabled ring!
[30243.511047] radeon 0000:01:00.0: couldn't schedule ib
[30243.511048] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[30243.511197] radeon 0000:01:00.0: couldn't schedule ib
[30243.511198] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[30243.512323] radeon 0000:01:00.0: couldn't schedule ib
[30243.512324] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[30243.512851] radeon 0000:01:00.0: couldn't schedule ib
[30243.512852] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[30254.004957] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[30254.004959] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000334 last fence id 0x0000000000000333)
[30254.004961] radeon 0000:01:00.0: couldn't schedule ib
[30254.005052] radeon 0000:01:00.0: couldn't schedule ib
[30254.005064] radeon 0000:01:00.0: couldn't schedule ib
[30254.005070] radeon 0000:01:00.0: couldn't schedule ib
[30254.005084] radeon 0000:01:00.0: couldn't schedule ib
[30254.005092] radeon 0000:01:00.0: couldn't schedule ib
[30254.005097] radeon 0000:01:00.0: Trying to sync to a disabled ring!
[30254.012901] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[...] many similar repeated lines about IB [...]
[30264.498754] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[30264.498759] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000090e5 last fence id 0x00000000000090e3)

Trying to restart X didn't work (X crashed), so I had to reboot the machine.

I'm not sure whether this reveals anything new. Is there anything I could do to provide more information the next time this happens?
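To compare the "GPU lockup (waiting for ... last fence id ...)" lines at a glance, the two hexadecimal values can be subtracted. This is only a sketch of that arithmetic; the helper name `fence_gaps` is made up, and the difference is just how far the awaited fence number is ahead of the last one reported, with no claim about what the driver does with it.

```python
import re

LOCKUP_RE = re.compile(
    r"GPU lockup \(waiting for (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+)\)"
)


def fence_gaps(lines):
    """Return (waiting, last, waiting - last) for every lockup line."""
    gaps = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            waiting = int(m.group(1), 16)
            last = int(m.group(2), 16)
            gaps.append((waiting, last, waiting - last))
    return gaps
```

For the three lockup lines quoted above, the differences work out to 5, 1 and 2.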
This doesn't seem to be related to playing flash videos. A few moments ago parts of the screen started looking corrupted; when I checked, dmesg was again flooded with the same messages, and I could only reboot the machine over ssh.

Looking at the cayman bug reports, many people seem to have the same or similar problems. I'll try a 3.7 or maybe even a 3.6 kernel; perhaps that works reliably.
Adding R600_DEBUG=nodma to my environment makes the problem go away... Not pretty, but a workaround. Same question though: how can I help debug this?
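If you would rather not set the variable for the whole session, it can be applied to a single program. Below is a minimal sketch; the helper names are made up, and it assumes that several R600_DEBUG flags may be given as a comma-separated list, which is how I have seen Mesa debug flags written.

```python
import os
import subprocess


def env_with_nodma(base_env=None):
    """Return a copy of the environment with 'nodma' added to R600_DEBUG.

    Assumption: existing flags are comma-separated and are kept as-is.
    """
    env = dict(os.environ if base_env is None else base_env)
    flags = [f for f in env.get("R600_DEBUG", "").split(",") if f]
    if "nodma" not in flags:
        flags.append("nodma")
    env["R600_DEBUG"] = ",".join(flags)
    return env


def run_with_nodma(argv):
    """Run one program with R600_DEBUG=nodma set only for that process."""
    return subprocess.run(argv, env=env_with_nodma(), check=False)
```

For example, `run_with_nodma(["mplayer", "-vo", "vdpau", "video.mkv"])` would start only mplayer with the workaround; the file name is a placeholder.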
Created attachment 80920 [details]
netconsole.log

I've been able to get more complete output using netconsole. I'm not sure if it helps, but here it is.

Here are the steps to reproduce the crash:

1) Go to youtube.com and start playing a video.
   => This prints these lines:
   radeon 0000:01:00.0: GPU fault detected: 146 0x0b95e004
   radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000010B9
   radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x050E0004

2) Close the web browser and activate xscreensaver with an OpenGL screensaver such as photopile.
   => GPU lockup CP stall
Created attachment 80921 [details] netconsole.log Last upload didn't work.
> adding R600_DEBUG=nodma to my environment makes the problem go away...
> Not pretty, but a workaround. Same question though : how could I
> help debugging this?

I can confirm this helps: not against the GPU faults when playing the video, but against the crashes/hangs when an OpenGL xscreensaver etc. activates. Thanks for mentioning the workaround.

I've also applied https://bugs.freedesktop.org/attachment.cgi?id=72794, but it doesn't help.
With current up-to-date git versions of libdrm, mesa, xorg-server and xf86-video-ati, the R600_DEBUG=nodma hack no longer seems necessary (linux-3.11.0-rc6 with UVD disabled); the GPU faults have vanished and the system is stable.