Summary: | Hangs on Cayman (HD6950) when watching flash/using vdpau | ||
---|---|---|---|
Product: | Mesa | Reporter: | Martin Bednar <martin> |
Component: | Drivers/Gallium/r600 | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | h.judt, martin, vmerlet |
Version: | git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
Attachments: |
dmesg after X freeze
output after hang.
netconsole.log
netconsole.log |
Description
Martin Bednar
2013-05-22 14:13:03 UTC
Created attachment 79661 [details]
output after hang.
The output I got when trying to cat /sys/kernel/debug/dri/0/*
Sorry for the bad quality.
I too get system hangs when watching a flash video in firefox. linux-3.8.13, libdrm, mesa etc. git. Screen simply becomes black (no signal) and machine is dead, leaving a hard reset as the only option. The dmesg is flooded with the following lines:

radeon 0000:01:00.0: GPU fault detected: 147 0x0d859002
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000012D8
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05090002
[...] repeated a hundred times with only the first line changing a bit [...]

then:

radeon 0000:01:00.0: GPU fault detected: 146 0x07151004
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[...] repeated a hundred times with only the first line changing a bit [...]

Times indicate this goes on for approximately two minutes before the hang.

(In reply to comment #2)
> I too get system hangs when watching a flash video in firefox. linux-3.8.13,
> libdrm, mesa etc. git. Screen simply becomes black (no signal) and machine
> is dead, leaving a hard reset as the only option. The dmesg is flooded with
> the following lines:
>
> radeon 0000:01:00.0: GPU fault detected: 147 0x0d859002
> radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000012D8
> radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05090002
> [...] repeated a hundred times with only the first line changing a bit [...]

Something in the mesa drivers is emitting a command buffer without a proper virtual address for CB5.
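For anyone wanting to summarize such a flood rather than read it line by line, here is a minimal Python sketch that groups the three-line fault records quoted above. The message formats are taken from the dmesg excerpts in this report; the names `parse_faults` and `summarize` and the field names `code` and `data` are made up for illustration, and the sketch does not try to interpret the numbers. It assumes each "GPU fault detected" line is followed by its ADDR and STATUS lines, as in the excerpts.

```python
import re
from collections import Counter

# One fault record = "GPU fault detected" + FAULT_ADDR + FAULT_STATUS,
# in that order, as seen in the dmesg excerpts above.
FAULT_RE = re.compile(
    r"GPU fault detected: (?P<code>\d+) (?P<data>0x[0-9a-fA-F]+)"
    r".*?VM_CONTEXT1_PROTECTION_FAULT_ADDR\s+(?P<addr>0x[0-9a-fA-F]+)"
    r".*?VM_CONTEXT1_PROTECTION_FAULT_STATUS\s+(?P<status>0x[0-9a-fA-F]+)",
    re.DOTALL,
)


def parse_faults(text):
    """Return one dict per fault record found in the log text."""
    faults = []
    for m in FAULT_RE.finditer(text):
        faults.append({
            "code": int(m.group("code")),        # first number on the fault line
            "data": int(m.group("data"), 16),    # second value on the fault line
            "addr": int(m.group("addr"), 16),
            "status": int(m.group("status"), 16),
        })
    return faults


def summarize(faults):
    """Count how often each (addr, status) pair occurs."""
    return Counter((f["addr"], f["status"]) for f in faults)
```

Feeding it the saved output of dmesg (for example `parse_faults(open("dmesg.txt").read())`, where the file name is only a placeholder) gives a list that can be counted or sorted instead of scrolling through hundreds of near-identical lines.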
Hoping that it would be a workaround, I've applied the following patch from another bug report:

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 5407459..959e7cf 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -477,6 +477,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 	if (r) {
 		goto out;
 	}
+	radeon_fence_wait(vm->fence, false);
 	radeon_cs_sync_rings(parser);
 	radeon_cs_sync_to(parser, vm->fence);
 	radeon_cs_sync_to(parser, radeon_vm_grab_id(rdev, vm, parser->ring));

The hang happened again while playing a flash video (I'll see whether I can reproduce it somehow), but this time I was able to VT switch; X was killed and the following additional lines were appended to dmesg:

[30243.510949] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[30243.510951] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000005c90bd last fence id 0x00000000005c90b8)
[30243.510952] radeon 0000:01:00.0: couldn't schedule ib
[30243.510973] radeon 0000:01:00.0: Trying to sync to a disabled ring!
[30243.511047] radeon 0000:01:00.0: couldn't schedule ib
[30243.511048] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[30243.511197] radeon 0000:01:00.0: couldn't schedule ib
[30243.511198] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[30243.512323] radeon 0000:01:00.0: couldn't schedule ib
[30243.512324] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[30243.512851] radeon 0000:01:00.0: couldn't schedule ib
[30243.512852] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !

[30254.004957] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[30254.004959] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000334 last fence id 0x0000000000000333)
[30254.004961] radeon 0000:01:00.0: couldn't schedule ib
[30254.005052] radeon 0000:01:00.0: couldn't schedule ib
[30254.005064] radeon 0000:01:00.0: couldn't schedule ib
[30254.005070] radeon 0000:01:00.0: couldn't schedule ib
[30254.005084] radeon 0000:01:00.0: couldn't schedule ib
[30254.005092] radeon 0000:01:00.0: couldn't schedule ib
[30254.005097] radeon 0000:01:00.0: Trying to sync to a disabled ring!
[30254.012901] [drm:radeon_cs_ib_chunk] *ERROR* Failed to schedule IB !
[...] many similar repeated lines about IB [...]
[30264.498754] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[30264.498759] radeon 0000:01:00.0: GPU lockup (waiting for 0x00000000000090e5 last fence id 0x00000000000090e3)

Trying to restart X didn't work (X crashed), so I had to reboot the machine. Not sure if this brings any revelations. Is there anything I could do to provide more information the next time this happens?

Doesn't seem to be related to playing flash videos. A few moments ago parts of the screen started looking crazy/corrupted, and when I checked, dmesg was flooded again; I was only able to reboot the machine using ssh. Looking at cayman bug reports, many people seem to have the same or similar problems. I'll try a 3.7 or maybe even a 3.6 kernel, perhaps that works reliably.

Adding R600_DEBUG=nodma to my environment makes the problem go away... Not pretty, but a workaround. Same question though: how could I help debugging this?

Created attachment 80920 [details]
netconsole.log
I've been able to get a more complete output using netconsole. I'm not sure if it helps, but here it is.
Here are the steps to reproduce the crash:
1) Go to youtube.com, start playing a video.
=> This prints these lines:
radeon 0000:01:00.0: GPU fault detected: 146 0x0b95e004
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000010B9
radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x050E0004
2) Close the web browser, activate xscreensaver with an opengl screensaver like photopile.
=> GPU lockup CP stall
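When repeating the two steps above, it can help to check a captured log (netconsole or dmesg) for which symptoms actually showed up. The following Python sketch counts the message types quoted in this report and, for each "GPU lockup (waiting for ... last fence id ...)" line, prints the difference between the two fence values. The message strings come from the logs in this report; the function name `classify` and the dictionary keys are made up for illustration, and no meaning beyond simple subtraction is claimed for the fence difference.

```python
import re

# Message types seen in the logs attached to this report.
PATTERNS = {
    "gpu_fault": re.compile(r"GPU fault detected: \d+ 0x[0-9a-fA-F]+"),
    "cp_stall": re.compile(r"GPU lockup CP stall for more than \d+msec"),
    "ib_failed": re.compile(r"Failed to schedule IB"),
}

LOCKUP_RE = re.compile(
    r"GPU lockup \(waiting for (0x[0-9a-fA-F]+) last fence id (0x[0-9a-fA-F]+)\)"
)


def classify(text):
    """Return (counts per message type, list of fence differences)."""
    counts = {name: len(rx.findall(text)) for name, rx in PATTERNS.items()}
    # Difference between the fence waited for and the last fence id printed.
    gaps = [int(waiting, 16) - int(last, 16)
            for waiting, last in LOCKUP_RE.findall(text)]
    return counts, gaps


if __name__ == "__main__":
    import sys
    counts, gaps = classify(sys.stdin.read())
    print(counts)
    print("fence differences:", gaps)
```

A log with only `gpu_fault` entries would correspond to step 1, while `cp_stall` entries indicate that the lockup from step 2 was reached.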
Created attachment 80921 [details]
netconsole.log
Last upload didn't work.
> adding R600_DEBUG=nodma to my environment makes the problem go away...
> Not pretty, but a workaround. Same question though : how could I
> help debugging this?

I confirm this helps; not against the GPU fault when playing the video, but against the crashes/hangs when an opengl xscreensaver etc. activates. Thanks for mentioning the workaround. I've also applied https://bugs.freedesktop.org/attachment.cgi?id=72794, but it doesn't help.

With current up-to-date git versions of libdrm, mesa, xorg-server and xf86-video-ati, the R600_DEBUG=nodma hack no longer seems necessary (linux-3.11.0-rc6 with UVD disabled); the GPU faults have vanished and the system is stable. |
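For setups still on the affected versions, the R600_DEBUG=nodma workaround mentioned above can be limited to a single program instead of the whole session. This is only a minimal Python sketch: the variable name and value come from this report, while the helper names `env_with_nodma` and `run_with_nodma` are made up for illustration. Note that it overwrites any R600_DEBUG value that is already set.

```python
import os
import subprocess


def env_with_nodma(base=None):
    """Return a copy of the environment with R600_DEBUG=nodma set.

    Any existing R600_DEBUG value in the copy is overwritten; the
    environment passed in (or os.environ) is left untouched.
    """
    env = dict(os.environ if base is None else base)
    env["R600_DEBUG"] = "nodma"
    return env


def run_with_nodma(argv):
    """Run one program with the workaround applied; return its exit code."""
    return subprocess.call(argv, env=env_with_nodma())


# Example (the program name is only a placeholder for any OpenGL client):
# run_with_nodma(["glxgears"])
```

This keeps the hack out of the global environment, so it is easy to drop once the installed libdrm, mesa and kernel no longer need it.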