Summary: | Crashes on ARUBA unless R600_DEBUG=nodma | ||
---|---|---|---|
Product: | Mesa | Reporter: | udo <udovdh> |
Component: | Drivers/Gallium/r600 | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED DUPLICATE | QA Contact: | |
Severity: | major | ||
Priority: | medium | CC: | udovdh |
Version: | git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Xorg.0.log with R600_DEBUG=nodma
dmesg |
Description
udo
2013-04-01 15:31:59 UTC
Created attachment 77277
Xorg.0.log with R600_DEBUG=nodma
Created attachment 77278
dmesg
With R600_DEBUG=nodma we still get some mentions of GPU faults, but less often, and without crashing the whole PC.

Is https://bugs.freedesktop.org/show_bug.cgi?id=58667 a related issue?

It does crash, but without a reboot. The GUI disappears and a pure text-mode screen showing the first few seconds of boot is displayed. No network. Kernel is alive.

Apr 7 07:59:47 surfplank2 dbus[3118]: [system] Rejected send message, 2 matched rules; type="method_return", sender=":1.2" (uid=0 pid=3090 comm="/usr/lib/systemd/systemd-logind ") interface="(unset)" member ="(unset)" error name="(unset)" requested_reply="0" destination=":1.34" (uid=500 pid=4127 comm="gnome-session ")
Apr 7 08:11:39 surfplank2 kernel: [406000.278385] radeon 0000:00:01.0: GPU fault detected: 147 0x0f727102
Apr 7 08:11:39 surfplank2 kernel: [406000.278390] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000018F7
Apr 7 08:11:39 surfplank2 kernel: [406000.278393] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02071002
Apr 7 08:11:39 surfplank2 kernel: [406000.278396] radeon 0000:00:01.0: GPU fault detected: 147 0x0f627102
Apr 7 08:11:39 surfplank2 kernel: [406000.278399] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Apr 7 08:11:39 surfplank2 kernel: [406000.278401] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
Apr 7 08:11:39 surfplank2 kernel: [406000.278404] radeon 0000:00:01.0: GPU fault detected: 147 0x07527102
Apr 7 08:11:39 surfplank2 kernel: [406000.278406] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Apr 7 08:11:39 surfplank2 kernel: [406000.278409] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
Apr 7 08:11:39 surfplank2 kernel: [406000.278411] radeon 0000:00:01.0: GPU fault detected: 147 0x07627102
Apr 7 08:11:39 surfplank2 kernel: [406000.278413] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Apr 7 08:11:39 surfplank2 kernel: [406000.278416] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
Apr 7 08:11:39 surfplank2 kernel: [406000.278418] radeon 0000:00:01.0: GPU fault detected: 147 0x00a27102
Apr 7 08:11:39 surfplank2 kernel: [406000.278420] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Apr 7 08:11:39 surfplank2 kernel: [406000.278423] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
Apr 7 08:11:39 surfplank2 kernel: [406000.278426] radeon 0000:00:01.0: GPU fault detected: 147 0x00a27102
Apr 7 08:11:39 surfplank2 kernel: [406000.278428] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000000
Apr 7 08:17:11 surfplank2 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Apr 7 08:17:11 surfplank2 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="3041" x-info="http://www.rsyslog.com"] start

FWIW: Another lockup.

[ 9912.997377] nf_conntrack: automatic helper assignment is deprecated and it will be removed soon. Use the iptables CT target to attach helpers instead.
[16500.596325] radeon 0000:00:01.0: GPU fault detected: 146 0x0eb27104
[16500.596330] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000008EB
[16500.596332] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02071004
[16500.596335] radeon 0000:00:01.0: GPU fault detected: 146 0x0ec27104
[16500.596337] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[16500.596340] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[16500.596342] radeon 0000:00:01.0: GPU fault detected: 147 0x06b27102
[16500.596344] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[16500.596347] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[16500.596349] radeon 0000:00:01.0: GPU fault detected: 147 0x06c27102
[16500.596351] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[16500.596353] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[16511.077533] radeon 0000:00:01.0: GPU lockup CP stall for more than 10000msec
[16511.077537] radeon 0000:00:01.0: GPU lockup (waiting for 0x000000000038b92b last fence id 0x000000000038b928)
[16511.078189] radeon 0000:00:01.0: sa_manager is not empty, clearing anyway
[16511.079467] radeon 0000:00:01.0: Saved 215 dwords of commands on ring 0.
[16511.079470] radeon 0000:00:01.0: GPU softreset: 0x00000003
[16511.079473] radeon 0000:00:01.0: VM_CONTEXT0_PROTECTION_FAULT_ADDR 0x00000000
[16511.079475] radeon 0000:00:01.0: VM_CONTEXT0_PROTECTION_FAULT_STATUS 0x00000000
[16511.079478] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[16511.079480] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[16511.261445] radeon 0000:00:01.0: GRBM_STATUS = 0xE5702828
[16511.261447] radeon 0000:00:01.0: GRBM_STATUS_SE0 = 0xFC000005
[16511.261450] radeon 0000:00:01.0: GRBM_STATUS_SE1 = 0x00000007
[16511.261451] radeon 0000:00:01.0: SRBM_STATUS = 0x20000040
[16511.261454] radeon 0000:00:01.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[16511.261456] radeon 0000:00:01.0: R_008678_CP_STALLED_STAT2 = 0x00018000
[16511.261458] radeon 0000:00:01.0: R_00867C_CP_BUSY_STAT = 0x00008006
[16511.261461] radeon 0000:00:01.0: R_008680_CP_STAT = 0x80038647
[16511.261462] radeon 0000:00:01.0: GRBM_SOFT_RESET=0x0000DF7B
[16511.261515] radeon 0000:00:01.0: GRBM_STATUS = 0x00003828
[16511.261517] radeon 0000:00:01.0: GRBM_STATUS_SE0 = 0x00000007
[16511.261519] radeon 0000:00:01.0: GRBM_STATUS_SE1 = 0x00000007
[16511.261521] radeon 0000:00:01.0: SRBM_STATUS = 0x20000040
[16511.261523] radeon 0000:00:01.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[16511.261525] radeon 0000:00:01.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[16511.261527] radeon 0000:00:01.0: R_00867C_CP_BUSY_STAT = 0x00000000
[16511.261528] radeon 0000:00:01.0: R_008680_CP_STAT = 0x00000000
[16511.274728] radeon 0000:00:01.0: GPU reset succeeded, trying to resume
[16511.463803] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[16511.463892] radeon 0000:00:01.0: WB enabled
[16511.463895] radeon 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000030000c00 and cpu addr 0xffff8802331cdc00
[16511.463897] radeon 0000:00:01.0: fence driver on ring 1 use gpu addr 0x0000000030000c04 and cpu addr 0xffff8802331cdc04
[16511.463900] radeon 0000:00:01.0: fence driver on ring 2 use gpu addr 0x0000000030000c08 and cpu addr 0xffff8802331cdc08
[16511.463902] radeon 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000030000c0c and cpu addr 0xffff8802331cdc0c
[16511.463903] radeon 0000:00:01.0: fence driver on ring 4 use gpu addr 0x0000000030000c10 and cpu addr 0xffff8802331cdc10
[16511.482550] [drm] ring test on 0 succeeded in 2 usecs
[16511.482609] [drm] ring test on 3 succeeded in 2 usecs
[16511.482617] [drm] ring test on 4 succeeded in 1 usecs
[16511.497231] [drm] ib test on ring 0 succeeded in 0 usecs
[16511.497751] [drm] ib test on ring 3 succeeded in 0 usecs
[16511.498269] [drm] ib test on ring 4 succeeded in 1 usecs

This may be related to bug 62959. Does attachment 72794 (kernel patch) fix the issue?

Will start testing on 3.8.6 in a few minutes.

3.8.6 with and without the patch had crashes of various kinds (even a hard freeze!). Now doing 3.8.5 without patch, waiting for the RAID check to complete.

Despite crashes for other reasons (ARUBA (Cayman) is not yet ready for OpenCL), I saw no GPU faults etc. in the logs since booting into 3.8.5 with the patch. I want to give it a few more days without OpenCL disruptions to be sure.

This is starting to look like a duplicate of bug 62959. Can you try attachment 77608? That seems to fix 62959; hopefully it will fix this one as well.

So I undo the previous patch and try this new one?
(Or try them combined?)

(In reply to comment #12)
> So I undo the previous patch and try this new one?
> (Or try them combined?)

Try them separately, not combined.

I guess the second patch also fixes the issue.
After 1 day, 15:11 of uptime I saw no GPU faults, hangs, etc. Normally they occurred much sooner than that.

*** This bug has been marked as a duplicate of bug 62959 *** |
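
For anyone repeating the check described in this thread (booting a patched kernel and then watching the log for GPU faults or lockups), the following is a minimal sketch of how the log could be scanned automatically. It is not part of the original report. The function name `summarize` and the two regular expressions are assumptions, written only to match the message formats quoted above ("GPU fault detected: ..." and "GPU lockup ..."); other kernel versions may word these messages differently.

```python
import re

# Patterns are based only on the messages quoted in this report, e.g.
#   radeon 0000:00:01.0: GPU fault detected: 147 0x0f727102
#   radeon 0000:00:01.0: GPU lockup CP stall for more than 10000msec
FAULT_RE = re.compile(r"radeon \S+: GPU fault detected: \d+ 0x[0-9a-fA-F]+")
LOCKUP_RE = re.compile(r"radeon \S+: GPU lockup")


def summarize(lines):
    """Count GPU fault and GPU lockup messages in an iterable of log lines.

    findall() is used so that several messages flattened onto one
    physical line are still counted individually.
    """
    fault_msgs = 0
    lockup_msgs = 0
    for line in lines:
        fault_msgs += len(FAULT_RE.findall(line))
        lockup_msgs += len(LOCKUP_RE.findall(line))
    return {"fault_msgs": fault_msgs, "lockup_msgs": lockup_msgs}
```

One way to use it would be to save the output of `dmesg` to a file and pass the open file object to `summarize`. Note that a single lockup event produces two "GPU lockup" messages in the log above, so `lockup_msgs` counts messages, not events.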