Ever since booting into the kernel.org 3.8.4 kernel on my AMD A10-5800K (ARUBA graphics), running git mesa and git xf86-video-ati, I get short uptimes (15 minutes, around one hour at most) because of crashes. The logs contain messages like:

[ 1332.480233] radeon 0000:00:01.0: GPU fault detected: 146 0x0134710c
[ 1332.480243] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000813
[ 1332.480250] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0407100C

Watching YouTube seems to help trigger the issue (this is only a correlation; no causation established yet). Having R600_DEBUG=nodma in the environment solves the problem.

Occasionally I also see a GPU lockup, which may be related:

[29648.098135] disk 0, wo:0, o:1, dev:sda2
[29648.098140] disk 1, wo:0, o:1, dev:sdb2
[29648.098142] disk 2, wo:0, o:1, dev:sdc2
[29648.098145] disk 3, wo:0, o:1, dev:sdd2
[68707.166021] radeon 0000:00:01.0: GPU fault detected: 146 0x0d4c2604
[68707.166030] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000008D4
[68707.166043] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C026004
[70621.378798] radeon 0000:00:01.0: GPU fault detected: 146 0x013c710c
[70621.378808] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000813
[70621.378815] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C07100C
[70621.378837] radeon 0000:00:01.0: GPU fault detected: 147 0x0f0c7102
[70621.378843] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[70621.378848] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[70621.378854] radeon 0000:00:01.0: GPU fault detected: 147 0x0f1c7102
[70621.378859] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[70621.378864] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[70631.857918] radeon 0000:00:01.0: GPU lockup CP stall for more than 10000msec
[70631.857927] radeon 0000:00:01.0: GPU lockup (waiting for 0x00000000007e1fe5 last fence id 0x00000000007e1fe3)
[70631.858436] radeon 0000:00:01.0: sa_manager is not empty, clearing anyway
[70631.859755] radeon 0000:00:01.0: Saved 951 dwords of commands on ring 0.
[70631.859761] radeon 0000:00:01.0: GPU softreset: 0x00000003
[70631.859766] radeon 0000:00:01.0: VM_CONTEXT0_PROTECTION_FAULT_ADDR 0x00000000
[70631.859770] radeon 0000:00:01.0: VM_CONTEXT0_PROTECTION_FAULT_STATUS 0x00000000
[70631.859774] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[70631.859778] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[70631.867299] radeon 0000:00:01.0: GRBM_STATUS = 0xA2703828
[70631.867305] radeon 0000:00:01.0: GRBM_STATUS_SE0 = 0x1D000007
[70631.867309] radeon 0000:00:01.0: GRBM_STATUS_SE1 = 0x00000007
[70631.867313] radeon 0000:00:01.0: SRBM_STATUS = 0x20000040
[70631.867317] radeon 0000:00:01.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[70631.867321] radeon 0000:00:01.0: R_008678_CP_STALLED_STAT2 = 0x00018000
[70631.867325] radeon 0000:00:01.0: R_00867C_CP_BUSY_STAT = 0x00008006
[70631.867328] radeon 0000:00:01.0: R_008680_CP_STAT = 0x80038647
[70631.867332] radeon 0000:00:01.0: GRBM_SOFT_RESET=0x0000DF7B
[70631.867386] radeon 0000:00:01.0: GRBM_STATUS = 0x00003828
[70631.867390] radeon 0000:00:01.0: GRBM_STATUS_SE0 = 0x00000007
[70631.867393] radeon 0000:00:01.0: GRBM_STATUS_SE1 = 0x00000007
[70631.867397] radeon 0000:00:01.0: SRBM_STATUS = 0x20000040
[70631.867400] radeon 0000:00:01.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[70631.867404] radeon 0000:00:01.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[70631.867408] radeon 0000:00:01.0: R_00867C_CP_BUSY_STAT = 0x00000000
[70631.867411] radeon 0000:00:01.0: R_008680_CP_STAT = 0x00000000
[70631.883681] radeon 0000:00:01.0: GPU reset succeeded, trying to resume
[70631.916445] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[70631.916534] radeon 0000:00:01.0: WB enabled
[70631.916536] radeon 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000030000c00 and cpu addr 0xffff880235891c00
[70631.916538] radeon 0000:00:01.0: fence driver on ring 1 use gpu addr 0x0000000030000c04 and cpu addr 0xffff880235891c04
[70631.916540] radeon 0000:00:01.0: fence driver on ring 2 use gpu addr 0x0000000030000c08 and cpu addr 0xffff880235891c08
[70631.916541] radeon 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000030000c0c and cpu addr 0xffff880235891c0c
[70631.916543] radeon 0000:00:01.0: fence driver on ring 4 use gpu addr 0x0000000030000c10 and cpu addr 0xffff880235891c10
[70631.935206] [drm] ring test on 0 succeeded in 3 usecs
[70631.935264] [drm] ring test on 3 succeeded in 2 usecs
[70631.935271] [drm] ring test on 4 succeeded in 1 usecs
[70631.949531] [drm] ib test on ring 0 succeeded in 0 usecs
[70631.950057] [drm] ib test on ring 3 succeeded in 0 usecs
[70631.950576] [drm] ib test on ring 4 succeeded in 1 usecs
Created attachment 77277: Xorg.0.log with R600_DEBUG=nodma
Created attachment 77278: dmesg
With R600_DEBUG=nodma there are still some GPU fault messages, but they are less frequent and the whole PC no longer crashes.
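If you want the workaround for a single application rather than the whole session, one option is to start that program with R600_DEBUG=nodma added to a copy of the environment. A minimal Python sketch follows; the helper name run_with_nodma is hypothetical, and the only setting taken from this report is R600_DEBUG=nodma. Appending to an existing value assumes Mesa accepts comma-separated R600_DEBUG flags.

```python
import os
import subprocess
import sys


def run_with_nodma(argv):
    """Hypothetical helper: run argv with R600_DEBUG=nodma and return its exit code.

    Only R600_DEBUG=nodma comes from the bug report; the rest is illustrative.
    """
    env = dict(os.environ)  # copy, so this process's own environment is untouched
    existing = env.get("R600_DEBUG")
    # Keep any flags already present (assumes comma-separated flags are accepted).
    env["R600_DEBUG"] = f"{existing},nodma" if existing else "nodma"
    return subprocess.run(argv, env=env, check=False).returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 run_nodma.py <program> [args...]
    sys.exit(run_with_nodma(sys.argv[1:]))
```

Setting the variable per process keeps other GL clients on the default code path, which makes it easier to compare behaviour with and without the workaround.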
Is https://bugs.freedesktop.org/show_bug.cgi?id=58667 a related issue?
It does crash, but without rebooting: the GUI disappears and the text-mode screen from the first few seconds of boot is shown. There is no network, but the kernel is still alive.

Apr 7 07:59:47 surfplank2 dbus[3118]: [system] Rejected send message, 2 matched rules; type="method_return", sender=":1.2" (uid=0 pid=3090 comm="/usr/lib/systemd/systemd-logind ") interface="(unset)" member ="(unset)" error name="(unset)" requested_reply="0" destination=":1.34" (uid=500 pid=4127 comm="gnome-session ")
Apr 7 08:11:39 surfplank2 kernel: [406000.278385] radeon 0000:00:01.0: GPU fault detected: 147 0x0f727102
Apr 7 08:11:39 surfplank2 kernel: [406000.278390] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000018F7
Apr 7 08:11:39 surfplank2 kernel: [406000.278393] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02071002
Apr 7 08:11:39 surfplank2 kernel: [406000.278396] radeon 0000:00:01.0: GPU fault detected: 147 0x0f627102
Apr 7 08:11:39 surfplank2 kernel: [406000.278399] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Apr 7 08:11:39 surfplank2 kernel: [406000.278401] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
Apr 7 08:11:39 surfplank2 kernel: [406000.278404] radeon 0000:00:01.0: GPU fault detected: 147 0x07527102
Apr 7 08:11:39 surfplank2 kernel: [406000.278406] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Apr 7 08:11:39 surfplank2 kernel: [406000.278409] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
Apr 7 08:11:39 surfplank2 kernel: [406000.278411] radeon 0000:00:01.0: GPU fault detected: 147 0x07627102
Apr 7 08:11:39 surfplank2 kernel: [406000.278413] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Apr 7 08:11:39 surfplank2 kernel: [406000.278416] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
Apr 7 08:11:39 surfplank2 kernel: [406000.278418] radeon 0000:00:01.0: GPU fault detected: 147 0x00a27102
Apr 7 08:11:39 surfplank2 kernel: [406000.278420] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Apr 7 08:11:39 surfplank2 kernel: [406000.278423] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
Apr 7 08:11:39 surfplank2 kernel: [406000.278426] radeon 0000:00:01.0: GPU fault detected: 147 0x00a27102
Apr 7 08:11:39 surfplank2 kernel: [406000.278428] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000000
Apr 7 08:17:11 surfplank2 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Apr 7 08:17:11 surfplank2 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="3041" x-info="http://www.rsyslog.com"] start
FWIW, another lockup:

[ 9912.997377] nf_conntrack: automatic helper assignment is deprecated and it will be removed soon. Use the iptables CT target to attach helpers instead.
[16500.596325] radeon 0000:00:01.0: GPU fault detected: 146 0x0eb27104
[16500.596330] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000008EB
[16500.596332] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02071004
[16500.596335] radeon 0000:00:01.0: GPU fault detected: 146 0x0ec27104
[16500.596337] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[16500.596340] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[16500.596342] radeon 0000:00:01.0: GPU fault detected: 147 0x06b27102
[16500.596344] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[16500.596347] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[16500.596349] radeon 0000:00:01.0: GPU fault detected: 147 0x06c27102
[16500.596351] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[16500.596353] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[16511.077533] radeon 0000:00:01.0: GPU lockup CP stall for more than 10000msec
[16511.077537] radeon 0000:00:01.0: GPU lockup (waiting for 0x000000000038b92b last fence id 0x000000000038b928)
[16511.078189] radeon 0000:00:01.0: sa_manager is not empty, clearing anyway
[16511.079467] radeon 0000:00:01.0: Saved 215 dwords of commands on ring 0.
[16511.079470] radeon 0000:00:01.0: GPU softreset: 0x00000003
[16511.079473] radeon 0000:00:01.0: VM_CONTEXT0_PROTECTION_FAULT_ADDR 0x00000000
[16511.079475] radeon 0000:00:01.0: VM_CONTEXT0_PROTECTION_FAULT_STATUS 0x00000000
[16511.079478] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[16511.079480] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[16511.261445] radeon 0000:00:01.0: GRBM_STATUS = 0xE5702828
[16511.261447] radeon 0000:00:01.0: GRBM_STATUS_SE0 = 0xFC000005
[16511.261450] radeon 0000:00:01.0: GRBM_STATUS_SE1 = 0x00000007
[16511.261451] radeon 0000:00:01.0: SRBM_STATUS = 0x20000040
[16511.261454] radeon 0000:00:01.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[16511.261456] radeon 0000:00:01.0: R_008678_CP_STALLED_STAT2 = 0x00018000
[16511.261458] radeon 0000:00:01.0: R_00867C_CP_BUSY_STAT = 0x00008006
[16511.261461] radeon 0000:00:01.0: R_008680_CP_STAT = 0x80038647
[16511.261462] radeon 0000:00:01.0: GRBM_SOFT_RESET=0x0000DF7B
[16511.261515] radeon 0000:00:01.0: GRBM_STATUS = 0x00003828
[16511.261517] radeon 0000:00:01.0: GRBM_STATUS_SE0 = 0x00000007
[16511.261519] radeon 0000:00:01.0: GRBM_STATUS_SE1 = 0x00000007
[16511.261521] radeon 0000:00:01.0: SRBM_STATUS = 0x20000040
[16511.261523] radeon 0000:00:01.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[16511.261525] radeon 0000:00:01.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[16511.261527] radeon 0000:00:01.0: R_00867C_CP_BUSY_STAT = 0x00000000
[16511.261528] radeon 0000:00:01.0: R_008680_CP_STAT = 0x00000000
[16511.274728] radeon 0000:00:01.0: GPU reset succeeded, trying to resume
[16511.463803] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[16511.463892] radeon 0000:00:01.0: WB enabled
[16511.463895] radeon 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000030000c00 and cpu addr 0xffff8802331cdc00
[16511.463897] radeon 0000:00:01.0: fence driver on ring 1 use gpu addr 0x0000000030000c04 and cpu addr 0xffff8802331cdc04
[16511.463900] radeon 0000:00:01.0: fence driver on ring 2 use gpu addr 0x0000000030000c08 and cpu addr 0xffff8802331cdc08
[16511.463902] radeon 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000030000c0c and cpu addr 0xffff8802331cdc0c
[16511.463903] radeon 0000:00:01.0: fence driver on ring 4 use gpu addr 0x0000000030000c10 and cpu addr 0xffff8802331cdc10
[16511.482550] [drm] ring test on 0 succeeded in 2 usecs
[16511.482609] [drm] ring test on 3 succeeded in 2 usecs
[16511.482617] [drm] ring test on 4 succeeded in 1 usecs
[16511.497231] [drm] ib test on ring 0 succeeded in 0 usecs
[16511.497751] [drm] ib test on ring 3 succeeded in 0 usecs
[16511.498269] [drm] ib test on ring 4 succeeded in 1 usecs
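When comparing runs (with and without R600_DEBUG=nodma, or before and after a patch), counting these messages can be more reliable than scanning the log by eye. Below is a minimal Python sketch, assuming the dmesg output has been saved as text. The regular expressions only match the message formats quoted in this report. The number after "GPU fault detected:" (146 or 147 here) is tallied as an opaque value rather than decoded. The function name summarize and the SAMPLE excerpt are illustrative.

```python
import re
from collections import Counter

# Patterns for the radeon messages quoted in this report (not an exhaustive list).
FAULT_RE = re.compile(r"GPU fault detected: (\d+) (0x[0-9a-fA-F]+)")
LOCKUP_RE = re.compile(r"GPU lockup CP stall for more than (\d+)msec")
STATUS_RE = re.compile(r"VM_CONTEXT1_PROTECTION_FAULT_STATUS (0x[0-9a-fA-F]+)")


def summarize(log_text):
    """Return (fault counts keyed by the number after 'GPU fault detected:',
    number of CP-stall lockups, list of non-zero VM_CONTEXT1 fault status values).

    finditer/findall scan the whole text, so this also works on dumps whose
    line breaks were lost.
    """
    faults = Counter(int(m.group(1)) for m in FAULT_RE.finditer(log_text))
    lockups = len(LOCKUP_RE.findall(log_text))
    nonzero_status = [s for s in STATUS_RE.findall(log_text) if int(s, 16) != 0]
    return faults, lockups, nonzero_status


# Excerpt from the lockup log above.
SAMPLE = """\
[16500.596325] radeon 0000:00:01.0: GPU fault detected: 146 0x0eb27104
[16500.596330] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000008EB
[16500.596332] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02071004
[16500.596342] radeon 0000:00:01.0: GPU fault detected: 147 0x06b27102
[16500.596347] radeon 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
[16511.077533] radeon 0000:00:01.0: GPU lockup CP stall for more than 10000msec
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

For the excerpt, this reports one fault each for 146 and 147, one lockup, and a single non-zero status value, 0x02071004.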
This may be related to bug 62959. Does attachment 72794 (kernel patch) fix the issue?
Will start testing on 3.8.6 in a few minutes.
3.8.6, both with and without the patch, had crashes of various kinds (even a hard freeze!). Now testing 3.8.5 without the patch, waiting for the RAID check to complete.
Apart from crashes with other causes (ARUBA (Cayman) is not yet ready for OpenCL), I have seen no GPU faults etc. in the logs since booting into 3.8.5 with the patch. I want to give it a few more days without OpenCL disruptions to be sure.
This is starting to look like a duplicate of bug 62959. Can you try attachment 77608? That seems to fix 62959; hopefully it will fix this one as well.
So should I revert the previous patch and try this new one? (Or try them combined?)
(In reply to comment #12)
> So I undo the previous patch and try this new one?
> (Or try them combined?)

Try them separately, not combined.
I guess the second patch also fixes the issue. After 1 day, 15:11 of uptime I saw no GPU faults, hangs, etc. Normally they occurred much sooner than that.
*** This bug has been marked as a duplicate of bug 62959 ***