Summary: | Dota causes GPU fault and kernel hang | | |
---|---|---|---|
Product: | DRI | Reporter: | Tilman Sauerbeck <tilman> |
Component: | DRM/Radeon | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | Severity: | normal |
Priority: | medium | Version: | DRI git |
Hardware: | Other | OS: | All |
Description
Tilman Sauerbeck
2015-01-11 20:17:52 UTC
Replaying this trace on my Kaveri, I get similar GPUVM faults, but no hangs. What are the symptoms of the hangs?

Sorry about that misinformation. It's not a hang at all, since I'm still able to use SysRq to reboot. What happens after the GPU faults is that the driver apparently attempts to get the card back into working shape but fails to do so. X doesn't become usable again after the GPU faults anyway. Here's the kernel log following the GPU faults:

radeon 0000:01:00.0: ring 0 stalled for more than 10428msec
radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000107f5c last fence id 0x00000000001080f7 on ring 0)
radeon 0000:01:00.0: failed to get a new IB (-35)
[drm:radeon_cs_ib_fill] *ERROR* Failed to get ib !
radeon 0000:01:00.0: Saved 7977 dwords of commands on ring 0.
radeon 0000:01:00.0: GPU softreset: 0x00000009
[snipped list of registers that were reset (I think)]
[drm] probing gen 2 caps for device 1002:5a16 = 31cd02/0
[drm] PCIE gen 2 link speeds already enabled
[drm] PCIE GART of 1024M enabled (table at 0x000000000078C000).
radeon 0000:01:00.0: WB enabled
radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000080000c00 and cpu addr 0xffff8800bac4cc00
radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000080000c04 and cpu addr 0xffff8800bac4cc04
radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000080000c08 and cpu addr 0xffff8800bac4cc08
radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000080000c0c and cpu addr 0xffff8800bac4cc0c
radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000080000c10 and cpu addr 0xffff8800bac4cc10
radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000076c98 and cpu addr 0xffffc90010c36c98
radeon 0000:01:00.0: fence driver on ring 6 use gpu addr 0x0000000080000c18 and cpu addr 0xffff8800bac4cc18
radeon 0000:01:00.0: fence driver on ring 7 use gpu addr 0x0000000080000c1c and cpu addr 0xffff8800bac4cc1c
[drm] ring test on 0 succeeded in 3 usecs
[drm:cik_ring_test] *ERROR* radeon: ring 1 test failed (scratch(0x3010C)=0xCAFEDEAD)
[drm:cik_ring_test] *ERROR* radeon: ring 2 test failed (scratch(0x3010C)=0xCAFEDEAD)
[drm:cik_sdma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
[drm:cik_resume] *ERROR* cik startup failed on resume
[drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed

I can reproduce this on Kabini with current proper mesa and llvm, but not with Tom's perf-Jan-08-2015 llvm + vgpr-spilling-Jan07-2014 mesa branches; it works fine there.

(In reply to smoki from comment #3)
> I can reproduce this on Kabini with current proper mesa and llvm, but not
> with Tom's perf-Jan-08-2015 llvm + vgpr-spilling-Jan07-2014 mesa branches it
> works fine there.

Indeed, switching LLVM to perf-Jan-08-2015 fixes the GPU faults.

For the record, with my Mesa installation from git master I do get

> Warning: Compiler emitted unknown config register: 0x286e8

in glretrace, but that doesn't seem to cause any visible breakage.

Should I leave the bug open until the fix hits LLVM trunk?
This is fixed now with current mesa and llvm. Tilman, you may want to confirm and eventually to close this bug.

(In reply to smoki from comment #5)
> This is fixed now with current mesa and llvm.
>
> Tilman, you may want to confirm and eventually to close this bug.

Confirmed. Thanks for the heads-up.