Bug 109085 - Radeon driver crashes with a message "ring 0 stalled for more than 10344msec" when using Citra and Newly free to play game Albion Online on UBUNTU 18.04LTS
Summary: Radeon driver crashes with a message "ring 0 stalled for more than 10344msec"...
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: highest critical
Assignee: sudheer
QA Contact:
URL: https://albiononline.com/
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-12-18 11:39 UTC by Łukasz Skocz
Modified: 2019-11-19 09:34 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
dmesg output (76.17 KB, text/plain)
2018-12-18 11:39 UTC, Łukasz Skocz
another dmesg output (95.75 KB, text/plain)
2018-12-18 11:45 UTC, Łukasz Skocz

Description Łukasz Skocz 2018-12-18 11:39:24 UTC
Created attachment 142845 [details]
dmesg output

Overview: 

The Radeon driver crashes and completely corrupts the screen whenever I try to use the Citra emulator (always right after loading a game), and sometimes randomly during other GPU-demanding tasks, such as hardware video decoding through VDPAU. Strangely, the X server itself doesn't crash, and I can still see and move the mouse cursor, although it too is heavily corrupted.

Attached dmesg from the time of the crash.


Steps to Reproduce: 

Open Citra, try to load any game.


Actual Results: The driver crashes, resulting in screen corruption.

Expected Results: Citra running the game or failing safely without crashing the driver.

Software versions:

Linux 4.19.9
Mesa 18.3.1
Xorg 1.20.3
xf86-video-ati 18.1.0

lspci output for the GPU:

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV710/M92 [Mobility Radeon HD 4530/4570/545v] (prog-if 00 [VGA controller])
        Subsystem: Dell Mobility Radeon HD 4570 / 545v
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 24
        Region 0: Memory at d0000000 (32-bit, prefetchable) [size=256M]
        Region 1: I/O ports at 2000 [size=256]
        Region 2: Memory at fc000000 (32-bit, non-prefetchable) [size=64K]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (ok), Width x16 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee02004  Data: 4023
        Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Kernel driver in use: radeon
        Kernel modules: radeon
00: 02 10 53 95 07 05 10 00 00 00 00 03 10 00 80 00
10: 08 00 00 d0 01 20 00 00 00 00 00 fc 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 be 02
30: 00 00 00 00 50 00 00 00 00 00 00 00 05 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 be 02
50: 01 58 03 06 00 00 00 00 10 a0 12 00 a0 8f 2c 01
60: 10 09 0a 00 01 0d 04 00 43 00 01 11 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 05 00 81 00 04 20 e0 fe 00 00 00 00 23 40 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
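For anyone cross-checking the hex dump against the decoded lspci text, here is a minimal Python sketch that reads the standard PCI configuration header fields out of the first rows of the dump. The function names and the `SAMPLE` constant are illustrative only, and the parser assumes the rows are contiguous and start at offset 00.

```python
import re

# Matches dump rows such as "00: 02 10 53 95 ..."; other lspci lines are skipped.
ROW_RE = re.compile(r"^\s*([0-9a-fA-F]{2,3}):((?:\s+[0-9a-fA-F]{2})+)\s*$")


def parse_config_dump(dump_text):
    """Collect the bytes of a hex dump into one bytes object.

    Assumes the rows are contiguous and begin at offset 0x00.
    """
    data = bytearray()
    for line in dump_text.splitlines():
        match = ROW_RE.match(line)
        if match:
            data.extend(int(b, 16) for b in match.group(2).split())
    return bytes(data)


def decode_header(cfg):
    """Decode a few standard PCI header fields (multi-byte fields are little-endian)."""
    def le16(offset):
        return cfg[offset] | (cfg[offset + 1] << 8)

    return {
        "vendor_id": le16(0x00),
        "device_id": le16(0x02),
        "revision": cfg[0x08],
        "prog_if": cfg[0x09],
        "class_code": (cfg[0x0B] << 8) | cfg[0x0A],
        "header_type": cfg[0x0E] & 0x7F,       # bit 7 is the multi-function flag
        "multi_function": bool(cfg[0x0E] & 0x80),
        "subsystem_vendor_id": le16(0x2C),
        "subsystem_id": le16(0x2E),
    }


# First four rows of the dump above.
SAMPLE = """\
00: 02 10 53 95 07 05 10 00 00 00 00 03 10 00 80 00
10: 08 00 00 d0 01 20 00 00 00 00 00 fc 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 be 02
30: 00 00 00 00 50 00 00 00 00 00 00 00 05 01 00 00
"""

if __name__ == "__main__":
    for name, value in decode_header(parse_config_dump(SAMPLE)).items():
        print(name, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

On this dump the header reads back sanely (vendor 1002, prog-if 00), which is a useful baseline when comparing against a dump taken after a lockup.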



Additional Information: possibly related bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=102909
Comment 1 Łukasz Skocz 2018-12-18 11:45:23 UTC
Created attachment 142846 [details]
another dmesg output
Comment 2 Octavio Paez 2019-02-14 16:20:12 UTC
Similar situation:

The Radeon driver dies with any load from Xorg (e.g. Firefox, Evince, GIMP). The system keeps running; nothing else dies.

Output from dmesg:
[ 2067.992330] radeon 0000:01:00.0: ring 0 stalled for more than 10088msec
[ 2067.992339] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000004e138 last fence id 0x000000000004e1d6 on ring 0)
[ 2068.002741] BUG: unable to handle kernel paging request at ffffc90401c1fffc
[ 2068.002810] IP: [<ffffffffc0208eba>] radeon_ring_backup+0xda/0x190 [radeon]
[ 2068.002912] PGD 2371aa067 PUD 0 
[ 2068.002949] Oops: 0000 [#1] SMP 
[ 2068.002986] Modules linked in: drbg ansi_cprng ctr ccm ipt_REJECT nf_reject_ipv4 rfcomm xt_multiport bnep arc4 ath9k ath9k_common ath9k_hw snd_hda_codec_realtek ath mac80211 snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_codec btusb snd_hda_core btrtl btbcm snd_hwdep btintel cfg80211 bluetooth snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq gpio_ich dcdbas dell_smm_hwmon intel_powerclamp snd_seq_device coretemp snd_timer kvm_intel snd kvm mei_me mei input_leds joydev soundcore irqbypass serio_raw shpchp i7core_edac i2c_i801 edac_core lpc_ich mac_hid ip6table_filter ip6_tables iptable_filter ip_tables x_tables binfmt_misc parport_pc ppdev lp parport autofs4 hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect
[ 2068.003826]  sysimgblt fb_sys_fops drm psmouse broadcom bcm_phy_lib firewire_ohci tg3 firewire_core ahci ptp libahci pps_core crc_itu_t fjes
[ 2068.003977] CPU: 2 PID: 1201 Comm: Xorg Not tainted 4.4.167 #1
[ 2068.004024] Hardware name: Dell Inc. Studio XPS 8100/0T568R, BIOS A02 11/27/2009
[ 2068.004081] task: ffff880233a66600 ti: ffff880232644000 task.ti: ffff880232644000
[ 2068.004138] RIP: 0010:[<ffffffffc0208eba>]  [<ffffffffc0208eba>] radeon_ring_backup+0xda/0x190 [radeon]
[ 2068.004249] RSP: 0018:ffff880232647c48  EFLAGS: 00010202
[ 2068.004291] RAX: ffffc900014a8000 RBX: 00000000ffffffff RCX: 0000000000000000
[ 2068.004345] RDX: 0000000000000000 RSI: ffffc90401c1fffc RDI: 0000000000010340
[ 2068.004399] RBP: ffff880232647c78 R08: 00003ffffffff000 R09: 8000000000000163
[ 2068.004453] R10: 0000000000000000 R11: ffffffff81cd74f7 R12: ffff880235f054e0
[ 2068.004507] R13: ffff880235f054b8 R14: 00000000000040d1 R15: ffff880232647cc0
[ 2068.004563] FS:  00007f21afcbda00(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
[ 2068.004624] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2068.004669] CR2: ffffc90401c1fffc CR3: 0000000231f96000 CR4: 0000000000000670
[ 2068.004722] Stack:
[ 2068.004741]  ffff880235f04000 ffff880235f04000 ffff880235f054e0 ffff880232647cc0
[ 2068.004811]  ffff880235f054e0 0000000000000000 ffff880232647d30 ffffffffc01d6d59
[ 2068.004881]  ffff880235f04740 00ff880200000001 ffff880235f04018 ffff8802292d6840
[ 2068.004950] Call Trace:
[ 2068.004999]  [<ffffffffc01d6d59>] radeon_gpu_reset+0xd9/0x350 [radeon]
[ 2068.005054]  [<ffffffff815d0156>] ? fence_wait_timeout+0x86/0x170
[ 2068.005323]  [<ffffffffc0206c0e>] radeon_gem_handle_lockup.part.3+0xe/0x20 [radeon]
[ 2068.005418]  [<ffffffffc0207b35>] radeon_gem_wait_idle_ioctl+0xe5/0x130 [radeon]
[ 2068.005496]  [<ffffffffc00e7885>] drm_ioctl+0x155/0x540 [drm]
[ 2068.005545]  [<ffffffff81092f71>] ? __set_task_blocked+0x41/0xa0
[ 2068.005627]  [<ffffffffc0207a50>] ? radeon_gem_busy_ioctl+0xe0/0xe0 [radeon]
[ 2068.005684]  [<ffffffff8102e5c7>] ? do_signal+0x1b7/0x6f0
[ 2068.005750]  [<ffffffffc01d404c>] radeon_drm_ioctl+0x4c/0x80 [radeon]
[ 2068.005803]  [<ffffffff8122f5df>] do_vfs_ioctl+0x2af/0x4b0
[ 2068.005848]  [<ffffffff8122f859>] SyS_ioctl+0x79/0x90
[ 2068.005892]  [<ffffffff818613db>] entry_SYSCALL_64_fastpath+0x22/0xcb
[ 2068.005941] Code: ff c0 48 85 c0 49 89 07 74 6c 41 8d 7e ff 31 d2 48 c1 e7 02 eb 07 49 8b 07 48 83 c2 04 49 8b 74 24 08 8d 4b 01 89 db 48 8d 34 9e <8b> 36 89 34 10 41 23 4c 24 54 48 39 d7 89 cb 75 da 4c 89 ef e8 
[ 2068.006358] RIP  [<ffffffffc0208eba>] radeon_ring_backup+0xda/0x190 [radeon]
[ 2068.006452]  RSP <ffff880232647c48>
[ 2068.006480] CR2: ffffc90401c1fffc
[ 2068.025324] ---[ end trace b8f815bc378acbe0 ]---
[ 2393.673445] perf interrupt took too long (2512 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[18850.422884] perf interrupt took too long (5254 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
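To compare lockups across several dmesg captures, a small Python sketch like the one below can pull out the stall duration and the fence ids. The regular expressions are modelled only on the two radeon messages quoted above, so other kernel versions may word them differently. The `pending_fences` field is just the difference between the two ids, which I read as fences emitted but not yet signalled; that interpretation is an assumption. All names are illustrative.

```python
import re

# Patterns based on the two radeon lines in the log above (assumption:
# other kernels may format these messages differently).
STALL_RE = re.compile(r"ring (\d+) stalled for more than (\d+)msec")
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)


def summarize_radeon_lockups(dmesg_text):
    """Return a list of dicts, one per stall or lockup message found."""
    events = []
    for line in dmesg_text.splitlines():
        stall = STALL_RE.search(line)
        if stall:
            events.append({
                "kind": "stall",
                "ring": int(stall.group(1)),
                "msec": int(stall.group(2)),
            })
            continue
        lockup = LOCKUP_RE.search(line)
        if lockup:
            current = int(lockup.group(1), 16)
            last = int(lockup.group(2), 16)
            events.append({
                "kind": "lockup",
                "ring": int(lockup.group(3)),
                # Gap between the two ids; the meaning is an assumption.
                "pending_fences": last - current,
            })
    return events


SAMPLE = """\
[ 2067.992330] radeon 0000:01:00.0: ring 0 stalled for more than 10088msec
[ 2067.992339] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000004e138 last fence id 0x000000000004e1d6 on ring 0)
"""

if __name__ == "__main__":
    for event in summarize_radeon_lockups(SAMPLE):
        print(event)
```

Feeding it the full output of `dmesg` instead of `SAMPLE` should work the same way, since lines that match neither pattern are ignored.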


Output from Xorg.0.log:
(EE) [mi] EQ overflowing.  Additional events will be discarded until existing events are processed.
(EE)
(EE) Backtrace:
(EE) 0: /usr/lib/xorg/Xorg (xorg_backtrace+0x4e) [0x55d7c20de6ce]
(EE) 1: /usr/lib/xorg/Xorg (mieqEnqueue+0x253) [0x55d7c20c0173]
(EE) 2: /usr/lib/xorg/Xorg (QueuePointerEvents+0x52) [0x55d7c1f988c2]
(EE) 3: /usr/lib/xorg/modules/input/evdev_drv.so (0x7f219f7d2000+0x61f3) [0x7f219f7d81f3]
(EE) 4: /usr/lib/xorg/modules/input/evdev_drv.so (0x7f219f7d2000+0x6a5d) [0x7f219f7d8a5d]
(EE) 5: /usr/lib/xorg/Xorg (0x55d7c1f2c000+0x94268) [0x55d7c1fc0268]
(EE) 6: /usr/lib/xorg/Xorg (0x55d7c1f2c000+0xb9652) [0x55d7c1fe5652]
(EE) 7: /lib/x86_64-linux-gnu/libc.so.6 (0x7f21ada49000+0x354b0) [0x7f21ada7e4b0]
(EE) 8: /lib/x86_64-linux-gnu/libc.so.6 (ioctl+0x5) [0x7f21adb45f45]
(EE) 9: /usr/lib/x86_64-linux-gnu/libdrm.so.2 (drmIoctl+0x28) [0x7f21aee2e478]
(EE) 10: /usr/lib/x86_64-linux-gnu/libdrm.so.2 (drmCommandWrite+0x1b) [0x7f21aee3121b]
(EE) 11: /usr/lib/x86_64-linux-gnu/dri/r600_dri.so (0x7f21a7e9a000+0x7f3c6c) [0x7f21a868dc6c]
(EE) 12: /usr/lib/x86_64-linux-gnu/dri/r600_dri.so (0x7f21a7e9a000+0x7f3f59) [0x7f21a868df59]
(EE) 13: /usr/lib/x86_64-linux-gnu/dri/r600_dri.so (0x7f21a7e9a000+0x7f56b3) [0x7f21a868f6b3]
(EE) 14: /usr/lib/x86_64-linux-gnu/dri/r600_dri.so (0x7f21a7e9a000+0x6e7c55) [0x7f21a8581c55]
(EE) 15: /usr/lib/x86_64-linux-gnu/dri/r600_dri.so (0x7f21a7e9a000+0x259c49) [0x7f21a80f3c49]
(EE) 16: /usr/lib/x86_64-linux-gnu/dri/r600_dri.so (0x7f21a7e9a000+0x1b8133) [0x7f21a8052133]
(EE) 17: /usr/lib/x86_64-linux-gnu/dri/r600_dri.so (0x7f21a7e9a000+0x1b82d2) [0x7f21a80522d2]
(EE) 18: /usr/lib/xorg/modules/libglamoregl.so (0x7f21a96cf000+0x1c7b5) [0x7f21a96eb7b5]
(EE) 19: /usr/lib/xorg/modules/libglamoregl.so (0x7f21a96cf000+0xf271) [0x7f21a96de271]
(EE) 20: /usr/lib/xorg/Xorg (0x55d7c1f2c000+0x1a1758) [0x55d7c20cd758]
(EE) 21: /usr/lib/xorg/Xorg (0x55d7c1f2c000+0xe040a) [0x55d7c200c40a]
(EE) 22: /usr/lib/xorg/Xorg (0x55d7c1f2c000+0x12eb65) [0x55d7c205ab65]
(EE) 23: /usr/lib/xorg/Xorg (0x55d7c1f2c000+0x53d9f) [0x55d7c1f7fd9f]
(EE) 24: /usr/lib/xorg/Xorg (0x55d7c1f2c000+0x57e13) [0x55d7c1f83e13]
(EE) 25: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xf0) [0x7f21ada69830]
(EE) 26: /usr/lib/xorg/Xorg (_start+0x29) [0x55d7c1f6e069]
(EE)
(EE) [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
(EE) [mi] mieq is *NOT* the cause.  It is a victim.

Please provide pointers as to where to start digging in the driver code to solve this issue.
Comment 3 Octavio Paez 2019-02-14 16:36:34 UTC
Forgot to include output of lspci -v after failure:

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Juniper XT [Radeon HD 5770] (rev ff) (prog-if ff)
	!!! Unknown header type 7f
	Kernel driver in use: radeon
	Kernel modules: radeon

Octavio
Comment 4 Octavio Paez 2019-02-15 16:56:25 UTC
(In reply to Łukasz Skocz from comment #1)
> Created attachment 142846 [details]
> another dmesg output

Hey Lukasz!

Have you found any solution to this issue?
Does your lspci -v look similar to mine?
Mine is:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Juniper XT [Radeon HD 5770] (rev ff) (prog-if ff)
	!!! Unknown header type 7f
	Kernel driver in use: radeon
	Kernel modules: radeon

That Unknown header type 7f gets me thinking it may be a HW or power issue.
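A note on that reading: lspci masks off the multi-function bit of the header-type byte (offset 0x0E), so a byte of 0xff shows up as "header type 7f". Together with "rev ff" and "prog-if ff", this is consistent with every configuration read returning all ones, which is what a read from a device that no longer responds on the bus typically yields. It does not say whether the cause is hardware, power, or the driver. A minimal Python sketch of the check, using a hypothetical all-0xff dump (the function names are illustrative):

```python
def header_type_as_lspci(cfg):
    """Header type byte (offset 0x0E) with the multi-function bit masked off."""
    return cfg[0x0E] & 0x7F


def looks_unreachable(cfg, length=64):
    """True if the first `length` config bytes are all 0xff.

    All-ones reads usually mean the device did not answer the
    configuration cycle; this is a heuristic, not a diagnosis.
    """
    header = cfg[:length]
    return len(header) == length and all(b == 0xFF for b in header)


# Hypothetical dump: what the header would look like if every read returned 0xff.
ALL_ONES = bytes([0xFF]) * 64

if __name__ == "__main__":
    print(hex(header_type_as_lspci(ALL_ONES)))
    print(looks_unreachable(ALL_ONES))
```

Running `lspci -x` on the card after a failure and feeding the bytes to `looks_unreachable` would confirm or rule out this reading.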

Thanks,

Octavio
Comment 5 Łukasz Skocz 2019-02-15 17:10:27 UTC
> Have you found any solution to this issue?

I have not. I managed to improve the stability somewhat by adding "radeon.msi=0" to the kernel command line; now it only crashes in Citra and nothing else (or maybe it's just placebo and I'm simply lucky). I suspect a hardware issue as well, since my card is so old at this point. My lspci output looks like this:

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RV710/M92 [Mobility Radeon HD 4530/4570/545v] (prog-if 00 [VGA controller])
        Subsystem: Dell Mobility Radeon HD 4570 / 545v
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at d0000000 (32-bit, prefetchable) [size=256M]
        I/O ports at 2000 [size=256]
        Memory at fc000000 (32-bit, non-prefetchable) [size=64K]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Kernel driver in use: radeon
        Kernel modules: radeon
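The listing above shows "MSI: Enable-", whereas the lspci output in the original description showed "MSI: Enable+", so the parameter does appear to have taken effect. A small Python sketch for checking both sides is below. The command line in `SAMPLE_CMDLINE` is hypothetical, and on a real system the string would come from /proc/cmdline. Function names are illustrative.

```python
import re

MSI_RE = re.compile(r"MSI: Enable([+-])")


def radeon_msi_param(cmdline):
    """Return the value given to radeon.msi on a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("radeon.msi="):
            return token.split("=", 1)[1]
    return None


def msi_enabled(lspci_text):
    """Return True/False from the first 'MSI: Enable+/-' flag, or None if absent."""
    match = MSI_RE.search(lspci_text)
    if match is None:
        return None
    return match.group(1) == "+"


# Hypothetical command line, for illustration only.
SAMPLE_CMDLINE = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro radeon.msi=0"
# Capability lines as they appear in this report.
WITH_PARAM = "Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+"
WITHOUT_PARAM = "Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+"

if __name__ == "__main__":
    print(radeon_msi_param(SAMPLE_CMDLINE))
    print(msi_enabled(WITH_PARAM), msi_enabled(WITHOUT_PARAM))
```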
Comment 6 ye_nope 2019-04-21 10:23:12 UTC
The issue is triggered whenever I launch the in-game world map.
Comment 7 Martin Peres 2019-11-19 09:34:42 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/860.

