Bug 92755

Summary: [APITRACE] Shadow of Mordor locks up R600
Product: Mesa
Reporter: Christoph Brill <egore>
Component: Drivers/Gallium/r600
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact: Default DRI bug account <dri-devel>
Severity: normal
Priority: medium
CC: mirh, vitor.hda
Version: 11.0
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Bug Depends on:
Bug Blocks: 77449
Attachments: journalctl

Description Christoph Brill 2015-10-31 15:02:27 UTC
Shadow of Mordor locks up when entering the game. The following can be found in journalctl:

Okt 31 15:53:48 hostname kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10493msec
Okt 31 15:53:48 hostname kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000017f8e last fence id 0x0000000000017fa0 on ring 0)
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0: Saved 567 dwords of commands on ring 0.
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0: GPU softreset: 0x00000019
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   GRBM_STATUS               = 0xA00309A0
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000001
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   SRBM_STATUS               = 0x20000AC0
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x01000000
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00011000
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00028506
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80838647
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Okt 31 15:53:49 hostname kernel: [drm] PCIE gen 2 link speeds already enabled
Okt 31 15:53:49 hostname kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000000025E000).
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0: WB enabled
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff8800cf42cc00
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff8800cf42cc0c
Okt 31 15:53:49 hostname kernel: radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0xffffc90001c1c418
Okt 31 15:53:49 hostname kernel: [drm] ring test on 0 succeeded in 1 usecs
Okt 31 15:53:49 hostname kernel: [drm] ring test on 3 succeeded in 2 usecs
Okt 31 15:53:50 hostname kernel: [drm] ring test on 5 succeeded in 1 usecs
Okt 31 15:53:50 hostname kernel: [drm] UVD initialized successfully.
Okt 31 15:53:50 hostname kernel: ------------[ cut here ]------------
Okt 31 15:53:50 hostname kernel: WARNING: CPU: 3 PID: 124 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x66/0x80()
Okt 31 15:53:50 hostname kernel: sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:02.0/0000:01:00.0/power_dpm_state'
Okt 31 15:53:50 hostname kernel: Modules linked in: bnep bluetooth rfkill fuse hwmon_vid ext4 crc16 mbcache jbd2 gspca_ov519 gspca_main joydev snd_hda_codec_realtek mousedev snd_hda_codec_hdmi videodev media input_leds led_class wacom snd_hda_codec_generic snd_hda_intel evdev snd_hda_codec kvm_amd snd_hda_core mac_hid kvm psmouse snd_hwdep snd_pcm r8169 edac_core pcspkr serio_raw edac_mce_amd snd_timer mii k10temp sp5100_tco snd wmi acpi_cpufreq asus_atk0110 i2c_piix4 button shpchp soundcore processor sch_fq_codel vboxnetflt(O) pci_stub vboxpci(O) vboxdrv(O) ip_tables x_tables crc32c_generic btrfs xor hid_generic raid6_pq usbhid hid sr_mod cdrom sd_mod ata_generic pata_acpi atkbd libps2 pata_atiixp ahci libahci firewire_ohci pata_via ehci_pci xhci_pci ohci_pci ohci_hcd xhci_hcd ehci_hcd libata usbcore usb_common scsi_mod
Okt 31 15:53:50 hostname kernel:  firewire_core crc_itu_t i8042 serio radeon i2c_algo_bit drm_kms_helper ttm drm
Okt 31 15:53:50 hostname kernel: CPU: 3 PID: 124 Comm: kworker/u12:3 Tainted: G           O    4.2.5-1-ARCH #1
Okt 31 15:53:50 hostname kernel: Hardware name: System manufacturer System Product Name/M4A88TD-V EVO/USB3, BIOS 1601    09/08/2010
Okt 31 15:53:50 hostname kernel: Workqueue: radeon-crtc radeon_flip_work_func [radeon]
Okt 31 15:53:50 hostname kernel:  0000000000000000 00000000d93e9885 ffff8800cf75fb58 ffffffff81570d0a
Okt 31 15:53:50 hostname kernel:  0000000000000000 ffff8800cf75fbb0 ffff8800cf75fb98 ffffffff810748a6
Okt 31 15:53:50 hostname kernel:  000059d6cf75fb98 ffff88010045c000 ffffffffa01ee5a8 ffff880225d15960
Okt 31 15:53:50 hostname kernel: Call Trace:
Okt 31 15:53:50 hostname kernel:  [<ffffffff81570d0a>] dump_stack+0x4c/0x6e
Okt 31 15:53:50 hostname kernel:  [<ffffffff810748a6>] warn_slowpath_common+0x86/0xc0
Okt 31 15:53:50 hostname kernel:  [<ffffffff81074935>] warn_slowpath_fmt+0x55/0x70
Okt 31 15:53:50 hostname kernel:  [<ffffffff81245808>] ? kernfs_path+0x48/0x60
Okt 31 15:53:50 hostname kernel:  [<ffffffff81248df6>] sysfs_warn_dup+0x66/0x80
Okt 31 15:53:50 hostname kernel:  [<ffffffff81248b07>] sysfs_add_file_mode_ns+0x127/0x180
Okt 31 15:53:50 hostname kernel:  [<ffffffff81248b8a>] sysfs_create_file_ns+0x2a/0x30
Okt 31 15:53:50 hostname kernel:  [<ffffffff813cf532>] device_create_file+0x42/0x90
Okt 31 15:53:50 hostname kernel:  [<ffffffffa011d976>] radeon_pm_late_init+0x76/0x1a0 [radeon]
Okt 31 15:53:50 hostname kernel:  [<ffffffffa00b9e1a>] radeon_gpu_reset+0x26a/0x330 [radeon]
Okt 31 15:53:50 hostname kernel:  [<ffffffff810b4c80>] ? wake_atomic_t_function+0x60/0x60
Okt 31 15:53:50 hostname kernel:  [<ffffffffa00e1190>] radeon_flip_work_func+0x130/0x170 [radeon]
Okt 31 15:53:50 hostname kernel:  [<ffffffff8108c5db>] process_one_work+0x14b/0x440
Okt 31 15:53:50 hostname kernel:  [<ffffffff8108c918>] worker_thread+0x48/0x4a0
Okt 31 15:53:50 hostname kernel:  [<ffffffff8108c8d0>] ? process_one_work+0x440/0x440
Okt 31 15:53:50 hostname kernel:  [<ffffffff8108c8d0>] ? process_one_work+0x440/0x440
Okt 31 15:53:50 hostname kernel:  [<ffffffff81092578>] kthread+0xd8/0xf0
Okt 31 15:53:50 hostname kernel:  [<ffffffff810924a0>] ? kthread_worker_fn+0x170/0x170
Okt 31 15:53:50 hostname kernel:  [<ffffffff8157665f>] ret_from_fork+0x3f/0x70
Okt 31 15:53:50 hostname kernel:  [<ffffffff810924a0>] ? kthread_worker_fn+0x170/0x170
Okt 31 15:53:50 hostname kernel: ---[ end trace 9a616c8d3f7ec6d9 ]---
Okt 31 15:53:50 hostname kernel: [drm:radeon_pm_late_init [radeon]] *ERROR* failed to create device file for dpm state
Okt 31 15:53:50 hostname kernel: ------------[ cut here ]------------
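
For triage, two things in the log above are easy to extract mechanically: the gap between the "current" and "last" fence IDs on the lockup line, and which status registers changed between the dump before the soft reset and the dump after it. The following is a minimal Python sketch, assuming the log is available as plain text in the journalctl format shown here. The function names are hypothetical and not part of any radeon tooling, and reading the gap as "fences emitted but not yet signalled" is an assumption about what the driver prints.

```python
import re

# Matches the radeon "GPU lockup" line as it appears in the log above.
FENCE_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)
# Matches register dump lines such as "radeon 0000:01:00.0:   GRBM_STATUS = 0x...".
REG_RE = re.compile(r"radeon \S+:\s+(\w+)\s*=\s*(0x[0-9A-Fa-f]+)\s*$")


def pending_fences(line):
    """Return (ring, last_id - current_id) for a 'GPU lockup' line, else None."""
    m = FENCE_RE.search(line)
    if m is None:
        return None
    current = int(m.group(1), 16)
    last = int(m.group(2), 16)
    # Assumption: the difference is the number of outstanding fences.
    return int(m.group(3)), last - current


def register_changes(lines):
    """Compare the first and last dump of each register; return those that changed."""
    first, last = {}, {}
    for line in lines:
        m = REG_RE.search(line)
        if m:
            name, value = m.group(1), int(m.group(2), 16)
            first.setdefault(name, value)  # keep the earliest value seen
            last[name] = value             # overwrite with the latest value
    return {n: (first[n], last[n]) for n in first if first[n] != last[n]}
```

Applied to the lockup line in the description, `pending_fences` gives ring 0 with a gap of 0x12 (18). `register_changes` reports, for example, that GRBM_STATUS went from 0xA00309A0 to 0x00003828 while R_00D034_DMA_STATUS_REG stayed at 0x44C83D57 and is therefore not listed.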
Comment 1 Christoph Brill 2015-10-31 15:03:33 UTC
Created attachment 119314
journalctl

The journalctl output is attached.
Comment 2 Christoph Brill 2015-11-18 19:08:43 UTC
Uploaded an apitrace to:

https://drive.google.com/file/d/0BxAIK4wsKm_YS1dWT0FGd2hVX00/view?usp=sharing

I'm not sure whether the trace is complete, because of the GPU lockup.
Comment 3 Michel Dänzer 2015-11-19 01:11:55 UTC
Does replaying the apitrace reproduce the lockup?
Comment 4 Christoph Brill 2015-11-19 20:39:33 UTC
(In reply to Michel Dänzer from comment #3)
> Does replaying the apitrace reproduce the lockup?

Yes, it reproducibly locks the GPU.
Comment 5 vitor.hda 2017-10-08 21:16:14 UTC
I think I have a similar problem on an R9 280X.
My lockups don't result in a kernel panic; the system simply stops. I also see the "ring 0/3 stalled for more than Xmsec" messages prior to the lockup.
Eventually I also end up with the following messages:

[drm:atom_op_jump [radeon]] *ERROR* atombios stuck in loop for more than 5secs aborting
[drm:atom_execute_table_locked [radeon]] *ERROR* atombios stuck executing CFB6 (len 62, WS 0, PS 0) @ 0xCFD2
[drm:atom_op_jump [radeon]] *ERROR* atombios stuck in loop for more than 5secs aborting
[drm:atom_execute_table_locked [radeon]] *ERROR* atombios stuck executing CFB6 (len 62, WS 0, PS 0) @ 0xCFD2

Please ask if you need more information and I'll provide it.

Note: currently on Linux 4.13 and using latest Padoka mesa packages.
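
To check whether a report like this matches the original one, it can help to pull the stall and atombios messages out of the kernel log and compare them. The following is a rough Python sketch that only recognises the message formats quoted in this bug. The function name `summarize` is hypothetical.

```python
import re

# "ring 0 stalled for more than 10493msec", as quoted in the description.
STALL_RE = re.compile(r"ring (\d+) stalled for more than (\d+)msec")
# "atombios stuck in loop ..." / "atombios stuck executing ...", as quoted above.
ATOM_RE = re.compile(r"\*ERROR\* atombios stuck")


def summarize(lines):
    """Return (list of (ring, msec) stalls, number of atombios 'stuck' errors)."""
    stalls = []
    atom_errors = 0
    for line in lines:
        m = STALL_RE.search(line)
        if m:
            stalls.append((int(m.group(1)), int(m.group(2))))
        if ATOM_RE.search(line):
            atom_errors += 1
    return stalls, atom_errors
```

Feeding it the saved kernel log, one line per list element, shows which rings stalled and for how long, so reports that only stall ring 0 can be told apart from those that also stall ring 3.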
Comment 6 Julien Isorce 2017-10-16 13:19:50 UTC
(In reply to vitor.hda from comment #5)
> similar problem in R9 280X, I have the "ring 0/3 stalled 
Was it with the attached apitrace? Otherwise, could you generate one (apitrace trace, see https://github.com/apitrace/apitrace/blob/master/docs/USAGE.markdown)?

> Note: currently on Linux 4.13 and using latest Padoka mesa packages.

Was it with Padoka stable or unstable (see the note at https://launchpad.net/~paulo-miguel-dias/+archive/ubuntu/mesa)? They use different LLVM versions (5 vs 6).

With the apitrace in comment #2, there was no issue on a Cape Verde AMD W600 with https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-4.13, latest git mesa, and latest git llvm (which is 6+).
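
For anyone wanting to record or replay a trace as requested above, the two apitrace invocations are `apitrace trace <application> [args...]` and `apitrace replay <file>.trace`. Below is a small Python sketch that wraps them. The helper names and the timeout value are assumptions made for illustration. Note that a real GPU lockup can hang the whole machine, so the timeout is only a best-effort guard.

```python
import shutil
import subprocess


def trace_command(app_argv):
    """Build the argv for recording a trace of an application."""
    return ["apitrace", "trace", *app_argv]


def replay_command(trace_path):
    """Build the argv for replaying a previously recorded trace."""
    return ["apitrace", "replay", trace_path]


def replay(trace_path, timeout_s=300):
    """Replay a trace; return the exit code, or None if apitrace is
    not installed or the replay did not finish within timeout_s."""
    if shutil.which("apitrace") is None:
        return None
    try:
        result = subprocess.run(replay_command(trace_path), timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return result.returncode
```

Checking the kernel log for "ring N stalled" lines right after a replay is what tells you whether the lockup was reproduced. The exit code alone does not.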
Comment 7 Timothy Arceri 2019-07-04 01:19:55 UTC
I assume this has been fixed by now? Can we close this bug?
Comment 8 GitLab Migration User 2019-09-18 19:19:50 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/558.
