Bug 60254

Summary:	[r600g] kernel Oops when provoking GPU lock.
Product:	DRI	Reporter:	Andy Furniss <adf.lists>
Component:	DRM/Radeon	Assignee:	Default DRI bug account <dri-devel>
Status:	RESOLVED INVALID	QA Contact:
Severity:	normal
Priority:	medium	CC:	johannes.hirte
Version:	XOrg git
Hardware:	Other
OS:	All
Whiteboard:
i915 platform:		i915 features:

Description Andy Furniss 2013-02-03 21:43:53 UTC

The GPU lock with Rv670 and openarena is nothing new - it seems to have been a feature for almost a year (I haven't used rv670 for most of that time).

On noticing the new gpu reset code in drm-next-3.9-wip I decided to provoke it on my AGP box and got -

Feb  3 20:46:56 nf7 kernel: radeon 0000:02:00.0: GPU lockup CP stall for more than 10000msec
Feb  3 20:46:56 nf7 kernel: radeon 0000:02:00.0: GPU lockup (waiting for 0x00000000000111c8 last fence id 0x00000000000111c6)
Feb  3 20:46:56 nf7 kernel: radeon 0000:02:00.0: f51df600 unpin not necessary
Feb  3 20:46:56 nf7 kernel: radeon 0000:02:00.0: Saved 217 dwords of commands on ring 0.
Feb  3 20:46:56 nf7 kernel: BUG: unable to handle kernel paging request at f87aec0c
Feb  3 20:46:56 nf7 kernel: IP: [<fa033a3e>] radeon_fence_process+0x7e/0x160 [radeon]
Feb  3 20:46:56 nf7 kernel: *pde = 3702f067 *pte = 00000000 
Feb  3 20:46:56 nf7 kernel: Oops: 0000 [#1] PREEMPT 
Feb  3 20:46:56 nf7 kernel: Modules linked in: radeon fbcon font bitblit ttm softcursor drm_kms_helper drm fb fbdev i2c_algo_bit i2c_core cfbcopyarea cfbimgblt cfbfillrect ehci_pci ehci_hcd nvidia_agp agpgart ohci_hcd usbhid usbcore usb_common snd_intel8x0 snd_ac97_codec ac97_bus forcedeth
Feb  3 20:46:56 nf7 kernel: Pid: 2511, comm: openarena.i386 Not tainted 3.8.0-rc3-gc7fb5ff #1    /NF7-S/NF7 (nVidia-nForce2)
Feb  3 20:46:56 nf7 kernel: EIP: 0060:[<fa033a3e>] EFLAGS: 00210246 CPU: 0
Feb  3 20:46:56 nf7 kernel: EIP is at radeon_fence_process+0x7e/0x160 [radeon]
Feb  3 20:46:56 nf7 kernel: EAX: f87aec0c EBX: f73b2848 ECX: f73b2848 EDX: 00000000
Feb  3 20:46:56 nf7 kernel: ESI: 00000000 EDI: f73b2000 EBP: e4765d8c ESP: e4765d58
Feb  3 20:46:56 nf7 kernel:  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Feb  3 20:46:56 nf7 kernel: CR0: 8005003b CR2: f87aec0c CR3: 1e75f000 CR4: 000007d0
Feb  3 20:46:56 nf7 kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Feb  3 20:46:56 nf7 kernel: DR6: ffff0ff0 DR7: 00000400
Feb  3 20:46:56 nf7 kernel: Process openarena.i386 (pid: 2511, ti=e4764000 task=f70ad2b0 task.ti=e4764000)
Feb  3 20:46:56 nf7 kernel: Stack:
Feb  3 20:46:56 nf7 kernel:  c13d4e58 00765d7c 00000018 f73b2848 f73b2888 00000872 00000003 00000872
Feb  3 20:46:56 nf7 kernel:  00000000 0000000c 00000003 f73b2a5c e4765e0c e4765da4 fa034850 f73b2000
Feb  3 20:46:56 nf7 kernel:  f73b2000 f73b2a5c e4765e0c e4765dc4 fa049bb8 f73b28e8 00000000 c12b06cd
Feb  3 20:46:56 nf7 kernel: Call Trace:
Feb  3 20:46:56 nf7 kernel:  [<c13d4e58>] ? sub_preempt_count+0x8/0x80
Feb  3 20:46:56 nf7 kernel:  [<fa034850>] radeon_fence_count_emitted+0x20/0x90 [radeon]
Feb  3 20:46:56 nf7 kernel:  [<fa049bb8>] radeon_ring_backup+0x38/0x100 [radeon]
Feb  3 20:46:56 nf7 kernel:  [<c12b06cd>] ? _dev_info+0x2d/0x30
Feb  3 20:46:56 nf7 kernel:  [<fa020b22>] radeon_gpu_reset+0x62/0x1d0 [radeon]
Feb  3 20:46:56 nf7 kernel:  [<f8be99c6>] ? ttm_bo_unreserve+0x26/0x50 [ttm]
Feb  3 20:46:56 nf7 kernel:  [<fa04821d>] radeon_gem_handle_lockup.part.2+0xd/0x20 [radeon]
Feb  3 20:46:56 nf7 kernel:  [<fa048b13>] radeon_gem_wait_idle_ioctl+0xb3/0xd0 [radeon]
Feb  3 20:46:56 nf7 kernel:  [<fa048a60>] ? radeon_gem_busy_ioctl+0xf0/0xf0 [radeon]
Feb  3 20:46:56 nf7 kernel:  [<f8af9e82>] drm_ioctl+0x402/0x460 [drm]
Feb  3 20:46:56 nf7 kernel:  [<fa048a60>] ? radeon_gem_busy_ioctl+0xf0/0xf0 [radeon]
Feb  3 20:46:56 nf7 kernel:  [<c104f469>] ? ktime_add_safe+0x9/0x60
Feb  3 20:46:56 nf7 kernel:  [<c104fe80>] ? hrtimer_forward+0xa0/0x190
Feb  3 20:46:56 nf7 kernel:  [<f8af9a80>] ? drm_copy_field+0x80/0x80 [drm]
Feb  3 20:46:56 nf7 kernel:  [<c10eb65a>] do_vfs_ioctl+0x7a/0x590
Feb  3 20:46:56 nf7 kernel:  [<c101ef0b>] ? lapic_next_event+0x1b/0x20
Feb  3 20:46:56 nf7 kernel:  [<c106978d>] ? clockevents_program_event+0x9d/0x150
Feb  3 20:46:56 nf7 kernel:  [<c106ab18>] ? tick_program_event+0x28/0x30
Feb  3 20:46:56 nf7 kernel:  [<c1050612>] ? hrtimer_interrupt+0x182/0x2f0
Feb  3 20:46:56 nf7 kernel:  [<c13d4e58>] ? sub_preempt_count+0x8/0x80
Feb  3 20:46:56 nf7 kernel:  [<c1048cf9>] ? __rcu_read_unlock+0x9/0x60
Feb  3 20:46:56 nf7 kernel:  [<c10f4bc7>] ? fget_light+0x77/0xd0
Feb  3 20:46:56 nf7 kernel:  [<c10ebbac>] sys_ioctl+0x3c/0x70
Feb  3 20:46:56 nf7 kernel:  [<c13d7ffa>] sysenter_do_call+0x12/0x22
Feb  3 20:46:56 nf7 kernel: Code: 8b 4d d4 8d 84 51 f0 00 00 00 8b 74 c7 08 89 75 e0 8b 74 c7 0c 8b 45 d8 83 c0 08 80 bf ec 0d 00 00 00 0f 84 bf 00 00 00 8b 40 0c <8b> 00 39 45 e8 8b 4d ec 89 c3 ba 01 00 00 00 0f 47 ce 39 f1 77
Feb  3 20:46:56 nf7 kernel: EIP: [<fa033a3e>] radeon_fence_process+0x7e/0x160 [radeon] SS:ESP 0068:e4765d58
Feb  3 20:46:56 nf7 kernel: CR2: 00000000f87aec0c
Feb  3 20:46:56 nf7 kernel: ---[ end trace 457588a7cbc40235 ]---
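
When comparing traces like the one above across kernels, it can help to pull out just the fence gap, the faulting address and the faulting symbol. The following is a minimal sketch, not part of the original report. The function name summarize_oops and the regular expressions are assumptions modelled only on the log lines quoted in this bug, and other kernel versions may format these messages differently.

```python
import re

# Patterns modelled on the log lines quoted in this report (assumption:
# other kernel versions may word these messages differently).
LOCKUP_RE = re.compile(
    r"GPU lockup \(waiting for (0x[0-9a-fA-F]+) last fence id (0x[0-9a-fA-F]+)"
)
FAULT_RE = re.compile(
    r"BUG: unable to handle kernel paging request at ([0-9a-fA-F]+)"
)
IP_RE = re.compile(
    r"IP: \[<[0-9a-fA-F]+>\] (\w+)\+(0x[0-9a-fA-F]+)/0x[0-9a-fA-F]+"
)


def summarize_oops(lines):
    """Collect the fence gap, fault address and faulting symbol.

    `lines` is any iterable of kernel log lines. Keys are only present
    in the result if a matching line was found.
    """
    summary = {}
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            waiting, last = (int(x, 16) for x in m.groups())
            # How many fences the ring was behind when the lockup was detected.
            summary["fences_behind"] = waiting - last
        m = FAULT_RE.search(line)
        if m:
            summary["fault_address"] = int(m.group(1), 16)
        m = IP_RE.search(line)
        if m and "symbol" not in summary:
            # Keep the first IP line; later EIP lines repeat the same symbol.
            summary["symbol"] = m.group(1)
            summary["offset"] = int(m.group(2), 16)
    return summary
```

Fed the trace above, this reports a fence gap of 2 (0x111c8 versus 0x111c6), a fault address of 0xf87aec0c and the symbol radeon_fence_process at offset 0x7e.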

Comment 1 Andy Furniss 2013-02-03 21:57:30 UTC

(In reply to comment #0)
> The GPU lock with Rv670 and openarena is nothing new - it seems to have been
> a feature for almost a year (I haven't used rv670 for most of that time).
> 
> On noticing the new gpu reset code in drm-next-3.9-wip I decided to provoke
> it on my AGP box and got -

Hmm, I just managed to get the same result running drm-fixes, so it's not specific to the wip branch. Maybe it's because I am now using llvm. In the past (without llvm), with other kernels, recovery from this GPU lock normally went quite well: the game often just continued for a while, until it hit another one.

Comment 2 Andy Furniss 2013-02-04 11:19:25 UTC

(In reply to comment #1)
> (In reply to comment #0)
> > The GPU lock with Rv670 and openarena is nothing new - it seems to have been
> > a feature for almost a year (I haven't used rv670 for most of that time).
> > 
> > On noticing the new gpu reset code in drm-next-3.9-wip I decided to provoke
> > it on my AGP box and got -
> 
> Hmm I just managed to get the same running drm-fixes so it's not wip maybe
> it's because I am now using llvm. In the (no llvm) past with other kernels
> this GPU lock normally went quite well - the game often just continued for a
> while, until it hit another one.

It's nothing to do with llvm; it seems to be a feature of more recent kernels.

Comment 3 Johannes Hirte 2014-01-06 14:13:05 UTC

I've observed this too, and it feels like it got worse during the 3.13 development process. This is the most recent one in the logs:

Jan  6 14:54:08 localhost kernel: radeon 0000:01:00.0: GPU lockup CP stall for more than 10034msec
Jan  6 14:54:08 localhost kernel: radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000003dde last fence id 0x0000000000003ddd on ring 0)
Jan  6 14:54:08 localhost kernel: [drm:rv770_stop_dpm] *ERROR* Could not force DPM to low.
Jan  6 14:54:08 localhost kernel: [drm] Disabling audio 0 support
Jan  6 14:54:08 localhost kernel: BUG: unable to handle kernel paging request at ffffc90402080ffc
Jan  6 14:54:08 localhost kernel: IP: [<ffffffff813d8d9e>] radeon_ring_backup+0xbe/0x140
Jan  6 14:54:08 localhost kernel: PGD 11b028067 PUD 0
Jan  6 14:54:08 localhost kernel: Oops: 0000 [#1] PREEMPT SMP
Jan  6 14:54:08 localhost kernel: Modules linked in: nfs lockd sunrpc snd_hda_codec_hdmi snd_hda_codec_realtek ath9k snd_hda_intel ath9k_common snd_hda_codec ath9k_hw snd_hwdep ath snd_pcm mac80211 snd_timer acer_wmi broadcom cfg80211 snd i2c_piix4 rfkill tg3 k10temp wmi soundcore sr_mod cdrom snd_page_alloc acpi_cpufreq ohci_pci ohci_hcd
Jan  6 14:54:08 localhost kernel: CPU: 1 PID: 2836 Comm: kwin Not tainted 3.13.0-rc7-00012-gf0a679a #183
Jan  6 14:54:08 localhost kernel: Hardware name: Packard Bell EasyNote TK81/SJV52_DN, BIOS V2.14 07/27/2011
Jan  6 14:54:08 localhost kernel: task: ffff8800a81a7800 ti: ffff8800a8346000 task.ti: ffff8800a8346000
Jan  6 14:54:08 localhost kernel: RIP: 0010:[<ffffffff813d8d9e>]  [<ffffffff813d8d9e>] radeon_ring_backup+0xbe/0x140
Jan  6 14:54:08 localhost kernel: RSP: 0018:ffff8800a8347ce8  EFLAGS: 00010246
Jan  6 14:54:08 localhost kernel: RAX: 0000000000000000 RBX: ffff88011a6d0f20 RCX: 0000000000000000
Jan  6 14:54:08 localhost kernel: RDX: 00000000000efc04 RSI: ffffc90402080ffc RDI: ffffea000015ffc0
Jan  6 14:54:08 localhost kernel: RBP: 00000000ffffffff R08: ffff880005700000 R09: 00000000fffffffa
Jan  6 14:54:08 localhost kernel: R10: 0000000000000008 R11: 0000000000000100 R12: ffff88011a6d0ef8
Jan  6 14:54:08 localhost kernel: R13: 000000000003bf01 R14: ffff8800a8347d50 R15: 0000000000000000
Jan  6 14:54:08 localhost kernel: FS:  00007f1f6a8717c0(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
Jan  6 14:54:08 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan  6 14:54:08 localhost kernel: CR2: ffffc90402080ffc CR3: 00000000a8314000 CR4: 00000000000007e0
Jan  6 14:54:08 localhost kernel: Stack:
Jan  6 14:54:08 localhost kernel: ffffffff813c5875 ffff88011a6d0000 ffff88011a6d0f20 ffff8800a8347d50
Jan  6 14:54:08 localhost kernel: 0000000000000000 ffff88011a6d0018 ffffffff813aad3e ffff88011a6d0700
Jan  6 14:54:08 localhost kernel: 00000001a8347df8 ffff88011a6d0f20 0000000000000000 ffff88006d74e048
Jan  6 14:54:08 localhost kernel: Call Trace:
Jan  6 14:54:08 localhost kernel: [<ffffffff813c5875>] ? radeon_gart_table_vram_unpin+0x85/0x120
Jan  6 14:54:08 localhost kernel: [<ffffffff813aad3e>] ? radeon_gpu_reset+0xae/0x250
Jan  6 14:54:08 localhost kernel: [<ffffffff813c5233>] ? radeon_bo_wait+0xf3/0x150
Jan  6 14:54:08 localhost kernel: [<ffffffff813d6dc5>] ? radeon_gem_handle_lockup.part.6+0x5/0x10
Jan  6 14:54:08 localhost kernel: [<ffffffff813841a5>] ? drm_ioctl+0x485/0x580
Jan  6 14:54:08 localhost kernel: [<ffffffff810a51b5>] ? do_futex+0x105/0xc70
Jan  6 14:54:08 localhost kernel: [<ffffffff813a8975>] ? radeon_drm_ioctl+0x55/0xa0
Jan  6 14:54:08 localhost kernel: [<ffffffff8115d6b7>] ? do_vfs_ioctl+0x2c7/0x490
Jan  6 14:54:08 localhost kernel: [<ffffffff810a5d9c>] ? SyS_futex+0x7c/0x170
Jan  6 14:54:08 localhost kernel: [<ffffffff811671df>] ? fget_light+0x8f/0xf0
Jan  6 14:54:08 localhost kernel: [<ffffffff8115d920>] ? SyS_ioctl+0xa0/0xc0
Jan  6 14:54:08 localhost kernel: [<ffffffff81638862>] ? system_call_fastpath+0x16/0x1b
Jan  6 14:54:08 localhost kernel: Code: 49 89 06 74 78 41 8d 55 ff 49 89 c0 31 c9 48 8d 14 95 04 00 00 00 eb 08 0f 1f 44 00 00 4d 8b 06 48 8b 73 08 8d 45 01 48 8d 34 ae <8b> 36 41 89 34 08 23 43 64 48 83 c1 04 48 39 d1 89 c5 75 de 4c
Jan  6 14:54:08 localhost kernel: RIP  [<ffffffff813d8d9e>] radeon_ring_backup+0xbe/0x140
Jan  6 14:54:08 localhost kernel: RSP <ffff8800a8347ce8>
Jan  6 14:54:08 localhost kernel: CR2: ffffc90402080ffc
Jan  6 14:54:08 localhost kernel: ---[ end trace 3e2cca537a43e686 ]---

Hardware is a HD 5470 (ChipID = 0x68e0)

Google points me to several bug reports from Red Hat/Fedora, so this bug does not seem to be uncommon.

Comment 4 Andy Furniss 2015-07-30 11:55:17 UTC

Old - I have no h/w to test with, so closing.

Johannes, if you can still reproduce this with current kernels etc., please re-open.
