Bug 26887

Summary: fence errors with rs785 and kernel 2.6.33
Product: DRI
Component: DRM/Radeon
Status: RESOLVED FIXED
Severity: critical
Priority: medium
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Reporter: Marc Dietrich <marvin24>
Assignee: Default DRI bug account <dri-devel>
QA Contact:
CC: marvin24
Whiteboard:
Attachments:
  Description          Flags
  full dmesg output    none
  backported patch     none

Description Marc Dietrich 2010-03-04 07:52:43 UTC
I'm getting fence errors in dmesg on RS785 (Radeon HD 4200) with vanilla 2.6.33, and DRI ends up disabled. The error is below and the full dmesg is attached.

[    6.179288] [drm] Initialized drm 1.1.0 20060810
[    6.647815] [drm] radeon defaulting to kernel modesetting.
[    6.656630] [drm] radeon kernel modesetting enabled.
[    6.665470] radeon 0000:01:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[    6.674214] radeon 0000:01:05.0: setting latency timer to 64
[    6.675384] [drm] radeon: Initializing kernel modesetting.
[    6.692803] [drm] register mmio base: 0xFE9F0000
[    6.701275] [drm] register mmio size: 65536
[    6.709638] HDA Intel 0000:00:14.2: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    6.718524] ATOM BIOS: 113
[    6.726772] [drm] Clocks initialized !
[    6.736928] [drm] Detected VRAM RAM=128M, BAR=128M
[    6.737990] [drm] RAM width 32bits DDR
[    6.739095] [TTM] Zone  kernel: Available graphics memory: 2029616 kiB.
[    6.740164] [drm] radeon: 128M of VRAM memory ready
[    6.743646] [drm] radeon: 512M of GTT memory ready.
[    6.744714] [drm] radeon: irq initialized.
[    6.745728] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    6.747053] [drm] Loading RS780 Microcode
[    6.748054] platform radeon_cp.0: firmware: requesting radeon/RS780_pfp.bin
[    6.767394] hda-codec: No codec parser is available
[    6.788131]   alloc irq_desc for 20 on node 0
[    6.788133]   alloc kstat_irqs on node 0
[    6.788139] EMU10K1_Audigy 0000:03:05.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[    6.880352] platform radeon_cp.0: firmware: requesting radeon/RS780_me.bin
[    6.964294] platform radeon_cp.0: firmware: requesting radeon/R600_rlc.bin
[    7.059944] [drm] ring test succeeded in 1 usecs
[    7.061030] [drm] radeon: ib pool ready.
[    9.582117] [drm:radeon_fence_wait] *ERROR* fence(ffff88011e2403c0:0x00000001) 510ms timeout going to reset GPU
[    9.583167] radeon 0000:01:05.0: GPU softreset 
[    9.584211] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA0003030
[    9.585249] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
[    9.586278] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20002040
[    9.792461] radeon 0000:01:05.0: Wait for MC idle timedout !
[    9.793472] radeon 0000:01:05.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
[    9.794533] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
[    9.795584] radeon 0000:01:05.0:   R_000E60_SRBM_SOFT_RESET=0x00000C02
[    9.796726] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0x00003030
[    9.797704] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
[    9.798674] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20000040
[    9.801112] [drm:radeon_fence_wait] *ERROR* fence(ffff88011e2403c0:0x00000001) 739ms timeout
[    9.802083] [drm:radeon_fence_wait] *ERROR* last signaled fence(0x00000001)
[   10.008268] [drm:r600_ib_test] *ERROR* radeon: ib test failed (sracth(0x8504)=0xCAFEDEAD)
[   10.009301] radeon 0000:01:05.0: IB test failed (-22).
[   10.010248] [drm] Enabling audio support
[   10.010428] [drm] Radeon Display Connectors
[   10.012283] [drm] Connector 0:
[   10.013201] [drm]   VGA
[   10.014107] [drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
[   10.015027] [drm]   Encoders:
[   10.015932] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
[   10.016849] [drm] Connector 1:
[   10.017756] [drm]   DVI-D
[   10.018655] [drm]   HPD1
[   10.019549] [drm]   DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
[   10.020451] [drm]   Encoders:
[   10.021340] [drm]     DFP3: INTERNAL_KLDSCP_LVTMA
[   10.206070] [drm] fb mappable at 0xF0141000
[   10.206943] [drm] vram apper at 0xF0000000
[   10.207820] [drm] size 5242880
[   10.208682] [drm] fb depth is 24
[   10.209530] [drm]    pitch is 5120
[   10.210447] fb: conflicting fb hw usage radeondrmfb vs VESA VGA - removing generic driver
[   10.211318] Console: switching to colour dummy device 80x25
[   10.211436] Console: switching to colour frame buffer device 160x64
[   10.217744] fb0: radeondrmfb frame buffer device
[   10.217768] registered panic notifier
[   10.217789] [drm] Initialized radeon 2.0.0 20080528 for 0000:01:05.0 on minor 0
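
For anyone comparing their own logs against this report, here is a minimal sketch of how the fence timeout lines could be pulled out of a saved dmesg. The regular expression is modelled only on the two radeon_fence_wait lines quoted above; the function name find_fence_timeouts is made up for illustration, and other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the radeon_fence_wait lines quoted in this report.
# Other kernel versions may format the message differently (assumption).
FENCE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:radeon_fence_wait\] \*ERROR\* "
    r"fence\((?P<addr>[0-9a-f]+):(?P<seq>0x[0-9A-Fa-f]+)\) (?P<ms>\d+)ms timeout"
)


def find_fence_timeouts(dmesg_text):
    """Return (timestamp, fence sequence, wait in ms) for each timeout line."""
    return [
        (float(m.group("ts")), int(m.group("seq"), 16), int(m.group("ms")))
        for m in FENCE_RE.finditer(dmesg_text)
    ]


# Three lines copied from the log above; the last one is not a timeout line.
sample = """\
[    9.582117] [drm:radeon_fence_wait] *ERROR* fence(ffff88011e2403c0:0x00000001) 510ms timeout going to reset GPU
[    9.801112] [drm:radeon_fence_wait] *ERROR* fence(ffff88011e2403c0:0x00000001) 739ms timeout
[    9.802083] [drm:radeon_fence_wait] *ERROR* last signaled fence(0x00000001)
"""

for ts, seq, ms in find_fence_timeouts(sample):
    print(f"{ts:.6f}s: fence {seq:#x} timed out after {ms} ms")
```

Both timeouts refer to the same fence sequence number (0x1), which matches the "last signaled fence" line only after the reset attempt.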
Comment 1 Marc Dietrich 2010-03-04 07:53:12 UTC
Created attachment 33759
full dmesg output
Comment 2 Jerome Glisse 2010-03-05 03:07:44 UTC
Does this happen all the time?
Comment 3 Marc Dietrich 2010-03-05 03:41:22 UTC
yes - could this be sideport related?
Comment 4 Alex Deucher 2010-03-05 07:37:28 UTC
(In reply to comment #3)
> yes - could this be sideport related?
> 

Not likely.
Comment 5 Marc Dietrich 2010-03-08 00:57:40 UTC
I tried to bisect this and found that 2.6.32 also has the issue (I only got this system a few weeks ago). 2.6.31 shows "ring test failed", and I guess RS785 support was not added before that. So this chip seems never to have worked with KMS.
Comment 6 Marc Dietrich 2010-03-10 01:27:35 UTC
I also tested the kernel from Fedora 13a to see whether my config is the problem, but it also shows fence errors. Other tests that made no difference: switching Sideport to UMA, and limiting memory from 4G to 2G. As this bug happens on a released kernel and also crashes X sometimes, I changed the severity to critical.
Comment 7 Marc Dietrich 2010-03-12 12:11:32 UTC
Tried with nosmp, mem=2G (out of 4G), NO_HZ and NO_PREEMPT: no change. Below is the log with glisse's drm-radeon-next tree (grr, the slow chip clock is the default again):


[    7.940041] [drm] Initialized drm 1.1.0 20060810
[    8.479532] [drm] radeon defaulting to kernel modesetting.
[    8.482583] [drm] radeon kernel modesetting enabled.
[    8.492371] radeon 0000:01:05.0: PCI INT A -> Link[LNKC] -> GSI 10 (level, low) -> IRQ 10
[    8.495381] radeon 0000:01:05.0: setting latency timer to 64
[    8.496445] [drm] radeon: Initializing kernel modesetting.
[    8.499493] [drm] register mmio base: 0xFE9F0000
[    8.502432] [drm] register mmio size: 65536
[    8.505857] ATOM BIOS: 113
[    8.508694] [drm] Clocks initialized !
[    8.511485] [drm] 3 Power State(s)
[    8.514250] [drm] State 0 Default (default)
[    8.517008] [drm]    1 Clock Mode(s)
[    8.519743] [drm]            0 engine: 300000
[    8.522456] [drm] State 1 Performance 
[    8.525133] [drm]    1 Clock Mode(s)
[    8.527734] [drm]            0 engine: 200000
[    8.530262] [drm] State 2 Default 
[    8.532758] [drm]    1 Clock Mode(s)
[    8.535239] [drm]            0 engine: 500000
[    8.537706] [drm] radeon: power management initialized
[    8.540196] radeon 0000:01:05.0: VRAM: 128M 0xC0000000 - 0xC7FFFFFF (128M used)
[    8.542724] radeon 0000:01:05.0: GTT: 512M 0xA0000000 - 0xBFFFFFFF
[    8.545807] [drm] Detected VRAM RAM=128M, BAR=128M
[    8.546508] [drm] RAM width 32bits DDR
[    8.550749] [TTM] Zone  kernel: Available graphics memory: 1029488 kiB.
[    8.551439] [drm] radeon: 128M of VRAM memory ready
[    8.552113] [drm] radeon: 512M of GTT memory ready.
[    8.552785] [drm] radeon: irq initialized.
[    8.553447] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    8.554599] [drm] Loading RS780 Microcode
[    8.555252] platform radeon_cp.0: firmware: requesting radeon/RS780_pfp.bin
[    8.615296] platform radeon_cp.0: firmware: requesting radeon/RS780_me.bin
[    8.643581] platform radeon_cp.0: firmware: requesting radeon/R600_rlc.bin
[    8.688448] [drm] ring test succeeded in 1 usecs
[    8.689138] [drm] radeon: ib pool ready.
[   14.190109] radeon 0000:01:05.0: GPU lockup CP stall for more than 1000msec
[   14.190738] ------------[ cut here ]------------
[   14.191396] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:234 radeon_fence_wait+0x35d/0x3c0 [radeon]()
[   14.192044] Hardware name: System Product Name
[   14.192671] GPU lockup (waiting for 0x00000001 last fence id 0x00000000)
[   14.193314] Modules linked in: snd_hda_intel(+) radeon(+) snd_emu10k1 snd_rawmidi ttm snd_hda_codec snd_ac97_codec ac97_bus drm_kms_helper snd_pcm snd_seq_device drm snd_util_mem amd64_edac_mod emu10k1_gp snd_timer i2c_algo_bit snd_hwdep firewire_ohci snd kobil_sct edac_core firewire_core shpchp gameport crc_itu_t asus_atk0110 pcspkr soundcore snd_page_alloc button usbserial k10temp edac_mce_amd i2c_piix4 pci_hotplug sr_mod sg cdrom sd_mod ahci fan processor pata_atiixp libata scsi_mod thermal thermal_sys
[   14.196212] Pid: 691, comm: work_for_cpu Not tainted 2.6.33 #3
[   14.196888] Call Trace:
[   14.197575]  [<ffffffff810466a8>] warn_slowpath_common+0x78/0xb0
[   14.198260]  [<ffffffff8104673c>] warn_slowpath_fmt+0x3c/0x40
[   14.198937]  [<ffffffffa033a5fd>] radeon_fence_wait+0x35d/0x3c0 [radeon]
[   14.199616]  [<ffffffff81064070>] ? autoremove_wake_function+0x0/0x40
[   14.200299]  [<ffffffffa0375569>] r600_ib_test+0x189/0x300 [radeon]
[   14.200961]  [<ffffffffa037d6e0>] r600_init+0x2e0/0x360 [radeon]
[   14.201627]  [<ffffffffa03293ad>] radeon_device_init+0x29d/0x370 [radeon]
[   14.202297]  [<ffffffffa032a1ee>] radeon_driver_load_kms+0x9e/0x1d0 [radeon]
[   14.202945]  [<ffffffffa020140e>] drm_get_dev+0x34e/0x560 [drm]
[   14.203593]  [<ffffffff8103c86d>] ? default_wake_function+0xd/0x10
[   14.204227]  [<ffffffff8105f7f0>] ? do_work_for_cpu+0x0/0x30
[   14.204851]  [<ffffffffa0397012>] radeon_pci_probe+0x10/0x270 [radeon]
[   14.205479]  [<ffffffff81225d72>] local_pci_probe+0x12/0x20
[   14.206100]  [<ffffffff8105f803>] do_work_for_cpu+0x13/0x30
[   14.206704]  [<ffffffff81063b7e>] kthread+0x8e/0xa0
[   14.207314]  [<ffffffff81003b94>] kernel_thread_helper+0x4/0x10
[   14.207902]  [<ffffffff81063af0>] ? kthread+0x0/0xa0
[   14.208503]  [<ffffffff81003b90>] ? kernel_thread_helper+0x0/0x10
[   14.209100] ---[ end trace 48fab13bc7a5b259 ]---
[   14.209681] [drm] Disabling audio support
[   14.209708] radeon 0000:01:05.0: GPU softreset 
[   14.210855] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA0003030
[   14.211443] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
[   14.212028] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20002040
[   14.339731] radeon 0000:01:05.0: Wait for MC idle timedout !
[   14.340318] radeon 0000:01:05.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
[   14.355896] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
[   14.372490] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA0003030
[   14.373084] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
[   14.373665] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x2000B040
[   14.375260] radeon 0000:01:05.0: GPU reset succeed
[   14.392243] [drm] Clocks initialized !
[   14.519942] radeon 0000:01:05.0: Wait for MC idle timedout !
[   14.647642] radeon 0000:01:05.0: Wait for MC idle timedout !
[   14.811733] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8508)=0xCAFEDEAD)
[   14.812345] [drm:r600_resume] *ERROR* r600 startup failed on resume
[   14.812949] BUG: unable to handle kernel NULL pointer dereference at (null)
[   14.813336] IP: [<ffffffffa02668a4>] drm_helper_resume_force_mode+0x34/0x240 [drm_kms_helper]
[   14.813336] PGD 37d25067 PUD 37dd7067 PMD 0 
[   14.813336] Oops: 0000 [#1] SMP 
[   14.813336] last sysfs file: /sys/module/snd_hda_intel/initstate
[   14.813336] CPU 0 
[   14.813336] Pid: 691, comm: work_for_cpu Tainted: G        W  2.6.33 #3 M4A785TD-V EVO/System Product Name
[   14.813336] RIP: 0010:[<ffffffffa02668a4>]  [<ffffffffa02668a4>] drm_helper_resume_force_mode+0x34/0x240 [drm_kms_helper]
[   14.813336] RSP: 0018:ffff88007e7cdc40  EFLAGS: 00010293
[   14.813336] RAX: 0000000000000020 RBX: fffffffffffffff8 RCX: ffffc90011861740
[   14.813336] RDX: 0000000000001740 RSI: 00000000411a0015 RDI: ffff88007e7eb800
[   14.813336] RBP: ffff88007e7cdc70 R08: 0000000000001724 R09: 0000000000000000
[   14.813336] R10: 000000000000028d R11: 0000000000000000 R12: ffff88007e7ebca0
[   14.813336] R13: ffff88007e7eb800 R14: ffff88007e7ebcb8 R15: ffff88007ed8e930
[   14.813336] FS:  00007f42af12d790(0000) GS:ffff880001c00000(0000) knlGS:0000000000000000
[   14.813336] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   14.813336] CR2: 0000000000000000 CR3: 0000000037b64000 CR4: 00000000000006f0
[   14.813336] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   14.813336] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   14.813336] Process work_for_cpu (pid: 691, threadinfo ffff88007e7cc000, task ffff88007e8f2540)
[   14.813336] Stack:
[   14.813336]  0000000000000007 ffff88007ed8e000 0000000000000000 ffff88007f4fc340
[   14.813336] <0> 0000000000000000 ffff88007ed8e930 ffff88007e7cdca0 ffffffffa0328dee
[   14.813336] <0> ffff88007e7cdca0 ffff88007e02fbc0 ffff88007ed8e000 ffff88007e7cdcf0
[   14.813336] Call Trace:
[   14.813336]  [<ffffffffa0328dee>] radeon_gpu_reset+0xae/0xb0 [radeon]
[   14.813336]  [<ffffffffa033a62f>] radeon_fence_wait+0x38f/0x3c0 [radeon]
[   14.813336]  [<ffffffff81064070>] ? autoremove_wake_function+0x0/0x40
[   14.813336]  [<ffffffffa0375569>] r600_ib_test+0x189/0x300 [radeon]
[   14.813336]  [<ffffffffa037d6e0>] r600_init+0x2e0/0x360 [radeon]
[   14.813336]  [<ffffffffa03293ad>] radeon_device_init+0x29d/0x370 [radeon]
[   14.813336]  [<ffffffffa032a1ee>] radeon_driver_load_kms+0x9e/0x1d0 [radeon]
[   14.813336]  [<ffffffffa020140e>] drm_get_dev+0x34e/0x560 [drm]
[   14.813336]  [<ffffffff8103c86d>] ? default_wake_function+0xd/0x10
[   14.813336]  [<ffffffff8105f7f0>] ? do_work_for_cpu+0x0/0x30
[   14.813336]  [<ffffffffa0397012>] radeon_pci_probe+0x10/0x270 [radeon]
[   14.813336]  [<ffffffff81225d72>] local_pci_probe+0x12/0x20
[   14.813336]  [<ffffffff8105f803>] do_work_for_cpu+0x13/0x30
[   14.813336]  [<ffffffff81063b7e>] kthread+0x8e/0xa0
[   14.813336]  [<ffffffff81003b94>] kernel_thread_helper+0x4/0x10
[   14.813336]  [<ffffffff81063af0>] ? kthread+0x0/0xa0
[   14.813336]  [<ffffffff81003b90>] ? kernel_thread_helper+0x0/0x10
[   14.813336] Code: 8d b7 b8 04 00 00 41 55 49 89 fd 41 54 4c 8d a7 a0 04 00 00 53 48 83 ec 08 48 8b 9f b8 04 00 00 48 83 eb 08 eb 05 90 48 8d 58 f8 <48> 8b 43 08 48 8d 53 08 49 39 d6 0f 18 08 0f 84 b8 01 00 00 80 
[   14.813336] RIP  [<ffffffffa02668a4>] drm_helper_resume_force_mode+0x34/0x240 [drm_kms_helper]
[   14.813336]  RSP <ffff88007e7cdc40>
[   14.813336] CR2: 0000000000000000
[   14.840412] ---[ end trace 48fab13bc7a5b25a ]---
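
To see what the soft reset actually changed, the register dumps printed before and after it can be compared. The sketch below is only an illustration: the function name changed_bits and the regular expression are assumptions based on the lines in this log, and it reports which bits flipped without saying what they mean (that would need the register documentation).

```python
import re

# Matches dump lines such as "R_008010_GRBM_STATUS=0xA0003030" from this log.
REG_RE = re.compile(r"R_[0-9A-F]{6}_(?P<name>[A-Z0-9_]+)=(?P<val>0x[0-9A-Fa-f]+)")


def parse_registers(lines):
    """Collect register name -> value from radeon register dump lines."""
    regs = {}
    for line in lines:
        m = REG_RE.search(line)
        if m:
            regs[m.group("name")] = int(m.group("val"), 16)
    return regs


def changed_bits(before_lines, after_lines):
    """Return register name -> XOR mask for registers whose value changed."""
    before = parse_registers(before_lines)
    after = parse_registers(after_lines)
    return {
        name: before[name] ^ after[name]
        for name in before
        if name in after and before[name] != after[name]
    }


# Status registers copied from the log above, before and after the soft reset.
before = [
    "radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA0003030",
    "radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003",
    "radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20002040",
]
after = [
    "radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA0003030",
    "radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003",
    "radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x2000B040",
]

for name, mask in changed_bits(before, after).items():
    print(f"{name}: changed bits {mask:#010x}")
```

In this log only SRBM_STATUS differs across the reset; GRBM_STATUS and GRBM_STATUS2 read back unchanged.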
Comment 8 Marc Dietrich 2010-03-12 12:40:44 UTC
OK, it turned out that the oops was PM related. When started with radeon.{dynpm,dynclks}=0 everything works fine!
Unfortunately, I cannot test the GPU reset patches on their own, as they do not apply to 2.6.33. Jérôme, could you please supply a version that applies to 2.6.33?

Thanks!
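
The radeon.{dynpm,dynclks}=0 notation is shell-style brace shorthand for two separate parameters. A small sketch of the expansion, with a made-up helper name (expand_braces) and support for a single brace group only:

```python
import re


def expand_braces(shorthand):
    """Expand one {a,b,...} group, e.g. 'radeon.{dynpm,dynclks}=0'.

    Illustrative helper only: handles a single, non-nested brace group.
    """
    m = re.fullmatch(
        r"(?P<pre>[^{}]*)\{(?P<alts>[^{}]*)\}(?P<post>[^{}]*)", shorthand
    )
    if m is None:
        # No brace group present: return the input unchanged.
        return [shorthand]
    return [
        m.group("pre") + alt + m.group("post")
        for alt in m.group("alts").split(",")
    ]


# The parameters that would go on the kernel command line.
print(" ".join(expand_braces("radeon.{dynpm,dynclks}=0")))
```

So the workaround amounts to booting with both radeon.dynpm=0 and radeon.dynclks=0.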
Comment 9 Marc Dietrich 2010-03-13 08:52:17 UTC
Created attachment 34022
backported patch

This bug report looks like a soliloquy. Anyway, I backported "drm/radeon/kms: fence cleanup + more reliable GPU lockup detection V4" myself to 2.6.33 and it fixes this problem. Can this be forwarded to upstream->stable?
Comment 10 Chris Sherlock 2011-02-04 19:13:58 UTC
Is it possible that bug 32662 is related to this one?
