Bug 26887 - fence errors with rs785 and kernel 2.6.33
Summary: fence errors with rs785 and kernel 2.6.33
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Default DRI bug account
Reported: 2010-03-04 07:52 UTC by Marc Dietrich
Modified: 2017-04-28 09:02 UTC

Attachments
full dmesg output (49.42 KB, text/plain)
2010-03-04 07:53 UTC, Marc Dietrich
backported patch (20.24 KB, patch)
2010-03-13 08:52 UTC, Marc Dietrich

Description Marc Dietrich 2010-03-04 07:52:43 UTC
With vanilla 2.6.33 I'm getting fence errors in dmesg on RS785 (Radeon HD 4200), and DRI ends up disabled. The error is below and the full dmesg is attached.

[    6.179288] [drm] Initialized drm 1.1.0 20060810
[    6.647815] [drm] radeon defaulting to kernel modesetting.
[    6.656630] [drm] radeon kernel modesetting enabled.
[    6.665470] radeon 0000:01:05.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[    6.674214] radeon 0000:01:05.0: setting latency timer to 64
[    6.675384] [drm] radeon: Initializing kernel modesetting.
[    6.692803] [drm] register mmio base: 0xFE9F0000
[    6.701275] [drm] register mmio size: 65536
[    6.709638] HDA Intel 0000:00:14.2: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    6.718524] ATOM BIOS: 113
[    6.726772] [drm] Clocks initialized !
[    6.736928] [drm] Detected VRAM RAM=128M, BAR=128M
[    6.737990] [drm] RAM width 32bits DDR
[    6.739095] [TTM] Zone  kernel: Available graphics memory: 2029616 kiB.
[    6.740164] [drm] radeon: 128M of VRAM memory ready
[    6.743646] [drm] radeon: 512M of GTT memory ready.
[    6.744714] [drm] radeon: irq initialized.
[    6.745728] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    6.747053] [drm] Loading RS780 Microcode
[    6.748054] platform radeon_cp.0: firmware: requesting radeon/RS780_pfp.bin
[    6.767394] hda-codec: No codec parser is available
[    6.788131]   alloc irq_desc for 20 on node 0
[    6.788133]   alloc kstat_irqs on node 0
[    6.788139] EMU10K1_Audigy 0000:03:05.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[    6.880352] platform radeon_cp.0: firmware: requesting radeon/RS780_me.bin
[    6.964294] platform radeon_cp.0: firmware: requesting radeon/R600_rlc.bin
[    7.059944] [drm] ring test succeeded in 1 usecs
[    7.061030] [drm] radeon: ib pool ready.
[    9.582117] [drm:radeon_fence_wait] *ERROR* fence(ffff88011e2403c0:0x00000001) 510ms timeout going to reset GPU
[    9.583167] radeon 0000:01:05.0: GPU softreset 
[    9.584211] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA0003030
[    9.585249] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
[    9.586278] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20002040
[    9.792461] radeon 0000:01:05.0: Wait for MC idle timedout !
[    9.793472] radeon 0000:01:05.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
[    9.794533] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
[    9.795584] radeon 0000:01:05.0:   R_000E60_SRBM_SOFT_RESET=0x00000C02
[    9.796726] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0x00003030
[    9.797704] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
[    9.798674] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20000040
[    9.801112] [drm:radeon_fence_wait] *ERROR* fence(ffff88011e2403c0:0x00000001) 739ms timeout
[    9.802083] [drm:radeon_fence_wait] *ERROR* last signaled fence(0x00000001)
[   10.008268] [drm:r600_ib_test] *ERROR* radeon: ib test failed (sracth(0x8504)=0xCAFEDEAD)
[   10.009301] radeon 0000:01:05.0: IB test failed (-22).
[   10.010248] [drm] Enabling audio support
[   10.010428] [drm] Radeon Display Connectors
[   10.012283] [drm] Connector 0:
[   10.013201] [drm]   VGA
[   10.014107] [drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
[   10.015027] [drm]   Encoders:
[   10.015932] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
[   10.016849] [drm] Connector 1:
[   10.017756] [drm]   DVI-D
[   10.018655] [drm]   HPD1
[   10.019549] [drm]   DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
[   10.020451] [drm]   Encoders:
[   10.021340] [drm]     DFP3: INTERNAL_KLDSCP_LVTMA
[   10.206070] [drm] fb mappable at 0xF0141000
[   10.206943] [drm] vram apper at 0xF0000000
[   10.207820] [drm] size 5242880
[   10.208682] [drm] fb depth is 24
[   10.209530] [drm]    pitch is 5120
[   10.210447] fb: conflicting fb hw usage radeondrmfb vs VESA VGA - removing generic driver
[   10.211318] Console: switching to colour dummy device 80x25
[   10.211436] Console: switching to colour frame buffer device 160x64
[   10.217744] fb0: radeondrmfb frame buffer device
[   10.217768] registered panic notifier
[   10.217789] [drm] Initialized radeon 2.0.0 20080528 for 0000:01:05.0 on minor 0
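
For anyone comparing logs across kernels, the fence timeouts in a dmesg dump like the one above can be pulled out mechanically. The following is only a minimal sketch: the function name find_fence_timeouts and the regular expression are assumptions written against the message format shown in this report, not anything provided by the kernel or the radeon driver.

```python
import re

# Assumed format, taken from the lines quoted in this report:
# [    9.582117] [drm:radeon_fence_wait] *ERROR* fence(<addr>:<seq>) <N>ms timeout ...
FENCE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:radeon_fence_wait\] \*ERROR\* "
    r"fence\((?P<addr>[0-9a-f]+):(?P<seq>0x[0-9A-Fa-f]+)\)\s+(?P<ms>\d+)ms timeout"
)


def find_fence_timeouts(dmesg_text):
    """Return (timestamp, fence address, sequence number, timeout in ms) tuples."""
    hits = []
    for line in dmesg_text.splitlines():
        m = FENCE_RE.search(line)
        if m:
            hits.append((float(m.group("ts")), m.group("addr"),
                         int(m.group("seq"), 16), int(m.group("ms"))))
    return hits


# Lines copied from the log above; the "last signaled fence" line is
# deliberately not matched because it carries no timeout value.
SAMPLE = """\
[    7.061030] [drm] radeon: ib pool ready.
[    9.582117] [drm:radeon_fence_wait] *ERROR* fence(ffff88011e2403c0:0x00000001) 510ms timeout going to reset GPU
[    9.801112] [drm:radeon_fence_wait] *ERROR* fence(ffff88011e2403c0:0x00000001) 739ms timeout
[    9.802083] [drm:radeon_fence_wait] *ERROR* last signaled fence(0x00000001)
"""

for hit in find_fence_timeouts(SAMPLE):
    print(hit)
```

On the sample this reports two timeouts for the same fence (sequence number 1), which matches the log: the very first fence emitted after "ib pool ready" is never seen as completed within the wait.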
Comment 1 Marc Dietrich 2010-03-04 07:53:12 UTC
Created attachment 33759
full dmesg output
Comment 2 Jerome Glisse 2010-03-05 03:07:44 UTC
Does this happen all the time ?
Comment 3 Marc Dietrich 2010-03-05 03:41:22 UTC
yes - could this be sideport related?
Comment 4 Alex Deucher 2010-03-05 07:37:28 UTC
(In reply to comment #3)
> yes - could this be sideport related?
> 

Not likely.
Comment 5 Marc Dietrich 2010-03-08 00:57:40 UTC
I tried to bisect this and found that 2.6.32 also has this issue (I only got this system a few weeks ago). 2.6.31 shows "ring test failed", and I guess support for RS785 was not added any earlier. So this chip seems never to have worked with KMS.
Comment 6 Marc Dietrich 2010-03-10 01:27:35 UTC
I also tested the kernel from Fedora 13a to see whether the problem is in my config, but it shows the fence errors too. Other things I tried without success: switching from Sideport to UMA, and limiting memory from 4G to 2G. As this bug happens on a released kernel and also crashes X sometimes, I changed the severity to critical.
Comment 7 Marc Dietrich 2010-03-12 12:11:32 UTC
Tried with nosmp, mem=2G (out of 4G), NO_HZ and NO_PREEMPT - no change. Below is the log with glisse's drm-radeon-next tree (grr - again the slow chip clock by default):


[    7.940041] [drm] Initialized drm 1.1.0 20060810
[    8.479532] [drm] radeon defaulting to kernel modesetting.
[    8.482583] [drm] radeon kernel modesetting enabled.
[    8.492371] radeon 0000:01:05.0: PCI INT A -> Link[LNKC] -> GSI 10 (level, low) -> IRQ 10
[    8.495381] radeon 0000:01:05.0: setting latency timer to 64
[    8.496445] [drm] radeon: Initializing kernel modesetting.
[    8.499493] [drm] register mmio base: 0xFE9F0000
[    8.502432] [drm] register mmio size: 65536
[    8.505857] ATOM BIOS: 113
[    8.508694] [drm] Clocks initialized !
[    8.511485] [drm] 3 Power State(s)
[    8.514250] [drm] State 0 Default (default)
[    8.517008] [drm]    1 Clock Mode(s)
[    8.519743] [drm]            0 engine: 300000
[    8.522456] [drm] State 1 Performance 
[    8.525133] [drm]    1 Clock Mode(s)
[    8.527734] [drm]            0 engine: 200000
[    8.530262] [drm] State 2 Default 
[    8.532758] [drm]    1 Clock Mode(s)
[    8.535239] [drm]            0 engine: 500000
[    8.537706] [drm] radeon: power management initialized
[    8.540196] radeon 0000:01:05.0: VRAM: 128M 0xC0000000 - 0xC7FFFFFF (128M used)
[    8.542724] radeon 0000:01:05.0: GTT: 512M 0xA0000000 - 0xBFFFFFFF
[    8.545807] [drm] Detected VRAM RAM=128M, BAR=128M
[    8.546508] [drm] RAM width 32bits DDR
[    8.550749] [TTM] Zone  kernel: Available graphics memory: 1029488 kiB.
[    8.551439] [drm] radeon: 128M of VRAM memory ready
[    8.552113] [drm] radeon: 512M of GTT memory ready.
[    8.552785] [drm] radeon: irq initialized.
[    8.553447] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    8.554599] [drm] Loading RS780 Microcode
[    8.555252] platform radeon_cp.0: firmware: requesting radeon/RS780_pfp.bin
[    8.615296] platform radeon_cp.0: firmware: requesting radeon/RS780_me.bin
[    8.643581] platform radeon_cp.0: firmware: requesting radeon/R600_rlc.bin
[    8.688448] [drm] ring test succeeded in 1 usecs
[    8.689138] [drm] radeon: ib pool ready.
[   14.190109] radeon 0000:01:05.0: GPU lockup CP stall for more than 1000msec
[   14.190738] ------------[ cut here ]------------
[   14.191396] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:234 radeon_fence_wait+0x35d/0x3c0 [radeon]()
[   14.192044] Hardware name: System Product Name
[   14.192671] GPU lockup (waiting for 0x00000001 last fence id 0x00000000)
[   14.193314] Modules linked in: snd_hda_intel(+) radeon(+) snd_emu10k1 snd_rawmidi ttm snd_hda_codec snd_ac97_codec ac97_bus drm_kms_helper snd_pcm snd_seq_device drm snd_util_mem amd64_edac_mod emu10k1_gp snd_timer i2c_algo_bit snd_hwdep firewire_ohci snd kobil_sct edac_core firewire_core shpchp gameport crc_itu_t asus_atk0110 pcspkr soundcore snd_page_alloc button usbserial k10temp edac_mce_amd i2c_piix4 pci_hotplug sr_mod sg cdrom sd_mod ahci fan processor pata_atiixp libata scsi_mod thermal thermal_sys
[   14.196212] Pid: 691, comm: work_for_cpu Not tainted 2.6.33 #3
[   14.196888] Call Trace:
[   14.197575]  [<ffffffff810466a8>] warn_slowpath_common+0x78/0xb0
[   14.198260]  [<ffffffff8104673c>] warn_slowpath_fmt+0x3c/0x40
[   14.198937]  [<ffffffffa033a5fd>] radeon_fence_wait+0x35d/0x3c0 [radeon]
[   14.199616]  [<ffffffff81064070>] ? autoremove_wake_function+0x0/0x40
[   14.200299]  [<ffffffffa0375569>] r600_ib_test+0x189/0x300 [radeon]
[   14.200961]  [<ffffffffa037d6e0>] r600_init+0x2e0/0x360 [radeon]
[   14.201627]  [<ffffffffa03293ad>] radeon_device_init+0x29d/0x370 [radeon]
[   14.202297]  [<ffffffffa032a1ee>] radeon_driver_load_kms+0x9e/0x1d0 [radeon]
[   14.202945]  [<ffffffffa020140e>] drm_get_dev+0x34e/0x560 [drm]
[   14.203593]  [<ffffffff8103c86d>] ? default_wake_function+0xd/0x10
[   14.204227]  [<ffffffff8105f7f0>] ? do_work_for_cpu+0x0/0x30
[   14.204851]  [<ffffffffa0397012>] radeon_pci_probe+0x10/0x270 [radeon]
[   14.205479]  [<ffffffff81225d72>] local_pci_probe+0x12/0x20
[   14.206100]  [<ffffffff8105f803>] do_work_for_cpu+0x13/0x30
[   14.206704]  [<ffffffff81063b7e>] kthread+0x8e/0xa0
[   14.207314]  [<ffffffff81003b94>] kernel_thread_helper+0x4/0x10
[   14.207902]  [<ffffffff81063af0>] ? kthread+0x0/0xa0
[   14.208503]  [<ffffffff81003b90>] ? kernel_thread_helper+0x0/0x10
[   14.209100] ---[ end trace 48fab13bc7a5b259 ]---
[   14.209681] [drm] Disabling audio support
[   14.209708] radeon 0000:01:05.0: GPU softreset 
[   14.210855] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA0003030
[   14.211443] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
[   14.212028] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20002040
[   14.339731] radeon 0000:01:05.0: Wait for MC idle timedout !
[   14.340318] radeon 0000:01:05.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
[   14.355896] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
[   14.372490] radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA0003030
[   14.373084] radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
[   14.373665] radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x2000B040
[   14.375260] radeon 0000:01:05.0: GPU reset succeed
[   14.392243] [drm] Clocks initialized !
[   14.519942] radeon 0000:01:05.0: Wait for MC idle timedout !
[   14.647642] radeon 0000:01:05.0: Wait for MC idle timedout !
[   14.811733] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8508)=0xCAFEDEAD)
[   14.812345] [drm:r600_resume] *ERROR* r600 startup failed on resume
[   14.812949] BUG: unable to handle kernel NULL pointer dereference at (null)
[   14.813336] IP: [<ffffffffa02668a4>] drm_helper_resume_force_mode+0x34/0x240 [drm_kms_helper]
[   14.813336] PGD 37d25067 PUD 37dd7067 PMD 0 
[   14.813336] Oops: 0000 [#1] SMP 
[   14.813336] last sysfs file: /sys/module/snd_hda_intel/initstate
[   14.813336] CPU 0 
[   14.813336] Pid: 691, comm: work_for_cpu Tainted: G        W  2.6.33 #3 M4A785TD-V EVO/System Product Name
[   14.813336] RIP: 0010:[<ffffffffa02668a4>]  [<ffffffffa02668a4>] drm_helper_resume_force_mode+0x34/0x240 [drm_kms_helper]
[   14.813336] RSP: 0018:ffff88007e7cdc40  EFLAGS: 00010293
[   14.813336] RAX: 0000000000000020 RBX: fffffffffffffff8 RCX: ffffc90011861740
[   14.813336] RDX: 0000000000001740 RSI: 00000000411a0015 RDI: ffff88007e7eb800
[   14.813336] RBP: ffff88007e7cdc70 R08: 0000000000001724 R09: 0000000000000000
[   14.813336] R10: 000000000000028d R11: 0000000000000000 R12: ffff88007e7ebca0
[   14.813336] R13: ffff88007e7eb800 R14: ffff88007e7ebcb8 R15: ffff88007ed8e930
[   14.813336] FS:  00007f42af12d790(0000) GS:ffff880001c00000(0000) knlGS:0000000000000000
[   14.813336] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   14.813336] CR2: 0000000000000000 CR3: 0000000037b64000 CR4: 00000000000006f0
[   14.813336] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   14.813336] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   14.813336] Process work_for_cpu (pid: 691, threadinfo ffff88007e7cc000, task ffff88007e8f2540)
[   14.813336] Stack:
[   14.813336]  0000000000000007 ffff88007ed8e000 0000000000000000 ffff88007f4fc340
[   14.813336] <0> 0000000000000000 ffff88007ed8e930 ffff88007e7cdca0 ffffffffa0328dee
[   14.813336] <0> ffff88007e7cdca0 ffff88007e02fbc0 ffff88007ed8e000 ffff88007e7cdcf0
[   14.813336] Call Trace:
[   14.813336]  [<ffffffffa0328dee>] radeon_gpu_reset+0xae/0xb0 [radeon]
[   14.813336]  [<ffffffffa033a62f>] radeon_fence_wait+0x38f/0x3c0 [radeon]
[   14.813336]  [<ffffffff81064070>] ? autoremove_wake_function+0x0/0x40
[   14.813336]  [<ffffffffa0375569>] r600_ib_test+0x189/0x300 [radeon]
[   14.813336]  [<ffffffffa037d6e0>] r600_init+0x2e0/0x360 [radeon]
[   14.813336]  [<ffffffffa03293ad>] radeon_device_init+0x29d/0x370 [radeon]
[   14.813336]  [<ffffffffa032a1ee>] radeon_driver_load_kms+0x9e/0x1d0 [radeon]
[   14.813336]  [<ffffffffa020140e>] drm_get_dev+0x34e/0x560 [drm]
[   14.813336]  [<ffffffff8103c86d>] ? default_wake_function+0xd/0x10
[   14.813336]  [<ffffffff8105f7f0>] ? do_work_for_cpu+0x0/0x30
[   14.813336]  [<ffffffffa0397012>] radeon_pci_probe+0x10/0x270 [radeon]
[   14.813336]  [<ffffffff81225d72>] local_pci_probe+0x12/0x20
[   14.813336]  [<ffffffff8105f803>] do_work_for_cpu+0x13/0x30
[   14.813336]  [<ffffffff81063b7e>] kthread+0x8e/0xa0
[   14.813336]  [<ffffffff81003b94>] kernel_thread_helper+0x4/0x10
[   14.813336]  [<ffffffff81063af0>] ? kthread+0x0/0xa0
[   14.813336]  [<ffffffff81003b90>] ? kernel_thread_helper+0x0/0x10
[   14.813336] Code: 8d b7 b8 04 00 00 41 55 49 89 fd 41 54 4c 8d a7 a0 04 00 00 53 48 83 ec 08 48 8b 9f b8 04 00 00 48 83 eb 08 eb 05 90 48 8d 58 f8 <48> 8b 43 08 48 8d 53 08 49 39 d6 0f 18 08 0f 84 b8 01 00 00 80 
[   14.813336] RIP  [<ffffffffa02668a4>] drm_helper_resume_force_mode+0x34/0x240 [drm_kms_helper]
[   14.813336]  RSP <ffff88007e7cdc40>
[   14.813336] CR2: 0000000000000000
[   14.840412] ---[ end trace 48fab13bc7a5b25a ]---
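
The register dumps printed around the soft reset are easier to compare when the first and last value of each register are placed side by side. The sketch below does only that and does not attempt to decode individual bits; the helper name register_changes and the regular expression are assumptions based on the R_<offset>_<NAME>=0x<value> format seen in this log.

```python
import re

# Assumed dump format, as printed in the log above: R_008010_GRBM_STATUS=0xA0003030
REG_RE = re.compile(r"(R_[0-9A-F]{6}_[A-Z0-9_]+)=(0x[0-9A-Fa-f]{8})")


def register_changes(dmesg_text):
    """Map each register name to (first value, last value, XOR of the two)."""
    first, last = {}, {}
    for name, value in REG_RE.findall(dmesg_text):
        v = int(value, 16)
        first.setdefault(name, v)  # keep the earliest value seen
        last[name] = v             # overwrite so the latest value wins
    return {n: (first[n], last[n], first[n] ^ last[n]) for n in first}


# Register lines copied from the drm-radeon-next log above.
SAMPLE = """\
radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA0003030
radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x20002040
radeon 0000:01:05.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
radeon 0000:01:05.0:   R_008010_GRBM_STATUS=0xA0003030
radeon 0000:01:05.0:   R_008014_GRBM_STATUS2=0x00000003
radeon 0000:01:05.0:   R_000E50_SRBM_STATUS=0x2000B040
"""

for name, (before, after, diff) in register_changes(SAMPLE).items():
    print(f"{name}: 0x{before:08X} -> 0x{after:08X} (changed bits 0x{diff:08X})")
```

Run against this log, GRBM_STATUS is identical before and after the reset, whereas in the vanilla 2.6.33 log from the description it went from 0xA0003030 to 0x00003030. What those bits mean would have to be looked up in the register documentation; the sketch makes no claim about that.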
Comment 8 Marc Dietrich 2010-03-12 12:40:44 UTC
OK - it turned out that the oops was PM related. When booted with radeon.{dynpm,dynclks}=0, everything works fine!
Unfortunately, I cannot test the GPU reset patches on their own, as they do not apply to 2.6.33. Jérôme, could you please supply a version that applies to 2.6.33?

Thanks!
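
The radeon.{dynpm,dynclks}=0 notation above is shell brace shorthand for two separate module parameters. As a small sketch of how to spell them out and check a kernel command line for them, assuming hypothetical helper names expand_braces and missing_params and a made-up example command line:

```python
import re


def expand_braces(param):
    """Expand one shell-style {a,b} group, e.g. 'radeon.{dynpm,dynclks}=0'."""
    m = re.search(r"\{([^{}]*)\}", param)
    if not m:
        return [param]
    head, tail = param[:m.start()], param[m.end():]
    return [head + alt + tail for alt in m.group(1).split(",")]


def missing_params(cmdline, wanted):
    """Return the wanted parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [p for p in wanted if p not in present]


wanted = expand_braces("radeon.{dynpm,dynclks}=0")
print(wanted)  # ['radeon.dynpm=0', 'radeon.dynclks=0']

# Hypothetical command line for illustration; on a running system the text
# could be read from /proc/cmdline instead.
example_cmdline = "root=/dev/sda1 ro radeon.dynpm=0"
print(missing_params(example_cmdline, wanted))  # ['radeon.dynclks=0']
```

The check only covers parameters given on the kernel command line; options set through a modprobe configuration file would not show up there.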
Comment 9 Marc Dietrich 2010-03-13 08:52:17 UTC
Created attachment 34022
backported patch

This bug report looks like a soliloquy. Anyway, I backported "drm/radeon/kms: fence cleanup + more reliable GPU lockup detection V4" to 2.6.33 myself and it fixes this problem. Can this be forwarded upstream and then to stable?
Comment 10 Chris Sherlock 2011-02-04 19:13:58 UTC
Is it possible that bug 32662 is related to this one?

