| Field | Value |
|---|---|
| Summary | fence errors with rs785 and kernel 2.6.33 |
| Product | DRI |
| Component | DRM/Radeon |
| Status | RESOLVED FIXED |
| Severity | critical |
| Priority | medium |
| Version | DRI git |
| Hardware | x86-64 (AMD64) |
| OS | Linux (All) |
| Reporter | Marc Dietrich <marvin24> |
| Assignee | Default DRI bug account <dri-devel> |
| CC | marvin24 |

Description
Marc Dietrich
2010-03-04 07:52:43 UTC
Created attachment 33759: full dmesg output

Does this happen all the time?

Yes. Could this be sideport related?

(In reply to comment #3)
> yes - could this be sideport related?

Not likely.

I tried to bisect this and found that 2.6.32 also has this issue (I only got this system a few weeks ago). 2.6.31 shows "ring test failed", and I guess support for rs785 was not added any earlier, so this chip seems never to have worked with KMS. I also tested the kernel from Fedora 13a to see whether my config is the problem, but it shows fence errors too. Other tests that did not help: switching sideport to UMA, and limiting memory from 4G to 2G.

As this bug happens on a released kernel and also crashes X sometimes, I changed the severity to critical.

Tried with nosmp, mem=2G (out of 4G), NO_HZ, and NO_PREEMPT: no change.

Below is the log with glisse's drm-radeon-next tree (grr, the slow chip clock is the default again):

[ 7.940041] [drm] Initialized drm 1.1.0 20060810
[ 8.479532] [drm] radeon defaulting to kernel modesetting.
[ 8.482583] [drm] radeon kernel modesetting enabled.
[ 8.492371] radeon 0000:01:05.0: PCI INT A -> Link[LNKC] -> GSI 10 (level, low) -> IRQ 10
[ 8.495381] radeon 0000:01:05.0: setting latency timer to 64
[ 8.496445] [drm] radeon: Initializing kernel modesetting.
[ 8.499493] [drm] register mmio base: 0xFE9F0000
[ 8.502432] [drm] register mmio size: 65536
[ 8.505857] ATOM BIOS: 113
[ 8.508694] [drm] Clocks initialized !
[ 8.511485] [drm] 3 Power State(s)
[ 8.514250] [drm] State 0 Default (default)
[ 8.517008] [drm] 1 Clock Mode(s)
[ 8.519743] [drm] 0 engine: 300000
[ 8.522456] [drm] State 1 Performance
[ 8.525133] [drm] 1 Clock Mode(s)
[ 8.527734] [drm] 0 engine: 200000
[ 8.530262] [drm] State 2 Default
[ 8.532758] [drm] 1 Clock Mode(s)
[ 8.535239] [drm] 0 engine: 500000
[ 8.537706] [drm] radeon: power management initialized
[ 8.540196] radeon 0000:01:05.0: VRAM: 128M 0xC0000000 - 0xC7FFFFFF (128M used)
[ 8.542724] radeon 0000:01:05.0: GTT: 512M 0xA0000000 - 0xBFFFFFFF
[ 8.545807] [drm] Detected VRAM RAM=128M, BAR=128M
[ 8.546508] [drm] RAM width 32bits DDR
[ 8.550749] [TTM] Zone kernel: Available graphics memory: 1029488 kiB.
[ 8.551439] [drm] radeon: 128M of VRAM memory ready
[ 8.552113] [drm] radeon: 512M of GTT memory ready.
[ 8.552785] [drm] radeon: irq initialized.
[ 8.553447] [drm] GART: num cpu pages 131072, num gpu pages 131072
[ 8.554599] [drm] Loading RS780 Microcode
[ 8.555252] platform radeon_cp.0: firmware: requesting radeon/RS780_pfp.bin
[ 8.615296] platform radeon_cp.0: firmware: requesting radeon/RS780_me.bin
[ 8.643581] platform radeon_cp.0: firmware: requesting radeon/R600_rlc.bin
[ 8.688448] [drm] ring test succeeded in 1 usecs
[ 8.689138] [drm] radeon: ib pool ready.
[ 14.190109] radeon 0000:01:05.0: GPU lockup CP stall for more than 1000msec
[ 14.190738] ------------[ cut here ]------------
[ 14.191396] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:234 radeon_fence_wait+0x35d/0x3c0 [radeon]()
[ 14.192044] Hardware name: System Product Name
[ 14.192671] GPU lockup (waiting for 0x00000001 last fence id 0x00000000)
[ 14.193314] Modules linked in: snd_hda_intel(+) radeon(+) snd_emu10k1 snd_rawmidi ttm snd_hda_codec snd_ac97_codec ac97_bus drm_kms_helper snd_pcm snd_seq_device drm snd_util_mem amd64_edac_mod emu10k1_gp snd_timer i2c_algo_bit snd_hwdep firewire_ohci snd kobil_sct edac_core firewire_core shpchp gameport crc_itu_t asus_atk0110 pcspkr soundcore snd_page_alloc button usbserial k10temp edac_mce_amd i2c_piix4 pci_hotplug sr_mod sg cdrom sd_mod ahci fan processor pata_atiixp libata scsi_mod thermal thermal_sys
[ 14.196212] Pid: 691, comm: work_for_cpu Not tainted 2.6.33 #3
[ 14.196888] Call Trace:
[ 14.197575] [<ffffffff810466a8>] warn_slowpath_common+0x78/0xb0
[ 14.198260] [<ffffffff8104673c>] warn_slowpath_fmt+0x3c/0x40
[ 14.198937] [<ffffffffa033a5fd>] radeon_fence_wait+0x35d/0x3c0 [radeon]
[ 14.199616] [<ffffffff81064070>] ? autoremove_wake_function+0x0/0x40
[ 14.200299] [<ffffffffa0375569>] r600_ib_test+0x189/0x300 [radeon]
[ 14.200961] [<ffffffffa037d6e0>] r600_init+0x2e0/0x360 [radeon]
[ 14.201627] [<ffffffffa03293ad>] radeon_device_init+0x29d/0x370 [radeon]
[ 14.202297] [<ffffffffa032a1ee>] radeon_driver_load_kms+0x9e/0x1d0 [radeon]
[ 14.202945] [<ffffffffa020140e>] drm_get_dev+0x34e/0x560 [drm]
[ 14.203593] [<ffffffff8103c86d>] ? default_wake_function+0xd/0x10
[ 14.204227] [<ffffffff8105f7f0>] ? do_work_for_cpu+0x0/0x30
[ 14.204851] [<ffffffffa0397012>] radeon_pci_probe+0x10/0x270 [radeon]
[ 14.205479] [<ffffffff81225d72>] local_pci_probe+0x12/0x20
[ 14.206100] [<ffffffff8105f803>] do_work_for_cpu+0x13/0x30
[ 14.206704] [<ffffffff81063b7e>] kthread+0x8e/0xa0
[ 14.207314] [<ffffffff81003b94>] kernel_thread_helper+0x4/0x10
[ 14.207902] [<ffffffff81063af0>] ? kthread+0x0/0xa0
[ 14.208503] [<ffffffff81003b90>] ? kernel_thread_helper+0x0/0x10
[ 14.209100] ---[ end trace 48fab13bc7a5b259 ]---
[ 14.209681] [drm] Disabling audio support
[ 14.209708] radeon 0000:01:05.0: GPU softreset
[ 14.210855] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA0003030
[ 14.211443] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
[ 14.212028] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20002040
[ 14.339731] radeon 0000:01:05.0: Wait for MC idle timedout !
[ 14.340318] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
[ 14.355896] radeon 0000:01:05.0: R_008020_GRBM_SOFT_RESET=0x00000001
[ 14.372490] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA0003030
[ 14.373084] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
[ 14.373665] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x2000B040
[ 14.375260] radeon 0000:01:05.0: GPU reset succeed
[ 14.392243] [drm] Clocks initialized !
[ 14.519942] radeon 0000:01:05.0: Wait for MC idle timedout !
[ 14.647642] radeon 0000:01:05.0: Wait for MC idle timedout !
[ 14.811733] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8508)=0xCAFEDEAD)
[ 14.812345] [drm:r600_resume] *ERROR* r600 startup failed on resume
[ 14.812949] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 14.813336] IP: [<ffffffffa02668a4>] drm_helper_resume_force_mode+0x34/0x240 [drm_kms_helper]
[ 14.813336] PGD 37d25067 PUD 37dd7067 PMD 0
[ 14.813336] Oops: 0000 [#1] SMP
[ 14.813336] last sysfs file: /sys/module/snd_hda_intel/initstate
[ 14.813336] CPU 0
[ 14.813336] Pid: 691, comm: work_for_cpu Tainted: G W 2.6.33 #3 M4A785TD-V EVO/System Product Name
[ 14.813336] RIP: 0010:[<ffffffffa02668a4>] [<ffffffffa02668a4>] drm_helper_resume_force_mode+0x34/0x240 [drm_kms_helper]
[ 14.813336] RSP: 0018:ffff88007e7cdc40 EFLAGS: 00010293
[ 14.813336] RAX: 0000000000000020 RBX: fffffffffffffff8 RCX: ffffc90011861740
[ 14.813336] RDX: 0000000000001740 RSI: 00000000411a0015 RDI: ffff88007e7eb800
[ 14.813336] RBP: ffff88007e7cdc70 R08: 0000000000001724 R09: 0000000000000000
[ 14.813336] R10: 000000000000028d R11: 0000000000000000 R12: ffff88007e7ebca0
[ 14.813336] R13: ffff88007e7eb800 R14: ffff88007e7ebcb8 R15: ffff88007ed8e930
[ 14.813336] FS: 00007f42af12d790(0000) GS:ffff880001c00000(0000) knlGS:0000000000000000
[ 14.813336] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 14.813336] CR2: 0000000000000000 CR3: 0000000037b64000 CR4: 00000000000006f0
[ 14.813336] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 14.813336] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 14.813336] Process work_for_cpu (pid: 691, threadinfo ffff88007e7cc000, task ffff88007e8f2540)
[ 14.813336] Stack:
[ 14.813336] 0000000000000007 ffff88007ed8e000 0000000000000000 ffff88007f4fc340
[ 14.813336] <0> 0000000000000000 ffff88007ed8e930 ffff88007e7cdca0 ffffffffa0328dee
[ 14.813336] <0> ffff88007e7cdca0 ffff88007e02fbc0 ffff88007ed8e000 ffff88007e7cdcf0
[ 14.813336] Call Trace:
[ 14.813336] [<ffffffffa0328dee>] radeon_gpu_reset+0xae/0xb0 [radeon]
[ 14.813336] [<ffffffffa033a62f>] radeon_fence_wait+0x38f/0x3c0 [radeon]
[ 14.813336] [<ffffffff81064070>] ? autoremove_wake_function+0x0/0x40
[ 14.813336] [<ffffffffa0375569>] r600_ib_test+0x189/0x300 [radeon]
[ 14.813336] [<ffffffffa037d6e0>] r600_init+0x2e0/0x360 [radeon]
[ 14.813336] [<ffffffffa03293ad>] radeon_device_init+0x29d/0x370 [radeon]
[ 14.813336] [<ffffffffa032a1ee>] radeon_driver_load_kms+0x9e/0x1d0 [radeon]
[ 14.813336] [<ffffffffa020140e>] drm_get_dev+0x34e/0x560 [drm]
[ 14.813336] [<ffffffff8103c86d>] ? default_wake_function+0xd/0x10
[ 14.813336] [<ffffffff8105f7f0>] ? do_work_for_cpu+0x0/0x30
[ 14.813336] [<ffffffffa0397012>] radeon_pci_probe+0x10/0x270 [radeon]
[ 14.813336] [<ffffffff81225d72>] local_pci_probe+0x12/0x20
[ 14.813336] [<ffffffff8105f803>] do_work_for_cpu+0x13/0x30
[ 14.813336] [<ffffffff81063b7e>] kthread+0x8e/0xa0
[ 14.813336] [<ffffffff81003b94>] kernel_thread_helper+0x4/0x10
[ 14.813336] [<ffffffff81063af0>] ? kthread+0x0/0xa0
[ 14.813336] [<ffffffff81003b90>] ? kernel_thread_helper+0x0/0x10
[ 14.813336] Code: 8d b7 b8 04 00 00 41 55 49 89 fd 41 54 4c 8d a7 a0 04 00 00 53 48 83 ec 08 48 8b 9f b8 04 00 00 48 83 eb 08 eb 05 90 48 8d 58 f8 <48> 8b 43 08 48 8d 53 08 49 39 d6 0f 18 08 0f 84 b8 01 00 00 80
[ 14.813336] RIP [<ffffffffa02668a4>] drm_helper_resume_force_mode+0x34/0x240 [drm_kms_helper]
[ 14.813336] RSP <ffff88007e7cdc40>
[ 14.813336] CR2: 0000000000000000
[ 14.840412] ---[ end trace 48fab13bc7a5b25a ]---

OK, it turned out that the oopses were PM-related. When started with radeon.{dynpm,dynclks}=0, everything works fine! Unfortunately, I cannot test the GPU reset patches on their own, as they do not apply to 2.6.33. Jérôme, could you please supply something relative to 2.6.33? Thanks!

Created attachment 34022: backported patch

This bug report looks like a soliloquy.
Anyway, I backported "drm/radeon/kms: fence cleanup + more reliable GPU lockup detection V4" to 2.6.33 myself, and it fixes this problem. Can this be forwarded to upstream->stable?

Is it possible that bug 32662 is related to this one?
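
For anyone comparing their own dmesg output against the log in this report, here is a minimal Python sketch that pulls out the lockup signatures quoted above: the fence ids from the "GPU lockup (waiting for ... last fence id ...)" line, the status register dumps printed around the soft reset, and whether the ring test failed. The names `summarize`, `FENCE_RE`, and `STATUS_RE` are made up for illustration and are not part of the radeon driver or any existing tool. The regular expressions assume the message formats seen in this log, which may differ on other kernel versions.

```python
import re

# Sample lines copied from the dmesg output in this report.
SAMPLE = """\
[ 14.190109] radeon 0000:01:05.0: GPU lockup CP stall for more than 1000msec
[ 14.192671] GPU lockup (waiting for 0x00000001 last fence id 0x00000000)
[ 14.210855] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA0003030
[ 14.211443] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
[ 14.212028] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x20002040
[ 14.372490] radeon 0000:01:05.0: R_008010_GRBM_STATUS=0xA0003030
[ 14.373084] radeon 0000:01:05.0: R_008014_GRBM_STATUS2=0x00000003
[ 14.373665] radeon 0000:01:05.0: R_000E50_SRBM_STATUS=0x2000B040
[ 14.811733] [drm:r600_ring_test] *ERROR* radeon: ring test failed (scratch(0x8508)=0xCAFEDEAD)
"""

# Assumed formats, taken from the lines above (hypothetical helper patterns).
FENCE_RE = re.compile(
    r"GPU lockup \(waiting for (0x[0-9A-Fa-f]+) last fence id (0x[0-9A-Fa-f]+)\)"
)
STATUS_RE = re.compile(r"(R_\w+_STATUS\d*)=(0x[0-9A-Fa-f]+)")


def summarize(log_text):
    """Collect fence ids and status register dumps from a dmesg excerpt."""
    # (waited-for fence id, last signalled fence id) pairs
    fences = [(int(w, 16), int(l, 16)) for w, l in FENCE_RE.findall(log_text)]

    # Every value each status register was dumped with, in log order
    status = {}
    for name, value in STATUS_RE.findall(log_text):
        status.setdefault(name, []).append(int(value, 16))

    # Registers whose dumped value differs between dumps
    changed = {n: v for n, v in status.items() if len(set(v)) > 1}

    return {
        "fences": fences,
        "status": status,
        "changed": changed,
        "ring_test_failed": "ring test failed" in log_text,
    }


if __name__ == "__main__":
    report = summarize(SAMPLE)
    for waited, last in report["fences"]:
        print(f"waiting for fence {waited:#x}, last signalled {last:#x}")
    for name, values in report["changed"].items():
        print(name, "changed:", [f"{v:#010x}" for v in values])
    print("ring test failed:", report["ring_test_failed"])
```

On the excerpt above, only R_000E50_SRBM_STATUS differs between the two dumps; the two GRBM status registers read the same before and after the soft reset. The script only reports that difference and makes no attempt to interpret the register bits.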