Bug 39282 - radeon HD6790 (barts) card produces black+white horizontal stripes on screen when launching Xorg
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/Radeon
Version: 7.6 (2010.12)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assigned To: xf86-video-ati maintainers
QA Contact: Xorg Project Team
Reported: 2011-07-16 08:23 UTC by Tomasz
Modified: 2012-07-01 03:51 UTC (History)

Attachments
bug report files (150.90 KB, application/octet-stream)
2011-07-16 08:23 UTC, Tomasz
Xorg log for 6.14.99 driver from freedesktop git repo (47.35 KB, text/plain)
2011-07-24 02:02 UTC, Tomasz
a photo of the error condition. (1.56 MB, image/jpeg)
2011-07-24 02:03 UTC, Tomasz
Dmesg from test run on 13/11/2011 (16.02 KB, application/octet-stream)
2011-11-13 01:28 UTC, Tomasz
possible fix (3.32 KB, patch)
2011-11-21 13:59 UTC, Alex Deucher
possible fix (3.33 KB, patch)
2011-11-21 14:07 UTC, Alex Deucher
possible fix (3.58 KB, patch)
2011-11-23 09:39 UTC, Alex Deucher
possible fix (2.86 KB, patch)
2011-12-12 09:33 UTC, Alex Deucher
possible fix (3.45 KB, patch)
2011-12-12 12:51 UTC, Alex Deucher
possible fix (19.88 KB, patch)
2012-03-22 16:41 UTC, Alex Deucher
possible fix (3.11 KB, patch)
2012-05-23 12:10 UTC, Alex Deucher
Fix backend map (1.17 KB, patch)
2012-05-24 12:12 UTC, Jerome Glisse
blank and noisy lines (696.41 KB, image/jpeg)
2012-06-04 14:51 UTC, Goulou
/var/log/messages during the different steps : lockup and backtrace dump later (13.67 KB, text/plain)
2012-06-04 14:52 UTC, Goulou
lspci (2.93 KB, text/plain)
2012-06-04 14:53 UTC, Goulou

Description Tomasz 2011-07-16 08:23:37 UTC
Created attachment 49177 [details]
bug report files

Radeon HD6790 card produces black and white horizontal stripes on screen when launching Xorg

Distribution is Debian Wheezy
xorg 7.6
Radeon driver 6.14.2

Tried kernel 2.6.39-2 (Debian Build), 2.6.39.3 and 3.0.0RC7

On boot, the system seems to load the firmware/microcode and to launch into a high-resolution console.

Test case 1: Run 'Xorg -retro', no Xorg config.
Produces a set of horizontal black and white stripes with colour speckles. A clearly visible X-shaped mouse pointer appears and can be moved on the screen. Xorg log Xorg.0.log.reto attached.


Test case 2: Run startx to start the Xorg system.
Produces a set of horizontal black and white stripes with colour speckles; after a short time, the screen blanks and then refreshes to the same image. A backtrace is left in the Xorg.0.log. Xorg log Xorg.0.log.crash attached.
Comment 1 Tomasz 2011-07-16 08:26:20 UTC
Happy to do more testing if required, but as this is my only workstation, I will need to shuffle hardware to get the offending card into the system.

Tomasz
Comment 2 Tomasz 2011-07-24 02:01:34 UTC
I pulled xf86-video-ati from git on 24/07/2011 AEST, and built it.

Radeon HD5450 (Cedar) works perfectly.
Radeon HD6790 (barts) fails to give any sensible output on screen

Tried multiple monitors and multiple DVI and HDMI cables to check that I do not have a hardware issue. No change in behaviour

Attached a photo of screen to illustrate fault condition
Attached Xorg.0.log_git_driver for reference

Tomasz
Comment 3 Tomasz 2011-07-24 02:02:48 UTC
Created attachment 49462 [details]
Xorg log for 6.14.99 driver from freedesktop git repo
Comment 4 Tomasz 2011-07-24 02:03:54 UTC
Created attachment 49463 [details]
a photo of the error condition.
Comment 5 Alex Deucher 2011-07-25 15:52:53 UTC
does the patch in bug 38754 help?
Comment 6 Tomasz 2011-07-26 01:28:01 UTC
(In reply to comment #5)
> does the patch in bug 38754 help?

Nope. Same behaviour as before.
Comment 7 Tomasz 2011-09-02 04:55:25 UTC
Testing update:
 
As suggested by Dave Airlie, I applied the patch https://patchwork.kernel.org/patch/1120122/ to the 3.1-rc4 kernel tree, built a new kernel and modules, and tested the behavior.

Issue continues as described before.
Comment 8 Tomasz 2011-09-02 05:09:56 UTC
Card lockup trace:

Sep  2 21:36:57 redoubt kernel: [   58.248100] ------------[ cut here ]------------
Sep  2 21:36:57 redoubt kernel: [   58.248114] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x2bc/0x337 [radeon]()
Sep  2 21:36:57 redoubt kernel: [   58.248116] Hardware name: GA-870A-UD3
Sep  2 21:36:57 redoubt kernel: [   58.248117] GPU lockup (waiting for 0x00000010 last fence id 0x0000000D)
Sep  2 21:36:57 redoubt kernel: [   58.248119] Modules linked in: ppdev lp fuse firewire_sbp2 loop snd_hda_codec_hdmi radeon snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep ttm drm_kms_helper snd_pcm drm edac_core i2c_piix4 processor snd_seq parport_pc parport joydev edac_mce_amd snd_timer snd_seq_device snd i2c_algo_bit evdev k10temp serio_raw i2c_core cfbcopyarea power_supply cfbimgblt thermal_sys cfbfillrect hwmon soundcore snd_page_alloc button pcspkr wmi ext3 jbd mbcache usbhid hid sg sr_mod cdrom sd_mod crc_t10dif ata_generic ohci_hcd pata_atiixp ahci libahci firewire_ohci pata_jmicron xhci_hcd firewire_core libata crc_itu_t r8169 mii ehci_hcd floppy scsi_mod usbcore [last unloaded: scsi_wait_scan]
Sep  2 21:36:57 redoubt kernel: [   58.248158] Pid: 1592, comm: Xorg Tainted: G        W   3.1.0-rc4 #1
Sep  2 21:36:57 redoubt kernel: [   58.248159] Call Trace:
Sep  2 21:36:57 redoubt kernel: [   58.248164]  [<ffffffff81045363>] ? warn_slowpath_common+0x78/0x8c
Sep  2 21:36:57 redoubt kernel: [   58.248167]  [<ffffffff8104540f>] ? warn_slowpath_fmt+0x45/0x4a
Sep  2 21:36:57 redoubt kernel: [   58.248177]  [<ffffffffa035148e>] ? radeon_fence_wait+0x2bc/0x337 [radeon]
Sep  2 21:36:57 redoubt kernel: [   58.248181]  [<ffffffff8105eb1f>] ? add_wait_queue+0x3c/0x3c
Sep  2 21:36:57 redoubt kernel: [   58.248186]  [<ffffffffa02852d3>] ? ttm_bo_wait+0xb3/0x171 [ttm]
Sep  2 21:36:57 redoubt kernel: [   58.248190]  [<ffffffffa028618e>] ? ttm_bo_reserve+0x75/0x85 [ttm]
Sep  2 21:36:57 redoubt kernel: [   58.248206]  [<ffffffffa0361920>] ? radeon_bo_wait+0x73/0x9e [radeon]
Sep  2 21:36:57 redoubt kernel: [   58.248216]  [<ffffffffa0361dd8>] ? radeon_gem_wait_idle_ioctl+0x32/0x61 [radeon]
Sep  2 21:36:57 redoubt kernel: [   58.248221]  [<ffffffffa025ff4e>] ? drm_ioctl+0x265/0x334 [drm]
Sep  2 21:36:57 redoubt kernel: [   58.248224]  [<ffffffff8105322f>] ? recalc_sigpending+0x23/0x3c
Sep  2 21:36:57 redoubt kernel: [   58.248234]  [<ffffffffa0361da6>] ? radeon_gem_busy_ioctl+0x7e/0x7e [radeon]
Sep  2 21:36:57 redoubt kernel: [   58.248237]  [<ffffffff8132aa28>] ? sub_preempt_count+0x83/0x94
Sep  2 21:36:57 redoubt kernel: [   58.248240]  [<ffffffff8132801c>] ? _raw_spin_unlock_irq+0x27/0x33
Sep  2 21:36:57 redoubt kernel: [   58.248243]  [<ffffffff8100e339>] ? do_signal+0x51d/0x5f3
Sep  2 21:36:57 redoubt kernel: [   58.248246]  [<ffffffff81105168>] ? do_vfs_ioctl+0x400/0x441
Sep  2 21:36:57 redoubt kernel: [   58.248249]  [<ffffffff8100e6ec>] ? sys_rt_sigreturn+0x19a/0x1ca
Sep  2 21:36:57 redoubt kernel: [   58.248251]  [<ffffffff811051f4>] ? sys_ioctl+0x4b/0x6f
Sep  2 21:36:57 redoubt kernel: [   58.248254]  [<ffffffff8132d4d2>] ? system_call_fastpath+0x16/0x1b
Sep  2 21:36:57 redoubt kernel: [   58.248256] ---[ end trace 4be1ae1601e8388a ]---
Comment 9 Alex Deucher 2011-09-02 07:01:06 UTC
As per:
https://bugzilla.kernel.org/show_bug.cgi?id=42162
Does reverting b03e7495a862b028294f59fc87286d6d78ee7fa1 help?
Comment 10 Tomasz 2011-09-07 04:09:24 UTC
(In reply to comment #9)
> As per:
> https://bugzilla.kernel.org/show_bug.cgi?id=42162
> Does reverting b03e7495a862b028294f59fc87286d6d78ee7fa1 help?

Reverting the commit as requested does not produce any change in behavior.
Comment 11 Florian Evers 2011-10-07 07:17:25 UTC
Hello,

I have the same issue here with my setup! :-(

I have been fiddling with it for some weeks now, but none of my attempts yielded a solution. At the moment, I'm using this hardware and software:

MSI HD6790, Barts-LE
Gentoo Linux, AMD64.
* xorg-server-1.10.4
* libdrm-9999 (live from git)
* mesa-9999 (live from git)
* xf86-video-ati-9999 (live from git)
* git-sources-3.1.0-rc9


The system starts without problems, KMS works, and the resolution of the screen is okay. But when I start "kdm", I get the same error as mentioned in this bug report! Immediately, I see a screen full of garbage with a perfectly painted cursor on it. Then the screen flickers about every 5 seconds, but with no visible change. Switching to a console with CTRL-ALT-F1 is possible.

No crashdump in /var/log/Xorg.0.log, but dmesg contains this crashdump:

[  326.817603] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[  326.817606] ------------[ cut here ]------------
[  326.817614] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x3ae/0x3e0()
[  326.817616] Hardware name:         
[  326.817618] GPU lockup (waiting for 0x00000010 last fence id 0x00000001)
[  326.817622] Pid: 23328, comm: X Not tainted 3.1.0-rc9 #2
[  326.817624] Call Trace:
[  326.817629]  [<ffffffff812fd33e>] ? radeon_fence_wait+0x3ae/0x3e0
[  326.817634]  [<ffffffff8103daf6>] ? warn_slowpath_common+0x76/0xc0
[  326.817638]  [<ffffffff8103dba5>] ? warn_slowpath_fmt+0x45/0x50
[  326.817643]  [<ffffffff812fd33e>] ? radeon_fence_wait+0x3ae/0x3e0
[  326.817647]  [<ffffffff81059d60>] ? abort_exclusive_wait+0xb0/0xb0
[  326.817653]  [<ffffffff812c66dd>] ? ttm_bo_wait+0x10d/0x1c0
[  326.817658]  [<ffffffff813177ff>] ? radeon_gem_wait_idle_ioctl+0x8f/0x110
[  326.817663]  [<ffffffff812afadc>] ? drm_ioctl+0x39c/0x460
[  326.817667]  [<ffffffff8104ba5e>] ? recalc_sigpending+0xe/0x30
[  326.817671]  [<ffffffff81317770>] ? radeon_gem_busy_ioctl+0x140/0x140
[  326.817677]  [<ffffffff8100b9c8>] ? __sanitize_i387_state+0xa8/0x120
[  326.817681]  [<ffffffff8104e618>] ? set_current_blocked+0x38/0x60
[  326.817684]  [<ffffffff81001990>] ? do_signal+0x220/0x790
[  326.817689]  [<ffffffff810f1556>] ? do_vfs_ioctl+0x96/0x500
[  326.817699]  [<ffffffff81002240>] ? sys_rt_sigreturn+0x1e0/0x200
[  326.817701]  [<ffffffff810f1a09>] ? sys_ioctl+0x49/0x80
[  326.817704]  [<ffffffff816a267b>] ? system_call_fastpath+0x16/0x1b
[  326.817705] ---[ end trace 5c70e9f469126299 ]---
[  326.818810] radeon 0000:01:00.0: GPU softreset 
[  326.818812] radeon 0000:01:00.0:   GRBM_STATUS=0xB1403828
[  326.818814] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x28000007
[  326.818816] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x28000007
[  326.818818] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[  326.818829] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[  326.818931] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[  326.818933] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[  326.818934] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[  326.818935] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[  326.819935] radeon 0000:01:00.0: GPU reset succeed
[  326.844365] radeon 0000:01:00.0: WB enabled
[  326.860589] [drm] ring test succeeded in 3 usecs
[  326.860599] [drm] ib test succeeded in 3 usecs
[  332.577146] radeon 0000:01:00.0: GPU lockup CP stall for more than 15776msec
[  332.577150] ------------[ cut here ]------------
[  332.577157] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x3ae/0x3e0()
[  332.577160] Hardware name:         
[  332.577162] GPU lockup (waiting for 0x00000011 last fence id 0x00000010)
[  332.577165] Pid: 23328, comm: X Tainted: G        W   3.1.0-rc9 #2
[  332.577168] Call Trace:
[  332.577173]  [<ffffffff812fd33e>] ? radeon_fence_wait+0x3ae/0x3e0
[  332.577178]  [<ffffffff8103daf6>] ? warn_slowpath_common+0x76/0xc0
[  332.577182]  [<ffffffff8103dba5>] ? warn_slowpath_fmt+0x45/0x50
[  332.577187]  [<ffffffff812fd33e>] ? radeon_fence_wait+0x3ae/0x3e0
[  332.577192]  [<ffffffff815db5a7>] ? unix_stream_recvmsg+0x5f7/0x710
[  332.577196]  [<ffffffff81059d60>] ? abort_exclusive_wait+0xb0/0xb0
[  332.577201]  [<ffffffff81317e00>] ? radeon_ib_get+0x130/0x1f0
[  332.577206]  [<ffffffff81319648>] ? radeon_cs_ioctl+0x98/0x200
[  332.577211]  [<ffffffff812afadc>] ? drm_ioctl+0x39c/0x460
[  332.577216]  [<ffffffff810b3fda>] ? handle_pte_fault+0x8a/0x7b0
[  332.577220]  [<ffffffff813195b0>] ? radeon_cs_finish_pages+0xa0/0xa0
[  332.577226]  [<ffffffff810f1556>] ? do_vfs_ioctl+0x96/0x500
[  332.577230]  [<ffffffff810e051a>] ? vfs_read+0x14a/0x160
[  332.577240]  [<ffffffff810f1a09>] ? sys_ioctl+0x49/0x80
[  332.577243]  [<ffffffff816a267b>] ? system_call_fastpath+0x16/0x1b
[  332.577244] ---[ end trace 5c70e9f46912629a ]---
[  332.578374] radeon 0000:01:00.0: GPU softreset 
[  332.578376] radeon 0000:01:00.0:   GRBM_STATUS=0xB1403828
[  332.578379] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x28000007
[  332.578381] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x28000007
[  332.578383] radeon 0000:01:00.0:   SRBM_STATUS=0x20000AC0
[  332.578408] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[  332.578512] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[  332.578514] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[  332.578515] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[  332.578517] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[  332.579516] radeon 0000:01:00.0: GPU reset succeed
[  332.604238] radeon 0000:01:00.0: WB enabled
[  332.620460] [drm] ring test succeeded in 3 usecs
[  332.620469] [drm] ib test succeeded in 3 usecs
[  348.032992] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[  348.032995] ------------[ cut here ]------------
[  348.033002] WARNING: at drivers/gpu/drm/radeon/radeon_fence.c:267 radeon_fence_wait+0x3ae/0x3e0()
[  348.033004] Hardware name:         
[  348.033006] GPU lockup (waiting for 0x00000013 last fence id 0x00000011)
[  348.033010] Pid: 23328, comm: X Tainted: G        W   3.1.0-rc9 #2
[  348.033012] Call Trace:
[  348.033017]  [<ffffffff812fd33e>] ? radeon_fence_wait+0x3ae/0x3e0
[  348.033022]  [<ffffffff8103daf6>] ? warn_slowpath_common+0x76/0xc0
[  348.033026]  [<ffffffff8103dba5>] ? warn_slowpath_fmt+0x45/0x50
[  348.033031]  [<ffffffff812fd33e>] ? radeon_fence_wait+0x3ae/0x3e0
[  348.033036]  [<ffffffff815db5a7>] ? unix_stream_recvmsg+0x5f7/0x710
[  348.033040]  [<ffffffff81059d60>] ? abort_exclusive_wait+0xb0/0xb0
[  348.033045]  [<ffffffff81317e00>] ? radeon_ib_get+0x130/0x1f0
[  348.033049]  [<ffffffff81319648>] ? radeon_cs_ioctl+0x98/0x200
[  348.033054]  [<ffffffff812afadc>] ? drm_ioctl+0x39c/0x460
[  348.033058]  [<ffffffff813195b0>] ? radeon_cs_finish_pages+0xa0/0xa0
[  348.033063]  [<ffffffff810f1556>] ? do_vfs_ioctl+0x96/0x500
[  348.033067]  [<ffffffff810e051a>] ? vfs_read+0x14a/0x160
[  348.033079]  [<ffffffff810f1a09>] ? sys_ioctl+0x49/0x80
[  348.033082]  [<ffffffff816a267b>] ? system_call_fastpath+0x16/0x1b
[  348.033083] ---[ end trace 5c70e9f46912629b ]---
[  348.034211] radeon 0000:01:00.0: GPU softreset 
[  348.034214] radeon 0000:01:00.0:   GRBM_STATUS=0xB1403828
[  348.034216] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x28000007
[  348.034218] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x28000007
[  348.034220] radeon 0000:01:00.0:   SRBM_STATUS=0x20000AC0
[  348.034245] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[  348.034349] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[  348.034351] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[  348.034352] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[  348.034354] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[  348.035353] radeon 0000:01:00.0: GPU reset succeed
[  348.060071] radeon 0000:01:00.0: WB enabled
[  348.076293] [drm] ring test succeeded in 3 usecs
[  348.076303] [drm] ib test succeeded in 3 usecs

Hope this helps... thank you in advance :-)

Regards,
Florian
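The "GPU lockup" lines in the dmesg output above follow a fixed pattern, so the fence ids can be extracted mechanically when comparing test runs. The following is a minimal Python sketch added for illustration, not something posted by the reporters; the regular expression and the helper name parse_lockups are assumptions, and the sample lines are copied from the log above.

```python
import re

# Matches dmesg lines such as:
# [  326.817618] GPU lockup (waiting for 0x00000010 last fence id 0x00000001)
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+GPU lockup \(waiting for 0x(?P<want>[0-9A-Fa-f]+) "
    r"last fence id 0x(?P<last>[0-9A-Fa-f]+)\)"
)


def parse_lockups(text):
    """Return (timestamp, waiting_fence, last_fence) tuples found in dmesg text."""
    return [
        (float(m.group("ts")), int(m.group("want"), 16), int(m.group("last"), 16))
        for m in LOCKUP_RE.finditer(text)
    ]


# Sample lines copied from the dmesg output in this report.
SAMPLE = """\
[  326.817618] GPU lockup (waiting for 0x00000010 last fence id 0x00000001)
[  332.577162] GPU lockup (waiting for 0x00000011 last fence id 0x00000010)
[  348.033006] GPU lockup (waiting for 0x00000013 last fence id 0x00000011)
"""

for ts, want, last in parse_lockups(SAMPLE):
    print(f"{ts:10.3f}s  waiting={want:#x}  last={last:#x}  behind={want - last}")
```

In this sample, each "last fence id" equals the id that was being waited for at the previous lockup.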
Comment 12 Tomasz 2011-10-31 04:25:04 UTC
Tested Linux 3.1.0 from the git tree on GitHub.

Symptoms persist as per the bug title. Lots of garbage arranged in horizontal stripes with a perfectly formed mouse cursor...

dmesg/kern.log shows

Oct 31 21:54:36 redoubt kernel: [   38.118346] [drm:radeon_dp_get_link_status] *ERROR* displayport link status failed
Oct 31 21:54:36 redoubt kernel: [   38.118350] [drm:radeon_dp_link_train_cr] *ERROR* clock recovery failed
Oct 31 21:54:48 redoubt kernel: [   50.044077] radeon 0000:01:00.0: GPU lockup CP stall for more than 10020msec
Oct 31 21:54:48 redoubt kernel: [   50.044080] GPU lockup (waiting for 0x0000000D last fence id 0x00000001)
Oct 31 21:54:48 redoubt kernel: [   50.045142] radeon 0000:01:00.0: GPU softreset 
Oct 31 21:54:48 redoubt kernel: [   50.045144] radeon 0000:01:00.0:   GRBM_STATUS=0xB1403828
Oct 31 21:54:48 redoubt kernel: [   50.045146] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x28000007
Oct 31 21:54:48 redoubt kernel: [   50.045148] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x28000007
Oct 31 21:54:48 redoubt kernel: [   50.045150] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
Oct 31 21:54:48 redoubt kernel: [   50.045162] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
Oct 31 21:54:48 redoubt kernel: [   50.045264] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
Oct 31 21:54:48 redoubt kernel: [   50.045266] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
Oct 31 21:54:48 redoubt kernel: [   50.045268] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
Oct 31 21:54:48 redoubt kernel: [   50.045270] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
Oct 31 21:54:48 redoubt kernel: [   50.046262] radeon 0000:01:00.0: GPU reset succeed
Oct 31 21:54:48 redoubt kernel: [   50.084416] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Oct 31 21:54:48 redoubt kernel: [   50.084506] radeon 0000:01:00.0: WB enabled
Oct 31 21:54:48 redoubt kernel: [   50.100623] [drm] ring test succeeded in 2 usecs
Oct 31 21:54:48 redoubt kernel: [   50.100637] [drm] ib test succeeded in 3 usecs
Oct 31 21:54:58 redoubt kernel: [   60.708084] radeon 0000:01:00.0: GPU lockup CP stall for more than 10020msec
Oct 31 21:54:58 redoubt kernel: [   60.708088] GPU lockup (waiting for 0x00000010 last fence id 0x0000000D)
Oct 31 21:54:58 redoubt kernel: [   60.709172] radeon 0000:01:00.0: GPU softreset 
Oct 31 21:54:58 redoubt kernel: [   60.709174] radeon 0000:01:00.0:   GRBM_STATUS=0xB1403828
Oct 31 21:54:58 redoubt kernel: [   60.709177] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x28000007
Oct 31 21:54:58 redoubt kernel: [   60.709179] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x28000007
Oct 31 21:54:58 redoubt kernel: [   60.709181] radeon 0000:01:00.0:   SRBM_STATUS=0x20000AC0
Oct 31 21:54:58 redoubt kernel: [   60.709207] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
Oct 31 21:54:58 redoubt kernel: [   60.709311] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
Oct 31 21:54:58 redoubt kernel: [   60.709313] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
Oct 31 21:54:58 redoubt kernel: [   60.709315] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
Oct 31 21:54:58 redoubt kernel: [   60.709318] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
Oct 31 21:54:58 redoubt kernel: [   60.710310] radeon 0000:01:00.0: GPU reset succeed
Oct 31 21:54:59 redoubt kernel: [   60.748618] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Oct 31 21:54:59 redoubt kernel: [   60.748708] radeon 0000:01:00.0: WB enabled
Oct 31 21:54:59 redoubt kernel: [   60.764820] [drm] ring test succeeded in 2 usecs
Oct 31 21:54:59 redoubt kernel: [   60.764832] [drm] ib test succeeded in 3 usecs


--- Xorg.0.log shows:

[   247.589] (**) Option "xkb_rules" "evdev"
[   247.589] (**) Option "xkb_model" "pc105"
[   247.589] (**) Option "xkb_layout" "us"
[   247.589] (II) config/udev: Adding input device PC Speaker (/dev/input/event8)
[   247.589] (II) No input driver/identifier specified (ignoring)
[   253.582] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[   253.582] 
Backtrace:
[   253.583] 0: /usr/bin/X (xorg_backtrace+0x26) [0x7f9c5a79f8f6]
[   253.583] 1: /usr/bin/X (mieqEnqueue+0x191) [0x7f9c5a780201]
[   253.583] 2: /usr/bin/X (0x7f9c5a61b000+0x65224) [0x7f9c5a680224]
[   253.583] 3: /usr/bin/X (xf86PostMotionEventP+0x4a) [0x7f9c5a6bab2a]
[   253.583] 4: /usr/lib/xorg/modules/input/evdev_drv.so (0x7f9c545f2000+0x49ee) [0x7f9c545f69ee]
[   253.583] 5: /usr/bin/X (0x7f9c5a61b000+0x8aca7) [0x7f9c5a6a5ca7]
[   253.583] 6: /usr/bin/X (0x7f9c5a61b000+0xb087e) [0x7f9c5a6cb87e]
[   253.583] 7: /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f9c59943000+0xf020) [0x7f9c59952020]
[   253.583] 8: /lib/x86_64-linux-gnu/libc.so.6 (ioctl+0x7) [0x7f9c587323b7]
[   253.583] 9: /usr/lib/x86_64-linux-gnu/libdrm.so.2 (drmIoctl+0x28) [0x7f9c575727d8]
[   253.583] 10: /usr/lib/x86_64-linux-gnu/libdrm.so.2 (drmCommandWriteRead+0x1c) [0x7f9c57574aec]
[   253.583] 11: /usr/lib/x86_64-linux-gnu/libdrm_radeon.so.1 (0x7f9c56392000+0x1e19) [0x7f9c56393e19]
[   253.583] 12: /usr/lib/x86_64-linux-gnu/libdrm_radeon.so.1 (0x7f9c56392000+0x2034) [0x7f9c56394034]
[   253.583] 13: /usr/lib/xorg/modules/drivers/radeon_drv.so (0x7f9c56598000+0xc49d4) [0x7f9c5665c9d4]
[   253.583] 14: /usr/lib/xorg/modules/libexa.so (0x7f9c5617a000+0x5ce7) [0x7f9c5617fce7]
[   253.583] 15: /usr/lib/xorg/modules/libexa.so (0x7f9c5617a000+0x851a) [0x7f9c5618251a]
[   253.583] 16: /usr/lib/xorg/modules/libexa.so (0x7f9c5617a000+0x5252) [0x7f9c5617f252]
[   253.583] 17: /usr/bin/X (0x7f9c5a61b000+0xd66c0) [0x7f9c5a6f16c0]
[   253.583] 18: /usr/bin/X (ChangeWindowAttributes+0x281) [0x7f9c5a696671]
[   253.583] 19: /usr/bin/X (0x7f9c5a61b000+0x4c4e1) [0x7f9c5a6674e1]
[   253.583] 20: /usr/bin/X (0x7f9c5a61b000+0x51f59) [0x7f9c5a66cf59]
[   253.583] 21: /usr/bin/X (0x7f9c5a61b000+0x411ba) [0x7f9c5a65c1ba]
[   253.583] 22: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xfd) [0x7f9c58682ead]
[   253.583] 23: /usr/bin/X (0x7f9c5a61b000+0x414ad) [0x7f9c5a65c4ad]
[   269.348] (II) AIGLX: Suspending AIGLX clients for VT switch
Comment 13 Tomasz 2011-11-13 01:28:10 UTC
Created attachment 53473 [details]
Dmesg from test run on 13/11/2011
Comment 14 Tomasz 2011-11-13 01:28:44 UTC
Tested latest 3.2-rc1 + git patches and current Debian Wheezy Xorg + drivers. Firmware updated to 0.34 (not sure if it makes any difference). The issue still persists and manifests as very clean-looking horizontal black and white stripes. There is no trace of any crashes/errors in the kernel log or Xorg log, but I do see several soft GPU resets.
Comment 15 Florian Evers 2011-11-18 09:20:00 UTC
Hello,

I still have the same problem here... but now the dmesg crash dump is gone. I'm now using 3.2-rc2 (git-sources) on a Gentoo amd64 system. Same result: if I start X, I get a garbled screen. No crash is visible in Xorg.0.log, only this output in dmesg and the system log:

[   32.343250] radeon 0000:01:00.0: GPU lockup CP stall for more than 10059msec
[   32.343253] GPU lockup (waiting for 0x00000010 last fence id 0x00000001)
[   32.344363] radeon 0000:01:00.0: GPU softreset 
[   32.344365] radeon 0000:01:00.0:   GRBM_STATUS=0xB1403828
[   32.344367] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x28000007
[   32.344369] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x28000007
[   32.344371] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   32.344381] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[   32.344484] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[   32.344486] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[   32.344488] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[   32.344489] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   32.345490] radeon 0000:01:00.0: GPU reset succeed
[   32.370151] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[   32.370244] radeon 0000:01:00.0: WB enabled
[   32.386475] [drm] ring test succeeded in 3 usecs
[   32.386485] [drm] ib test succeeded in 3 usecs
[   32.646386] usb 1-1.5: unlink qh8-0e01/ffff880212383880 start 4 [1/2 us]
[   43.424562] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[   43.424564] GPU lockup (waiting for 0x00000012 last fence id 0x00000010)
[   43.425698] radeon 0000:01:00.0: GPU softreset 
[   43.425700] radeon 0000:01:00.0:   GRBM_STATUS=0xB1403828
[   43.425703] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x28000007
[   43.425705] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x28000007
[   43.425707] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   43.425731] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[   43.425835] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[   43.425837] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[   43.425839] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[   43.425842] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   43.426842] radeon 0000:01:00.0: GPU reset succeed
[   43.451800] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[   43.451891] radeon 0000:01:00.0: WB enabled
[   43.468122] [drm] ring test succeeded in 3 usecs
[   43.468132] [drm] ib test succeeded in 3 usecs

Just an idea: could this bug be related to the fact that these cards have a stripped-down BARTS core, a BARTS-LE? Perhaps the driver tries to do some "forbidden" things ;-)

From lspci:
01:00.0 VGA compatible controller: ATI Technologies Inc Barts LE [AMD Radeon HD 6700 Series]

Please, if I can help by running some tests here or anything else, don't hesitate to ask :-)
Florian
Comment 16 Alex Deucher 2011-11-21 13:59:36 UTC
Created attachment 53756 [details] [review]
possible fix

Does this patch help?
Comment 17 Alex Deucher 2011-11-21 14:07:09 UTC
Created attachment 53757 [details] [review]
possible fix

Try this one instead.
Comment 18 Florian Evers 2011-11-22 00:14:23 UTC
Hi Alex,

Thanks a lot for your patch. Unfortunately, it does not help, but things changed a little bit:

First, the visible output is garbled again, and I still see this error in dmesg, but now with a new debug output line due to your patch:

Nov 22 08:56:21 [kernel] [29815.297445] radeon 0000:01:00.0: GPU lockup CP stall for more than 10038msec
Nov 22 08:56:21 [kernel] [29815.297447] GPU lockup (waiting for 0x00000010 last fence id 0x00000001)
Nov 22 08:56:21 [kernel] [29815.298555] radeon 0000:01:00.0: GPU softreset 
Nov 22 08:56:21 [kernel] [29815.298558] radeon 0000:01:00.0:   GRBM_STATUS=0xF5400828
Nov 22 08:56:21 [kernel] [29815.298560] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xE8000005
Nov 22 08:56:21 [kernel] [29815.298562] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0xE8000001
Nov 22 08:56:21 [kernel] [29815.298563] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
Nov 22 08:56:21 [kernel] [29815.298574] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
Nov 22 08:56:21 [kernel] [29815.298676] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
Nov 22 08:56:21 [kernel] [29815.298678] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
Nov 22 08:56:21 [kernel] [29815.298680] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
Nov 22 08:56:21 [kernel] [29815.298682] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
Nov 22 08:56:21 [kernel] [29815.299682] radeon 0000:01:00.0: GPU reset succeed
Nov 22 08:56:21 [kernel] [29815.324388] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Nov 22 08:56:21 [kernel] [29815.324411] [drm:evergreen_gpu_init] *ERROR* bad backend map, using default
Nov 22 08:56:21 [kernel] [29815.324483] radeon 0000:01:00.0: WB enabled
Nov 22 08:56:21 [kernel] [29815.340713] [drm] ring test succeeded in 3 usecs
Nov 22 08:56:21 [kernel] [29815.340723] [drm] ib test succeeded in 3 usecs

... and so on, over and over again: one entry in dmesg every 15 seconds until I stop the xdm service.

Additionally, this was the first time that I saw a crashdump in /var/log/Xorg.log, but I'm not sure if it is related. I saw this crash only once, and was unable to reproduce it a second time:

[ 29893.817] [mi] EQ overflowing. The server is probably stuck in an infinite loop.
[ 29893.817] 
Backtrace:
[ 29893.817] 0: /usr/bin/X (xorg_backtrace+0x28) [0x4600f8]
[ 29893.817] 1: /usr/bin/X (mieqEnqueue+0x1f4) [0x45a634]
[ 29893.817] 2: /usr/bin/X (xf86PostMotionEventM+0x97) [0x480187]
[ 29893.817] 3: /usr/bin/X (xf86PostMotionEventP+0x3c) [0x48027c]
[ 29893.817] 4: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f8a8e2cf000+0x490a) [0x7f8a8e2d390a]
[ 29893.817] 5: /usr/bin/X (0x400000+0x6da17) [0x46da17]
[ 29893.817] 6: /usr/bin/X (0x400000+0x11cf23) [0x51cf23]
[ 29893.817] 7: /lib64/libpthread.so.0 (0x7f8a93384000+0x10310) [0x7f8a93394310]
[ 29893.817] 8: /lib64/libc.so.6 (ioctl+0x7) [0x7f8a9239c4e7]
[ 29893.817] 9: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x7f8a90940d88]
[ 29893.817] 10: /usr/lib64/libdrm.so.2 (drmCommandWriteRead+0x1c) [0x7f8a9094311c]
[ 29893.817] 11: /usr/lib64/libdrm_radeon.so.1 (0x7f8a90011000+0x20f9) [0x7f8a900130f9]
[ 29893.817] 12: /usr/lib64/libdrm_radeon.so.1 (0x7f8a90011000+0x2144) [0x7f8a90013144]
[ 29893.817] 13: /usr/lib64/xorg/modules/drivers/radeon_drv.so (0x7f8a90218000+0xc8593) [0x7f8a902e0593]
[ 29893.817] 14: /usr/lib64/xorg/modules/libexa.so (0x7f8a8fdf8000+0x6457) [0x7f8a8fdfe457]
[ 29893.817] 15: /usr/lib64/xorg/modules/libexa.so (0x7f8a8fdf8000+0x90a2) [0x7f8a8fe010a2]
[ 29893.817] 16: /usr/lib64/xorg/modules/libexa.so (0x7f8a8fdf8000+0x5792) [0x7f8a8fdfd792]
[ 29893.817] 17: /usr/bin/X (0x400000+0xa5b90) [0x4a5b90]
[ 29893.817] 18: /usr/bin/X (ChangeWindowAttributes+0x2c3) [0x4582f3]
[ 29893.817] 19: /usr/bin/X (0x400000+0x2af88) [0x42af88]
[ 29893.817] 20: /usr/bin/X (0x400000+0x30c51) [0x430c51]
[ 29893.817] 21: /usr/bin/X (0x400000+0x24aee) [0x424aee]
[ 29893.817] 22: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7f8a922f0e9d]
[ 29893.817] 23: /usr/bin/X (0x400000+0x24699) [0x424699]
[ 29903.970] (II) AIGLX: Suspending AIGLX clients for VT switch

Hope it helps. Thank you very much :-)
Florian
Comment 19 Tomasz 2011-11-22 01:21:16 UTC
Hi there. Thanks for your work.

Issue persists.

Unlike Florian, I have not seen any evidence of an X server crash, but here is the output from dmesg (kernel log):

[   59.004074] radeon 0000:01:00.0: GPU lockup CP stall for more than 10036msec
[   59.004077] GPU lockup (waiting for 0x0000000D last fence id 0x00000001)
[   59.005143] radeon 0000:01:00.0: GPU softreset 
[   59.005145] radeon 0000:01:00.0:   GRBM_STATUS=0xF5400828
[   59.005147] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xE8000005
[   59.005149] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0xE8000001
[   59.005151] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   59.005162] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[   59.005265] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[   59.005267] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[   59.005268] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[   59.005270] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   59.006265] radeon 0000:01:00.0: GPU reset succeed
[   59.044414] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[   59.044439] [drm:evergreen_gpu_init] *ERROR* bad backend map, using default
[   59.044510] radeon 0000:01:00.0: WB enabled
[   59.060618] [drm] ring test succeeded in 2 usecs
[   59.060629] [drm] ib test succeeded in 3 usecs
[   69.712099] radeon 0000:01:00.0: GPU lockup CP stall for more than 10020msec
[   69.713631] GPU lockup (waiting for 0x00000013 last fence id 0x0000000D)
[   69.714722] radeon 0000:01:00.0: GPU softreset 
[   69.714725] radeon 0000:01:00.0:   GRBM_STATUS=0xF5400828
[   69.714727] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xE8000005
[   69.714730] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0xE8000001
[   69.714732] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   69.714760] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[   69.714863] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[   69.714865] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[   69.714868] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[   69.714870] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   69.715864] radeon 0000:01:00.0: GPU reset succeed
[   69.812508] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[   69.812533] [drm:evergreen_gpu_init] *ERROR* bad backend map, using default
[   69.814132] radeon 0000:01:00.0: WB enabled
[   69.830241] [drm] ring test succeeded in 2 usecs
[   69.830253] [drm] ib test succeeded in 3 usecs
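
The lockup entries in a log like the one above can be tallied with a short script. This is only a sketch: the helper name `parse_lockups` and both regular expressions are illustrative, written against the message format visible in this dmesg output, and may not match other kernel versions.

```python
import re

# Patterns written against the dmesg lines quoted above (assumed format).
STALL_RE = re.compile(r"GPU lockup CP stall for more than (\d+)msec")
FENCE_RE = re.compile(
    r"GPU lockup \(waiting for (0x[0-9A-Fa-f]+) last fence id (0x[0-9A-Fa-f]+)\)"
)


def parse_lockups(log_text):
    """Return one dict per lockup: stall time in ms, awaited fence, last signalled fence."""
    lockups = []
    stall_ms = None
    for line in log_text.splitlines():
        stall = STALL_RE.search(line)
        if stall:
            # Remember the stall time; the fence line follows it in the log.
            stall_ms = int(stall.group(1))
            continue
        fence = FENCE_RE.search(line)
        if fence:
            lockups.append({
                "stall_ms": stall_ms,
                "waiting_for": int(fence.group(1), 16),
                "last_fence": int(fence.group(2), 16),
            })
            stall_ms = None
    return lockups


# Four lines copied from the log above.
SAMPLE = """\
[   59.004074] radeon 0000:01:00.0: GPU lockup CP stall for more than 10036msec
[   59.004077] GPU lockup (waiting for 0x0000000D last fence id 0x00000001)
[   69.712099] radeon 0000:01:00.0: GPU lockup CP stall for more than 10020msec
[   69.713631] GPU lockup (waiting for 0x00000013 last fence id 0x0000000D)
"""

for entry in parse_lockups(SAMPLE):
    print(entry)
```

In this sample, the last signalled fence of the second lockup (0xD) is the fence the first lockup was waiting for.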
Comment 20 Florian Evers 2011-11-23 09:27:36 UTC
Hi,

The "switch" statement in your patch takes the default branch, resulting in the debug output mentioned above. As a consequence, the backend map is considered incorrect... or unknown, as the appropriate case statement is missing.

So, I modified your debug statement to additionally show the expression of the switch statement. On my system, the output is "5".

Needless to say, there is no "case 0x05:" statement yet :-) Does a "5" make any sense to you?

Regards,
Florian
Comment 21 Alex Deucher 2011-11-23 09:39:31 UTC
Created attachment 53814
possible fix

Try this patch.
Comment 22 Florian Evers 2011-11-23 09:55:36 UTC
Hi Alex,

thanks a lot again! But I'm sorry, bad news...

The debug output is gone now (as expected), but the system behaves the same. Apart from the garbled screen looking a little different now (lots of little squares with different colors instead of stripes), I again see a crash in Xorg.0.log and the above-mentioned lockups in dmesg:

[   32.158017] radeon 0000:01:00.0: GPU lockup CP stall for more than 10019msec
[   32.158019] GPU lockup (waiting for 0x00000010 last fence id 0x00000001)
[   32.159131] radeon 0000:01:00.0: GPU softreset 
[   32.159133] radeon 0000:01:00.0:   GRBM_STATUS=0xF5400828
[   32.159135] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x08000003
[   32.159137] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0xE8000001
[   32.159139] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   32.159149] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[   32.159252] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[   32.159254] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[   32.159255] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[   32.159257] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   32.160258] radeon 0000:01:00.0: GPU reset succeed
[   32.184957] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[   32.185050] radeon 0000:01:00.0: WB enabled
[   32.201281] [drm] ring test succeeded in 3 usecs
[   32.201291] [drm] ib test succeeded in 3 usecs
[   36.923404] radeon 0000:01:00.0: GPU lockup CP stall for more than 14798msec
[   36.923406] GPU lockup (waiting for 0x00000011 last fence id 0x00000010)
[   36.924538] radeon 0000:01:00.0: GPU softreset 
[   36.924541] radeon 0000:01:00.0:   GRBM_STATUS=0xF5400828
[   36.924543] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x08000003
[   36.924545] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0xE8000001
[   36.924547] radeon 0000:01:00.0:   SRBM_STATUS=0x20000AC0
[   36.924572] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[   36.924675] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[   36.924678] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[   36.924680] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[   36.924682] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   36.925682] radeon 0000:01:00.0: GPU reset succeed
[   36.950664] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[   36.950757] radeon 0000:01:00.0: WB enabled
[   36.967010] [drm] ring test succeeded in 3 usecs
[   36.967020] [drm] ib test succeeded in 3 usecs
[   51.879646] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
[   51.879648] GPU lockup (waiting for 0x00000013 last fence id 0x00000011)
[   51.880781] radeon 0000:01:00.0: GPU softreset 
[   51.880783] radeon 0000:01:00.0:   GRBM_STATUS=0xF5501828
[   51.880785] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x28000003
[   51.880788] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0xEC000007
[   51.880790] radeon 0000:01:00.0:   SRBM_STATUS=0x20000AC0
[   51.880816] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[   51.880920] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[   51.880922] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[   51.880924] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[   51.880926] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   51.881927] radeon 0000:01:00.0: GPU reset succeed
[   51.906909] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[   51.907002] radeon 0000:01:00.0: WB enabled
[   51.923233] [drm] ring test succeeded in 3 usecs
[   51.923242] [drm] ib test succeeded in 3 usecs

... same story ...

Regards,
Florian
Comment 23 Alex Deucher 2011-11-23 10:42:03 UTC
Try manually setting gb_backend_map to the following values and see if any of them help:

0x00000000
0x11111111
0x22222222
0x33333333
0x44444444
0x55555555
0x66666666
0x77777777
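
Each of the values above repeats a single hex digit across all eight nibbles of a 32-bit word. A minimal sketch that generates the same list (the helper name `uniform_backend_maps` is hypothetical, not a driver function):

```python
def uniform_backend_maps(max_digit=7):
    """Build 32-bit values whose eight hex digits all equal one digit in 0..max_digit."""
    # Multiplying by 0x11111111 copies a single hex digit into every nibble.
    return [digit * 0x11111111 for digit in range(max_digit + 1)]


for value in uniform_backend_maps():
    print(f"0x{value:08X}")
```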
Comment 24 Florian Evers 2011-11-24 01:25:02 UTC
Hi Alex,

thanks again, and now it seems that we are on the right track!

My test results:

0x00000000
0x22222222
0x44444444
0x66666666
-> All these result in immediate lockups.

0x11111111
0x33333333
0x55555555
0x77777777
-> All these result in a correctly started X server :-D
With any of these settings, kdm started correctly, and I was able to log in (KDM->KDE 4.7.3).

0x33333333:
using this setting, I saw another late lockup in combination with a total system freeze while testing some apps. I have a copy of the lockup message here (before the system hung up), but it looks quite similar to the others in this thread, perhaps with some different hex numbers.

0x11111111 and
0x55555555 and
0x77777777:
-> These are not locking the GPU or crashing the system immediately, but this would require more testing to be sure. Unfortunately, they still show glitches:

0x11111111:
This setting produced lots of glitches regarding transparency. The taskbar was invisible, just the icons were rendered. I would not recommend using it.

0x55555555 and
0x77777777:
No crashes, no lockups, NEARLY perfect. There are some minor glitches with transparency that appear if you activate a context menu (right mouse button). Then you only see the icons and the text, but not the window itself. If you hover the mouse over it (to trigger repainting), then the window appears as well. glxgears works.

Nice. I'm seeing a light at the end of the tunnel :-)
Great... many thanks... now let's finalize this. Currently I'm working with setting 0x55555555... ignoring the glitches, I'm already able to use the system! :-D

Regards,
Florian
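
One pattern in these results: every value that let X start has the low bit set in each hex digit (1, 3, 5, 7), and every value that locked up immediately has it clear (0, 2, 4, 6). The sketch below only restates that observation from the reported outcomes; it makes no claim about why the hardware behaves this way, and the names are illustrative.

```python
# Outcomes as reported in this comment.
RESULTS = {
    0x00000000: "lockup",
    0x22222222: "lockup",
    0x44444444: "lockup",
    0x66666666: "lockup",
    0x11111111: "starts X",
    0x33333333: "starts X",
    0x55555555: "starts X",
    0x77777777: "starts X",
}


def low_bit_set_in_every_nibble(value):
    """True if bit 0 of each of the eight 4-bit fields of a 32-bit value is set."""
    return all((value >> shift) & 1 for shift in range(0, 32, 4))


for value, outcome in sorted(RESULTS.items()):
    print(f"0x{value:08X}", outcome, low_bit_set_in_every_nibble(value))
```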
Comment 25 Florian Evers 2011-11-24 05:20:30 UTC
0x77777777:
I saw a garbled screen with GPU lockup after rotating the screen by 90 degrees (krandr->rotate 90 degrees). I went back to 0x55555555, and here it works flawlessly.

Florian
Comment 26 Florian Evers 2011-11-28 01:38:41 UTC
Hi,

after some days of using the value 0x55555555, I can tell you that I did not see any GPU lockups anymore. Unfortunately, the remaining glitches and artifacts make the system unusable for a productive environment.

* If you "use" the system, after some minutes more and more glitches appear. For example, the taskbar converts to "noise-only". Then, more and more widgets start to contain noise, and you'll have to shutdown X and restart it.

* Screenshots do not work. The created PNG is just black.

* The "RMB-context menu" shows only text and icons, but no surrounding window.

* You see some icons in the taskbar, but not all.

* If you place your mouse cursor over an icon in the taskbar, the icon disappears.

* Shutting down KDE crashes "kwin"... might be a different issue.

The goodies:
* KDE works
* Transparency and the desktop effects are working :-)
* I know that my graphics adapter is not defective :-D

Regards,
Florian
Comment 27 Alex Deucher 2011-11-28 08:49:22 UTC
Can you try 0x77553311?
Comment 28 Florian Evers 2011-11-29 00:29:02 UTC
Hi Alex,

0x77553311 does not create a GPU lockup, but there is no visible difference regarding the glitches I saw with 0x55555555.

* screen rotation works :-)
* no screenshots possible :-(
* didn't use the system long enough to discover elements containing just noise
* resizing some windows (Konqueror, but not Firefox) leaves the background of some elements undrawn... these are just drawn black
* The taskbar issues remain... some icons are still not rendered, but they are definitely there; other icons disappear while hovering, but reappear when the mouse cursor leaves.

Florian
Comment 29 Florian Evers 2011-12-02 00:42:46 UTC
Some additional comments.

I never saw noisy widgets again since I use 0x77553311. But all the other glitches remain.

Then, I see random misbehavior of the whole system if I'm logged into X. For example, if I start an update (the Gentoo way, lots of compiling, lots of IO, lots of output), after some minutes the update process breaks with awkward errors such as segfaults. Then, in the syslog I see other unrelated processes crashed as well. Looks like a thermal problem or bad RAM, but this only happens if X is started AND if I do all the update stuff using X (lots of scrolling involved, perhaps this causes a bug?). Memtest86+ didn't show any errors either, and last night I rebuilt the whole system in a tty, without any of these issues.

To me, this misbehavior and these system crashes seem to be related to X!

In the meantime, I downgraded to kernel 3.1.4, with the same patch applied (0x77553311). But the issues remain: Glitches and system misbehavior if X is used.

Florian
Comment 30 Florian Evers 2011-12-08 05:34:52 UTC
Hi,

for the moment, I had to switch back to fglrx. My system was not usable anymore, because of spurious misbehavior and some really bad crashes (only if I was working with X).

As the aforementioned issues remain, but the GPU lockup of this bug report is "kind of" solved now, what has to happen next? Is this misbehavior still related to the kernel, or more related to some userspace component, for example mesa?

At least, graphics cards equipped with a BARTS LE chipset (all HD 6790) seem not to be supported now by the free driver. Perhaps that should be stated somewhere.

I'm looking forward to helping you with further testing!

Regards,
Florian
Comment 31 Alex Deucher 2011-12-08 06:03:55 UTC
Do you experience the problems with any desktop environments besides kde?
Comment 32 Florian Evers 2011-12-08 06:20:18 UTC
Hi Alex,

I'm currently not using any other DE than KDE. However, I can install GNOME or something similar, for testing purposes, if you think that helps nailing this bug down.

Today, I compiled the new beta2 of KDE 4.8.0. This version shows the same artifacts as KDE 4.7.3, which I used before. Even if I create a new user, the same issues appear again.

If I switch to fglrx, everything is working as expected. Screenshots work, and no glitches appear. And there are no more crashes. So, it's not a hardware/temperature issue, but definitely related to the KMS-based graphics stack.

Florian
Comment 33 Alex Deucher 2011-12-08 06:47:50 UTC
If you could that would be great.  KDE tends to be more problematic on the open source drivers than other desktops.  If another desktop works ok, then the problem may be in the 3D driver rather than the kernel.
Comment 34 Florian Evers 2011-12-08 10:40:41 UTC
Hmm, this is weird. After successfully installing Gnome 3.2, I made these observations:

Now I do not "see" the mouse cursor anymore. The mouse cursor is "just invisible", but it is still there. But it is not rendered anymore, not in kdm, and not in gdm. If I start X via "startx" as root (then twm is used), I see a mouse cursor, but it is corrupted. If I log into KDE/Gnome by using kdm or gdm, and I move the invisible mouse around, I see some animations happening on the screen.

Unfortunately, there are still lots of glitches in KDE as well as in Gnome 3.2. I see lots of artifacts. Even in twm (plain startx) things are not rendered 100% correctly. However, if I look at KDE now, things seem to be worse than a few hours ago before I installed Gnome and before I lost my mouse cursor. Any idea what's going on here?

I'm really confused because I do not see why or when the mouse cursor disappeared. I still do not use an xorg.conf, the kernel is the same, and the kernel command line is the same. I remember that installing Gnome required some changes to the USE flags of my Gentoo-based box, but if I look at the list of installed and reinstalled packages I do not see a candidate for this behavior.

Fiddling with the HWCursor option in an xorg.conf didn't help either.

I'm using the newest mesa, libdrm and xf86-video-ati.

Any ideas?
Florian
Comment 35 Alex Deucher 2011-12-08 11:28:45 UTC
If you are getting rendering errors, it's possible the cursor image was rendered as garbage.  Can you try this patch in addition to the backend_map changes?

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index 3e8054c..4d66a05 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -1832,7 +1832,7 @@ static void evergreen_gpu_init(struct radeon_device *rdev)
                rdev->config.evergreen.sc_earlyz_tile_fifo_size = 0x130;
                break;
        case CHIP_BARTS:
-               rdev->config.evergreen.num_ses = 2;
+               rdev->config.evergreen.num_ses = 1;
                rdev->config.evergreen.max_pipes = 4;
                rdev->config.evergreen.max_tile_pipes = 8;
                rdev->config.evergreen.max_simds = 7;
Comment 36 Florian Evers 2011-12-09 00:43:12 UTC
Hi Alex,

with that additional patch, the kernel doesn't even boot, but stops with a totally black screen just after having been started by grub. I never see any printouts of the kernel... so I can not tell you whether there was any interesting output that I was not able to read.

I tested this with gentoo-sources-3.1.4 and with git-sources-3.2.0-rc4. Both kernels show the same result.

Florian
Comment 37 Florian Evers 2011-12-12 08:22:52 UTC
Hi Alex,

today I searched the web to find some specs regarding the BARTS LE chipset of the HD 6790 cards. I found three references that contain useful information:

http://ht4u.net/reviews/2011/amd_radeon_hd_6790_barts_le_im_test/index2.php

http://www.nordichardware.com/news/71-graphics/42743-exclusive-radeon-hd-6790-based-on-barts-le-with-800-sps.html

http://www.tomshardware.de/Radeon-HD6790-Test-Benchmark-Review,testberichte-240757-2.html

I collected all useful specs and can give you a summary now. For a Radeon HD 6790 with BARTS LE, these specs are given:

Stream processors (Vec5): 160
Shader ALUs: 800 = 160*Vec5
Shader type: Vec5 (Co Issue 1:1:1:1:1)
Capabilities per Shader: MADD
Double Precision: no
Texture Units (TMUs): 40
Raster Operation (ROP): 16
Shader Model Version: 5.0
DirectX 11
Output: 2 Render Backends (instead of 4), with 8 ROPs each

SIMD-Cluster (Streaming Multiprocessor) "SM-Units": 10
per SIMD-Cluster: 16 Vec5 ALUs

Per SIMD-Cluster: 1 Quad-TMU (=4 TMUs), 40 in total


If I look at the patched file evergreen.c, I see that the specs do not match yet. For example, the number of SIMDS is defined as 5 in the source, but the references tell me that the BARTS LE has 10.

Additionally, I tried 3.2-rc5 today, showing the same does-not-boot behavior as before. However, setting num_ses to "1" makes sense in terms of calculating max_backends as num_ses * 2, as the BARTS LE has only 2 backends available. Unfortunately, it doesn't work here with a setting of "1" for num_ses.

I can't assign the other variables in the source code, as I have no idea what "max_pipe", "max_tile_pipes" and other statements such as the famous "backend_map" mean...

Hope you can correct some of the settings? :-)

Regards,
Florian
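
The figures in the summary above are consistent with one another, which a few lines of arithmetic confirm. The variable names are illustrative only; the numbers are the ones quoted from the reviews.

```python
# Figures quoted in the spec summary above.
simd_clusters = 10
vec5_per_simd = 16
lanes_per_vec5 = 5
tmus_per_simd = 4
render_backends = 2
rops_per_backend = 8

# Derived totals, to compare against the quoted ones.
stream_processors = simd_clusters * vec5_per_simd
shader_alus = stream_processors * lanes_per_vec5
tmus = simd_clusters * tmus_per_simd
rops = render_backends * rops_per_backend

print("Vec5 stream processors:", stream_processors)
print("Shader ALUs:", shader_alus)
print("TMUs:", tmus)
print("ROPs:", rops)
```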
Comment 38 Alex Deucher 2011-12-12 09:05:41 UTC
(In reply to comment #37)
> If I look at the patched file evergreen.c, I see that the specs do not match
> yet. For example, the number of SIMDS is defined as 5 in the source, but the
> references tell me that the BARTS LE has 10.
> 

5 SIMDs per SE (Shader Engine) and it has 2 SEs for a total of 10 SIMDs.

> Additionally, I tried 3.2-rc5 today, showing the same does-not-boot behavior as
> before. However, setting num_ses to "1" makes sense in terms of calculating
> max_backends as num_ses * 2, as the BARTS LE has only 2 backends available.
> Unfortunately, it doesn't work here with a setting of "1" for num_ses.
> 
> I can't assign the other variables in the source code, as I have no idea what
> "max_pipe", "max_tile_pipes" and other statements such as the famous
> "backend_map" mean...
> 

max_pipes is the number of quad pipes and backend map associates quad pipes with render backends.
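
To make the pipe-to-backend association concrete, here is a sketch that splits a backend map value into per-pipe entries. It assumes each pipe's entry occupies one 4-bit field, starting at the least significant nibble; that layout is an assumption made for illustration and is not stated in this thread. `decode_backend_map` is a hypothetical helper, not a driver function.

```python
def decode_backend_map(backend_map, num_pipes=8, bits_per_entry=4):
    """Split a backend map into one entry per pipe (assumed layout: 4 bits per pipe, LSB first)."""
    mask = (1 << bits_per_entry) - 1
    return [
        (backend_map >> (pipe * bits_per_entry)) & mask
        for pipe in range(num_pipes)
    ]


# Values that were tried earlier in this bug.
for value in (0x55555555, 0x77553311):
    print(f"0x{value:08X}", decode_backend_map(value))
```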
Comment 39 Alex Deucher 2011-12-12 09:33:14 UTC
Created attachment 54366
possible fix

Does this patch help?
Comment 40 Florian Evers 2011-12-12 12:36:32 UTC
Hi Alex,

thanks for your patch, but as I'm not at work at the moment and only have remote access to my box, I was able to patch the kernel but will not be able to reboot it until tomorrow morning.

However, your patch looks a little bit strange ;-) If I apply it, it duplicates two rows of code resulting in:

                rdev->config.evergreen.max_simds = 7;
                rdev->config.evergreen.max_backends = 4 * rdev->config.evergreen.num_ses;
                rdev->config.evergreen.max_simds = 7;
                rdev->config.evergreen.max_backends = 4 * rdev->config.evergreen.num_ses;

... which is the same thing twice. And, the "old" if statement that selects between a BARTS and a BARTS LE, is gone now. Is this intentional?

Thanks,
Florian
Comment 41 Alex Deucher 2011-12-12 12:41:55 UTC
(In reply to comment #40)
> 
>                 rdev->config.evergreen.max_simds = 7;
>                 rdev->config.evergreen.max_backends = 4 *
> rdev->config.evergreen.num_ses;
>                 rdev->config.evergreen.max_simds = 7;
>                 rdev->config.evergreen.max_backends = 4 *
> rdev->config.evergreen.num_ses;
> 

copy paste error.  It should only be there once.  Shouldn't hurt anything though.

> ... which is the same thing twice. And, the "old" if statement that selects
> between a BARTS and a BARTS LE, is gone now. Is this intentional?

Yes.
Comment 42 Alex Deucher 2011-12-12 12:51:56 UTC
Created attachment 54374
possible fix

Here's a better version.
Comment 43 Florian Evers 2011-12-13 00:43:16 UTC
Hi Alex,

now I was able to apply your patch again and to reboot. Interestingly, kdm started again without any problems, and the mouse cursor was visible again. However, if I log into the system with a newly created test user (KDE), then the mouse cursor immediately vanishes after login (while loading the KDE components). I already switched to some other mouse cursor scheme, but no chance.

All of the aforementioned glitches remain... some icons are not rendered, some appear/disappear only when hovered, some edges of some windows are missing, and no screenshots are possible.

Other window managers / desktop environments do not render correctly yet either.

Thanks,
Florian
Comment 44 cyberbat 2012-03-06 14:29:49 UTC
Hello! 
Any progress here?
I fully confirm this bug on Radeon HD 6790:

lspci | grep -i radeon
02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Barts LE [AMD Radeon HD 6700 Series]

getting

Mar  6 16:54:29 cybernest kernel: [ 8065.765045] radeon 0000:02:00.0: GPU lockup CP stall for more than 10020msec
Mar  6 16:54:29 cybernest kernel: [ 8065.765051] GPU lockup (waiting for 0x0000000F last fence id 0x00000001)
Mar  6 16:54:29 cybernest kernel: [ 8065.766132] radeon 0000:02:00.0: GPU softreset 
Mar  6 16:54:29 cybernest kernel: [ 8065.766134] radeon 0000:02:00.0:   GRBM_STATUS=0xB1403828
Mar  6 16:54:29 cybernest kernel: [ 8065.766135] radeon 0000:02:00.0:   GRBM_STATUS_SE0=0x28000007
Mar  6 16:54:29 cybernest kernel: [ 8065.766136] radeon 0000:02:00.0:   GRBM_STATUS_SE1=0x28000007
Mar  6 16:54:29 cybernest kernel: [ 8065.766138] radeon 0000:02:00.0:   SRBM_STATUS=0x200000C0
Mar  6 16:54:29 cybernest kernel: [ 8065.766147] radeon 0000:02:00.0:   GRBM_SOFT_RESET=0x00007F6B
Mar  6 16:54:29 cybernest kernel: [ 8065.766250] radeon 0000:02:00.0:   GRBM_STATUS=0x00003828
Mar  6 16:54:29 cybernest kernel: [ 8065.766251] radeon 0000:02:00.0:   GRBM_STATUS_SE0=0x00000007
Mar  6 16:54:29 cybernest kernel: [ 8065.766252] radeon 0000:02:00.0:   GRBM_STATUS_SE1=0x00000007
Mar  6 16:54:29 cybernest kernel: [ 8065.766254] radeon 0000:02:00.0:   SRBM_STATUS=0x200000C0
Mar  6 16:54:29 cybernest kernel: [ 8065.767257] radeon 0000:02:00.0: GPU reset succeed
Mar  6 16:54:29 cybernest kernel: [ 8065.791860] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Mar  6 16:54:29 cybernest kernel: [ 8065.791955] radeon 0000:02:00.0: WB enabled
Mar  6 16:54:29 cybernest kernel: [ 8065.808304] [drm] ring test succeeded in 3 usecs
Mar  6 16:54:29 cybernest kernel: [ 8065.808313] [drm] ib test succeeded in 3 usecs

in logs.

I have no skill in coding, but I really wish to help with something if I can. I have tried kernels: 3.1.10, 3.2.1, 3.2.6, 3.2.9, 3.3rc6. Issue remains on them all.

I don't like fglrx. :( Please help.
Comment 45 cyberbat 2012-03-06 15:07:58 UTC
I'm using gentoo linux x86-64. I've just tried development version of drivers from git. Problem still exists.
Comment 46 Florian Evers 2012-03-15 15:01:39 UTC
Hi Alex,

do you have any idea what's wrong regarding the Barts LE chipset? If you have a new patch for us, I'm looking forward to helping you test it :-)

As I'm running Gentoo Linux on my box, it's no problem for me to run any version (even fetched from git) regarding kernel, libdrm, ati-drivers and mesa.

Thank you very much,
Florian
Comment 47 Alex Deucher 2012-03-22 16:41:09 UTC
Created attachment 58893
possible fix

Does this patch help?
Comment 48 Tomasz 2012-03-22 19:24:02 UTC
(In reply to comment #47)
> Created attachment 58893
> possible fix
> 
> Does this patch help?

patch fails to apply cleanly to 3.3.0 and 3.2.12 trees.
patch applies cleanly to today's (as at 2012-03-23 13:15 AEDT) Linus git tree

testing when kernel built.
Comment 49 Tomasz 2012-03-22 19:56:39 UTC
(In reply to comment #48)
> (In reply to comment #47)
> > Created attachment 58893
> > possible fix
> > 
> > Does this patch help?
> 
> patch fails to apply cleanly to 3.3.0 and 3.2.12 trees.
> patch applies cleanly to today's (as at 2012-03-23 13:15 AEDT) Linus git
> tree
> 
> testing when kernel built.

Testing shows the same issue as before: horizontal black and white lines with some artefacting. Can provide Xorg.log and kernel log if needed.
Comment 50 Tomasz 2012-03-24 00:04:30 UTC
(In reply to comment #49)
> (In reply to comment #48)
> > (In reply to comment #47)
> > > Created attachment 58893
> > > possible fix
> > > 
> > > Does this patch help?
> > 
> > patch fails to apply cleanly to 3.3.0 and 3.2.12 trees.
> > patch applies cleanly to today's (as at 2012-03-23 13:15 AEDT) Linus git
> > tree
> > 
> > testing when kernel built.
> 
> Testing shows the same issue as before: horizontal black and white lines with some
> artefacting. Can provide Xorg.log and kernel log if needed.


from kernel logs - not sure if this helps to debug at all:


[   64.738533] GPU lockup (waiting for 0x0000000F last fence id 0x00000001)
[   64.739604] radeon 0000:01:00.0: GPU softreset 
[   64.739606] radeon 0000:01:00.0:   GRBM_STATUS=0xF5400828
[   64.739608] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xE8000001
[   64.739610] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0xE8000001
[   64.739612] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   64.739624] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[   64.739726] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[   64.739728] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[   64.739730] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[   64.739732] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   64.740727] radeon 0000:01:00.0: GPU reset succeed
[   64.778892] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[   64.778986] radeon 0000:01:00.0: WB enabled
[   64.778989] [drm] fence driver on ring 0 use gpu addr 0x40000c00 and cpu addr 0xffff88040699cc00
[   64.795103] [drm] ring test on 0 succeeded in 2 usecs
[   64.795122] [drm] ib test on ring 0 succeeded in 3 usecs
[   75.461595] radeon 0000:01:00.0: GPU lockup CP stall for more than 10024msec
[   75.461599] GPU lockup (waiting for 0x00000012 last fence id 0x0000000F)
[   75.462687] radeon 0000:01:00.0: GPU softreset 
[   75.462690] radeon 0000:01:00.0:   GRBM_STATUS=0xF5401828
[   75.462693] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xE8000003
[   75.462695] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0xE8000003
[   75.462697] radeon 0000:01:00.0:   SRBM_STATUS=0x20000AC0
[   75.462725] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[   75.462829] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[   75.462831] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[   75.462834] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[   75.462836] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   75.463830] radeon 0000:01:00.0: GPU reset succeed
[   75.502158] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[   75.502252] radeon 0000:01:00.0: WB enabled
[   75.502254] [drm] fence driver on ring 0 use gpu addr 0x40000c00 and cpu addr 0xffff88040699cc00
[   75.518370] [drm] ring test on 0 succeeded in 2 usecs
[   75.518389] [drm] ib test on ring 0 succeeded in 3 usecs
[   86.264606] radeon 0000:01:00.0: GPU lockup CP stall for more than 10020msec
[   86.264609] GPU lockup (waiting for 0x00000014 last fence id 0x00000012)
[   86.265698] radeon 0000:01:00.0: GPU softreset 
[   86.265700] radeon 0000:01:00.0:   GRBM_STATUS=0xF5401828
[   86.265703] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0xE8000003
[   86.265705] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0xE8000003
[   86.265707] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   86.265735] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
[   86.265839] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
[   86.265841] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
[   86.265843] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
[   86.265845] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
[   86.266840] radeon 0000:01:00.0: GPU reset succeed
[   86.305154] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[   86.305248] radeon 0000:01:00.0: WB enabled
[   86.305250] [drm] fence driver on ring 0 use gpu addr 0x40000c00 and cpu addr 0xffff88040699cc00
Comment 51 Alex Deucher 2012-05-23 12:10:35 UTC
Created attachment 62030
possible fix

another patch to try.
Comment 52 Jerome Glisse 2012-05-24 12:12:35 UTC
Created attachment 62067
Fix backend map

This patch should definitely fix your issue, it works for me on my HD6790.
Comment 53 Tomasz 2012-05-24 13:37:40 UTC
(In reply to comment #52)
> Created attachment 62067
> Fix backend map
> 
> This patch should definitely fix your issue, it works for me on my HD6790.



Before I start building and testing:

Jerome, is your patch cumulative with Alex's patch, or separate?
Which kernel series is needed for this patch?

Tomasz
Comment 54 Alex Deucher 2012-05-24 14:02:47 UTC
(In reply to comment #53)
> Before I start building and testing:
> 
> Jerome, is your patch cumulative with Alex's patch, or separate?
> Which kernel series is needed for this patch?

The patches are independent.  They are against drm-next, but should apply easily to any recent kernel.
Comment 55 Jerome Glisse 2012-05-24 15:04:24 UTC
I don't think Alex's patch is the right thing to do. At least not according to what fglrx is doing.
Comment 56 Tomasz 2012-05-25 16:40:58 UTC
(In reply to comment #55)

Jerome 

> I don't think Alex's patch is the right thing to do. At least not according to
> what fglrx is doing.

kernel 3.4.0 + your patch  = same issue as currently experienced.

From kernel.log:

May 26 08:42:00 redoubt kernel: [  245.453140] [drm:radeon_dp_get_link_status] *ERROR* displayport link status failed
May 26 08:42:00 redoubt kernel: [  245.453144] [drm:radeon_dp_link_train_cr] *ERROR* clock recovery failed
May 26 08:42:13 redoubt kernel: [  259.029140] radeon 0000:01:00.0: GPU lockup CP stall for more than 10020msec
May 26 08:42:13 redoubt kernel: [  259.029145] GPU lockup (waiting for 0x0000000F last fence id 0x00000001)
May 26 08:42:13 redoubt kernel: [  259.030214] radeon 0000:01:00.0: GPU softreset 
May 26 08:42:13 redoubt kernel: [  259.030216] radeon 0000:01:00.0:   GRBM_STATUS=0xB1403828
May 26 08:42:13 redoubt kernel: [  259.030218] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x28000007
May 26 08:42:13 redoubt kernel: [  259.030220] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x28000007
May 26 08:42:13 redoubt kernel: [  259.030222] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
May 26 08:42:13 redoubt kernel: [  259.030233] radeon 0000:01:00.0:   GRBM_SOFT_RESET=0x00007F6B
May 26 08:42:13 redoubt kernel: [  259.030336] radeon 0000:01:00.0:   GRBM_STATUS=0x00003828
May 26 08:42:13 redoubt kernel: [  259.030338] radeon 0000:01:00.0:   GRBM_STATUS_SE0=0x00000007
May 26 08:42:13 redoubt kernel: [  259.030340] radeon 0000:01:00.0:   GRBM_STATUS_SE1=0x00000007
May 26 08:42:13 redoubt kernel: [  259.030342] radeon 0000:01:00.0:   SRBM_STATUS=0x200000C0
May 26 08:42:13 redoubt kernel: [  259.031336] radeon 0000:01:00.0: GPU reset succeed
May 26 08:42:13 redoubt kernel: [  259.052819] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
May 26 08:42:13 redoubt kernel: [  259.052908] radeon 0000:01:00.0: WB enabled
May 26 08:42:13 redoubt kernel: [  259.052910] [drm] fence driver on ring 0 use gpu addr 0x40000c00 and cpu addr 0xffff8804056e6c00
May 26 08:42:13 redoubt kernel: [  259.069067] [drm] ring test on 0 succeeded in 2 usecs
May 26 08:42:13 redoubt kernel: [  259.069085] [drm] ib test on ring 0 succeeded in 3 usecs
Comment 57 Mengjiao Lu 2012-05-25 22:57:00 UTC
I have the same issue with my HD 6790. 

Thanks very much for your patch, but sadly it does not work for me. 
I patched sources of kernel 3.3.7, but the black and white stripes are still there. 


(In reply to comment #52)
> Created attachment 62067
> Fix backend map
> 
> This patch should definitely fix your issue, it works for me on my HD6790.
Comment 58 Tomasz 2012-05-26 03:14:04 UTC
(In reply to comment #51)
> Created attachment 62030
> possible fix
> 
> another patch to try.

This patch was tried with kernel 3.4.0.

When the system is booting and switches to KMS on the radeon driver, there is no output on the screen (the backlight goes to full power and the screen is blank).

No kernel.log available. The machine locks up hard and needs to be power-cycled.
Comment 59 Alex Deucher 2012-05-31 15:59:52 UTC
Do the kernel patches here help?
http://people.freedesktop.org/~agd5f/backendmap/working/
Comment 60 Florian Evers 2012-06-03 13:33:41 UTC
Hi Alex,

I'm sorry to say that I'm not able to apply patch 0001-drm-radeon-fix-bank-information-in-tiling-config.patch to a vanilla 3.4.0 kernel here:

x66 linux-3.4.0 # pwd
/usr/src/linux-3.4.0
x66 linux-3.4.0 # patch -p 1 < /root/patches-20120503/0001-drm-radeon-fix-bank-information-in-tiling-config.patch 
patching file drivers/gpu/drm/radeon/evergreen.c
patching file drivers/gpu/drm/radeon/ni.c
Hunk #1 FAILED at 866.
1 out of 1 hunk FAILED -- saving rejects to file drivers/gpu/drm/radeon/ni.c.rej
patching file drivers/gpu/drm/radeon/rv770.c
x66 linux-3.4.0 # cat drivers/gpu/drm/radeon/ni.c.rej
--- drivers/gpu/drm/radeon/ni.c
+++ drivers/gpu/drm/radeon/ni.c
@@ -866,9 +866,12 @@
        /* num banks is 8 on all fusion asics. 0 = 4, 1 = 8, 2 = 16 */
        if (rdev->flags & RADEON_IS_IGP)
                rdev->config.cayman.tile_config |= 1 << 4;
-       else
-               rdev->config.cayman.tile_config |=
-                       ((mc_arb_ramcfg & NOOFBANK_MASK) >> NOOFBANK_SHIFT) << 4;
+       else {
+               if ((mc_arb_ramcfg & NOOFBANK_MASK) >> NOOFBANK_SHIFT)
+                       rdev->config.cayman.tile_config |= 1 << 4;
+               else
+                       rdev->config.cayman.tile_config |= 0 << 4;
+       }
        rdev->config.cayman.tile_config |=
                ((gb_addr_config & PIPE_INTERLEAVE_SIZE_MASK) >> PIPE_INTERLEAVE_SIZE_SHIFT) << 8;
        rdev->config.cayman.tile_config |=
x66 linux-3.4.0 #

The other three patches applied without problems.
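
For readers following the rejected hunk above, here is a minimal sketch (not the kernel source) of what the change does to the bank field of tile_config. The helper names bank_bits_old and bank_bits_new are hypothetical, and the NOOFBANK_MASK / NOOFBANK_SHIFT values are assumed for illustration only; the real definitions live in the driver headers.

```c
#include <stdint.h>

/* Assumed values for illustration; the authoritative definitions are in
 * the radeon driver headers and may differ. */
#define NOOFBANK_SHIFT 0
#define NOOFBANK_MASK  0x00000003u

/* Hypothetical helper: bank bits as computed by the removed lines.
 * The raw register field is copied into bits 7:4 of tile_config. */
static uint32_t bank_bits_old(uint32_t mc_arb_ramcfg)
{
    return ((mc_arb_ramcfg & NOOFBANK_MASK) >> NOOFBANK_SHIFT) << 4;
}

/* Hypothetical helper: bank bits as computed by the added lines.
 * Any non-zero field is reported as 1, a zero field stays 0. */
static uint32_t bank_bits_new(uint32_t mc_arb_ramcfg)
{
    if ((mc_arb_ramcfg & NOOFBANK_MASK) >> NOOFBANK_SHIFT)
        return 1u << 4;
    return 0u << 4;
}
```

Under these assumptions, a raw field value of 2 gives 0x20 with the old expression (16 banks in the "0 = 4, 1 = 8, 2 = 16" encoding from the code comment) and 0x10 with the patched branch (8 banks).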

Regards,
Florian
Comment 61 Alex Deucher 2012-06-04 05:27:05 UTC
You can either grab this patch:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=1f73cca799d29df80de3e8f1f1c488485467577a
or just use Linus' git master.
Comment 62 Florian Evers 2012-06-04 09:36:51 UTC
It works! :-D :-D

Unfortunately, I failed to apply patch 0004- to any of my Linux sources (something always went wrong...). But I was "brave" enough to test the 3.5-rc1 git sources directly, which already include your 4 patches, and ... *whew*... it works!

Nice to see my second screen coming back to life, being able to use both for one big KDE desktop... really great!

Thank you very much! I'm really happy now :-)
Florian
Comment 63 Goulou 2012-06-04 14:49:42 UTC
Hi all, 

I'm adding my case to this bug in the hope that it helps you fix it. If I'm wrong and my case is a separate bug, feel free to tell me so and I will file one.
I've seen similar "noisy" lines ever since I bought my card and used it with the open-source driver (about a year and a half ago; sorry for not filing a bug earlier...).
I just tested with kernel-3.5.0-0.rc1.git0.1.fc18.x86_64.rpm, taken from the rawhide packages of Fedora, on a fully updated Fedora 17 box. The bug is still present.
The system booted fine, but X started with the "standard" driver (i.e. no acceleration). From the console, lsmod did not show any radeon module loaded. I then manually loaded radeon.ko from the module directory, and the screen went OFF (immediately).
I switched (Ctrl+Alt+F2) to another console, issued "systemctl isolate multi-user.target", then "systemctl isolate graphical.target" in order to reload X, and the picture became as in the following screenshot.
I could still go back to my console, where I saw that the kernel was printing lots of backtraces. According to the log timestamps, a backtrace is printed every 10 seconds, and I received the same trace hundreds of times (see attachment).
Since I could still use the console, I safely removed the package with rpm, and rebooted.

I'm attaching:
- a screenshot of the bug
- /var/log/messages, starting at the time of "insmod radeon.ko" (262s). You can see what happens when I entered "systemctl isolate multi-user.target" and "isolate graphical.target" (I think it was shortly afterward), and at 302s you can see the "GPU lockup CP stall for more than 10000msec" message.
Then, a few seconds later, comes one of the very many backtrace dumps...
Comment 64 Goulou 2012-06-04 14:51:39 UTC
Created attachment 62543 [details]
blank and noisy lines
Comment 65 Goulou 2012-06-04 14:52:23 UTC
Created attachment 62544 [details]
/var/log/messages during the different steps : lockup and backtrace dump later
Comment 66 Goulou 2012-06-04 14:53:02 UTC
Created attachment 62545 [details]
lspci
Comment 67 Mengjiao Lu 2012-06-05 04:36:07 UTC
(In reply to comment #59)
> Do the kernel patches here help?
> http://people.freedesktop.org/~agd5f/backendmap/working/

Hi Alex,

It works great! 
Thanks very much for your fantastic work!

Mengjiao Lu
Comment 68 Jerome Glisse 2012-06-05 07:47:14 UTC
Goulou, one person, one bug; we will mark it as a duplicate if we think it is one. Closing this one, as the fix should soon show up in the stable kernels.

Goulou, you should not load the radeon module on your own, especially not with Xorg running; this is asking for bad things to happen. A default Fedora installation should work out of the box; make sure you don't have any option such as nomodeset set.
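
As an illustration of the nomodeset check mentioned above, here is a minimal sketch that looks for an option as a whole token in a kernel command line string. The helper name cmdline_has_option is hypothetical and not part of any driver or distribution tool.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true when `opt` appears as a whole
 * whitespace-separated token in the command line string `cmdline`. */
static bool cmdline_has_option(const char *cmdline, const char *opt)
{
    size_t len = strlen(opt);
    const char *p = cmdline;

    if (len == 0)
        return false;

    while (*p) {
        /* Skip the separators in front of the next token. */
        p += strspn(p, " \t\n");
        /* Measure the token and compare it against the option. */
        size_t tok = strcspn(p, " \t\n");
        if (tok == len && strncmp(p, opt, len) == 0)
            return true;
        p += tok;
    }
    return false;
}
```

The string to pass in would typically be the content of /proc/cmdline, for example cmdline_has_option(buffer, "nomodeset") after reading that file into buffer.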
Comment 69 Goulou 2012-06-05 08:05:49 UTC
All right, sorry for the noise.
Comment 70 Tomasz 2012-06-10 07:22:16 UTC
Confirmed fixed with kernel 3.5.0-rc2.
Comment 71 Florian Mickler 2012-07-01 03:51:41 UTC
A patch referencing this bug report has been merged in Linux v3.5-rc1:

commit 95c4b23ec4e2fa5604df229ddf134e31d7b3b378
Author: Jerome Glisse <jglisse@redhat.com>
Date:   Thu May 31 19:00:24 2012 -0400

    drm/radeon: fix HD6790, HD6570 backend programming