Summary: | [bisected] Kernel 3.5.0 breaks KMS on Radeon RV250 | ||
---|---|---|---|
Product: | DRI | Reporter: | Andrea <mariofutire> |
Component: | DRM/Radeon | Assignee: | Christian König <ckoenig.leichtzumerken> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | major | ||
Priority: | medium | CC: | skitching |
Version: | unspecified | ||
Hardware: | x86 (IA32) | ||
OS: | Linux (All) | ||
See Also: |
https://bugs.freedesktop.org/show_bug.cgi?id=51344 https://bugzilla.redhat.com/show_bug.cgi?id=845639 https://bugs.freedesktop.org/show_bug.cgi?id=54662 |
||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Created attachment 66188 [details]
Other example of bad rendering
Does this kernel patch help?
http://people.freedesktop.org/~glisse/0001-drm-radeon-extra-type-safe-for-fence-emission.patch

Also, can you test whether booting with radeon.no_wb=1 fixes the issue?

(In reply to comment #3)
> Also can you test if booting with radeon.no_wb=1 fix the issue ?
This did not make any difference (tested on v3.6-rc3, where the problem still exists).

(In reply to comment #2)
> Does kernel patch :
>
> http://people.freedesktop.org/~glisse/0001-drm-radeon-extra-type-safe-for-fence-emission.patch
>
> Helps ?
No difference.

I also had graphics corruption with a Radeon Mobility X1600 (RV350), and bisected to exactly the same two patches:
bb63556 -- hangs on start of Plymouth
3b7a2b2 -- plymouth works again, but graphics corrupted.
The symptoms were somewhat different from those described here: I got shimmery 70s paisley patterns rather than "black rectangles".
The problem continues up to 3.5.3. However 3.6.0-rc4 fixes the issue - graphics appear to work fine again.

(In reply to comment #6)
> The problem continues up to 3.5.3.
> However 3.6.0-rc4 fixes the issue - graphics appear to work fine again.
Can you bisect to see what commit fixed the issue?

(In reply to comment #6)
> I also had graphics corruption with a Radeon Mobility X1600 (RV350), and
> bisected to exactly the same two patches:
> bb63556 -- hangs on start of Plymouth
> 3b7a2b2 -- plymouth works again, but graphics corrupted.
>
> The symptoms were somewhat different than described here: I got shimmery 70s
> paisley patterns rather than "black rectangles".
>
>
> The problem continues up to 3.5.3.
> However 3.6.0-rc4 fixes the issue - graphics appear to work fine again.
Not here. I've just tried 3.6-rc4 and I get the same corruptions.

Sorry, have to take back that comment about 3.6-rc4+ working; I'm now getting the "black screen" problem consistently.
I was definitely running the right kernel ("uname -a" was reporting 3.6-rc4+) and can only think that I accidentally fat-fingered the keyboard and selected the grub "recovery" option (ie with "nomodeset").
In short: 3.6-rc4+ just boots to a totally black screen for me, due to something merged in the 3.6-rc series. I've bisected this, and raised a separate bug (54662) for it. Interestingly, that commit is *also* about "radeon fence" handling.
I presume that this bug (54129) is still also present and lurking underneath the black screen - but obviously I can't test that.

Does X load ok if you disable acceleration:
Option "NoAccel" "TRUE"
in the device section of your xorg.conf?

Created attachment 66876 [details] [review]
Possible fix
Please try the attached patch.
Also please supply the output of "sudo cat /sys/kernel/debug/dri/0/radeon_fence_info" with and without this patch.
Thx,
Christian.

> Please try the attached patch.
>
> Also please supply the output of "sudo cat
> /sys/kernel/debug/dri/0/radeon_fence_info" with and without this patch.
>
Ok, good news - the patch resolves both this bug and #54662.
* radeon_fence_info output from standard ubuntu kernel (3.2.0-29):
Last signaled fence 0x000037E7
* radeon_fence_info output from version 876dc9f3^ (ie last version showing the "corrupted graphics" output, before I hit the patch that just makes the screen go black):
--- ring 0 ---
Last signaled fence 0x00000001000000cd
Last emitted 0x00000000000000cd
* radeon_fence_info output from current head version (3.6.0-rc5+) with patch "make 64bit fences more robust" applied:
--- ring 0 ---
Last signaled fence 0x0000000100000cea
Last emitted 0x0000000100000cea
Note that the patch does not apply to 3.5.3, nor 876dc9f3^ so I didn't test it against anything but current master head.
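
For anyone comparing such dumps by hand, a minimal Python sketch of the check is given here. It assumes the file layout is exactly the "--- ring N ---" / "Last signaled fence" / "Last emitted" lines pasted above; the function names are made up for illustration and are not part of any radeon tooling. It flags a ring whose signaled sequence number is ahead of its emitted one, which is the odd state visible in the 876dc9f3^ output.

```python
import re


def parse_fence_info(text):
    """Parse radeon_fence_info-style text into {ring: {"signaled": int, "emitted": int}}.

    Assumes the layout pasted in this bug; lines outside a ring section are ignored.
    """
    rings = {}
    ring = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"--- ring (\d+) ---", line)
        if m:
            ring = int(m.group(1))
            rings[ring] = {}
            continue
        if ring is None:
            continue
        m = re.match(r"Last signaled fence\s+(0x[0-9a-fA-F]+)", line)
        if m:
            rings[ring]["signaled"] = int(m.group(1), 16)
            continue
        m = re.match(r"Last emitted\s+(0x[0-9a-fA-F]+)", line)
        if m:
            rings[ring]["emitted"] = int(m.group(1), 16)
    return rings


def inconsistent_rings(rings):
    """Return the rings whose signaled sequence is ahead of the emitted one."""
    return [
        ring
        for ring, seqs in sorted(rings.items())
        if "signaled" in seqs and "emitted" in seqs
        and seqs["signaled"] > seqs["emitted"]
    ]


# Values copied from the two dumps above.
BEFORE_PATCH = """--- ring 0 ---
Last signaled fence 0x00000001000000cd
Last emitted 0x00000000000000cd
"""
AFTER_PATCH = """--- ring 0 ---
Last signaled fence 0x0000000100000cea
Last emitted 0x0000000100000cea
"""

print(inconsistent_rings(parse_fence_info(BEFORE_PATCH)))
print(inconsistent_rings(parse_fence_info(AFTER_PATCH)))
```

The 3.2.0-29 dump has no ring header, so this sketch skips it rather than guessing which ring it belongs to.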
Alex: sorry, but Ubuntu doesn't usually have an xorg.conf file anymore AFAIK. I tried to generate one with "sudo Xorg -configure" but that just reported "Fatal server error: Server is already active", and I'm not sure how else to generate a base xorg.conf file to then modify.
(In reply to comment #12)
> Ok, good news - the patch resolves both this bug and #54662.
Having a workaround for the problem doesn't explain why the heck the counter is going backwards!
Either I'm missing something important in the algorithm or gcc is strangely shuffling the code around; maybe we should add a read memory barrier in radeon_fence_read.
Just in case: You're not working on a time machine or encountered a temporal anomaly recently?
Christian.

>
> Having a workaround for the problem doesn't explain why the heck the counter is
> going backwards!
>
> Just in case: You're not working on a time machine or encountered a temporal
> anomaly recently?
No temporal problems here - breakfast, lunch, dinner still occurring in the regular order :-).
Isn't the problem simply that the top 32 bits of the emitted counter are being discarded on this 32-bit machine, ie that "signalled" is 64-bits, but pre-patch "emitted" was having its upper 32-bits cleared?
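
To put numbers on that question, here is a small Python sketch of the arithmetic only. It is not the radeon driver code: the helper names and the idea of combining a 32-bit value with the upper half of a stored 64-bit counter are assumptions made for illustration. It shows that the two values in the pre-patch dump are consistent with one counter carrying a 1 in its upper 32 bits while the other has had its upper 32 bits cleared.

```python
LOW_MASK = 0x00000000FFFFFFFF
HIGH_MASK = 0xFFFFFFFF00000000


def truncate_to_32(seq64):
    """Drop the upper 32 bits of a 64-bit sequence number."""
    return seq64 & LOW_MASK


def extend_to_64(seq32, upper_source):
    """Combine a 32-bit value with the upper 32 bits of another 64-bit counter.

    Hypothetical helper: whatever sits in the upper half of upper_source
    ends up in the result, whether or not it is meaningful.
    """
    return (upper_source & HIGH_MASK) | (seq32 & LOW_MASK)


# Numbers taken from the 876dc9f3^ dump above.
signaled = extend_to_64(0xcd, 0x0000000100000000)  # upper half holds a stray 1
emitted = truncate_to_32(0x00000001000000cd)       # upper half cleared

print(hex(signaled), hex(emitted), signaled > emitted)
```

With these inputs the "signaled" value compares greater than the "emitted" one even though the low 32 bits are identical, which is the inconsistency the dump shows. Whether the stray upper bits come from truncation or from a bad initial value is exactly what the thread is trying to establish.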
http://people.freedesktop.org/~glisse/0001-debug-fence-emission-reception.patch
Could you please boot with the attached patch and without the fixing patch? Just boot in runlevel 3 so you only have plymouth. Then save dmesg (dmesg > fencedebug.txt) and attach it to the bug. It will help to understand what's going on.

Simon, BTW, you can make a minimal /etc/X11/xorg.conf like this:
Section "Device"
    Identifier "my-radeon-card"
    Option "NoAccel" "TRUE"
EndSection

Ok, results of testing the 0001-debug-fence patch are as follows. Kernel version built is the master branch of Linus' tree, commit 55d512e2 (3.6.0-rc5) plus *only* the debug patch.

== test 1
Booting with
* /usr/share/X11/xorg.conf.d/50-mydevice.conf setting NoAccel to TRUE
* grub kernel commandline of "root=UUID=... ro quiet splash $vt_handoff"
resulted in a working graphical system and the full dmesg output is attached as file "dmesg-debug-noaccel.txt". However the important bits appear to be:
[ 2.578335] [drm] rfence(R) 0x7557effd
[ 2.597213] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000010000000 and cpu addr 0xffca8000
[ 2.682994] [drm] efence 0x00000001
[ 2.976116] [drm] rfence(M) 0x00000001
[ 16.799294] [drm] efence 0x00000002
Note that Ubuntu runlevel 2 by default boots to graphics mode, and levels 3..5 are identical to level 2. See: http://www.debianadmin.com/debian-and-ubuntu-linux-run-levels.html
Note also that there were no further "[drm]" messages in dmesg even after using the system for a few minutes.

== test 2
Booting *without* the xorg.conf.d/50-mydevice.conf file (ie *without* overriding NoAccel) and with the above kernel commandline resulted in a black screen.

== test 3
Booting *without* the xorg.conf.d/50-mydevice.conf file (ie *without* overriding NoAccel) and with "3" appended to the kernel commandline also resulted in a black screen.

== test 4
Booting with "text" appended to the kernel commandline resulted in plymouth completing and then switching to a working text-mode system. The dmesg output is attached as file "dmesg-debug-text.txt". The important bits are similar to the "noaccel" case.

== test 5
Booting with "nomodeset" resulted in a working graphics system. The dmesg output had no "drm" entries, and did not have any of the added "fence" debug output.

I hope this sheds some light - and thanks for looking into this issue!

Created attachment 67002 [details]
dmesg with debug patch, with xorg conf setting NoAccel=TRUE
Created attachment 67003 [details]
dmesg with debug patch, with "text" appended to kernel commandline
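
For picking the fence lines out of a long dmesg, a short Python sketch follows. It assumes the debug patch prints lines shaped exactly like the "[drm] efence 0x..." and "[drm] rfence(X) 0x..." excerpts quoted in test 1; the thread does not spell out what the (R)/(M) suffix means, so the sketch only carries it along as a tag. Function names are hypothetical.

```python
import re

# Matches e.g. "[ 2.578335] [drm] rfence(R) 0x7557effd" and "[ 2.682994] [drm] efence 0x00000001".
FENCE_LINE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+\[drm\]\s+"
    r"(?P<kind>efence|rfence)(?:\((?P<tag>\w)\))?\s+"
    r"(?P<value>0x[0-9a-fA-F]+)"
)


def parse_fence_debug(dmesg_text):
    """Return a list of (time, kind, tag, value) tuples for the fence debug lines."""
    events = []
    for line in dmesg_text.splitlines():
        m = FENCE_LINE.search(line)
        if m:
            events.append(
                (float(m.group("time")), m.group("kind"), m.group("tag"),
                 int(m.group("value"), 16))
            )
    return events


def reads_without_emit(events):
    """Return values seen in an rfence line before any efence line emitted them."""
    emitted = set()
    unexpected = []
    for _time, kind, _tag, value in events:
        if kind == "efence":
            emitted.add(value)
        elif value not in emitted:
            unexpected.append(value)
    return unexpected


# Excerpt copied from test 1 above.
SAMPLE = """[ 2.578335] [drm] rfence(R) 0x7557effd
[ 2.597213] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000010000000 and cpu addr 0xffca8000
[ 2.682994] [drm] efence 0x00000001
[ 2.976116] [drm] rfence(M) 0x00000001
[ 16.799294] [drm] efence 0x00000002
"""

print([hex(v) for v in reads_without_emit(parse_fence_debug(SAMPLE))])
```

On the test 1 excerpt this singles out the first read, 0x7557effd, which happens before any fence has been emitted.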
Created attachment 67005 [details]
dmesg output with debug patch. normal login into KDE

(In reply to comment #16)
> http://people.freedesktop.org/~glisse/0001-debug-fence-emission-reception.patch
>
> Could you please boot with attached patch and without the fixing patch. Just
> boot in runlevel 3 so you only have plymouth. Then save dmesg, dmesg >
> fencedebug.txt and attach it to the bug. It will help to understand what's
> going on.
Here is my dmesg. It was generated in Fedora 17 after logging into KDE and noticing the usual black artefacts (no runlevel 3).
The patch has been applied to v3.6-rc5 without the other patch at comment 11.
BTW, if I apply the patch at comment 11, everything seems to work properly.

A patch has been merged into Linus' tree for 3.6-rc5+:
commit f492c171a38d77fc13a8998a0721f2da50835224
Author: Christian König <deathsimple@vodafone.de>
Date: Thu Sep 13 10:33:47 2012 +0200
drm/radeon: make 64bit fences more robust v3
...
The intention of this patch is to make fences as robust as they where before introducing 64bit fences. This is necessary because on older systems it looks like the fence value gets corrupted on initialization.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=51344
Should also fix:
https://bugs.freedesktop.org/show_bug.cgi?id=54129
https://bugs.freedesktop.org/show_bug.cgi?id=54662

It does indeed seem to resolve this bug 54129 (and 54662) for me. System boots fine (without messing with NoAccel=TRUE or nomodeset). Suspend/resume also work fine. And dmesg output looks fine.
Thanks Christian/Jerome!

So far so good. Thank you guys.

Any objections to closing this bug now? |
Created attachment 66187 [details]
Screenshot of bad rendering

I run Fedora 17, and since they shipped a 3.5.X kernel I get a lot of artefacts when I log in to KDE. Kernel 3.4.6 works ok.

My hardware is a Thinkpad laptop with a
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon RV250 [Mobility FireGL 9000] (rev 02)
and I load the R200 microcode.

Basically, as soon as I log on to KDE I have a lot of rectangular areas which are left black. They are 100% reproducible, always with the same pattern (at least in the few seconds before I log off again), and they move around when I click or windows are displayed.

If I pass the option radeon.modeset=0 to the kernel (in grub) there are no artefacts, but of course XV support is not there, so this is not really an option as video players struggle a lot.

I managed to bisect the issue to the following commits:
bad ========= 3b7a2b2 drm/radeon: rework fence handling, drop fence list v7
skip ======== bb63556 drm/radeon: convert fence to uint64_t v4
good ======== d6999bc drm/radeon: replace the per ring mutex with a global one

"skip" here means that the kernel does not boot: after the linux penguin logo is displayed on the top left of the screen, nothing else happens, even though I am able to reboot by pressing Ctrl-Alt-Del.
So there are 2 commits that could be responsible.

Please let me know if there is anything I can provide on top of that.