Bug 25406

Summary: fonts garbled after resuming from suspend since 6729b508
Product: xorg Reporter: Petar Velkovski <pvelkovski>
Component: Driver/intel Assignee: Chris Wilson <chris>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium CC: arekm, daniel, dark.shadow, kai.kasurinen, pvelkovski
Version: 7.4 (2008.09)   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description                                 Flags
Xorg log                                    none
Screenshot 1 of the desktop                 none
Screenshot taken from an application        none
kernel patch to fix fence flushing          none

Description Petar Velkovski 2009-12-02 14:20:12 UTC
After resuming from Suspend, some of the fonts displayed on the desktop are garbled.

Last intel version to work correctly is xserver-xorg-video-intel_2.9.0+git20091111.dbb68168

I cannot be 100% sure that this bug was introduced in dbb68168, because there were some other rendering problems between dbb68168 and dbb68168 reported and resolved in bug report 25031 at https://bugs.freedesktop.org/show_bug.cgi?id=25031, so I didn't test Suspend/Resume with them.

I am sending a screenshot of the desktop with the font rendering problems after resuming from Suspend. Hovering the mouse pointer over some of the garbled text elements or clicking on them sometimes helps, but not always (I presume that interactive elements are re-rendered completely when highlighted, and that fixes the text display). Sometimes suspending and resuming several times is needed to reproduce the problem.

Also today I upgraded to 2.9.99.901~git20091202.ad68881b and this problem is still present.
Comment 1 Petar Velkovski 2009-12-02 14:23:50 UTC
Created attachment 31686 [details]
Xorg log

This Xorg log is taken with xserver-xorg-video-intel 2.9.99.901~git20091202.ad68881b but the bug was present in dbb68168
Comment 2 Petar Velkovski 2009-12-02 14:31:49 UTC
Sorry, I had the intel driver versions mixed up in my original post.

This is the correct info:

Last good working version: 2.9.0+git20091111.dbb68168
Bug most probably introduced in: 2.9.0+git20091130.2.6729b508
Bug still present in: 2.9.99.901~git20091202.ad68881b
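
The snapshot version strings above carry the upstream commit ID right after the git date stamp. As a minimal sketch (the function name snapshot_commit is made up for illustration, and the pattern assumes the "git<YYYYMMDD>.[<n>.]<commit>" layout seen in this report), the commit IDs can be extracted like this:

```shell
#!/bin/sh
# Hypothetical helper: print the 8-character commit ID embedded in a
# snapshot version string such as "2.9.0+git20091130.2.6729b508".
# Assumes the "git<YYYYMMDD>.[<n>.]<commit>" layout seen in this report.
snapshot_commit() {
    printf '%s\n' "$1" |
        sed -nE 's/.*git[0-9]{8}\.([0-9]+\.)?([0-9a-f]{8}).*/\2/p'
}

good=$(snapshot_commit '2.9.0+git20091111.dbb68168')
bad=$(snapshot_commit '2.9.0+git20091130.2.6729b508')
echo "suspect range: ${good}..${bad}"
```

In a checkout of the driver, passing that range to git log (git log dbb68168..6729b508) should list the commits that went in between the last good and the first bad snapshot.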
Comment 3 Chris Wilson 2009-12-04 12:22:37 UTC
Petar, couple of questions...

1. Is the corruption like an incorrect glyph repeated across the entire screen, for example the letter A replaced by a 1? The screenshot will help answer this question.

2. Is suspend and resume a critical component in triggering this bug? Or does the corruption occur (eventually?) without a suspend?
Comment 4 Petar Velkovski 2009-12-04 12:48:08 UTC
Created attachment 31754 [details]
Screenshot 1 of the desktop

You can see the font scrambling at the top left ("Applications Places System"), in the folder names, and it's really noticeable on the weather widget at the top right (it shows text until it fetches the weather info).
Comment 5 Petar Velkovski 2009-12-04 12:50:46 UTC
Created attachment 31755 [details]
Screenshot taken from an application

To show the font garbling more obviously :)
Comment 6 Petar Velkovski 2009-12-04 12:56:38 UTC
These are some of my additional observations:

1. It helps A LOT to start multiple programs and then go to Suspend/Resume (sometimes more than once, as mentioned previously) in order to reproduce the bug more easily.
2. In Screenshot 1 the fonts in the Guake terminal are displayed normally!
3. I notice that some of the letters' pixels get "eaten" while others are shifted left or right (possibly in other directions too).
Comment 7 Petar Velkovski 2009-12-04 13:00:36 UTC
Oh yes, Suspend/Resume is critical for reproducing the bug!
Comment 8 Chris Wilson 2009-12-07 03:39:51 UTC
Ok, it does appear that it is the blit into the glyph cache that is causing the corruption. (Every instance of a glyph bears the same corruption.)

I wonder if the suspend path is missing a MI_FLUSH?
Comment 9 Petar Velkovski 2009-12-07 11:37:08 UTC
Just tell me when a patch is submitted so that I can remind the xorg-edgers guys to build a new testing package. I can see that they already queued xserver-xorg-video-intel for building.
Comment 10 Petar Velkovski 2009-12-07 12:00:10 UTC
I just tested with this upgrade:

xserver-xorg-video-intel (2:2.9.0+git20091111.dbb68168-0ubuntu0sarvatt) to 2:2.9.99.901+git20091204.415aab47-0ubuntu0tormod~karmic

and this is what I see:
1. The bug is still present but much harder to reproduce!
2. Now hovering/clicking the corrupted elements fixes the text display (as far as I could find corrupted text on the desktop)

Because of 2, and I say this as a normal user with no knowledge of driver development, I am under the impression that whatever changed in 415aab47 doesn't fix the source of the corruption, but just tries to mend things after it has happened. I'm waiting for the next build, which will happen in 3-4 hours, and I'll test again with that version.

Comment 11 Petar Velkovski 2009-12-08 18:07:14 UTC
Today I tried with xserver-xorg-video-intel 2.9.99.901+git20091208.47416b1e-0ubuntu0tormod~karmic.

Still no progress. Same situation as in post #10.
Comment 12 Chris Wilson 2009-12-09 03:25:33 UTC
Arkadiusz Miskiewicz did some investigative work and identified

commit 3f11bbec420080151406c203af292e55177e77d1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Nov 29 21:39:41 2009 +0000

    uxa-glyphs: Enable TILING_X on glyph caches.
 
as the cause of the regression on his gm45.

Petar, can you confirm that reverting that patch "fixes" the issue for you?
Comment 13 Chris Wilson 2009-12-09 03:27:51 UTC
*** Bug 25411 has been marked as a duplicate of this bug. ***
Comment 14 Petar Velkovski 2009-12-09 09:02:11 UTC
Chris how can I confirm that? Do I need the latest build of the driver? Do I need to install some previous build?
Comment 15 Arkadiusz Miskiewicz 2009-12-09 11:18:42 UTC
To test, take the version with which you were able to trigger the corruption (possibly the latest one), apply this patch http://carme.pld-linux.org/~arekm/intel-revert1.patch and rebuild the driver.
Comment 16 Petar Velkovski 2009-12-09 11:47:58 UTC
Arkadiusz Miskiewicz, easier said than done. :) I have no idea how pulling from git (or using git), patching and building works, and on top of that how to build a deb package in order to do the testing. If Tormod, the packager who builds the testing packages for Ubuntu, can help me with all this, I'll do the testing; if not, you'll have to skip me in the confirmation step. Not that I wouldn't like to do it by myself, but vague instructions given to beginners like me don't help at all. Sorry if this sounds rude to you, but that is my current situation. :)
Comment 17 Arkadiusz Miskiewicz 2009-12-09 12:08:41 UTC
Maybe this will be enough for you:

git clone git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
cd xf86-video-intel
patch -p1 < intel-revert1.patch
./autogen.sh
make
You will find the driver in src/.libs/intel_drv.so; replace the installed one manually and keep the old copy.

Of course you will need tons of -devel packages to pass autogen.sh and make steps.
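
The "replace manually, keep old copy" step can be sketched as a small helper. This is only an illustration: the function name replace_driver is made up, and the location of the installed intel_drv.so differs between distributions, so it is passed in as an argument rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: install a freshly built driver over the packaged
# one, keeping the original as "<target>.orig". The backup is made only
# on the first run, so repeated test builds never overwrite the
# pristine copy.
replace_driver() {
    new=$1      # e.g. src/.libs/intel_drv.so from the build tree
    target=$2   # installed driver; the path differs between distros
    [ -f "$new" ] || { echo "no such file: $new" >&2; return 1; }
    if [ -f "$target" ] && [ ! -e "$target.orig" ]; then
        cp -p "$target" "$target.orig" || return 1
    fi
    cp "$new" "$target"
}
```

It would be run as root with the built file and the installed file as arguments; copying the .orig file back restores the packaged driver.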
Comment 18 Daniel Vetter 2009-12-09 13:57:37 UTC
I think I have the same problem on my i855GM. The revert seems to fix it.
Kernel is latest drm-intel-next, ddx latest master, libdrm recent master.
Comment 19 Petar Velkovski 2009-12-09 15:40:18 UTC
Arkadiusz Miskiewicz, thanks for your instructions. I didn't try them, but I believe that if I had, I would have succeeded in building the driver.

Tormod gave me more specific instructions, for how to do it the debian way. So I tried with xserver-xorg-video-intel-2.9.99.901+git20091208.47416b1e, and I reverted the patch 3f11bbec420080151406c203af292e55177e77d1.

So far, I can't reproduce the bug, and I suspended my computer at least 10 times.
Chris I can confirm that reverting your patch "fixes" the issue.
Comment 20 Petar Velkovski 2009-12-10 20:25:33 UTC
Ok, new development. After using my computer for a day or so, with it going into suspend automatically after every 30 minutes of inactivity (I usually suspend my system rather than shut it down), the font garbling problem reappeared. To quote Chris Wilson in his commit 37f631d669c165c4fb56ccd7a6fc0a432f453b52: "For unknown reasons, enabling tiling for the glyph cache is causing glyph corruption both across suspend and resume and VT switching, on a wide range of chipsets (reports include both i8xx and gm45)".

I presume the real source of the problem is still unknown. Trying to reproduce the bug again with the steps I described previously was not successful. It only led to my system freezing, but that might be related to the mesa upgrades I'm doing daily, as soon as there are new packages in Ubuntu's xorg-edgers repository.

So my conclusion is that commit 3f11bbec420080151406c203af292e55177e77d1 "uxa-glyphs: Enable TILING_X on glyph caches" just made it more visible.

I am interested in Arkadiusz Miskiewicz's and Daniel Vetter's experience if they did extensive testing too.
Comment 21 Petar Velkovski 2009-12-10 20:32:50 UTC
I forgot to mention that my latest testing was done with linux vanilla kernel 2.6.32, instead of 2.6.32rc8 as visible in the Xorg log posted when I opened this bug.
Comment 22 Arkadiusz Miskiewicz 2009-12-14 21:05:11 UTC
The problem hasn't occurred here since the revert.
Comment 23 Petar Velkovski 2009-12-14 23:57:21 UTC
Ok, I found another bug report that better describes the font garbling that reappeared on my computer. When I gave it more thought, I realized it reappeared not immediately after resuming from suspend, but a few minutes afterwards. As I am experiencing more display freezes than corruption, I didn't see the corruption again. I found two bug reports, 25598 "[965GM] Corruption on resume from hibernation with xf86-video-intel-git" and 25475 "[i915] Xorg crash / Execbuf while wedged", that describe my problems more accurately. As for this bug, I believe that it should be closed, so I am closing it.
Comment 24 Daniel Vetter 2009-12-15 08:33:32 UTC
Reopening, because this bug already gathered a bunch of people. And I have a patch, so I don't want to lose potential testers.
Comment 25 Daniel Vetter 2009-12-15 08:37:53 UTC
Created attachment 32087 [details] [review]
kernel patch to fix fence flushing

Please apply this to the latest git kernel (and don't forget to revert the revert in the ddx for testing).

Thanks, Daniel
Comment 26 Petar Velkovski 2009-12-15 10:52:50 UTC
Daniel, what is ddx? Revert what to what in where? :)
Comment 27 Daniel Vetter 2009-12-15 12:20:08 UTC
> --- Comment #26 from Petar Velkovski <pvelkovski@gmail.com>  2009-12-15 10:52:50 PST ---
> Daniel, what is ddx? Revert what to what in where? :)

with ddx I meant the intel Xorg driver, i.e. xf86-video-intel. In latest
master, Chris Wilson reverted the commit 
"uxa-glyphs: Enable TILING_X on glyph cache"
to work around this bug. So to actually test my patch, you have to undo
this commit, otherwise the code won't be tested at all.
Comment 28 Petar Velkovski 2009-12-15 15:40:25 UTC
Daniel this is what I did in order to remove the previous patch and apply yours:

petar@aurora:~/Desktop/intel/xserver-xorg-video-intel-2.9.99.901+git20091209.093bb9eb$ patch -R -p1 < /home/petar/Desktop/intel/37f631d669c165c4fb56ccd7a6fc0a432f453b52.patch
patching file src/common.h
patching file src/i830.h
patching file uxa/uxa-glyphs.c
petar@aurora:~/Desktop/intel/xserver-xorg-video-intel-2.9.99.901+git20091209.093bb9eb$ patch -p1 < /home/petar/Desktop/intel/
37f631d669c165c4fb56ccd7a6fc0a432f453b52.patch                                           xserver-xorg-video-intel_2.9.99.901+git20091209.093bb9eb-0ubuntu0tormod2~karmic.diff.gz
fence_flushing_fix.patch                                                                 xserver-xorg-video-intel_2.9.99.901+git20091209.093bb9eb-0ubuntu0tormod2~karmic.dsc
xserver-xorg-video-intel-2.9.99.901+git20091209.093bb9eb/                                xserver-xorg-video-intel_2.9.99.901+git20091209.093bb9eb.orig.tar.gz
petar@aurora:~/Desktop/intel/xserver-xorg-video-intel-2.9.99.901+git20091209.093bb9eb$ patch -p1 < /home/petar/Desktop/intel/fence_flushing_fix.patch
can't find file to patch at input line 25
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|commit d05e252b29764d83ecee678a61cb96b7b4eb238d
|Author: Daniel Vetter <daniel.vetter@ffwll.ch>
|Date:   Tue Dec 15 13:36:18 2009 +0100
|
|    drm/i915: fix order of fence release wrt flushing
|    
|    i915_gem_object_unbind had the ordering wrong. The other user,
|    i915_gem_object_put_fence_reg already has the correct ordering.
|    
|    Results was usually corrupted pixmaps, especially garbled font glyphs
|    after a suspend/resume (because this evicts everything).
|    
|    I'm still waiting for the feedback from the bug-reporters, but
|    because this obviously fixes a bug (at least for me) I'm already
|    submitting it.
|    
|    Bugzilla: FIXME
|    CC: stable@kernel.org
|    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
|index 8c463cf..9e81a0d 100644
|--- a/drivers/gpu/drm/i915/i915_gem.c
|+++ b/drivers/gpu/drm/i915/i915_gem.c
--------------------------
File to patch: ^C


What am I doing wrong???
Comment 29 Daniel Vetter 2009-12-15 15:56:17 UTC
Looks like you revert the patch, then apply it again. Then apply my patch
against the wrong source. Anyway, step by step instructions follow:

on your xf86-video-intel snapshot
xf86-video-intel $ patch -R -p1 < path-to-patch/37f63....patch

grab the latest linux kernel git snapshot
linux-kernel $ patch -p1 < path-to-kernel-patch/fence_flushing_fix.patch

btw: If you're somewhat serious about testing the latest and greatest, try
out the git repos. The basic git commands needed to test stuff are simple,
and it makes keeping track of various version/patches/snapshots dead easy.
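
The mix-up in comment 28 (a kernel patch applied to the driver source) can be caught before running patch by looking at the paths in the patch's "diff --git" headers. The following is a rough sketch under that assumption; both function names are hypothetical, and it only handles git-style patches that modify existing files.

```shell
#!/bin/sh
# Hypothetical helper: list the files a git-style patch wants to modify,
# taken from its "diff --git a/<path> b/<path>" header lines.
patch_targets() {
    sed -n 's|^diff --git a/\([^ ]*\) b/.*|\1|p' "$1"
}

# Succeeds only if the patch names at least one file and every named
# file exists under the source tree given as $2. A patch that creates
# new files would be reported as not fitting.
patch_fits_tree() {
    patch_targets "$1" | (
        seen=0
        missing=0
        while IFS= read -r path; do
            seen=$((seen + 1))
            if [ ! -e "$2/$path" ]; then
                echo "missing: $2/$path" >&2
                missing=$((missing + 1))
            fi
        done
        [ "$seen" -gt 0 ] && [ "$missing" -eq 0 ]
    )
}
```

For the fence flushing patch, patch_targets prints drivers/gpu/drm/i915/i915_gem.c, which exists in a kernel tree but not in xf86-video-intel. GNU patch also offers --dry-run, which attempts the patch without changing any files.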
Comment 30 Petar Velkovski 2009-12-15 16:33:43 UTC
Oh sorry I didn't read that your patch is for the kernel source. I was trying to apply it to the driver source :).I don't know how to build a kernel after I patch it for my system. I'll try to look for help from the Ubuntu developers. If I find someone that can help me, I'll do the testing. Thanks for clearing my mistake for me :)
Comment 31 Daniel Vetter 2009-12-16 00:02:51 UTC
> --- Comment #30 from Petar Velkovski <pvelkovski@gmail.com>  2009-12-15 16:33:43 PST ---
> Oh sorry I didn't read that your patch is for the kernel source. I was trying
> to apply it to the driver source :).I don't know how to build a kernel after I
> patch it for my system. I'll try to look for help from the Ubuntu developers.
> If I find someone that can help me, I'll do the testing. Thanks for clearing my
> mistake for me :)

Small howto:
1) grab latest git snapshot (you have to pick the latest release + a
patch) from kernel.org
1a) apply my patch
2) Copy your distro's kernel config, either
your-kernel-sources $ zcat /proc/config.gz > .config
or
your-kernel-sources $ cp /boot/config-[distro-kernel-release] .config
3) Update the configuration for the latest sources
your-kernel-sources $ make oldconfig
and build and install.

Hope that helps.
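
Step 2 of the howto can be wrapped in a small function that tries both locations. This is a sketch only: the name seed_kernel_config is invented here, and the two optional arguments exist just so the default paths can be overridden.

```shell
#!/bin/sh
# Hypothetical helper: seed .config in a kernel tree from the running
# distro kernel. Prefers /proc/config.gz and falls back to the file
# under /boot; both default paths can be overridden.
seed_kernel_config() {
    tree=$1
    proc_cfg=${2:-/proc/config.gz}
    boot_cfg=${3:-/boot/config-$(uname -r)}
    if [ -r "$proc_cfg" ]; then
        # gzip -dc does the same job as zcat here
        gzip -dc "$proc_cfg" > "$tree/.config"
    elif [ -r "$boot_cfg" ]; then
        cp "$boot_cfg" "$tree/.config"
    else
        echo "no kernel config found" >&2
        return 1
    fi
}
```

Inside the patched kernel tree, "seed_kernel_config . && make oldconfig" would then cover steps 2 and 3.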
Comment 32 roberth 2009-12-17 08:02:42 UTC
(In reply to comment #30)
> Oh sorry I didn't read that your patch is for the kernel source. I was trying
> to apply it to the driver source :).I don't know how to build a kernel after I
> patch it for my system. I'll try to look for help from the Ubuntu developers.
> If I find someone that can help me, I'll do the testing. Thanks for clearing my
> mistake for me :)
> 

A prebuilt kernel is available here, provided the patches get pushed to drm-intel-next (which they were):

http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-next/current/
Comment 33 Petar Velkovski 2009-12-17 14:45:44 UTC
If Robert is absolutely sure that the patch is applied in http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-next/current/
then it doesn't work and the corruption still occurs. I used 2.6.32-997-generic_2.6.32-997.200912171148_i386.deb which means it was built on 17th of December.
As for building the kernel myself, it really is out of my league right now. Sorry that I can't provide more help than this at the moment. I hope Arkadiusz can do better testing than me.
Comment 34 Arkadiusz Miskiewicz 2009-12-17 23:38:12 UTC
fence flush patch doesn't fix the issue here (2.6.33rc1+that patch; intel ddx git master + reverted reversion). Fonts are still corrupted after resume.
Comment 35 Daniel Vetter 2009-12-18 06:07:29 UTC
> --- Comment #34 from Arkadiusz Miskiewicz <arekm@maven.pl>  2009-12-17 23:38:12 PST ---
> fence flush patch doesn't fix the issue here (2.6.33rc1+that patch; intel ddx
> git master + reverted reversion). Fonts are still corrupted after resume.

Ok. I also still see some corruption right after resume (sometimes), but
it usually disappears slowly (when stuff gets redrawn). So looks like
there's something else going wrong. Some questions for the pattern
hunting:

1) Arkadiusz, what's your hw (couldn't find anything). Petar has a 945, I
an 855.

2) Do the corruptions you're seeing (slowly) disappear after some usage.
Or does it get worse (like slowly all letters in the same font and size
show the exact same corruption)?

3) Does the patch mentioned in comment #15 (now integrated in master of
xf86-video-intel) fully fix the problem, i.e. you haven't seen any
corruptions ever since? On my machine corruptions happen so seldom (and
get fixed so quickly by redraws) that I can't decide this for certain.

Thanks, Daniel
Comment 36 Arkadiusz Miskiewicz 2009-12-18 06:26:35 UTC
1) GM45 here (thinkpad t400)

2) corruption disappears in areas that are redrawn

3) corruption doesn't happen for me after reverting "uxa-glyphs: Enable TILING_X on glyph caches." patch (in other words current ddx git master works fine for me). Note that each resume produced corruption here. There was no single case when resume didn't cause corruption.

Corruption here looks like this:
http://carme.pld-linux.org/~arekm/intel-resume-bug.png
http://carme.pld-linux.org/~arekm/intel-resume-bug2.png (black areas by gimp ;)
Comment 37 Eric Anholt 2009-12-18 14:57:13 UTC
Comment on attachment 32087 [details] [review]
kernel patch to fix fence flushing

the patch is queued.
Comment 38 Petar Velkovski 2009-12-19 05:54:01 UTC
Do we still need to have this bug open? As I said previously, there was a corruption I got only once after reverting "uxa-glyphs: Enable TILING_X on glyph caches." The problem is that during the period in which this bug was introduced and then fixed, another, more serious bug was introduced in the driver. As I mentioned before, it's bug 25475, "[i915] Xorg crash / Execbuf while wedged". What is particular about bug 25475 is that it gives an error message that warns about possible screen corruption or display freeze. I am not sure whether that bug caused the same font garbling, but I must say that I even feel lucky that I got font corruption once, because all the following times it appeared, I got Xorg freezes that required a system restart. Is anyone else having Xorg freezing problems with the latest git version of the intel driver?
Comment 39 Chris Wilson 2009-12-21 09:08:39 UTC
Given that we've identified a couple of issues that affected tiled buffers, once I'm back in civilisation and the guys have made the stable release, we can apply those, re-enable tiling, and close this bug...
Comment 40 Chris Wilson 2010-01-09 03:51:14 UTC
Glyph tiling has been re-enabled in current -intel. Be sure that you are also running drm-intel-next in order to get the full set of fixes.

In particular, Daniel's patch:
commit 96b47b65594fe2365f73aede060cb5203561fed3
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Dec 15 17:50:00 2009 +0100

    drm/i915: fix order of fence release wrt flushing

which has not made its way to stable yet.
Comment 41 Chris Wilson 2010-03-29 03:47:40 UTC
Well, no complaints in the 3 months that we've re-enabled tiling of the glyph cache...
