Bug 67015 - Page rendering corruption in Firefox [SNA]
Summary: Page rendering corruption in Firefox [SNA]
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 67073
Depends on:
Blocks:
 
Reported: 2013-07-17 22:31 UTC by Clemens Eisserer
Modified: 2013-08-10 10:25 UTC

See Also:
i915 platform:
i915 features:


Attachments
screenshot (324.45 KB, image/png)
2013-07-17 22:31 UTC, Clemens Eisserer
xorg log file (31.20 KB, text/plain)
2013-07-18 06:32 UTC, Clemens Eisserer

Description Clemens Eisserer 2013-07-17 22:31:09 UTC
Created attachment 82566
screenshot

When viewing the page https://bugs.kde.org/show_bug.cgi?id=224447 on my SNB laptop with the official FF-22 build, I get page corruption when another window is on top of Firefox.

Steps to reproduce:
1. Load https://bugs.kde.org/show_bug.cgi?id=224447 in Firefox
2. Scroll to bottom
3. Move/Open window on top of the FF window

Quite often the page is already corrupted after scrolling to the bottom.

I did some bisecting; however, I am not really convinced that this is really the commit to blame:

48028a7c923fa0d66b01e8e94d4f0742866f78ec is the first bad commit
commit 48028a7c923fa0d66b01e8e94d4f0742866f78ec
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 21 14:29:43 2013 +0100

    sna: Inspect availablity of render before prefering to use the GPU
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

:040000 040000 f56e8b2d4e7c6fdd77080ac66437e56fa6cd3938 a3ba69294e46d078d94dd5f6ad976ec6eaccbfb2 M	src
Comment 1 Chris Wilson 2013-07-17 22:55:30 UTC
Not as silly a bisect as you might think - I did throw a side-effect into that patch to also try and avoid unnecessary migrations. I'm in the middle of a kernel bug right now.

Can you attach your Xorg.0.log (because I am nosy), and describe which window you open? Heck if you get really bored, another of your famous videos :)
Comment 2 Chris Wilson 2013-07-17 23:56:08 UTC
Another thing to check is whether other ff builds (particularly those built with system-cairo) are affected.
Comment 3 Chris Wilson 2013-07-17 23:57:04 UTC
Also check kernel version (Xorg.0.log).
Comment 4 Clemens Eisserer 2013-07-18 06:32:02 UTC
Created attachment 82569
xorg log file
Comment 5 Clemens Eisserer 2013-07-18 06:34:18 UTC
I wouldn't dare if you hadn't asked:  http://youtu.be/2ZFbrOzK5zw ;)

I am running linux-3.10rc7 + your drm patch.
Comment 6 Clemens Eisserer 2013-07-18 06:36:03 UTC
Just tested: the issue is reproducible with the official FF22 build as well as the FF build shipped with Fedora 19.
Comment 7 Chris Wilson 2013-07-18 08:26:35 UTC
What the...

Ok, that is a little more obvious than I was expecting. I hadn't seen anything like that, so I was expecting some subtle corruption dependent on a precise sequence of steps and applications.
Comment 8 Chris Wilson 2013-07-18 08:34:46 UTC
Ok, I don't think this is the same bug - but if you do get the opportunity to test with 3.11-rc2 or drm-intel-nightly, that would be much appreciated.
Comment 9 Chris Wilson 2013-07-18 09:25:53 UTC
Ok, reminds me of the 8192+ errors.
Comment 10 Clemens Eisserer 2013-07-18 12:04:28 UTC
Ok, I'll test 3.11rc2 as soon as fedora rawhide builds are available.
Comment 11 Chris Wilson 2013-07-18 12:05:27 UTC
Indeed, I get the same bisect as you do. But I think, having experimented with adjusting each chunk, that it is just changing the placement of an operation and so hitting an underlying bug.
Comment 12 Chris Wilson 2013-07-18 12:37:25 UTC
The active ingredient in that patch is that we defer the pixmap migration until after we check whether the BLT unit can handle the operation. In this case, it cannot so we then try with the RENDER unit - the difference now is that the source and destination pixmaps are both categorised as being on the CPU. Given that, we prefer to do the operation on the CPU.

But it all should still be rendered correctly...
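
The ordering change described above can be modelled roughly as follows. This is only a minimal sketch of the decision flow as stated in this comment, not the actual SNA code: the types, the choose_engine() function and the migrate_first flag are all hypothetical names made up for illustration, and treating "migration" as moving both pixmaps to the GPU is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; none of these exist in the SNA sources. */
enum placement { ON_CPU, ON_GPU };
enum engine { ENGINE_BLT, ENGINE_RENDER, ENGINE_CPU };

struct pixmap_model {
    enum placement where;
};

/*
 * migrate_first = true  models the old ordering (migrate, then ask BLT).
 * migrate_first = false models the new ordering (ask BLT, migrate only
 * if BLT accepts the operation).
 */
static enum engine choose_engine(struct pixmap_model *src,
                                 struct pixmap_model *dst,
                                 bool blt_can_handle,
                                 bool migrate_first)
{
    assert(src && dst);

    if (migrate_first) {
        /* Assumption: migration moves both pixmaps to the GPU. */
        src->where = ON_GPU;
        dst->where = ON_GPU;
    }

    if (blt_can_handle) {
        /* Deferred migration happens only once BLT has accepted. */
        src->where = ON_GPU;
        dst->where = ON_GPU;
        return ENGINE_BLT;
    }

    /* BLT rejected the operation: consider RENDER, but prefer the CPU
     * when both pixmaps are still categorised as being on the CPU. */
    if (src->where == ON_CPU && dst->where == ON_CPU)
        return ENGINE_CPU;

    return ENGINE_RENDER;
}
```

With both pixmaps starting on the CPU and BLT rejecting the operation, the model picks the CPU path under the new ordering and the RENDER path under the old one, which is the behavioural difference the bisect landed on.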
Comment 13 Chris Wilson 2013-07-18 17:36:40 UTC
Clemens can you try this little hack to see if it helps your system:

diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
index 77233cd..c84ca92 100644
--- a/src/sna/sna_accel.c
+++ b/src/sna/sna_accel.c
@@ -62,7 +62,7 @@
 
 #define DEFAULT_TILING I915_TILING_X
 
-#define USE_INPLACE 1
+#define USE_INPLACE 0
 #define USE_WIDE_SPANS 0 /* -1 force CPU, 1 force GPU */
 #define USE_ZERO_SPANS 1 /* -1 force CPU, 1 force GPU */
 #define USE_CPU_BO 1


So far I can only reproduce this on SNB, and I am quite baffled. Every time I think I have a lead, it unravels.
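
For anyone unfamiliar with this style of toggle: a 0/1 #define like USE_INPLACE in the hack above selects a code path at compile time, so flipping it and rebuilding is enough to rule a path in or out. The sketch below shows only the general pattern. The copy_pixels() helper and its staging buffer are hypothetical and say nothing about what the real in-place path in sna_accel.c does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* 1 enables the direct path, 0 compiles it out. Same style as the toggle
 * in the diff, but everything below is purely illustrative. */
#ifndef USE_INPLACE
#define USE_INPLACE 0
#endif

/* Hypothetical helper: the "in place" path writes straight into dst,
 * the fallback stages the data through a temporary buffer first.
 * Returns true if the in-place path was taken. */
static bool copy_pixels(unsigned char *dst, const unsigned char *src,
                        size_t len)
{
    unsigned char staging[64];

    assert(dst && src && len <= sizeof(staging));

    if (USE_INPLACE) {
        memcpy(dst, src, len);   /* direct write */
        return true;
    }

    memcpy(staging, src, len);   /* fallback: stage, then copy */
    memcpy(dst, staging, len);
    return false;
}
```

Both paths must produce the same pixels; if disabling the fast path makes corruption disappear, the fast path (or something it exposes further down the stack) becomes the suspect.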
Comment 14 Clemens Eisserer 2013-07-18 17:40:58 UTC
Yep, setting USE_INPLACE to 0 makes the issue disappear here ...
Comment 15 Chris Wilson 2013-07-18 21:03:26 UTC
Clemens, are your machines reasonably similar in sw config? Can you easily check whether you see this on any other machine? I'm still only seeing it on the one Fedora SNB I have, which is concerning.
Comment 16 Clemens Eisserer 2013-07-18 21:47:46 UTC
I wasn't able to reproduce it on gen4 either, with a sw config identical to the one on my SNB laptop.
Comment 17 Clemens Eisserer 2013-07-18 21:48:25 UTC
Sorry, I meant gen5 (the i540M).
Comment 18 Chris Wilson 2013-07-18 21:50:51 UTC
Ok, the issue seems to rely on the prehistoric cairo used by mozilla.
Comment 19 Clemens Eisserer 2013-07-18 21:52:27 UTC
doesn't the firefox build shipped with Fedora use the system's libcairo?
I can reproduce it with official as well as fedora-builds on SNB.
Comment 20 Chris Wilson 2013-07-18 21:56:05 UTC
No, Fedora is still using mozcairo and hasn't noticed that Mozilla finally got around to making system-cairo functional again. (Debian and Gentoo have been patching system-cairo support back into their versions.)
Comment 21 Clemens Eisserer 2013-07-18 22:01:14 UTC
could it be that they switched back to system-cairo with F19?
I can't find anything relating to mozcairo on my system and /usr/lib64/xulrunner/libxul.so links /lib64/libcairo.so.2.
Comment 22 Chris Wilson 2013-07-18 22:02:49 UTC
It's statically linked in; I can tell it's an old cairo because I recognise the sequence of XRender requests. :)
Comment 23 Clemens Eisserer 2013-07-18 22:03:44 UTC
convinced ;)
Comment 24 Chris Wilson 2013-07-19 09:45:48 UTC
*** Bug 67073 has been marked as a duplicate of this bug. ***
Comment 25 Chris Wilson 2013-07-19 16:03:03 UTC
Note that the issue is masked by:

commit fb058de4e617d7e5058674859993ec635a8d779e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jul 18 13:37:12 2013 +0100

    sna: Treat a source with a CPU bo as being attached.

And then requires

diff --git a/src/sna/gen6_render.c b/src/sna/gen6_render.c
index 9b45d79..5885af3 100644
--- a/src/sna/gen6_render.c
+++ b/src/sna/gen6_render.c
@@ -1955,6 +1955,7 @@ has_alphamap(PicturePtr p)
 static bool
 need_upload(PicturePtr p)
 {
+	return true;
 	return p->pDrawable && unattached(p->pDrawable) && untransformed(p);
 }
 

to reproduce.
Comment 26 Chris Wilson 2013-07-19 16:58:20 UTC
I've just bumped the kernel up to the latest -nightly and the corruption is gone.
Comment 27 Christoph Reiter 2013-08-10 10:12:29 UTC
I see it with 3.10.

Another page which triggers it: http://pygments.org/docs/lexers/
Comment 28 Chris Wilson 2013-08-10 10:17:39 UTC
This is a kernel bug; I think it should be resolved by 3.10.5.
Comment 29 Christoph Reiter 2013-08-10 10:25:32 UTC
confirming, 3.10.5 works :) thanks.

