Bug 66163 - Sporadic image corruptions in firefox for images which were not cached
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
Duplicates: 66665 66755
Depends on:
Reported: 2013-06-25 16:28 UTC by Clemens Eisserer
Modified: 2013-07-10 08:51 UTC (History)

screenshot (56.45 KB, image/jpeg)
2013-06-25 16:28 UTC, Clemens Eisserer

Description Clemens Eisserer 2013-06-25 16:28:32 UTC
When firefox does its partially-display-while-loading on images which are not in cache, I sometimes get corruptions which look like tiling errors on my SNB laptop.

Steps to reproduce:
1. Disable the disk cache for FF completely (0 MB)
2. Load: https://www.shroudoftheavatar.com/
3. While the page is loading, scroll up and down using the scrollbar
4. Some images will be corrupted, as shown in the attached screenshot.

I bisected the issue down to: 

commit fd375da5caf34f93a4e87670bb0c70fec5b4c55c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Jun 23 17:06:11 2013 +0100

    sna: Allow tiled uploads to accumulate damage

    And for the upload to create the bo as required.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

:040000 040000 cfdf19f7ec5ccaec769ac5f0103cb369d7d08b38 1bec59ec587054734ae208a400d70053f57be728 M	src
Comment 1 Clemens Eisserer 2013-06-25 16:28:51 UTC
Created attachment 81418
Comment 2 Chris Wilson 2013-06-25 16:37:19 UTC
Hmm. Presumably that's then hitting the bo create path. Let's see if I can reproduce before asking you to do random stuff...
Comment 3 Chris Wilson 2013-06-25 17:09:31 UTC
What is the latest ddx you tested?
Comment 4 Chris Wilson 2013-06-25 17:11:59 UTC
Saw the corruption briefly, I think this is fixed by

commit 2e2c448a77ab9dce4807b159708290cd7ad26b5c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jun 25 14:22:28 2013 +0100

    sna: Don't flag IGNORE_CPU for partial overwritten blocks
Comment 5 Clemens Eisserer 2013-06-25 17:16:01 UTC
latest tested ddx version was: 2.21.10-16-g2e2c448
seems to be something else :/
Comment 6 Chris Wilson 2013-06-25 18:00:43 UTC
Bah, it just seems that I am unreliable in reproducing the corruption.
Comment 7 Chris Wilson 2013-06-26 09:05:52 UTC
So I think I have this fixed by

commit fc5b9a96194583c67705d7d05afc3e04e97e3338
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jun 25 22:58:31 2013 +0100

    sna: Clear mapped state after performing manual tiling

At least I have not seen the corruption since. But I have doubts about my ability to reproduce the corruption at will...
Comment 8 Clemens Eisserer 2013-06-27 07:35:39 UTC
I can still reproduce the corruption with 2.21.10-33-gd7be3df :/
Comment 9 Chris Wilson 2013-06-27 08:47:33 UTC
Yet another tweak... Please be the last!
Comment 10 Chris Wilson 2013-06-27 14:01:42 UTC
The other thing to note is whether the reproduction scenario is the same. There is always a possibility that I need to look elsewhere now.
Comment 11 Clemens Eisserer 2013-06-28 11:16:54 UTC
Could be the same issue:

When opening the pdf located at https://mega.co.nz/#!Ytkx0bJJ!FBBmMk0NCtw4DiSWmrnOTxzuWPRvG1EAHFQXOb4bO3s with acrobat-reader 9.5.3, setting the zoom level to 10% and resizing the window to about 4k pixels in both directions, some icons in the left toolbar start flickering: http://youtu.be/NuKtGBNFx6U
Comment 12 Chris Wilson 2013-06-28 11:35:00 UTC
Looks like a related bug, indeed. Let's see if I have any better luck reproducing this in acroread.
Comment 13 Chris Wilson 2013-06-28 12:09:29 UTC
They copy the icons to a fresh pixmap before doing a getimage... I guess that is to avoid the error of reading an offscreen window, but it causes a stall as the original source is idle (on the GPU but coherent with the CPU), becomes active for the copy and then we need to wait for the copy to finish before completing the getimage. If I was smart I could elide that intermediate copy...

Digression. Back to bug hunting.
Comment 14 Clemens Eisserer 2013-06-28 12:16:25 UTC
Can you reproduce the flickering too? Looks somehow like tiling issues...

Interesting ;)
It is quite intriguing what ends up in the X protocol stream sometimes :/
To some degree, X's reputation for being slow can be attributed to toolkit/app developers not giving a second thought to what they are really doing...
Comment 15 Chris Wilson 2013-06-28 12:39:16 UTC
Not the flickering - though I think I see occasional tiling errors if it refreshes the entire window, but that is so rare I'm not sure what is going on.

What I can see going wrong when the window size >= 4096 is that the GetImage it performs for the page icon has the wrong values, as if it is missing the sync for the PolyFillRect (and PolyLine) called on the new pixmap to set the background and edge. That could explain a lot of bugs if I can narrow it down.
Comment 16 Chris Wilson 2013-06-28 15:54:36 UTC

commit 37f961a11c72af94763df34915e79b6847e9e6a7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 28 16:50:46 2013 +0100

    drm/i915: Only clear write-domains after a successful wait-seqno

    In the introduction of the non-blocking wait, I cut'n'pasted the wait
    completion code from normal locked path. Unfortunately, this neglected
    that the normal path returned early if the wait returned early. The
    result is that read-only waits may return whilst the GPU is still
    writing to the bo.

    Fixes regression from
    commit 3236f57a0162391f84b93f39fc1882c49a8998c7 [v3.7]
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Fri Aug 24 09:35:09 2012 +0100

        drm/i915: Use a non-blocking wait for set-to-domain ioctl

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66163
    Cc: stable@vger.kernel.org
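
For readers following along, the failure mode described in the commit message can be sketched roughly as below. This is only an illustrative model under stated assumptions, not the actual i915 code: struct fake_bo, fake_wait_seqno(), wait_rendering_buggy(), wait_rendering_fixed() and stale_read_possible() are all hypothetical names invented for this sketch. The point it tries to show is that clearing the write-domain bookkeeping after a wait that returned early lets a later read-only wait believe the buffer is idle while the GPU is still writing to it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a GPU buffer object; not the real i915 struct. */
struct fake_bo {
    bool gpu_writing;       /* true while the simulated GPU still writes */
    unsigned write_domain;  /* non-zero: a GPU write is recorded as pending */
};

/*
 * Simulated wait. If it is interrupted it returns -EAGAIN and the GPU
 * write is still in flight; otherwise the write has completed.
 */
static int fake_wait_seqno(struct fake_bo *bo, bool interrupted)
{
    if (interrupted)
        return -EAGAIN;
    bo->gpu_writing = false;
    return 0;
}

/* Buggy pattern: the completion bookkeeping runs even if the wait failed. */
static int wait_rendering_buggy(struct fake_bo *bo, bool interrupted)
{
    int ret = fake_wait_seqno(bo, interrupted);

    bo->write_domain = 0;   /* BUG: also reached when ret != 0 */
    return ret;
}

/* Fixed pattern: return early, clear the write domain only on success. */
static int wait_rendering_fixed(struct fake_bo *bo, bool interrupted)
{
    int ret = fake_wait_seqno(bo, interrupted);

    if (ret)
        return ret;

    bo->write_domain = 0;
    return 0;
}

/*
 * A later read-only access only waits if write_domain is set. If it was
 * cleared while the GPU is still writing, the CPU may read stale pixels.
 */
static bool stale_read_possible(const struct fake_bo *bo)
{
    return bo->gpu_writing && bo->write_domain == 0;
}
```

In this model, calling wait_rendering_buggy() with an interrupted wait leaves the bo in a state where stale_read_possible() is true, which would match the symptom of GetImage seeing contents from before the preceding fills completed. With wait_rendering_fixed() the write domain stays set until a wait actually succeeds.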
Comment 17 Chris Wilson 2013-06-28 16:51:20 UTC
dinq commit 22fd5ca947b58901927d100d2b1aa0f1672b3435
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 28 16:54:08 2013 +0100

    drm/i915: Only clear write-domains after a successful wait-seqno
Comment 18 Clemens Eisserer 2013-07-04 06:59:20 UTC
I can confirm it is fixed in 2.21.11 + kernel-patch.
Thanks :)
Comment 19 Chris Wilson 2013-07-07 12:37:45 UTC
*** Bug 66665 has been marked as a duplicate of this bug. ***
Comment 20 Chris Wilson 2013-07-10 08:51:02 UTC
*** Bug 66755 has been marked as a duplicate of this bug. ***
