Bug 42426 - Artifacts when firefox does partial image rendering [SNA]
Summary: Artifacts when firefox does partial image rendering [SNA]
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-10-31 05:56 UTC by Clemens Eisserer
Modified: 2013-05-09 13:44 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Attachments
screenshot (146.69 KB, image/png)
2011-10-31 05:58 UTC, Clemens Eisserer
debug=full log (2.53 MB, application/x-bzip2)
2012-03-12 11:40 UTC, Clemens Eisserer
debug=full log with patches (2.83 MB, application/x-bzip2)
2012-03-13 12:17 UTC, Clemens Eisserer
screenshot of misrendered fudzilla (229.64 KB, image/png)
2012-03-13 12:31 UTC, Clemens Eisserer
another screenshot of fudzilla (235.33 KB, image/png)
2012-03-13 12:32 UTC, Clemens Eisserer
Firefox artifacts w/SNA (144.01 KB, image/png)
2012-04-14 08:51 UTC, cyshei

Description Clemens Eisserer 2011-10-31 05:56:35 UTC

    
Comment 1 Clemens Eisserer 2011-10-31 05:58:18 UTC
When firefox loads an image, it renders it a few times while loading, which seems to be broken with SNA. The screenshot attached shows firefox-7.0.1, displaying a partially loaded image from bugzilla.


intel i945GM
libdrm-2.4.27
pixman-0.23.8
linux-3.1.0
xorg 1.11.1
Comment 2 Clemens Eisserer 2011-10-31 05:58:54 UTC
Created attachment 52948
screenshot
Comment 3 Chris Wilson 2011-10-31 07:19:11 UTC
I haven't seen this one myself yet, so it might require some complicating interactions to trigger a particular path. How reproducible are you finding this bug?
Comment 4 Clemens Eisserer 2011-10-31 09:20:18 UTC
I just go to https://bugs.freedesktop.org/attachment.cgi?id=52947 and the resulting image is garbled.

Just tried: I only get the corruption when the window is too small and Firefox automatically scales the image down to fit the window.
Comment 5 Chris Wilson 2011-10-31 09:37:58 UTC
Using iceweasel 7.0.1 with a ridiculously small window and watching the images progressively load... no glitch. It might be related to the 945GM rather than PNV, I suppose, though that strikes me as unlikely.
Comment 6 Clemens Eisserer 2011-10-31 09:44:43 UTC
I was using official mozilla builds which have a bundled version of cairo + pixman - could that be the cause?
Comment 7 Chris Wilson 2011-10-31 09:54:13 UTC
Mozilla do have a tendency to replace CAIRO_EXTEND_NONE with CAIRO_EXTEND_PAD, which gives the effect you see if the composite operation is not properly clipped. I'm not sure whether they have actually gone as far as patching their copy of cairo to divert EXTEND_NONE in the xlib backend, but it does suggest at least one line of investigation.
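A toy model of the clipping issue described in comment 7. This is a minimal sketch, not cairo or pixman code: the enum, struct and functions are hypothetical, and compositing is reduced to a plain SRC copy. Outside the source bounds, EXTEND_NONE yields transparent pixels, whereas EXTEND_PAD clamps to the nearest edge pixel, so a composite that is not clipped to the decoded rows of a partially loaded image smears the last decoded row downwards.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring the semantics of cairo's
 * CAIRO_EXTEND_NONE and CAIRO_EXTEND_PAD; not a real API. */
enum extend_mode { SKETCH_EXTEND_NONE, SKETCH_EXTEND_PAD };

/* A tiny ARGB image standing in for the rows decoded so far. */
struct image {
    const uint32_t *pixels;
    int width, height;
};

/* Fetch the source pixel for (x, y) under the given extend mode. */
static uint32_t sample(const struct image *img, int x, int y,
                       enum extend_mode mode)
{
    if (x < 0 || y < 0 || x >= img->width || y >= img->height) {
        if (mode == SKETCH_EXTEND_NONE)
            return 0; /* transparent black outside the image */
        /* PAD: clamp to the nearest edge pixel */
        if (x < 0) x = 0;
        if (x >= img->width) x = img->width - 1;
        if (y < 0) y = 0;
        if (y >= img->height) y = img->height - 1;
    }
    return img->pixels[y * img->width + x];
}

/* Copy src into dst (dst_w x dst_h), touching only rows [0, clip_rows).
 * Rows at or beyond clip_rows are left unchanged, like a clipped draw. */
static void composite_src(uint32_t *dst, int dst_w, int dst_h,
                          const struct image *src, enum extend_mode mode,
                          int clip_rows)
{
    assert(clip_rows >= 0 && clip_rows <= dst_h);
    for (int y = 0; y < clip_rows; y++)
        for (int x = 0; x < dst_w; x++)
            dst[y * dst_w + x] = sample(src, x, y, mode);
}
```

Restricting the composite to the decoded area makes the two modes produce identical pixels, which is consistent with the artifacts only appearing once the clip is lost.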
Comment 8 Clemens Eisserer 2011-10-31 12:29:04 UTC
Thanks for the information.

I can confirm now that the artifacts only happen with the official mozilla builds, with firefox-7 shipped by fedora everything is fine.

> if the composite operation is not properly clipped.
I wonder where the clip is lost.

Thanks, Clemens
Comment 9 Chris Wilson 2012-01-26 12:26:24 UTC
Hmm, the original report predates the great damage breakage and its eventual fix, though I can hope that this too was in fact a damage bug that eventually got fixed.

Clemens, are you still seeing this bug?
Comment 10 Chris Wilson 2012-03-02 04:45:27 UTC
I'm going to assume that I've fixed this one in the great damage overhaul. Call me an optimist, but I trust Clemens to point out my failures ;)
Comment 11 Clemens Eisserer 2012-03-04 12:06:00 UTC
Sorry for the delay - my SSD died so it took some time to get everything back to normal.

Unfortunately the bug is still not fixed; I can reproduce it with:
- Firefox 10.0.2 (only the official builds trigger this issue)
- intel 2.18.0-27-gaaed9e9
- Fedora 16 with latest updates installed
- i945GM
Comment 12 Clemens Eisserer 2012-03-04 12:06:59 UTC
Forgot to mention: the issue only pops up when Firefox automatically rescales the image to fit the window.
Comment 13 Chris Wilson 2012-03-12 07:10:11 UTC
Still not seeing this using a beta from firefox.com. Any chance you might be able to reproduce this with --enable-debug=full? I'm pretty sure I know what the bug will look like, but then I've already checked for the obvious...
Comment 14 Clemens Eisserer 2012-03-12 11:40:30 UTC
Created attachment 58338
debug=full log
Comment 15 Clemens Eisserer 2012-03-12 11:41:37 UTC
I triggered the partial image loading after the two VT switches; I hope that helps to locate where in the log it happened.
Comment 16 Chris Wilson 2012-03-12 15:00:19 UTC
Haven't found it yet, but I have spotted several instances where it has deviated from the intended design. Thanks!
Comment 17 Chris Wilson 2012-03-13 11:51:48 UTC
I had some fun and applied an assortment of patches that'll hopefully improve the handling of firefox on your system. I haven't yet identified the bug, but I would appreciate it if you could capture another debug=full log (to verify that the changes I made actually help!). :)
Comment 18 Clemens Eisserer 2012-03-13 12:17:13 UTC
Created attachment 58391
debug=full log with patches
Comment 19 Clemens Eisserer 2012-03-13 12:18:18 UTC
Thanks :)

I triggered the partial image loading after the VT switch again.
Comment 20 Clemens Eisserer 2012-03-13 12:31:03 UTC
Hmm, with those patches applied, I get corruptions when loading www.fudzilla.com in firefox.
Comment 21 Clemens Eisserer 2012-03-13 12:31:33 UTC
Created attachment 58392
screenshot of misrendered fudzilla
Comment 22 Clemens Eisserer 2012-03-13 12:32:47 UTC
Created attachment 58393
another screenshot of fudzilla
Comment 23 Chris Wilson 2012-03-13 12:36:59 UTC
Hmm, that looks different to the corruption I was fearing! Typical, the bugs are never where you expect them. ;-)
Comment 24 Chris Wilson 2012-03-13 12:50:35 UTC
FYI, it's the "retain upload buffer for the duration of the batch" that's the culprit. And once again it misses all the asserts that I thought would catch such corruption. :(
Comment 25 Clemens Eisserer 2012-03-13 12:54:50 UTC
I don't know if this has anything to do with it, but I just ran into the following crash (with --enable-debug turned on):

Program received signal SIGABRT, Aborted.
0x0084e416 in __kernel_vsyscall ()
(gdb) bt
#0  0x0084e416 in __kernel_vsyscall ()
#1  0x41e7798f in raise () from /lib/libc.so.6
#2  0x41e792d5 in abort () from /lib/libc.so.6
#3  0x41e706a5 in __assert_fail_base () from /lib/libc.so.6
#4  0x41e70757 in __assert_fail () from /lib/libc.so.6
#5  0x00572c3f in sna_copy_boxes (src=0x900a2a8, dst=0x95684c0, gc=0x9452c18, 
    box=0xbfc84694, n=1, dx=5, dy=0, reverse=0, upsidedown=0, bitplane=0, 
    closure=0x0) at sna_accel.c:3405
#6  0x081aa4bb in miCopyRegion ()
#7  0x081aa9a0 in miDoCopy ()
#8  0x0057d8bb in sna_copy_area (src=0x900a2a8, dst=0x95684c0, gc=0x9452c18, 
    src_x=5, src_y=0, width=12, height=16, dst_x=0, dst_y=0)
    at sna_accel.c:3885
#9  0x08159c57 in ?? ()
#10 0x08071c86 in ?? ()
#11 0x08076195 in ?? ()
#12 0x0806439a in ?? ()
#13 0x41e616b3 in __libc_start_main () from /lib/libc.so.6
#14 0x080646c9 in _start ()
Comment 26 Chris Wilson 2012-03-13 13:04:26 UTC
Related, but not the whole story since the corruption triggers before we get as far as that copy. Thanks for the bt.
Comment 27 Chris Wilson 2012-03-13 15:19:09 UTC
Finally found that little problem:

commit d23ee0380b61e0dfd3ed56b8b4a15fd0b7956491
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Mar 13 22:00:25 2012 +0000

    sna: Reuse the cached upload as a source GPU bo
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=42426
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

(Worse still, I had acknowledged the possibility without thinking it through and adding that bit of code.) :(

Still tons of assertions in place in case people, or I, do silly things...

And back to the hunt.
Comment 28 Clemens Eisserer 2012-03-14 12:14:58 UTC
With the latest patches I don't get any corruption, but Xorg crashes every 2 hours or so.

Last time it crashed, it hit an assert:

Program received signal SIGABRT, Aborted.
0x00db9416 in __kernel_vsyscall ()
(gdb) bt
#0  0x00db9416 in __kernel_vsyscall ()
#1  0x41e7798f in raise () from /lib/libc.so.6
#2  0x41e792d5 in abort () from /lib/libc.so.6
#3  0x41e706a5 in __assert_fail_base () from /lib/libc.so.6
#4  0x41e70757 in __assert_fail () from /lib/libc.so.6
#5  0x0016d22e in kgem_create_buffer (kgem=0x9b343a0, size=61248, flags=4, 
    ret=0xbfbfccfc) at kgem.c:3500
#6  0x001a4366 in sna_read_boxes (sna=0x9b34260, src_bo=0x9b65020, src_dx=0, 
    src_dy=0, dst=0x9b64f90, dst_dx=0, dst_dy=0, box=0xbfbfce14, nbox=1)
    at sna_io.c:334
#7  0x0017f52a in sna_drawable_move_region_to_cpu (drawable=0x9d5e1c8, 
    region=0xbfbfce14, flags=2) at sna_accel.c:1531
#8  0x001801c5 in sna_get_image (drawable=0x9d5e1c8, x=0, y=0, w=1276, h=12, 
    format=2, mask=4294967295, dst=0x9f59e08 "") at sna_accel.c:11449
#9  0x081bc44f in ?? ()
#10 0x08115ed4 in ?? ()
#11 0x08072ff5 in ?? ()
#12 0x08076195 in ?? ()
#13 0x0806439a in ?? ()
#14 0x41e616b3 in __libc_start_main () from /lib/libc.so.6
#15 0x080646c9 in _start ()
Comment 29 Chris Wilson 2012-03-14 12:20:30 UTC
Hmm, that's scary, that's one of the most widely checked invariants. Is it consistently crashing in the same spot?
Comment 30 Clemens Eisserer 2012-03-14 12:32:43 UTC
I've only had it crash once with a debugger attached; I'll give it another try...
Comment 31 Clemens Eisserer 2012-03-14 13:02:19 UTC
another one:

(gdb) bt
....
#4  0x41e70757 in __assert_fail () from /lib/libc.so.6
#5  0x001ff5fe in sna_accel_inactive (sna=0x8a28260) at sna_accel.c:11830
#6  sna_accel_block_handler (sna=0x8a28260) at sna_accel.c:12059
#7  0x0020e57a in sna_block_handler (i=0, data=0x8a28260, timeout=0xbf9e866c, 
    read_mask=0x8235c20) at sna_driver.c:593
#8  0x0807a488 in BlockHandler ()
#9  0x080a58f1 in WaitForSomething ()
....
Comment 32 Chris Wilson 2012-03-14 13:22:29 UTC
That assertion is much easier to explain:

commit d0e05b4294b2f150a41dd95d52c2e6ee8479283d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Mar 14 20:19:30 2012 +0000

    sna: Don't mark cached upload buffers for inactivity expiration
    
    As these do not follow the normal rules of damage tracking, we have to
    be careful not to force migration.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=42426
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Please keep the debugger running ;)
Comment 33 Clemens Eisserer 2012-03-14 13:42:03 UTC
Now I hit an assert at X startup:

#4  0x41e70757 in __assert_fail () from /lib/libc.so.6
#5  0x001f7455 in kgem_bo_reference (bo=<optimized out>) at kgem.h:273
#6  0x002278ae in kgem_bo_reference (bo=0x90300e0) at kgem.h:273
#7  kgem_create_linear (kgem=0x8fd95d8, size=1, flags=<optimized out>)
    at kgem.c:2279
#8  0x0022984c in _kgem_submit (kgem=0x8fd95d8) at kgem.c:1758
#9  0x0024f353 in kgem_bo_submit (bo=0x900a258, kgem=0x8fd95d8) at kgem.h:255
#10 kgem_bo_flush (bo=0x900a258, kgem=0x8fd95d8) at kgem.h:261
#11 sna_accel_flush (sna=0x8fd9498) at sna_accel.c:11775
#12 sna_accel_block_handler (sna=0x8fd9498) at sna_accel.c:12052
#13 0x0025e8fa in sna_block_handler (i=0, data=0x8fd9498, timeout=0xbfc8e65c, 
    read_mask=0x8235c20) at sna_driver.c:593
#14 0x0807a488 in BlockHandler ()
#15 0x080a58f1 in WaitForSomething ()
Comment 34 Chris Wilson 2012-03-14 13:54:16 UTC
Apologies, I pushed an obviously correct patch after an earlier version failed at the first hurdle!
Comment 35 Clemens Eisserer 2012-03-14 14:03:28 UTC
Now I get a SIGSEGV when using Skype:

#0  0x41f8786c in __memset_sse2 () from /lib/libc.so.6
#1  0x00256192 in trapezoids_fallback (op=5 '\005', src=0x8a37890,
    dst=0x8a37ce8, maskFormat=0x8528460, xSrc=2, ySrc=0, ntrap=9,
    traps=0x887e344) at sna_trapezoids.c:2473
#2  0x0025ca51 in sna_composite_trapezoids (op=5 '\005', src=0x8a37890,
    dst=0x8a37ce8, maskFormat=0x8528460, xSrc=2, ySrc=0, ntrap=9,
    traps=0x887e344) at sna_trapezoids.c:4542
#3  0x08149b26 in CompositeTrapezoids ()
Comment 36 Chris Wilson 2012-03-14 14:31:44 UTC
Looking on the bright side, at least you're hitting these code paths written especially for you ;)

commit 6890592cd2b2d6f0d06c530f5e770fdc98577d4f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Mar 14 21:30:13 2012 +0000

    sna/traps: Explicitly create an unattach pixmap for fallback
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=42426
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 37 Clemens Eisserer 2012-03-14 15:56:49 UTC
Something different, but I am too lazy to open a new report.
I got this one when resizing Thunar (Xfce's file manager) to more than 2048 px:

#0  0x00c15416 in __kernel_vsyscall ()
#1  0x41e7798f in raise () from /lib/libc.so.6
#2  0x41e792d5 in abort () from /lib/libc.so.6
#3  0x41e706a5 in __assert_fail_base () from /lib/libc.so.6
#4  0x41e70757 in __assert_fail () from /lib/libc.so.6
#5  0x002ca797 in __sna_damage_subtract (region=0xbf9e0c28, damage=0x97ddfa0)
    at sna_damage.c:1013
#6  _sna_damage_subtract (damage=0x97ddfa0, region=0xbf9e0c28)
    at sna_damage.c:1084
#7  0x002af9a8 in sna_damage_subtract (region=0xbf9e0c28, damage=0x94bbfec)
    at sna_damage.h:155
#8  sna_drawable_move_region_to_cpu (drawable=0x98c5950, region=0xbf9e0c28, 
    flags=5) at sna_accel.c:1422
#9  0x002b82aa in sna_poly_fill_rect (draw=0x98c5950, gc=0x94e1fd0, n=1, 
    rect=0xa0a7330) at sna_accel.c:9935
#10 0x08157fbb in ?? ()
#11 0x081ace3e in miPaintWindow ()
#12 0x081acff1 in miWindowExposures ()
#13 0x080e19c8 in ?? ()
#14 0x081c4b7b in miHandleValidateExposures ()
#15 0x080a1d48 in UnmapWindow ()

(I also got screen corruption which didn't show up on screenshots)
Comment 38 Chris Wilson 2012-03-14 16:52:56 UTC
It would be too convenient if an assertion in the damage code was behind those damage related symptoms. Those assertions are a bit stale, and I've updated them to match the current reality a bit better, though I think they are just the tip of the iceberg.
Comment 39 Chris Wilson 2012-03-24 15:19:37 UTC
Clemens, now that I have the 2048+ pixel bug, are you happy the original bug with firefox seems resolved (even if I have no idea just what fixed it!)?
Comment 40 Clemens Eisserer 2012-03-25 04:08:53 UTC
Unfortunately, the original issue is still there on my machine.
I recorded a short video (sorry about the music); maybe it helps to understand what's going on: http://www.youtube.com/watch?v=VW5uGkXq76c
Comment 41 cyshei 2012-04-14 08:51:31 UTC
Created attachment 59978
Firefox artifacts w/SNA

I also seem to be hitting this, with xorg-server 1.12.0, mesa 8.0.1, libdrm 2.4.33, and xf86-video-intel 2.18.0.  I tried installing the latest version of xf86-video-intel from git, but that somehow segfaulted X on startup.

Disabling SNA fixes the problem, which is what I am currently doing.
Comment 42 Chris Wilson 2012-04-14 08:59:12 UTC
(In reply to comment #41)
> Created attachment 59978
> Firefox artifacts w/SNA
> 
> I also seem to be hitting this, with xorg-server 1.12.0, mesa 8.0.1, libdrm
> 2.4.33, and xf86-video-intel 2.18.0.  I tried installing the latest version of
> xf86-video-intel from git, but that somehow segfaulted X on startup.

That corruption was a different bug, already fixed. Show me the segfault and I'll tell you what you did wrong ;-)
Comment 43 cyshei 2012-04-14 09:12:47 UTC
(In reply to comment #42)
> (In reply to comment #41)
> > Created attachment 59978
> > Firefox artifacts w/SNA
> > 
> > I also seem to be hitting this, with xorg-server 1.12.0, mesa 8.0.1, libdrm
> > 2.4.33, and xf86-video-intel 2.18.0.  I tried installing the latest version of
> > xf86-video-intel from git, but that somehow segfaulted X on startup.
> 
> That corruption was a different bug, already fixed. Show me the segfault and
> I'll tell you what you did wrong ;-)

Great!  I just tried again and got things working, so I suspect that it was just the particular version of the tree that I checked out a couple of days ago (same ebuild in Gentoo, so absolutely no difference in build steps).  Things are looking great with SNA enabled now, thanks! :)
Comment 44 Chris Wilson 2012-05-24 09:00:49 UTC
I've honestly had no insights into this bug in the past 6 months (weep). Just on the off-chance, is it still present?
Comment 45 Chris Wilson 2012-07-02 11:41:25 UTC
Clemens, are you able to reproduce this one on your i5 setup?
Comment 46 Clemens Eisserer 2012-07-02 11:49:25 UTC
Yes, still happens on my i5 with firefox 13.0.1
Comment 47 Chris Wilson 2012-07-02 11:56:03 UTC
At least that implies the bug is in the generic handlers, and also that it is not being invoked as a workaround for the 2048-pixel gen3 limitations (unless you are only seeing it on extremely large images, more than 8192 pixels high or wide).
Comment 48 Clemens Eisserer 2012-07-02 12:48:51 UTC
No, it happens with the sample image linked in comment #4, which is just 732px×696px - in case the browser window is smaller than the image and the image isn't cached.
Comment 49 Clemens Eisserer 2012-08-13 21:25:33 UTC
It also happens on the Sandy Bridge-powered notebook I use at work.
Comment 50 Chris Wilson 2012-08-13 21:36:19 UTC
I too have seen it once, but couldn't repeat it. Not too sure where even to suggest hunting for it (somewhere in src/sna/sna_render.c, that's the only sure thing), so I remain bemused until I can reproduce it reliably.
Comment 51 Chris Wilson 2012-10-21 18:07:20 UTC
Clemens, have you seen this recently at all? No explanation yet as it never seems to happen when I go hunting for it.
Comment 52 Clemens Eisserer 2012-10-25 16:26:07 UTC
Unfortunately I still experience this issue quite frequently.
Comment 53 Chris Wilson 2012-12-12 16:15:16 UTC
Not seen this at all since the last update. :| Still perplexed...
Comment 54 Clemens Eisserer 2012-12-13 17:59:02 UTC
Just experienced it with Firefox 17.0.1 (official build) and xf86-video-intel 2.20.15.
Comment 55 Jan Alexander Steffens (heftig) 2012-12-30 10:01:59 UTC
Thinkpad X220 (SNB), Linux 3.7.1, intel 2.20.17 (SNA), Firefox 19.0a2

Also seeing the corruption from screenshot #1. Happens while Firefox is loading an image. Once the image is completely loaded, Firefox immediately repaints it completely, without error.

I think this happens when Firefox paints multiple lines in a single cycle, in which case the first line painted is erroneously repeated for the rest of them. To reproduce, maybe try loading an image over a slow network connection with a lot of jitter.

The bug is visible independent of activated Firefox layer acceleration.
Comment 56 Jan Alexander Steffens (heftig) 2013-04-23 21:21:15 UTC
Still happening here:
Thinkpad X220 (SNB), Linux 3.8.8, intel 2.21.6 (SNA), Firefox 22.0a2.

Firefox was built without system-pixman or system-cairo. Reverting Mozilla's avoid-extend-none.patch to Cairo doesn't do anything.

The problem doesn't show up with the first three screenshots attached to this bug (attachments 52948, 58392 and 58393), but it does appear while loading the last screenshot (attachment 59978). Maybe it depends on the aspect ratio of the image?
Comment 57 Chris Wilson 2013-05-09 13:08:38 UTC
How about now?

commit 2217f6356b53263b6ce8f92b5c29c0614d4ef2a5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu May 9 13:46:11 2013 +0100

    sna/trapezoids: Fix the determination of the trapezoid origin
    
    "src-x and src-y register the pattern to
    the floor of the top x and y coordinate of the left edge of the
    first trapezoid,"
    
    Bugzilla: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1178020
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
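The Render protocol rule quoted in this commit can be sketched as follows. This is a minimal illustration, not the code from the commit: the struct and function names are hypothetical stand-ins for the protocol's 16.16 fixed-point line type, and "top" is interpreted here as the endpoint of the left edge with the smaller y. The pattern origin is the floor of that point, so destination pixel (origin_x, origin_y) samples source pixel (xSrc, ySrc).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of Render's 16.16 fixed-point point/line types. */
typedef int32_t fixed_16_16;
struct point_fixed { fixed_16_16 x, y; };
struct line_fixed { struct point_fixed p1, p2; };

/* Floor of a 16.16 value; correct for negatives too (C division
 * truncates toward zero, and >> on a negative int is implementation-defined). */
static int fixed_floor(fixed_16_16 v)
{
    int q = v / 65536;
    if (v % 65536 < 0)
        q -= 1;
    return q;
}

/* Pattern origin: floor of the top endpoint of the first trapezoid's
 * left edge ("top" assumed to mean the endpoint with the smaller y). */
static void trapezoid_origin(const struct line_fixed *left, int *x, int *y)
{
    assert(left && x && y);
    const struct point_fixed *top =
        left->p1.y <= left->p2.y ? &left->p1 : &left->p2;
    *x = fixed_floor(top->x);
    *y = fixed_floor(top->y);
}

/* Offset that maps a destination coordinate to its source coordinate:
 * src = dst + (dx, dy), so (origin_x, origin_y) maps to (src_x, src_y). */
static void source_offset(int src_x, int src_y, int origin_x, int origin_y,
                          int *dx, int *dy)
{
    *dx = src_x - origin_x;
    *dy = src_y - origin_y;
}
```

Flooring rather than truncating matters for edges that start at negative coordinates: a top point at y = -1/65536 registers to row -1, not row 0.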
Comment 58 Jan Alexander Steffens (heftig) 2013-05-09 13:44:55 UTC
Seems to be fine now. :)

Thanks!


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.