Summary: | [ilk] Corrupted rendering of page previews in Firefox with >xf86-video-intel-2.20.18 | ||
---|---|---|---|
Product: | xorg | Reporter: | Coacher <itumaykin+freedesktop> |
Component: | Driver/intel | Assignee: | Chris Wilson <chris> |
Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | ccr |
Version: | unspecified | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
See Also: | https://launchpad.net/bugs/1189850 | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Created attachment 75707
glxinfo -l -t
Created attachment 75708
Screenshot with example of corrupted rendering
I still haven't been able to reproduce this one yet. Do you have a foolproof (and remember just how big a fool I am!) recipe?

This issue happens occasionally, but I don't have a 100% reproducible way to show it. One of the most successful ways to reproduce it is:
1. Fill all `speed dial` buttons (previews on about:newtab) in Firefox with something reasonably heavy, not plain-text pages (on my machine it is a couple of youtube pages, the web interface to SAGE, a couple of redmines, etc.).
2. Close all tabs except one; this last tab should be the about:newtab page.
3. Middle-click all the previews one by one as fast as you can, so the pages begin to load in the background.
4. Now hit Ctrl+W until you have closed everything, including the about:newtab page where you started. You shouldn't wait until all the pages you opened in step 3 are loaded.
5. Now open about:newtab again; there is a good chance that some of the previews will be corrupted. Sometimes there is no corruption, but a preview is displayed in the wrong position; for example, two different sites share the same preview image.

Another way to reproduce:
1. Fill at least one `speed dial` button (preview on about:newtab) in Firefox with any kind of preview, just any site you want.
2. Close all tabs except one; this last tab should be the about:newtab page.
3. Go to http://www.dreamworksanimation.com/ and add it to bookmarks, then close the tab (sorry, bookmarking is the only way I know to make a specific site show up in previews).
4. Open about:newtab again and remove any preview image from it by pressing [X].
5. Open bookmarks and drag the dreamworksanimation bookmark you made in step 3 into the place freed in step 4.
6. Now visit http://www.dreamworksanimation.com/ so Firefox will generate a preview.
7. Close the tab and open about:newtab again. The preview for dreamworksanimation should be corrupted.

Sorry if the descriptions are a bit messy.
Also I don't have any other issues with rendering sites in firefox, just issues with rendering previews. I wish there was an easier way to reproduce it.

I did a git bisect between 2.20.18 and 2.20.19 and the result is this commit:

dc643ef753bcfb69685f1eb10828d0c8f830c30e is the first bad commit
commit dc643ef753bcfb69685f1eb10828d0c8f830c30e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Jan 17 12:27:55 2013 +0000

    sna: Apply read-only synchronization hints for move-to-cpu

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

:040000 040000 0f53950ba9a9756a39722f12c322c2d629c1a2a4 d5ff0a7307cc718ee94c78ee2fb1c9bf6158ed91 M src

As this bug is not 100% reproducible, it could have slipped out of my sight during some bisect runs; however, it is something to start with. What do you think? Could this commit lead to the rendering problems I have?

There was a related bug, fixed with

commit 19bd005056a2083de64753681b96716996e4237d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Feb 22 12:05:04 2013 +0000

    sna: Avoid migrating and making the GPU bo busy prior to mmapping it

    References: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1131134
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

that I thought was already in 2.21.3 and so you had tested it. It is actually in master, so can you try compiling from git and checking if that fixes the issue?

I'll admit to not fully explaining how that prevented the corruption, as the damage should have been migrated and then the kernel should have stalled upon the read... But it did have an effect and prevented a similar issue that bisected to the same commit.

(In reply to comment #6)
> It is actually in master, so can you try compiling from git and checking if
> that fixes the issue?

I've just tested master and the issue is still there.
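For readers following the bisect result, here is a minimal sketch of what a "read-only synchronization hint" could look like, assuming it comes down to asking for the buffer in the CPU read domain without also claiming the write domain. The struct, the constant and both function names are hypothetical; they only echo the kgem_bo_sync__cpu() / kgem_bo_sync__cpu_full() naming used in this thread and are not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a "CPU domain" bit; hypothetical, not a real UAPI constant. */
#define TOY_DOMAIN_CPU 0x1u

/* Hypothetical request: which domains the caller wants the buffer moved to. */
struct toy_set_domain {
    uint32_t handle;
    uint32_t read_domains;
    uint32_t write_domain;
};

/* Unhinted sync: always claims the buffer for both CPU reads and writes,
 * so every outstanding GPU access has to finish first. */
struct toy_set_domain toy_sync_cpu(uint32_t handle)
{
    assert(handle != 0); /* this toy treats handle 0 as invalid */
    struct toy_set_domain req = { handle, TOY_DOMAIN_CPU, TOY_DOMAIN_CPU };
    return req;
}

/* Hinted sync: a caller that only reads leaves write_domain empty, which
 * allows waiting for outstanding GPU writes only, not for GPU reads. */
struct toy_set_domain toy_sync_cpu_full(uint32_t handle, bool write)
{
    assert(handle != 0);
    struct toy_set_domain req = {
        handle, TOY_DOMAIN_CPU, write ? TOY_DOMAIN_CPU : 0u
    };
    return req;
}
```

In this model toy_sync_cpu_full(h, true) builds the same request as toy_sync_cpu(h); only the read-only case differs, which is the case the bisected commit introduced.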
Created attachment 75871
xf86-video-intel-2.21.3-revert-dc643ef753bcfb69685f1eb10828d0c8f830c30e.patch

With this patch applied on top of xf86-video-intel-2.21.3 the problem is gone (at least I tried hard to reproduce it, but failed). This patch simply reverts commit dc643ef753bcfb69685f1eb10828d0c8f830c30e mentioned above.

Can you try converting each of those kgem_bo_sync__cpu_full() back to kgem_bo_sync__cpu() individually and see if we can narrow it down to one particular path?

Created attachment 75892
Force CPU synchronisation after writes

Another test to try.

(In reply to comment #11)
> Created attachment 75892
> Force CPU synchronisation after writes
>
> Another test to try.

With this patch applied on top of 2.21.3 the problem seems to be fixed.

Created attachment 75920
kgem_bo_sync__cpu_full-revert-bad.patch

(In reply to comment #10)
> Can you try converting each of those kgem_bo_sync__cpu_full() back to
> kgem_bo_sync__cpu() individually and see if we can narrow it down to one
> particular path?

With this patch on top of 2.21.3 I hit the bug almost immediately. In this case I left the first kgem_bo_sync__cpu_full() as is and converted only the second one.

Created attachment 75921
kgem_bo_sync__cpu_full-revert-good.patch

(In reply to comment #10)
> Can you try converting each of those kgem_bo_sync__cpu_full() back to
> kgem_bo_sync__cpu() individually and see if we can narrow it down to one
> particular path?

With this patch on top of 2.21.3 I was unable to reproduce the bug anymore. In this case I converted the first kgem_bo_sync__cpu_full() and left the second one as is.

I've looked through all callers to see if I can find one that missed the MOVE_WRITE, to no avail. I've double checked the kernel to see if there is a loophole, again to no avail. So I'm a little bit lost as to where the missed synchronisation is coming from, and I haven't yet thought of a good test to force/catch an error.

In the meantime, I've applied one minor tweak to xf86-video-intel.git,

commit 60ec35b8d25ecfabf1744ea7bc81109d7f2a90e2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Mar 5 11:14:37 2013 +0000

    sna: Be explicit when checking for an idle bo after CPU synchronisation

Do you mind giving that a quick test?

Also one other test is to try with the drm-intel-next kernel.

(In reply to comment #15)
> I've looked through all callers to see if I can find one that missed the
> MOVE_WRITE to no avail. I've double checked the kernel to see if there is a
> loop hole, again to no avail. So I'm a little bit lost to see where the
> missed synchronisation is coming from, and I haven't yet thought of a good
> test to force/catch an error.
>
> In the meantime, I've applied one minor tweak to xf86-video-intel.git,
>
> commit 60ec35b8d25ecfabf1744ea7bc81109d7f2a90e2
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Tue Mar 5 11:14:37 2013 +0000
>
> sna: Be explicit when checking for an idle bo after CPU synchronisation
>
> Do you mind giving that a quick test?

OK, I'll test it later today.

(In reply to comment #16)
> Also one other test is to try with the drm-intel-next kernel.

Could you please give me a quick link to their git repo? Would 3.9-rc1 be enough?

Our upstream is http://cgit.freedesktop.org/~danvet/drm-intel

If you are using ubuntu, you can find pre-packaged kernels here http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/current/

Created attachment 76040
Disable read-read optimisations

And one last request, can you please test this patch as a temporary solution?

(In reply to comment #20)
> Created attachment 76040
> Disable read-read optimisations
>
> And one last request, can you please test that this patch as a temporary
> solution?

This patch also fixes the issue. It was tested on the 3.7.10 kernel, as were all previous patches. Now going to try with drm-intel-next.

(In reply to comment #21)
> (In reply to comment #20)
> > Created attachment 76040
> > Disable read-read optimisations
> >
> > And one last request, can you please test that this patch as a temporary
> > solution?
>
> This patch also fixes the issue. It was tested on 3.7.10 kernel as well as
> all previous patches. Now gonna try with drm-intel-next.

Thanks. In the meantime, I'm going to push the temporary workaround - obviously I still hope to find the real bug.

(In reply to comment #16)
> Also one other test is to try with the drm-intel-next kernel.

Ok, just tried out today's drm-intel-next kernel and was unable to reproduce this bug anymore. This sounds like good news.

(In reply to comment #23)
> (In reply to comment #16)
> > Also one other test is to try with the drm-intel-next kernel.
>
> Ok, just tried out today's drm-intel-next kernel and was unable to reproduce
> this bug anymore. This sounds like good news.

Oh, wait, I forgot to rebuild xf86-video-intel without the patch. Sorry. Will try vanilla now.

/o\ Can you confirm that result with vanilla xf86-video-intel?

(In reply to comment #25)
> /o\ Can you confirm that result with vanilla xf86-video-intel?

Sorry to disappoint you, but the issue is reproducible with vanilla xf86-video-intel and drm-intel-next.

(In reply to comment #22)
> Thanks. In the meantime, I'm going to push the temporary workaround -
> obviously I still hope to find the real bug.

Is there a way I can help? Attach some debug info or test something?

(In reply to comment #27)
> (In reply to comment #22)
> > Thanks. In the meantime, I'm going to push the temporary workaround -
> > obviously I still hope to find the real bug.
>
> Is there a way I can help? Attach some debug info or test something?
If you change the define in src/sna/sna_accel.c:

diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
index ae6d3c1..5edad51 100644
--- a/src/sna/sna_accel.c
+++ b/src/sna/sna_accel.c
@@ -57,7 +57,7 @@
 #define FORCE_INPLACE 0
 #define FORCE_FALLBACK 0
 #define FORCE_FLUSH 0
-#define FORCE_FULL_SYNC 1 /* https://bugs.freedesktop.org/show_bug.cgi?id=61628 */
+#define FORCE_FULL_SYNC 0

 #define DEFAULT_TILING I915_TILING_X

that restores the buggy behaviour. If you can, keep running with that patch and with --enable-debug to check if any assertions are triggered and see how things progress.

(In reply to comment #28)
> If you can keep running with that patch
> and with --enable-debug to check if any assertions are triggered and see how
> things progress.

OK, I did what you said, powered on and started to watch Xorg.0.log. The first thing I did was to open Firefox and trigger this issue several times: no output. Then I tried to simulate a typical workflow, i.e. opened programs I use on a daily basis and did some things inside them like checking mail and browsing a couple of webpages: still no output. Then I decided to close them and return to Firefox, triggered this issue several more times, opened a couple of heavy tabs with flash, and suddenly caught this:

(EE) [mi] EQ overflowing. Additional events will be discarded until existing events are processed.
(EE)
(EE) Backtrace:
(EE) 0: /usr/bin/X (xorg_backtrace+0x34) [0x5969b4]
(EE) 1: /usr/bin/X (mieqEnqueue+0x263) [0x5776c3]
(EE) 2: /usr/bin/X (0x400000+0x4fcd4) [0x44fcd4]
(EE) 3: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f236e1d0000+0x6208) [0x7f236e1d6208]
(EE) 4: /usr/bin/X (0x400000+0x7a477) [0x47a477]
(EE) 5: /usr/bin/X (0x400000+0xa5527) [0x4a5527]
(EE) 6: /lib64/libpthread.so.0 (0x3a9c400000+0x10bf0) [0x3a9c410bf0]
(EE) 7: /lib64/libc.so.6 (ioctl+0x7) [0x3a9bce3437]
(EE) 8: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x3fd3c040d8]
(EE) 9: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f236fd9c000+0x1c1a0) [0x7f236fdb81a0]
(EE) 10: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f236fd9c000+0x1d9f7) [0x7f236fdb99f7]
(EE) 11: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f236fd9c000+0x4fe3a) [0x7f236fdebe3a]
(EE) 12: /usr/bin/X (BlockHandler+0x44) [0x43f224]
(EE) 13: /usr/bin/X (WaitForSomething+0x11d) [0x593e7d]
(EE) 14: /usr/bin/X (0x400000+0x3ade2) [0x43ade2]
(EE) 15: /usr/bin/X (0x400000+0x29b5a) [0x429b5a]
(EE) 16: /lib64/libc.so.6 (__libc_start_main+0xed) [0x3a9bc2460d]
(EE) 17: /usr/bin/X (0x400000+0x29eb1) [0x429eb1]
(EE)
(EE) [mi] These backtraces from mieqEnqueue may point to a culprit higher up the stack.
(EE) [mi] mieq is *NOT* the cause. It is a victim.
[ 8739.251] [mi] Increasing EQ size to 512 to prevent dropped events.
[ 8739.251] [mi] EQ processing has resumed after 64 dropped events.
[ 8739.251] [mi] This may be caused my a misbehaving driver monopolizing the server's resources.

After that I tried to reproduce this trace by opening the same tabs and triggering the issue again and again, but without any luck. Is this stack trace useful in any way?

Hmm, I expect dmesg to contain a GPU hang and /sys/kernel/debug/0/i915_error_state to be populated, mind attaching it?

(In reply to comment #30)
> Hmm, I expect dmesg to contain a GPU hang and
> /sys/kernel/debug/0/i915_error_state to be populated, mind attaching it?

Too bad, I turned off my machine after I caught that stack trace, so I can't give you the dump of i915_error_state, but I was checking both dmesg and xsession-errors and there was nothing unusual and no sign of error output from i915. I'll try to catch it again, and if I do I'll attach dmesg and the dump of i915_error_state here.

*** Bug 61610 has been marked as a duplicate of this bug. ***

(In reply to comment #28)
> If you change the define in src/sna/sna_accel.c:
>
> diff --git a/src/sna/sna_accel.c b/src/sna/sna_accel.c
> index ae6d3c1..5edad51 100644
> --- a/src/sna/sna_accel.c
> +++ b/src/sna/sna_accel.c
> @@ -57,7 +57,7 @@
> #define FORCE_INPLACE 0
> #define FORCE_FALLBACK 0
> #define FORCE_FLUSH 0
> -#define FORCE_FULL_SYNC 1 /* https://bugs.freedesktop.org/show_bug.cgi?id=61628 */
> +#define FORCE_FULL_SYNC 0
>
> #define DEFAULT_TILING I915_TILING_X
>
> that restores the buggy behaviour. If you can keep running with that patch
> and with --enable-debug to check if any assertions are triggered and see how
> things progress.

I've been running this way ever since you asked me, but that stack trace was the only one I was able to trigger, though improper rendering happened a lot. I am positive that when I caught that trace there were no errors in dmesg. Now 2.21.4 is out and I will continue trying to catch something, though since it happens only in firefox, maybe the issue is somewhere else?

What versions of firefox, cairo and gtk do you have?

Also I've noticed this message in .xsession-errors whenever I move previews in Firefox:
(firefox:3574): GdkPixbuf-CRITICAL **: gdk_pixbuf_new: assertion `width > 0' failed
This happens both with FORCE_FULL_SYNC 0 and 1.

I've been primarily using iceweasel (based on ff10) with the system cairo as that is many times faster for gfx. But I've also been using the bloated ff from ubuntu and fedora on different systems (and they use the ancient cairo embedded into firefox).
There are a lot of differences in cairo between those versions, so it would not surprise me if it was a bug specific to an older cairo. But I'd hoped to have seen it by now as well. :|

I've just tested binary versions of Firefox from their site. I tried the latest versions of the 16, 17, 18 and 19 branches and I was able to trigger the issue in all of them. Will play with cairo versions now; my current cairo is 1.10.2 with some distro patches on top.

Just note well that all firefox post version-10 use their builtin version of cairo. In order to use system cairo, firefox needs a patch to remove its reliance upon non-upstreamed API.

Tested firefox-19.0.2 with all available versions of cairo from repos: 1.10.2, 1.12.8, 1.12.10, 1.12.12. The issue is reproducible with all versions.

(In reply to comment #36)
> Just note well that all firefox post version-10 use their builtin version of
> cairo. In order to use system cairo, firefox needs a patch to remove its
> reliance upon non-upstreamed API.

Thanks for the info, though I am using Gentoo with Firefox built from sources on my machine, and it is distro-patched to link against the system-wide cairo, so it's fine.

Hmmm, that's news to me. Do you have a link to the patches they apply against firefox?

Or a simple test is something like: http://ie.microsoft.com/testdrive/Performance/ParticleAcceleration/ which should be CPU bound in Xorg and not firefox.

Also I've noticed that the "disable read-read optimisations" patch practically does the same as converting kgem_bo_sync__cpu_full back to kgem_bo_sync__cpu (I may be wrong here, though it looks this way to me). I will not question this as you are the developer and know best, though as the tests have shown, only one particular branch of kgem_bo_sync__cpu_full triggers this issue; see kgem_bo_sync__cpu_full-revert-bad.patch. Maybe you could add some asserts in that branch? I will apply them and give you some more info.

(In reply to comment #38)
> Hmmm, that's news to me. Do you have a link to the patches they apply
> against firefox?

http://mirror.yandex.ru/gentoo-distfiles/distfiles/firefox-19.0-patches-0.3.tar.xz

> Or a simple test is something like:
> http://ie.microsoft.com/testdrive/Performance/ParticleAcceleration/ which
> should be CPU bound in Xorg and not firefox.

Well, I've visited this link and see some spherical thingy made of particles. What should I check?

(In reply to comment #40)
> Well, I've visited this link and see some spherical thingy made of
> particles. What should I check?

Just look at top. For this particular benchmark, it should be ratelimited by the Xorg process, not firefox. Or better, look at sudo perf top: if firefox is hitting pixman functions, it is a bad firefox.

Seems like gentoo has the right patch though, it should be fine. Now if only the other distros also used that patch :(

(In reply to comment #42)
> Seems like gentoo has the right patch though, it should be fine. Now if only
> the other distros also used that patch :(

So, should I check top or not? I am a bit confused about what exactly "ratelimited by the Xorg process not firefox" means. I am building perf right now though.

(In reply to comment #41)
> (In reply to comment #40)
> > Well, I've visited this link and see some spherical thingy made of
> > particles. What should I check?
>
> Just look at top; For this particular benchmark, it should be ratelimited by
> the Xorg process not firefox - or better look at sudo perf top, if firefox
> is hitting pixman functions, it is a bad firefox.

When running this demo in firefox, `# perf top` says "42% libpixman-1.so.0.29.2" and this line sits at the top of the list. Does that mean bad firefox? :(

Only if that pixman time is inside firefox and not Xorg... Have gentoo also disabled server-side gradients in cairo?

(In reply to comment #45)
> Only if that pixman time is inside firefox and not Xorg...

I am not familiar with this tool. How do I check this?

> Have gentoo also
> disabled server-side gradients in cairo?

Yes, part of the changelog:

10 Sep 2010; Samuli Suominen <ssuominen@gentoo.org> +cairo-1.10.0-r2.ebuild, +files/cairo-1.10.0-buggy_gradients.patch:
Do not use server-side gradients. It hurts performance, and causes bad rendering on at least nvidia. Bug 336696.

And this patch is still applied on top of the cairo version I am running now. The maintainers added an option to disable it in the latest version in the tree, though. It is enabled by default, so I also tested this version with gradients disabled. Should I check without it?

http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/x11-libs/cairo/files/cairo-1.10.0-buggy_gradients.patch?revision=1.4&view=markup
Link to the mentioned gradients patch.

Yeah, that gradient patch dramatically hurts performance on Nvidia and Intel systems, whilst having little impact on EXA systems. Kill that patch with fire.

(In reply to comment #48)
> Yeah, that gradient patch dramatically hurts performance on Nvidia and Intel
> systems, whilst having little impact on EXA systems. Kill that patch with
> fire.

Tested without this patch, but the issue is still present.

What do you think about comment #39? And how can I check if the pixman time shown in `perf top` belongs to Xorg or Firefox? (see comment #46)

If you have the ncurses gui, the second column shows you the "comm" i.e. the process name. Similarly in the perf report.

I'm trying to install gentoo to see if that helps (the prospect of a modern ff using system cairo is very appealing).

(In reply to comment #51)
> If you have the ncurses gui, the second column shows you the "comm" i.e. the
> process name. Similarly in the perf report.

Oh, finally, I was able to get it. Yes, that pixman rendering belongs to the Firefox process, not Xorg. Though there is somehow no "comm" column in my perf top, the ncurses gui allows zooming into threads, and that's the solution.

> I'm trying to install gentoo to see if that helps (the prospect of a modern
> ff using system cairo is very appealing).

That's nice to hear :) We have a handbook which covers most aspects of installation, but if you get stuck somewhere feel free to send me an e-mail, I'll be glad to help you.

Reading http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/www-client/firefox/firefox-19.0.2.ebuild?view=markup it seems that the use of system-cairo has been dropped. Which is a shame. On the positive news though the latest unstable cairo has dropped the buggy gradients patch (unless legacy-drivers is set).

(In reply to comment #53)
> Reading
> http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/www-client/firefox/firefox-19.0.2.ebuild?view=markup
> it seems that the use of system-cairo has been dropped. Which is a shame.

Well, you've seen the patches applied on top of firefox, and support for system cairo is there. Out of curiosity I ran some initial steps of the firefox build and here's a slightly filtered result:

grep cairo /var/tmp/portage/www-client/firefox-19.0.2/temp/build.log
* 6009_fix_system_cairo_support.patch ...
--enable-system-cairo system_libs
--enable-default-toolkit=cairo-gtk2 mozilla.org default
--enable-system-cairo
--enable-default-toolkit=cairo-gtk2
checking for cairo >= 1.10... yes
checking CAIRO_CFLAGS... -I/usr/include/cairo -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libdrm -I/usr/include/libpng15
checking CAIRO_LIBS... -lcairo
checking for cairo-tee >= 1.10... yes
checking CAIRO_TEE_CFLAGS... -I/usr/include/cairo -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libdrm -I/usr/include/libpng15
checking CAIRO_TEE_LIBS... -lcairo
checking for cairo-xlib-xrender >= 1.10... yes
checking CAIRO_XRENDER_CFLAGS... -I/usr/include/cairo -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libdrm -I/usr/include/libpng15
checking CAIRO_XRENDER_LIBS... -lcairo -lXrender -lX11

and this is the output from the already built firefox I am running now:

ldd /usr/lib/firefox/libxul.so | grep cairo
libcairo.so.2 => /usr/lib64/libcairo.so.2 (0x00007f205d497000)
libpangocairo-1.0.so.0 => /usr/lib64/libpangocairo-1.0.so.0 (0x00007f2059cc2000)

So, system-wide cairo is enabled at build time and it is really there, as shown by ldd.

(In reply to comment #53)
> Reading
> http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/www-client/firefox/firefox-19.0.2.ebuild?view=markup
> it seems that the use of system-cairo has been dropped. Which is a shame.

You are not seeing things like "we're enabling system cairo here ..." directly in the ebuild because it is done inside mozcoreconf-2.eclass, which is inherited by mozconfig-3.eclass, which is inherited by the firefox ebuild. Inheriting an eclass can be thought of as a pretty close equivalent of using the #include directive in C.

(In reply to comment #53)
> Reading
> http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/www-client/firefox/firefox-19.0.2.ebuild?view=markup
> it seems that the use of system-cairo has been dropped. Which is a shame.

And the last one: you can find the sources of the eclasses in your $PORTDIR/eclass dir, which is most probably /usr/portage/eclass.

P.S. sorry for the burst of comments.

Ok, I have ff-19 built at last using gentoo ~amd64 on a lowly ilk. It seems to be doing the right thing regarding using system-cairo and server-side gradients. Next step is to piece together enough components to see if I can reproduce the bug.

(In reply to comment #57)
> Ok, I have ff-19 built at last using gentoo ~amd64 on a lowly ilk. It seems
> to be doing the right thing regarding using system-cairo and server-side
> gradients. Next step is to piece together enough components to see if I can
> reproduce the bug.

Ok, tell me what info I should provide and I'll post it. As a first step, my firefox and xf86-video-intel USE flags are:

x11-drivers/xf86-video-intel-2.21.4 was built with the following:
USE="dri sna udev xvmc -glamor -uxa"

www-client/firefox-19.0.2 was built with the following:
USE="alsa dbus gstreamer jit libnotify minimal (multilib) pgo system-jpeg wifi -bindist -custom-cflags -custom-optimization -debug (-selinux) -startup-notification -system-sqlite" ABI_X86="64" LINGUAS="ru -af -ak -ar -as -ast -be -bg -bn_BD -bn_IN -br -bs -ca -cs -csb -cy -da -de -el -en_GB -en_ZA -eo -es_AR -es_CL -es_ES -es_MX -et -eu -fa -fi -fr -fy_NL -ga_IE -gd -gl -gu_IN -he -hi_IN -hr -hu -hy_AM -id -is -it -ja -kk -km -kn -ko -ku -lg -lt -lv -mai -mk -ml -mr -nb_NO -nl -nn_NO -nso -or -pa_IN -pl -pt_BR -pt_PT -rm -ro -si -sk -sl -son -sq -sr -sv_SE -ta -ta_LK -te -th -tr -uk -vi -zh_CN -zh_TW -zu"
CFLAGS="-march=core2 -mtune=generic -pipe -mno-avx"
CXXFLAGS="-march=core2 -mtune=generic -pipe -mno-avx"

I was able to reproduce that stack trace from the Xorg log, and the intel driver is not the issue there at all. I found out that the cause is a fast-spinning mouse wheel. I have a mouse with a wheel which can be scrolled in a 'free roam' mode, without the 'clicks', you know. If I scroll too fast, that stack trace appears. As before, dmesg is clean of any i915 errors and no error state was caught. So that stack trace is not related to the bug at all.

I'm still using the optimized flushes on all of my machines and have yet to encounter corruption. :|

Well, I am still experiencing this issue even with the latest intel driver :( Are you running Gentoo now? What is your setup? Could you please give me the output of `emerge --info firefox` and `emerge --info xf86-video-intel`? I haven't tried Firefox 20 yet though. Could it be an issue in Firefox itself?
Same issue with firefox 20 and xf86-video-intel 2.21.5.

Hello. At last, there is some positive progress! I still see corrupted rendering of certain elements on some pages from time to time, but at least I haven't seen any completely corrupted previews for a while, as I did before. Portions of previews can be corrupted, but only those parts which are rendered corrupted while browsing. So now there are no previews consisting of complete garbage. (Both previews and pages are rendered via the same drawWindow function in firefox, as far as I can tell from the sources.)

Updates that introduced(?) these changes:
libdrm 2.4.43 -> 2.4.44
xorg-server 1.13.1 -> 1.13.4
GTK+ 2.24.16 -> 2.24.17
agg 2.5 -> 2.5-r2 (nothing big, the maintainer changed a couple of build options; added to the list because I use gnash in Firefox, which uses agg, so maybe somehow connected)

There were other updates, but these are the only changes that are possibly related to the effects I see. I was (and currently am) running xf86-video-intel-2.21.6 with FORCE_SYNC disabled all the time.

That's unexpected - those updates should have had no impact upon this issue. :|

(In reply to comment #64)
> That's unexpected - those updates should have had no impact upon this issue.
> :|

Nevertheless, the overall look and feel in firefox has improved somehow. Now I've updated mesa to 9.1.2 and the kernel to 3.9.0 and these positive effects are preserved. The situation is much better now than it was when I opened this bug: I don't have random huge screen corruptions in firefox, either in thumbnails or during normal browsing. Though I can still trigger this issue and get a corrupted page preview, it doesn't interfere with browsing. All other applications are unaffected. Since things are quite good now, maybe it is a good idea to re-enable those optimizations? What do you think?
It looks like I am the only one who has this issue :(

Ok, having made a new release, it is time to see if anyone else is seeing this bug:

commit 8e42637050275945200797538a34c13c90b295cc
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue May 21 11:13:03 2013 +0100

    sna: Re-enable read-read optimisations

(In reply to comment #66)
> Ok, having made a new release, it is time to see if anyone else is seeing
> this bug:
>
> commit 8e42637050275945200797538a34c13c90b295cc
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Tue May 21 11:13:03 2013 +0100
>
> sna: Re-enable read-read optimisations

Thank you. I'll update this bug with any new info if I notice any changes, bad or good.

(In reply to comment #68)
> It's back:
> https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1189850

Thanks for the link. I've tried today's xf86-video-intel git with the commit which is marked as a solution at the link you provided. I can confirm that I was unable to reproduce this issue, but I cannot say for sure, as with recent changes this bug appears much more rarely on my machine than before. It can reappear later, but I hope it won't. I'll provide any new info here if there is any.

commit 22fd5ca947b58901927d100d2b1aa0f1672b3435
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jun 28 16:54:08 2013 +0100

    drm/i915: Only clear write-domains after a successful wait-seqno

    In the introduction of the non-blocking wait, I cut'n'pasted the wait
    completion code from normal locked path. Unfortunately, this neglected
    that the normal path returned early if the wait returned early. The
    result is that read-only waits may return whilst the GPU is still
    writing to the bo.

    Fixes regression from
    commit 3236f57a0162391f84b93f39fc1882c49a8998c7 [v3.7]
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date: Fri Aug 24 09:35:09 2012 +0100

        drm/i915: Use a non-blocking wait for set-to-domain ioctl

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66163
    Cc: stable@vger.kernel.org
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

This bug just reappeared with xf86-video-intel-2.21.10. The next thing I am going to try is the commit you've posted above.

(In reply to comment #70)
> commit 22fd5ca947b58901927d100d2b1aa0f1672b3435
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Fri Jun 28 16:54:08 2013 +0100
>
> drm/i915: Only clear write-domains after a successful wait-seqno
>
> In the introduction of the non-blocking wait, I cut'n'pasted the wait
> completion code from normal locked path. Unfortunately, this neglected
> that the normal path returned early if the wait returned early. The
> result is that read-only waits may return whilst the GPU is still
> writing to the bo.
>
> Fixes regression from
> commit 3236f57a0162391f84b93f39fc1882c49a8998c7 [v3.7]
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Fri Aug 24 09:35:09 2012 +0100
>
> drm/i915: Use a non-blocking wait for set-to-domain ioctl
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66163
> Cc: stable@vger.kernel.org
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Yes, this commit fixes the issue for me (on a 3.10 kernel with only this patch). Thanks a lot for your help!
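The failure mode described in the kernel commit message above can be sketched with a toy model. This is only an illustration of the reasoning, under the assumption that a read-only sync skips waiting whenever no pending GPU write is recorded for the buffer; every name below is hypothetical and none of it is i915 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy buffer object; both fields are hypothetical bookkeeping. */
struct toy_bo {
    bool gpu_writing;      /* the GPU really still has a write in flight */
    bool write_domain_set; /* the record that says "a GPU write is pending" */
};

/* Toy wait: returns 0 once the GPU write has landed, or -1 when the wait
 * is cut short (for example by a signal) and the write is still running. */
int toy_wait(struct toy_bo *bo, bool interrupted)
{
    assert(bo != NULL);
    if (interrupted)
        return -1;
    bo->gpu_writing = false;
    return 0;
}

/* Buggy completion path: clears the record even when the wait failed. */
int toy_wait_buggy(struct toy_bo *bo, bool interrupted)
{
    int ret = toy_wait(bo, interrupted);
    bo->write_domain_set = false;
    return ret;
}

/* Fixed completion path: returns early on failure, so the record is
 * cleared only after a successful wait. */
int toy_wait_fixed(struct toy_bo *bo, bool interrupted)
{
    int ret = toy_wait(bo, interrupted);
    if (ret)
        return ret;
    bo->write_domain_set = false;
    return 0;
}

/* A later read-only sync that trusts the record would read stale data
 * exactly when the GPU is still writing but the record says otherwise. */
bool toy_read_sees_stale(const struct toy_bo *bo)
{
    assert(bo != NULL);
    return bo->gpu_writing && !bo->write_domain_set;
}
```

With an interrupted wait, the buggy path leaves the buffer in the state where toy_read_sees_stale() is true, while the fixed path does not. That would be consistent with the corruption being intermittent and only appearing once the driver started issuing read-only syncs.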
Created attachment 75706
lspci -vvv

Hello. Since I upgraded from version 2.20.18 of the intel driver, page previews in Firefox are rendered improperly (see attached screenshot). Tested versions of the intel driver are 2.20.{18,19} and 2.21.{0,2,3}; Firefox versions are 17.0-19.0. I don't think it is a Firefox issue, as it is completely gone when downgrading back to 2.20.18. My system is Gentoo amd64, currently with the latest Firefox and intel driver. My current kernel version is 3.8 and it is vanilla. I am using SNA acceleration. If there is any additional info that would be helpful, I am ready to provide it.