Related bugs filed earlier:
https://bugs.archlinux.org/task/36105
https://bugzilla.mozilla.org/show_bug.cgi?id=892567

Steps to reproduce:
A jpeg like http://blather.michaelwlucas.com/wp-content/uploads/2013/07/ao2e-index.jpg is first decoded correctly, and once decoding is done the bottom half of the image gets corrupted. In one test the corrupted part displayed small snapshots of the Firefox window. When I clicked reload twice the kernel crashed and reset the machine. The corruption happens with Firefox 22 to 25. With 24 and 25 I was able to reliably make it crash, while 22 at least doesn't crash the kernel.

A workaround is to either set Firefox's gfx.xrender.enabled to false or disable SNA in /etc/X11/xorg.conf.d.

SNA used to work without corrupted images or any other issues until a couple of months ago, so the bug must have slipped in with one of the kernel or xf86-video-intel releases of the last 3 months or so. Also of interest: with SNA enabled, and presumably caused by the same bug, Gtk+ 2.x widgets sporadically draw the inner part of button or scroll bar rectangles corrupted, but that never caused an instant crash with hardware reset. This looks like a critical memory corruption bug in the xorg or kernel code.

Software and hardware info:
* Core2Duo in Mac Mini 2,1 running in 32bit mode
* linux 3.10.2
* xf86-video-intel 2.21.11
* mesa 9.1.5
* Intel 945GM

Actual results:
Corrupted image and kernel crash with reset upon hitting reload button.

Expected results:
Image should be decoded and displayed correctly without resetting the machine.

From Xorg.log with SNA enabled:
[ 73.948] (==) Depth 24 pixmap format is 32 bpp
[ 73.948] (II) intel(0): SNA initialized with Alviso (gen3) backend
[ 73.948] (==) intel(0): Backing store disabled
[ 73.948] (==) intel(0): Silken mouse enabled
[ 73.948] (II) intel(0): HW Cursor enabled
[ 73.948] (II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.
[ 73.948] (==) intel(0): DPMS enabled
[ 73.948] (II) intel(0): [XvMC] i915_xvmc driver initialized.
[ 73.948] (II) intel(0): [DRI2] Setup complete
[ 73.948] (II) intel(0): [DRI2] DRI driver: i915
[ 73.948] (II) intel(0): direct rendering: DRI2 Enabled
[ 73.948] (==) intel(0): hotplug detection: "enabled"
[ 73.949] (--) RandR disabled
[ 73.968] (II) AIGLX: enabled GLX_MESA_copy_sub_buffer
[ 73.968] (II) AIGLX: enabled GLX_INTEL_swap_event
[ 73.968] (II) AIGLX: enabled GLX_ARB_create_context
[ 73.968] (II) AIGLX: enabled GLX_ARB_create_context_profile
[ 73.968] (II) AIGLX: enabled GLX_EXT_create_context_es2_profile
[ 73.968] (II) AIGLX: enabled GLX_SGI_swap_control and GLX_MESA_swap_control
[ 73.968] (II) AIGLX: GLX_EXT_texture_from_pixmap backed by buffer objects
[ 73.969] (II) AIGLX: Loaded and initialized i915
[ 73.969] (II) GLX: Initialized DRI2 GL provider for screen 0

From Xorg.log without SNA enabled:
[ 4731.653] (==) Depth 24 pixmap format is 32 bpp
[ 4731.653] (II) intel(0): [DRI2] Setup complete
[ 4731.653] (II) intel(0): [DRI2] DRI driver: i915
[ 4731.653] (II) UXA(0): Driver registered support for the following operations:
[ 4731.653] (II) solid
[ 4731.653] (II) copy
[ 4731.653] (II) composite (RENDER acceleration)
[ 4731.654] (II) put_image
[ 4731.654] (II) get_image
[ 4731.654] (==) intel(0): Backing store disabled
[ 4731.654] (==) intel(0): Silken mouse enabled
[ 4731.654] (II) intel(0): Initializing HW Cursor
[ 4731.654] (II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.
[ 4731.654] (==) intel(0): DPMS enabled
[ 4731.654] (==) intel(0): Intel XvMC decoder disabled
[ 4731.654] (II) intel(0): Set up textured video
[ 4731.654] (II) intel(0): Set up overlay video
[ 4731.654] (II) intel(0): direct rendering: DRI2 Enabled
[ 4731.654] (==) intel(0): hotplug detection: "enabled"
[ 4731.683] (--) RandR disabled
[ 4731.703] (II) AIGLX: enabled GLX_MESA_copy_sub_buffer
[ 4731.703] (II) AIGLX: enabled GLX_INTEL_swap_event
[ 4731.703] (II) AIGLX: enabled GLX_ARB_create_context
[ 4731.703] (II) AIGLX: enabled GLX_ARB_create_context_profile
[ 4731.703] (II) AIGLX: enabled GLX_EXT_create_context_es2_profile
[ 4731.703] (II) AIGLX: enabled GLX_SGI_swap_control and GLX_MESA_swap_control
[ 4731.703] (II) AIGLX: GLX_EXT_texture_from_pixmap backed by buffer objects
[ 4731.703] (II) AIGLX: Loaded and initialized i915
[ 4731.703] (II) GLX: Initialized DRI2 GL provider for screen 0
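The two log excerpts above differ in one telling line each: the SNA run logs "SNA initialized with ... backend", while the run without SNA logs a "UXA(0): Driver registered support" line. For anyone confirming which acceleration method a test run actually used, a minimal Python sketch along these lines could check a log. The function name `detect_accel` is made up for illustration, and the patterns are taken only from the excerpts in this report, so other driver versions may word these lines differently.

```python
import re

# Patterns copied from the two Xorg.log excerpts in this report.
# Assumption: other xf86-video-intel versions may phrase these lines differently.
SNA_RE = re.compile(r"\(II\) intel\(\d+\): SNA initialized with (.+?) backend")
UXA_RE = re.compile(r"\(II\) UXA\(\d+\): Driver registered support")


def detect_accel(log_text):
    """Return ("sna", backend), ("uxa", None) or (None, None) for an Xorg log."""
    for line in log_text.splitlines():
        match = SNA_RE.search(line)
        if match:
            # The backend name is whatever the driver printed, e.g. "Alviso (gen3)".
            return "sna", match.group(1)
        if UXA_RE.search(line):
            return "uxa", None
    # Neither marker line was found in the log text.
    return None, None
```

It would be called on the contents of the log, for example `detect_accel(open("/var/log/Xorg.0.log", errors="replace").read())`.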
Since you're saying that things worked roughly 3 months ago, can you please test whether going back to an old kernel or an old ddx restores correct behaviour? Just so we know which component the bug is in.
And we really do need the log files from the crashes (/var/log/messages hopefully has the kernel information, along with /var/log/Xorg.0.log[.old] containing any crash information). Note that linux-3.10.2 (and kernels post-3.7) has a known image corruption bug exposed by SNA.
(In reply to comment #1)
> Since you're saying that things worked roughly 3 months ago can you please
> test whether going back to an old kernel or an old ddx restores correct
> behaviour? Just so we know in which component the bug is.

I don't know if I can do that but I'll see what's possible.
(In reply to comment #2)
> And we really do need the log files from the crashes (/var/log/messages
> hopefully has the kernel information, along with /var/log/Xorg.0.log[.old]

Sadly there's no log at all because the crash is an instant hardware reset of the machine when that 3.1MB jpeg is reloaded a couple times.

> containing any crash information). Note that linux-3.10.2 (and kernels
> post-3.7) has a known image corruption bug exposed by SNA.

I might be able to try a 3.6 kernel but I'm not sure I can try an older ddx. Something must have changed in Firefox post 22 which makes the corruption reset the machine quicker or more reliably. With Firefox 22 a short test of reloading that jpeg multiple times only corrupted the image without a reset.
What does the corruption look like? Is it capturable in a screenshot, or only on the display i.e. a photograph?
(In reply to comment #5)
> What does the corruption look like? Is it capturable in a screenshot, or
> only on the display i.e. a photograph?

The Gtk2 corruption with the Raleigh theme engine manifests itself as scrollbars or buttons having captcha-like random noise inside the inner rectangle, where Raleigh would just display a single color. It looks like a random pattern. It was sporadic and, AFAIR, went away once the gtk widget was redrawn.

The jpeg corruption in Firefox looked similarly random most of the time, but in one or two cases I've seen it display multiple boxes with a snapshot of Firefox's window in a tiled layout.

Is there no way to capture logs that indicate what led to the corruption by setting an environment variable?
No, we have no way of tracing back to find a specific instance of corruption: the tracing is all or nothing, and it is voluminous. The latter corruption in jpeg images sounds like it should be fixed by the read-write bug fixes in 3.10.5. That might also explain the corruption elsewhere...
(In reply to comment #7)
> The latter corruption in jpeg images sounds like it should be fixed by the
> read-write bug fixes in 3.10.5. That might also explain the corruption
> elsewhere...

Same problem with linux 3.10.6, xf86-video-intel 2.21.14 and mesa 9.1.6. I didn't try to make it reset the machine after confirming the corrupted jpeg image.
I still cannot see where you recorded the kernel crash? Do you still have the log messages for that?
(In reply to comment #9)
> I still cannot see where you recorded the kernel crash? Do you still have
> the log messages for that?

I don't have log messages from the crash because when it crashes the machine hard resets instantly.
This bug happens here too with Firefox 23. I use Linux 3.10.0 (x86-64) with latest git intel driver, xorg 1.14.2, Mesa 8.0.5. What's interesting is that it only happens if I compile Firefox 23 with -mno-avx (gcc 4.8.2). If I don't use -mno-avx, the jpeg displays correctly, but Firefox gets extremely slow. If you need some testing, just ask.
(In reply to comment #11)
> This bug happens here too with Firefox 23.
>
> I use Linux 3.10.0 (x86-64) with latest git intel driver, xorg 1.14.2, Mesa
> 8.0.5.
>
> What's interesting is that it only happens if I compile Firefox 23 with
> -mno-avx (gcc 4.8.2). If I don't use -mno-avx, the jpeg displays correctly,
> but Firefox gets extremely slow.
>
> If you need some testing, just ask.

You didn't mention your hardware, so please add an Xorg.0.log. Since you have AVX, I think you have an entirely different bug? Do you see the same hard lockup? Do you have any error messages?
(In reply to comment #12)
> You didn't mention your hardware, so please add an Xorg.0.log. Since you
> have AVX, I think you have an entirely different bug? Do you see the same
> hard lockup up? Do you have any error messages?

Ok, I attached my Xorg.0.log.

My hardware: Core i7 2700k using the internal GPU.

I don't have any hard lockup, but the jpeg image gets blank (white) no matter how many times I try to reload it.

Regarding AVX, I don't know if it is a different bug, but the results are the same.
Created attachment 84611 [details] Xorg.0.log The requested Xorg.0.log.
Dâniel, I think you have a bug in your compilation of firefox with avx. Either some handwritten assembly is incorrect or gcc's code generation is wrong. We could trace the X drawing commands, but I don't think that is where the bug lies.
(In reply to comment #15)
> Dâniel, I think you have a bug in your compilation of firefox with avx.
> Either some handwritten assembly is incorrect or gcc's code generation is
> wrong, we could trace the X drawing commands, but I don't think that is
> where the bug lies.

Yes Chris, I reported it some time ago:
https://bugzilla.mozilla.org/show_bug.cgi?id=864610

I think they introduced this bug in Firefox 20 (it never happened up to version 19).

Anyway, thank you.
No crash here with:

- Firefox 24
- Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
- Linux 3.11.1
- xf86-video-intel 2.21.15 with SNA
(In reply to comment #17)
> No crash here with:
>
> - Firefox 24
> - Intel Corporation 2nd Generation Core Processor Family Integrated Graphics
> Controller (rev 09)

Is this the same chip as 945GM?

> - Linux 3.11.1
> - xf86-video-intel 2.21.15 with SNA

I still see the corruption but didn't try to make it crash for obvious reasons.

Did you or do you at least see the corruption in the image's lower half? I do and have to explicitly enable UXA. AFAICT a stable kernel from April 2013 must have been one of the last where it was all correct.
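For readers wanting to try the same workaround, explicitly enabling UXA (that is, disabling SNA) is normally done through the driver's AccelMethod option, which the intel(4) man page documents. A minimal sketch follows; the file name and the Identifier string are hypothetical and can be anything, only the Driver and Option lines matter.

```
# /etc/X11/xorg.conf.d/20-intel.conf  (hypothetical file name)
Section "Device"
    Identifier "Intel Graphics"      # arbitrary label, assumed for illustration
    Driver     "intel"
    Option     "AccelMethod" "uxa"   # use "sna" to switch back
EndSection
```

After restarting X, Xorg.0.log should show the UXA lines quoted in the original report instead of the "SNA initialized" line.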
(In reply to comment #18)
> (In reply to comment #17)
> > No crash here with:
> >
> > - Firefox 24
> > - Intel Corporation 2nd Generation Core Processor Family Integrated Graphics
> > Controller (rev 09)
>
> Is this the same chip as 945GM?
>
> > - Linux 3.11.1
> > - xf86-video-intel 2.21.15 with SNA
>
> I still see the corruption but didn't try to make it crash for obvious
> reasons.

Kernel and DDX are the same version here.

> Did you or do you at least see the corruption in the image's lower half?
> I do and have to explicitly enable UXA. AFAICT a stable kernel from April
> 2013 must have been one of the last where it was all correct.
We still haven't had sufficient information to be able to even guess at what the problem might be. A screenshot or photograph of the corruption would be a good first step, if you cannot capture the error messages from the lockup.
Per https://bugs.archlinux.org/task/36105#comment114630 I built a version of Firefox 24.0 linking more system versions of libraries and the corrupted jpeg doesn't seem to happen so far. I was using ftp.mozilla.org binaries and those reproduce the corruption reliably.

mozconfig:
. $topsrcdir/browser/config/mozconfig
ac_add_options --enable-official-branding
ac_add_options --with-system-jpeg
ac_add_options --with-system-zlib
ac_add_options --with-system-bz2
ac_add_options --with-system-png
ac_add_options --with-system-libevent
ac_add_options --enable-system-sqlite
ac_add_options --enable-system-cairo
ac_add_options --enable-system-pixman
ac_add_options --disable-tests
ac_add_options --disable-crashreporter
ac_add_options --disable-updater
ac_add_options --disable-installer
mk_add_options PROFILE_GEN_SCRIPT='EXTRA_TEST_ARGS=10 $(MAKE) -C $(MOZ_OBJDIR) pgo-profile-run'

This is a non PGO build due to memory limitations on this machine but I copied that PGO line from ArchLinux's mozconfig as found.

package versions:
cairo 1.12.16-1
pixman 0.30.2-1
libjpeg-turbo 1.3.0-2
libpng 1.6.5-1
zlib 1.2.8-1
bzip2 1.0.6-4
libevent 2.0.21-2
sqlite 3.8.0.2-1

This is how ArchLinux's Firefox package is built with the exception of --enable-system-cairo:
https://projects.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/firefox
The custom Firefox build doesn't provoke the corruption but as suspected by the Intel devs there are corruption issues just hiding. I don't know if it's related but I've seen a Qt3 application draw text destined for its statusbar outside of the window and right into the X root or over other windows managed by the wm.
(In reply to comment #22)
> The custom Firefox build doesn't provoke the corruption but as suspected by
> the Intel devs there are corruption issues just hiding. I don't know if it's
> related but I've seen a Qt3 application draw text destined for its statusbar
> outside of the window and right into the X root or over other windows
> managed by the wm.

I'll keep an eye out for this after enabling UXA.
Created attachment 86908 [details] Firefox ao2e-index.jpg
(In reply to comment #20)
> We still haven't had sufficient information to be able to even guess at what
> the problem might be. A screenshot or photograph of the corruption would be
> a good first step, if you cannot capture the error messages from the lockup.

Attached a screenshot of what it usually looks like. On reload of the page the bottom half of the picture changes to another random pattern and, as said before, sometimes to 4 little screenshots of the Firefox user interface.
News?
The image corruption should be fixed in xf86-video-intel.git (I believe

commit 8f6e227ba8127a2ca034271f2a660c24abbe056f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Nov 4 12:57:01 2013 +0000

    sna: Apply the BLT source offset for individual copies

is the right fix amongst several related fixes.)

But we have no information about the kernel crash, so we can't begin a diagnosis, but must just hope that it gets randomly fixed... I suspect it may not have been a kernel crash, but a page-fault-of-doom which should also have been mitigated recently.
(In reply to comment #27)
> The image corruption should be fixed in xf86-video-intel.git (I believe

Thanks, Chris.

> commit 8f6e227ba8127a2ca034271f2a660c24abbe056f
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Mon Nov 4 12:57:01 2013 +0000
>
> sna: Apply the BLT source offset for individual copies
>
> is the right fix amongst several related fixes.)

I will see if I can try this before the next release, but do you know when the next xf86-video-intel release will be?

> But we have no information about the kernel crash, so we can't begin a
> diagnosis, but must just hope that it gets randomly fixed... I suspect it
> may not have been a kernel crash, but a page-fault-of-doom which should also
> have been mitigated recently.

I was once able to easily make it hard reset (crash) the machine by reloading the corrupted image a couple times. So I believe the reset is at least triggered quickly by the corruption of the big jpeg image.

Is it the right idea to wait and test the image corruption fix first, and then try to reproduce the Qt3 out-of-window corrupted text drawing SNA bug, or would you say that's something else and needs a new bug?
Image decoding bug looks fixed in xf86-video-intel-2.99.906 but I still have to check the Qt3 out-of-window text drawing/corruption.
With SNA enabled I once again see an SDVOB kernel message: [drm] Setting output timings on SDVOB failed. Is this ok to ignore?
(In reply to comment #30)
> With SNA enabled I once again see an SDVOB kernel message: [drm] Setting
> output timings on SDVOB failed. Is this ok to ignore?

Yeah, that's just an internal detail of your ADD card (the bit that we speak SDVO to and that actually controls the display interface on your machine). It should just work fine even with that warning.
I'm closing this for lack of information about the earlier kernel crash, without which we cannot begin to diagnose it. Hopefully it too has resolved itself. Please do reopen if you can provide any information to point us in the right direction.
Both issues are fixed.