Description
Conley Moorhous
2013-10-16 12:11:02 UTC
Created attachment 87727 [details]
Video showing the issue
uploaded with correct content type
Created attachment 87728 [details]
Problematic picture #1
Created attachment 87729 [details]
Problematic picture #2
Please attach your Xorg.0.log (after the image load fails would be best).
Created attachment 87733 [details]
Xorg.0.log
It doesn't look like it reports anything about it. I took the log right after opening both pictures in EOG.
Just checked using f20 on a snb machine (closest I had to your setup), works as expected. Can you please recompile with --enable-debug=full and attach the log file from as minimal a session as possible that reproduces the error? And/or attempt a bisection?

It's somewhere between http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=2851b7747bc8e143aa5c6209b8800eeccb629058 which works correctly and http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=55cd67485ff34a28ab8eaa7b1b6958b96c072317 which does not.

I attached a clean log with debugging enabled for both a minimal environment (i3 window manager) and Gnome 3, which is what I normally use. I would bisect further but .903 seems to either not work or fail to compile, so I cannot tell when something fails. Commit 55cd67 is the earliest of .903 that I can compile/use.

Files were too big. Here they are:
Gnome 3: http://filebin.ca/yd0NFchS2GE/Xorg.0.log
i3: http://filebin.ca/yd2Lx80ESnf/Xorg.i3.log

(In reply to comment #8)
> Files were too big. Here they are:
>
> Gnome 3: http://filebin.ca/yd0NFchS2GE/Xorg.0.log
>
> i3: http://filebin.ca/yd2Lx80ESnf/Xorg.i3.log

To be clear, these have the eog failure?

(In reply to comment #9)
> (In reply to comment #8)
> > Files were too big. Here they are:
> >
> > Gnome 3: http://filebin.ca/yd0NFchS2GE/Xorg.0.log
> >
> > i3: http://filebin.ca/yd2Lx80ESnf/Xorg.i3.log
>
> To be clear, these have the eog failure?

Yes, those both have eog loading an image that gets corrupted (like in the video). I deleted the Xorg.0.log, restarted, logged in and then loaded the image. Then I copied the Xorg.0.log to my home folder and did the same process for the other WM/DE.

I ask because I could not see anything that corresponded with eog trying to render the image. Can you please paste the sha1sum of the logs (if you still have them) so that I can check that I downloaded them intact?
f762afbcffceef735391fb39acdabefdaaec81db Xorg.0.log
63d3f1757a64b35f2bfff1dd870ec8e8237ba095 Xorg.i3.log
Those are the sha1sums. Is there a way for me to capture better debugging data somehow?

If the information I am looking for is not in the debug logs, it doesn't exist. (The log files should explain if the image rendering were aborted for any reason etc.)

Ok, I have the same checksums as you. But nothing that corresponds to eog. Please do try to generate another debug log of eog misrendering and check that the last messages in the log are X shutting down (e.g. Server terminated successfully. Closing log file). If you use xz (or lzma) to compress them, they compress extremely well.

Created attachment 87747 [details]
Gnome 3 log
sha1sum: c1240281d853f41a55a26cd0eb958a82353e9d35
Created attachment 87748 [details]
i3 log
sha1sum: a99106161c2f977254cf59780edc0b0817976e26
Yeah they really do compress down! Okay, I made everything sterile this time and made sure to shut down X once I opened the images. Hopefully something useful is in there.

I'm still struggling to reconstruct anything in those logs that looks like eog (or anything else for that matter) attempting to render a large image. :| As a sanity check, can you please attach a debug log for a working version?

Created attachment 87769 [details]
Log of a picture being opened correctly by EOG
Here is EOG opening an image correctly in i3. I will tail the log file to attempt to see what is added when the image fails to load.
Created attachment 87797 [details]
Gnome 3 failure log
[conley@styrka ~]$ sha1sum Xorg.failure.log.xz
913918803724c56aa274c570f93e7536fe20a194 Xorg.failure.log.xz
Okay, I switched to Xorg, opened EOG, and then switched back to VT. The relevant output, wherein I open both failing pictures in EOG, starts at 1941.638. Once I opened both pictures I switched back to VT and shut down the server.
Comment on attachment 87769 [details]
Log of a picture being opened correctly by EOG
[conley@styrka ~]$ sha1sum Xorg.failure.log.xz
913918803724c56aa274c570f93e7536fe20a194 Xorg.failure.log.xz
Comment on attachment 87769 [details]
Log of a picture being opened correctly by EOG
Wrong attachment, sorry. Let me just send you all 100 e-mails!
Sorry, this must be really frustrating - it is for me! I cannot find anything that corresponds to the image being rendered in the failure logs. (And the success logs are using UXA, which doesn't emit any debugging information.) I'm not sure where to go from here, as without that information I cannot begin to diagnose the problem. It is so very strange.

Created attachment 87856 [details]
Gnome 3 Successful render of picture w/SNA enabled
Half of the frustration is due to me, no doubt! Here is a log actually using SNA instead of UXA. Relevant output begins after 146.116. Don't worry too much about this if nothing pertinent shows up; I'm sure it will eventually get sorted out. We might just have to wait until 3.0 is released to figure out what is going on.
Comment on attachment 87856 [details]
Gnome 3 Successful render of picture w/SNA enabled
[conley@styrka ~]$ sha1sum Xorg.Gnome3success.log.xz
692d086fe3a11da00238c5e03ff65148a6152430 Xorg.Gnome3success.log.xz
Here's some new info: https://bugs.archlinux.org/task/37225#comment115652
It looks like it isn't the git driver necessarily.

Let's try something slightly different. Can you please capture eog failing to load an image using xtrace [http://xtrace.alioth.debian.org/]? And for comparison, where it succeeds using UXA.

Created attachment 88092 [details]
xtrace logs of success and failure with 2.21.15/2.99.905 uxa/sna
[conley@styrka ~]$ sha1sum logs.tgz
e90f8e581f6a56e6f073651c88dbbed4b34df63e logs.tgz
For each log, I started a trace, opened EOG, opened Mt. Hood picture, opened Rotstein Pass picture, and then closed EOG.
Ok, I've found the upload path in the xtrace:

000:<:0aa0: 16: MIT-SHM-Request(130,1): Attach shmseg=0x020001d4 shmid=0x0ad08001 readonly=false(0x00)
000:<:0aa1: 28: MIT-SHM-Request(130,5): CreatePixmap pid=0x020001d5 drawable=0x020001d2 width=1920 height=1200 depth=24 shmseg=0x020001d4 offset=0x01000000
000:<:0aa2: 20: RENDER-Request(139,4): CreatePicture pid=0x020001d6 drawable=0x020001d5 format=0x0000002a values={}
000:<:0aa3: 36: RENDER-Request(139,8): Composite op=Src(0x01) src=0x020001d6 mask=None(0x00000000) dst=0x020001d3 xSrc=0 ySrc=0 xMask=0 yMask=0 xDst=0 yDst=0 width=1920 height=1200
000:<:0aa4: 8: RENDER-Request(139,7): FreePicture picture=0x020001d6
000:<:0aa5: 8: Request(54): FreePixmap drawable=0x020001d5

However, there is no record of a SHM pixmap being created in the debug logs.

Ah, that path is only for has_userptr...

Hmm. Ok, it still works on my systems but the significant difference is that the failing case uses sna_blt_composite -> sna_replace__xor -> memcpy_xor. Can you please verify that

diff --git a/src/sna/blt.c b/src/sna/blt.c
index 4c27678..3191ead 100644
--- a/src/sna/blt.c
+++ b/src/sna/blt.c
@@ -1004,6 +1004,13 @@ memcpy_xor(const void *src, void *dst, int bpp,
 		if (width * 4 == dst_stride && dst_stride == src_stride) {
 			width *= height;
 			height = 1;
+
+			{
+				uint32_t *d = (uint32_t *)dst_bytes;
+				while (width)
+					*d++ = 0xffcc00cc;
+				return;
+			}
 		}
 
 		if (have_sse2()) {

turns your black picture magenta?

Spotted one issue, but I don't think you trigger it according to your debug log.

commit f0da01aa907d488ae32dfda206ea8a66564bc430
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Oct 25 14:22:05 2013 +0100

    sna: Remove stale mappings when replacing GPU bo

    References: https://bugs.freedesktop.org/show_bug.cgi?id=70527
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Created attachment 88112 [details]
Logs with turn-magenta patch applied
[conley@styrka ~]$ sha1sum magenta-logs.tgz
081ddb9ef53e90ca74fce52412318d675d25f0e7 magenta-logs.tgz
It actually causes X to crash for me; here is the Xorg.log and xtrace
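The test patch quoted earlier in this thread swaps the copy for a constant fill so that the code path shows up on screen. As quoted, its loop tests `width` without ever decrementing it. The following is only a minimal sketch of a bounded variant of the same idea; `fill_sentinel` and `SENTINEL` are hypothetical names for illustration and are not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marker value taken from the quoted patch (opaque magenta as ARGB). */
#define SENTINEL 0xffcc00ccu

/* Hypothetical helper, not driver code: write `count` marker pixels to
 * `dst` and stop, so the fill cannot run past the destination buffer. */
static void fill_sentinel(uint32_t *dst, size_t count)
{
    assert(dst != NULL || count == 0);
    while (count--)
        *dst++ = SENTINEL;
}
```

Called with a 32bpp destination and its pixel count, this touches exactly `count` pixels and leaves everything after them alone.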
Correction:

# first bad commit: [51c87f9acaa0c4eaaea581092088e1d524396f35] sna/io: Propagate failure to XOR uploads

This commit causes X to crash whenever I open EOG and try to open a picture. Before then, things work fine, but when I try to open a picture, the screen goes black and I'm returned to the bootup logs. What logs from which commits would help?

Created attachment 88115 [details]
Video of magenta
commit 51590e55c0187862174a7dd5775915216b3137a6
Comment on attachment 88115 [details]
Video of magenta
Confirmation that commit 51590e55c0187862174a7dd5775915216b3137a6 with the magenta patch applied does indeed work.
(In reply to comment #32)
> Correction:
>
> # first bad commit: [51c87f9acaa0c4eaaea581092088e1d524396f35] sna/io:
> Propagate failure to XOR uploads
>
> This commit causes X to crash whenever I open EOG and try to open a picture.
> Before then, things work fine, but when I try to open a picture, the screen
> goes black and I'm returned to the bootup logs. What logs from which commits
> would help?

Hmm, the full debug log of the crash and the stderr capture (something like /var/log/gdm/:0.log).

Created attachment 88129 [details]
Various logs from Xorg crashing
[conley@styrka ~]$ sha1sum Xorg-crash-logs.tgz
1dfa9918097d5a7bde35fb04e41879c8ce73c869 Xorg-crash-logs.tgz
Various logs from /var/log/ folder (including Xorg and gdm) after a Xorg core dump due to attempting to open a picture in EOG
I don't see anything related to a crash in there. And I am still none the wiser as to the original bug. Being optimistic, has anything changed very recently?

Well, now I get a crash when opening any (and I mean any) image in EOG. However, all images (including the problem ones) work fine in Shotwell. Also, the latest driver really speeds up Gnome 3 for me :D Log-in is now super fast, whereas it was really slow before. There are some frequent short hangs though. Either way, looking forward to 3.0!!

Anyway, could this somehow be an EOG bug? Or a bug of the rendering library it uses, perhaps. I don't know how it renders images. But it looks like shotwell uses libjpeg via libgphoto2.

Who crashes? X or eog? eog shouldn't be impacted that much by changes in the ddx, but X should never die. Any errors reported in Xorg.0.log would be very useful to know.

Created attachment 88482 [details]
Xorg crash logs from opening picture in EOG
[conley@styrka ~]$ sha1sum 2.99.905-latest-X-crash.tgz
d53b99bd6fc0d3062d90edf69949bddb87ded2ab 2.99.905-latest-X-crash.tgz
X crashes, though I don't see anything in the logs really. Basically, it goes back to the kernel messages from bootup (I have quiet set, so just the system fsck, etc.) and then it sits there while it reloads gdm. So I'm assuming that's an X crash?
I am pretty sure those are not the interesting logs... Can you check for Xorg.0.log.old and :0.log.1?

I know what you mean. However, the old logs contain nothing out of the usual; they both end with a successful server termination. I deleted my Xorg.0.log right before restarting to make sure that the relevant output would go to that file.

http://www.speedyshare.com/jQx8E/download/old-logs.tgz
Old logs, if you want to check them out.

Exactly, I think the issue here is that the DE is dying, resulting in a clean shutdown and restart of X. Nothing in dmesg or ~/.xsession-errors?

Created attachment 88542 [details]
Xorg segfaulting
[conley@styrka ~]$ sha1sum Xorg-logs-segfault.tgz
7a7afb61a95c07f990796d1b82f46256e9d4a5e1 Xorg-logs-segfault.tgz
Aha! It's in the "old" logs! X is segfaulting right after calling sna_write_boxes__xor
Created attachment 88543 [details]
Xorg Segfault (other logs)
[conley@styrka ~]$ sha1sum Xorg-other-logs-segfault.tgz
35e1605be0f8b78362cb9c57ece3c92668816d99 Xorg-other-logs-segfault.tgz
Uploading in case these are useful.
That leaves two possibilities: the code generated for memcpy_xor is broken or the source pointer is invalid. Try

diff --git a/src/sna/blt.c b/src/sna/blt.c
index 4c27678..43e082a 100644
--- a/src/sna/blt.c
+++ b/src/sna/blt.c
@@ -39,7 +39,7 @@
 #include <xmmintrin.h>
 
 #if __x86_64__
-#define have_sse2() 1
+#define have_sse2() 0
 #else
 enum {
 	MMX = 0x1,

and see what happens.

Segfaults just the same:

[ 204.497] search_linear_cache: num_pages=4544, flags=2, use_active? 0, use_large=0 [max=65536]
[ 204.497] search_linear_cache: inactive and cache bucket empty
[ 204.497] search_linear_cache: active cache bucket empty
[ 204.497] search_linear_cache: num_pages=4544, flags=7, use_active? 0, use_large=0 [max=65536]
[ 204.497] search_linear_cache: inactive and cache bucket empty
[ 204.497] search_linear_cache: active cache bucket empty
[ 204.497] create_snoopable_buffer: created CPU (LLC) handle=70 for buffer, size 4544
[ 204.497] kgem_bo_map__cpu(handle=70, size=18612224, map=(nil):(nil))
[ 204.497] kgem_trim_vma_cache: type=1, count=-32750 (bucket: 12)
[ 204.497] kgem_bo_map__cpu: caching CPU vma for 70
[ 204.497] kgem_create_buffer(pages=4544 [4544]) new handle=70, used=9216000, write=1
[ 204.497] kgem_create_proxy: target handle=70 [proxy? -1], offset=0, length=9216000, io=1
[ 204.497] sna_write_boxes__xor: box(0, 0), (1920, 1200), src=(0, 0), dst=(0, 0)
[ 204.497] memcpy_xor: src=(0, 0), dst=(0, 0), size=1920x1200, pitch=7680/7680, bpp=32, and=ffffffff, xor=ff000000
[ 204.562] (EE)
[ 204.562] (EE) Backtrace:
[ 204.581] (EE) 0: /usr/bin/Xorg (xorg_backtrace+0x3d) [0x57fa3d]
[ 204.582] (EE) 1: /usr/bin/Xorg (0x400000+0x1837a9) [0x5837a9]
[ 204.582] (EE) 2: /usr/lib/libpthread.so.0 (0x7f97a1f40000+0xf870) [0x7f97a1f4f870]
[ 204.582] (EE) 3: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f979f7d1000+0x24d0d) [0x7f979f7f5d0d]
[ 204.582] (EE) 4: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f979f7d1000+0xb7b2c) [0x7f979f888b2c]
[ 204.582] (EE) 5: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f979f7d1000+0xb977b) [0x7f979f88a77b]
[ 204.582] (EE) 6: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f979f7d1000+0x84cdf) [0x7f979f855cdf]
[ 204.582] (EE) 7: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f979f7d1000+0x8da6f) [0x7f979f85ea6f]
[ 204.582] (EE) 8: /usr/bin/Xorg (0x400000+0x10bd02) [0x50bd02]
[ 204.582] (EE) 9: /usr/bin/Xorg (0x400000+0x1055fa) [0x5055fa]
[ 204.582] (EE) 10: /usr/bin/Xorg (0x400000+0x3746e) [0x43746e]
[ 204.582] (EE) 11: /usr/bin/Xorg (0x400000+0x269ea) [0x4269ea]
[ 204.582] (EE) 12: /usr/lib/libc.so.6 (__libc_start_main+0xf5) [0x7f97a0fc3bc5]
[ 204.582] (EE) 13: /usr/bin/Xorg (0x400000+0x26d31) [0x426d31]
[ 204.582] (EE)
[ 204.582] (EE) Segmentation fault at address 0x7f979a382000
[ 204.582] (EE) Fatal server error:
[ 204.582] (EE) Caught signal 11 (Segmentation fault). Server aborting
[ 204.582] (EE)
[ 204.582] (EE) Please consult the The X.Org Foundation support at http://wiki.x.org for help.
[ 204.582] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 204.582] (EE)
[ 204.582] (II) AIGLX: Suspending AIGLX clients for VT switch
[ 204.582] sna_leave_vt
[ 204.916] (EE) Server terminated with error (1). Closing log file.
Pardon, THIS is with the patch applied. Last comment is from before.

[ 5859.991] search_linear_cache: num_pages=4544, flags=2, use_active? 0, use_large=0 [max=65536]
[ 5859.991] search_linear_cache: inactive and cache bucket empty
[ 5859.991] search_linear_cache: active cache bucket empty
[ 5859.991] search_linear_cache: num_pages=4544, flags=7, use_active? 0, use_large=0 [max=65536]
[ 5859.991] search_linear_cache: inactive and cache bucket empty
[ 5859.991] search_linear_cache: active cache bucket empty
[ 5859.991] create_snoopable_buffer: created CPU (LLC) handle=68 for buffer, size 4544
[ 5859.991] kgem_bo_map__cpu(handle=68, size=18612224, map=(nil):(nil))
[ 5859.991] kgem_trim_vma_cache: type=1, count=-32755 (bucket: 12)
[ 5859.991] kgem_bo_map__cpu: caching CPU vma for 68
[ 5859.991] kgem_create_buffer(pages=4544 [4544]) new handle=68, used=9216000, write=1
[ 5859.991] kgem_create_proxy: target handle=68 [proxy? -1], offset=0, length=9216000, io=1
[ 5859.991] sna_write_boxes__xor: box(0, 0), (1920, 1200), src=(0, 0), dst=(0, 0)
[ 5859.991] memcpy_xor: src=(0, 0), dst=(0, 0), size=1920x1200, pitch=7680/7680, bpp=32, and=ffffffff, xor=ff000000
[ 5860.065] (EE)
[ 5860.065] (EE) Backtrace:
[ 5860.065] (EE) 0: /usr/bin/Xorg (xorg_backtrace+0x3d) [0x57fa3d]
[ 5860.065] (EE) 1: /usr/bin/Xorg (0x400000+0x1837a9) [0x5837a9]
[ 5860.065] (EE) 2: /usr/lib/libpthread.so.0 (0x7f9193bbb000+0xf870) [0x7f9193bca870]
[ 5860.065] (EE) 3: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f919144d000+0x24b48) [0x7f9191471b48]
[ 5860.065] (EE) 4: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f919144d000+0xb7500) [0x7f9191504500]
[ 5860.065] (EE) 5: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f919144d000+0xb914f) [0x7f919150614f]
[ 5860.065] (EE) 6: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f919144d000+0x846b3) [0x7f91914d16b3]
[ 5860.065] (EE) 7: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f919144d000+0x8d443) [0x7f91914da443]
[ 5860.065] (EE) 8: /usr/bin/Xorg (0x400000+0x10bd02) [0x50bd02]
[ 5860.066] (EE) 9: /usr/bin/Xorg (0x400000+0x1055fa) [0x5055fa]
[ 5860.066] (EE) 10: /usr/bin/Xorg (0x400000+0x3746e) [0x43746e]
[ 5860.066] (EE) 11: /usr/bin/Xorg (0x400000+0x269ea) [0x4269ea]
[ 5860.066] (EE) 12: /usr/lib/libc.so.6 (__libc_start_main+0xf5) [0x7f9192c3ebc5]
[ 5860.066] (EE) 13: /usr/bin/Xorg (0x400000+0x26d31) [0x426d31]
[ 5860.066] (EE)
[ 5860.066] (EE) Segmentation fault at address 0x7f918e096000
[ 5860.066] (EE) Fatal server error:
[ 5860.066] (EE) Caught signal 11 (Segmentation fault). Server aborting
[ 5860.066] (EE)
[ 5860.066] (EE) Please consult the The X.Org Foundation support at http://wiki.x.org for help.
[ 5860.066] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 5860.066] (EE)
[ 5860.066] (II) AIGLX: Suspending AIGLX clients for VT switch
[ 5860.066] sna_leave_vt
[ 5860.406] (EE) Server terminated with error (1). Closing log file.

Ho hum, and the hack from comment 29 still succeeds in producing magenta? I think that is pointing very much towards the source pointer here being invalid. ???

I've tweaked your troublesome path (though I haven't spotted the cause of your bug, I've spotted another one instead). Can you please upload a new debug trace when you get a chance?

(In reply to comment #50)
> Ho hum, and the hack from comment 29 still succeeds in producing magenta? I
> think that is pointing very much towards that the source pointer here is
> invalid. ???

Before today's patches, X segfaults both with and without the patch (the logs I sent you were with the patch applied). With today's patches, yes, the magenta patch still works, though it did cause my computer to hang entirely.

Created attachment 88607 [details]
Video of magenta patch
Computer hangs at the end.
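The segfault backtraces pasted above list most frames only as `(base+offset) [address]`, without symbol names. As a rough sketch of how such a line can be split into its parts, the helper below pulls out the three numbers. `parse_frame` is a hypothetical name, and the format string assumes lines shaped exactly like the ones in these logs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract "(0xBASE+0xOFF) [0xADDR]" from one
 * backtrace line. Returns 1 on success, 0 when the line carries a
 * symbol name instead of a numeric base (e.g. "xorg_backtrace+0x3d"). */
static int parse_frame(const char *line, unsigned long long *base,
                       unsigned long long *off, unsigned long long *addr)
{
    const char *p;

    assert(line && base && off && addr);
    p = strstr(line, "(0x");
    if (p == NULL)
        return 0;
    return sscanf(p, "(0x%llx+0x%llx) [0x%llx]", base, off, addr) == 3;
}
```

For frame 3 of the first log this yields offset 0x24d0d inside intel_drv.so. With a driver build that carries debug symbols, that offset could be handed to a symbolizer such as addr2line; the thread itself does not show that step.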
Created attachment 88608 [details]
Video of magenta patch
Computer hangs at the end.
Created attachment 88609 [details]
Log with no magenta patch applied
Created attachment 88610 [details]
Video with no magenta patch
Comment on attachment 88610 [details]
Video with no magenta patch
Aside from the next button briefly turning magenta, notice that the buttons and everything else do not turn magenta like they used to.
Don't know why it hung, but I spotted the bug at last! Thanks for your patience and help:

commit 587c4866652e40e1e228b333028114766a6d3b08
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Nov 4 15:10:40 2013 +0000

    sna: Promote uint16_t to a full int to avoid overflow in computing w*h in memcpy_xor

    Reported-by: Conley Moorhous <conleymoorhous@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70527
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

(In reply to comment #58)
> Don't know why it hung, but I spotted the bug at last! Thanks for your
> patience and help:
>
> commit 587c4866652e40e1e228b333028114766a6d3b08
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date: Mon Nov 4 15:10:40 2013 +0000
>
> sna: Promote uint16_t to a full int to avoid overflow in computing w*h
> in memcpy_xor
>
> Reported-by: Conley Moorhous <conleymoorhous@gmail.com>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70527
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I can confirm that you fixed it :D And I thank you for your patience with me!
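To illustrate the overflow named in the fix, here is a small sketch of what happens when a 16-bit width is multiplied in place by the height, as in the `width *= height` fast path quoted earlier in the thread. It assumes, as the commit title indicates, that the dimensions were held in `uint16_t`; the function names are hypothetical and the code is not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* What a 16-bit variable holds after `width *= height`: the product is
 * reduced modulo 65536 when it is stored back. */
static uint16_t pixels_truncated(uint16_t width, uint16_t height)
{
    return (uint16_t)((uint32_t)width * height);
}

/* Collapse a contiguous 32bpp image into one long row, keeping the
 * pixel count in a full-width integer so it cannot wrap. */
static uint32_t collapse_rows(uint16_t width, uint16_t height, int32_t stride)
{
    assert(stride == (int32_t)width * 4); /* contiguous rows only */
    return (uint32_t)width * height;
}
```

For the 1920x1200 upload with pitch 7680 seen in the logs, `pixels_truncated(1920, 1200)` gives 10240, a little over five rows, while `collapse_rows(1920, 1200, 7680)` gives the full 2304000 pixels. That gap would be consistent with most of the image never being copied.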