Description
Ognian Tenchev
2013-10-27 17:04:17 UTC
xrestop?

(In reply to comment #1)
> xrestop?

You need xrestop with 2.99.905?

I need xrestop and /sys/kernel/debug/dri/0/i915_gem_objects from the supposed leak.

(In reply to comment #3)
> I need xrestop and /sys/kernel/debug/dri/0/i915_gem_objects from the
> supposed leak.

cat /sys/kernel/debug/dri/0/i915_gem_objects
416 objects, 373006336 bytes
62 [62] objects, 112177152 [112177152] bytes in gtt
0 [0] active objects, 0 [0] bytes
62 [62] inactive objects, 112177152 [112177152] bytes
199 unbound objects, 57036800 bytes
0 purgeable objects, 0 bytes
23 pinned mappable objects, 24096768 bytes
16 fault mappable objects, 48234496 bytes
268435456 [268435456] gtt total
X: 414 objects, 372871168 bytes (0 active, 43479040 inactive, 57036800 unbound)
xfwm4: 0 objects, 0 bytes (0 active, 0 inactive, 0 unbound)

Created attachment 88187
xrestop
The number of objects here does not seem disproportionate to the number of pixmaps allocated by the clients.

(In reply to comment #6)
> The number of objects here does not seem disproportionate to the number of
> pixmaps allocated by the clients.

I have no idea what that means, but RES for X now says 400 MB:

  PID USER  PR NI   VIRT    RES    SHR S %CPU %MEM   TIME+ COMMAND
 4155 root  20  0 489552 400068 385772 S  4.8 12.9 1:05.22 X

And I can see, for example, a missing background on the highlighted menu item in LibreOffice :) And in Geany, text is displayed again after the last row :D

Hm... I switched to UXA with 2.99.905 in xorg.conf, and now Xorg's RES stays around 22-26 MB, just like with 2.21.15. 2.99.x uses SNA by default and 2.21.x uses UXA by default, maybe... but why is there so much difference in Xorg memory usage between UXA and SNA? And why does my SNA just keep eating RAM? Today I worked with SNA all day and Xorg's RES was around 80 MB, but I never opened a large JPEG. Then I opened one large JPEG in Firefox, and memory jumped to 300 MB and kept growing to 500 MB...

Because the system doesn't report the memory that UXA uses to the process, whereas SNA uses CPU mappings of the bo that do show up in RES.
cat /sys/kernel/debug/dri/0/i915_gem_objects

OK - thanks for the explanation. But I still think that something is broken.
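Here is a minimal sketch of why CPU mappings show up in top. It runs on Linux only. An anonymous MAP_SHARED mapping stands in for a CPU-mapped buffer object; SNA maps real bo through the DRM interface, which this does not reproduce. Pages faulted in through a shared mapping are charged to the process's RES and, because they are shared memory, to SHR as well. That fits the X line in the top output above, where most of the 400068 KiB RES is SHR (385772 KiB). The function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Read the resident and shared page counts (fields 2 and 3) from
 * /proc/self/statm. Returns 0 on success, -1 on failure. */
static int read_statm(long *resident, long *shared)
{
    long size;
    FILE *f = fopen("/proc/self/statm", "r");

    if (!f)
        return -1;
    int n = fscanf(f, "%ld %ld %ld", &size, resident, shared);
    fclose(f);
    return n == 3 ? 0 : -1;
}

/* Map `bytes` of shared memory, touch every page, and report how much
 * RES and SHR grew, in bytes. The mapping stays alive so the growth is
 * visible; the caller unmaps it. Returns NULL on failure. */
static void *touch_shared_mapping(size_t bytes, long *res_delta, long *shr_delta)
{
    long page = sysconf(_SC_PAGESIZE);
    long res0, shr0, res1, shr1;

    if (read_statm(&res0, &shr0))
        return NULL;

    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    memset(p, 0xa5, bytes);              /* fault in every page */

    if (read_statm(&res1, &shr1)) {
        munmap(p, bytes);
        return NULL;
    }
    *res_delta = (res1 - res0) * page;
    *shr_delta = (shr1 - shr0) * page;
    return p;
}
```

On a typical Linux system, touching a 32 MiB mapping this way should raise both values by roughly 32 MiB while the mapping exists. This is why a process that maps large buffers can look huge in top without that alone proving a leak.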
I switched back to SNA in xorg.conf and:

# SNA - before
cat /sys/kernel/debug/dri/0/i915_gem_objects
481 objects, 185946112 bytes
245 [241] objects, 248385536 [245698560] bytes in gtt
0 [0] active objects, 0 [0] bytes
245 [241] inactive objects, 248385536 [245698560] bytes
165 unbound objects, 60375040 bytes
36 purgeable objects, 25972736 bytes
23 pinned mappable objects, 24096768 bytes
35 fault mappable objects, 22155264 bytes
268435456 [268435456] gtt total
X: 479 objects, 185810944 bytes (0 active, 123117568 inactive, 60375040 unbound)
xfwm4: 0 objects, 0 bytes (0 active, 0 inactive, 0 unbound)

Then I started Firefox and opened this image: http://dna-bucket-dna.cf.rawcdn.com/files/116016/original/_14P2868.jpg

# SNA - after
cat /sys/kernel/debug/dri/0/i915_gem_objects
362 objects, 208642048 bytes
57 [57] objects, 106934272 [106934272] bytes in gtt
0 [0] active objects, 0 [0] bytes
57 [57] inactive objects, 106934272 [106934272] bytes
127 unbound objects, 33849344 bytes
0 purgeable objects, 0 bytes
23 pinned mappable objects, 24096768 bytes
10 fault mappable objects, 41943040 bytes
268435456 [268435456] gtt total
X: 360 objects, 208506880 bytes (0 active, 43421696 inactive, 33849344 unbound)
xfwm4: 0 objects, 0 bytes (0 active, 0 inactive, 0 unbound)

Now I start to see artifacts on my screen. I will attach them to this bug:
before - the normal LibreOffice menu, before opening a large image in Firefox
after - the same menu after opening a large image in Firefox (missing background on the selected item)
after2 - two extra rows of text below the actual text at the bottom - Geany editor :)
after3 - buttons in a LibreOffice dialog with strange borders...

So something clearly breaks after I open a large image...

Created attachment 88250
before
Created attachment 88251
after
Created attachment 88252
after2
Created attachment 88253
after3
Created attachment 88254
gkrellm artefacts
This is gkrellm, which also shows something wrong...
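You can compare two snapshots like the "before" and "after" ones above without eyeballing them by splitting the per-client summary line apart with sscanf. The helper below is hypothetical; it is not part of the driver or of any tool. It assumes the line format is exactly what this report shows. The debugfs output is not a stable interface, so a different kernel may print something else.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One per-client summary line from i915_gem_objects, e.g.
 * "X: 414 objects, 372871168 bytes (0 active, 43479040 inactive, 57036800 unbound)" */
struct gem_client_stats {
    char name[64];
    unsigned long objects;
    unsigned long long bytes, active, inactive, unbound;
};

/* Returns 1 if `line` matched the format above, 0 otherwise. */
static int parse_gem_client_line(const char *line, struct gem_client_stats *s)
{
    return sscanf(line,
                  "%63[^:]: %lu objects, %llu bytes "
                  "(%llu active, %llu inactive, %llu unbound)",
                  s->name, &s->objects, &s->bytes,
                  &s->active, &s->inactive, &s->unbound) == 6;
}
```

Given the two X lines above, the total grows by 22695936 bytes while the object count falls from 479 to 360. The same memory is now held in fewer, larger objects.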
I've spent a couple of days valgrinding X under various workloads and have not found a leak.

(In reply to comment #16)
> I've spent a couple of days valgrinding X under various workloads and have
> not found a leak.

OK. Let's assume that there is no memory leak. What about the screen corruption, then? I have no problems with screen artefacts until I open a large image. My X has actually been running for a day now, and memory stays around 90 MB with SNA, but then again I haven't opened a large image. If I open just one, memory starts to grow and the artefacts are back... I have to restart X to get rid of the screen corruption. I will switch to UXA now. And maybe some day... it will be fixed :)

My gut feeling was that they were GPU allocation failures, which we handle by dropping the operation (though it should also try to fall back to using the CPU) rather than crashing.

The weird thing is that X doesn't crash and there is no error anywhere... dmesg shows nothing, Xorg.log shows nothing. Thanks for the response, btw.

OK, this seems to be a Firefox-specific problem with SNA. I'm now using Google Chrome with SNA and 2.99.905 and can open as many large images (>5000 px wide) as I like (actually 5, but...), and X never uses more than 50 MB RES. (Btw, the driver at the latest git, ed282456240cc0a7ae9a235ea8aea14a8b8a54ef, corrupts xfwm4 title bars.)

The difference there is that Firefox stores images in X and uses GPU acceleration; Chromium does not. And the recent corruption in the titlebar is from:

commit c6b0e3fe0c299488932ba0392847f1faf298d079
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Oct 30 11:52:05 2013 +0000

    sna: Detect and handle mi recursion

Now reverted, and the issue with the potential recursion fixed differently.

What's the current status? Are you able to reproduce the apparent leak or any of the corruption on the latest tip of xf86-video-intel [82e6d41c2f4f343bd1854d3d8ee4b624b5d68971]?

(In reply to comment #23)
> What's the current status? Are you able to reproduce the apparent leak or
> any of the corruption on the latest tip of xf86-video-intel
> [82e6d41c2f4f343bd1854d3d8ee4b624b5d68971]?

I checked out after comment 22 and the borders are OK. I can't comment on the leak (or the other corruption) because I haven't opened large images, since I need to finish my work :) But later tonight I will test again with Firefox, because I'm pretty sure the problem doesn't exist with Chrome. I also disabled acceleration in Firefox, so I will report later.

OK, I have some news. I updated to 82e6d41c2f4f343bd1854d3d8ee4b624b5d68971 and started using Firefox to open large images. X memory goes up to 400+ MB, and then X crashes :) But until then there is no corruption, so it's better than before: I can open more than one image before the crash, with no corruption.

Then I tried again. I opened five or six images in Firefox, and X froze for a couple of seconds. The mouse didn't move, but I could switch to a console. When I went back to X it was alive and hadn't crashed, but the background was messed up with white and silver rows :) Also, one window had the same border as with ed282456240cc0a7ae9a235ea8aea14a8b8a54ef. Memory grows, but after closing Firefox it drops to around 100 MB. I can't see any error messages in dmesg or Xorg.log. I remember that some time ago I saw errors in dmesg when X crashed, but now there is no message - nothing.

(In reply to comment #25)
> OK, I have some news. I updated to 82e6d41c2f4f343bd1854d3d8ee4b624b5d68971
> and started using Firefox to open large images. X memory goes up to 400+ MB,
> and then X crashes :) But until then there is no corruption, so it's better
> than before: I can open more than one image before the crash, with no
> corruption.

That's even worse. Please give me a backtrace for the crash.

(In reply to comment #26)
> That's even worse. Please give me a backtrace for the crash.

I can follow the steps to produce a backtrace, but first I need to know them :) Sorry, I'm just a user here... and give me some time first. Yesterday I changed some image settings in Firefox, and I will revert them before confirming the X crash for sure. Maybe it's my fault...

X should never, ever crash. If it does, please grab the Xorg.0.log and file a bug report.

Haha :) It's a crash for sure. I restored the Firefox settings, but this time X froze and I couldn't switch to a console. The mouse was actually working, but very, very slowly... I just left the laptop, and after a while X crashed; it was actually killed because it used too much RAM :) I will attach dmesg and Xorg.log.

Created attachment 88415
dmesg after X was killed for running out of memory
Created attachment 88416
Xorg.log after crash
Oh dear, looks like it recursed into itself. Can you please run 'addr2line -i -e /usr/lib/xorg/modules/drivers/intel_drv.so 0x1f420 0x3c4f9 0x41133 0x71a65 0x411ca'?

addr2line -i -e /usr/lib/xorg/modules/drivers/intel_drv.so 0x1f420 0x3c4f9 0x41133 0x71a65 0x411ca
??:0
??:0
??:0
??:0
??:0

Sigh. No debug symbols. Please compile with debug symbols and avoid stripping on installation.

I built 82e6d41c2f4f343bd1854d3d8ee4b624b5d68971 with --enable-debug, but X doesn't even start. The gdm login screen comes up, and then when the desktop is shown, X crashes and the login screen appears again. There is no information about the crash in Xorg.log. With startx I see a glimpse of the desktop, then a black screen, and I can't even switch to a console. Just a black screen. At that point I have to reboot with magic SysRq because Ctrl+Alt+Del doesn't work. I will try building with debug again later - I'm a little busy right now... sorry.

Probably the assertion failures fixed with

commit 6b1a6f32179f7bff8503c6b8b38351a7cf1d08b7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Nov 1 10:48:06 2013 +0000

    sna: Scale uses of aperture_mappable by PAGE_SIZE

    After converting aperture_mappable to count in pages, there were a few
    residual users expecting a byte count.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71117
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

which is also a good candidate to explain the recursion you hit.

I updated to 5da329735ca79517a326aee002685bf33e8db861. The driver is now built with --enable-debug. X runs this time, but it crashes after I open a couple of large images, not at the same time: open one, close the tab, open another, close it, and so on. The first time, X crashed on the first image; the next time, after the third... just random. The strange thing is that there is no error in Xorg.log.

More likely an assertion failure in that case: check stderr, often captured in /var/log/gdm/:0.log or similar.

Last line of the gdm log:
X: kgem.c:333: __kgem_bo_map__gtt: Assertion `kgem_bo_can_map(kgem, bo)' failed.
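The commit message above describes a units mix-up. A counter was converted from bytes to pages, while some callers still compared it against byte sizes. The sketch below is a hypothetical illustration of that class of bug; the struct and functions are invented for this example and are not the driver's code. With the counter in pages, the buggy comparison treats any object larger than 65536 bytes as too big for a 256 MiB aperture. That could push the code onto fallback paths it should not need.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u

/* Hypothetical accounting struct: the mappable aperture is now
 * tracked in pages rather than bytes. */
struct aperture_acct {
    uint32_t aperture_mappable;   /* pages */
};

/* Residual caller still expecting bytes: compares a byte size
 * against a page count. */
static int bo_fits_mappable_buggy(const struct aperture_acct *a, uint64_t bo_bytes)
{
    return bo_bytes <= a->aperture_mappable;
}

/* Fixed caller: scale the page count by the page size first. */
static int bo_fits_mappable_fixed(const struct aperture_acct *a, uint64_t bo_bytes)
{
    return bo_bytes <= (uint64_t)a->aperture_mappable * PAGE_SIZE_BYTES;
}
```

For example, a 35999744-byte object (a size that appears later in this thread) is rejected by the buggy check but accepted by the fixed one against a 268435456-byte aperture.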
Hmm, you can safely delete that assertion. Its purpose is to warn me of dubious mappings - the kernel will reject the mapping if the object truly is unmappable.

I have no problem with that, but my X won't stop crashing that way :) I just can't find anything other than this in :0.log.1:

Initializing built-in extension XFree86-VidModeExtension
Initializing built-in extension XFree86-DGA
Initializing built-in extension XFree86-DRI
Initializing built-in extension DRI2
Loading extension GLX
X: kgem.c:333: __kgem_bo_map__gtt: Assertion `kgem_bo_can_map(kgem, bo)' failed.

If there is anything more I can do to help, you're welcome to ask :)

Found one candidate that could trigger your assertion:

commit 6cb84c8d55f2f7cbb087a479c1dbc8bc58e97183
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Nov 1 15:57:56 2013 +0000

    sna: Guard the replace-with-xor fallback path

    Before attempting to map the destination for uploading into after a
    failure to use the BLT, we need to recheck that it is indeed mappable.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=70924
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Sorry... it crashed again with 6cb84c8d55f2f7cbb087a479c1dbc8bc58e97183. Now I see this in Xorg.log:

[ 3714.087] batch[3/0]: 58 58 65528, nreloc=14, nexec=5, nfence=2, aperture=52570, fenced=32768, high=24576: errno=28
[ 3714.087] exec[0] = handle:518, presumed offset: a255000, size: 35999744, tiling 1, fenced 1, snooped 0, deleted 0
[ 3714.087] exec[1] = handle:516, presumed offset: 4000000, size: 35999744, tiling 1, fenced 1, snooped 0, deleted 0
[ 3714.087] exec[2] = handle:520, presumed offset: 3e00000, size: 71663616, tiling 0, fenced 1, snooped 0, deleted 0
[ 3714.087] exec[3] = handle:521, presumed offset: 2200000, size: 71663616, tiling 0, fenced 1, snooped 0, deleted 0
[ 3714.087] exec[4] = handle:3, presumed offset: 6dd000, size: 4096, tiling 0, fenced 0, snooped 0, deleted 0
[ 3714.087] reloc[0] = pos:16, target:0, delta:0, read:2, write:2, offset:a255000
[ 3714.087] reloc[1] = pos:28, target:1, delta:0, read:2, write:0, offset:4000000
[ 3714.087] reloc[2] = pos:48, target:2, delta:0, read:2, write:2, offset:3e00000
[ 3714.087] reloc[3] = pos:60, target:0, delta:0, read:2, write:0, offset:a255000
[ 3714.087] reloc[4] = pos:80, target:2, delta:0, read:2, write:2, offset:3e00000
[ 3714.087] reloc[5] = pos:92, target:3, delta:0, read:2, write:0, offset:2200000
[ 3714.088] reloc[6] = pos:112, target:2, delta:0, read:2, write:2, offset:3e00000
[ 3714.088] reloc[7] = pos:124, target:3, delta:0, read:2, write:0, offset:2200000
[ 3714.088] reloc[8] = pos:144, target:2, delta:0, read:2, write:2, offset:3e00000
[ 3714.088] reloc[9] = pos:156, target:3, delta:0, read:2, write:0, offset:2200000
[ 3714.088] reloc[10] = pos:176, target:2, delta:0, read:2, write:2, offset:3e00000
[ 3714.088] reloc[11] = pos:188, target:3, delta:0, read:2, write:0, offset:2200000
[ 3714.088] reloc[12] = pos:208, target:2, delta:0, read:2, write:2, offset:3e00000
[ 3714.088] reloc[13] = pos:220, target:3, delta:0, read:2, write:0, offset:2200000
[ 3714.088] Aperture size 268435456, available 244338688

I got rid of gdm and used startx. This time, after X crashed, everything went black. Nothing on screen, and no input was possible. I couldn't reboot with Ctrl+Alt+Del. Just a pure black screen :)

Ah, I forgot... the images I opened were only half OK... the lower half was distorted... I will attach one here.

Created attachment 88491
half OK image
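The sketch below is a worked check of the ENOSPC batch in the log above. It assumes the usual gen3 fence rules: a tiled, fenced object occupies a power-of-two GTT region of at least 1 MiB. It also treats the log's fenced=32768 as a page count. That reading fits the numbers: 32768 pages x 4096 bytes is exactly two 64 MiB fence regions, one for each tiled 35999744-byte object, and the five objects add up to 52571 pages, within a page of aperture=52570. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed gen3 rule: a fenced (tiled) object occupies a power-of-two
 * region of the GTT, at least 1 MiB. Untiled objects use their size. */
#define GEN3_MIN_FENCE (1u << 20)

static uint64_t gen3_fence_size(uint64_t size)
{
    uint64_t fence = GEN3_MIN_FENCE;

    while (fence < size)
        fence <<= 1;
    return fence;
}

/* One entry of the exec[] list from the log (hypothetical struct). */
struct exec_obj {
    uint64_t size;
    int tiled;
};

/* Aperture space the batch needs; with round_fences set, tiled
 * objects are charged their full fence region. */
static uint64_t aperture_required(const struct exec_obj *objs, int n,
                                  int round_fences)
{
    uint64_t total = 0;

    for (int i = 0; i < n; i++)
        total += (round_fences && objs[i].tiled)
                     ? gen3_fence_size(objs[i].size)
                     : objs[i].size;
    return total;
}
```

Under these assumptions, the five objects need 215330816 bytes at their raw sizes. That would fit in the 244338688 bytes reported available. Once the two tiled objects are rounded up to their fences, they need 277549056 bytes, more than the whole 268435456-byte aperture. That would be consistent with errno=28 (ENOSPC).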
Please always attach the Xorg.log with the crash info - even without debug symbols available, I can often work out where the crash is likely to be, and it helps avoid treating multiple issues as one.

Created attachment 88515
Xorg.log after crash
That we encounter ENOSPC when submitting the batch is indeed worrying and needs to be resolved, but does it actually crash after the error? It should disable acceleration (losing the batch in the process and causing corruption), but it should keep working thereafter.

I have hit the black screen with nothing on it twice (this log is actually from the second black screen, not the first one I reported earlier, but the log looks the same). No mouse, and typing on the keyboard does nothing, for example to reboot or to restart X (I assume it has dropped to a console). No Ctrl+Alt+Backspace either. So I have to reboot with magic SysRq. Maybe it's not a crash but a lockup... I don't know what to call it :)

OK, that sounds like you hit a page-fault-of-doom on the fallback path. The objects are just big enough for it to try mapping both of them at once, but it is unable to fit both simultaneously into the aperture. The result is that it has to swap both objects in and out of the aperture around every single byte. That is slow enough for the computer to appear unresponsive. Sigh. It is meant to fall back to CPU mappings to prevent this.

Created attachment 88534
Xorg.log after crash
Today's Xorg.log after a crash with 6cb84c8d55f2f7cbb087a479c1dbc8bc58e97183
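The sketch below shows the rule described above: a software fallback should switch to CPU mappings when both objects cannot be in the mappable aperture at once. The names are hypothetical. The real decision in SNA presumably weighs more than object sizes; this only shows the size test.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: how a software fallback accesses a source and a
 * destination buffer object. */
enum access_path { PATH_GTT_BOTH, PATH_CPU };

static enum access_path choose_fallback_path(uint64_t src_bytes,
                                             uint64_t dst_bytes,
                                             uint64_t mappable_avail)
{
    /* If both objects cannot sit in the mappable aperture at the same
     * time, GTT-mapping both means each access may evict the other
     * (the "page-fault-of-doom"), so use CPU mappings instead. */
    if (src_bytes + dst_bytes > mappable_avail)
        return PATH_CPU;
    return PATH_GTT_BOTH;
}
```

Take two of the untiled 71663616-byte sizes from the ENOSPC log and the space left after the two 64 MiB fence regions (110120960 bytes). They do not fit together, so this check picks the CPU path.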
That's very unusual and very unexpected. Hmm, trying to retrace it locally gives an invalid line number. Is there no chance you can make Gentoo install the debug symbols for the ddx (as you have for the Xserver)?

(In reply to comment #52)
> That's very unusual and very unexpected. Hmm, trying to retrace it locally
> gives an invalid line number. Is there no chance you can make Gentoo install
> the debug symbols for the ddx (as you have for the Xserver)?

ddx = xf86-video-intel? What option do I have to add to configure? This trace is with --enable-debug on the driver.

Yes, ddx here is xf86-video-intel. There is no specific option; the default CFLAGS include the debug symbols. So all that needs to be done is to make sure those flags are not overridden and the driver is not stripped upon install. If you build by hand (e.g. ./autogen.sh --prefix=/usr && make install), it should install the debug symbols. (Unless your /bin/install strips them by default...)

I've got some patches being tested that should improve the earlier symptoms (ENOSPC). Watch this space.

Ah, I see "strip"... OK, I built the driver without stripping it. It crashed twice, but Xorg.log has no information about the crash. Only in the gdm log did I find this:
X: /mnt/storage/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/sna_blt.c:441: sna_blt_copy_one: Assertion `(src_y + height) * blt->bo[0]->pitch <= kgem_bo_size(blt->bo[0])' failed.

The ENOSPC issue should be fixed now. I need a full stack trace or debug log to be able to work out how you hit that assertion, though. :|

... and I need to know how to make that log/trace :(

I checked out 4a7217b05c232484a80abc7bd67494996dd32057. It crashed again after I opened the first large image. Xorg.log is clean - no error message.
The gdm log ends with the same message, on a different line:
X: /mnt/storage/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/sna_blt.c:500: sna_blt_copy_one: Assertion `(src_y + height) * blt->bo[0]->pitch <= kgem_bo_size(blt->bo[0])' failed.
I will attach the full log here.

Created attachment 88540
gdm log
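The sna_blt_copy_one assertion quoted above is a bounds check. The end of the last source row read, at byte offset (src_y + height) * pitch, must not run past the end of the source bo. Below is a simplified restatement as a standalone function, using an invented surface struct. It computes in 64 bits so the check itself cannot overflow. If src_y is derived from the wrong origin, this same inequality is what catches the out-of-range read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified source surface: rows of `pitch` bytes
 * inside a buffer object of `size` bytes. */
struct surface {
    uint32_t pitch;
    uint64_t size;
};

/* Same inequality as the sna_blt_copy_one assertion:
 * (src_y + height) * pitch <= size, evaluated in 64 bits. */
static int copy_rows_in_bounds(const struct surface *src,
                               int32_t src_y, int32_t height)
{
    if (src_y < 0 || height < 0)
        return 0;
    return ((uint64_t)src_y + (uint64_t)height) * src->pitch <= src->size;
}
```

For a 100-row surface, copying 50 rows starting at row 50 is in bounds, and 51 rows from the same start is not.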
I have another one:
X: /mnt/storage/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/sna_accel.c:3735: sna_pixmap_move_to_gpu: Assertion `priv->gpu_bo->proxy == ((void *)0)' failed.
This happened twice when I visited a site that I believe has no big images on it. Sometimes I can see the web page... sometimes X crashes :)

Ugh. Do you have a list of specific web pages that cause the most trouble on your machine? I'm not sure I have a surviving 945gm. Most of my gen3 machines are g33/pnv, which have a different aperture that is not quite so limited, and they are not as likely to hit the same paths as your machine. :|

Actually, my X crashed twice, only on this page, when I visited it with Firefox:
http://www.f1fanatic.co.uk/2013/11/03/no-penalty-for-alonso-over-vergne-incident/
The error was:
X: /mnt/storage/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/sna_accel.c:3735: sna_pixmap_move_to_gpu: Assertion `priv->gpu_bo->proxy == ((void *)0)' failed.
I don't have any trouble with other pages, only with big images (which I open in a new tab in Firefox). Then the error is:
X: /mnt/storage/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/sna_blt.c:500: sna_blt_copy_one: Assertion `(src_y + height) * blt->bo[0]->pitch <= kgem_bo_size(blt->bo[0])' failed.
I'm not sure whether it is only a driver problem or a Firefox problem...

X dying here is a problem with xf86-video-intel. AFAICT everything is OK at the moment on my gen2 devices (which are more memory- and aperture-constrained than your gen3), on which I was proving the aperture space checks. While I try to reproduce this locally, can you please see if you can reproduce the issues with full debugging enabled, using ./configure --enable-debug=full (I think there is also a USE option if you prefer)? It will dramatically slow X down and generate voluminous log files. The log files will be massive, but should compress well with xz.

Do you need the whole gdm and Xorg logs?
The gdm log is 372 MB uncompressed and 8 MB compressed; the Xorg log is 8.2 MB uncompressed and 250 KB compressed.

The last 1000 lines should be enough... I hope.

Created attachment 88601
gdm log with full debug
Created attachment 88602
Xorg log with full debug
Hmm, I think this should explain the half-OK image:

commit 4734354209897448af61b7c3fcb35ef1ced8b11f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Nov 4 12:57:01 2013 +0000

    sna: Apply the BLT source offset for individual copies

    Following a complex path through multiple layers of indirections and
    tiling fallbacks resulted in hitting a path where the source offset
    was subsequently ignored. This leads to the operation reading from
    invalid memory (or hitting the assert warning about the same).

    References: https://bugs.freedesktop.org/show_bug.cgi?id=70924
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Thanks a lot for the traces. That still leaves the proxy assert unresolved, but can you please check whether this at least fixes one issue?

Hm... I can't see that commit. The last commit I can see is 82b646a42f5a6271c8518ad454f1603714276caf.

Sorry, forgot to push. It should be there now.

OK, I have good and bad news :) The good news is that X is still running after I opened maybe 10 or more large images. The bad news is that the corruption (i.e. the menus in LibreOffice) is still there. I will try to reproduce the crash on the f1fanatic web site now. The problem is that yesterday it didn't crash every time...

Is the LibreOffice menu corruption the same as https://bugs.freedesktop.org/attachment.cgi?id=88252 and https://bugs.freedesktop.org/attachment.cgi?id=88253?

(In reply to comment #72)
> Is the LibreOffice menu corruption the same as
> https://bugs.freedesktop.org/attachment.cgi?id=88252 and
> https://bugs.freedesktop.org/attachment.cgi?id=88253?

Yes.

I have a question. Could the kernel update from 3.12-rc7 to 3.12 have fixed the corruption? Or could something between the kernel and the driver cause the corruption? I'm still running the driver from commit 8f6e227ba8127a2ca034271f2a660c24abbe056f, which, before the kernel upgrade and reboot, produced corruption after I opened a large image.

Now I can't reproduce this.
After opening a couple of large images one by one (open one, close it, open another, close it), there is no corruption. But if I open, let's say, 5 or 6 large images, X freezes and is then killed for running out of memory :)

Created attachment 88647
gdm log after X oom
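This is a toy model of what "apply the BLT source offset for individual copies" means (commit 4734354 above). A copy is described by boxes in destination coordinates, and every box must be translated by the source offset (src_dx, src_dy) before reading. The model uses 8-bit pixels and invented names, not SNA's data structures. If the offset is dropped, the copy still has the right shape but reads from the wrong place in the source. That kind of mistake could plausibly produce partly wrong images like the half-OK one. Whether it did is Chris's diagnosis above, not something this sketch shows.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A box in destination coordinates (hypothetical; x2/y2 exclusive). */
struct box {
    int x1, y1, x2, y2;
};

/* Copy every box from src to dst (8-bit pixels). Each destination box is
 * translated by (src_dx, src_dy) to find the pixels to read; the
 * translation has to be applied to every individual box. */
static void copy_boxes(uint8_t *dst, int dst_pitch,
                       const uint8_t *src, int src_pitch,
                       int src_dx, int src_dy,
                       const struct box *boxes, int nbox)
{
    for (int n = 0; n < nbox; n++) {
        const struct box *b = &boxes[n];
        size_t width = (size_t)(b->x2 - b->x1);

        for (int y = b->y1; y < b->y2; y++)
            memcpy(dst + y * dst_pitch + b->x1,
                   src + (y + src_dy) * src_pitch + b->x1 + src_dx,
                   width);
    }
}
```

With the offset (2, 3), destination pixel (0, 0) comes from source pixel (2, 3). With the offset dropped, it would come from (0, 0) instead.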
(In reply to comment #74)
> I have a question.
>
> Could the kernel update from 3.12-rc7 to 3.12 have fixed the corruption? Or
> could something between the kernel and the driver cause the corruption?

3.12-rc7 to 3.12, not that I am aware of - I didn't send any fixes.

> I'm still running the driver from commit
> 8f6e227ba8127a2ca034271f2a660c24abbe056f, which, before the kernel upgrade
> and reboot, produced corruption after I opened a large image.
>
> Now I can't reproduce this. After opening a couple of large images one by
> one (open one, close it, open another, close it), there is no corruption.

My first guess would be an issue with memory fragmentation on a long-running system. And that is difficult to test.

> But if I open, let's say, 5 or 6 large images, X freezes and is then killed
> for running out of memory :)

Meh, still hitting recursion. Any chance you can capture that with debug symbols? Does 'addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x1f62d' resolve to anything useful?

(In reply to comment #76)
> My first guess would be an issue with memory fragmentation on a
> long-running system. And that is difficult to test.

It was up for no more than a day, but I guess it had been "damaged" by the old driver that crashed X. Or something like that, maybe :)

> Meh, still hitting recursion. Any chance you can capture that with debug
> symbols?
> Does 'addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x1f62d'
> resolve to anything useful?

Unfortunately no (:?)... the driver is not stripped and is built with debug, but not with full debug. I will build it with full debug and post the last messages here like before.

Full logs from before X got killed by the OOM killer:
GDM log: http://www.jeckyll.net/gdm.log.xz
Xorg.log: http://www.jeckyll.net/Xorg.0.log.xz

This should prevent it from taking the recursive path in the first place:

commit dc61705a6e425952de4c81c2320382af07cf948a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 5 08:49:28 2013 +0000

    sna: Use an inplace exchange for large untiled BO

    On older architectures, large BO have to be untiled and so we can reuse
    an existing CPU bo by adjusting its caching mode.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=70924
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

and this should fix the recursion:

commit f3225fcb38686f3b9701725bf3a11ecf1c100c3f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 5 08:38:22 2013 +0000

    sna: Be move conservative with tiling sizes for older fenced gen

    The older generations have stricter requirements for alignment of
    fenced GPU surfaces, so accommodate this by reducing our estimate
    available space for the temporary tile.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Created attachment 88685
Xorg log after update to dc61705a6e425952de4c81c2320382af07cf948a
X was killed by the OOM killer again (I think), but this time gdm couldn't recover and I had to reboot, so I'm not absolutely sure it was OOM.
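The idea behind "be more conservative with tiling sizes for older fenced gen" can be illustrated with the gen3 fence rule, assumed here to be power-of-two regions of at least 1 MiB. A temporary tile sized to exactly fill a byte budget can need a fence region up to twice as large. The sketch compares a naive row estimate with one capped at the largest power of two within the budget. It is a simplified, hypothetical model with a single tile and no tile-row alignment. It is not the estimate the commit actually implements.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed gen3 fence rule: power-of-two regions, at least 1 MiB. */
#define GEN3_MIN_FENCE (1u << 20)

static uint64_t gen3_fence_size(uint64_t size)
{
    uint64_t fence = GEN3_MIN_FENCE;

    while (fence < size)
        fence <<= 1;
    return fence;
}

/* Largest power of two <= x (0 for x == 0). */
static uint64_t pow2_floor(uint64_t x)
{
    uint64_t p = 1;

    if (x == 0)
        return 0;
    while (p <= x / 2)
        p <<= 1;
    return p;
}

/* Naive: as many rows as fit in the byte budget, ignoring fences. */
static uint32_t naive_tile_rows(uint32_t pitch, uint64_t budget)
{
    return (uint32_t)(budget / pitch);
}

/* Conservative: cap the tile at the largest power of two within the
 * budget, so rounding up to a fence region cannot exceed the budget. */
static uint32_t conservative_tile_rows(uint32_t pitch, uint64_t budget)
{
    uint64_t limit = pow2_floor(budget);

    if (limit < GEN3_MIN_FENCE)
        return 0;   /* even the smallest fence region would not fit */
    return (uint32_t)(limit / pitch);
}
```

With an 8192-byte pitch and a 48 MiB budget, the naive estimate of 6144 rows needs a 64 MiB fence region. The capped estimate of 4096 rows needs exactly 32 MiB.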
Created attachment 88686
gdm log after update to dc61705a6e425952de4c81c2320382af07cf948a
Also, after updating to dc61705a6e425952de4c81c2320382af07cf948a, I started to see letters that were missing, distorted, or replaced with underscores. I can't take a screenshot for now, because they get repainted... I will try harder! :)

Created attachment 88691
firefox titlebar
Distorted; after a couple of seconds, or when the mouse moves over it, it is restored correctly.
Created attachment 88692
wrong image
You can see the wrong image on the left, and the correct image circled in red. I have to right-click the wrong image and select View Image; it is shown wrong again, and then I have to click Reload to see the actual image. I can also see a couple of other distortions, e.g. "skewed" images and other funny "effects" :)
And I'm not sure whether it is a Firefox problem or not. It is Firefox 25 from their site, so...
Text is sometimes not displayed properly either, but it is impossible for me to take a screenshot - it is redrawn every time I bring up a terminal or the gnome-screenshot window :)
Created attachment 88693
top of the image is displayed wrong
But definitely not OOM now? Minor victories! Now I suspect that the transformation from CPU to GPU is not entirely cache coherent. I've patched a couple of bugs in the ddx:

commit 723f17ca4f9c120be5fe667bf2c3e35c7ee687be
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 5 18:36:45 2013 +0000

    sna: Submit execution on the bo before changing its caching status

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

commit 10b573c5084cabcc1bae70c8d35311fa5ec0a245
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 5 18:29:46 2013 +0000

    sna: Clear snoop flag after converting from a CPU bo

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

If that doesn't fix things, there is also the possibility of a kernel bug.

In comment 80, X was killed by the OOM killer as far as I can tell... but if you see anything different in the logs, OK. I will test these commits later, since I have to finish some work... sorry about that.

I must be the most annoying man on earth... but X crashed again:
X: /mnt/storage/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/sna_accel.c:1990: _sna_pixmap_move_to_cpu: Assertion `(flags & 0x2) == 0 || priv->cpu_damage == ((void *)0)' failed.
This is with commit 723f17ca4f9c120be5fe667bf2c3e35c7ee687be. And these are the full debug logs:
http://www.jeckyll.net/X/201311060018/Xorg.0.old.xz
http://www.jeckyll.net/X/201311060018/gdm.log.xz
I'm still not sure whether the last commits fixed the corruption - I haven't seen any so far, but I will confirm that later.

Fixed the assertion failure - that could also have caused some corruption with debug disabled.
commit f2f9019bae5f6f03b5e23da759d3871fc18dd9f4
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 5 22:41:06 2013 +0000

    sna: Only operate inplace if no existing CPU damage for a read

Hopefully,

commit ef842d2ceee4d1ccf8a0f8a81530dc8be8e18b44
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Nov 6 08:56:01 2013 +0000

    sna: Be more pessimistic for tiling sizes on older gen

is the right fix for the OOM.

Created attachment 88744
large image scaled with distortion
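The assertion and fix above concern when a read may go straight to the GPU bo ("in place"). The commit title f2f9019 puts it plainly: only operate in place if there is no existing CPU damage. Below is a hypothetical, much simplified model of that decision, with invented struct and field names. If the CPU copy holds writes the GPU bo has not seen, reading the bo directly would return stale pixels. In that case the read has to go through the CPU copy instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified pixmap state. */
struct pixmap_state {
    bool cpu_damage;       /* CPU copy holds writes the GPU bo has not seen */
    bool gpu_bo_mappable;  /* GPU bo can be mapped for direct CPU reads */
};

enum read_path { READ_INPLACE, READ_VIA_CPU_COPY };

/* Read straight from the GPU bo only when that cannot miss newer pixels. */
static enum read_path choose_read_path(const struct pixmap_state *p)
{
    if (p->gpu_bo_mappable && !p->cpu_damage)
        return READ_INPLACE;
    return READ_VIA_CPU_COPY;
}
```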
OK, I have good news.
I updated to c3d5b1d8fcb1b65c35827d38bf5b309e433d0907.
1. No X crash after opening and closing maybe 20 or more large images.
2. No X lockup or OOM after opening 12 large images at the same time. X's RES memory climbs to 500+ MB, but after closing the images and waiting about 10-15 seconds, RES drops back to around 150 MB.
3. Sometimes, say for 3 out of 12 images, I see distortion in the scaled image (see the attached image). Without zoom, the images are OK. Sometimes an image is shown entirely black for a moment before it opens fully, and then it is displayed correctly once it has fully loaded. This may be a Firefox issue. I use Firefox 24 from Gentoo Portage. I will test with 25 from the Mozilla FTP site after a while.
Whilst testing, can you keep assertions enabled? Hopefully that will help to catch these errors earlier. Thanks for your patience - let's hope this is the light at the end of the tunnel.

"Whilst testing, can you keep assertions enabled?" How?

Firefox 25 is not playing nicely with me. X was killed by the OOM killer again. Full debug:
http://www.jeckyll.net/X/201311061457/Xorg.0.log.old.xz
http://www.jeckyll.net/X/201311061457/gdm.log.xz
It is strange, because 24 was OK... I will try to build 25 on my PC tonight and will test again, maybe tomorrow.

(In reply to comment #92)
> "Whilst testing, can you keep assertions enabled?" How?

Default to using --enable-debug when you can risk X crashing.

> Firefox 25 is not playing nicely with me. X was killed by the OOM killer
> again. Full debug:
> http://www.jeckyll.net/X/201311061457/Xorg.0.log.old.xz
> http://www.jeckyll.net/X/201311061457/gdm.log.xz

Thanks. :| I've pushed yet another workaround to try and prevent falling down into the black hole - but I haven't spotted exactly what is wrong there, though oddities abound.

commit ae380a960df6b3a9714d78eb6cb42249764488ba
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Nov 6 14:51:42 2013 +0000

    sna: Use tiling BLT fallback for BLT composite operations

    This avoids a circuitous route through the render pathways and multiple
    levels of tiling fallbacks to accomplish the same copy.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

and

commit 7578809ddcb244ad78ebf86359b7ee2a61e27ff6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Nov 6 13:42:27 2013 +0000

    sna: Trim create flags if tiled sizes are too large

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

commit 073465817f54507ab6b7f801c5dfab2c06f678c0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Nov 6 13:41:39 2013 +0000

    sna: Fences are power-of-two sizes

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Created attachment 88764
gdm log after X crash
This was a fast crash :)
Updated to 7a9c1e153a9208e8cd7680e478fde18e051beaa9, restarted X, opened the first large image - crash :)
X: /mnt/storage/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/sna_tiling.c:980: sna_tiling_blt_composite: Assertion `op->op == 1' failed.
full logs to follow shortly
full debug:
http://www.jeckyll.net/X/201311061741/gdm_full.log.xz
http://www.jeckyll.net/X/201311061741/Xorg.log.xz

(In reply to comment #95)
> X:
> /mnt/storage/tmp/portage/x11-drivers/xf86-video-intel-9999/work/xf86-video-intel-9999/src/sna/sna_tiling.c:980: sna_tiling_blt_composite: Assertion `op->op == 1' failed.

Invalid assertion deleted.

Created attachment 88765 [details]
no full image displayed
X is still up! :) X doesn't OOM! :)
OK, but now images are only half displayed, or a big portion of them is missing. Strangely, this is more likely to happen with Firefox 25. Firefox 24 manages to display them correctly more often, but it fails too ...
Created attachment 88766 [details]
another part of image is missing
Created attachment 88767 [details]
and some black rectangles
Can you please send me a full debug log while loading a few images? If you can capture a failure, that would be useful as well - but not essential, as I don't expect that it will be immediately apparent in the logs, except perhaps as notable by its absence.

Sure. I will make a full debug log later, but how can I capture a failure?

(In reply to comment #102)
> Sure. I will do full debug log later, but how can I capture failure?

Just a debug log from the session containing the rendering error. Not essential, but it might have a clue. At the moment, I just want to check through and make sure that the code does actually behave the way I think it should... As you have probably noticed, these paths are quite tricky as we are trying to work on images larger than the GPU can naively handle.

Full debug logs:
http://www.jeckyll.net/X/201311061908/Xorg.log.xz
http://www.jeckyll.net/X/201311061908/gdm.log.xz
BUT ... maybe because X with full debug is slow, images are displayed correctly. I can actually see how they are drawn: first the top half of an image, then some delay, and then the bottom half is displayed. With no debug (and a fast X), images are drawn in a similar manner, but some portions are just not drawn. These portions are often under the mouse cursor, and if I move the mouse over the image while it is still loading, there is a big chance of a portion of the image going missing. With full debug I tried moving the mouse over the image and opening the next image quickly while the first was still loading, but couldn't reproduce the missing portion :( I tried with both Firefox 25 and Firefox 24 and couldn't reproduce the same black missing portions with full debug. Then I restarted X, made another try, and again couldn't reproduce it. The above logs are from the first attempt. These are from the second attempt:
http://www.jeckyll.net/X/201311061908/Xorg2.log.xz
http://www.jeckyll.net/X/201311061908/gdm2.log.xz

Interestingly, we don't hit the new tiling shortcut paths in the full-debug logs, though due to the earlier assertion we know that they do get utilized.
So either there is a residual resource issue that causes transient rendering to be dropped, or the new paths have a bug. I consider both quite likely.

One very minor tweak, as I noticed in your traces that upload buffers were being retained for longer than intended:

commit 84d667b94a97ad5fde68d730d57a19e1f4241ed5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Nov 8 08:53:55 2013 +0000

sna: Always schedule upload buffers for retirement after use

Even if they are multiply referenced due to cached references.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Created attachment 88902 [details]
orphaned symbols on Geany editor
I have run b796c33411218aeaf4daaeff41a1bc442b5f945f for some time now and have to say that it doesn't crash or OOM.
I had image corruption, but I made a mistake and ran kernel 3.10, which I thought was 3.12 (a GRUB config error on my side) :(
So when I asked what changed between 3.12-rc7 and 3.12, I was actually running 3.10 and not 3.12!
I found the mistake just this morning. Today I ran 3.12 all day and saw no image corruption.
I only found orphaned symbols in the Geany editor; see the screenshot. If you are interested, I can try to make full debug logs again with the Geany editor. Maybe this time I can reproduce the corruption.
Now I have updated to abf1a16914d993cc150005879375d4bb17fdccf3 - still orphaned symbols in the Geany editor.
(In reply to comment #107)
> I found mistake just this morning. So today all day I was running 3.12 and I
> don't see image corruption.

Let me just clarify:

3.10 - corruption
3.12 - no corruption

There was definitely one major corruption bug fixed in late 3.10 / 3.11, so I think you just rediscovered that bug.

> I just find orphaned symbols in Geany editor. You can see Screenshot. If you
> are interested I can try to make again full debug logs with Geany editor.
> May be this time I can simulate corruption.
>
> Now I update to abf1a16914d993cc150005879375d4bb17fdccf3 - still orphaned
> symbols in Geany editor.

Is this 3.12 or 3.10?

(In reply to comment #108)
> Let me just clarify:
>
> 3.10 - corruption
> 3.12 - no corruption

Yes. So far 3.12 shows images correctly. The half-displayed images and the black rectangles on them were with 3.10. I feel really stupid about this mistake on my side ...

> Is this 3.12 or 3.10?

3.12, but these symbols were there in 3.10 as well.

As a rough guess, the Geany issue is the bug 71191 oddity. So, assuming that everything else is finally working, let's concentrate the remaining discussion there.