Created attachment 49568
Xorg Log

I have Windows 7 running in VirtualBox (no acceleration enabled). Sometimes it starts to flicker, the screen blacks out, and only the areas the mouse moves over get repainted. I see the following error in the Xorg log over and over again:

[107833.198] (WW) intel(0): intel_uxa_prepare_access: bo map failed: No space left on device

Eventually Xorg freezes and crashes. Restarting Xorg does not work; I have to reboot to get a working system again.

I have a dual-screen setup, one screen on HDMI and the other on DP. The problematic VirtualBox window runs on the DP screen.

lspci:
00:00.0 Host bridge: Intel Corporation Core Processor DRAM Controller (rev 02)
00:02.0 VGA compatible controller: Intel Corporation Core Processor Integrated Graphics Controller (rev 02)
00:19.0 Ethernet controller: Intel Corporation 82577LM Gigabit Network Connection (rev 05)
00:1a.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
00:1b.0 Audio device: Intel Corporation Device 3b57 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 (rev 05)
00:1c.1 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 2 (rev 05)
00:1c.2 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 3 (rev 05)
00:1c.3 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 4 (rev 05)
00:1d.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation 5 Series/3400 Series Chipset LPC Interface Controller (rev 05)
00:1f.2 RAID bus controller: Intel Corporation Mobile 82801 SATA RAID Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller (rev 05)
00:1f.6 Signal processing controller: Intel Corporation 5 Series/3400 Series Chipset Thermal Subsystem (rev 05)
02:00.0 Network controller: Intel Corporation WiFi Link 6000 Series (rev 35)
03:00.0 SD Host controller: Ricoh Co Ltd Device e822 (rev 01)
3f:00.0 Host bridge: Intel Corporation Core Processor QuickPath Architecture Generic Non-core Registers (rev 02)
3f:00.1 Host bridge: Intel Corporation Core Processor QuickPath Architecture System Address Decoder (rev 02)
3f:02.0 Host bridge: Intel Corporation Core Processor QPI Link 0 (rev 02)
3f:02.1 Host bridge: Intel Corporation Core Processor QPI Physical 0 (rev 02)
3f:02.2 Host bridge: Intel Corporation Core Processor Reserved (rev 02)
3f:02.3 Host bridge: Intel Corporation Core Processor Reserved (rev 02)
Created attachment 49571
dmesg

Lots of errors in dmesg too.
Not sure if it is interesting, but here is the xrandr output:

Screen 0: minimum 320 x 200, current 3200 x 1200, maximum 8192 x 8192
eDP1 connected (normal left inverted right x axis y axis)
   1366x768       60.2 +
   1024x768       60.0
   800x600        60.3     56.2
   640x480        59.9
VGA1 disconnected (normal left inverted right x axis y axis)
HDMI1 connected 1280x1024+1920+176 (normal left inverted right x axis y axis) 376mm x 301mm
   1280x1024      60.0*+   75.0
   1152x864       75.0
   1024x768       75.1     60.0
   800x600        75.0     60.3
   640x480        75.0     60.0
   720x400        70.1
DP1 disconnected (normal left inverted right x axis y axis)
HDMI2 disconnected (normal left inverted right x axis y axis)
DP2 connected 1920x1200+0+0 (normal left inverted right x axis y axis) 519mm x 320mm
   1920x1200      60.0*+
   1600x1200      60.0
   1280x1024      75.0     60.0
   1152x864       75.0
   1024x768       75.1     60.0
   800x600        75.0     60.3
   640x480        75.0     60.0
   720x400        70.1
It looks like something is leaking VMAs (possibly through a bo leak?) and we are exhausting our mmap address space. Can you grab the contents of /sys/kernel/debug/dri/0/* after you hit this issue?
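To make "exhausting our mmap space" concrete, here is a toy Python model; it is not the kernel's allocator. It assumes a fixed range of page-sized offsets handed out first-fit, where every buffer that is never freed keeps its slice. `OffsetSpace`, `alloc` and `free` are hypothetical names. Once the range is used up, the next request fails with ENOSPC, the errno behind the "No space left on device" text in the Xorg log.

```python
import errno


class OffsetSpace:
    """Toy first-fit allocator over a fixed range of page-sized offsets.

    Hypothetical illustration of address-space exhaustion; not DRM code.
    """

    def __init__(self, total_pages):
        # Free holes as (start_page, length), kept sorted by start page.
        self.holes = [(0, total_pages)]

    def alloc(self, npages):
        for i, (start, length) in enumerate(self.holes):
            if length >= npages:
                # Take the request from the front of the first hole that fits.
                if length == npages:
                    del self.holes[i]
                else:
                    self.holes[i] = (start + npages, length - npages)
                return start
        # Same errno as the "No space left on device" message in the Xorg log.
        raise OSError(errno.ENOSPC, "offset range exhausted")

    def free(self, start, npages):
        # Return the range and coalesce it with neighbouring holes.
        holes = sorted(self.holes + [(start, npages)])
        merged = []
        for s, n in holes:
            if merged and merged[-1][0] + merged[-1][1] == s:
                merged[-1] = (merged[-1][0], merged[-1][1] + n)
            else:
                merged.append((s, n))
        self.holes = merged


if __name__ == "__main__":
    space = OffsetSpace(total_pages=16)
    leaked = []
    try:
        while True:
            # A leaky client: maps 3-page buffers and never frees them.
            leaked.append(space.alloc(3))
    except OSError as e:
        print(len(leaked), "buffers leaked before", errno.errorcode[e.errno])
```

In this model, freeing two non-adjacent 3-page buffers from a full 16-page space leaves 7 pages free, yet `alloc(6)` still raises ENOSPC, so fragmentation alone can also produce the failure.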
I will, but I cannot predict when it will happen next; it might be a couple of days.
Oh, that was too fast: I don't have debugfs support in my kernel.
Apparently I'm running into this with my i3 laptop. Every once in a while Xorg.0.log gets flooded with:

* [943834.434] (WW) intel(0): intel_uxa_prepare_access: bo map failed: No space left on device

dmesg gets:

* [943834.434] (WW) intel(0): intel_uxa_prepare_access: bo map failed: No space left on device

The main problem I can connect this to is that web videos won't switch to full-screen mode. Quite likely the flood occurs when I press the full-screen button on YouTube or similar: there's a slight flicker, then it returns to the window. Other than that, I'm *not* seeing crashes or hangs, so far.

Here's what I have of the #intel-gfx conversation:

--- Log opened Wed Jul 27 14:25:20 2011
14:25 <danvet> macmaN, is this on a 32bit install?
yes, PAE enabled, 4 GB RAM total, no swap.
$ free
             total       used       free     shared    buffers     cached
Mem:       3810164    3002584     807580          0       5976    1583180
-/+ buffers/cache:    1413428    2396736
Swap:            0          0          0
14:27 <danvet> macmaN, can you pastebin i915_gem_objects from debugfs?
/sys/kernel/debug/dri/0 $ cat i915_gem_objects
13273 objects, 247914496 bytes
1022 [855] objects, 87339008 [34508800] bytes in gtt
6 [6] active objects, 5406720 [5406720] bytes
6 [6] pinned objects, 4501504 [4501504] bytes
1010 [843] inactive objects, 77430784 [24600576] bytes
0 [0] freed objects, 0 [0] bytes
7 pinned mappable objects, 9744384 bytes
75 fault mappable objects, 380928 bytes
2147479552 [268435456] gtt total
14:31 <danvet> macmaN, are you sometimes running more demanding stuff like games, hd video decoding?
no games, but HD video yes: off YouTube, and running XBMC and VLC every once in a while.
$ uptime
 12:52:18 up 18 days, 20:57,  3 users,  load average: 0.74, 0.36, 0.32
Logoffs from X are very rare apart from kernel upgrades/debugging; usually the machine goes into overnight suspend. This is the first time I've seen these errors; not sure whether the uptime or the amount of "demanding stuff" has ever reached this far before.
16:59 <danvet> macmaN exhausted the drm_mmap_offset address range of 4 GB ... ... comment for ickle
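A back-of-the-envelope check of how large that range is, as a Python sketch. It assumes 4 KiB pages and reads "4gb" as 4 GiB. It also treats each i915_gem_objects snapshot posted in this report as if every object held an mmap offset covering its full size. That overstates usage, since as far as I understand only objects userspace has asked to mmap get an offset, but it gives a sense of scale. Every mapped object needs at least one page of the range, so even tiny buffers run out at roughly a million.

```python
PAGE_SIZE = 4096               # assumption: 4 KiB pages, as on x86
RANGE_BYTES = 4 * 1024 ** 3    # the "4gb" offset range, read as 4 GiB

RANGE_PAGES = RANGE_BYTES // PAGE_SIZE


def pages_for(total_bytes):
    """Pages of offset range consumed if every one of these bytes had an offset."""
    return -(-total_bytes // PAGE_SIZE)  # ceiling division


# (object count, total bytes) from i915_gem_objects snapshots in this report
snapshots = [(13273, 247914496), (307611, 1535336448)]

for count, size in snapshots:
    used = pages_for(size)
    print(f"{count} objects: {used} pages, {100 * used / RANGE_PAGES:.1f}% of the range")
```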
Created attachment 49660
/sys/kernel/debug/dri/0/vma, per ickle's request

On further testing, full-screen video is actually not a problem; I just played some Vimeo videos without issues. But loading this page http://www.youtube.com/user/freedrumlessons is enough to start the errors. I have videos blocked with NoScript, so video doesn't even have to play for the errors to appear.
Could the VMA issue have been improved since 2.6.39.3 or so? I noticed that nowhere in my comments did I specify which kernel I'm running, and no one has asked. Does it matter, or is it definitely unfixed in 3.x?

$ uname -a
Linux travelmate 2.6.39-pf2 #3 SMP PREEMPT Sat Jul 9 15:16:49 EEST 2011 i686 Intel(R) Core(TM) i3 CPU U 330 @ 1.20GHz GenuineIntel GNU/Linux
Looking at the vma report, it doesn't seem to be the issue per se. I'm trying to think of what else could cause it. If you can pinpoint why later kernels appear to work better, that would help!
I actually have no idea about later kernels. After some very annoying btrfs BUGs I ran into in 2.6.38, I'm absolutely in love with the stability of this 2.6.39 setup. I really don't have the resources to take risks right away, but I will keep the need in mind.
Created attachment 49837
/proc/dri/0

Not sure if it helps, but these are the /proc/dri files.
Created attachment 49951
slabinfo after another occurrence of this issue
I now get this without VirtualBox too. The screen does not blank; parts of it just do not draw until I move the mouse across that area. Same error in the Xorg log file.
I have not seen this issue at all with 2.16.0 or post-2.16 git HEAD.
I updated to 2.16. I'll report back once it occurs again. I don't have a definitive test, but I know it occurs if my VM runs for more than 3 days, so let's see.
Honestly, I can't think of a single change that should have had an impact on this bug. So please don't get your hopes up too much that it is fixed and stays fixed. :|
I now hit this much more quickly, without a VM involved. I'm on xf86-video-intel 2.17 and kernel 3.1.10 now. It now takes only a day of working with Eclipse (it seems to be tied to GTK; other SWT applications give the same result). Even worse, once this starts to happen, the X server's CPU usage regularly spikes to 100% for half a minute at a time. I get the same error in the Xorg log. At some point it gets so bad you can't work. Restarting X no longer helps; only a reboot does.

Recurring error in dmesg:
[drm:i915_gem_create_mmap_offset] *ERROR* failed to allocate offset for bo 0

Recurring in the Xorg log:
[141361.521] (WW) intel(0): intel_uxa_prepare_access: bo map failed: No space left on device

This is becoming a real problem now and is hitting my work environment. I'm not sure why it became worse, but I think it only started after my recent update to the 3.x kernel series.
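When reporting this, it helps to know how long the warning flood has been going on. A minimal Python helper, assuming the bracketed uptime timestamps that both the Xorg log ("[141361.521]") and dmesg use; `summarize` is a name made up for this sketch.

```python
import re

# Timestamp prefix used by both Xorg.log ("[141361.521]") and dmesg ("[266534.112127]").
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def summarize(lines, needle="bo map failed"):
    """Count lines containing `needle`; return (count, first_stamp, last_stamp)."""
    count, first, last = 0, None, None
    for line in lines:
        if needle not in line:
            continue
        count += 1
        m = STAMP.match(line)
        if m:
            stamp = float(m.group(1))
            first = stamp if first is None else first
            last = stamp
    return count, first, last
```

For example, open /var/log/Xorg.0.log and pass the file object to `summarize`; for the kernel side, feed it dmesg output with `needle="failed to allocate offset"`.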
The light at the end of the tunnel is a long way away on this one, I'm afraid. I have a series of kernel patches that should prevent the ENOSPC, but they are not ready for review, and they depend on another series that is also not ready. In the meantime, you could try enabling SNA, as that handles the bo cache completely differently and, I hope, doesn't get into the same trouble. The mmap address exhaustion is a real issue, though; another possible workaround is to use a 64-bit kernel.
Could you please clarify what SNA is? I don't have Sandy Bridge, but an Intel(R) Arrandale. On the other hand, if you have patches that I can try out and you think work reasonably well, I am happy to do that.
SNA works with all of our supported chipsets, and even on Ironlake it is significantly faster than UXA. I am curious as to how it fares in this situation. The underlying problem still exists; it's just that the usage of buffer objects might be sufficiently different to hide it.

The tree for testing the ENOSPC fixes is available from http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=reap-mmap-offsets. I don't pretend that it is in a clean state at all. ;-) The patch of interest is http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=reap-mmap-offsets&id=c4a07eef055773efba7855bcaf5f26277695a5ae
OK, I'm using SNA now; let's see. I'll try your patch later at the weekend.

I also have another issue and wonder whether it is related: since kernel 3.1, xrandr takes a lot longer to do its thing (changing screen arrangement and resolution), and it blocks even the mouse, stalling the whole X server for a full minute.

Should I report a separate issue, or could this be related?
(In reply to comment #22)
> ok I'm using sna now, let's see. I'll try your patch later on the weekend.
>
> I have also another issue and wonder if it is related, since kernel 3.1 xrandr
> is taken a lot longer to do its thing (change screen arrangement and
> resolution) and it blocks even the mouse from working, stops the whole X for a
> full minute.
>
> Should I report a separate issue or can this be related.

Whilst we know of a reason why xrandr is slow in general (probing of disconnected outputs causes timeouts rather than a quick "not detected"), I was not aware that the situation had got any worse with 3.2.
It's 3.1.10, not yet 3.2, but yes, it got far worse. It used to be slow, but it did not block X completely. If you let me know what kind of information you need, I will open a new ticket for that.
Just start the report; an Xorg.log with timings from 3.0, the bad Xorg.log with timings from 3.1.0, and an strace -tt of X under 3.1.0 would be useful. The first priority is just to open a ticket stating the problem, so that we have it tracked and raise awareness of the issue.
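To compare the good and bad Xorg.log timings, here is a small Python sketch that lists the longest silences between consecutive timestamped lines; a one-minute RandR stall should stand out at the top. It only understands Xorg.log's bracketed timestamps (strace -tt prints wall-clock times in a different format), and `largest_gaps` is a hypothetical name.

```python
import re

STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def largest_gaps(lines, top=3):
    """Return the `top` largest (seconds, line) gaps between timestamped lines.

    Each gap is attributed to the line that ends it, i.e. the first line
    logged after the stall.
    """
    gaps = []
    prev = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # continuation lines carry no timestamp
        stamp = float(m.group(1))
        if prev is not None:
            gaps.append((stamp - prev, line.rstrip("\n")))
        prev = stamp
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]
```

Running it over the 3.0 and 3.1.0 logs side by side shows whether the long gaps appear only in the bad one, and which log line follows each stall.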
*** Bug 46044 has been marked as a duplicate of this bug. ***
Also note that in bug 46044 we hit the VFS file limit as well.
Can you please test whether disabling the bo cache is sufficient to avoid the issue:

commit 5b5cd6780ef7cae8f49d71d7c8532597291402d8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Feb 24 11:14:26 2012 +0000

    uxa: Add a option to disable the bo cache

    If you are suffering from regular X crashes and rendering corruption
    with a flood of ENOSPC or even EFILE reported in the Xorg.log, try
    adding this snippet to your xorg.conf:

    Section "Driver"
      Option "BufferCache" "False"
    EndSection

    References: https://bugs.freedesktop.org/show_bug.cgi?id=39552
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
*** Bug 44185 has been marked as a duplicate of this bug. ***
(In reply to comment #28)
> Section "Driver"
> Option "BufferCache" "False"
> EndSection

X refused to start instead:

[546141.376] Parse error on line 2 of section Driver in file /etc/X11/xorg.conf
"Driver" is not a valid section name.
Oops, that should be Section "Device", not "Driver". You would have thought I would have checked before committing...
Okay, after fixing that, it failed for a different reason:

[604303.989] Parse error on line 5 of section Device in file /etc/X11/xorg.conf
This section must have an Identifier line.
[604303.990] (EE) Problem parsing the config file
[604303.990] (EE) Error parsing the config file
[604303.990] Fatal server error:
[604303.990] no screens found

What does it mean by an Identifier line? And are there any other requirements that are going to show up after this one is fixed?
Ok, the minimum complete snippet is:

Section "Device"
  Identifier "Device0"
  Driver "Intel"
  Option "BufferCache" "False"
EndSection
Okay, I've been running it for a few hours with the cache disabled. file-nr still goes up every time I check it, even if I haven't done anything. xrestop shows about 55 MB in pixmaps, while i915_gem_objects shows 267 MB, and that is also going up every time I check.
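To track the i915_gem_objects figures over time, it helps to parse the file instead of eyeballing it. A minimal Python sketch, assuming the line format of the dumps pasted in this report; the bracketed second values are ignored here, and `parse_gem_objects` is a made-up name.

```python
import re

# Matches lines such as:
#   "13273 objects, 247914496 bytes"
#   "1022 [855] objects, 87339008 [34508800] bytes in gtt"
#   "7 pinned mappable objects, 9744384 bytes"
LINE = re.compile(
    r"^\s*(\d+)(?: \[(\d+)\])? (?:(.*?) )?objects, (\d+)(?: \[(\d+)\])? bytes(.*)$"
)


def parse_gem_objects(text):
    """Parse i915_gem_objects text into {label: (count, bytes)}."""
    stats = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # e.g. the trailing "gtt total" line
        count, _, kind, size, _, suffix = m.groups()
        # "active", "in gtt", "pinned mappable", ...; the first line becomes "total".
        label = " ".join(p for p in (kind, suffix.strip()) if p) or "total"
        stats[label] = (int(count), int(size))
    return stats
```

Comparing `stats["total"][1]` against the pixmap total reported by xrestop gives the kind of 55 MB vs 267 MB gap described in the comment.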
Huh, I just hit some *other* kind of limit before file-nr was even halfway to file-max. I guess it was ENOSPC instead of ENFILE? Basically the exact symptoms described in bug 44185, with the addition that compiz coincidentally(?) crashed (bug 46303) while everything was going wrong. That made the window decorations disappear as usual, but once compiz restarted automatically, I was left staring at my desktop background with nothing else showing: all windows and the mouse had disappeared.

The numbers don't *seem* that different. I'm pretty sure the total has gone past 1.6 GB before, so I have no idea why it hit ENOSPC this time (and that one time weeks ago) and not the dozens of other times in between.

$ cat /proc/sys/fs/file-nr
314400	0	796780
$ sudo cat /sys/kernel/debug/dri/0/i915_gem_objects
307611 objects, 1535336448 bytes
748 [731] objects, 95580160 [26685440] bytes in gtt
3 [1] active objects, 3424256 [16384] bytes
8 [8] pinned objects, 8704000 [8704000] bytes
737 [722] inactive objects, 83451904 [17965056] bytes
0 [0] freed objects, 0 [0] bytes
8 pinned mappable objects, 8704000 bytes
685 fault mappable objects, 3039232 bytes
2147479552 [268435456] gtt total
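Since file-nr keeps coming up alongside the GEM counters, here is a small Python sketch that turns /proc/sys/fs/file-nr into a usage fraction. Its three fields are the allocated file handles, the allocated-but-unused handles, and the system-wide maximum (fs.file-max). `file_nr_usage` is a hypothetical name. This only tracks the ENFILE side; as this comment shows, the ENOSPC failure can arrive while file-nr is still well under file-max, so it is worth logging both.

```python
def file_nr_usage(text):
    """Parse /proc/sys/fs/file-nr and return (in_use, maximum, fraction_used).

    Fields: allocated handles, allocated-but-unused handles, maximum.
    """
    allocated, unused, maximum = (int(field) for field in text.split())
    in_use = allocated - unused
    return in_use, maximum, in_use / maximum
```

With the values above ("314400 0 796780"), about 39% of the file-handle limit is in use.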
Whoops. I closed all my programs one by one, and the leak only occurs with gnome-system-monitor 2.28.2 running. With it closed, the number of BOs in i915_gem_objects is stable. Even when it's minimized, the numbers climb fairly steadily while it's running. Sorry I didn't test this properly before now; I had forgotten I even had it running in the background.
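With a suspect application identified, a rough leak rate makes with/without comparisons less subjective. A Python sketch: `sample` polls the object count from the first line of i915_gem_objects (root and debugfs required) and `growth_rate` fits a least-squares slope in objects per second. The path is the one used throughout this report; the function names are hypothetical.

```python
import re
import time

GEM_OBJECTS = "/sys/kernel/debug/dri/0/i915_gem_objects"  # needs root and debugfs


def read_object_count(path=GEM_OBJECTS):
    """Return the total object count from the first line ("N objects, M bytes")."""
    with open(path) as f:
        return int(re.match(r"\s*(\d+) objects", f.readline()).group(1))


def sample(n, interval, read=read_object_count):
    """Take n (elapsed_seconds, count) samples, interval seconds apart."""
    start = time.monotonic()
    samples = []
    for i in range(n):
        samples.append((time.monotonic() - start, read()))
        if i + 1 < n:
            time.sleep(interval)
    return samples


def growth_rate(samples):
    """Least-squares slope of (seconds, count) samples, in objects per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    num = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples need distinct timestamps")
    return num / den
```

For example, compare `growth_rate(sample(30, 10))` with gnome-system-monitor open against the same run with it closed; a clearly positive slope only in the first case points at that client.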
Sounds like we have a lead at last \o/. Thanks.
Is this bug peculiar to gnome-system-monitor 2.28.2? The systems I have all have gnome-system-monitor 3.2 and I have not seen it misbehave yet (multiple generations, with and without compositing).
(In reply to comment #38)
> Is this bug peculiar to gnome-system-monitor 2.28.2? The systems I have all
> have gnome-system-monitor 3.2 and I have not seen it misbehave yet (multiple
> generations, with and without compositing).

Huh. I just tried it on another computer with Ubuntu 11.10, which has 3.2.1, and I couldn't reproduce it there either. I'm going to try booting an Ubuntu 11.04 live USB image, since that's what I'm currently running and seeing it on.
I don't have gnome-system-monitor running; I'm on KDE and still have this problem. I do have GTK applications running, though, mostly RCP/SWT-based ones. Maybe it is something these two have in common? I can also say that I always see graphical issues in these applications and not in the Qt ones.
I've applied some patches originally intended to aid in chasing this bug down, but since they proved to fix another bug, they went straight to master. Can you please check out xf86-video-intel.git and monitor for the leak? Thanks.
Created attachment 58513
Xorg.0.log from the machine where gnome-system-monitor causes leaks

...Okay, I couldn't reproduce it from the LiveCD either. Here are all the differences I can think of between this laptop and that machine:
- Running a 3.2 kernel
- Installed the slightly dated natty xorg-edgers repo: X server 1.10.4+, drivers and Mesa from February git snapshots
- HD3000 hardware
- Running in the GNOME 'fallback' compiz-composited environment

I did install the xorg-edgers repo on the LiveCD and upgraded/killed/restarted X and all. I still couldn't reproduce it.

There isn't an x11trace equivalent to apitrace that would record exactly what gnome-system-monitor might be spamming the server with, is there?

(And it's not exclusive to gnome-system-monitor; the leaks are just accumulating much more slowly now. They stop entirely only if I'm doing absolutely nothing but repeatedly checking the BO count.)
(In reply to comment #41)
Whoops, I should have refreshed the page. Yes, will do.
(In reply to comment #42)
> There isn't an x11trace equivalent to apitrace that would record what exactly
> gnome-system-monitor might be spamming the server with, is there?

There is xtrace (or xscope), but no means to replay yet (at least that I know of).
Ah. I just restarted X with the driver from commit 0a8218a535babb5969a58c3a7da0215912f6fef8; the leak still happens.
The situation should be improved by

commit a14917eeb2cc160d13f4fddefe5f7f9c80953ce1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Feb 24 21:13:38 2012 +0000

    drm/i915: Release the mmap offset when purging a buffer

    If we discard a buffer due to memory pressure, also release its alloted
    mmap address space. As it may be sometime before userspace wakes up and
    notices that it has buffers to purge from its cache, we may waste
    valuable address space on unusable objects for a period of time.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47738
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

but I'm still searching for just why we end up with so many buffers.
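A toy Python model of what this commit message describes: cached, purgeable buffers that the kernel discards under memory pressure either keep their mmap offset until userspace gets around to closing them, or give it back at purge time. The classes and fields are invented for illustration and do not mirror i915's data structures.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    has_offset: bool = True    # holds a slice of the mmap offset range
    purgeable: bool = False    # userspace said the contents may be discarded
    purged: bool = False       # backing pages have been discarded


class Device:
    """Toy model, not the i915 code: counts buffers that still hold an offset."""

    def __init__(self, release_offset_on_purge):
        self.release_offset_on_purge = release_offset_on_purge
        self.buffers = []

    def create_mapped(self):
        buf = Buffer()
        self.buffers.append(buf)
        return buf

    def shrink(self):
        # Memory pressure: discard the pages of every purgeable buffer.
        for buf in self.buffers:
            if buf.purgeable and not buf.purged:
                buf.purged = True
                if self.release_offset_on_purge:
                    # Give the offset back now instead of waiting for
                    # userspace to notice and close the handle.
                    buf.has_offset = False

    def offsets_in_use(self):
        return sum(b.has_offset for b in self.buffers)


def simulate(release_offset_on_purge, n=1000):
    dev = Device(release_offset_on_purge)
    for _ in range(n):
        buf = dev.create_mapped()
        buf.purgeable = True  # buffer goes back into the userspace cache
    dev.shrink()
    return dev.offsets_in_use()
```

`simulate(False)` leaves all 1000 offsets allocated after the shrink, while `simulate(True)` leaves none; memory is reclaimed either way, but only the second variant also frees address space.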
The only thing I've found so far...

commit a16616209bb2dcb7aaa859b38e154f0a10faa82b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Apr 14 19:03:25 2012 +0100

    uxa: Fix leak of glyph mask for unhandled glyph composition

    ==1401== 7,344 bytes in 34 blocks are possibly lost in loss record 570 of 58
    ==1401==    at 0x4027034: calloc (in /usr/lib/valgrind/vgpreload_memcheck-am
    ==1401==    by 0x8BE5150: drm_intel_gem_bo_alloc_internal (intel_bufmgr_gem.
    ==1401==    by 0x899FC04: intel_uxa_create_pixmap (intel_uxa.c:1077)
    ==1401==    by 0x89C2C41: uxa_glyphs (uxa-glyphs.c:254)
    ==1401==    by 0x21F05E: damageGlyphs (damage.c:647)
    ==1401==    by 0x218E06: ProcRenderCompositeGlyphs (render.c:1434)
    ==1401==    by 0x15AA40: Dispatch (dispatch.c:439)
    ==1401==    by 0x1499E9: main (main.c:287)

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Could this seemingly insignificant path be the cause of your misery?
I am quite certain 2.17 with SNA has never failed on me like this. I'm now up to kernel 3.3.1.

On the other hand, I can't move up to 2.18.0, since Firefox shows corruption all over the place. I have not tested 2.18.0 with 3.3 yet; it rather feels like I should wait for 2.18.1.
(In reply to comment #48)
> I am quite certain 2.17 +sna has never bombed on me with this. I'm now up to
> kernel 3.3.1.
>
> OTOH I can't move up to 2.18.0, since Firefox gets corruption all over the
> place. Have not tested 2.18.0+3.3 yet, rather feels like I should be waiting
> for 2.18.1.

Ah, that means you are encountering the bug in 2.18.0-sna, and so you won't be suffering from this bug any longer (as far as I can tell, this is purely a UXA issue).
(In reply to comment #47)
> The only thing I've found so far...
>
> commit a16616209bb2dcb7aaa859b38e154f0a10faa82b
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Sat Apr 14 19:03:25 2012 +0100
>
>     uxa: Fix leak of glyph mask for unhandled glyph composition
>
> Could this seemingly insignificant path be the cause of your misery?

The bad news is that commit didn't fix it. The good news is that an earlier one *did*:

commit fde8a010b3d9406c2f65ee99978360a6ca54e006
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Mar 30 12:47:21 2012 +0100

    uxa: Remove broken render glyphs-to-dst

    Reported-by: Vincent Untz <vuntz@gnome.org>
    Reported-by: Robert Bradford <robert.bradford@intel.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48045
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I made sure: reverting that commit on top of master makes the leak reappear. (The only question is whether the leak was in that function all along, or whether it happened to call a leaky path somewhere else.)

A note on my testing: gnome-system-monitor only begins to leak after it has been running for 60 seconds on the 'Resources' tab, i.e. once enough time has passed to fill the whole X axis of the rolling 'CPU History' graph. Once history starts falling off the back edge, the numbers start climbing.
Ah, it had a very, very similar bug. Almost as if I based both functions on the same skeleton code ;-) (It leaked localSrc and localDst, if either was allocated, whenever it decided that it would be unable to render the glyphs using the GPU.) Glad to finally have an answer.
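The leak described here is a classic early-return bug: temporaries allocated up front are released on the GPU path but not on the fallback path. A Python sketch of the pattern and of one way to fix it (release in a `finally` block). `Pool`, `composite_leaky` and `composite_fixed` are hypothetical stand-ins for the C code, with `local_src`/`local_dst` playing the role of localSrc/localDst.

```python
class Pool:
    """Hypothetical stand-in for the pixmap allocator; counts live pixmaps."""

    def __init__(self):
        self.live = 0

    def create(self):
        self.live += 1
        return object()

    def destroy(self, pixmap):
        self.live -= 1


def composite_leaky(pool, need_src, need_dst, gpu_can_render):
    local_src = pool.create() if need_src else None
    local_dst = pool.create() if need_dst else None
    if not gpu_can_render:
        return "fallback"  # BUG: local_src/local_dst are never destroyed here
    for p in (local_src, local_dst):
        if p is not None:
            pool.destroy(p)
    return "gpu"


def composite_fixed(pool, need_src, need_dst, gpu_can_render):
    local_src = pool.create() if need_src else None
    local_dst = pool.create() if need_dst else None
    try:
        return "gpu" if gpu_can_render else "fallback"
    finally:
        # Release the temporaries on every exit path, including the fallback.
        for p in (local_src, local_dst):
            if p is not None:
                pool.destroy(p)
```

Ten leaky calls with both temporaries and `gpu_can_render=False` leave 20 live pixmaps; the fixed version leaves none. In C the usual equivalent is a single cleanup label that every exit path jumps to.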
A patch referencing this bug report has been merged in Linux v3.7-rc1:

commit d8cb5086695dcdd076e911fc298a5a6701497371
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Aug 11 15:41:03 2012 +0100

    drm/i915: Try harder to allocate an mmap_offset
I'm seeing bug #46044, and I'm not 100% convinced that it's a duplicate of this one, as marked. I've been running xserver-xorg-video-intel 2:2.20.8-0ubuntu2.1~precise2, which I assume includes this patch, and I have seen the problem. I just downgraded to 2:2.17.0-1ubuntu4.2, as that was just released into precise updates.

I think it's closely related, though. I do see the number of "inactive objects" in /sys/kernel/debug/dri/0/i915_gem_objects climbing sky-high if there's any animation in my AWN taskbar. The clock doesn't trigger it, but the Dropbox sync icon does. If I stop Dropbox, the count drops back down to normal levels; if I start Dropbox and it's syncing (animated rotating-arrows icon), the number of "inactive objects" grows by a few per second.

I also see the same errors in /var/log/kern.log, over and over again, once the graphics corruption starts:

Oct 17 13:37:25 lap-x201 kernel: [266534.112127] [drm:drm_gem_create_mmap_offset] *ERROR* failed to allocate offset for bo 0

I've started syslogging the number of inactive objects to see if it reaches the same kind of heights. I've also logged a bug on Launchpad, so we can see whether a new driver release is needed in Ubuntu: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1053959
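For the syslogging experiment mentioned here, a minimal Python sketch that pulls the inactive-object count out of i915_gem_objects text and logs it, warning above a threshold. The default threshold of 10000 is an arbitrary placeholder, and `inactive_objects`/`log_inactive` are hypothetical names.

```python
import logging
import re

INACTIVE = re.compile(r"^\s*(\d+)(?: \[\d+\])? inactive objects", re.MULTILINE)


def inactive_objects(text):
    """Extract the inactive-object count from i915_gem_objects text, or None."""
    m = INACTIVE.search(text)
    return int(m.group(1)) if m else None


def log_inactive(text, warn_above=10000, logger=logging.getLogger("gem-watch")):
    """Log the inactive-object count; warn when it exceeds `warn_above`."""
    count = inactive_objects(text)
    if count is None:
        logger.error("no 'inactive objects' line found")
    elif count > warn_above:
        logger.warning("inactive objects: %d (above %d)", count, warn_above)
    else:
        logger.info("inactive objects: %d", count)
    return count
```

To route the messages to syslog, attach `logging.handlers.SysLogHandler(address="/dev/log")` to the logger and call `log_inactive` periodically with the file's current contents.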
(In reply to comment #53)
> Oct 17 13:37:25 lap-x201 kernel: [266534.112127]
> [drm:drm_gem_create_mmap_offset] *ERROR* failed to allocate offset for bo 0
>
> I've started syslogging the number of inactive objects to see if it reaches
> the same kind of heights. I've also logged a bug on Launchpad, so we can see
> whether a new driver release is needed in Ubuntu:
>
> https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1053959

It could be this bug, as the attached Xorg.log there indicates you are using 2.17.0 (I'd like to see your Xorg.log with 2.20.x if you have it, just to confirm), or it could just be a client pixmap leak. So also watch xrestop.
Created attachment 74005
Xorg.0.log with intel_drv.so 2.20.8

Sorry for the delay; it's awfully confusing that we're both called Chris Wilson, and I didn't realise that you'd replied to my comment :)

Here is Xorg.0.log, hopefully showing that the driver running is 2.20.8 (so with the patch)?

I haven't been running xrestop, sorry; I've just started. Even if this is a client pixmap leak, surely one app shouldn't be able to bring down my desktop? Wouldn't that be a bug, perhaps in X itself rather than the Intel driver? But I've never seen anything like this happen with any other graphics card or system in 15 years of using Linux on the desktop.

I did notice that when my Chromium window goes all black, making the window smaller gets it working again for a while; then it goes black again and I have to make it smaller again, and so on until I can't read web pages any more and have to log out and back in again.
(In reply to comment #55)
> Created attachment 74005
> Xorg.0.log with intel_drv.so 2.20.8
>
> Sorry for the delay, it's awfully confusing that we're both called Chris
> Wilson, I didn't realise that you'd replied to my comment :)
>
> Here is Xorg.0.log, hopefully showing that the driver running is 2.20.8 (so
> with the patch)?
>
> I haven't been running xrestop, sorry. I've just started. Even if this is a
> client pixmap leak, surely one app shouldn't be able to bring down my
> desktop? Wouldn't that be a bug, perhaps in X itself rather than the Intel
> driver? But I've never seen anything like this happen with any other
> graphics card or system in 15 years of using Linux on the desktop.

It's a denial of service. There's a limited address space for mmapping buffers, so if a client leaks, eventually we will not be able to map a new buffer and it will remain blank. (KDE is full of such examples, or at least one sufficiently common one.) We've hardened the kernel to recover as much of that space as possible, but that is limited by the guarantees given by the userspace API. On the other side, it is possible to use alternative fallback methods if the mmapping fails; that hardening is present in SNA. Nevertheless, the side effects will remain unpleasant until the source of the bug is found.
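The "fallback if the mmapping fails" idea can be sketched in a few lines of Python: try to map the buffer, and on ENOSPC switch to a copy path that needs no mapping (for GEM, that role is roughly played by the pwrite ioctl). `upload`, `map_buffer` and `write_buffer` are hypothetical names; this is an illustration of the technique, not SNA's code.

```python
import errno


def upload(data, map_buffer, write_buffer):
    """Copy data into a buffer, preferring a CPU mapping.

    map_buffer() returns a writable bytearray-like view or raises OSError;
    write_buffer(data) is a slower copy path that needs no mapping.
    Both callables are hypothetical stand-ins.
    """
    try:
        view = map_buffer()
    except OSError as e:
        if e.errno != errno.ENOSPC:
            raise  # only address-space exhaustion triggers the fallback
        write_buffer(data)  # mapping space exhausted: use the copy path
        return "copied"
    view[: len(data)] = data
    return "mapped"
```

Other errors are re-raised, so only the ENOSPC case is masked; the client keeps drawing, just more slowly.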
Hi Chris,

I'm afraid I don't understand the protocol/library/guarantees well enough to interpret what you're saying with 100% confidence.

The behaviour that I'm seeing is not consistent with one app DoSing itself. Chromium goes black, I restart Chromium (with the same tabs open), and it's still black. I kill and restart the X server, restart Chromium again (with the same tabs open), and now it works again (for ~8-24 hours, until the same thing happens again).

I think you're saying that individual clients are allowed to allocate pixmaps out of the (very) limited mmapped space that is video RAM for this card and shared between all apps. And it's possible for a client to leak this space (it seems that maybe AWN or some of its applets do this), which eventually results in a DoS for other X apps (gnome-terminal, Chromium) and makes the desktop unusable.

My view is that if clients can do this, it violates X's responsibility to maintain the stability of the desktop for all clients, in a way that doesn't seem consistent with X's behaviour with any other graphics driver. If clients can quite easily exhaust that resource, I don't think they should be allowed to allocate it at all. Why does the X server allow clients to allocate directly mapped pixmaps? What if the X server managed the graphics card's mapped memory, and decided for itself which pixmaps are actually mapped into the limited space available?

One of the reasons that I prefer X over, for example, Windows is that it has always tried to protect itself against badly behaved apps crashing the desktop. If that is no longer the case, it annoys me quite a bit. Do you think this is a deeper bug in X that needs to be fixed?

Cheers, Chris.
I've verified that killing and restarting /usr/share/avant-window-navigator/applets/indicator-applet.desktop restores normal behaviour in other apps, so I don't have to restart the X server any more. i915_gem_objects before and after:

chris@lap-x201:~$ sudo cat /sys/kernel/debug/dri/0/i915_gem_objects
43847 objects, 103444480 bytes
2565 [1740] objects, 351232000 [209481728] bytes in gtt
97 [32] active objects, 54587392 [13967360] bytes
2468 [1708] inactive objects, 296644608 [195514368] bytes
7 pinned mappable objects, 12759040 bytes
66 fault mappable objects, 385024 bytes
2147483648 [268435456] gtt total
chris@lap-x201:~$ kill 5103
chris@lap-x201:~$ sudo cat /sys/kernel/debug/dri/0/i915_gem_objects
2548 objects, 292028416 bytes
1694 [1186] objects, 253702144 [143339520] bytes in gtt
49 [41] active objects, 46473216 [14962688] bytes
1645 [1145] inactive objects, 207228928 [128376832] bytes
7 pinned mappable objects, 12759040 bytes
140 fault mappable objects, 4497408 bytes
2147483648 [268435456] gtt total
chris@lap-x201:~$ awn-applet -p /usr/share/avant-window-navigator/applets/indicator-applet.desktop -u 1347961205 -w 23068725 -i 1 &
chris@lap-x201:~$ sudo cat /sys/kernel/debug/dri/0/i915_gem_objects
1616 objects, 194662400 bytes
798 [651] objects, 150114304 [93024256] bytes in gtt
70 [54] active objects, 46964736 [15323136] bytes
728 [597] inactive objects, 103149568 [77701120] bytes
8 pinned mappable objects, 12775424 bytes
91 fault mappable objects, 815104 bytes
2147483648 [268435456] gtt total
Hi Chris,

Am I experiencing a bug in the X server, then? Do you want me to open a new bug? Something is seriously wrong if one app is able to bring down my entire desktop by accident.

Cheers, Chris.
Thinking about it, a bug against Xorg core to teach it per-client resource limits is actually not a bad idea. I would imagine that XACE, the security extension to X that already does all the permission checks, should be modifiable to also perform resource limit checks.
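Per-client limits of the kind suggested here come down to bookkeeping: charge each allocation to the requesting client, refuse requests that would exceed its budget, and drop the whole account when the client disconnects. A Python sketch under those assumptions; `ClientQuota` and its methods are hypothetical and are not XACE's interface.

```python
class ClientQuota:
    """Per-client accounting of pixmap bytes (hypothetical; not the XACE API)."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = {}

    def charge(self, client, nbytes):
        """Record the allocation and return True if it fits, else return False."""
        current = self.used.get(client, 0)
        if current + nbytes > self.limit:
            return False  # only this client's request is refused
        self.used[client] = current + nbytes
        return True

    def release(self, client, nbytes):
        self.used[client] = max(0, self.used.get(client, 0) - nbytes)

    def client_gone(self, client):
        # Everything a disconnected client owned is freed at once.
        self.used.pop(client, None)
```

A server could turn a refused `charge` into a BadAlloc error for that one request, so a leaking client starves itself rather than everyone else.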
Thanks, filed bug #60925. Cheers, Chris.