Summary: | [gm45 sna] loimpress slideshow hangs with kwin + fullscreen GL desktop effects | |
---|---|---|---
Product: | xorg | Reporter: | sergio.callegari
Component: | Driver/intel | Assignee: | Chris Wilson <chris>
Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | normal | |
Priority: | medium | |
Version: | unspecified | |
Hardware: | Other | |
OS: | All | |
Whiteboard: | | |
i915 platform: | | i915 features: |
Description
sergio.callegari
2012-07-20 09:55:00 UTC
And to check, which version of cairo? Also if you can attach an example presentation, that will save me the trouble of creating one :) fading == 'Fade Smoothly'?

Hmm, would also like to see your Xorg.log to clarify versions.

Libcairo is 1.10.2-6.1ubuntu3, that is 1.10.2 with ubuntu patches. WRT the sample presentation, I can try to set one up for you soon, but unfortunately not right now... The effect is 'fade through black' (once I have set up the English user interface). Xorg log in the next msg as soon as I restart it with SNA.

Created attachment 64423 [details]
xorg log
Created attachment 64424 [details]
Test case
Lunch break.... managed putting together the short test case too.
Thanks for the presentation, it just rules out something peculiar to the slideshow. So far it has worked flawlessly for me, with kwin compositing enabled and without a compositing wm. Anything else interesting in your configuration? I take it you are not in a position to be able to install a debug driver?

My config should not be particularly weird. I have kde 4.8.4 and kwin with the opengl compositing on. And I have an external screen attached to the VGA port, working with lvds off. Libreoffice is 3.6RC1, but - as I said - it is fine with UXA. When I finish some urgent work, I'll test without compositing. And at the weekend, I should be able to test the xserver-xorg-intel in the debug edition, since I have noticed that Fabio Pedretti is so kind as to package that too. What should I look at?

Doesn't appear to be GL effects either. And I'm using a similar setup (external DVI with LVDS off). Hmm. And you are able to reproduce this almost at will.. If you get the chance to run with --enable-debug=full that will generate a huge logfile for me to look at and see if I can spot what is happening when it stops responding. A plain --enable-debug package turns on lots of assertions, so would also be useful to test if that's available.

Made some additional tests, without debug package yet.
1) I have a video in case you want to see what happens. It's a bit big, though, so let me know if it might be useful before I upload.
2) Bug is triggered by kwin desktop effects (compositing). Without the effects, no bug.
3) Bug disappears if I configure kwin to disable desktop effects with fullscreen windows. Maybe this is your configuration.
4) Without desktop effects, or with desktop effects disabled for fullscreen windows, not only does the bug disappear, but the presentation quality improves a lot, becoming completely smooth. With the desktop effects on there is something looking like tearing.
Can reproduce also on:
EEEPC 1000H (Atom 32 bit) with intel 945GM express (Gen3)
Ubuntu 12.04
Similar graphics stack as the bigger brother DELL E6500 (Fabio pedretti ppa with recent git stuff), but with the following differences:
1) Kernel 3.2 (stock ubuntu) instead of 3.4
2) Unity instead of KDE
3) libreoffice 3.5.4 instead of 3.6RC1
It is just a bit less frequent to see.

(In reply to comment #11)
> Made some additional tests, without debug package yet.
>
> 1) I have a video in case you want to see what happens. It's a bit big, though,
> so let me know if it might be useful before I upload.

If it is just the screen stops redrawing (even though the slides are advancing), then I think not, as a static image doesn't give much more information. :)

>
> 2) Bug is triggered by kwin desktop effects (compositing). Without the effects,
> no bug.
>
> 3) Bug disappears if I configure kwin to disable desktop effects with
> fullscreen windows. Maybe this is your configuration.

Indeed it was.

> 4) Without desktop effects, or with the disable desktop effects with fullscreen
> windows, not only the bug disappear, but the presentation quality improves a
> lot becoming completely smooth. With the desktop effects on there is something
> looking like tearing.

With debugging enabled, there is an obvious black frame between the static image and the fade from A to black to B (i.e. it goes blank, then restores the original image and fades to B).

So it looks like I have the same setup now (kwin + fullscreen GL effects), but no bug as of yet... Maybe your video will be useful after all, to try and reproduce your steps exactly.

I still haven't encountered this. Any chance you can compile your own driver with debugging enabled (--enable-debug=full) and attach the Xorg.log?

Sorry to ask questions that may be naive, but I have never done this before.
Do I need to rebuild the intel driver alone (namely the package xserver-xorg-video-intel on ubuntu) or that plus the xorg framework (namely xorg-server) to get the required debug info?

Just the xf86-video-intel is all that is required. Try:
    $ sudo apt-get build-dep xserver-xorg-video-intel
    $ git clone git://anongit.freedesktop.org/xorg/driver/xf86-video-intel
    $ cd xf86-video-intel
    $ ./autogen.sh --prefix=/usr --enable-debug=full
    $ make && sudo make install

Ok, I've followed a slightly different path. Rebuilt the ubuntu package into a new ubuntu package with --enable-debug=full. This means that I have rebuilt the very same package that I was using, namely the intel driver with all the git stuff up to 20/7/2012. And it also means that I can easily re-install the package with the debug on whenever I need it. For those interested, my package is in the Sergio Callegari ubuntu PPA https://launchpad.net/~callegar/+archive/ppa/+packages.

The xorg log that I am attaching was obtained by starting X, starting xorg with the test presentation that is also attached here, activating the presentation, letting it show the problem, and closing libreoffice.

Here it is: https://docs.google.com/open?id=0B0owM_i9wf0CSzZiSjRYOWhqXzg
I cannot attach it inside the bug tracker because it is quite big. Does this help?

Well I only see loimpress draw a single frame of that presentation (judging by the rendering commands I see locally). How many frames do you think it drew, and how many did it advance by?

Started the presentation. LO made slide one appear on screen, then I clicked and nothing, then I clicked and nothing, then I clicked and nothing, then I pressed esc to exit the presentation and LO was at slide 4. Tested again with kwin configured to disable effects on full-screen and LO is happy and draws all the slides.

Tried Apache Openoffice 3.4 too. It does almost the same. With kwin set up to disable the effects in fullscreen mode, it is happy.
With kwin set up to leave the effects enabled in fullscreen mode it shows some slides (typically 1 or 2), then the screen stops updating. However, at this point when you press esc to exit openoffice it typically crashes.

More funny bits... I have discovered this: if I
1) Let kwin keep the effects on in fullscreen so that the issue can manifest
2) Start libreoffice on the test presentation
3) Launch the presentation
4) Keep moving to the next slide until the bug manifests
5) When the slide does not update, press ALT-F2 so that kwin gives me a little command line, and use it to issue some xrandr command switching between the external monitor and the laptop screen or vice versa
then, after the screen switch has completed - while staying in full screen presentation mode - the screen gets updated to the new slide.

Do you think that this may be a bug in SNA, or maybe a bug in libreoffice that gets triggered by SNA (say, by slightly different timings on things that SNA is imposing)? Should I cross post this bug to the LO and AOO mailing lists?

Hmm, I wonder...

commit e6cb5d93eaa01e7f4763f797bba341f3cc481d98
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Jul 30 11:14:58 2012 +0100

    sna: Avoid overlapping gpu/cpu damage with IGNORE_CPU

    We cannot simply ignore the presence of CPU damage with IGNORE_CPU but must remember to discard it.

Can you please do a quick test with master?

Hi, I am now running a package xserver-xorg-video-intel which is labelled git1207301126.d3499c, which I guess refers to commit d3499c dated 12/07/30. Seems pretty recent, but I do not know if it precedes or follows the commit being mentioned in the latest post. In any case, it still has the issue. Probably I will not be able to do any test for the next week or so... I'll post some news right after that.

It was a long shot, as I still can't see where it was even trying to render the missing updates...
Managed to make a quick test on an EEEPC 1000H (gen3 with Mobile 945GSE Express Integrated Graphics Controller) after receiving an updated deb package with the intel video driver post 2.20.2 dated 3/8/12 at git commit 146959. Apparently things have regressed even further. With UXA everything is fine as usual. With SNA and unity 2D things are still fine, but there is a lot of flickering in the presentation effects. With SNA and unity 3D, libreoffice is now incapable of rendering even the first slide in presentation mode. The screen remains at some very light shade of gray with no image on it.

Still not seeing the issue you describe in unity (3d); unity-2d's problems are of its own making, at least compared to either xfce4 or awesome.

Started working on the gen4 machine again. Now on 2.20.4. Lack of screen update now manifests also in other applications (e.g. firefox, thunderbird). Screen is partially rendered and the software is quiescent (no sign of it still downloading something). CTRL+ALT+F1 to go to console, CTRL+ALT+F7 back to X11, and the screen is now updated. Do not know if it is just me paying more attention when looking for the issue or if it is new in 2.20.4.

This is just really frustrating. It is one of those cases where if I could spend an afternoon playing with the bug, I could fix it. But until I have the opportunity to experiment, I'm none the wiser as to where exactly it is manifesting. :|

It is also frustrating for me not to be able to provide a reproducible test case... If I can find enough time for it, I'll try building a minimal setup like mine to see if the bug is present or not.

I think that apart from the LO presentation, 2.20.4 indeed has some further regression wrt 2.20.2 when using SNA acceleration and Gen4 with Kwin and effects on.
In many apps, I am now experiencing missing screen updates:
1) Editors where pressing a key does not make the character appear, but pressing another key makes the current character and the previous one appear simultaneously.
2) Firefox where parts of the screen only get updated when one moves the mouse over them or scrolls.
3) Textboxes where the text cursor disappears.
Before vacation I could work on SNA without problems (apart from the LO thing), while now I need to be back on UXA. If I can find some test case where (at least on my machine) some action can invariably trigger rendering failure, I'll open another bug for that. So far, I am only sending this to you as very preliminary information.

Yes, a missing flush of the DRI pixmaps (for GL compositors like kwin) crept in:

commit fc6b7f564df88ca773ae245b1b4e278b47dffd59
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Aug 23 15:13:14 2012 +0100

    sna: Flush the batch if it contains any DRI pixmaps

    This fixes a regression from

    commit 02963f489b177d0085006753e91e240545933387
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date: Sun Aug 19 15:45:35 2012 +0100

        sna: Only submit the batch if flushing a DRI client bo

    which made the presumption that we called sna_add_flush_pixmap() for every DRI pixmap that we used. However, that is only called for the dirty pixmaps, any native exported pixmap only marks the batch as requiring a flush. So in those cases we always need to submit the batch if it contains an exported DRI pixmap.

    Reported-by: chr.ohm@gmx.net
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53967
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

In the meantime, please do keep me informed of any testing you do. Any changes, good or bad, may give an insight as to where the root cause is.

Downloaded a more recent xserver-xorg-video-intel. I am now at a version packaged on 24/8 and marked as git 454cc8 (sna: Submit the partial batch before throttling).
Missing updates on firefox/thunderbird and more that I reported recently are in fact gone. However, I am still experiencing the missing update with the LO presentation.

Furthermore, I have a new rendering problem, again with Libreoffice and Openoffice (either the LO 3.6.0 from the libreoffice site or the Apache OO 3.4.0 from the apache site). Again, it looks like a wrong/missing update. Icons are only partially rendered. Hovering with the mouse makes their appearance improve. Please look at the two attachments.

Created attachment 66169 [details]
Snapshot of LO initial screen
Some buttons/icons are rendered wrongly.
Created attachment 66170 [details]
Snapshot of LO writer
Toolbars are rendered wrong
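Note: the submit rule described in commit fc6b7f5 earlier in this thread can be sketched in C as follows. This is only a simplified, hypothetical model: `struct batch`, its two fields and both function names are invented for illustration and are not taken from the driver source. It assumes, as the commit message says, that dirty pixmaps are explicitly registered for flushing while a natively exported pixmap only marks the batch as requiring a flush.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a pending batch (not the driver's real structure). */
struct batch {
    size_t flush_pixmaps; /* dirty DRI pixmaps explicitly registered for flushing */
    bool   needs_flush;   /* set when the batch uses a natively exported DRI pixmap */
};

/* Regressed rule: only look at the explicitly registered (dirty) pixmaps. */
static bool should_submit_regressed(const struct batch *b)
{
    return b->flush_pixmaps > 0;
}

/* Fixed rule: also submit when any exported DRI pixmap is in the batch. */
static bool should_submit_fixed(const struct batch *b)
{
    return b->flush_pixmaps > 0 || b->needs_flush;
}
```

Under this model, a batch that only touches a natively exported pixmap is never submitted by the regressed rule, which would match the symptom of a GL compositor such as kwin not seeing updates until something else forces a flush.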
That looks more like a missed cache flush. Since we no longer flush after every single rectangle, it is quite possible that the original code was missing some required flushes.

TIL that typing loimpress into the KDE start menu does different things than launching it through the menu. Haven't seen the wholesale corruption you have in your toolbar, but I am catching the odd corrupt toolbar icon (like half is uninitialised garbage), only when using kwin opengl effects. That suggests an issue along the DRI serialization path, or some missing damage. That it is not trivially reproducible lends credence to it being a timing and/or damage flush issue between the gl compositor and X.

Apparently, also launching from the command line with --nologo or even with --writer as an option reduces the probability of seeing the mis-rendered icons/toolbar. Yet, at times the issue manifests even with these options.

I can see the icon corruption on both 965 and 945 (pineview) so it is unlikely to be anything chipset specific, i.e. not a missing flush inside the GPU. More likely, then, is some unpushed damage to the compositor.

Found the issue with my broken icons:

commit 26c731efc2048663b6a19a7ed7db0e94243ab30f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Aug 27 20:50:08 2012 +0100

    sna: Ensure that we create a GTT mapping for the inplace upload buffer

    As the code will optimistically convert a request for a GTT mapping into a CPU mapping if the object is still in the CPU domain, we need to overrule that in this case where we explicitly want to write directly into the GTT and furthermore keep the buffer around in an upload cache.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=51422
    References: https://bugs.freedesktop.org/show_bug.cgi?id=52299
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Can you please test and see if that clears up any of your issues?

Tested... the driver is built in my ubuntu PPA for reference.
Unfortunately the issues (misrendered icons, misrendered toolbar, presentation not advancing) are still there.

(In reply to comment #41)
> Tested... the driver is built in my ubuntu PPA for reference.
>
> Unfortunately the issues (misrendered icons, misrendered toolbar, presentation
> not advancing) are still there.

Yeah, after more testing I came to the conclusion that it was just the placebo effect. :(

Ok, this seems to be better:

commit deaa1cac269be03f4ec44092f70349ff466d59de
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Aug 28 22:23:22 2012 +0100

    sna: Align active upload buffers to the next page for reuse

    If we write to the same page as it already active on the GPU then despite the invalidation performed at the beginning of each batch, we do not seem to correctly sample the new data.

    References: https://bugs.freedesktop.org/show_bug.cgi?id=51422
    References: https://bugs.freedesktop.org/show_bug.cgi?id=52299
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Yes, I confirm that the latest git fixes the wrong startup icons and the toolbar in libreoffice/openoffice! Thank you for the quick fix!

Now we are only left with the initial problem of the Libreoffice/openoffice presentation that does not advance if the window manager is configured to have the desktop effects on and not to disable them in fullscreen mode.

Meh, in reality it wasn't a quick fix, but that issue had been lurking since around January. Thanks for confirming the fix. And now back to trying to trigger the loimpress failure again. Are all your packages, except for the gfx stack, from precise?
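Note: the idea in commit deaa1ca above (start the next write on a fresh page when the upload buffer is still active on the GPU) can be illustrated with a small C sketch. Everything here is an assumption made for illustration: the 4096-byte page size, `struct upload_buffer`, and the function names are hypothetical and not copied from the driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size; 4 KiB is typical on x86 but is an assumption here. */
#define UPLOAD_PAGE_SIZE ((size_t)4096)

/* Round offset up to the next page boundary (unchanged if already aligned). */
static size_t page_align(size_t offset)
{
    return (offset + UPLOAD_PAGE_SIZE - 1) & ~(UPLOAD_PAGE_SIZE - 1);
}

/* Hypothetical upload buffer used only for this illustration. */
struct upload_buffer {
    size_t size;   /* total capacity in bytes */
    size_t used;   /* bytes already written */
    bool   active; /* true while the GPU may still be reading the buffer */
};

/*
 * Reserve len bytes and return the start offset, or (size_t)-1 if the
 * request does not fit. When the buffer is active, skip to the next page
 * so new data never shares a page with data the GPU may be sampling.
 */
static size_t reserve_upload(struct upload_buffer *buf, size_t len)
{
    size_t start = buf->active ? page_align(buf->used) : buf->used;

    if (start > buf->size || len > buf->size - start)
        return (size_t)-1;

    buf->used = start + len;
    return start;
}
```

For example, with 100 bytes already used, a 50-byte reservation starts at offset 4096 while the buffer is active, and at offset 100 when it is idle. The cost is some wasted space at the end of the partially used page.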
All is from precise, but for the following:
1) Virtualbox - from virtual box deb repository: this should not matter
2) Maxima, Xmaxima, WXMaxima - from the blahota-wxmaxima ppa: this should not matter
3) Asymptote - from my callegar-asymptote ppa: this should not matter
4) Samsung spp printer driver - from my cupsdrivers (oneiric) ppa: this should not matter
5) Qxmledit - from my callegar-qxmledit (oneiric) ppa: this should not matter
6) The Obnam backup tool and its dependencies - from chris-bigballofwax-obnam-ppa ppa: this should not matter
7) Git from the git-core ppa: this should not matter
8) kde 4.9 from the kubuntu-ppa-backports ppa: this may matter
9) A few entries from the kubuntu-ppa: this may matter - I can provide the detailed list if necessary
10) librecad from the librecad-dev-librecad-stable ppa: this should not matter
11) the updated graphics stack from the oibaf-graphics-drivers ppa: this surely matters
12) openshot from the openshot.developers ppa: this should not matter
13) recoll from the recoll-backports-recoll-1.15-on ppa: this should not matter
14) texlive 2012 from the texlive-backports ppa: this should not matter
15) wine 1.5 from the ubuntu-wine ppa: this should not matter
16) vala 0.16 from the vala-team ppa: this should not matter
17) rekonq 1.0 from the yoann-laissus-rekonq ppa: this should not matter
18) a few things that should not matter from the ubuntu official precise backports: this should not matter
19) a couple of items from medibuntu: this should not matter
20) xpra from winswitch: this should not matter
21) google chrome and the talk plugin from google: this should not matter
22) wuala from wuala: this should not matter
23) jitsi from jitsi: this should not matter
24) the draftsight cad (i386) from dassault systemes: this should not matter
25) atlas libraries recompiled by myself to be optimized for my specific system: this should not matter
26) libreoffice 3.6 from the libreoffice site: this may matter - it is not libreoffice as distributed by ubuntu, but libreoffice from the libreoffice site
27) linux 3.5.3 from the ubuntu mainline ppa: this may matter
28) nsp and scicoslab matlab clones from the scicoslab site: this should not matter
29) nxclient from nomachine: this should not matter
30) skype from skype: this should not matter
ALL the above is in deb packages... Furthermore I have some stuff in opt... but this really should not matter... Nothing significant (libs) in local... Let me know if you need more details on something...

In the previous msg I forgot to mention that the machine has never been re-installed since the time of ubuntu hardy... only upgraded... so that there might be some cruft around...

I think I finally have some news on this... Looks like it is specific to libreoffice builds from libreoffice.org and openoffice builds from apache. I tried downgrading to the ubuntu libreoffice and the presentation now works. Either with kwin having compositing on in fullscreen or not. But libreoffice.org and apache builds do not. I have also noticed that libreoffice and apache builds incorporate many libraries that get used in place of system libraries. Hence, probably all boils down to this. Either:
1) The libraries distributed by libreoffice/apache have some incompatibilities with the ubuntu precise mesa (but how does this happen?). Or
2) The libraries distributed by libreoffice/apache impose some slightly different timings of things and the stopping presentation is related to the timing of events.
Maybe it is 1. As a matter of fact, I downgraded libreoffice to the ubuntu precise debs (even if they are very old and buggy), just to test since I was experimenting and I had noticed that the libreoffice from libreoffice.org cannot work with the soft renderer. Namely, LIBGL_ALWAYS_SOFTWARE=1 makes libreoffice complain that it cannot load opengl support...

Can you check whether libreoffice bundles cairo in their build?
In particular there was a regression in cairo-1.12.0:

commit 9e81c5b737cda9dc539b2cf497c20ac48ddb91ac
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Apr 25 20:41:16 2012 +0100

    xlib: Allow applications to create 0x0 surfaces

    Although 0x0 is not a legimate surface size, we do allow applications the flexibility to reset the size before drawing. As we previously never checked the size against minimum legal constraints, applications expect to be able to create seemingly illegal surfaces, and so we must continue to provide backwards compatibility.

    Many thanks to Pauli Nieminen for trawling through the protocol traces, diving into the depths of libreoffice and identifying the regression.

    Fixes https://bugs.freedesktop.org/show_bug.cgi?id=49118 (presentation mode in loimpress is blank).
    Reported-by: Eric Valette <eric.valette@free.fr>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

The symptoms didn't seem to match completely, but maybe...

libcairo.so.2 is in... I am not sure which version. I tried strings on it and a 1.10.2 comes out that might be it.

Some more info... unfortunately possibly irrelevant... By moving the libstdc++.so.6 distributed with libreoffice (using the libreoffice.org build) out of the way (so that the system one is used), libreoffice can now use the software pipe for opengl. With this, no hangs.

Also, I have learned that there is an opengl debug option in libreoffice: LIBGL_DEBUG=verbose. With this, during the presentation libreoffice complains all the time that
    glGetError is no-op
    glGetError is no-op
    glGetError is no-op
    glGetError is no-op
    glGetError is no-op
either with the soft or the hard pipe.

Moving the libcairo shipped with libreoffice out of the way makes no difference.

BTW... it is completely unclear to me why libreoffice is actually loading i965_dri.so or swrast_dri.so since I use it configured not to use hardware acceleration (that is anyway completely broken on linux).
As a matter of fact one more difference between the libreoffice.org and the ubuntu builds of libreoffice is that in the former you can configure hardware acceleration, while in the latter it is permanently disabled at compile time.

Interesting and very scary that it seems to boil down to a particular version of libstdc++. This is a library only used by libreoffice and not kwin etc? At the moment, I'm leaning towards this being a libreoffice build issue and perhaps they might know a little more. Can you raise a bug report in libreoffice and cross-link?

Well, not exactly... with the system library I can run the libreoffice.org libreoffice with the llvmpipe renderer, which gives no hang in the presentation. With the libreoffice.org packaged libstdc++ I can only run the hardware renderer... which hangs during the presentations only if using sna, both with the system libstdc++ and with the libreoffice.org one. On the ubuntu libreoffice, where /all/ the libraries are system libraries, it looks like I can use sna with no hangs.

I have posted the thing to the libreoffice.org bug tracker. I am worried that mine will look like a corner case to them... Please see https://bugs.freedesktop.org/show_bug.cgi?id=54725

As a last note... With the latest driver, the bug manifests in a slightly different way on my eeepc (the gen3 machine with the unity window manager). Now the hang does not occur when trying to pass to the next slide, but just before the rendering of the current slide is over (namely I often get the background forming through the slow gradient effect, and then the presentation hanging before all the text is rendered onto the slide). Unless something relevant has changed in the drivers, this seems to suggest that even minor changes may affect how the bug manifests, as if it was quite sensitive to timings or something like that.

Quite amazing. I have updated to ubuntu quantal (12.10) and this bug is gone. It is gone both for gen4 (dell E6500) and gen3 (eeepc 1000H).
It is amazing, because since I used git versions of libdrm, the xorg intel graphics and mesa, I am actually running the same versions of those that I used to run before. The same goes for the kernel. I think that the only thing that has actually changed is the xorg infrastructure.

What remains of the older bug is the following:
- when compositing is switched off, or it is off at least for the full screen windows, the transition between slides is OK
- when compositing is switched on (even for full screen windows), in the transition between the slides gen 4 with kde temporarily inserts a black frame at the beginning of the transition. Gen3 with unity does not (only a bit of flicker).
Obviously all this is just quite minor in comparison to the previous blocking presentation.

Issue is back. With Linux 3.6.7, and mesa 9.1 devel (git snapshot 19/11/2012) as in the oibaf ppa. This looks like some timing problem, so that very minor things make the issue appear and disappear... I am sure that it will go unnoticed by most... since kde now disables effects for fullscreen windows by default.

I forgot to say... tested on gen4

There is an outside chance this is related to:

commit 9ab1d1f94e502e5fde87e7c171f3502f8a55f22b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Nov 20 18:42:58 2012 +0000

    sna/dri: Queue a vblank-continuation after flip-completion

    If a vblank request was delayed due to a pending flip, we need to make sure that we then queue it after that flip or else progress cease

Can you please test with -intel.git?

patch is in 2.20.14 for testing

Sergio, any news?

Assuming fixed by the vblank queue fixes.

Sorry for remaining silent... the spam filter all of a sudden decided that it did not like this thread and I missed the last messages. I only realized it today. The latest git seems to fix the issue for me!!! I have tested with a couple of presentations with effects on and off for fullscreen windows and I could play all of them fine.
Should it happen again I'll reopen the bug. Many thanks and let me take the occasion to wish you a great 2013!

Thanks for the update! Please do let me know if you encounter any other issues.
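Note: the behaviour described in commit 9ab1d1f earlier in this thread (a vblank request delayed by a pending flip must be queued once that flip completes) can be modelled with a tiny C state machine. This is a hypothetical sketch for readers of the thread: `struct crtc_state` and the three functions are invented names, and the real driver logic is certainly more involved.

```c
#include <stdbool.h>

/* Hypothetical per-CRTC bookkeeping used only for this illustration. */
struct crtc_state {
    bool flip_pending;    /* a page flip was issued and has not completed */
    bool vblank_deferred; /* a vblank request arrived during that flip */
    int  vblanks_queued;  /* number of vblank events actually queued */
};

/* Issue a page flip; vblank requests are held back until it completes. */
static void start_flip(struct crtc_state *c)
{
    c->flip_pending = true;
}

/* A client asks for a vblank event. */
static void request_vblank(struct crtc_state *c)
{
    if (c->flip_pending) {
        c->vblank_deferred = true; /* remember it instead of dropping it */
        return;
    }
    c->vblanks_queued++;
}

/* The flip finished: queue the deferred vblank so the client can progress. */
static void flip_complete(struct crtc_state *c)
{
    c->flip_pending = false;
    if (c->vblank_deferred) {
        c->vblank_deferred = false;
        c->vblanks_queued++;
    }
}
```

In this model, leaving out the continuation step in `flip_complete` would mean a client waiting on the deferred vblank never wakes up, which is consistent with a presentation that stops advancing only under a compositor that page-flips fullscreen windows.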