Summary: | [DRI3] Compiz segfaults in intel_destroy_image() | ||
---|---|---|---|
Product: | Mesa | Reporter: | Eero Tamminen <eero.t.tamminen> |
Component: | Drivers/DRI/i965 | Assignee: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Status: | VERIFIED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | critical | ||
Priority: | high | CC: | daniel, lfrb, sergii.romantsov |
Version: | git | Keywords: | bisected, patch, regression |
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Bug Depends on: | |||
Bug Blocks: | 106157 | ||
Attachments: | Gdb backtrace of the crash |
Looks like a DRI3 issue. CC'ing Louis-Francis and Daniel, who have worked on this recently.

I'm experiencing the same issue when switching between windows. I first noticed it with GFXBench: pressing ALT+TAB between GFXBench and Firefox produces a similar crash.

The issue looks very similar to https://bugs.freedesktop.org/show_bug.cgi?id=104392 and https://bugs.freedesktop.org/show_bug.cgi?id=104301. I checked those bugs and they reproduce again, so they should probably be reopened. I suspect the Dota bug, https://bugs.freedesktop.org/show_bug.cgi?id=104214, reappears as well.

Found bad commit:

commit 3160cb86aa9234ff78e11fe7a00f30bfb5cb8445
Author: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Date: Fri Oct 6 01:26:51 2017 -0400

    egl/x11: Re-allocate buffers if format is suboptimal

    If PresentCompleteNotify event says the pixmap was presented
    with mode PresentCompleteModeSuboptimalCopy, it means the pixmap
    could possibly have been flipped instead if allocated with a
    different format/modifier.

    Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
    Reviewed-by: Daniel Stone <daniels@collabora.com>

I can add that Bug 104301 is back and can be reproduced with the Unity desktop on Ubuntu 16.04, but it can't be reproduced on the same Ubuntu with the Xfce desktop, nor on Debian buster with Xfce.

(In reply to vadym from comment #2)
> Found bad commit:
> commit 3160cb86aa9234ff78e11fe7a00f30bfb5cb8445
> Author: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
> Date: Fri Oct 6 01:26:51 2017 -0400
>
> egl/x11: Re-allocate buffers if format is suboptimal

Thanks for the bisect! I'd recommend using the --format=fuller option to get the correct upstreaming date:
-----------------------------------------------
commit 3160cb86aa9234ff78e11fe7a00f30bfb5cb8445
Author:     Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
AuthorDate: Fri Oct 6 01:26:51 2017 -0400
Commit:     Daniel Stone <daniels@collabora.com>
CommitDate: Fri Mar 9 17:47:14 2018 +0000

    egl/x11: Re-allocate buffers if format is suboptimal
-----------------------------------------------

(In reply to Sergii Romantsov from comment #3)
> Proposed patch:
> https://lists.freedesktop.org/archives/mesa-dev/2018-April/191363.html

Thanks, I just started a 3-hour test run to validate whether this fixes the issue completely and whether there's any performance impact.

(In reply to Andriy Khulap from comment #4)
> I can add that Bug 104301 is back and can be reproduced with Unity desktop
> on Ubuntu 16.04. But can't be reproduced on the same Ubuntu with
> xfce-desktop and Debian buster with Xfce.

AFAIK Xfce uses XRender for compositing, not GL/ES, so it works quite differently from most other compositors.

I've verified that the patch fixes all the Compiz crashes and doesn't regress anything in our test set.

Since this bug regresses in the same way as 104301 and 104214, is it time to make an automated test that will detect these types of errors? Is that even possible? The reference counting mechanism is clearly fragile.

(In reply to Mark Janes from comment #7)
> Since this bug regresses in the same way as 104301 and 104214, is it time to
> make an automated test that will detect these types of errors? Is that even
> possible?

That depends on the environment, I suppose: we'd need to start X with DRI3 support in a controlled manner, and AFAIK that involves complete TTY access.
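The remark about fragile reference counting can be illustrated with a minimal sketch. The type and functions below (example_image, example_image_create, example_image_ref, example_image_unref) are hypothetical and do not mirror Mesa's actual __DRIimage handling; they only show the general pattern of a reference count whose release step tolerates NULL, so that code holding an empty slot cannot dereference NULL the way intel_destroy_image(image=0x0) did in this bug.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted image; NOT Mesa's __DRIimage. */
struct example_image {
    int refcount;
    int *destroyed_counter; /* incremented on destroy so callers can observe it */
};

/* Create an image holding one reference. Returns NULL on allocation failure. */
static struct example_image *example_image_create(int *destroyed_counter)
{
    struct example_image *img = malloc(sizeof(*img));
    if (!img)
        return NULL;
    img->refcount = 1;
    img->destroyed_counter = destroyed_counter;
    return img;
}

/* Take an extra reference; passing NULL is allowed and does nothing. */
static struct example_image *example_image_ref(struct example_image *img)
{
    if (img) {
        assert(img->refcount > 0); /* catches use after the last unref */
        img->refcount++;
    }
    return img;
}

/* Drop a reference and destroy the image when the count reaches zero.
 * Dropping a NULL reference is a no-op instead of a NULL dereference. */
static void example_image_unref(struct example_image *img)
{
    if (!img)
        return;
    assert(img->refcount > 0);
    if (--img->refcount == 0) {
        if (img->destroyed_counter)
            (*img->destroyed_counter)++;
        free(img);
    }
}
```

A NULL guard like this would only mask the symptom; it does not address whatever leaves a buffer slot empty while other code still expects it to hold an image.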
At worst, the compositor seems to crash about 10 times per hour because of this. When the compositor goes away and is restarted, other programs that happen to start at the same time also fail with X errors.

By the way, on the newer Ubuntu release (17.10) this is a desktop-killer bug: the desktop dies along with Compiz, and Compiz isn't restarted as it is on 16.04. Even on 16.04 the desktop sometimes fails when Compiz goes down, although that's rare.

Sergii already had a fix available 2 weeks ago; why hasn't it been committed yet?

Latest patch fixing the issue:
https://patchwork.freedesktop.org/patch/219239/ (comments)
https://patchwork.freedesktop.org/patch/219923/

Compiz crashing will also sometimes cause other programs to crash (when they start, I assume) due to a failing XGetProperty call.

Fixed in Git master:

Commit: 6f81e07ecb8c0793dc482307d5d96fd3df95b7d2
URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=6f81e07ecb8c0793dc482307d5d96fd3df95b7d2
Author: Michel Dänzer <michel.daenzer@amd.com>
Date: Fri Apr 27 17:41:48 2018 +0200

    dri3: Only update number of back buffers in loader_dri3_get_buffers

Verified, the crashes are gone.
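The title of the committed fix suggests the general idea: the number of back buffers should change only in loader_dri3_get_buffers, not from other code paths. A minimal sketch of that pattern follows, assuming a fixed-size array of back-buffer slots: other code (for example, an event handler) only records a requested count, and the count is applied, with surplus buffers freed, at one well-defined point at the start of a frame. All names (example_drawable, example_request_num_back, example_get_buffers, EXAMPLE_MAX_BACK) are hypothetical; this is not Mesa's loader_dri3 code.

```c
#include <assert.h>
#include <stdlib.h>

#define EXAMPLE_MAX_BACK 4 /* hypothetical upper bound on back buffers */

struct example_buffer {
    int id;
};

struct example_drawable {
    struct example_buffer *back[EXAMPLE_MAX_BACK]; /* NULL means empty slot */
    int num_back;       /* count in use for the current frame */
    int requested_back; /* desired count, written by other code paths */
};

/* Other code paths only record a wish, clamped to a valid range;
 * they never change num_back or free buffers themselves. */
static void example_request_num_back(struct example_drawable *d, int n)
{
    if (n < 1)
        n = 1;
    if (n > EXAMPLE_MAX_BACK)
        n = EXAMPLE_MAX_BACK;
    d->requested_back = n;
}

/* The single point where the count changes, called at the start of a
 * frame. Only slots beyond the new count are freed, and empty slots are
 * skipped, so nothing is destroyed through a NULL pointer. */
static void example_get_buffers(struct example_drawable *d)
{
    int n = d->requested_back;
    for (int i = n; i < EXAMPLE_MAX_BACK; i++) {
        if (d->back[i]) {
            free(d->back[i]);
            d->back[i] = NULL;
        }
    }
    d->num_back = n;
    assert(d->num_back >= 1 && d->num_back <= EXAMPLE_MAX_BACK);
}
```

Keeping the count stable between get_buffers calls means a buffer picked for the current frame cannot be freed underneath the renderer halfway through it, which is one plausible way a timing-dependent crash like this one could arise.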
Created attachment 138623 [details]
Gdb backtrace of the crash

Somewhere between the following Mesa commits:
1e9d779331: 2018-03-08 18:14:02 UTC: meson: Fix building gallium media libs without egl
a2f08dd574: 2018-03-12 17:24:31 UTC: gallium: Use struct gl_array_attributes* as st_pipe_vertex_format argument.

Ubuntu 16.04 Unity Compiz started randomly crashing on a NULL pointer access during our test runs. Normally the Unity desktop is able to restart Compiz successfully, so it can crash again. During ~3-hour test runs it segfaults a few times, which can be seen in dmesg:

[ 8002.554441] compiz[5936]: segfault at 8 ip 00007fe34f8bcc34 sp 00007ffe0e44a810 error 4 in i965_dri.so[7fe34f4ac000+84e000]
[ 8046.153748] compiz[7073]: segfault at 8 ip 00007f218d4f7c34 sp 00007ffe8e5973f0 error 4 in i965_dri.so[7f218d0e7000+84e000]

I've seen these crashes on all platforms we have.

I was able to catch the crash twice in Gdb during a 3-hour test run; both times it was due to intel_destroy_image() getting a NULL pointer:

#0 intel_destroy_image (image=0x0)
#1 dri3_free_render_buffer ()
#2 dri3_get_buffer ()
#3 loader_dri3_get_buffers ()
#4 intel_update_image_buffers ()
#5 intel_update_renderbuffers ()
#6 intel_prepare_render ()
#7 brw_prepare_drawing ()
#8 brw_draw_prims ()
#9 vbo_draw_arrays ()
...
#22 CompositeScreen::handlePaintTimeout()

See the attached full backtrace for details.

As this happens randomly, i.e. seems to be timing related, my guess would be that it happens when an application either starts or exits while the compositor happens to be doing a screen update.

(Unfortunately I don't have data from between those Mesa dates. Because the issue takes a long time to reproduce and is random, it's not bisection friendly.)

---

In the dmesg output, the crash always happens on the same VMA page in Mesa, on all platforms. The actual crashing instruction pointer has a couple of different addresses inside that (4K?) page, so it's possible that the above backtrace isn't the only one.
The crash happens both in a setup using slightly older kernel and X builds and in one using the latest git versions of those, i.e. it's due to a Mesa change, not a change in other components. (In Ubuntu itself, the only update in this time frame was to libgcrypt20, to disable FIPS if it was enabled.)
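The dmesg lines report "segfault at 8 ... error 4". On x86, the number after "error" is the page-fault error code: bit 0 set means a protection violation (clear means the page was not present), bit 1 set means a write, and bit 2 set means user mode. So error 4 is a user-mode read of an unmapped page, and a fault address of 8 is what reading a pointer-sized member at offset 8 through a NULL struct pointer would touch on 64-bit Linux. That fits intel_destroy_image(image=0x0). The struct layout below (example_image) is a hypothetical illustration, not the real i965 image definition, and the helper names are likewise made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86 page-fault error code bits, as printed by the kernel after "error". */
#define PF_PROT  (1u << 0) /* clear: page not present; set: protection violation */
#define PF_WRITE (1u << 1) /* clear: read access; set: write access */
#define PF_USER  (1u << 2) /* clear: kernel mode; set: user mode */

static int pf_is_user(unsigned err)              { return (err & PF_USER) != 0; }
static int pf_is_write(unsigned err)             { return (err & PF_WRITE) != 0; }
static int pf_protection_violation(unsigned err) { return (err & PF_PROT) != 0; }

/* Hypothetical layout for illustration only; NOT the real i965 image struct. */
struct example_image {
    void *screen; /* offset 0 */
    void *bo;     /* offset sizeof(void *), i.e. 8 on 64-bit Linux */
};

/* Address touched when reading a member at member_offset through a NULL
 * struct pointer: the base address is 0, so the fault address equals the
 * member's offset. */
static uintptr_t null_member_fault_address(size_t member_offset)
{
    return (uintptr_t)member_offset;
}
```

Decoding error 4 with these helpers yields user mode, read access, and no protection violation; on a 64-bit build, null_member_fault_address(offsetof(struct example_image, bo)) is 8, matching the "segfault at 8" in the log.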