Bug 92103 - [G45] Segmentation fault in get_stencil_miptree
Summary: [G45] Segmentation fault in get_stencil_miptree
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 10.6
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
Keywords: patch
Depends on:
Reported: 2015-09-24 12:33 UTC by Giulio Bernardi
Modified: 2017-01-06 19:04 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:

Stack trace (5.87 KB, text/plain)
2015-09-24 12:33 UTC, Giulio Bernardi
The patch (537 bytes, patch)
2015-09-24 12:33 UTC, Giulio Bernardi
debugging output (307.01 KB, text/plain)
2016-02-12 08:03 UTC, Marek Chalupa

Description Giulio Bernardi 2015-09-24 12:33:24 UTC
Created attachment 118428 [details]
Stack trace

This bug was originally reported on Red Hat's Bugzilla for Fedora 22:

Description of problem:
On the machine I use at work, Plasma sometimes crashes after login, before showing the KDE desktop. Investigation with gdb may have revealed the cause of the problem.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. In KDM, perform the login

Actual results:
Sometimes plasma crashes before showing the desktop

Expected results:
Plasma should start and show the desktop

Additional info:
I launched gdb from the KDE crash handler (drkonqi?); see the attached stack trace. The crash happened at brw_misc_state.c:215, that is:

(gdb) list
210     static struct intel_mipmap_tree *
211     get_stencil_miptree(struct intel_renderbuffer *irb)
212     {
213        if (!irb)
214           return NULL;
215        if (irb->mt->stencil_mt)
216           return irb->mt->stencil_mt;
217        return irb->mt;
218     }

It turns out that irb->mt was null:

(gdb) print irb->mt
$3 = (struct intel_mipmap_tree *) 0x0

I modified the line to read

if (irb->mt && irb->mt->stencil_mt)

(see the attached patch) and so far (a couple of restarts, a shutdown, and about ten logouts/logins) no crash has happened.

However, since the crash is not always reproducible, I cannot be 100% sure.
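
To make the effect of that one-line change concrete, here is a minimal sketch in C. The two structs are hypothetical, stripped-down stand-ins for Mesa's struct intel_mipmap_tree and struct intel_renderbuffer (the real ones carry many more fields), and check_get_stencil_miptree is an illustrative helper, not driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the Mesa structs. */
struct intel_mipmap_tree {
   struct intel_mipmap_tree *stencil_mt;
};

struct intel_renderbuffer {
   struct intel_mipmap_tree *mt;
};

/* get_stencil_miptree with the modified condition from this report. */
static struct intel_mipmap_tree *
get_stencil_miptree(struct intel_renderbuffer *irb)
{
   if (!irb)
      return NULL;
   if (irb->mt && irb->mt->stencil_mt)
      return irb->mt->stencil_mt;
   return irb->mt;   /* NULL when the renderbuffer has no miptree */
}

/* Illustrative helper: exercises each input case. */
static void
check_get_stencil_miptree(void)
{
   struct intel_mipmap_tree stencil = { NULL };
   struct intel_mipmap_tree depth = { &stencil };
   struct intel_renderbuffer no_mt = { NULL };
   struct intel_renderbuffer with_stencil = { &depth };
   struct intel_renderbuffer plain = { &stencil };

   assert(get_stencil_miptree(NULL) == NULL);
   assert(get_stencil_miptree(&no_mt) == NULL);   /* the case that crashed */
   assert(get_stencil_miptree(&with_stencil) == &stencil);
   assert(get_stencil_miptree(&plain) == &stencil);
}
```

With this condition, a renderbuffer whose mt is NULL yields a NULL return value instead of a crash inside get_stencil_miptree, so every caller has to cope with a NULL result.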

The video card is

00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03)

I am using the intel driver with UXA acceleration method (not SNA).
Glxinfo says:

server glx vendor string: SGI
server glx version string: 1.4
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) G45/G43 
OpenGL version string: 2.1 Mesa 10.6.3 (git-ccef890)
OpenGL shading language version string: 1.20
OpenGL ES profile version string: OpenGL ES 2.0 Mesa 10.6.3 (git-ccef890)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 1.0.16

I think this is not a duplicate of bugs like this:
because it was fixed long ago.
Comment 1 Giulio Bernardi 2015-09-24 12:33:58 UTC
Created attachment 118429 [details] [review]
The patch
Comment 2 Giulio Bernardi 2015-09-24 13:16:41 UTC
It looks like this fix is not enough. The crash happened again after a shutdown and a power-on (30 minutes later).

In brw_workaround_depthstencil_alignment, the pointers depth_mt and stencil_mt are NULL, so the crash is only postponed until some other statement tries to access a member of these structures.

Maybe I'm wrong, but it looks like this crash only happens at the first boot after power-on...
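
The "postponed crash" can be illustrated with a small sketch. Everything here is an assumption made for illustration: the structs are simplified stand-ins, the tile_mask field and the stencil_tile_mask caller are hypothetical and are not taken from brw_workaround_depthstencil_alignment. The point is only that a caller which receives NULL from get_stencil_miptree must check it before touching any member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; tile_mask is an invented field for illustration. */
struct intel_mipmap_tree {
   struct intel_mipmap_tree *stencil_mt;
   unsigned tile_mask;
};

struct intel_renderbuffer {
   struct intel_mipmap_tree *mt;
};

/* Patched lookup: may return NULL when irb or irb->mt is NULL. */
static struct intel_mipmap_tree *
get_stencil_miptree(struct intel_renderbuffer *irb)
{
   if (!irb)
      return NULL;
   if (irb->mt && irb->mt->stencil_mt)
      return irb->mt->stencil_mt;
   return irb->mt;
}

/* Hypothetical caller: returns 0 when no stencil miptree is available. */
static unsigned
stencil_tile_mask(struct intel_renderbuffer *stencil_irb)
{
   struct intel_mipmap_tree *stencil_mt = get_stencil_miptree(stencil_irb);

   if (!stencil_mt)
      return 0;   /* without this check the next line dereferences NULL */
   return stencil_mt->tile_mask;
}

/* Illustrative helper: exercises the NULL and non-NULL paths. */
static void
check_stencil_tile_mask(void)
{
   struct intel_mipmap_tree mt = { NULL, 7u };
   struct intel_renderbuffer with_mt = { &mt };
   struct intel_renderbuffer without_mt = { NULL };

   assert(stencil_tile_mask(NULL) == 0);
   assert(stencil_tile_mask(&without_mt) == 0);
   assert(stencil_tile_mask(&with_mt) == 7u);
}
```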
Comment 3 Marek Chalupa 2016-02-12 08:02:17 UTC

I'm experiencing the same bug, 100% reproducible. I have gnome-shell and mesa compiled from source, and every time I try to run gnome-shell, I hit this bug.

When I add a check for irb->mt being NULL in the brw_clear() function, the crash no longer happens, but the external monitor is fuzzy and blinking (the laptop's own display is fine).

mesa: 0f3cea95 (and earlier)
drm: f884af9b (earlier too)
Fedora 23
Comment 4 Marek Chalupa 2016-02-12 08:03:54 UTC
Created attachment 121701 [details]
debugging output
Comment 5 Marek Chalupa 2016-02-15 10:03:03 UTC
(In reply to Marek Chalupa from comment #3)

> When I add check for irb->mt being NULL in brw_clear() function, the crash
> won't happen, but the external monitor is just fuzzy and blinking (the one
> on laptop is fine)

After I change the external monitor from primary to secondary, it stops flickering and looks fine (even after changing it back to primary).
Comment 6 Bernd Buschinski 2016-12-29 13:26:50 UTC
I also ran into this problem with Plasma 5.8.5, using mesa-13.0.2, kernel 4.9.0, and xorg-server-1.18.4 with DRI3.

Plasma crashes under some conditions when trying to render a preview; it is not 100% reproducible.

GlxInfo stuff:

Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Intel Open Source Technology Center (0x8086)
    Device: Mesa DRI Intel(R) HD Graphics 520 (Skylake GT2)  (0x1916)
    Version: 13.0.2
    Accelerated: yes
    Video memory: 3072MB
    Unified memory: yes
    Preferred profile: core (0x1)
    Max core profile version: 4.5
    Max compat profile version: 3.0
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 520 (Skylake GT2) 
OpenGL core profile version string: 4.5 (Core Profile) Mesa 13.0.2
OpenGL core profile shading language version string: 4.50


Thread 1 (Thread 0x7fca8d3e9780 (LWP 6440)):
[KCrash Handler]
#6  0x00007fc9dbb6099f in get_stencil_miptree (irb=<optimized out>) at /var/tmp/portage/media-libs/mesa-13.0.2/work/mesa-13.0.2/src/mesa/drivers/dri/i965/brw_misc_state.c:229
#7  brw_workaround_depthstencil_alignment (brw=brw@entry=0x14cfa78, clear_mask=clear_mask@entry=50) at /var/tmp/portage/media-libs/mesa-13.0.2/work/mesa-13.0.2/src/mesa/drivers/dri/i965/brw_misc_state.c:245
#8  0x00007fc9dbb44c87 in brw_clear (ctx=0x14cfa78, mask=50) at /var/tmp/portage/media-libs/mesa-13.0.2/work/mesa-13.0.2/src/mesa/drivers/dri/i965/brw_clear.c:233
#9  0x00007fca8b60aa5a in QSGBatchRenderer::Renderer::renderBatches() () from /usr/lib64/libQt5Quick.so.5
#10 0x00007fca8b610354 in QSGBatchRenderer::Renderer::render() () from /usr/lib64/libQt5Quick.so.5
#11 0x00007fca8b61b9af in QSGRenderer::renderScene(QSGBindable const&) () from /usr/lib64/libQt5Quick.so.5
#12 0x00007fca8b61c06b in QSGRenderer::renderScene(unsigned int) () from /usr/lib64/libQt5Quick.so.5
#13 0x00007fca8b62bc7e in QSGRenderContext::renderNextFrame(QSGRenderer*, unsigned int) () from /usr/lib64/libQt5Quick.so.5
#14 0x00007fca8b675719 in QQuickWindowPrivate::renderSceneGraph(QSize const&) () from /usr/lib64/libQt5Quick.so.5
#15 0x00007fca8b642935 in ?? () from /usr/lib64/libQt5Quick.so.5
#16 0x00007fca8b643b68 in ?? () from /usr/lib64/libQt5Quick.so.5
#17 0x00007fca88993085 in QWindow::event(QEvent*) () from /usr/lib64/libQt5Gui.so.5
#18 0x00007fca8b67fec5 in QQuickWindow::event(QEvent*) () from /usr/lib64/libQt5Quick.so.5
#19 0x00007fca8cf7eafb in PlasmaQuick::Dialog::event(QEvent*) () from /usr/lib64/libKF5PlasmaQuick.so.5
#20 0x00007fc9d83d67ac in ?? () from /usr/lib64/qt5/qml/org/kde/plasma/core/libcorebindingsplugin.so
#21 0x00007fca88e88acc in QApplicationPrivate::notify_helper(QObject*, QEvent*) () from /usr/lib64/libQt5Widgets.so.5
#22 0x00007fca88e9046e in QApplication::notify(QObject*, QEvent*) () from /usr/lib64/libQt5Widgets.so.5
#23 0x00007fca88654c8a in QCoreApplication::notifyInternal2(QObject*, QEvent*) () from /usr/lib64/libQt5Core.so.5
#24 0x00007fca88988c4d in QGuiApplicationPrivate::processExposeEvent(QWindowSystemInterfacePrivate::ExposeEvent*) () from /usr/lib64/libQt5Gui.so.5
#25 0x00007fca8898987d in QGuiApplicationPrivate::processWindowSystemEvent(QWindowSystemInterfacePrivate::WindowSystemEvent*) () from /usr/lib64/libQt5Gui.so.5
#26 0x00007fca8896a6cb in QWindowSystemInterface::sendWindowSystemEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib64/libQt5Gui.so.5
#27 0x00007fca765ba030 in ?? () from /usr/lib64/libQt5XcbQpa.so.5
#28 0x00007fca83bb75fd in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#29 0x00007fca83bb78e0 in ?? () from /usr/lib64/libglib-2.0.so.0
#30 0x00007fca83bb798c in g_main_context_iteration () from /usr/lib64/libglib-2.0.so.0
#31 0x00007fca886a20af in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib64/libQt5Core.so.5
#32 0x00007fca88653c3a in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib64/libQt5Core.so.5
#33 0x00007fca8865b64c in QCoreApplication::exec() () from /usr/lib64/libQt5Core.so.5
#34 0x000000000041ca98 in ?? ()
#35 0x00007fca87c99650 in __libc_start_main (main=0x41bf30, argc=2, argv=0x7ffcb2204f28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffcb2204f18) at ../csu/libc-start.c:289
#36 0x000000000041ce19 in _start ()
Comment 7 Ben Widawsky 2016-12-30 00:57:27 UTC
You could try:

diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c
index 40a8d07bfb..ae4967f15d 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -224,7 +224,7 @@ brw_get_depthstencil_tile_masks(struct intel_mipmap_tree *depth_mt,
 static struct intel_mipmap_tree *
 get_stencil_miptree(struct intel_renderbuffer *irb)
 {
-   if (!irb)
+   if (!irb || !irb->mt)
       return NULL;
    if (irb->mt->stencil_mt)
       return irb->mt->stencil_mt;
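
For comparison, here is a small sketch that puts the early-return guard from the diff above next to the inline guard from the original description (irb->mt && irb->mt->stencil_mt). The structs are hypothetical, simplified stand-ins and the function names with suffixes are invented for this example. In this simplified model the two variants return the same pointer for every input; what the callers do with a NULL result is a separate question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the Mesa structs. */
struct intel_mipmap_tree {
   struct intel_mipmap_tree *stencil_mt;
};

struct intel_renderbuffer {
   struct intel_mipmap_tree *mt;
};

/* Variant from the description: guard inside the stencil_mt test. */
static struct intel_mipmap_tree *
get_stencil_miptree_inline_guard(struct intel_renderbuffer *irb)
{
   if (!irb)
      return NULL;
   if (irb->mt && irb->mt->stencil_mt)
      return irb->mt->stencil_mt;
   return irb->mt;
}

/* Variant from the diff: return early when there is no miptree. */
static struct intel_mipmap_tree *
get_stencil_miptree_early_return(struct intel_renderbuffer *irb)
{
   if (!irb || !irb->mt)
      return NULL;
   if (irb->mt->stencil_mt)
      return irb->mt->stencil_mt;
   return irb->mt;
}

/* Illustrative helper: both variants agree on every input case. */
static void
check_guards_agree(void)
{
   struct intel_mipmap_tree stencil = { NULL };
   struct intel_mipmap_tree depth = { &stencil };
   struct intel_mipmap_tree plain = { NULL };
   struct intel_renderbuffer cases[] = { { NULL }, { &plain }, { &depth } };
   size_t i;

   assert(get_stencil_miptree_inline_guard(NULL) ==
          get_stencil_miptree_early_return(NULL));
   for (i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
      assert(get_stencil_miptree_inline_guard(&cases[i]) ==
             get_stencil_miptree_early_return(&cases[i]));
}
```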
Comment 8 Bernd Buschinski 2016-12-30 09:53:03 UTC
OK, I did, but as this is not 100% reproducible for me... I wonder if I can give a reliable "it is fixed for me".
Comment 9 Ben Widawsky 2016-12-30 19:30:30 UTC
It will certainly fix the segfault. The question is if it breaks something else.
Comment 10 Bernd Buschinski 2017-01-06 12:31:33 UTC
So after 1 week of no crashes, I would say yes, the crash is fixed for me.

But I got some "spam" in dmesg:

[16128.464257] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe C (start=142813 end=142814) time 147 us, min 1073, max 1079, scanline start 1072, end 1082
[16163.647659] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe C (start=144924 end=144925) time 169 us, min 1073, max 1079, scanline start 1072, end 1083
[16208.947705] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe C (start=147642 end=147643) time 143 us, min 1073, max 1079, scanline start 1071, end 1081
[16229.047678] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe C (start=148848 end=148849) time 149 us, min 1073, max 1079, scanline start 1069, end 1080
[16414.864584] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe C (start=159997 end=159998) time 162 us, min 1073, max 1079, scanline start 1069, end 1080
[16419.897913] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe C (start=160299 end=160300) time 167 us, min 1073, max 1079, scanline start 1068, end 1080
[16455.046980] [drm:intel_pipe_update_end] *ERROR* Atomic update failure on pipe B (start=162411 end=162412) time 157 us, min 1073, max 1079, scanline start 1069, end 1080

Not sure if this is related or not. I could double-check, but afaik it was not present before the patch.
Comment 11 Ben Widawsky 2017-01-06 19:04:58 UTC
It's probably unrelated.
