Bug 86281 - brw_meta_fast_clear (brw=brw@entry=0x7fffd4097a08, fb=fb@entry=0x7fffd40fa900, buffers=buffers@entry=2, partial_clear=partial_clear@entry=false)
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 10.5
Hardware: Other All
Importance: high critical
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
Duplicates: 91449
Reported: 2014-11-14 12:13 UTC by Mathieu Malaterre
Modified: 2016-11-03 09:35 UTC (History)

Attachments
Description of problem, test case. (6.54 KB, text/plain)
2015-08-11 10:44 UTC, Roger

Description Mathieu Malaterre 2014-11-14 12:13:43 UTC
I can get a segfault in the i915 driver using these simple steps (Debian jessie amd64, up to date):

$ wget http://mirrorblender.top-ix.org/peach/bigbuckbunny_movies/big_buck_bunny_480p_h264.mov
$ sudo apt-get install mpv
$ mpv big_buck_bunny_480p_h264.mov
-> press 'f' to get full screen
-> press & hold right-arrow key to fast-forward


See:
https://bugs.debian.org/769518#15

Backtrace is:


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdbfff700 (LWP 32052)]
brw_meta_fast_clear (brw=brw@entry=0x7fffd4097a08, fb=fb@entry=0x7fffd40fa900, buffers=buffers@entry=2, partial_clear=partial_clear@entry=false)
    at ../../../../../../../src/mesa/drivers/dri/i965/brw_meta_fast_clear.c:447
447 ../../../../../../../src/mesa/drivers/dri/i965/brw_meta_fast_clear.c: No such file or directory.
(gdb) bt
#0  brw_meta_fast_clear (brw=brw@entry=0x7fffd4097a08, fb=fb@entry=0x7fffd40fa900, buffers=buffers@entry=2, partial_clear=partial_clear@entry=false)
    at ../../../../../../../src/mesa/drivers/dri/i965/brw_meta_fast_clear.c:447
#1  0x00007fffdb3c2a48 in brw_clear (ctx=0x7fffd4097a08, mask=2) at ../../../../../../../src/mesa/drivers/dri/i965/brw_clear.c:246
#2  0x0000555555637269 in gl_video_render_frame (p=0x7fffd40fded0) at ../video/out/gl_video.c:1614
#3  0x000055555563c354 in draw_image (vo=<optimized out>, mpi=0x555556370e00) at ../video/out/vo_opengl.c:167
#4  0x0000555555639fa3 in render_frame (vo=0x55555631de90) at ../video/out/vo.c:581
#5  vo_thread (ptr=0x55555631de90) at ../video/out/vo.c:679
#6  0x00007ffff6ac20a4 in start_thread (arg=0x7fffdbfff700) at pthread_create.c:309
#7  0x00007ffff00b3ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

See:

https://bugs.debian.org/769518#5
Comment 1 Matt Turner 2015-03-07 00:21:36 UTC
A bunch of fixes have been committed to the fast clear code. Please test a new version of Mesa.
Comment 2 Lionel Landwerlin 2015-03-09 08:38:39 UTC
Just recompiled 10.4.6 on my Debian system and can't reproduce the crash anymore. Thanks!
Comment 3 Mathieu Malaterre 2015-05-19 20:43:06 UTC
Here is the new backtrace using 10.5.5-1:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdbfff700 (LWP 15019)]
brw_meta_fast_clear (brw=brw@entry=0x7fffd40b8ff8, fb=fb@entry=0x7fffd411b9f0, buffers=buffers@entry=2, partial_clear=partial_clear@entry=false)
    at ../../../../../../../src/mesa/drivers/dri/i965/brw_meta_fast_clear.c:446
446      if (irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_NO_MCS)
(gdb) bt
#0  brw_meta_fast_clear (brw=brw@entry=0x7fffd40b8ff8, fb=fb@entry=0x7fffd411b9f0, buffers=buffers@entry=2, partial_clear=partial_clear@entry=false)
    at ../../../../../../../src/mesa/drivers/dri/i965/brw_meta_fast_clear.c:446
#1  0x00007fffdb36bbb8 in brw_clear (ctx=0x7fffd40b8ff8, mask=2) at ../../../../../../../src/mesa/drivers/dri/i965/brw_clear.c:246
#2  0x0000555555637269 in gl_video_render_frame (p=0x7fffd411f020) at ../video/out/gl_video.c:1614
#3  0x000055555563c354 in draw_image (vo=<optimized out>, mpi=0x5555562c1700) at ../video/out/vo_opengl.c:167
#4  0x0000555555639fa3 in render_frame (vo=0x5555560c00e0) at ../video/out/vo.c:581
#5  vo_thread (ptr=0x5555560c00e0) at ../video/out/vo.c:679
#6  0x00007ffff6ac20a4 in start_thread (arg=0x7fffdbfff700) at pthread_create.c:309
#7  0x00007ffff00b004d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

See the `bt full` here:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=769518#38
Comment 4 Tapani Pälli 2015-05-21 10:30:31 UTC
(In reply to mathieu.malaterre from comment #3)
> Here is the new backtrace using 10.5.5-1:

Is this still on Sandybridge (like the debian bug report)?
Comment 5 Mathieu Malaterre 2015-05-21 10:33:42 UTC
No, it's my laptop. With the default Debian package on it I can reproduce *exactly* the same segfault as in the original report, so I assumed I could report it here as well.

References:

$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GF108M [GeForce GT 525M] (rev a1)

$ lspci -vv -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
	Subsystem: Dell Device 04c6
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 54
	Region 0: Memory at f1400000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at 5000 [size=64]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: <access denied>
	Kernel driver in use: i915


Please note that I cannot reproduce this issue when using the NVIDIA card (which requires prefixing `mpv` with `optirun`).
Comment 6 Mathieu Malaterre 2015-05-21 10:37:23 UTC
If that matters, it's a DELL Vostro 3750 with:

$ glxinfo | grep renderer
OpenGL renderer string: Mesa DRI Intel(R) Sandybridge Mobile
Comment 7 Tapani Pälli 2015-05-21 10:52:39 UTC
(In reply to mathieu.malaterre from comment #6)
> If that matters, it's a DELL Vostro 3750 with:
> 
> $ glxinfo | grep renderer
> OpenGL renderer string: Mesa DRI Intel(R) Sandybridge Mobile

OK, thanks. Why I asked is that fast clears are not supported on Sandybridge and the backtrace points to a part of code that would not execute on SNB, there are checks for it. Would it be possible that the backtrace is bogus?
Comment 8 Tapani Pälli 2015-05-21 11:07:56 UTC
(In reply to Tapani Pälli from comment #7)
> (In reply to mathieu.malaterre from comment #6)
> > If that matters, it's a DELL Vostro 3750 with:
> > 
> > $ glxinfo | grep renderer
> > OpenGL renderer string: Mesa DRI Intel(R) Sandybridge Mobile
> 
> OK, thanks. Why I asked is that fast clears are not supported on Sandybridge
> and the backtrace points to a part of code that would not execute on SNB,
> there are checks for it. Would it be possible that the backtrace is bogus?

Oops, sorry, I was the one who was bogus. The fast clear path gets executed on SNB too; it just won't be that fast. This might still be SNB-specific, so it is good to know.
Comment 9 Mathieu Malaterre 2015-05-21 12:43:31 UTC
I've followed the suggestion from Tapani Pälli and here is what I have now:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdbfff700 (LWP 19870)]
brw_meta_fast_clear (brw=brw@entry=0x7fffd40bceb8, fb=fb@entry=0x7fffd411dd10, buffers=buffers@entry=2, partial_clear=partial_clear@entry=false)
    at ../../../../../../../src/mesa/drivers/dri/i965/brw_meta_fast_clear.c:449
449	      if (irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_NO_MCS)
(gdb) list
444	         clear_type = REP_CLEAR;
445	
446	      assert(irb);
447	      struct intel_mipmap_tree *mt = irb->mt;
448	      assert(mt);
449	      if (irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_NO_MCS)
450	         clear_type = REP_CLEAR;
451	
452	      /* We can't do scissored fast clears because of the restrictions on the
453	       * fast clear rectangle size.
(gdb) p irb
$1 = <optimized out>
(gdb) p mt
$2 = (struct intel_mipmap_tree *) 0x0
(gdb)
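
The session above shows `mt` being NULL on line 449 even though `assert(mt)` sits on line 448. That would be consistent with a build in which NDEBUG is defined, so that assert() expands to nothing; this is an assumption, since the build flags are not shown in this report. A minimal sketch of the difference between an assertion and a runtime check follows. All `sketch_` names are hypothetical stand-ins and are not the Mesa structures or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the Mesa definitions. */
struct sketch_miptree { int has_mcs; };
struct sketch_renderbuffer { struct sketch_miptree *mt; };

enum sketch_clear { SKETCH_SKIP, SKETCH_REP_CLEAR, SKETCH_FAST_CLEAR };

enum sketch_clear sketch_pick_clear(const struct sketch_renderbuffer *irb)
{
    /* Programmer error: checked in debug builds only, gone under NDEBUG. */
    assert(irb != NULL);

    /* Runtime condition: has to be checked in every build. */
    if (irb->mt == NULL)
        return SKETCH_SKIP;

    return irb->mt->has_mcs ? SKETCH_FAST_CLEAR : SKETCH_REP_CLEAR;
}
```

A check like this only protects the one call site; any other code that dereferences the same pointer needs its own handling.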
Comment 10 Tapani Pälli 2015-05-25 08:30:39 UTC
I could not reproduce this on a Sandybridge laptop running Ubuntu 14.10, using the oibaf-ppa repositories with Mesa 10.7.0-devel (current git head). I rewound and fast-forwarded multiple times. I tried with libvdpau-va-gl1 and also with --vo=opengl; it did not seem to make a difference.
Comment 11 Mathieu Malaterre 2015-05-25 09:59:58 UTC
Difficult for me to reproduce here, since mesa 10.7 now requires libdrm_intel >= 2.4.60.

I'll try to update both packages here.
Comment 12 AnAkkk 2015-06-29 11:26:19 UTC
I have a similar issue, which started happening recently: plasmashell crashes a lot in the same function.

#5  0x00007fbf834cfed0 in brw_meta_fast_clear () from /usr/lib/xorg/modules/dri/i965_dri.so
#6  0x00007fbf83469bec in brw_clear () from /usr/lib/xorg/modules/dri/i965_dri.so
#7  0x00007fbfa4fe1d4a in QSGBatchRenderer::Renderer::renderBatches() () from /usr/lib/libQt5Quick.so.5
#8  0x00007fbfa4fe650a in QSGBatchRenderer::Renderer::render() () from /usr/lib/libQt5Quick.so.5
#9  0x00007fbfa4ff115c in QSGRenderer::renderScene(QSGBindable const&) () from /usr/lib/libQt5Quick.so.5
#10 0x00007fbfa4ff15db in QSGRenderer::renderScene(unsigned int) () from /usr/lib/libQt5Quick.so.5
#11 0x00007fbfa4fffcde in QSGRenderContext::renderNextFrame(QSGRenderer*, unsigned int) () from /usr/lib/libQt5Quick.so.5
#12 0x00007fbfa504995c in QQuickWindowPrivate::renderSceneGraph(QSize const&) () from /usr/lib/libQt5Quick.so.5
#13 0x00007fbfa501a2bc in ?? () from /usr/lib/libQt5Quick.so.5
#14 0x00007fbfa501ade1 in ?? () from /usr/lib/libQt5Quick.so.5
#15 0x00007fbfa2ba262c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () from /usr/lib/libQt5Widgets.so.5
#16 0x00007fbfa2ba7d10 in QApplication::notify(QObject*, QEvent*) () from /usr/lib/libQt5Widgets.so.5
#17 0x00007fbfa184b57b in QCoreApplication::notifyInternal(QObject*, QEvent*) () from /usr/lib/libQt5Core.so.5
#18 0x00007fbfa18a1b1d in QTimerInfoList::activateTimers() () from /usr/lib/libQt5Core.so.5
#19 0x00007fbfa18a2021 in ?? () from /usr/lib/libQt5Core.so.5
#20 0x00007fbf9d23d9fd in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#21 0x00007fbf9d23dce0 in ?? () from /usr/lib/libglib-2.0.so.0
#22 0x00007fbf9d23dd8c in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
#23 0x00007fbfa18a2cff in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQt5Core.so.5
#24 0x00007fbfa1848ffa in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQt5Core.so.5
#25 0x00007fbfa1850a4c in QCoreApplication::exec() () from /usr/lib/libQt5Core.so.5
#26 0x000000000042ed66 in main ()
Comment 13 AnAkkk 2015-06-29 11:27:29 UTC
I forgot to say that I am on ArchLinux with Mesa 10.6, I've also tried mesa-git, same issue.
Comment 14 Martin Sandsmark 2015-07-05 00:08:39 UTC
I can reproduce it here on Arch Linux with Broadwell. Googling a bit shows it happening for people using xbmc, gnome shell or plasma on various distros.

I just hacked in a null check to stop the crashing for now (hard to work when the desktop shell keeps crashing):

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 5b8191c..f0e5e77 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -442,6 +442,10 @@ brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb,
       if (rb == NULL)
          continue;
 
+      // For some reason this render buffer can lack a mipmap tree
+      if (irb->mt == NULL)
+          continue;
+
       clear_type = FAST_CLEAR;
 
       /* We don't have fast clear until gen7. */
Comment 15 Martin Sandsmark 2015-07-05 10:57:06 UTC
... which didn't work all that well, now it crashes in gen8_update_renderbuffer_surface() when trying to dereference the mt again.

I guess I need to figure out why the mipmap tree is not set.
Comment 16 Martin Sandsmark 2015-07-18 09:15:02 UTC
Some users report that changing the AccelMethod for the xorg intel driver to UXA seems to fix it. This seems weird to me, because I didn't think it would affect mesa?

Another KDE developer suggested that the bug seems to happen when the system is under above-normal load, so I guess this might be the result of some resource exhaustion, with something not checking whether its allocation actually succeeded before using the buffer.
Comment 17 maxf 2015-07-20 10:14:05 UTC
I can reproduce this on Arch Linux (kernel 4.1.2, Xorg ga8a0f64 and Mesa 8c8a71f0 from GIT) on Haswell, also using plasmashell:
#5  0x00007f5836872340 in brw_meta_fast_clear (brw=brw@entry=0x25f6ae8, fb=fb@entry=0x5555620, buffers=buffers@entry=34, partial_clear=partial_clear@entry=false) at brw_meta_fast_clear.c:451
#6  0x00007f583680f6cc in brw_clear (ctx=0x25f6ae8, mask=34) at brw_clear.c:247
#7  0x00007f5854d57946 in QSGBatchRenderer::Renderer::renderBatches() (this=this@entry=0x4d063e0) at scenegraph/coreapi/qsgbatchrenderer.cpp:2471
#8  0x00007f5854d5d536 in QSGBatchRenderer::Renderer::render() (this=<optimized out>) at scenegraph/coreapi/qsgbatchrenderer.cpp:2674
#9  0x00007f5854d6b0cf in QSGRenderer::renderScene(QSGBindable const&) (this=0x4d063e0, bindable=...) at scenegraph/coreapi/qsgrenderer.cpp:208
#10 0x00007f5854d6ba1b in QSGRenderer::renderScene(unsigned int) (this=<optimized out>, fboId=<optimized out>) at scenegraph/coreapi/qsgrenderer.cpp:168
#11 0x00007f5854d7f03e in QSGRenderContext::renderNextFrame(QSGRenderer*, unsigned int) (this=0x141c950, renderer=0x4d063e0, fboId=<optimized out>) at scenegraph/qsgcontext.cpp:558
#12 0x00007f5854dd12fb in QQuickWindowPrivate::renderSceneGraph(QSize const&) (this=this@entry=0x3e77470, size=...) at items/qquickwindow.cpp:383
#13 0x00007f5854d9daa1 in QSGGuiThreadRenderLoop::renderWindow(QQuickWindow*) (this=this@entry=0xbbefb0, window=0x3a3cfd0) at scenegraph/qsgrenderloop.cpp:375
#14 0x00007f5854d9ed41 in QSGGuiThreadRenderLoop::event(QEvent*) (this=0xbbefb0, e=<optimized out>) at scenegraph/qsgrenderloop.cpp:471
#15 0x00007f58527e4204 in QApplicationPrivate::notify_helper(QObject*, QEvent*) (this=this@entry=0x969130, receiver=receiver@entry=0xbbefb0, e=e@entry=0x7ffdbffd6c00) at kernel/qapplication.cpp:3717
#16 0x00007f58527e97e9 in QApplication::notify(QObject*, QEvent*) (this=0x7ffdbffd6fc0, receiver=0xbbefb0, e=0x7ffdbffd6c00) at kernel/qapplication.cpp:3500
#17 0x00007f585143431d in QCoreApplication::notifyInternal(QObject*, QEvent*) (this=0x7ffdbffd6fc0, receiver=0xbbefb0, event=event@entry=0x7ffdbffd6c00) at kernel/qcoreapplication.cpp:965
#18 0x00007f585148908a in QTimerInfoList::activateTimers() (event=0x7ffdbffd6c00, receiver=<optimized out>) at ../../include/QtCore/../../src/corelib/kernel/qcoreapplication.h:224
#19 0x00007f585148908a in QTimerInfoList::activateTimers() (this=0x9a0b80) at kernel/qtimerinfo_unix.cpp:637
#20 0x00007f5851489669 in idleTimerSourceDispatch(GSource*, GSourceFunc, gpointer) (source=<optimized out>) at kernel/qeventdispatcher_glib.cpp:177
#21 0x00007f5851489669 in idleTimerSourceDispatch(GSource*, GSourceFunc, gpointer) (source=<optimized out>) at kernel/qeventdispatcher_glib.cpp:224
#22 0x00007f584cdcfd57 in g_main_context_dispatch () at /usr/lib/libglib-2.0.so.0
#23 0x00007f584cdcffb0 in  () at /usr/lib/libglib-2.0.so.0
#24 0x00007f584cdd005c in g_main_context_iteration () at /usr/lib/libglib-2.0.so.0
#25 0x00007f5851489a3f in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (this=0x9a7040, flags=...) at kernel/qeventdispatcher_glib.cpp:418
#26 0x00007f5851432c5a in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (this=this@entry=0x7ffdbffd6e40, flags=..., flags@entry=...) at kernel/qeventloop.cpp:204
#27 0x00007f585143acfc in QCoreApplication::exec() () at kernel/qcoreapplication.cpp:1229
#28 0x000000000042ed66 in main ()
Comment 18 Rhys Kidd 2015-07-24 12:15:49 UTC
*** Bug 91449 has been marked as a duplicate of this bug. ***
Comment 19 Samuel Iglesias Gonsálvez 2015-07-28 09:01:39 UTC
I reproduced this bug on my SNB laptop and Mesa master (HEAD 7850774). I found that just before crashing at brw_meta_fast_clear.c:451, it prints out the following error:

  Failed to open BO for returned DRI2 buffer (1600x900, dri2 back buffer, named 11).
  This is likely a bug in the X Server that will lead to a crash soon.

Which is printed at intel_process_dri2_buffer() when drm_intel_bo_gem_create_from_name() returns a NULL pointer. I added some traces to that function at libdrm and found that drmIoctl(bufmgr_gem->fd, DRM_IOCTL_GEM_OPEN, &open_arg) is returning an error, so this bug seems to be produced by the kernel driver.

My distro is Debian Jessie with a Linux kernel 3.14.

Hope this helps.
Comment 20 Samuel Iglesias Gonsálvez 2015-07-28 09:14:04 UTC
(In reply to Samuel Iglesias from comment #19)
> I reproduced this bug on my SNB laptop and Mesa master (HEAD 7850774). I
> found that just before crashing at brw_meta_fast_clear.c:451, it prints out
> the following error:
> 
>   Failed to open BO for returned DRI2 buffer (1600x900, dri2 back buffer,
> named 11).
>   This is likely a bug in the X Server that will lead to a crash soon.
> 
> Which is printed at intel_process_dri2_buffer() when
> drm_intel_bo_gem_create_from_name() returns a NULL pointer.

I forgot to mention that when this error message is printed, we return from intel_process_dri2_buffer(), but the miptree has already been freed by the earlier intel_miptree_release(&rb->mt) call. Later on, we try to dereference it at brw_meta_fast_clear.c:451 and the segmentation fault happens.

> I added some
> traces to that function at libdrm and found that drmIoctl(bufmgr_gem->fd,
> DRM_IOCTL_GEM_OPEN, &open_arg) is returning an error, so this bug seems to
> be produced by the kernel driver.
> 
> My distro is Debian Jessie with a Linux kernel 3.14.
> 
> Hope this helps.
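
The sequence described in this comment (the old miptree is released first, then opening the new buffer fails, leaving a NULL pointer behind) can be contrasted with an "acquire first, then swap" ordering. The following is only a sketch under stated assumptions: every `sketch_` name is hypothetical, malloc/free stand in for the real reference counting, and it is not the fix that was applied to Mesa.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a miptree and a renderbuffer. */
struct sketch_tree { int name; };
struct sketch_rbuf { struct sketch_tree *mt; };

/* Hypothetical opener: name 0 plays the role of a buffer name that
 * the kernel refuses to open. */
static struct sketch_tree *sketch_open_tree(int name)
{
    if (name == 0)
        return NULL;
    struct sketch_tree *t = malloc(sizeof *t);
    if (t != NULL)
        t->name = name;
    return t;
}

/* Acquire the replacement first; drop the old tree only on success. */
bool sketch_rbuf_update(struct sketch_rbuf *rb, int name)
{
    assert(rb != NULL);
    struct sketch_tree *fresh = sketch_open_tree(name);
    if (fresh == NULL)
        return false;      /* rb->mt is left exactly as it was */
    free(rb->mt);          /* free(NULL) is a no-op */
    rb->mt = fresh;
    return true;
}
```

With this ordering a failed open leaves the renderbuffer pointing at its previous, possibly stale, tree rather than at NULL. Whether rendering to a stale buffer is acceptable is a separate question.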
Comment 21 Martin Sandsmark 2015-07-30 09:36:42 UTC
Good job tracking it further! I kind of gave up, especially after figuring out the workaround by choosing UXA as AccelMethod.

I'm not sure exactly what the proper behavior should be here (should everything check if the mipmap tree is null?), which makes it harder for me to get started on a patch. Trying to trace the failing path in the kernel is a bit beyond my skills and what I have time for at the moment.
Comment 22 Rodrigo Vivi 2015-07-30 22:35:16 UTC
Martin, which xf86-video-intel version are you using?

Chris, any idea what would this affect?
Comment 23 Martin Sandsmark 2015-07-30 22:38:59 UTC
I currently have xf86-video-intel 1:2.99.917+381+g5772556-1.
Comment 24 Einar Hov 2015-07-31 00:07:18 UTC
I can reproduce this crash on a Broadwell laptop running Arch, both in mpv with vo=gl and in other OpenGL applications.

mesa 10.6.3-1
xf86-video-intel 1:2.99.917+381+g5772556-1

I have not been able to reproduce the issue with AccelMethod UXA. However, UXA method on this system does not perform at an adequate level.

I have not been able to reproduce the issue with DRI3. I have observed no performance issues with DRI3, so this can be an alternative workaround if you experience low performance on UXA.
Comment 25 AnAkkk 2015-08-02 15:56:40 UTC
I am now getting this crash many times per day; it's crashing plasmashell all the time.

I'm not sure when it started happening but I didn't have any issues before. I'm guessing it started after an update to xf86-video-intel.

xf86-video-intel 1:2.99.917+381+g5772556-1
mesa 10.6.3

Many people running Plasma 5 seem to be affected:
https://www.reddit.com/r/linux/comments/3f5567/the_intel_driver_bug_that_crashes_plasma5_and/
Comment 26 antonykikaxa 2015-08-06 10:15:39 UTC
(In reply to Samuel Iglesias from comment #20)
> (In reply to Samuel Iglesias from comment #19)
> > I added some
> > traces to that function at libdrm and found that drmIoctl(bufmgr_gem->fd,
> > DRM_IOCTL_GEM_OPEN, &open_arg) is returning an error, so this bug seems to
> > be produced by the kernel driver.
> > 
> > My distro is Debian Jessie with a Linux kernel 3.14.

Just my 5 cents:

I had several semi-rare crashes in plasma/kickoff tooltips a long time ago on SUSE Tumbleweed, but it has been stable for a long time now, roughly since the 3.19 kernel.

As per this comment, this may be an old bug in the kernel, so you should probably test with new *kernels*, not mesa.
Comment 27 bastian.beischer 2015-08-06 10:19:55 UTC
People on Arch Linux are using kernel 4.1.4 at the moment, so it's definitely not an issue that's solved by a new kernel.

I'm quite confident the problem is in mesa.
Comment 28 Martin Sandsmark 2015-08-06 21:53:34 UTC
I was told on IRC earlier today that someone from Intel said the bug was fixed in git, but I just tested with 30a7e0c021c3a77c20c6f041dc80b7dc90ad238f, and the first thing that happened when I logged in was this crash.
Comment 29 StructureDr 2015-08-06 22:00:27 UTC
Just a few comments in case it helps.  I run Manjaro KDE - currently Plasma 5.3.2-2 64 bit.

I started seeing these segmentation faults about a month ago. It seemed to coincide with a Plasma update, so I initially blamed that. I was also running an older kernel (3.18) and wondered about that too.

Recently switched to kernel 4.1 - this made no difference.

As suggested by others, switching to UXA mode solved the problem, but graphics performance (Haswell i5) was unacceptable.

Last night I started progressively downgrading mesa and xf86-video-intel in various combinations. Fingers crossed, I have stable behavior now, having dropped to:

               mesa 10.5.7-1 
               xf86-video-intel 2.99.917-5.  

My testing wasn't extensive, but switching to either of these on their own did not seem to solve the problem - only downgrading both packages.
Comment 30 AnAkkk 2015-08-06 23:10:34 UTC
(In reply to Martin Sandsmark from comment #28)
> I was told on IRC earlier today that someone from Intel said the bug was
> fixed in git, but I just tested with
> 30a7e0c021c3a77c20c6f041dc80b7dc90ad238f, and the first thing that happened
> when I logged in was this crash.

He probably meant the git from xf86-video-intel.
I've started using mesa-git and xf86-video-intel-git today, I haven't had any crashes so far.
Comment 31 Cjacker 2015-08-07 02:56:56 UTC
There has been no commit to xf86-drv-intel since July 31, but today or yesterday a lot of commits landed in mesa git.

With kernel-4.2rc5/mesa git/xf86-drv-intel git, I can confirm there is no such crash when DRI3 is enabled in the driver.

I have just updated mesa git and switched the driver back to DRI2 to verify whether it is fixed or not.
Comment 32 Cjacker 2015-08-07 03:16:06 UTC
(In reply to Martin Sandsmark from comment #28)
> I was told on IRC earlier today that someone from Intel said the bug was
> fixed in git, but I just tested with
> 30a7e0c021c3a77c20c6f041dc80b7dc90ad238f, and the first thing that happened
> when I logged in was this crash.

The bug is not fixed when DRI2 is enabled in the driver, with kernel-4.2rc5/mesa git head/xf86-drv-video git head/drm git head.

Hardware is "Thinkpad x250" with "00:02.0 VGA compatible controller: Intel Corporation Broadwell-U Integrated Graphics (rev 09)"(00:02.0 0300: 8086:1616 (rev 09)).


There is still no easy way to reproduce it. Generally, I start the LTP tests and click the kickoff menu or some popups; after several minutes, the segfault happens.


Thread 1 (Thread 0x7f55086dd840 (LWP 983)):
[KCrash Handler]
#6  0x00007f54f7af19dc in brw_meta_fast_clear (brw=brw@entry=0x1c69aa8, fb=fb@entry=0x65868a0, buffers=buffers@entry=34, partial_clear=partial_clear@entry=false) at brw_meta_fast_clear.c:451
#7  0x00007f54f7a8d73c in brw_clear (ctx=0x1c69aa8, mask=34) at brw_clear.c:247
#8  0x0000003b56527676 in QSGBatchRenderer::Renderer::renderBatches() () at /lib/libQt5Quick.so.5
#9  0x0000003b5652cdb2 in QSGBatchRenderer::Renderer::render() () at /lib/libQt5Quick.so.5
#10 0x0000003b565386af in QSGRenderer::renderScene(QSGBindable const&) () at /lib/libQt5Quick.so.5
#11 0x0000003b56538edb in QSGRenderer::renderScene(unsigned int) () at /lib/libQt5Quick.so.5
#12 0x0000003b56548a7e in QSGRenderContext::renderNextFrame(QSGRenderer*, unsigned int) () at /lib/libQt5Quick.so.5
#13 0x0000003b5659009b in QQuickWindowPrivate::renderSceneGraph(QSize const&) () at /lib/libQt5Quick.so.5
#14 0x0000003b56562933 in  () at /lib/libQt5Quick.so.5
#15 0x0000003b56563a31 in  () at /lib/libQt5Quick.so.5
#16 0x0000003b4f59632c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /lib/libQt5Widgets.so.5
#17 0x0000003b4f59b446 in QApplication::notify(QObject*, QEvent*) () at /lib/libQt5Widgets.so.5
#18 0x00000035b8490b73 in QCoreApplication::notifyInternal(QObject*, QEvent*) () at /lib/libQt5Core.so.5
#19 0x00000035b84e321d in QTimerInfoList::activateTimers() () at /lib/libQt5Core.so.5
#20 0x00000035b84e3769 in  () at /lib/libQt5Core.so.5
#21 0x00000038dec48847 in g_main_context_dispatch () at /lib/libglib-2.0.so.0
#22 0x00000038dec48a78 in  () at /lib/libglib-2.0.so.0
#23 0x00000038dec48b1c in g_main_context_iteration () at /lib/libglib-2.0.so.0
#24 0x00000035b84e436f in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/libQt5Core.so.5
#25 0x00000035b848e6da in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/libQt5Core.so.5
#26 0x00000035b849624d in QCoreApplication::exec() () at /lib/libQt5Core.so.5
#27 0x000000000042e8fd in main ()
Comment 33 AnAkkk 2015-08-07 18:48:22 UTC
I can confirm that with mesa-git and xf86-video-intel-git it still crashes; I had the issue again today.
Comment 34 Martin Sandsmark 2015-08-07 19:03:23 UTC
I looked at the stuff Samuel found out and went to look at the libdrm git, where I found a couple of commits related to passing buffer alignment restrictions to the kernel, which I thought looked plausibly relevant to my layman eyes.

So I installed libdrm-git yesterday, and I haven't had a crash in almost two days. 

It's hard to say definitively whether it is fixed because of the sporadic nature of the bug, but it looks good so far (if I'm not mistaken there could also be several reasons for the miptree to not be allocated, though, so it's not a given that all crashes with this backtrace are related).
Comment 35 Martin Sandsmark 2015-08-07 23:40:48 UTC
Never mind, crashed again. Back to square one. (And sorry for all the noise here...)
Comment 36 Martin Sandsmark 2015-08-08 01:59:47 UTC
I looked in the user journal, and found first this:

aug. 08 01:40:07 neruval kernel: [drm:gen8_irq_handler [i915]] *ERROR* The master control interrupt lied (SDE)!

Then it seems like kscreen (the display handler for Plasma) tried to launch another instance of itself (which fails, because it is already running), and then receives some weird notifications about the displays changing:

aug. 08 01:40:07 neruval kscreen_backend_launcher[28810]: kscreen.xrandr: Connected output 67 to CRTC 63
aug. 08 01:40:07 neruval kscreen_backend_launcher[28810]: kscreen.xcb.helper: Detected XRandR 1.4
aug. 08 01:40:07 neruval kscreen_backend_launcher[28810]: kscreen.xcb.helper: Event Base:  89
aug. 08 01:40:07 neruval kscreen_backend_launcher[28810]: kscreen.xcb.helper: Event Error:  147
aug. 08 01:40:07 neruval kscreen_backend_launcher[28810]: kscreen.backendLauncher: Loading "XRandR" backend
aug. 08 01:40:07 neruval kscreen_backend_launcher[28810]: kscreen.backendLauncher: Failed to register as DBus service: another launcher already running?
aug. 08 01:40:07 neruval kscreen_backend_launcher[28810]: kscreen.backendLauncher: ""
aug. 08 01:40:07 neruval kscreen_backend_launcher[10198]: kscreen: Primary output changed from KScreen::Output(Id: 67 , Name: "eDP1" ) ( "eDP1" ) to KScreen::Output(Id: 67 , Name: "eDP1" ) ( "eDP1" )
aug. 08 01:40:07 neruval kscreen_backend_launcher[10198]: kscreen: Primary output changed from KScreen::Output(Id: 67 , Name: "eDP1" ) ( "eDP1" ) to KScreen::Output(Id: 67 , Name: "eDP1" ) ( "eDP1" )
aug. 08 01:40:07 neruval kscreen_backend_launcher[10198]: kscreen: Primary output changed from KScreen::Output(Id: 67 , Name: "eDP1" ) ( "eDP1" ) to KScreen::Output(Id: 67 , Name: "eDP1" ) ( "eDP1" )
aug. 08 01:40:07 neruval kscreen_backend_launcher[10198]: kscreen: Primary output changed from KScreen::Output(Id: 67 , Name: "eDP1" ) ( "eDP1" ) to KScreen::Output(Id: 67 , Name: "eDP1" ) ( "eDP1" )

And a couple of seconds after that, Plasma crashes:

aug. 08 01:40:09 neruval drkonqi[28805]: Sending SIGSTOP to process

My theory from these logs was that because of these weird display changes Plasma resizes itself to some really weird sizes, which leads to mesa trying to allocate some really weird BOs, which the kernel fails to allocate, and then things crash.

I added some debug output to libdrm, to see what errors the kernel was actually returning:

================= DRMIOCTL FAILED 22: Invalid argument
================= DRMIOCTL FAILED 1: Operation not permitted
================= DRMIOCTL FAILED 22: Invalid argument


The invalid argument error I think might support my theory. I forgot to log the actual request, but I'll do that now, hopefully it will help to track this further. If it is the error from the IRQ handler that causes this, I'm even more out of my depth, though.
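
For reference, the two codes in the debug output above are the usual Linux errno values EINVAL (22) and EPERM (1). A minimal sketch of a logging helper that also records the request number (the piece that was not logged) could look like the following. The helper name and message format are hypothetical and this is not the debug patch that was actually used; only snprintf() and strerror() from the C library are relied upon.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats a failed ioctl as
 * "ioctl request 0x<request> failed <errno>: <message>".
 * Returns what snprintf() returns. */
int sketch_format_ioctl_failure(char *buf, size_t len,
                                unsigned long request, int err)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "ioctl request 0x%lx failed %d: %s",
                    request, err, strerror(err));
}
```

A caller would save errno immediately after the failing call and pass it in, since later library calls may overwrite it.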
Comment 37 Cjacker 2015-08-08 05:55:57 UTC
(In reply to Martin Sandsmark from comment #36)
> I looked in the user journal, and found first this:
> 
> aug. 08 01:40:07 neruval kernel: [drm:gen8_irq_handler [i915]] *ERROR* The
> master control interrupt lied (SDE)!
> 
> Then it seems like kscreen (the display handler for Plasma) tried to launch
> another instance of itself (which fails, because it is already running), and
> then receives some weird notifications about the displays changing:

I never got such output when the crash happened.

It seems the crash only happens under high workload, but I am not sure about that.

The BIOS I use also has some problems (I guess) with VT-d and the IOMMU; I am also not sure whether this issue is related to the BIOS or not.

You can try building xf86-video-intel with "--with-default-dri=3". At least here, after defaulting to DRI3, there is no crash.
Comment 38 szunti 2015-08-08 17:43:02 UTC
I probably have the same issue. The backtrace of the crash is a bit different, but my card might not have fast_clear.

I get the

Failed to open BO for returned DRI2 buffer

message before the crash. It doesn't happen with uxa or sna+dri3.

I have an apitrace trace that triggers it for me 4 out of 5 times.

Run it with (after decompressing):
$ glretrace test_case.trace

It's 14 MB, so I uploaded it to an external site: https://mega.co.nz/#!O0MHRByI!z7GywnuO8Ai_9633pmj5FY8ejzvKYBsDAri1lXFSXN0

Software versions:
------------------
Linux 4.1.4 (Archlinux)
mesa 10.6.3
libdrm 2.4.62+106+gc8df9e7-1 (git master at the time of writing)
xf86-video-intel 2.99.917+426+g611ec7d-1 (git master) 


Hack to fix it in gdb
----------------------
After drm_intel_bo_gem_create_from_name fails (brw_context.c:1421), if I run getBuffersWithFormat and drm_intel_bo_gem_create_from_name again from gdb,
then it succeeds.
(This is what has to be done when DRI2 buffers get invalidated.)
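
The manual steps above amount to "re-query the buffers and retry the open when the name turns out to be stale". A minimal, self-contained sketch of that control flow follows. Everything in it is a hypothetical stand-in: fake_open_by_name plays the role of drm_intel_bo_gem_create_from_name, fake_get_buffers plays the role of getBuffersWithFormat, and the structs are simplified. It is not the patch that was eventually merged, only an illustration of the hack.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the real DRI2 buffer and BO types. */
struct dri2_buffer { unsigned int name; };
struct bo { unsigned int name; };

/* Fake "server" state: the only buffer name that can currently be opened. */
static unsigned int valid_name = 2;
static struct bo the_bo;

/* Stand-in for drm_intel_bo_gem_create_from_name(): fails on a stale name. */
static struct bo *fake_open_by_name(unsigned int name)
{
    if (name != valid_name)
        return NULL;
    the_bo.name = name;
    return &the_bo;
}

/* Stand-in for getBuffersWithFormat(): reports the current buffer name. */
static void fake_get_buffers(struct dri2_buffer *buf)
{
    buf->name = valid_name;
}

/* Try to open the BO; if the name is stale, re-query the buffers and retry,
 * giving up after max_attempts open attempts. */
static struct bo *open_with_retry(struct dri2_buffer *buf, int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        struct bo *bo = fake_open_by_name(buf->name);

        if (bo)
            return bo;
        fake_get_buffers(buf);  /* name was stale: fetch a fresh one */
    }
    return NULL;
}
```

Bounding the number of attempts matters because the buffers can in principle be invalidated again between the re-query and the next open.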



Backtrace when drm_intel_bo_gem_create_from_name fails
------------------------------------------------------
#0  intel_process_dri2_buffer (buffer_name=0x7ffff21ddf75 "dri2 back buffer", rb=0xaf2cb0, buffer=0xaf2b50, drawable=0xa76ad0, brw=0x7ffff7fd1038)
    at brw_context.c:1423
#1  intel_update_dri2_buffers (drawable=0xa76ad0, brw=0x7ffff7fd1038) at brw_context.c:1226
#2  intel_update_renderbuffers (context=context@entry=0xbaea90, drawable=drawable@entry=0xa76ad0) at brw_context.c:1248
#3  0x00007ffff203d5b1 in intel_prepare_render (brw=brw@entry=0x7ffff7fd1038) at brw_context.c:1267
#4  0x00007ffff2031290 in brw_clear (ctx=0x7ffff7fd1038, mask=18) at brw_clear.c:234
#5  0x00000000004e8c96 in ?? ()
#6  0x000000000040bd1d in ?? ()
#7  0x000000000040c37c in ?? ()
#8  0x0000000000407b05 in ?? ()
#9  0x00007ffff61e0790 in __libc_start_main () from /usr/lib/libc.so.6
#10 0x00000000004095e9 in _start ()
Comment 39 Martin Peres 2015-08-11 09:26:26 UTC
This patch should be a good workaround: http://cgit.freedesktop.org/~ickle/mesa/commit/?h=brw-batch&id=e2a696a4cd93c2dbe445243de48ed478fbdb8009

I will test it tonight on my home machine and see if I can reproduce it. The patch may make the screen flicker for a frame instead of crashing. The actual problem is a race condition of DRI2 that is not trivial to fix.
Comment 40 Roger 2015-08-11 10:44:00 UTC
Created attachment 117626 [details]
Description of problem, test case.
Comment 41 Jan Alexander Steffens (heftig) 2015-08-12 11:34:27 UTC
(In reply to Martin Peres from comment #39)
> This patch should be a good workaround:
> http://cgit.freedesktop.org/~ickle/mesa/commit/?h=brw-
> batch&id=e2a696a4cd93c2dbe445243de48ed478fbdb8009
> 
> I will test it tonight on my home machine and see if I can reproduce it. The
> patch may make the screen flicker for a frame instead of crashing. The
> actual problem is a race condition of DRI2 that is not trivial to fix.

This patch seems to solve the brw_meta_fast_clear crashes I've had in Firefox and Steam since Mesa 10.6. Cheers!
Comment 42 Martin Peres 2015-08-13 11:50:28 UTC
(In reply to Jan Alexander Steffens (heftig) from comment #41)
> (In reply to Martin Peres from comment #39)
> > This patch should be a good workaround:
> > http://cgit.freedesktop.org/~ickle/mesa/commit/?h=brw-
> > batch&id=e2a696a4cd93c2dbe445243de48ed478fbdb8009
> > 
> > I will test it tonight on my home machine and see if I can reproduce it. The
> > patch may make the screen flicker for a frame instead of crashing. The
> > actual problem is a race condition of DRI2 that is not trivial to fix.
> 
> This patch seems to solve the brw_meta_fast_clear crashes I've had in
> Firefox and Steam since Mesa 10.6. Cheers!

Same at home. No crashes to report since I applied the patch. I will review the patch and push it upstream tomorrow.
Comment 43 Tatsuyuki Ishi 2015-08-13 23:34:32 UTC
I'd rather see a complete fix merged upstream. Workarounds should be applied in user-patched builds.
Comment 44 Martin Peres 2015-08-31 08:07:19 UTC
(In reply to Tatsuyuki Ishi from comment #43)
> I'd rather see a complete fix merged upstream. Workarounds should be applied
> in user-patched builds.

The complete fix requires fixing the xserver, the DDX and Mesa. This is in the pipeline, but it is not coming any time soon. In the meantime, let's avoid crashing if we can. The code will still be useful when everything is fixed!
Comment 45 AnAkkk 2015-08-31 22:24:50 UTC
Has the patch been included in mesa git?
Comment 46 Alberto Salvia Novella 2015-09-04 15:15:30 UTC
Also reported in <https://bugs.launchpad.net/ubuntu/+source/mesa/+bug/1492037>.
Comment 47 Germano Massullo 2015-09-04 15:28:01 UTC
Fedora bug report for this bug (also linked to the freedesktop.org bug report):
https://bugzilla.redhat.com/show_bug.cgi?id=1259443
Comment 48 AnAkkk 2015-09-23 11:41:02 UTC
Anything new on merging the patch?
Comment 49 Martin Peres 2015-09-30 07:54:03 UTC
(In reply to AnAkkk from comment #48)
> Anything new on merging the patch?

Done!
Comment 50 Matt Turner 2016-11-02 06:15:07 UTC
(In reply to Martin Peres from comment #49)
> (In reply to AnAkkk from comment #48)
> > Anything new on merging the patch?
> 
> Done!

commit 70e91d61fde239e8ae58148cacd4ff891126e2aa
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Aug 7 21:13:12 2015 +0100

    i965: Remove early release of DRI2 miptree
Comment 51 Mathieu Malaterre 2016-11-03 09:35:11 UTC
As said above, this has long been fixed:

commit 70e91d61fde239e8ae58148cacd4ff891126e2aa
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Aug 7 21:13:12 2015 +0100

    i965: Remove early release of DRI2 miptree

Closing.

