Summary: | brw_meta_fast_clear (brw=brw@entry=0x7fffd4097a08, fb=fb@entry=0x7fffd40fa900, buffers=buffers@entry=2, partial_clear=partial_clear@entry=false) | ||
---|---|---|---|
Product: | Mesa | Reporter: | Mathieu Malaterre <mathieu.malaterre> |
Component: | Drivers/DRI/i965 | Assignee: | Ian Romanick <idr> |
Status: | RESOLVED FIXED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | critical | ||
Priority: | high | CC: | anakin.cs, andyrtr, bastian.beischer, beniamino, bugs, chadversary, chris, enrico.tagliavini, eugene.shalygin+bugzilla.FDO, facorread, fademind, frederik.schwan, fritsch, germano.massullo, holger.k, ishitatsuyuki, jholtrop, jirislaby, jlp.bugs, kiril, lee295012, martin.fdo, maxf, mishu, nrndda, n.schnelle, pavel.ondracka, perry3d, peuc, rdieter, roger.powell, siglesias, thuryn1, tibbs, travneff, vanoudt |
Version: | 10.5 | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | Description of problem, test case. |
Description
Mathieu Malaterre
2014-11-14 12:13:43 UTC
A bunch of fixes have been committed to the fast clear code. Please test a new version of Mesa.

Just recompiled 10.4.6 on my Debian system and can't reproduce the crash anymore. Thanks!

Here is the new backtrace using 10.5.5-1:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdbfff700 (LWP 15019)]
brw_meta_fast_clear (brw=brw@entry=0x7fffd40b8ff8, fb=fb@entry=0x7fffd411b9f0, buffers=buffers@entry=2, partial_clear=partial_clear@entry=false) at ../../../../../../../src/mesa/drivers/dri/i965/brw_meta_fast_clear.c:446
446 if (irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_NO_MCS)
(gdb) bt
#0 brw_meta_fast_clear (brw=brw@entry=0x7fffd40b8ff8, fb=fb@entry=0x7fffd411b9f0, buffers=buffers@entry=2, partial_clear=partial_clear@entry=false) at ../../../../../../../src/mesa/drivers/dri/i965/brw_meta_fast_clear.c:446
#1 0x00007fffdb36bbb8 in brw_clear (ctx=0x7fffd40b8ff8, mask=2) at ../../../../../../../src/mesa/drivers/dri/i965/brw_clear.c:246
#2 0x0000555555637269 in gl_video_render_frame (p=0x7fffd411f020) at ../video/out/gl_video.c:1614
#3 0x000055555563c354 in draw_image (vo=<optimized out>, mpi=0x5555562c1700) at ../video/out/vo_opengl.c:167
#4 0x0000555555639fa3 in render_frame (vo=0x5555560c00e0) at ../video/out/vo.c:581
#5 vo_thread (ptr=0x5555560c00e0) at ../video/out/vo.c:679
#6 0x00007ffff6ac20a4 in start_thread (arg=0x7fffdbfff700) at pthread_create.c:309
#7 0x00007ffff00b004d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

See the `bt full` here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=769518#38

(In reply to mathieu.malaterre from comment #3)
Is this still on Sandybridge (like the debian bug report)?

No, it's my laptop. If I use the default debian package on it I could reproduce *exactly* the same original segfault, so I assumed I could report it here also.
References:

$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GF108M [GeForce GT 525M] (rev a1)

$ lspci -vv -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
Subsystem: Dell Device 04c6
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 54
Region 0: Memory at f1400000 (64-bit, non-prefetchable) [size=4M]
Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
Region 4: I/O ports at 5000 [size=64]
Expansion ROM at <unassigned> [disabled]
Capabilities: <access denied>
Kernel driver in use: i915

Please note, that I cannot reproduce this issue when using the NVidia card (need to prefix `mpv` with `optirun`).

If that matters, it's a DELL Vostro 3750 with:

$ glxinfo | grep renderer
OpenGL renderer string: Mesa DRI Intel(R) Sandybridge Mobile

(In reply to mathieu.malaterre from comment #6)
> If that matters, it's a DELL Vostro 3750 with:
>
> $ glxinfo | grep renderer
> OpenGL renderer string: Mesa DRI Intel(R) Sandybridge Mobile

OK, thanks. Why I asked is that fast clears are not supported on Sandybridge and the backtrace points to a part of code that would not execute on SNB, there are checks for it. Would it be possible that the backtrace is bogus?

(In reply to Tapani Pälli from comment #7)
> OK, thanks. Why I asked is that fast clears are not supported on Sandybridge
> and the backtrace points to a part of code that would not execute on SNB,
> there are checks for it. Would it be possible that the backtrace is bogus?

Oops sorry, I was bogus. Fast clear path gets executed on SNB too, it'll just be not that fast. This might be still SNB specific, so it is good to know.

I've followed suggestion from Tapani Pälli and here is what I have now:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdbfff700 (LWP 19870)]
brw_meta_fast_clear (brw=brw@entry=0x7fffd40bceb8, fb=fb@entry=0x7fffd411dd10, buffers=buffers@entry=2, partial_clear=partial_clear@entry=false) at ../../../../../../../src/mesa/drivers/dri/i965/brw_meta_fast_clear.c:449
449 if (irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_NO_MCS)
(gdb) list
444 clear_type = REP_CLEAR;
445
446 assert(irb);
447 struct intel_mipmap_tree *mt = irb->mt;
448 assert(mt);
449 if (irb->mt->fast_clear_state == INTEL_FAST_CLEAR_STATE_NO_MCS)
450 clear_type = REP_CLEAR;
451
452 /* We can't do scissored fast clears because of the restrictions on the
453 * fast clear rectangle size.
(gdb) p irb
$1 = <optimized out>
(gdb) p mt
$2 = (struct intel_mipmap_tree *) 0x0
(gdb)

I could not reproduce this with a Sandybridge laptop running Ubuntu 14.10, I used oibaf-ppa repositories with Mesa 10.7.0-devel (current git head). I tried to rew and ffwd multiple times. I tried with libvdpau-va-gl1 and also --vo=opengl, did not seem to make difference.

Difficult for me to reproduce here, since mesa 10.7 now requires libdrm_intel >= 2.4.60. I'll try to update both packages here.

I have a similar issue which started happening recently, plasmashell crashes a lot in the same function.
#5 0x00007fbf834cfed0 in brw_meta_fast_clear () from /usr/lib/xorg/modules/dri/i965_dri.so
#6 0x00007fbf83469bec in brw_clear () from /usr/lib/xorg/modules/dri/i965_dri.so
#7 0x00007fbfa4fe1d4a in QSGBatchRenderer::Renderer::renderBatches() () from /usr/lib/libQt5Quick.so.5
#8 0x00007fbfa4fe650a in QSGBatchRenderer::Renderer::render() () from /usr/lib/libQt5Quick.so.5
#9 0x00007fbfa4ff115c in QSGRenderer::renderScene(QSGBindable const&) () from /usr/lib/libQt5Quick.so.5
#10 0x00007fbfa4ff15db in QSGRenderer::renderScene(unsigned int) () from /usr/lib/libQt5Quick.so.5
#11 0x00007fbfa4fffcde in QSGRenderContext::renderNextFrame(QSGRenderer*, unsigned int) () from /usr/lib/libQt5Quick.so.5
#12 0x00007fbfa504995c in QQuickWindowPrivate::renderSceneGraph(QSize const&) () from /usr/lib/libQt5Quick.so.5
#13 0x00007fbfa501a2bc in ?? () from /usr/lib/libQt5Quick.so.5
#14 0x00007fbfa501ade1 in ?? () from /usr/lib/libQt5Quick.so.5
#15 0x00007fbfa2ba262c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () from /usr/lib/libQt5Widgets.so.5
#16 0x00007fbfa2ba7d10 in QApplication::notify(QObject*, QEvent*) () from /usr/lib/libQt5Widgets.so.5
#17 0x00007fbfa184b57b in QCoreApplication::notifyInternal(QObject*, QEvent*) () from /usr/lib/libQt5Core.so.5
#18 0x00007fbfa18a1b1d in QTimerInfoList::activateTimers() () from /usr/lib/libQt5Core.so.5
#19 0x00007fbfa18a2021 in ?? () from /usr/lib/libQt5Core.so.5
#20 0x00007fbf9d23d9fd in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
#21 0x00007fbf9d23dce0 in ?? () from /usr/lib/libglib-2.0.so.0
#22 0x00007fbf9d23dd8c in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
#23 0x00007fbfa18a2cff in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQt5Core.so.5
#24 0x00007fbfa1848ffa in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQt5Core.so.5
#25 0x00007fbfa1850a4c in QCoreApplication::exec() () from /usr/lib/libQt5Core.so.5
#26 0x000000000042ed66 in main ()

I forgot to say that I am on ArchLinux with Mesa 10.6, I've also tried mesa-git, same issue.

I can reproduce it here on ArchLinux with broadwell. Googling a bit shows it happening for people using xbmc, gnome shell or plasma on various distros.

I just hacked in a null check to stop the crashing for now (hard to work when the desktop shell keeps crashing):

diff --git a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
index 5b8191c..f0e5e77 100644
--- a/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_meta_fast_clear.c
@@ -442,6 +442,10 @@ brw_meta_fast_clear(struct brw_context *brw, struct gl_framebuffer *fb,
       if (rb == NULL)
          continue;
 
+      // For some reason this render buffer can lack a mipmap tree
+      if (irb->mt == NULL)
+         continue;
+
       clear_type = FAST_CLEAR;
 
       /* We don't have fast clear until gen7. */

... which didn't work all that well, now it crashes in gen8_update_renderbuffer_surface() when trying to dereference the mt again. I guess I need to figure out why the mipmap tree is not set.

Some users report that changing the AccelMethod for the xorg intel driver to UXA seems to fix it. This seems weird to me, because I didn't think it would affect mesa?

Another KDE developer suggested that the bug seems to happen when the system is under above normal load, so I guess this might be the result of some resource exhaustion, and something not checking if their allocation actually succeeded before using the buffer?
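For illustration, the effect of the null check in the diff above, and why guarding a single call site only moves the crash, can be modelled outside the driver. This is a minimal sketch, not Mesa code: `struct miptree`, `struct renderbuffer`, `rb_has_storage`, `clear_buffers` and `surface_state` are hypothetical stand-ins invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real driver types carry far more state. */
struct miptree { int fast_clear_state; };
struct renderbuffer { struct miptree *mt; };

/* One shared predicate, so every consumer applies the same check. */
static int rb_has_storage(const struct renderbuffer *rb)
{
    return rb != NULL && rb->mt != NULL;
}

/* Clear path: skip buffers without backing storage, as the diff does.
 * Returns how many buffers were actually touched. */
static size_t clear_buffers(struct renderbuffer *const *rbs, size_t n)
{
    size_t cleared = 0;

    assert(rbs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!rb_has_storage(rbs[i]))
            continue;
        rbs[i]->mt->fast_clear_state = 0;
        cleared++;
    }
    return cleared;
}

/* A second consumer (think surface-state setup) needs the same guard;
 * it reports -1 instead of dereferencing a missing miptree. */
static int surface_state(const struct renderbuffer *rb)
{
    if (!rb_has_storage(rb))
        return -1;
    return rb->mt->fast_clear_state;
}
```

In this model, removing the guard from `surface_state` brings the null dereference back at that second site, which matches the observation that the crash moved to gen8_update_renderbuffer_surface() after the clear path was patched.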
I can reproduce this on Arch Linux (kernel 4.1.2, Xorg ga8a0f64 and Mesa 8c8a71f0 from GIT) on Haswell, also using plasmashell:

#5 0x00007f5836872340 in brw_meta_fast_clear (brw=brw@entry=0x25f6ae8, fb=fb@entry=0x5555620, buffers=buffers@entry=34, partial_clear=partial_clear@entry=false) at brw_meta_fast_clear.c:451
#6 0x00007f583680f6cc in brw_clear (ctx=0x25f6ae8, mask=34) at brw_clear.c:247
#7 0x00007f5854d57946 in QSGBatchRenderer::Renderer::renderBatches() (this=this@entry=0x4d063e0) at scenegraph/coreapi/qsgbatchrenderer.cpp:2471
#8 0x00007f5854d5d536 in QSGBatchRenderer::Renderer::render() (this=<optimized out>) at scenegraph/coreapi/qsgbatchrenderer.cpp:2674
#9 0x00007f5854d6b0cf in QSGRenderer::renderScene(QSGBindable const&) (this=0x4d063e0, bindable=...) at scenegraph/coreapi/qsgrenderer.cpp:208
#10 0x00007f5854d6ba1b in QSGRenderer::renderScene(unsigned int) (this=<optimized out>, fboId=<optimized out>) at scenegraph/coreapi/qsgrenderer.cpp:168
#11 0x00007f5854d7f03e in QSGRenderContext::renderNextFrame(QSGRenderer*, unsigned int) (this=0x141c950, renderer=0x4d063e0, fboId=<optimized out>) at scenegraph/qsgcontext.cpp:558
#12 0x00007f5854dd12fb in QQuickWindowPrivate::renderSceneGraph(QSize const&) (this=this@entry=0x3e77470, size=...) at items/qquickwindow.cpp:383
#13 0x00007f5854d9daa1 in QSGGuiThreadRenderLoop::renderWindow(QQuickWindow*) (this=this@entry=0xbbefb0, window=0x3a3cfd0) at scenegraph/qsgrenderloop.cpp:375
#14 0x00007f5854d9ed41 in QSGGuiThreadRenderLoop::event(QEvent*) (this=0xbbefb0, e=<optimized out>) at scenegraph/qsgrenderloop.cpp:471
#15 0x00007f58527e4204 in QApplicationPrivate::notify_helper(QObject*, QEvent*) (this=this@entry=0x969130, receiver=receiver@entry=0xbbefb0, e=e@entry=0x7ffdbffd6c00) at kernel/qapplication.cpp:3717
#16 0x00007f58527e97e9 in QApplication::notify(QObject*, QEvent*) (this=0x7ffdbffd6fc0, receiver=0xbbefb0, e=0x7ffdbffd6c00) at kernel/qapplication.cpp:3500
#17 0x00007f585143431d in QCoreApplication::notifyInternal(QObject*, QEvent*) (this=0x7ffdbffd6fc0, receiver=0xbbefb0, event=event@entry=0x7ffdbffd6c00) at kernel/qcoreapplication.cpp:965
#18 0x00007f585148908a in QTimerInfoList::activateTimers() (event=0x7ffdbffd6c00, receiver=<optimized out>) at ../../include/QtCore/../../src/corelib/kernel/qcoreapplication.h:224
#19 0x00007f585148908a in QTimerInfoList::activateTimers() (this=0x9a0b80) at kernel/qtimerinfo_unix.cpp:637
#20 0x00007f5851489669 in idleTimerSourceDispatch(GSource*, GSourceFunc, gpointer) (source=<optimized out>) at kernel/qeventdispatcher_glib.cpp:177
#21 0x00007f5851489669 in idleTimerSourceDispatch(GSource*, GSourceFunc, gpointer) (source=<optimized out>) at kernel/qeventdispatcher_glib.cpp:224
#22 0x00007f584cdcfd57 in g_main_context_dispatch () at /usr/lib/libglib-2.0.so.0
#23 0x00007f584cdcffb0 in () at /usr/lib/libglib-2.0.so.0
#24 0x00007f584cdd005c in g_main_context_iteration () at /usr/lib/libglib-2.0.so.0
#25 0x00007f5851489a3f in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) (this=0x9a7040, flags=...) at kernel/qeventdispatcher_glib.cpp:418
#26 0x00007f5851432c5a in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) (this=this@entry=0x7ffdbffd6e40, flags=..., flags@entry=...) at kernel/qeventloop.cpp:204
#27 0x00007f585143acfc in QCoreApplication::exec() () at kernel/qcoreapplication.cpp:1229
#28 0x000000000042ed66 in main ()

*** Bug 91449 has been marked as a duplicate of this bug. ***

I reproduced this bug on my SNB laptop and Mesa master (HEAD 7850774). I found that just before crashing at brw_meta_fast_clear.c:451, it prints out the following error:

Failed to open BO for returned DRI2 buffer (1600x900, dri2 back buffer, named 11).
This is likely a bug in the X Server that will lead to a crash soon.

Which is printed at intel_process_dri2_buffer() when drm_intel_bo_gem_create_from_name() returns a NULL pointer. I added some traces to that function at libdrm and found that drmIoctl(bufmgr_gem->fd, DRM_IOCTL_GEM_OPEN, &open_arg) is returning an error, so this bug seems to be produced by the kernel driver.

My distro is Debian Jessie with a Linux kernel 3.14.

Hope this helps.

(In reply to Samuel Iglesias from comment #19)
> I reproduced this bug on my SNB laptop and Mesa master (HEAD 7850774). I
> found that just before crashing at brw_meta_fast_clear.c:451, it prints out
> the following error:
>
> Failed to open BO for returned DRI2 buffer (1600x900, dri2 back buffer,
> named 11).
> This is likely a bug in the X Server that will lead to a crash soon.
>
> Which is printed at intel_process_dri2_buffer() when
> drm_intel_bo_gem_create_from_name() returns a NULL pointer.

Forgot to mention that when this error message is printed out, we return from intel_process_dri2_buffer() but the miptree was free'd before by intel_miptree_release(&rb->mt) call. Later on, we try to dereference it at brw_meta_fast_clear.c:451 and segmentation fault happens.

> I added some
> traces to that function at libdrm and found that drmIoctl(bufmgr_gem->fd,
> DRM_IOCTL_GEM_OPEN, &open_arg) is returning an error, so this bug seems to
> be produced by the kernel driver.
>
> My distro is Debian Jessie with a Linux kernel 3.14.
>
> Hope this helps.
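The sequence described in the two comments above (old miptree released first, the buffer open then fails, and the renderbuffer is left with a NULL miptree) can be sketched as a small standalone model. Everything here is an assumption made for illustration: the struct layouts, `open_named_buffer` (standing in for drm_intel_bo_gem_create_from_name()), and both update functions are hypothetical, and the "open first" ordering is only one plausible way to avoid the dangling state, not the actual Mesa change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's miptree and renderbuffer. */
struct miptree { int bo_name; };
struct renderbuffer { struct miptree *mt; };

/* Models the buffer open: returns NULL when the kernel refuses the name
 * (represented here by a negative name). */
static struct miptree *open_named_buffer(int name)
{
    if (name < 0)
        return NULL;
    struct miptree *mt = malloc(sizeof(*mt));
    if (mt != NULL)
        mt->bo_name = name;
    return mt;
}

/* Ordering described in the thread: release first, then try to open.
 * On failure the renderbuffer is left without any miptree. */
static bool update_release_first(struct renderbuffer *rb, int name)
{
    assert(rb != NULL);
    free(rb->mt);
    rb->mt = NULL;
    struct miptree *mt = open_named_buffer(name);
    if (mt == NULL)
        return false;   /* rb->mt stays NULL; a later dereference crashes */
    rb->mt = mt;
    return true;
}

/* Alternative ordering: keep the old miptree until the new one exists. */
static bool update_open_first(struct renderbuffer *rb, int name)
{
    assert(rb != NULL);
    struct miptree *mt = open_named_buffer(name);
    if (mt == NULL)
        return false;   /* old miptree is still attached and usable */
    free(rb->mt);
    rb->mt = mt;
    return true;
}
```

With the second ordering a failed open costs at most a stale frame, because later code still finds a valid (if outdated) miptree to work with.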
Good job tracking it further! I kind of gave up, especially after figuring out the workaround by choosing UXA as AccelMethod.

I'm not sure exactly what the proper behavior should be here (should everything check if the mipmap tree is null?), which makes it harder for me to get started on a patch. Trying to trace the failing path in the kernel is a bit beyond my skills and what I have time for at the moment.

Martin, which xf86-video-intel version are you using? Chris, any idea what would this affect?

I currently have xf86-video-intel 1:2.99.917+381+g5772556-1.

I can reproduce this crash on a Broadwell laptop running Arch, both in mpv with vo=gl and in other OpenGL applications.

mesa 10.6.3-1
xf86-video-intel 1:2.99.917+381+g5772556-1

I have not been able to reproduce the issue with AccelMethod UXA. However, UXA method on this system does not perform at an adequate level.

I have not been able to reproduce the issue with DRI3. I have observed no performance issues with DRI3, so this can be an alternative workaround if you experience low performance on UXA.

I am now actually getting this crash a lot of times per day, it's crashing plasmashell all the time. I'm not sure when it started happening but I didn't have any issues before. I'm guessing it started after an update to xf86-video-intel.

xf86-video-intel 1:2.99.917+381+g5772556-1
mesa 10.6.3

Many people running Plasma 5 seem to be affected: https://www.reddit.com/r/linux/comments/3f5567/the_intel_driver_bug_that_crashes_plasma5_and/

(In reply to Samuel Iglesias from comment #20)
> (In reply to Samuel Iglesias from comment #19)
> > I added some
> > traces to that function at libdrm and found that drmIoctl(bufmgr_gem->fd,
> > DRM_IOCTL_GEM_OPEN, &open_arg) is returning an error, so this bug seems to
> > be produced by the kernel driver.
> >
> > My distro is Debian Jessie with a Linux kernel 3.14.

Just my 5 cents: I had several semi-rare crashes in plasma/kickoff-tooltips a long time ago on suse tumbleweed, but it's stable for a long time now, ~since 3.19 kernel. As per this comment, this may be an old bug in the kernel, so you should probably test with new *kernels*, not mesa.

People on Arch Linux are using kernel 4.1.4 at the moment, so it's definitely not an issue that's solved with new kernel. I'm quite confident the problem is in mesa.

I was told on IRC earlier today that someone from Intel said the bug was fixed in git, but I just tested with 30a7e0c021c3a77c20c6f041dc80b7dc90ad238f, and the first thing that happened when I logged in was this crash.

Just a few comments in case it helps. I run Manjaro KDE - currently Plasma 5.3.2-2 64 bit. I started seeing these segmentation issues about a month ago - seemed to coincide with a Plasma update so initially blamed that. Was also running an older kernel (3.18), and wondered about that too. Recently switched to kernel 4.1 - this made no difference. As suggested by others, switching to UXA mode solved the problem, but graphics performance (Haswell i5) was unacceptable.

Last night started progressively downgrading mesa and xf86-video-intel in various combinations. Cross fingers I have stable behavior now - having dropped to:

mesa 10.5.7-1
xf86-video-intel 2.99.917-5

My testing wasn't extensive, but switching to either of these on their own did not seem to solve the problem - only downgrading both packages.

(In reply to Martin Sandsmark from comment #28)
> I was told on IRC earlier today that someone from Intel said the bug was
> fixed in git, but I just tested with
> 30a7e0c021c3a77c20c6f041dc80b7dc90ad238f, and the first thing that happened
> when I logged in was this crash.

He probably meant the git from xf86-video-intel. I've started using mesa-git and xf86-video-intel-git today, I haven't had any crashes so far.
There is no commit to xf86-drv-intel since July 31, but today or yesterday a lot of commits happened in mesa git. With kernel-4.2rc5/mesa git/xf86-drv-intel git, I can confirm there is no such crash when DRI3 is enabled in the driver. I have just updated mesa git and switched the driver back to DRI2 to verify whether it's fixed or not.

(In reply to Martin Sandsmark from comment #28)
> I was told on IRC earlier today that someone from Intel said the bug was
> fixed in git, but I just tested with
> 30a7e0c021c3a77c20c6f041dc80b7dc90ad238f, and the first thing that happened
> when I logged in was this crash.

Bug not fixed when enabling DRI2 in the driver, with kernel-4.2rc5/mesa git head/xf86-drv-video git head/drm git head. Hardware is "Thinkpad x250" with "00:02.0 VGA compatible controller: Intel Corporation Broadwell-U Integrated Graphics (rev 09)" (00:02.0 0300: 8086:1616 (rev 09)).

There is still no easy way to reproduce it. Generally, I started an ltp test and clicked the kickoff menu or some popups; after several minutes, the segfault happened.

Thread 1 (Thread 0x7f55086dd840 (LWP 983)):
[KCrash Handler]
#6 0x00007f54f7af19dc in brw_meta_fast_clear (brw=brw@entry=0x1c69aa8, fb=fb@entry=0x65868a0, buffers=buffers@entry=34, partial_clear=partial_clear@entry=false) at brw_meta_fast_clear.c:451
#7 0x00007f54f7a8d73c in brw_clear (ctx=0x1c69aa8, mask=34) at brw_clear.c:247
#8 0x0000003b56527676 in QSGBatchRenderer::Renderer::renderBatches() () at /lib/libQt5Quick.so.5
#9 0x0000003b5652cdb2 in QSGBatchRenderer::Renderer::render() () at /lib/libQt5Quick.so.5
#10 0x0000003b565386af in QSGRenderer::renderScene(QSGBindable const&) () at /lib/libQt5Quick.so.5
#11 0x0000003b56538edb in QSGRenderer::renderScene(unsigned int) () at /lib/libQt5Quick.so.5
#12 0x0000003b56548a7e in QSGRenderContext::renderNextFrame(QSGRenderer*, unsigned int) () at /lib/libQt5Quick.so.5
#13 0x0000003b5659009b in QQuickWindowPrivate::renderSceneGraph(QSize const&) () at /lib/libQt5Quick.so.5
#14 0x0000003b56562933 in () at /lib/libQt5Quick.so.5
#15 0x0000003b56563a31 in () at /lib/libQt5Quick.so.5
#16 0x0000003b4f59632c in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /lib/libQt5Widgets.so.5
#17 0x0000003b4f59b446 in QApplication::notify(QObject*, QEvent*) () at /lib/libQt5Widgets.so.5
#18 0x00000035b8490b73 in QCoreApplication::notifyInternal(QObject*, QEvent*) () at /lib/libQt5Core.so.5
#19 0x00000035b84e321d in QTimerInfoList::activateTimers() () at /lib/libQt5Core.so.5
#20 0x00000035b84e3769 in () at /lib/libQt5Core.so.5
#21 0x00000038dec48847 in g_main_context_dispatch () at /lib/libglib-2.0.so.0
#22 0x00000038dec48a78 in () at /lib/libglib-2.0.so.0
#23 0x00000038dec48b1c in g_main_context_iteration () at /lib/libglib-2.0.so.0
#24 0x00000035b84e436f in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/libQt5Core.so.5
#25 0x00000035b848e6da in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/libQt5Core.so.5
#26 0x00000035b849624d in QCoreApplication::exec() () at /lib/libQt5Core.so.5
#27 0x000000000042e8fd in main ()

I can confirm that with mesa-git and xf86-video-intel-git it still crashes, I had the issue again today.

I looked at the stuff Samuel found out and went to look at the libdrm git, where I found a couple of commits related to passing buffer alignment restrictions to the kernel, which I thought looked plausibly relevant to my layman eyes.

So I installed libdrm-git yesterday, and I haven't had a crash in almost two days. It's hard to definitely say if it is fixed because of the spurious nature of the bug, but it looks good so far (if I'm not mistaken there could also be several reasons for the miptree to not be allocated, though, so it's not given that all crashes with this backtrace are related).

Never mind, crashed again. Back to square one. (And sorry for all the noise here...)

I looked in the user journal, and found first this:

aug. 08 01:40:07 neruval kernel: [drm:gen8_irq_handler [i915]] *ERROR* The master control interrupt lied (SDE)!

Then it seems like kscreen (the display handler for Plasma) tried to launch another instance of itself (which fails, because it is already running), and then receives some weird notifications about the displays changing:

aug. 08 01:40:07 neruval kscreen_backend_launcher[28810]: kscreen.xrandr: Connected output 67 to CRTC 63
aug. 08 01:40:07 neruval kscreen_backend_launcher[28810]: kscreen.xcb.helper: Detected XRandR 1.4
aug. 08 01:40:07 neruval kscreen_backend_launcher[28810]: kscreen.xcb.helper: Event Base: 89
aug. 08 01:40:07 neruval kscreen_backend_launcher[28810]: kscreen.xcb.helper: Event Error: 147
aug. 08 01:40:07 neruval kscreen_backend_launcher[28810]: kscreen.backendLauncher: Loading "XRandR" backend
aug. 08 01:40:07 neruval kscreen_backend_launcher[28810]: kscreen.backendLauncher: Failed to register as DBus service: another launcher already running?
aug. 08 01:40:07 neruval kscreen_backend_launcher[28810]: kscreen.backendLauncher: ""
aug. 08 01:40:07 neruval kscreen_backend_launcher[10198]: kscreen: Primary output changed from KScreen::Output(Id: 67 , Name: "eDP1" ) ( "eDP1" ) to KScreen::Output(Id: 67 , Name: "eDP1" ) ( "eDP1" )
aug. 08 01:40:07 neruval kscreen_backend_launcher[10198]: kscreen: Primary output changed from KScreen::Output(Id: 67 , Name: "eDP1" ) ( "eDP1" ) to KScreen::Output(Id: 67 , Name: "eDP1" ) ( "eDP1" )
aug. 08 01:40:07 neruval kscreen_backend_launcher[10198]: kscreen: Primary output changed from KScreen::Output(Id: 67 , Name: "eDP1" ) ( "eDP1" ) to KScreen::Output(Id: 67 , Name: "eDP1" ) ( "eDP1" )
aug. 08 01:40:07 neruval kscreen_backend_launcher[10198]: kscreen: Primary output changed from KScreen::Output(Id: 67 , Name: "eDP1" ) ( "eDP1" ) to KScreen::Output(Id: 67 , Name: "eDP1" ) ( "eDP1" )

And a couple of seconds after that, Plasma crashes:

aug. 08 01:40:09 neruval drkonqi[28805]: Sending SIGSTOP to process

My theory from these logs was that because of these weird display changes Plasma resizes itself to some really weird sizes, which leads to mesa trying to allocate some really weird BOs, which the kernel fails to allocate, and then things crash.

I added some debug output to libdrm, to see what errors the kernel was actually returning:

=================
DRMIOCTL FAILED 22: Invalid argument
=================
DRMIOCTL FAILED 1: Operation not permitted
=================
DRMIOCTL FAILED 22: Invalid argument

The invalid argument error I think might support my theory. I forgot to log the actual request, but I'll do that now, hopefully it will help to track this further.

If it is the error from the IRQ handler that causes this, I'm even more out of my depth, though.

(In reply to Martin Sandsmark from comment #36)
> I looked in the user journal, and found first this:
>
> aug. 08 01:40:07 neruval kernel: [drm:gen8_irq_handler [i915]] *ERROR* The
> master control interrupt lied (SDE)!
>
> Then it seems like kscreen (the display handler for Plasma) tried to launch
> another instance of itself (which fails, because it is already running), and
> then receives some weird notifications about the displays changing:

I never got such output when the crash happened. It seems the crash only happens under high workload, but I am not sure about that. The BIOS I used also had some problems (I guess) with VT-d and IOMMU; I am also not sure whether this issue is related to the BIOS or not.

You can try building xf86-video-intel with "--with-default-dri=3". At least, after defaulting to DRI3, there is no crash here.

I probably have the same issue; the backtrace on the crash is a bit different, but my card might not have fast_clear. I get the "Failed to open BO for returned DRI2 buffer" message before the crash, and it doesn't happen with uxa or sna+dri3.

I have an apitrace trace that makes it happen for me 4 out of 5 times. Run it (after decompressing) with:

$ glretrace replay test_case.trace

It's 14MB so I uploaded it on an external site. (https://mega.co.nz/#!O0MHRByI!z7GywnuO8Ai_9633pmj5FY8ejzvKYBsDAri1lXFSXN0)

Software versions:
------------------
Linux 4.1.4 (Archlinux)
mesa 10.6.3
libdrm 2.4.62+106+gc8df9e7-1 (git master at the time of writing)
xf86-video-intel 2.99.917+426+g611ec7d-1 (git master)

Hack to fix it in gdb
----------------------
After drm_intel_bo_gem_create_from_name fails (brw_context.c:1421), if I run (from gdb) getBuffersWithFormat and drm_intel_bo_gem_create_from_name again then it succeeds.
(This is what has to be done when dri2 buffers got invalidated)

Backtrace when drm_intel_bo_gem_create_from_name fails
------------------------------------------------------
#0 intel_process_dri2_buffer (buffer_name=0x7ffff21ddf75 "dri2 back buffer", rb=0xaf2cb0, buffer=0xaf2b50, drawable=0xa76ad0, brw=0x7ffff7fd1038) at brw_context.c:1423
#1 intel_update_dri2_buffers (drawable=0xa76ad0, brw=0x7ffff7fd1038) at brw_context.c:1226
#2 intel_update_renderbuffers (context=context@entry=0xbaea90, drawable=drawable@entry=0xa76ad0) at brw_context.c:1248
#3 0x00007ffff203d5b1 in intel_prepare_render (brw=brw@entry=0x7ffff7fd1038) at brw_context.c:1267
#4 0x00007ffff2031290 in brw_clear (ctx=0x7ffff7fd1038, mask=18) at brw_clear.c:234
#5 0x00000000004e8c96 in ?? ()
#6 0x000000000040bd1d in ?? ()
#7 0x000000000040c37c in ?? ()
#8 0x0000000000407b05 in ?? ()
#9 0x00007ffff61e0790 in __libc_start_main () from /usr/lib/libc.so.6
#10 0x00000000004095e9 in _start ()

This patch should be a good workaround: http://cgit.freedesktop.org/~ickle/mesa/commit/?h=brw-batch&id=e2a696a4cd93c2dbe445243de48ed478fbdb8009

I will test it tonight on my home machine and see if I can reproduce it. The patch may make the screen flicker for a frame instead of crashing. The actual problem is a race condition of DRI2 that is not trivial to fix.

Created attachment 117626
Description of problem, test case.
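The gdb hack reported earlier in the thread (after the open by name fails, ask for the buffers again and retry the open) can be sketched as follows. This is a toy model under stated assumptions, not the driver's code path: `struct fake_server`, `fake_query_name`, `fake_open_name` and `open_buffer` are all hypothetical names invented here, with the "server" reduced to a single valid buffer name.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the server/kernel side: only one buffer name is valid. */
struct fake_server {
    int valid_name;   /* the only name that can currently be opened */
    int queries;      /* how many times the client re-queried the name */
};

/* Hypothetical stand-in for re-requesting the drawable's buffers. */
static int fake_query_name(struct fake_server *s)
{
    s->queries++;
    return s->valid_name;
}

/* Hypothetical stand-in for opening a buffer by name; NULL on failure. */
static const int *fake_open_name(const struct fake_server *s, int name)
{
    return name == s->valid_name ? &s->valid_name : NULL;
}

/* Try the cached name first; if it is stale, refresh it once and retry. */
static const int *open_buffer(struct fake_server *s, int cached_name)
{
    assert(s != NULL);
    const int *bo = fake_open_name(s, cached_name);
    if (bo != NULL)
        return bo;
    int fresh = fake_query_name(s);
    return fake_open_name(s, fresh);
}
```

A single retry is used here only to keep the sketch small; whether a real client could safely retry, and how often, depends on the DRI2 race discussed in the surrounding comments.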
(In reply to Martin Peres from comment #39)
> This patch should be a good workaround:
> http://cgit.freedesktop.org/~ickle/mesa/commit/?h=brw-batch&id=e2a696a4cd93c2dbe445243de48ed478fbdb8009
>
> I will test it tonight on my home machine and see if I can reproduce it. The
> patch may make the screen flicker for a frame instead of crashing. The
> actual problem is a race condition of DRI2 that is not trivial to fix.

This patch seems to solve the brw_meta_fast_clear crashes I've had in Firefox and Steam since Mesa 10.6. Cheers!

(In reply to Jan Alexander Steffens (heftig) from comment #41)
> This patch seems to solve the brw_meta_fast_clear crashes I've had in
> Firefox and Steam since Mesa 10.6. Cheers!

Same at home. No crashes to report since I applied the patch. I will review the patch and push it upstream tomorrow.

I'd rather a complete fix merged into upstream. Workaround should be applied by user-patched builds.

(In reply to Tatsuyuki Ishi from comment #43)
> I'd rather a complete fix merged into upstream. Workaround should be applied
> by user-patched builds.

The complete fix requires fixing the xserver, the ddx and mesa. This is in the pipe but it is not coming any time soon. In the mean time, let's avoid crashing if we can. The code will still be useful when everything is fixed!

Has the patch been included in mesa git?

Reported also in <https://bugs.launchpad.net/ubuntu/+source/mesa/+bug/1492037>.
Fedora bugreport of this bug (linked to Freedesktop bugreport too) https://bugzilla.redhat.com/show_bug.cgi?id=1259443

Anything new on merging the patch?

(In reply to AnAkkk from comment #48)
> Anything new on merging the patch?

Done!

(In reply to Martin Peres from comment #49)
> (In reply to AnAkkk from comment #48)
> > Anything new on merging the patch?
>
> Done!

commit 70e91d61fde239e8ae58148cacd4ff891126e2aa
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Aug 7 21:13:12 2015 +0100

    i965: Remove early release of DRI2 miptree

As said above this has long been fixed:

commit 70e91d61fde239e8ae58148cacd4ff891126e2aa
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Aug 7 21:13:12 2015 +0100

    i965: Remove early release of DRI2 miptree

Closing.