Created attachment 108298 [details]
frequent crash of i965 driver: intel_do_flush_locked failed: Input/output error

After running, for example:

glmark2 -b ideas

my i965 driver always 'crashes'. It at least returns to the command line, but the driver stops providing hardware 3D functions.

intel_do_flush_locked failed: Input/output error

(Kernel 3.18, newest driver stack)

[30472.820066] [drm] stuck on render ring
[30472.820965] [drm] GPU HANG: ecode 0:0x874df8fe, in glmark2-es2 [11103], reason: Ring hung, action: reset
[30472.820967] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[30472.820968] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[30472.820969] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[30472.820971] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[30472.820972] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[30472.821069] [drm:i915_reset] *ERROR* Failed to reset chip: -19
[31393.405354] es2gears[11313]: segfault at 3 ip 00007f6be38438c0 sp 00007fff82898340 error 4 in egl_gallium.so[7f6be36ca000+80c000]
[31648.328773] es2gears[11497]: segfault at 3 ip 00007f233f1fa8c0 sp 00007fff765c93e0 error 4 in egl_gallium.so[7f233f081000+80c000]

I attached /sys/class/drm/card0/error; maybe this helps.
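For triage, the fields of the GPU HANG line can be pulled out of dmesg output mechanically. The following is a minimal Python sketch, assuming only the line format quoted in this report. The regular expression and the `parse_gpu_hang` helper are illustrative names and are not part of any kernel or Mesa tooling; the number before the colon in `ecode` is captured as `prefix` without interpreting it.

```python
import re

# Pattern modelled only on the line format quoted in this report;
# other kernel versions may print the message differently.
GPU_HANG_RE = re.compile(
    r"\[drm\] GPU HANG: ecode (?P<prefix>\d+):(?P<ecode>0x[0-9a-fA-F]+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the hang fields as a dict, or None if the line does not match."""
    match = GPU_HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields


# The line below is copied from the dmesg output in this report.
sample = ("[30472.820965] [drm] GPU HANG: ecode 0:0x874df8fe, in glmark2-es2 "
          "[11103], reason: Ring hung, action: reset")
print(parse_gpu_hang(sample))
```

Lines that are not hang reports, such as "stuck on render ring", simply yield `None`, so the helper can be mapped over a whole log.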
Certainly similar to bugs:

85367 DRI DRM/Inte intel-gfx-bugs@lists.freede... NEW --- frequent crash of i965 driver(intel_do_flush_locked failed: Input/output error) 13:48:43
76554 DRI DRM/Inte chris@chris-wilson.co.uk REOP --- [gm45] [drm:init_ring_common]: *ERROR* render ring initialization failed 23:17:40
84557 Mesa Drivers/ mattst88@gmail.com NEW --- [HSW] "Emit ELSE/ENDIF JIP with type D on Gen 7" causes Atomic Afterlife and GPU hangs 2014-10-03
51722 Mesa Drivers/ idr@freedesktop.org NEW --- [ILK] GPU hang in assaultcube 2014-09-21
71759 Mesa Drivers/ idr@freedesktop.org NEW --- Intel driver fails with "intel_do_flush_locked failed: No such file or directory" if buffer imported with EGL_NATIVE_PIXMAP_KHR 2014-09-15
41736 Mesa Drivers/ intel-3d-bugs@lists.freedes... NEW --- mesa xdemo manywin aborts with intel_do_flush_locked error 2014-08-11
52939 Mesa Drivers/ intel-3d-bugs@lists.freedes... NEW --- [snb] death by blorp while playing Psychonauts 2014-08-11
81578 Mesa Drivers/ idr@freedesktop.org NEW --- intel_do_flush_locked when trying to use clEnqueueAcquireGLObjects 2014-07-20
80233 Mesa Drivers/ dri-devel@lists.freedesktop... NEW --- DirectX wine crush 2014-06-19
47236 Mesa Drivers/ idr@freedesktop.org NEED --- Crashes when using EGL and GLES2 from multiple threads 2014-03-17
74993 Mesa Drivers/ idr@freedesktop.org NEED --- Firefox 'exit' on i915 ... 2014-02-14
61386 Mesa Drivers/ dri-devel@lists.freedesktop... NEW --- i855 GPU hang with AoE2 under Wine 2013-02-25
53348
The 3.18.0-031800rc1-generic kernel gives some hints about the problem:

[ 55.273712] CUSE: failed to register chrdev region
[ 55.608771] [drm:intel_pipe_config_compare] *ERROR* mismatch in pipe_src_w (expected 0, found 4096)
[ 55.608774] ------------[ cut here ]------------
[ 55.608820] WARNING: CPU: 1 PID: 1522 at /home/apw/COD/linux/drivers/gpu/drm/i915/intel_display.c:10969 check_crtc_state+0x291/0x380 [i915]()
[ 55.608822] pipe state doesn't match!
[ 55.608824] Modules linked in: snd_hrtimer zram lz4_compress rfcom..isdhci sky2
[ 55.608880] CPU: 1 PID: 1522 Comm: Xorg Tainted: G OE 3.18.0-031800rc1-generic #201410192135
[ 55.608882] Hardware name: FUJITSU SIEMENS LIFEBOOK T4220/FJNB1D4, BIOS Version 1.18 02/23/2009
[ 55.608884] 0000000000002ad9 ffff88021760f8a8 ffffffff817a1613 0000000000000007
[ 55.608888] ffff88021760f8f8 ffff88021760f8e8 ffffffff81074cfc ffff88021760f918
[ 55.608891] ffff880220fa8000 ffff8800ca0b0b38 ffff8800ca0b0800 ffff880220fa8708
[ 55.608894] Call Trace:
[ 55.608902] [<ffffffff817a1613>] dump_stack+0x46/0x58
[ 55.608907] [<ffffffff81074cfc>] warn_slowpath_common+0x8c/0xc0
[ 55.608910] [<ffffffff81074de6>] warn_slowpath_fmt+0x46/0x50
[ 55.608938] [<ffffffffc08a256d>] ? intel_lvds_get_config+0x4d/0xf0 [i915]
[ 55.608962] [<ffffffffc086e2d1>] check_crtc_state+0x291/0x380 [i915]
[ 55.608989] [<ffffffffc087e8f5>] intel_modeset_check_state+0x65/0xa0 [i915]
[ 55.609014] [<ffffffffc087e955>] intel_set_mode+0x25/0x30 [i915]
[ 55.609039] [<ffffffffc087f446>] intel_crtc_set_config+0x1e6/0x370 [i915]
[ 55.609044] [<ffffffff817acda6>] ? mutex_lock+0x16/0x37
[ 55.609065] [<ffffffffc05ddf90>] drm_mode_set_config_internal+0x60/0x100 [drm]
[ 55.609080] [<ffffffffc05e1aa0>] drm_mode_setcrtc+0x290/0x4e0 [drm]
[ 55.609092] [<ffffffffc05d2e46>] drm_ioctl+0x2e6/0x590 [drm]
[ 55.609107] [<ffffffffc05e1810>] ? drm_mode_setplane+0x240/0x240 [drm]
[ 55.609111] [<ffffffff81201c85>] do_vfs_ioctl+0x75/0x2c0
[ 55.609116] [<ffffffff8120c2a5>] ? __fget_light+0x25/0x70
[ 55.609119] [<ffffffff81201f61>] SyS_ioctl+0x91/0xb0
[ 55.609123] [<ffffffff817aef6d>] system_call_fastpath+0x16/0x1b
[ 55.609125] ---[ end trace b05afc4c96235de3 ]---
[ 55.610032] [drm:drm_calc_timestamping_constants] *ERROR* crtc 11: Can't calculate constants, dotclock = 0!
[ 55.610147] [drm:i9xx_crtc_mode_set] *ERROR* Couldn't find PLL settings for mode!
[ 55.618765] [drm:intel_pipe_config_compare] *ERROR* mismatch in pipe_src_w (expected 0, found 4096)
[ 55.618767] ------------[ cut here ]------------
[ 55.618790] WARNING: CPU: 1 PID: 1522 at /home/apw/COD/linux/drivers/gpu/drm/i915/intel_display.c:10969 check_crtc_state+0x291/0x380 [i915]()
[ 55.618791] pipe state doesn't match!
[ 55.618792] Modules linked in: snd_hrtimer zram lz4_compress rfcomm bnep binfmt_misc wacom_w8001 coretemp kvm_intel arc4 serport kvm pcmcia snd_hda_codec_realtek snd_hda_codec_generic joydev snd_hda_intel yenta_socket snd_hda_controller serio_raw pcmcia_rsrc iwl4965 snd_hda_codec pcmcia_core snd_hwdep snd_pcm iwlegacy i915 mac80211 snd_seq_midi snd_seq_midi_event irda cfg80211 drm_kms_helper snd_rawmidi snd_seq drm lpc_ich btusb snd_seq_device snd_timer crc_ccitt fujitsu_laptop fujitsu_tablet i2c_algo_bit snd bluetooth video soundcore tpm_infineon shpchp cuse parport_pc ppdev mac_hid lp parport btrfs xor raid6_pq mmc_block hid_generic usbhid hid psmouse ahci libahci pata_acpi sdhci_pci sdhci sky2
[ 55.618830] CPU: 1 PID: 1522 Comm: Xorg Tainted: G W OE 3.18.0-031800rc1-generic #201410192135
[ 55.618832] Hardware name: FUJITSU SIEMENS LIFEBOOK T4220/FJNB1D4, BIOS Version 1.18 02/23/2009
[ 55.618833] 0000000000002ad9 ffff88021760f8a8 ffffffff817a1613 0000000000000007
[ 55.618836] ffff88021760f8f8 ffff88021760f8e8 ffffffff81074cfc ffff88021760f918
[ 55.618838] ffff880220fa8000 ffff8800ca0b0b38 ffff8800ca0b0800 ffff880220fa8708
[ 55.618840] Call Trace:
[ 55.618844] [<ffffffff817a1613>] dump_stack+0x46/0x58
[ 55.618847] [<ffffffff81074cfc>] warn_slowpath_common+0x8c/0xc0
[ 55.618849] [<ffffffff81074de6>] warn_slowpath_fmt+0x46/0x50
[ 55.618870] [<ffffffffc08a256d>] ? intel_lvds_get_config+0x4d/0xf0 [i915]
[ 55.618894] [<ffffffffc086e2d1>] check_crtc_state+0x291/0x380 [i915]
[ 55.618913] [<ffffffffc087e8f5>] intel_modeset_check_state+0x65/0xa0 [i915]
[ 55.618931] [<ffffffffc087e955>] intel_set_mode+0x25/0x30 [i915]
[ 55.618949] [<ffffffffc087f4a4>] intel_crtc_set_config+0x244/0x370 [i915]
[ 55.618952] [<ffffffff817acda6>] ? mutex_lock+0x16/0x37
[ 55.618964] [<ffffffffc05ddf90>] drm_mode_set_config_internal+0x60/0x100 [drm]
[ 55.618974] [<ffffffffc05e1aa0>] drm_mode_setcrtc+0x290/0x4e0 [drm]
[ 55.618982] [<ffffffffc05d2e46>] drm_ioctl+0x2e6/0x590 [drm]
[ 55.618993] [<ffffffffc05e1810>] ? drm_mode_setplane+0x240/0x240 [drm]
[ 55.618996] [<ffffffff811f0f60>] ? __fput+0x170/0x250
[ 55.618998] [<ffffffff81201c85>] do_vfs_ioctl+0x75/0x2c0
[ 55.619001] [<ffffffff81091d7c>] ? task_work_run+0xac/0xe0
[ 55.619003] [<ffffffff8120c2a5>] ? __fget_light+0x25/0x70
[ 55.619005] [<ffffffff81201f61>] SyS_ioctl+0x91/0xb0
[ 55.619008] [<ffffffff817aef6d>] system_call_fastpath+0x16/0x1b
[ 55.619009] ---[ end trace b05afc4c96235de4 ]---
[ 55.619177] [drm:i965_irq_handler] *ERROR* pipe A underrun
[ 56.196131] [drm:i9xx_check_fifo_underruns] *ERROR* pipe A underrun
[ 56.701636] systemd-logind[1135]: Failed to start unit user@112.service: Unknown unit: user@112.service
[ 56.701643] systemd-logind[1135]: Failed to start user service: Unknown unit: user@112.service
[ 56.707358] systemd-logind[1135]: New session c1 of user lightdm.
[ 56.707380] systemd-logind[1135]: Linked /tmp/.X11-unix/X0 to /run/user/112/X11-display.
Does it work if you run:

always_flush_cache=true glmark2 -b ideas

If not, does it work if you run:

always_flush_batch=true glmark2 -b ideas
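The two runs requested here could be scripted roughly as follows. This is a sketch only: the option names and the glmark2 command line come from this comment, while `build_env`, `run_with_option` and `try_options` are hypothetical helper names. It assumes the options are passed as environment variables set to "true", exactly as in the commands quoted in this comment.

```python
import os
import subprocess

# Option names taken from this comment, tried in the order suggested there.
OPTIONS = ("always_flush_cache", "always_flush_batch")


def build_env(option, base=None):
    """Copy the environment and set one driver option to "true"."""
    env = dict(os.environ if base is None else base)
    env[option] = "true"
    return env


def run_with_option(option, command=("glmark2", "-b", "ideas")):
    """Run the benchmark with one option set and return its exit status."""
    return subprocess.run(list(command), env=build_env(option)).returncode


def try_options():
    """Run the benchmark once per option and report each exit status."""
    for option in OPTIONS:
        status = run_with_option(option)
        print(f"{option}=true -> exit status {status}")
```

Calling `try_options()` on the affected machine performs both runs. The exit status alone may not show whether the GPU hung, since the original report says the program still returns to the command line, so dmesg should be checked after each run.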
(In reply to Kenneth Graunke from comment #3)
> Does it work if you run:
>
> always_flush_cache=true glmark2 -b ideas
>
> If not, does it work if you run:
>
> always_flush_batch=true glmark2 -b ideas

For me, using "always_flush_cache=true" works (glmark2 finishes without a lockup).

I'm using kernel-3.17.8-300.fc21.x86_64, mesa 10.4.1, xorg server 1.16.2.901 and xorg-x11-drv-intel-2.99.916-3.20141117.fc21.x86_64.
Created attachment 112383 [details]
Trimmed apitrace which reproduces the hang (344 GL calls)

Still no idea what's going on. I managed to trim down an apitrace to a mere 344 GL calls with only two draw calls, and it still reproduces the issue. Either draw call appears to work fine by itself; you apparently have to do them together.
The draws are:

341 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 0, count = 18)
342 glUniformMatrix4fv(...)
343 glVertexAttribPointer(...)
344 glDrawElements(mode = GL_TRIANGLE_STRIP, count = 18, type = GL_UNSIGNED_SHORT, indices = NULL)

Removing the glUniformMatrix4fv call between the two makes the hang disappear. This corresponds to removing the CONSTANT_BUFFER packet between the two 3DPRIMITIVE packets...
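To look for the same pattern in other command streams, a small scanner can flag every pair of consecutive 3DPRIMITIVE packets with a CONSTANT_BUFFER packet between them. The Python sketch below treats packets as plain strings. `find_constant_buffer_between_draws` is a hypothetical helper, and the sample stream is made up for illustration rather than decoded from the attached trace or an error state.

```python
PRIMITIVE = "3DPRIMITIVE"
CONSTANT_BUFFER = "CONSTANT_BUFFER"


def find_constant_buffer_between_draws(packets):
    """Return (first_draw, second_draw) index pairs for consecutive
    3DPRIMITIVE packets that have a CONSTANT_BUFFER packet between them."""
    hits = []
    last_draw = None
    for index, name in enumerate(packets):
        if name != PRIMITIVE:
            continue
        if last_draw is not None and CONSTANT_BUFFER in packets[last_draw + 1:index]:
            hits.append((last_draw, index))
        last_draw = index
    return hits


# Made-up stream: two draws with state packets in between.
stream = ["3DPRIMITIVE", "CONSTANT_BUFFER", "VERTEX_BUFFERS", "3DPRIMITIVE"]
print(find_constant_buffer_between_draws(stream))  # [(0, 3)]
```

Dropping the CONSTANT_BUFFER entry from the sample stream yields an empty list, which mirrors the experiment of removing the glUniformMatrix4fv call.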
I can't find anything indicating what we're doing wrong, but today I committed a workaround for the problem. It should be fixed in master with:

commit c4fd0c9052dd391d6f2e9bb8e6da209dfc7ef35b
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Sat Jan 17 23:21:15 2015 -0800

    i965: Work around mysterious Gen4 GPU hangs with minimal state changes.

    Gen4 hardware appears to GPU hang frequently when using Chromium, and
    also when running 'glmark2 -b ideas'. Most of the error states contain
    3DPRIMITIVE commands in quick succession, with very few state packets
    between them - usually VERTEX_BUFFERS/ELEMENTS and CONSTANT_BUFFER.

    I trimmed an apitrace of the glmark2 hang down to two draw calls with a
    glUniformMatrix4fv call between the two. Either draw by itself works
    fine, but together, they hang the GPU. Removing the glUniform call
    makes the hangs disappear. In the hardware state, this translates to
    removing the CONSTANT_BUFFER packet between the two 3DPRIMITIVE packets.

    Flushing before emitting CONSTANT_BUFFER packets also appears to make
    the hangs disappear. I observed a slowdown in glxgears by doing it all
    the time, so I've chosen to only do it when BRW_NEW_BATCH and
    BRW_NEW_PSP are unset (i.e. we haven't done a CS_URB_STATE change or
    already flushed the whole pipeline).

    I'd much rather understand the problem, but at this point, I don't see
    how we'd ever be able to track it down further. We have no real tools,
    and the hardware people moved on years ago. I've analyzed 20+ error
    states and read every scrap of documentation I could find.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80568
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85367
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Acked-by: Matt Turner <mattst88@gmail.com>
    Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
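The condition described in the commit message can be modelled in a few lines. This Python sketch is not the code from the commit: only the flag names BRW_NEW_BATCH and BRW_NEW_PSP and the packet name CONSTANT_BUFFER come from the message above. The bit values, the function names and the "FLUSH" placeholder are assumptions made for illustration.

```python
# Hypothetical bit values; only the flag names come from the commit message.
BRW_NEW_BATCH = 1 << 0
BRW_NEW_PSP = 1 << 1


def needs_flush_before_constant_buffer(dirty):
    """True when neither BRW_NEW_BATCH nor BRW_NEW_PSP is set in `dirty`."""
    return (dirty & (BRW_NEW_BATCH | BRW_NEW_PSP)) == 0


def emit_constant_buffer(dirty, packets):
    """Append what this model would emit for one CONSTANT_BUFFER upload."""
    if needs_flush_before_constant_buffer(dirty):
        # Placeholder name for whatever flush the driver really emits.
        packets.append("FLUSH")
    packets.append("CONSTANT_BUFFER")
    return packets


print(emit_constant_buffer(0, []))              # ['FLUSH', 'CONSTANT_BUFFER']
print(emit_constant_buffer(BRW_NEW_BATCH, []))  # ['CONSTANT_BUFFER']
```

Skipping the flush when either flag is set matches the stated trade-off: flushing every time slowed down glxgears, so the extra flush is limited to the case where neither flag is set.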
*** Bug 73699 has been marked as a duplicate of this bug. ***
*** Bug 89706 has been marked as a duplicate of this bug. ***