Bug 85367 - [gen4] GPU hang in glmark-es2
Summary: [gen4] GPU hang in glmark-es2
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Duplicates: 73699 89706
Depends on:
Blocks:
 
Reported: 2014-10-23 13:48 UTC by Andreas
Modified: 2015-03-21 23:48 UTC (History)
CC List: 5 users

See Also:
i915 platform:
i915 features:


Attachments
frequent crash of i965 driver: intel_do_flush_locked failed: Input/output error (846.20 KB, text/plain)
2014-10-23 13:48 UTC, Andreas
Trimmed apitrace which reproduces the hang (344 GL calls) (57.83 KB, text/plain)
2015-01-17 11:26 UTC, Kenneth Graunke

Description Andreas 2014-10-23 13:48:43 UTC
Created attachment 108298
frequent crash of i965 driver: intel_do_flush_locked failed: Input/output error

After running, for example:

glmark2 -b ideas

my i965 driver always 'crashes'.

It at least returns to the command line, but the driver
stops providing hardware 3D functions.

intel_do_flush_locked failed: Input/output error

(Kernel 3.18, newest driver stack)


[30472.820066] [drm] stuck on render ring
[30472.820965] [drm] GPU HANG: ecode 0:0x874df8fe, in glmark2-es2 [11103], reason: Ring hung, action: reset
[30472.820967] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[30472.820968] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[30472.820969] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[30472.820971] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[30472.820972] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[30472.821069] [drm:i915_reset] *ERROR* Failed to reset chip: -19
[31393.405354] es2gears[11313]: segfault at 3 ip 00007f6be38438c0 sp 00007fff82898340 error 4 in egl_gallium.so[7f6be36ca000+80c000]
[31648.328773] es2gears[11497]: segfault at 3 ip 00007f233f1fa8c0 sp 00007fff765c93e0 error 4 in egl_gallium.so[7f233f081000+80c000]

Attached /sys/class/drm/card0/error.

Maybe this helps.
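
When comparing several reports like this one, the "GPU HANG" line in the dmesg output above can be picked apart mechanically. The following is only a minimal sketch: it assumes the exact line layout shown in this report (other kernel versions may print it differently), and the names `HANG_RE` and `parse_gpu_hang` are hypothetical helpers, not part of any driver or tool.

```python
import re

# Pattern for the hang line as it appears in this report. The number before
# the colon in "ecode 0:0x..." is captured as "prefix" because the report
# does not say what it means.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<prefix>\d+):(?P<ecode>0x[0-9a-fA-F]+), "
    r"in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG dmesg line as a dict, or None."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields


if __name__ == "__main__":
    sample = (
        "[30472.820965] [drm] GPU HANG: ecode 0:0x874df8fe, "
        "in glmark2-es2 [11103], reason: Ring hung, action: reset"
    )
    print(parse_gpu_hang(sample))
```

Lines that do not match the assumed layout yield `None` rather than a partial result, so unrelated dmesg lines can be passed through the same function.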
Comment 1 Andreas 2014-10-23 14:03:25 UTC
Certainly similar to these bugs:
ID	Product	Comp	Assignee	Status	Resolution	Summary	Changed
85367	DRI	DRM/Inte	intel-gfx-bugs@lists.freede...	NEW	---	frequent crash of i965 driver(intel_do_flush_locked failed: Input/output error)	13:48:43
76554	DRI	DRM/Inte	chris@chris-wilson.co.uk	REOP	---	[gm45] [drm:init_ring_common]: *ERROR* render ring initialization failed	23:17:40
84557	Mesa	Drivers/	mattst88@gmail.com	NEW	---	[HSW] "Emit ELSE/ENDIF JIP with type D on Gen 7" causes Atomic Afterlife and GPU hangs	2014-10-03
51722	Mesa	Drivers/	idr@freedesktop.org	NEW	---	[ILK] GPU hang in assaultcube	2014-09-21
71759	Mesa	Drivers/	idr@freedesktop.org	NEW	---	Intel driver fails with "intel_do_flush_locked failed: No such file or directory" if buffer imported with EGL_NATIVE_PIXMAP_KHR	2014-09-15
41736	Mesa	Drivers/	intel-3d-bugs@lists.freedes...	NEW	---	mesa xdemo manywin aborts with intel_do_flush_locked error	2014-08-11
52939	Mesa	Drivers/	intel-3d-bugs@lists.freedes...	NEW	---	[snb] death by blorp while playing Psychonauts	2014-08-11
81578	Mesa	Drivers/	idr@freedesktop.org	NEW	---	intel_do_flush_locked when trying to use clEnqueueAcquireGLObjects	2014-07-20
80233	Mesa	Drivers/	dri-devel@lists.freedesktop...	NEW	---	DirectX wine crush	2014-06-19
47236	Mesa	Drivers/	idr@freedesktop.org	NEED	---	Crashes when using EGL and GLES2 from multiple threads	2014-03-17
74993	Mesa	Drivers/	idr@freedesktop.org	NEED	---	Firefox 'exit' on i915 ...	2014-02-14
61386	Mesa	Drivers/	dri-devel@lists.freedesktop...	NEW	---	i855 GPU hang with AoE2 under Wine	2013-02-25
53348
Comment 2 Andreas 2014-10-24 23:57:38 UTC
The 3.18.0-031800rc1-generic kernel gives some hints about the problem:


[   55.273712] CUSE: failed to register chrdev region
[   55.608771] [drm:intel_pipe_config_compare] *ERROR* mismatch in pipe_src_w (expected 0, found 4096)
[   55.608774] ------------[ cut here ]------------
[   55.608820] WARNING: CPU: 1 PID: 1522 at /home/apw/COD/linux/drivers/gpu/drm/i915/intel_display.c:10969 check_crtc_state+0x291/0x380 [i915]()
[   55.608822] pipe state doesn't match!
[   55.608824] Modules linked in: snd_hrtimer zram lz4_compress rfcom..isdhci sky2
[   55.608880] CPU: 1 PID: 1522 Comm: Xorg Tainted: G           OE  3.18.0-031800rc1-generic #201410192135
[   55.608882] Hardware name: FUJITSU SIEMENS LIFEBOOK T4220/FJNB1D4, BIOS Version 1.18  02/23/2009
[   55.608884]  0000000000002ad9 ffff88021760f8a8 ffffffff817a1613 0000000000000007
[   55.608888]  ffff88021760f8f8 ffff88021760f8e8 ffffffff81074cfc ffff88021760f918
[   55.608891]  ffff880220fa8000 ffff8800ca0b0b38 ffff8800ca0b0800 ffff880220fa8708
[   55.608894] Call Trace:
[   55.608902]  [<ffffffff817a1613>] dump_stack+0x46/0x58
[   55.608907]  [<ffffffff81074cfc>] warn_slowpath_common+0x8c/0xc0
[   55.608910]  [<ffffffff81074de6>] warn_slowpath_fmt+0x46/0x50
[   55.608938]  [<ffffffffc08a256d>] ? intel_lvds_get_config+0x4d/0xf0 [i915]
[   55.608962]  [<ffffffffc086e2d1>] check_crtc_state+0x291/0x380 [i915]
[   55.608989]  [<ffffffffc087e8f5>] intel_modeset_check_state+0x65/0xa0 [i915]
[   55.609014]  [<ffffffffc087e955>] intel_set_mode+0x25/0x30 [i915]
[   55.609039]  [<ffffffffc087f446>] intel_crtc_set_config+0x1e6/0x370 [i915]
[   55.609044]  [<ffffffff817acda6>] ? mutex_lock+0x16/0x37
[   55.609065]  [<ffffffffc05ddf90>] drm_mode_set_config_internal+0x60/0x100 [drm]
[   55.609080]  [<ffffffffc05e1aa0>] drm_mode_setcrtc+0x290/0x4e0 [drm]
[   55.609092]  [<ffffffffc05d2e46>] drm_ioctl+0x2e6/0x590 [drm]
[   55.609107]  [<ffffffffc05e1810>] ? drm_mode_setplane+0x240/0x240 [drm]
[   55.609111]  [<ffffffff81201c85>] do_vfs_ioctl+0x75/0x2c0
[   55.609116]  [<ffffffff8120c2a5>] ? __fget_light+0x25/0x70
[   55.609119]  [<ffffffff81201f61>] SyS_ioctl+0x91/0xb0
[   55.609123]  [<ffffffff817aef6d>] system_call_fastpath+0x16/0x1b
[   55.609125] ---[ end trace b05afc4c96235de3 ]---
[   55.610032] [drm:drm_calc_timestamping_constants] *ERROR* crtc 11: Can't calculate constants, dotclock = 0!
[   55.610147] [drm:i9xx_crtc_mode_set] *ERROR* Couldn't find PLL settings for mode!
[   55.618765] [drm:intel_pipe_config_compare] *ERROR* mismatch in pipe_src_w (expected 0, found 4096)
[   55.618767] ------------[ cut here ]------------
[   55.618790] WARNING: CPU: 1 PID: 1522 at /home/apw/COD/linux/drivers/gpu/drm/i915/intel_display.c:10969 check_crtc_state+0x291/0x380 [i915]()
[   55.618791] pipe state doesn't match!
[   55.618792] Modules linked in: snd_hrtimer zram lz4_compress rfcomm bnep binfmt_misc wacom_w8001 coretemp kvm_intel arc4 serport kvm pcmcia snd_hda_codec_realtek snd_hda_codec_generic joydev snd_hda_intel yenta_socket snd_hda_controller serio_raw pcmcia_rsrc iwl4965 snd_hda_codec pcmcia_core snd_hwdep snd_pcm iwlegacy i915 mac80211 snd_seq_midi snd_seq_midi_event irda cfg80211 drm_kms_helper snd_rawmidi snd_seq drm lpc_ich btusb snd_seq_device snd_timer crc_ccitt fujitsu_laptop fujitsu_tablet i2c_algo_bit snd bluetooth video soundcore tpm_infineon shpchp cuse parport_pc ppdev mac_hid lp parport btrfs xor raid6_pq mmc_block hid_generic usbhid hid psmouse ahci libahci pata_acpi sdhci_pci sdhci sky2
[   55.618830] CPU: 1 PID: 1522 Comm: Xorg Tainted: G        W  OE  3.18.0-031800rc1-generic #201410192135
[   55.618832] Hardware name: FUJITSU SIEMENS LIFEBOOK T4220/FJNB1D4, BIOS Version 1.18  02/23/2009
[   55.618833]  0000000000002ad9 ffff88021760f8a8 ffffffff817a1613 0000000000000007
[   55.618836]  ffff88021760f8f8 ffff88021760f8e8 ffffffff81074cfc ffff88021760f918
[   55.618838]  ffff880220fa8000 ffff8800ca0b0b38 ffff8800ca0b0800 ffff880220fa8708
[   55.618840] Call Trace:
[   55.618844]  [<ffffffff817a1613>] dump_stack+0x46/0x58
[   55.618847]  [<ffffffff81074cfc>] warn_slowpath_common+0x8c/0xc0
[   55.618849]  [<ffffffff81074de6>] warn_slowpath_fmt+0x46/0x50
[   55.618870]  [<ffffffffc08a256d>] ? intel_lvds_get_config+0x4d/0xf0 [i915]
[   55.618894]  [<ffffffffc086e2d1>] check_crtc_state+0x291/0x380 [i915]
[   55.618913]  [<ffffffffc087e8f5>] intel_modeset_check_state+0x65/0xa0 [i915]
[   55.618931]  [<ffffffffc087e955>] intel_set_mode+0x25/0x30 [i915]
[   55.618949]  [<ffffffffc087f4a4>] intel_crtc_set_config+0x244/0x370 [i915]
[   55.618952]  [<ffffffff817acda6>] ? mutex_lock+0x16/0x37
[   55.618964]  [<ffffffffc05ddf90>] drm_mode_set_config_internal+0x60/0x100 [drm]
[   55.618974]  [<ffffffffc05e1aa0>] drm_mode_setcrtc+0x290/0x4e0 [drm]
[   55.618982]  [<ffffffffc05d2e46>] drm_ioctl+0x2e6/0x590 [drm]
[   55.618993]  [<ffffffffc05e1810>] ? drm_mode_setplane+0x240/0x240 [drm]
[   55.618996]  [<ffffffff811f0f60>] ? __fput+0x170/0x250
[   55.618998]  [<ffffffff81201c85>] do_vfs_ioctl+0x75/0x2c0
[   55.619001]  [<ffffffff81091d7c>] ? task_work_run+0xac/0xe0
[   55.619003]  [<ffffffff8120c2a5>] ? __fget_light+0x25/0x70
[   55.619005]  [<ffffffff81201f61>] SyS_ioctl+0x91/0xb0
[   55.619008]  [<ffffffff817aef6d>] system_call_fastpath+0x16/0x1b
[   55.619009] ---[ end trace b05afc4c96235de4 ]---
[   55.619177] [drm:i965_irq_handler] *ERROR* pipe A underrun
[   56.196131] [drm:i9xx_check_fifo_underruns] *ERROR* pipe A underrun
[   56.701636] systemd-logind[1135]: Failed to start unit user@112.service: Unknown unit: user@112.service
[   56.701643] systemd-logind[1135]: Failed to start user service: Unknown unit: user@112.service
[   56.707358] systemd-logind[1135]: New session c1 of user lightdm.
[   56.707380] systemd-logind[1135]: Linked /tmp/.X11-unix/X0 to /run/user/112/X11-display.
Comment 3 Kenneth Graunke 2014-11-15 07:50:41 UTC
Does it work if you run:

always_flush_cache=true glmark2 -b ideas

If not, does it work if you run:

always_flush_batch=true glmark2 -b ideas
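
The two runs above could also be scripted so that they are tried in order. This is a rough sketch only: `env_with_option` and `first_working_option` are hypothetical helper names, and treating a zero exit status as "no hang" is an assumption, since the report says the benchmark still returns to the command line after a hang. Checking dmesg afterwards is probably the more reliable test.

```python
import os
import subprocess


def env_with_option(option, base=None):
    """Copy the environment and set one driver option to "true"."""
    env = dict(os.environ if base is None else base)
    env[option] = "true"
    return env


def first_working_option(command,
                         options=("always_flush_cache", "always_flush_batch")):
    """Run command once per option; return the first option that exits 0.

    Assumption: exit status 0 is used as a stand-in for "did not hang".
    """
    for option in options:
        result = subprocess.run(command, env=env_with_option(option))
        if result.returncode == 0:
            return option
    return None


if __name__ == "__main__":
    # Same benchmark invocation as in the comment above.
    print(first_working_option(["glmark2", "-b", "ideas"]))
```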
Comment 4 Felix Schwarz 2015-01-11 22:40:09 UTC
(In reply to Kenneth Graunke from comment #3)
> Does it work if you run:
> 
> always_flush_cache=true glmark2 -b ideas
> 
> If not, does it work if you run:
> 
> always_flush_batch=true glmark2 -b ideas

For me using "always_flush_cache=true" works (glmark2 finishes without lockup). I'm using kernel-3.17.8-300.fc21.x86_64, mesa 10.4.1, xorg server 1.16.2.901 and xorg-x11-drv-intel-2.99.916-3.20141117.fc21.x86_64.
Comment 5 Kenneth Graunke 2015-01-17 11:26:06 UTC
Created attachment 112383
Trimmed apitrace which reproduces the hang (344 GL calls)

Still no idea what's going on.  I managed to trim down an apitrace to a mere 344 GL calls, and only two draw calls...still reproduces the issue.

Either draw call appears to work fine by itself - you apparently have to do them together.
Comment 6 Kenneth Graunke 2015-01-17 11:48:59 UTC
The draws are:

341 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 0, count = 18)
342 glUniformMatrix4fv(...)
343 glVertexAttribPointer(...)
344 glDrawElements(mode = GL_TRIANGLE_STRIP, count = 18, type = GL_UNSIGNED_SHORT, indices = NULL)

Removing the glUniformMatrix4fv call between the two makes the hang disappear.  This corresponds to removing the CONSTANT_BUFFER packet between the two 3DPRIMITIVE packets...
Comment 7 Kenneth Graunke 2015-01-19 21:24:15 UTC
I can't find anything indicating what we're doing wrong, but today I committed a workaround for the problem.  It should be fixed in master with:

commit c4fd0c9052dd391d6f2e9bb8e6da209dfc7ef35b
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Sat Jan 17 23:21:15 2015 -0800

    i965: Work around mysterious Gen4 GPU hangs with minimal state changes.
    
    Gen4 hardware appears to GPU hang frequently when using Chromium, and
    also when running 'glmark2 -b ideas'.  Most of the error states contain
    3DPRIMITIVE commands in quick succession, with very few state packets
    between them - usually VERTEX_BUFFERS/ELEMENTS and CONSTANT_BUFFER.
    
    I trimmed an apitrace of the glmark2 hang down to two draw calls with a
    glUniformMatrix4fv call between the two.  Either draw by itself works
    fine, but together, they hang the GPU.  Removing the glUniform call
    makes the hangs disappear.  In the hardware state, this translates to
    removing the CONSTANT_BUFFER packet between the two 3DPRIMITIVE packets.
    
    Flushing before emitting CONSTANT_BUFFER packets also appears to make
    the hangs disappear.  I observed a slowdown in glxgears by doing it all
    the time, so I've chosen to only do it when BRW_NEW_BATCH and
    BRW_NEW_PSP are unset (i.e. we haven't done a CS_URB_STATE change or
    already flushed the whole pipeline).
    
    I'd much rather understand the problem, but at this point, I don't see
    how we'd ever be able to track it down further.  We have no real tools,
    and the hardware people moved on years ago.  I've analyzed 20+ error
    states and read every scrap of documentation I could find.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80568
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85367
    Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
    Acked-by: Matt Turner <mattst88@gmail.com>
    Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
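
The condition described in the commit message can be modelled in a few lines. This is a simplified illustration and not the actual driver code: the bit values assigned to `BRW_NEW_BATCH` and `BRW_NEW_PSP` are made up, the function names are hypothetical, and the batch is represented as a plain list of packet names with a generic "FLUSH" entry standing in for whatever flush the driver emits.

```python
# Hypothetical bit values; the real driver defines its own.
BRW_NEW_BATCH = 1 << 0
BRW_NEW_PSP = 1 << 1


def needs_flush_before_constant_buffer(dirty):
    """True when neither BRW_NEW_BATCH nor BRW_NEW_PSP is set in dirty."""
    return (dirty & (BRW_NEW_BATCH | BRW_NEW_PSP)) == 0


def emit_constant_buffer(batch, dirty):
    """Append CONSTANT_BUFFER to batch, preceded by a flush if required."""
    if needs_flush_before_constant_buffer(dirty):
        batch.append("FLUSH")
    batch.append("CONSTANT_BUFFER")
    return batch


if __name__ == "__main__":
    # Second draw of the trimmed trace: no new batch and no PSP change,
    # so the workaround places a flush between the two 3DPRIMITIVE packets.
    batch = ["3DPRIMITIVE"]
    emit_constant_buffer(batch, dirty=0)
    batch.append("3DPRIMITIVE")
    print(batch)
```

When either flag is set, the model emits CONSTANT_BUFFER without the extra flush, which matches the stated aim of avoiding the glxgears slowdown seen when flushing every time.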
Comment 8 Kenneth Graunke 2015-01-19 21:27:21 UTC
*** Bug 73699 has been marked as a duplicate of this bug. ***
Comment 9 Matt Turner 2015-03-21 23:48:32 UTC
*** Bug 89706 has been marked as a duplicate of this bug. ***

