93074 – GPU hang then corruption

Summary: GPU hang then corruption

Status:	CLOSED DUPLICATE of bug 90841

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	unspecified
Hardware:	x86 (IA32) Linux (All)

Importance:	medium normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2015-11-23 01:56 UTC by iainw
Modified:	2017-07-24 22:44 UTC
CC List:	1 user

See Also:
i915 platform:
i915 features:

Attachments
GPU crash dump from /sys/class/drm/card0/error (759.91 KB, text/plain) 2015-11-23 01:56 UTC, iainw	no flags

Description iainw 2015-11-23 01:56:05 UTC

Created attachment 120043
GPU crash dump from /sys/class/drm/card0/error

I've just experienced another momentary blank screen followed by a scrambled display.  Forcing windows to redraw often restores order, but scrolling in vim (and similar windows) often brings the scrambled text back, and highlighting articles in claws-mail also scrambles them across the screen (including outside the claws-mail window).  My web browser seems largely immune to the scrambling once I've refreshed the initially scrambled window.

This happens on average once a week, and usually as I'm just winding up work, but I've not identified a specific time yet.  It's been happening for at least a few months now, and even after a number of xorg and mesa updates, it's no better or worse than it first was.

Here's my dmesg:

[25644.997531] [drm] stuck on render ring
[25645.004709] [drm] GPU HANG: ecode 3:0:0x6affbfc1, in Xorg [295], reason: Ring hung, action: reset
[25645.004720] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[25645.004725] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[25645.004730] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[25645.004735] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[25645.004740] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[25645.004848] ------------[ cut here ]------------
[25645.004912] WARNING: CPU: 0 PID: 3846 at drivers/gpu/drm/i915/intel_display.c:3291 intel_crtc_wait_for_pending_flips+0x16c/0x200 [i915]()
[25645.004918] WARN_ON(ret)
[25645.004923] Modules linked in:
[25645.004929]  sha256_generic hmac drbg ansi_cprng ctr ccm joydev mousedev iTCO_wdt iTCO_vendor_support uvcvideo videobuf2_vmalloc videobuf2_memops arc4 videobuf2_core v4l2_common videodev rt2800pci rt2800mmio media rt2800lib coretemp evdev uas rt2x00pci input_leds rt2x00mmio mac_hid rt2x00lib pcspkr serio_raw psmouse i915 mac80211 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core cfg80211 i2c_i801 snd_hwdep snd_pcm lpc_ich rng_core drm_kms_helper snd_timer eeprom_93cx6 crc_ccitt atl1e thermal drm snd eeepc_laptop soundcore sparse_keymap battery intel_agp intel_gtt led_class agpgart rfkill shpchp ac i2c_algo_bit video button acpi_cpufreq processor sch_fq_codel ip_tables x_tables ext4 crc16 mbcache jbd2 ata_generic pata_acpi sd_mod usb_storage atkbd libps2 ata_piix
[25645.005092]  libata ehci_pci uhci_hcd ehci_hcd scsi_mod usbcore usb_common i8042 serio
[25645.005120] CPU: 0 PID: 3846 Comm: kworker/u4:1 Not tainted 4.2.5-1-ARCH #1
[25645.005128] Hardware name: ASUSTeK Computer INC. 901/901, BIOS 2103    06/11/2009
[25645.005180] Workqueue: i915-hangcheck i915_hangcheck_elapsed [i915]
[25645.005190]  c1631967 b8222fb0 00000000 f2c97d50 c14c8e8d f2c97d90 f2c97d80 c1058457
[25645.005210]  f874675a f2c97db0 00000f06 f8751b28 00000cdb f86edb1c f86edb1c f4fa0000
[25645.005229]  f4df1034 00000001 f2c97d9c c10584ce 00000009 f2c97d90 f874675a f2c97db0
[25645.005247] Call Trace:
[25645.005265]  [<c14c8e8d>] dump_stack+0x48/0x69
[25645.005277]  [<c1058457>] warn_slowpath_common+0x87/0xc0
[25645.005335]  [<f86edb1c>] ? intel_crtc_wait_for_pending_flips+0x16c/0x200 [i915]
[25645.005389]  [<f86edb1c>] ? intel_crtc_wait_for_pending_flips+0x16c/0x200 [i915]
[25645.005402]  [<c10584ce>] warn_slowpath_fmt+0x3e/0x60
[25645.005456]  [<f86edb1c>] intel_crtc_wait_for_pending_flips+0x16c/0x200 [i915]
[25645.005484]  [<f82ddba4>] ? drm_modeset_lock_all_crtcs+0x84/0x90 [drm]
[25645.005540]  [<f86eeea4>] intel_crtc_disable_planes+0x34/0xf0 [i915]
[25645.005593]  [<f86ef00a>] intel_prepare_reset+0x6a/0x80 [i915]
[25645.005641]  [<f86c2877>] i915_handle_error+0x147/0x6e0 [i915]
[25645.005657]  [<c10ad457>] ? vprintk_default+0x37/0x40
[25645.005705]  [<f86c3093>] i915_hangcheck_elapsed+0x233/0x410 [i915]
[25645.005719]  [<c106da4a>] process_one_work+0x11a/0x3f0
[25645.005730]  [<c106dd57>] worker_thread+0x37/0x470
[25645.005740]  [<c106dd20>] ? process_one_work+0x3f0/0x3f0
[25645.005750]  [<c1072df6>] kthread+0xa6/0xc0
[25645.005761]  [<c1079ff5>] ? finish_task_switch+0x55/0x190
[25645.005773]  [<c14cddc1>] ret_from_kernel_thread+0x21/0x30
[25645.005783]  [<c1072d50>] ? kthread_worker_fn+0x140/0x140
[25645.005792] ---[ end trace ae2bba7cddb23771 ]---
[25645.280403] drm/i915: Resetting chip after gpu hang

My crash dump is attached.
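
Since the hang recurs, it can help to pull the summary fields out of each `GPU HANG` line so occurrences can be compared. The following is a minimal Python sketch, not part of the original report: the regular expression is written against the single line quoted in the dmesg above and may need adjusting for other kernel versions, and the name `parse_gpu_hangs` is hypothetical.

```python
import re

# Pattern modelled on the one "GPU HANG" line quoted in the dmesg above.
# Assumption: other kernels may word this line differently.
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\] GPU HANG: ecode (?P<ecode>[^,\s]+), "
    r"in (?P<comm>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\S+)"
)


def parse_gpu_hangs(dmesg_text):
    """Return one dict per 'GPU HANG' line found in dmesg_text."""
    hangs = []
    for line in dmesg_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            fields = match.groupdict()
            fields["ts"] = float(fields["ts"])   # seconds since boot
            fields["pid"] = int(fields["pid"])   # PID of the named process
            hangs.append(fields)
    return hangs


if __name__ == "__main__":
    sample = (
        "[25644.997531] [drm] stuck on render ring\n"
        "[25645.004709] [drm] GPU HANG: ecode 3:0:0x6affbfc1, in Xorg [295], "
        "reason: Ring hung, action: reset\n"
    )
    for hang in parse_gpu_hangs(sample):
        print(hang)
```

Running this over saved dmesg output from each occurrence would show whether the ecode, process, and reason stay the same from one hang to the next.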

Comment 1 iainw 2015-11-26 21:25:53 UTC

Just happened again, so I don't think clock time or uptime is relevant. The stack trace is identical, except that this time it happened on CPU 1 (it's a single-core machine, but hyperthreading makes it appear as CPUs 0 and 1) and, unsurprisingly, the PID was different.

I've taken a copy of the crash dump, which I can attach if anyone needs it.
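
For anyone wanting to keep a copy of each dump in the same way, here is a minimal Python sketch of one approach. It is an illustration only: the function name `save_crash_dump` and the output file naming are hypothetical, and reading the sysfs file may require root.

```python
import time
from pathlib import Path


def save_crash_dump(src="/sys/class/drm/card0/error", dest_dir="."):
    """Copy the crash dump at src into dest_dir under a timestamped name.

    The file is read and written explicitly rather than copied by size,
    because sysfs files often report a length that does not match their
    actual content.
    """
    data = Path(src).read_bytes()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"i915-error-{stamp}.txt"  # hypothetical naming
    dest.write_bytes(data)
    return dest


if __name__ == "__main__":
    print("saved", save_crash_dump())
```

The timestamp in the file name keeps dumps from separate hangs apart, so any of them can be attached later.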

Comment 2 iainw 2015-12-26 22:16:26 UTC

This happened again last evening.  Very nearly a month since the last crash!

Comment 3 iainw 2015-12-30 10:42:38 UTC

And again, same traceback, same work queue.  I've no idea what I'm doing differently on the days when it fails compared to the days when I can get 18+ hours of uptime without a failure.

Comment 4 Chris Wilson 2015-12-31 08:28:37 UTC


*** This bug has been marked as a duplicate of bug 90841 ***
