102451 – [gen4] GPU Hang on plasma/Xorg -- out of bounds write

Summary: [gen4] GPU Hang on plasma/Xorg -- out of bounds write

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Drivers/DRI/i965
Version:	unspecified
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Intel 3D Bugs Mailing List
QA Contact:	Intel 3D Bugs Mailing List

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2017-08-28 08:31 UTC by Giulio Bernardi
Modified:	2018-03-06 21:13 UTC
CC List:	1 user

See Also:
i915 platform:
i915 features:

Attachments
/sys/class/drm/card0/error (12.93 KB, application/gzip) 2017-08-28 08:31 UTC, Giulio Bernardi

Description Giulio Bernardi 2017-08-28 08:31:44 UTC

Created attachment 133831
/sys/class/drm/card0/error

I got this crash while restoring a minimized window in KDE Plasma. The screen froze when the window was halfway through being "zoomed" back in, while it was still partially transparent.

I have been using kernel 4.12 for a few days; I never had this problem with the previous kernel, 4.11. The kernel and Xorg packages are from Fedora 25:

kernel-4.12.8-200.fc25.x86_64
xorg-x11-server-Xorg-1.19.3-1.fc25.x86_64

Here is the dmesg excerpt. The (gzipped) /sys/class/drm/card0/error file is attached.


[115531.801820] [drm] GPU HANG: ecode 4:0:0x001d1d1d, in Xorg [1796], reason: Hang on rcs, action: reset
[115531.801823] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[115531.801824] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[115531.801826] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[115531.801827] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[115531.801828] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[115531.801906] drm/i915: Resetting chip after gpu hang
[115531.802756] ------------[ cut here ]------------
[115531.802759] kernel BUG at ./include/linux/dma-fence.h:419!
[115531.802768] invalid opcode: 0000 [#1] SMP
[115531.802819] Modules linked in: bnep ccm fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc zfs(POE) zunicode(POE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) dm_crypt uvcvideo btusb btrtl btbcm videobuf2_vmalloc btintel videobuf2_memops videobuf2_v4l2 bluetooth videobuf2_core videodev media ecdh_generic iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm irqbypass acer_wmi sparse_keymap joydev i2c_i801 lpc_ich arc4 snd_hda_codec_realtek
[115531.802922]  iwldvm mac80211 snd_hda_codec_generic iwlwifi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm cfg80211 rfkill snd_timer snd soundcore shpchp wmi acpi_cpufreq tpm_tis tpm_tis_core tpm binfmt_misc vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) i915 serio_raw i2c_algo_bit drm_kms_helper drm r8169 mii video
[115531.802922] CPU: 0 PID: 10068 Comm: kworker/0:0 Tainted: P           OE   4.12.8-200.fc25.x86_64 #1
[115531.802922] Hardware name: Acer TravelMate 8371/TravelMate 8371, BIOS V1.28 08/11/2010
[115531.802922] Workqueue: events_long i915_hangcheck_elapsed [i915]
[115531.802922] task: ffff9589a4e88000 task.stack: ffffb49b0b5e4000
[115531.802922] RIP: 0010:skip_request+0x66/0x70 [i915]
[115531.802922] RSP: 0000:ffffb49b0b5e7b70 EFLAGS: 00013202
[115531.802922] RAX: 0000000000000003 RBX: ffff958b9c245b00 RCX: 0000000000000000
[115531.802922] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb49b1000b860
[115531.802922] RBP: ffffb49b0b5e7b80 R08: 000000000000000a R09: ffffb49b1000b848
[115531.802922] R10: 0000000000000010 R11: 0000000000000387 R12: ffffb49b10004000
[115531.802922] R13: ffff958bb23e28f8 R14: ffffffffa58762a0 R15: ffff958b9c245b00
[115531.802922] FS:  0000000000000000(0000) GS:ffff958bbfc00000(0000) knlGS:0000000000000000
[115531.802922] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[115531.802922] CR2: 00007f9986bad010 CR3: 0000000208e09000 CR4: 00000000000406f0
[115531.802922] Call Trace:
[115531.802922]  i915_gem_reset+0x1d7/0x380 [i915]
[115531.802922]  ? g4x_do_reset+0x241/0x250 [i915]
[115531.802922]  ? i915_do_reset+0x120/0x120 [i915]
[115531.802922]  ? bit_wait_io_timeout+0xa0/0xa0
[115531.807196]  i915_reset+0xde/0x170 [i915]
[115531.807196]  i915_reset_and_wakeup+0x17c/0x190 [i915]
[115531.807196]  i915_handle_error+0x1df/0x220 [i915]
[115531.807196]  ? scnprintf+0x49/0x80
[115531.807196]  hangcheck_declare_hang+0xdb/0x100 [i915]
[115531.807196]  i915_hangcheck_elapsed+0x29f/0x2d0 [i915]
[115531.807196]  process_one_work+0x18c/0x3a0
[115531.807196]  worker_thread+0x4e/0x3b0
[115531.807196]  kthread+0x109/0x140
[115531.807196]  ? process_one_work+0x3a0/0x3a0
[115531.807196]  ? kthread_park+0x60/0x60
[115531.807196]  ret_from_fork+0x25/0x30
[115531.807196] Code: e5 8b 93 c8 01 00 00 31 ff 31 c0 29 c2 4c 01 e7 31 f6 e8 7e ad 2e e5 48 8b 43 48 a8 01 75 0c c7 43 58 fb ff ff ff 5b 41 5c 5d c3 <0f> 0b 0f 1f 84 00 00 00 00 00 66 66 66 66 90 48 8d 74 37 ff 48 
[115531.807196] RIP: skip_request+0x66/0x70 [i915] RSP: ffffb49b0b5e7b70
[115531.849902] ---[ end trace 883defc894584620 ]---

Comment 1 Chris Wilson 2017-08-28 08:56:58 UTC

That kernel OOPS should be fixed in v4.13 (it's just warning that the state of the request changed halfway through the reset). With any luck, the kernel will then mark the driver as wedged, since userspace managed to clobber everything in the first few megabytes of the GTT.

Comment 2 Elizabeth 2018-03-06 21:08:20 UTC

Hello Giulio, any update on this? Were you able to upgrade to at least kernel version 4.13? Is this still reproducible with that version?

Comment 3 Giulio Bernardi 2018-03-06 21:13:37 UTC

(In reply to Elizabeth from comment #2)
> Hello Giulio, any update on this? Were you able to upgrade to at least
> kernel version 4.13? Is this still reproducible with that version?

Sorry, I forgot about this report! I have not had any more crashes since upgrading to 4.13 (I am running 4.15.6 now), so I believe the report can be closed.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.