Bug 105278 - [ivb] Possibly nvidia/primus-induced GPU hang on rcs0, ecode 7:0:0x85fffff8 in chromium
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-02-27 20:31 UTC by Dorian Wouters
Modified: 2019-09-25 19:09 UTC

See Also:
i915 platform: IVB
i915 features: GPU hang


Attachments
/sys/class/drm/card0/error contents (33.48 KB, text/plain)
2018-02-27 20:31 UTC, Dorian Wouters

Description Dorian Wouters 2018-02-27 20:31:21 UTC
Created attachment 137665
/sys/class/drm/card0/error contents

Bug description: My entire display froze while switching between windows in X11. Nothing else seems to have hung: music kept playing, and everything returned to normal after I sent SIGKILL to Blender, which was running on the nVidia GPU.

Details / Reproducing steps:
- Blender 2.79 was running on the nVidia GPU through primus (via primusrun), using CUDA to render Blender Cycles images. The Blender window had been inactive for a while and had not rendered any image for at least 10 minutes.
- I accidentally switched to the Blender window by clicking just below the window icon I was aiming for in a vertical xfce4-panel, then used the mouse wheel to move to another window above it in the list. I scrolled past 4 other windows before reaching Chromium's, at which point the hang happened.
- Xorg did not visibly respond to VT switch requests in the minute or so following the freeze, although it later turned out that the switching itself had worked. I left tty2 active (still without visual feedback; X11 runs on tty1).
- I suspended and then resumed the laptop; the display was the same before and after.
- I ssh'd into my machine, where I ran:
  * `htop`, which did not show any CPU usage other than itself, sshd, firefox and pulseaudio (which were playing music in the background)
  * `perf top`, which showed no graphics-related perf event samples
  * `killall -9 blender`
- At this point the display still did not update, although the active VT was tty2. (I had expected that killing Blender would unclog the graphics stack and let the console render.)
- I pressed Alt+F1, and X11 resumed.
- I pressed Ctrl+Alt+F2, and tty2 displayed properly.
- I switched back to X11, read dmesg, and filed this bug.
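
For anyone trying to capture the same data during a similar hang, here is a minimal Python sketch of saving the i915 error state over SSH before a reboot discards it. The function name `save_error_state` and the output file naming are hypothetical, chosen only for illustration. The sketch assumes the error state is exposed at /sys/class/drm/card0/error (as in the attachment) and that the script has permission to read it, which usually means running as root.

```python
import time
from pathlib import Path


def save_error_state(src="/sys/class/drm/card0/error", dest_dir="."):
    """Copy the i915 error state to a timestamped file.

    Returns the destination path, or None if the source file does not exist.
    The default source path is the one from this report; the card index may
    differ on other machines.
    """
    src_path = Path(src)
    if not src_path.exists():
        return None
    # Read until EOF rather than trusting the reported file size, because
    # sysfs files do not report their real length.
    data = src_path.read_bytes()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"i915-error-{stamp}.txt"
    dest.write_bytes(data)
    return dest
```

Calling `save_error_state(dest_dir="/tmp")` from the SSH session would leave a copy that can be attached to a report later.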

System environment (package versions as reported by `pacman`):
-- chipset: HD4000 (part of an Intel i5-3317U; Ivy Bridge)
-- system architecture: 64-bit
-- xf86-video-intel: 1:2.99.917+812+g75795523-1
-- xserver: 1.19.6+13+gd0d1a694f-1
-- mesa: 17.3.5-1
-- libdrm: 2.4.90-3
-- kernel: 4.15.5-1-ARCH #1 SMP PREEMPT Thu Feb 22 22:15:20 UTC 2018 x86_64
-- Linux distribution: Arch Linux
-- Machine or mobo model: ASUS K56CB
-- Display connector: LVDS panel
-- nvidia: 390.25-13
-- nvidia GPU: GeForce 740M
-- primus: 20151110-7
-- bumblebee: 3.2.1-16
-- bbswitch: 0.8-113
-- blender: 17:2.79-9
-- compton (X11 compositor in use): 0.1_beta2.5-10
-- chromium: 64.0.3282.167-1

Additional info:
In the process of resetting the i915, a fence wait timed out:
[95008.506693] i915 0000:00:02.0: Resetting chip after gpu hang
[95010.549217] asynchronous wait on fence i915:[global]:6fd684 timed out
[95016.501277] i915 0000:00:02.0: Resetting chip after gpu hang
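
To make the timing of these messages easier to read, here is a small Python sketch that parses the three lines above and computes the gap between the two reset attempts. The helper names `parse_dmesg` and `reset_intervals` are hypothetical; the sketch only assumes the default dmesg format with a `[seconds.microseconds]` prefix.

```python
import re

# Matches the kernel timestamp prefix, e.g. "[95008.506693] message"
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_dmesg(lines):
    """Return (timestamp_seconds, message) pairs for timestamped lines."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            events.append((float(m.group(1)), m.group(2)))
    return events


def reset_intervals(events):
    """Seconds between consecutive 'Resetting chip after gpu hang' messages."""
    resets = [t for t, msg in events if "Resetting chip after gpu hang" in msg]
    return [round(b - a, 6) for a, b in zip(resets, resets[1:])]


# The three dmesg lines quoted in this report.
SAMPLE = """\
[95008.506693] i915 0000:00:02.0: Resetting chip after gpu hang
[95010.549217] asynchronous wait on fence i915:[global]:6fd684 timed out
[95016.501277] i915 0000:00:02.0: Resetting chip after gpu hang
""".splitlines()

events = parse_dmesg(SAMPLE)
intervals = reset_intervals(events)
print(intervals)
```

On the lines quoted here, the fence wait times out roughly 2 seconds after the first reset, and the second reset follows about 8 seconds after the first.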

Starting up or using primus-forwarded software sometimes causes graphics corruption in some windows, which is fixed when a redraw happens. The corruption also seems to affect all types of graphics buffers on the i915, such as font caches/atlases; some applications, like Steam, are particularly affected. It would not be surprising if more than just buffer contents were getting corrupted.
Booting with intel_iommu enabled prevents graphical output as soon as the kernel switches from efifb to inteldrmfb (that is, early in boot); if it worked, it might help with the graphics corruption problem.
Comment 1 Elizabeth 2018-03-14 22:59:24 UTC
Hi, could you try Mesa 17.3.6 or the latest 18.0? Is there any way to trigger this more reliably? Does it still happen if you're not using the nVidia GPU?
Comment 2 Vladimir Los 2018-03-27 07:53:56 UTC
Dorian, can this bug be reproduced easily and regularly on your hardware/software configuration?
Comment 3 Vladimir Los 2018-03-27 12:14:22 UTC
The bug could not be reproduced with a similar software configuration.
Mesa 17.3.5 (and 17.3.7) and blender 17:2.79-10 were installed.
Comment 4 GitLab Migration User 2019-09-25 19:09:53 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1700.

