Bug 107586 - [kbl] GPU HANG: ecode 9:0:0x85dffffb
Summary: [kbl] GPU HANG: ecode 9:0:0x85dffffb
Status: RESOLVED NOTOURBUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-16 03:48 UTC by Dmitry D
Modified: 2018-09-11 15:45 UTC
CC List: 1 user

See Also:
i915 platform: KBL
i915 features: GPU hang


Attachments
drm crash dump (39.61 KB, text/plain)
2018-08-16 03:48 UTC, Dmitry D
error dump + dmesg on drm-tip kernel (69.65 KB, application/x-compressed)
2018-08-16 20:58 UTC, Dmitry D

Description Dmitry D 2018-08-16 03:48:53 UTC
Created attachment 141129
drm crash dump

Hello,

I have a video player app that works fine on an i3-7100U and an Atom z8350. The same HDD image hangs on an i5-7600 within 3 seconds. I tried all kernels from 4.12 to 4.18, with the same result:
[   42.723507] [drm] GPU HANG: ecode 9:0:0x85dffffb, in vplayer [1867], reason: hang on rcs0, action: reset
[   42.723508] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   42.723508] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   42.723508] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   42.723509] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   42.723509] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   42.723515] i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
[   46.048174] asynchronous wait on fence i915:Xorg[1827]/0:41 timed out
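
When comparing hangs across kernels, it can help to pull the fields out of the "GPU HANG" line mechanically. The following is a minimal Python sketch, not part of the original report. The pattern is inferred only from the single log line above (not from the kernel source), and `parse_hang` is a hypothetical helper name.

```python
import re

# Pattern inferred from the one "GPU HANG" line in this report; other kernel
# versions may format the line differently, so treat this as an assumption.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[0-9a-fA-Fx:]+), "
    r"in (?P<comm>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_hang(line):
    """Return the hang fields as a dict, or None if the line does not match."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])  # the PID is the only numeric field
    return info
```

Feeding it the first line of the log above gives the error code, the process name and PID, the hang reason, and the action taken.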

I found that it hangs only when my app downloads an image from the iGPU to system memory via an OpenGL Pixel Buffer Object (PBO). If I disable the PBO and download the image without it, everything works fine.

Best Regards,
Dmitry
Comment 1 Jani Saarinen 2018-08-16 06:42:57 UTC
Please try to reproduce the error using the latest drm-tip (https://cgit.freedesktop.org/drm-tip) and the kernel parameters drm.debug=0x1e log_buf_len=4M. If the problem persists, attach the full dmesg from boot.
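
For readers wondering what drm.debug=0x1e enables, here is a small Python sketch that decodes the bitmask. The bit names are an assumption: they follow the DRM_UT_* categories in the kernel's drm_print.h as of roughly the kernel versions discussed here, so check them against your own kernel tree. `decode_drm_debug` is a hypothetical helper name.

```python
# Bit-to-category table, assumed to match the DRM_UT_* values in drm_print.h.
# Verify against the kernel source you are actually running.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
}


def decode_drm_debug(mask):
    """List the debug categories whose bit is set in mask, lowest bit first."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]
```

Under that assumption, `decode_drm_debug(0x1e)` returns `['DRIVER', 'KMS', 'PRIME', 'ATOMIC']`, i.e. everything except the noisy core messages.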
Comment 2 Denis 2018-08-16 08:26:13 UTC
Hi Dmitry. In addition to Jani's request, please add your Mesa version (glxinfo -B). Also, which exact player are you using?
Comment 3 Lionel Landwerlin 2018-08-16 10:04:42 UTC
If you can add the source or a binary of your player, that would be really helpful for reproducing this. Thanks!
Comment 4 Dmitry D 2018-08-16 20:25:56 UTC
You can find a binary version of the player here: https://wetransfer.com/downloads/ddd2be7c8c351ffc6cfedaa12bba68f320180816201210/1175061648453aec8c4b05184c87583320180816201210/b236a2

The app requires Lua 5.1, vaapi, and the packages avahi-daemon avahi-utils libxerces-c3.1 libnss3 libgconf-2-4 libatk1.0-0 libcups2 libgtk2.0-0.

With run_bad.sh the GPU hangs within 3 seconds on 4.12-4.16 kernels. On the drm-tip kernel it sometimes works and sometimes the GPU hangs. The problem with 4.17+ kernels is that once the GPU hangs, the whole system hangs as well, so there is no way to get any error dumps.
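
On kernels where the machine stays up after the hang, the dump can be copied out of sysfs right away. Below is a minimal Python sketch, assuming the path printed in the dmesg message (/sys/class/drm/card0/error). `save_error_state` and the output file name are hypothetical, and reading the file may need root.

```python
import os
from pathlib import Path


def save_error_state(dest_dir, src="/sys/class/drm/card0/error",
                     name="i915-error.txt"):
    """Copy the i915 error state into dest_dir and flush the copy to disk.

    Returns the destination path and the number of bytes copied.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / name

    data = Path(src).read_bytes()  # may need root on a real system
    with open(dest, "wb") as out:
        out.write(data)
        out.flush()
        os.fsync(out.fileno())  # ask the OS to write the copy out before returning
    return dest, len(data)
```

The explicit fsync is a design choice for this scenario: if the system locks up shortly afterwards, the copy is more likely to have reached the disk. It does not help when the system freezes at the same moment as the GPU.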
Comment 5 Dmitry D 2018-08-16 20:42:08 UTC
glxinfo -B:

name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Intel Open Source Technology Center (0x8086)
    Device: Mesa DRI Intel(R) HD Graphics 630 (Kaby Lake GT2)  (0x5912)
    Version: 18.0.5
    Accelerated: yes
    Video memory: 3072MB
    Unified memory: yes
    Preferred profile: core (0x1)
    Max core profile version: 4.5
    Max compat profile version: 3.0
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 630 (Kaby Lake GT2)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.0.5
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 3.0 Mesa 18.0.5
OpenGL shading language version string: 1.30
OpenGL context flags: (none)

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 18.0.5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
Comment 6 Dmitry D 2018-08-16 20:58:20 UTC
Created attachment 141153
error dump + dmesg on drm-tip kernel

This is the requested dump from the drm-tip kernel.
Comment 7 Denis 2018-08-17 14:50:29 UTC
Hi. I tried to run this player on Debian testing and Manjaro. It asked for a large number of old packages and files, such as "libva-drm.so.1". On Debian I sorted them out, but the player simply crashed. Manjaro is still in progress.
But I easily set everything up on:
Ubuntu 16.04
KBL (HD 620)
kernel 4.15 and 4.13
mesa 18.1.5 and 18.0.5

I couldn't reproduce your issue; the player works fine (apart from audio lag). No hangs.

I have one more GPU - UHD 630 (with manjaro) - will try on it.
Comment 8 Dmitry D 2018-08-17 14:58:22 UTC
Hi,

Yes, it should work on Ubuntu 16.04. But the problem exists only on the i5-7600 (UHD 630). I have an i3-7100U (HD 620) and it works fine there, but not on the i5-7600.
Comment 9 Denis 2018-08-20 10:37:11 UTC
Hi Dmitry. Here is my new config:

CPU:       Quad core Intel Core i3-8100 (-MCP-) cache: 6144 KB 
           clock speeds: max: 3600 MHz 1: 1037 MHz 2: 3012 MHz 3: 3003 MHz
           4: 3042 MHz
Graphics:  Card: Intel Device 3e91
           Display Server: X.Org 1.19.6 drivers: (unloaded: fbdev,vesa)
           Resolution: 1920x1080@60.00hz
           GLX Renderer: Mesa DRI Intel HD Graphics (Coffeelake 3x8 GT2)
           GLX Version: 3.0 Mesa 18.0.5

That's a UHD 630 GPU, and it also works fine (I couldn't reproduce the issue on Ubuntu 16.04).
BTW - in your case (your CPU model) - you really have HD630 (not Uhd).

What is your distro and display server (X or Wayland?)
Comment 10 Dmitry D 2018-08-20 12:47:25 UTC
Hello,

>BTW - in your case (your CPU model) - you really have HD630 (not Uhd).
Yes, you are right, it is Intel HD Graphics 630 (not UHD).

>What is your distro and display server (X or Wayland?)
X
Comment 11 Dmitry D 2018-08-21 15:16:20 UTC
Distro is Ubuntu 16.04

If you want I can give you SSH access to this system.
Comment 12 Denis 2018-08-30 15:09:43 UTC
Hm, we can try this. BTW, did you try older Mesa versions? Could it be a new bug, with older versions working fine?
Try building a Mesa 17 release, for example, and check with it.

#git clone git://anongit.freedesktop.org/git/mesa/mesa
#cd mesa
#git checkout mesa-17.3.6 (for example)
#./autogen.sh --with-gallium-drivers="" --with-dri-drivers=i965 --prefix=<path to bins>
#make -j4
#sudo make install
#export LD_LIBRARY_PATH=<path to bins>/lib/
Comment 13 Dmitry D 2018-08-30 15:19:31 UTC
Hello,

I tried Mesa 17.x before. It was a 9-month-old system with Mesa 17. After initial tests I found that it had the issue, and only then did I update the system to the latest packages and the latest Mesa 18. But the updated system has the same issue, so I opened a bug report.
Comment 14 Denis 2018-08-31 08:54:43 UTC
Got you. Then we can try ssh. I never ran x apps via ssh, but a quick search says that it should be possible, so at least I can try.
Comment 15 Dmitry D 2018-08-31 14:17:40 UTC
>I never ran x apps via ssh
SSH is how I work with this system 99% of the time to run my X app, and it works :). How can I give you access? Could you send me a PM so I can pass along all the SSH info?
Comment 16 Denis 2018-09-03 15:31:48 UTC
Reproduced the issue on your PC. Checked Mesa versions 13 and 12, and the bug exists there too. Continuing to check.
Comment 17 asimiklit 2018-09-04 11:42:25 UTC
Hi

I tried to reproduce this issue under Ubuntu 16.04 (glxinfo -B is below), but it is not reproducible. I tried several times, waiting approximately 10 minutes for each run of 'run_bad.sh', but nothing happened.

Linux 4.15.0-33-generic #36~16.04.1-Ubuntu SMP Wed Aug 15 17:21:05 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Intel Open Source Technology Center (0x8086)
    Device: Mesa DRI Intel(R) HD Graphics 630 (Kaby Lake GT2)  (0x591b)
    Version: 18.0.5
    Accelerated: yes
    Video memory: 3072MB
    Unified memory: yes
    Preferred profile: core (0x1)
    Max core profile version: 4.5
    Max compat profile version: 3.0
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.2
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) HD Graphics 630 (Kaby Lake GT2) 
OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.0.5
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 3.0 Mesa 18.0.5
OpenGL shading language version string: 1.30
OpenGL context flags: (none)

OpenGL ES profile version string: OpenGL ES 3.2 Mesa 18.0.5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
Comment 18 Dmitry D 2018-09-04 12:49:51 UTC
Is it possible that it is just my CPU and it is broken? Or some motherboard/CPU incompatibility?
Comment 19 asimiklit 2018-09-04 13:10:56 UTC
I guess the following difference between our CPUs is the reason why I am unable to reproduce this bug:
Your CPU is 0x5912 - Desktop
My CPU is 0x591b - Mobile
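
The device IDs compared here come from the "Device:" line of each glxinfo -B output. A minimal Python sketch for extracting and comparing them follows. It assumes the line layout seen in the two outputs pasted in this thread; `renderer_info` and `diff_renderers` are hypothetical helper names.

```python
import re

# Patterns assume the "Extended renderer info" layout shown in this thread.
DEVICE_RE = re.compile(
    r"^[ \t]*Device:[ \t]*(?P<name>.+?)[ \t]+\((?P<id>0x[0-9a-fA-F]+)\)[ \t]*$",
    re.MULTILINE,
)
VERSION_RE = re.compile(r"^[ \t]*Version:[ \t]*(?P<ver>\S+)[ \t]*$", re.MULTILINE)


def renderer_info(glxinfo_text):
    """Extract device name, device ID, and Mesa version from glxinfo -B text."""
    dev = DEVICE_RE.search(glxinfo_text)
    ver = VERSION_RE.search(glxinfo_text)
    return {
        "name": dev.group("name") if dev else None,
        "device_id": int(dev.group("id"), 16) if dev else None,
        "mesa": ver.group("ver") if ver else None,
    }


def diff_renderers(a, b):
    """Return only the fields that differ, as (a_value, b_value) pairs."""
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}
```

Run against the outputs in comments 5 and 17, the only differing field is the device ID (0x5912 versus 0x591b); the marketing name and the Mesa version are identical, which is why the name alone is not enough to tell the two parts apart.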
Comment 20 Dmitry D 2018-09-11 13:43:23 UTC
Hello,

I had the CPU and motherboard replaced under warranty, but the problem was still present. After that I found a new BIOS 8.0 for my ASRock H110M-STX motherboard whose only listed change is "Enhance system stability". After the update the system works without problems. I don't know how that is possible, but it works.
Comment 21 Mark Janes 2018-09-11 15:45:18 UTC
For the engineers at Global Logic that have done so much good work tracking down these bugs -- Thank You!

It's always good to verify BIOS before investigating bugs that are not widely reproduced.  Motherboard vendors often rush to market before their platforms are stable.
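
One low-effort way to record the BIOS level alongside a bug report is to read the DMI fields Linux exposes under /sys/class/dmi/id. The following Python sketch assumes that directory is present (it is on typical x86 systems); `read_dmi` and the choice of fields are illustrative, not something used in this thread.

```python
from pathlib import Path

# World-readable DMI attributes; serial numbers are deliberately left out.
DMI_FIELDS = ("board_vendor", "board_name", "bios_vendor",
              "bios_version", "bios_date")


def read_dmi(root="/sys/class/dmi/id"):
    """Return the selected DMI fields, with None for any that cannot be read."""
    info = {}
    for field in DMI_FIELDS:
        try:
            info[field] = (Path(root) / field).read_text().strip()
        except OSError:
            info[field] = None  # missing file or insufficient permissions
    return info
```

Attaching the resulting dictionary to a report makes it easy to see whether two otherwise identical machines differ only in firmware.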