Bug 107159 - [GLK] GPU HANG in kodi
Summary: [GLK] GPU HANG in kodi
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-08 21:55 UTC by Erik Sandlund
Modified: 2018-07-14 23:19 UTC
CC List: 1 user

See Also:
i915 platform: GLK
i915 features: GPU hang


Attachments
/sys/class/drm/card0/error (39.86 KB, text/plain)
2018-07-08 21:55 UTC, Erik Sandlund
dmesg Ubuntu 18.04 server (784.08 KB, text/plain)
2018-07-09 08:07 UTC, Erik Sandlund
dmesg with error (1.60 MB, text/plain)
2018-07-09 08:08 UTC, Erik Sandlund
/sys/class/drm/card0/error Ubuntu (83.53 KB, text/plain)
2018-07-09 08:09 UTC, Erik Sandlund
i965: flush render target before ISP disable (2.22 KB, patch)
2018-07-09 08:45 UTC, Lionel Landwerlin
Mesa 18.1.3 error log (83.60 KB, text/plain)
2018-07-10 00:02 UTC, Erik Sandlund
Mesa 18.1.3 dmesg log (2.37 MB, text/plain)
2018-07-10 00:03 UTC, Erik Sandlund
Mesa 18.1.3 xorg log (22.60 KB, text/plain)
2018-07-10 00:03 UTC, Erik Sandlund
Mesa 18.2 /sys/class/drm/card0/error (65.79 KB, text/plain)
2018-07-13 11:06 UTC, Erik Sandlund

Description Erik Sandlund 2018-07-08 21:55:36 UTC
Created attachment 140515 [details]
/sys/class/drm/card0/error

Hi, I'm using a NUC7PJYH for Kodi. It worked fine for a couple of weeks, but lately the i915 driver hangs when I stress it. I have tried reinstalling the system (Ubuntu 18.04 Server) and tried various LibreElec versions. I can reproduce the error by running glxgears on openbox. The time until the crash varies depending on the settings for the Xorg Intel driver. If I load up Kodi, it crashes when I move around in the menus. Videos seem to play OK.

I supply an error-log from a LibreElec "Milhouse build" since I figure it's the least touched by my messing around.
Comment 1 Francesco Balestrieri 2018-07-09 05:19:36 UTC
Can you also send a dmesg from boot with kernel options drm.debug=0x1e log_buf_len=4M?

And it would be great if you could try to reproduce using drm-tip (https://cgit.freedesktop.org/drm-tip)
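
To confirm after a reboot that the requested options actually took effect, the kernel command line can be checked with something like the sketch below. The helper names are made up for illustration, and the category names for the drm.debug bits are an assumption based on the kernel's drm.debug parameter description from around the 4.18 era, so please check them against your kernel.

```python
# Hypothetical helpers, not part of any existing tool.
# Bit names for drm.debug are assumed from the kernel's parameter
# description around 4.18; verify against your kernel version.
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
    0x10: "atomic",
    0x20: "vblank",
    0x40: "state",
    0x80: "lease",
}


def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def drm_debug_categories(params):
    """Return the names of the drm.debug categories enabled in params."""
    mask = int(params.get("drm.debug", "0"), 0)  # base 0 accepts "0x1e"
    return [label for bit, label in sorted(DRM_DEBUG_BITS.items()) if mask & bit]
```

On the affected machine, the input would be the contents of /proc/cmdline, for example parse_cmdline(open("/proc/cmdline").read()). If "drm.debug" or "log_buf_len" is missing from the result, the options were not picked up by the bootloader.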
Comment 2 Erik Sandlund 2018-07-09 08:07:55 UTC
Created attachment 140517 [details]
dmesg Ubuntu 18.04 server
Comment 3 Erik Sandlund 2018-07-09 08:08:41 UTC
Created attachment 140518 [details]
dmesg with error
Comment 4 Erik Sandlund 2018-07-09 08:09:25 UTC
Created attachment 140519 [details]
/sys/class/drm/card0/error Ubuntu
Comment 5 Erik Sandlund 2018-07-09 08:10:41 UTC
I've tried drm-tip from http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-tip/current/ and reproduced the error. I can install it again later tonight and provide logs.
Comment 6 Lionel Landwerlin 2018-07-09 08:45:20 UTC
The first error state reminds me of https://bugs.freedesktop.org/show_bug.cgi?id=106243.
This hang seems to have happened after the fix, so it looks like we might need a bigger hammer before disabling the indirect state pointers...

On the other hand the second error state seems to indicate that the hang happened in a batch buffer that isn't part of the error state (could be a batch from i915?).

Would you be able to try the attached patch for Mesa?

Thanks a lot!
Comment 7 Lionel Landwerlin 2018-07-09 08:45:46 UTC
Created attachment 140520 [details] [review]
i965: flush render target before ISP disable
Comment 8 Lionel Landwerlin 2018-07-09 10:22:41 UTC
If you could give your settings for the xorg intel driver and your version of Mesa, that would be really useful too.
Thanks!
Comment 9 Erik Sandlund 2018-07-10 00:02:25 UTC
Created attachment 140531 [details]
Mesa 18.1.3 error log
Comment 10 Erik Sandlund 2018-07-10 00:03:01 UTC
Created attachment 140532 [details]
Mesa 18.1.3 dmesg log
Comment 11 Erik Sandlund 2018-07-10 00:03:24 UTC
Created attachment 140533 [details]
Mesa 18.1.3 xorg log
Comment 12 Erik Sandlund 2018-07-10 00:06:11 UTC
Patch applied in logs above. I'm not used to compiling and applying patches though so I don't know if everything worked. Bug still occurs though.

No grub-settings except for debug-string. No xorg-settings except Driver Intel and TearFree on.

Mesa is now 18.1.3. Was Standard Bionic before, 18.0.0-rc5.
Comment 13 Lionel Landwerlin 2018-07-10 00:14:35 UTC
(In reply to Erik Sandlund from comment #12)
> Patch applied in logs above. I'm not used to compiling and applying patches
> though so I don't know if everything worked. Bug still occurs though.
> 

Looking at the traces, it seems the patch wasn't applied :(
Comment 14 Erik Sandlund 2018-07-10 21:42:14 UTC
What should I look for to see if the patch is in use? I've re-applied the patch and recompiled, but I don't want to flood this bug report with my not-so-useful attachments.
Comment 15 Lionel Landwerlin 2018-07-10 23:52:22 UTC
Apologies, I must have downloaded the wrong attachment (or messed up locally).
Looks like you're now hitting a different issue.

I'm looking at where the GPU stopped to figure out what's wrong.

Here is how to do it :

If you compile the mesa repository with the intel tools activated (I usually use meson) :

$ cd mesa
$ meson -Dgles2=true -Ddri-drivers=i915,i965 -Dplatforms=x11,drm,wayland,surfaceless -Dgallium-drivers= --buildtype=release -Dvulkan-drivers=intel -Dtools=intel -Dbuild-tests=true build .
$ ninja -C build

Then you can run the aubinator_error_decode tool :

$ ./build/src/intel/tools/aubinator_error_decode /path/to/my/card0/error

Then search for "ACTHD:". If I take the last error state you posted, it should be this line:

  ACTHD: 0x00000000 001389f4

Then search with that address : 001389f4

0x00135b04:  0x78150009:  3DSTATE_CONSTANT_VS 

This is the instruction triggering the GPU hang.
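
The manual search above can be sketched in a few lines of Python, run over the text printed by aubinator_error_decode. This is only an illustration: the function names are hypothetical, and the line formats are assumed from the two lines quoted in this comment ("ACTHD: 0x<high> <low>" and "0x<address>:  0x<dword>:  <instruction>"); other versions of the tool may print something different.

```python
import re

# Assumed format, taken from the line quoted above:
#   ACTHD: 0x00000000 001389f4
ACTHD_RE = re.compile(r"ACTHD:\s*0x([0-9a-fA-F]{8})\s+([0-9a-fA-F]{8})")


def find_acthd(decoded_text):
    """Return the first ACTHD value as a 64-bit integer, or None if absent."""
    match = ACTHD_RE.search(decoded_text)
    if match is None:
        return None
    high, low = (int(group, 16) for group in match.groups())
    return (high << 32) | low


def lines_at_address(decoded_text, address):
    """Return decoded lines that start with the low 32 bits of the address."""
    prefix = "0x%08x:" % (address & 0xFFFFFFFF)
    return [
        line
        for line in decoded_text.splitlines()
        if line.strip().lower().startswith(prefix)
    ]
```

Typical use would be to save the decoder output to a file, read it into a string, call find_acthd() on it, and pass the result to lines_at_address() to list the instruction(s) decoded at that address.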

In the previous error state, it was a PIPE_CONTROL (which was related to the other bug I mentioned).

So it looks like the patch helps.
Is your machine hanging as often with this patch?
Comment 16 Erik Sandlund 2018-07-11 09:29:47 UTC
Yes, no difference really. Directly after compiling it worked pretty well, but after a few minutes it hung and started to hang more often after that. I see gfx corruption on the screen which sometimes looks the same even after a reboot.

I've compiled drm-tip with https://patchwork.freedesktop.org/patch/237548/ applied. No difference though really. I also tried your patch on mesa 18.2.0 with no real difference.

Should I supply more logs?
Comment 17 Erik Sandlund 2018-07-13 11:06:06 UTC
Created attachment 140623 [details]
Mesa 18.2 /sys/class/drm/card0/error

/sys/class/drm/card0/error

OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.2.0-devel (git-0288fe8d04)

name:           i915
vermagic:       4.18.0-rc4+ SMP mod_unload
Comment 18 Lionel Landwerlin 2018-07-13 11:19:05 UTC
(In reply to Erik Sandlund from comment #16)
> Yes, no difference really. Directly after compile it worked pretty good but
> after a few minutes it hung and started to hang more often after that. I see
> gfx corruption on the screen which sometimes looks the same even after a
> reboot.
> 
> I've compiled drm-tip with https://patchwork.freedesktop.org/patch/237548/
> applied. No difference though really. I also tried your patch on mesa 18.2.0
> with no real difference.
> 
> Should I supply more logs?

Hi,

Thanks a lot for all the traces. I don't think we'll need more at this point.
I think we need to find what the right fix is here; your last error state shows that the patch I've attached doesn't help.
Comment 19 Erik Sandlund 2018-07-14 23:19:29 UTC
This might be a hardware issue, since Windows 10 also produces strange artifacts and hangs after installing the Intel driver.

