Bug 107159 - [GLK] GPU HANG in kodi
Summary: [GLK] GPU HANG in kodi
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard: Triaged
Keywords:
Depends on:
Blocks:
 
Reported: 2018-07-08 21:55 UTC by Erik Sandlund
Modified: 2019-09-25 19:12 UTC
CC List: 1 user

See Also:
i915 platform: GLK
i915 features: GPU hang


Attachments
/sys/class/drm/card0/error (39.86 KB, text/plain)
2018-07-08 21:55 UTC, Erik Sandlund
dmesg Ubuntu 18.04 server (784.08 KB, text/plain)
2018-07-09 08:07 UTC, Erik Sandlund
dmesg with error (1.60 MB, text/plain)
2018-07-09 08:08 UTC, Erik Sandlund
/sys/class/drm/card0/error Ubuntu (83.53 KB, text/plain)
2018-07-09 08:09 UTC, Erik Sandlund
i965: flush render target before ISP disable (2.22 KB, patch)
2018-07-09 08:45 UTC, Lionel Landwerlin
Mesa 18.1.3 error log (83.60 KB, text/plain)
2018-07-10 00:02 UTC, Erik Sandlund
Mesa 18.1.3 dmesg log (2.37 MB, text/plain)
2018-07-10 00:03 UTC, Erik Sandlund
Mesa 18.1.3 xorg log (22.60 KB, text/plain)
2018-07-10 00:03 UTC, Erik Sandlund
Mesa 18.2 /sys/class/drm/card0/error (65.79 KB, text/plain)
2018-07-13 11:06 UTC, Erik Sandlund

Description Erik Sandlund 2018-07-08 21:55:36 UTC
Created attachment 140515 [details]
/sys/class/drm/card0/error

Hi, I'm using a NUC7PJYH for Kodi. It worked fine for a couple of weeks, but lately the i915 driver hangs when I stress it. I have tried reinstalling the system (Ubuntu 18.04 Server) and tried various LibreELEC versions. I can reproduce the error by running glxgears on Openbox. The time until the crash varies with the settings of the Xorg Intel driver. If I load up Kodi, it crashes when I move around in the menus. Videos seem to play OK.

I supply an error-log from a LibreElec "Milhouse build" since I figure it's the least touched by my messing around.
Comment 1 Francesco Balestrieri 2018-07-09 05:19:36 UTC
Can you also send a dmesg from boot with kernel options drm.debug=0x1e log_buf_len=4M?

And it would be great if you could try to reproduce using drm-tip (https://cgit.freedesktop.org/drm-tip)
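
For reference, one way to add those kernel options on an Ubuntu-style GRUB setup is sketched below. This is only a sketch: the helper name add_kernel_params is made up for illustration, and it assumes the defaults file already contains a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line and that GNU sed is available.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): append kernel parameters to the
# GRUB_CMDLINE_LINUX_DEFAULT line of a GRUB defaults file.
# Assumes the line exists and its value is double-quoted. Uses GNU sed's -i.
add_kernel_params() {
    file=$1      # e.g. /etc/default/grub on Ubuntu
    params=$2    # e.g. "drm.debug=0x1e log_buf_len=4M"
    # Re-emit the existing value and add the new parameters before the closing quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $params\"/" "$file"
}

# Example (needs root, so it is left commented out here):
#   add_kernel_params /etc/default/grub "drm.debug=0x1e log_buf_len=4M"
```

After changing the file, regenerate the GRUB configuration (sudo update-grub on Ubuntu), reboot, and save the output of dmesg to a file to attach here.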
Comment 2 Erik Sandlund 2018-07-09 08:07:55 UTC
Created attachment 140517 [details]
dmesg Ubuntu 18.04 server
Comment 3 Erik Sandlund 2018-07-09 08:08:41 UTC
Created attachment 140518 [details]
dmesg with error
Comment 4 Erik Sandlund 2018-07-09 08:09:25 UTC
Created attachment 140519 [details]
/sys/class/drm/card0/error Ubuntu
Comment 5 Erik Sandlund 2018-07-09 08:10:41 UTC
I've tried drm-tip from http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-tip/current/ and reproduced the error. Can install it again later tonight and provide logs.
Comment 6 Lionel Landwerlin 2018-07-09 08:45:20 UTC
The first error state reminds me of https://bugs.freedesktop.org/show_bug.cgi?id=106243.
This hang seems to have happened after the fix, so it looks like we might need a bigger hammer before disabling the indirect state pointers...

On the other hand the second error state seems to indicate that the hang happened in a batch buffer that isn't part of the error state (could be a batch from i915?).

Would you be able to try the attached patch for Mesa?

Thanks a lot!
Comment 7 Lionel Landwerlin 2018-07-09 08:45:46 UTC
Created attachment 140520 [details] [review]
i965: flush render target before ISP disable
Comment 8 Lionel Landwerlin 2018-07-09 10:22:41 UTC
If you could give your settings for the xorg intel driver and your version of Mesa, that would be really useful too.
Thanks!
Comment 9 Erik Sandlund 2018-07-10 00:02:25 UTC
Created attachment 140531 [details]
Mesa 18.1.3 error log
Comment 10 Erik Sandlund 2018-07-10 00:03:01 UTC
Created attachment 140532 [details]
Mesa 18.1.3 dmesg log
Comment 11 Erik Sandlund 2018-07-10 00:03:24 UTC
Created attachment 140533 [details]
Mesa 18.1.3 xorg log
Comment 12 Erik Sandlund 2018-07-10 00:06:11 UTC
Patch applied in logs above. I'm not used to compiling and applying patches though so I don't know if everything worked. Bug still occurs though.

No grub-settings except for debug-string. No xorg-settings except Driver Intel and TearFree on.

Mesa is now 18.1.3. Was Standard Bionic before, 18.0.0-rc5.
Comment 13 Lionel Landwerlin 2018-07-10 00:14:35 UTC
(In reply to Erik Sandlund from comment #12)
> Patch applied in logs above. I'm not used to compiling and applying patches
> though so I don't know if everything worked. Bug still occurs though.
> 

Looking at the traces, it seems the patch wasn't applied :(
Comment 14 Erik Sandlund 2018-07-10 21:42:14 UTC
What should I look for to see if the patch is in use? I've re-added the patch and recompiled, but I don't want to flood this bug report with my not-so-useful attachments.
Comment 15 Lionel Landwerlin 2018-07-10 23:52:22 UTC
Apologies, I must have downloaded the wrong attachment (or messed up locally).
Looks like you're now hitting a different issue.

I'm looking at where the GPU stopped to figure out what's wrong.

Here is how to do it:

First, compile the mesa repository with the intel tools enabled (I usually use meson):

$ cd mesa
$ meson -Dgles2=true -Ddri-drivers=i915,i965 -Dplatforms=x11,drm,wayland,surfaceless -Dgallium-drivers= --buildtype=release -Dvulkan-drivers=intel -Dtools=intel -Dbuild-tests=true build .
$ ninja -C build

Then you can run the aubinator_error_decode tool:

$ ./build/src/intel/tools/aubinator_error_decode /path/to/my/card0/error

Then search for "ACTHD:". In the last error state you posted, it should be this line:

  ACTHD: 0x00000000 001389f4

Then search for that address: 001389f4

0x00135b04:  0x78150009:  3DSTATE_CONSTANT_VS 

This is the instruction triggering the GPU hang.
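
The manual search can be sketched as a small shell helper. The function name acthd_low is made up for illustration, and it assumes the "ACTHD: 0x<high> <low>" layout shown in the line quoted earlier; if the error state has several ACTHD lines (one per engine), only the first is used.

```shell
#!/bin/sh
# Hypothetical helper: read an i915 error state on stdin and print the low
# dword of the first ACTHD line, e.g. "001389f4" for
#   ACTHD: 0x00000000 001389f4
acthd_low() {
    awk '$1 == "ACTHD:" { print $3; exit }'
}

# Example (paths are placeholders), left commented out:
#   addr=$(acthd_low < /path/to/my/card0/error)
#   ./build/src/intel/tools/aubinator_error_decode /path/to/my/card0/error > decoded.txt
#   grep -n -i "$addr" decoded.txt
```

If grep finds nothing, the address may fall in the middle of a multi-dword command, so look at the decoded commands just before it.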

In the previous error state, it was a PIPE_CONTROL (which was related to the other bug I mentioned).

So looks like the patch helps.
Is your machine hanging as often with this patch?
Comment 16 Erik Sandlund 2018-07-11 09:29:47 UTC
Yes, no difference really. Directly after compile it worked pretty good but after a few minutes it hung and started to hang more often after that. I see gfx corruption on the screen which sometimes looks the same even after a reboot.

I've compiled drm-tip with https://patchwork.freedesktop.org/patch/237548/ applied. No difference though, really. I also tried your patch on Mesa 18.2.0 with no real difference.

Should I supply more logs?
Comment 17 Erik Sandlund 2018-07-13 11:06:06 UTC
Created attachment 140623 [details]
Mesa 18.2 /sys/class/drm/card0/error

/sys/class/drm/card0/error

OpenGL core profile version string: 4.5 (Core Profile) Mesa 18.2.0-devel (git-0288fe8d04)

name:           i915
vermagic:       4.18.0-rc4+ SMP mod_unload
Comment 18 Lionel Landwerlin 2018-07-13 11:19:05 UTC
(In reply to Erik Sandlund from comment #16)
> Yes, no difference really. Directly after compile it worked pretty good but
> after a few minutes it hung and started to hang more often after that. I see
> gfx corruption on the screen which sometimes looks the same even after a
> reboot.
> 
> I've compiled drm-tip with https://patchwork.freedesktop.org/patch/237548/
> applied. No difference though really. I also tried your patch on mesa 18.2.0
> with no real difference.
> 
> Should I supply more logs?

Hi,

Thanks a lot for all the traces. I don't think we'll need more at this point.
I think we need to find what the right fix is here. Your last error state shows that the patch I've attached doesn't help.
Comment 19 Erik Sandlund 2018-07-14 23:19:29 UTC
This might be a hardware issue since Windows 10 also produces strange artifacts and hangs after Intel-driver install.
Comment 20 GitLab Migration User 2019-09-25 19:12:17 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1737.

