Bug 84818 - [gen4] GPU hang in Chrome
Summary: [gen4] GPU hang in Chrome
Status: RESOLVED WORKSFORME
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: unspecified
Hardware: Other Linux (All)
Importance: medium blocker
Assignee: Ian Romanick
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-09 02:09 UTC by Tim Landscheidt
Modified: 2015-10-19 08:02 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
Excerpt from /var/log/messages. (49.01 KB, text/plain)
2014-10-09 02:09 UTC, Tim Landscheidt
/sys/class/drm/card0/error after the error occurs (with drm.debug=0xe). (846.28 KB, text/plain)
2014-10-09 19:12 UTC, Tim Landscheidt
dmesg after the error occurs (with drm.debug=0xe). (186.39 KB, text/plain)
2014-10-09 19:16 UTC, Tim Landscheidt

Description Tim Landscheidt 2014-10-09 02:09:59 UTC
Created attachment 107584
Excerpt from /var/log/messages.

On a Fedora 19 box, apparently since the update to google-chrome-stable-38.0.2125.101-1.i386 (this is a guess), opening YouTube videos causes the screen to go blank.  This also blanks all virtual terminals ([Ctrl-Alt-F2] & Co.), so I have to log in blindly and type "shutdown -r now" to reboot the machine.

When this occurs, /var/log/messages starts with:

| Oct  9 01:13:50 passepartout kernel: [  431.004031] [drm] stuck on render ring
| Oct  9 01:13:50 passepartout kernel: [  431.004039] [drm] GPU crash dump saved to /sys/class/drm/card0/error
| Oct  9 01:13:50 passepartout kernel: [  431.004041] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
| Oct  9 01:13:50 passepartout kernel: [  431.004043] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
| Oct  9 01:13:50 passepartout kernel: [  431.004045] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
| Oct  9 01:13:50 passepartout kernel: [  431.004046] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
| Oct  9 01:13:51 passepartout kernel: [  431.508040] [drm:i915_reset] *ERROR* Failed to reset chip: -110
| Oct  9 01:13:55 passepartout kernel: [  435.733338] Watchdog[2408]: segfault at 0 ip b6387197 sp afd6fdd0 error 6 in chrome[b257e000+51ab000]
| Oct  9 01:14:00 passepartout kernel: [  441.006052] [drm:i915_gem_wait_for_error] *ERROR* Timed out waiting for the gpu reset to complete
| Oct  9 01:14:01 passepartout kernel: [  441.326032] [drm:i915_gem_wait_for_error] *ERROR* Timed out waiting for the gpu reset to complete
| Oct  9 01:14:01 passepartout kernel: [  441.377037] [drm] GMBUS [i915 gmbus vga] timed out, falling back to bit banging on pin 2
| [...]

(For the complete log cf. attachment.)

*After* the reboot, /sys/class/drm/card0/error reads "no error state collected".  If you need this file, please say so; I will then have to set up a script to copy it somewhere, because all the screens are blank and I cannot do that interactively.
Comment 1 Paulo Zanoni 2014-10-09 11:24:20 UTC
Unfortunately, we really need the error state file. It is created after the GPU hang happens, and it is lost when you reboot the machine.

If you have another machine or a smartphone, you could use SSH/SCP to grab the error state file after the hang happens. Other operating systems have SSH/SCP clients too, so it shouldn't be that hard.

Also, it would be nice to have the complete log file. Please boot your kernel with the "drm.debug=0xe" parameter (you can pass it from GRUB), and then, after the error happens, run "dmesg > dmesg.txt" and attach the file here (in addition to the error state file).
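
For a machine whose screens go blank, the capture step described above could be automated. The following is only a minimal Python sketch, not something used in this report: the function names, the destination directory /var/tmp/gpu-hang, and the polling interval are all assumptions, and reading the sysfs file will usually require root. It relies on the file reading "no error state collected" while no hang has been recorded, as stated in the description.

```python
import time
from pathlib import Path

# Text the reporter saw in the file when no hang had been recorded.
NO_ERROR_MARKER = "no error state collected"


def capture_error_state(error_path, dest_dir):
    """Save the error state into dest_dir if one exists.

    Returns the path of the saved copy, or None when the file is empty
    or still contains the "no error state collected" placeholder.
    """
    text = Path(error_path).read_text(errors="replace")
    stripped = text.strip()
    if not stripped or stripped.lower().startswith(NO_ERROR_MARKER):
        return None
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Timestamped name (hypothetical naming scheme) so repeated hangs do not collide.
    target = dest / time.strftime("i915-error-%Y%m%d-%H%M%S.txt")
    target.write_text(text)
    return target


def watch(error_path="/sys/class/drm/card0/error",
          dest_dir="/var/tmp/gpu-hang", interval=5.0):
    """Poll until an error state appears, save it once, and return its path."""
    while True:
        saved = capture_error_state(error_path, dest_dir)
        if saved is not None:
            return saved
        time.sleep(interval)
```

Calling watch() from a root shell before starting the video would leave a copy on disk that survives the reboot; the dmesg output would still have to be saved separately with "dmesg > dmesg.txt".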
Comment 2 Tim Landscheidt 2014-10-09 19:12:04 UTC
Created attachment 107628
/sys/class/drm/card0/error after the error occurs (with drm.debug=0xe).
Comment 3 Tim Landscheidt 2014-10-09 19:16:13 UTC
Created attachment 107629
dmesg after the error occurs (with drm.debug=0xe).
Comment 4 Paulo Zanoni 2014-10-09 19:31:19 UTC
Thank you for the files!

I forgot to ask earlier: is there any way you could test a newer kernel? Please test the most recent kernel you can and check whether the problem still happens.

You could try grabbing one from the Fedora development version or from Linus's tree, or you could download http://cgit.freedesktop.org/drm-intel and compile the "drm-intel-nightly" branch.
Comment 5 Tim Landscheidt 2014-10-10 02:37:23 UTC
I tried kernel-3.14.20-100.fc19.i686 from Fedora's updates-testing and after about an hour of crash-free YouTube I was about to state my cautious optimism here, when the error occurred again.  Unfortunately, I hadn't set up the catch-the-files script then, so I have no data apart from /var/log/messages, which again starts with:

| Oct 10 01:59:23 passepartout kernel: [ 5488.004035] [drm] stuck on render ring
| Oct 10 01:59:23 passepartout kernel: [ 5488.004043] [drm] GPU crash dump saved to /sys/class/drm/card0/error
| Oct 10 01:59:23 passepartout kernel: [ 5488.004045] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
| Oct 10 01:59:23 passepartout kernel: [ 5488.004047] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
| Oct 10 01:59:23 passepartout kernel: [ 5488.004049] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
| Oct 10 01:59:23 passepartout kernel: [ 5488.004050] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
| Oct 10 01:59:24 passepartout kernel: [ 5488.508033] [drm:i915_reset] *ERROR* Failed to reset chip: -110
| [...]

Testing non-packaged kernels will take me a few days because I won't do that on my (somewhat :-)) working system.

(If there is a stress test for the graphics system that is more reproducible than watching YouTube, pointers are appreciated.  Spending literally an hour without knowing if the different kernel has fixed the issue is very unproductive as the looming blank screen prevents doing anything non-trivial :-(.)
Comment 6 Paulo Zanoni 2014-10-10 14:07:49 UTC
(In reply to Tim Landscheidt from comment #5)
> I tried kernel-3.14.20-100.fc19.i686 from Fedora's updates-testing and after
...
> Testing non-packaged kernels will take me a few days because I won't do that
> on my (somewhat :-)) working system.
> 

I understand your reasons, but 3.14 is considered quite "ancient" by the people who develop code: you're like 1000 commits behind us.

You could try the kernels that are already packaged for Fedora Rawhide. I have never tried this, so I can't guarantee that it won't kill your cat, but it could be worth a try:

http://fedora.c3sl.ufpr.br/linux/development/rawhide/i386/os/Packages/k/kernel-3.18.0-0.rc0.git1.1.fc22.i686.rpm (the package)
http://fedora.c3sl.ufpr.br/linux/development/rawhide/i386/os/Packages/k/ (the folder containing the package, in case the above link becomes obsolete)
https://mirrors.fedoraproject.org/publiclist/Fedora/development/i386/ (the whole list of mirrors, in case both links above become obsolete)

> (If there is a stress test for the graphics system that is more reproducible
> than watching YouTube, pointers are appreciated.  Spending literally an hour
> without knowing if the different kernel has fixed the issue is very
> unproductive as the looming blank screen prevents doing anything non-trivial
> :-(.)

We have intel-gpu-tools (http://cgit.freedesktop.org/xorg/app/intel-gpu-tools), which contains many tests, but I don't know whether any of them will reproduce the problem you are facing. You could also start a YouTube video, leave the computer alone, and periodically check for a possible hang.
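
The periodic check could be scripted rather than done by eye. Below is a minimal Python sketch under stated assumptions: the two patterns are taken only from the log excerpts in this report (other kernels or other rings may word the messages differently), and the function names and the default log path are illustrative.

```python
import re

# Patterns copied from the /var/log/messages excerpts in this report;
# they are an assumption about what a hang looks like on other systems.
HANG_PATTERNS = (
    re.compile(r"\[drm\] stuck on render ring"),
    re.compile(r"\[drm:i915_reset\] \*ERROR\* Failed to reset chip"),
)


def find_gpu_hangs(lines):
    """Return the log lines that match one of the hang patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in HANG_PATTERNS)
    ]


def log_has_hang(log_path="/var/log/messages"):
    """Scan a log file and report whether any hang message is present."""
    with open(log_path, errors="replace") as handle:
        return bool(find_gpu_hangs(handle))
```

Running log_has_hang() from cron or over SSH every few minutes would show whether the machine has hung without anyone having to watch the screen.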
Comment 7 Rodrigo Vivi 2014-10-15 20:02:20 UTC
Please retest with a more recent kernel, as Paulo requested. Please collect and attach a new error state.
Preferably, use the latest drm-intel-nightly branch from cgit.freedesktop.org/drm-intel.
Comment 8 Janus Troelsen 2014-12-25 18:06:11 UTC
Could this be a duplicate of https://bugs.freedesktop.org/show_bug.cgi?id=80568 ?
Comment 9 Tim Landscheidt 2015-10-18 17:55:25 UTC
I was unable to test a new kernel, but in the meantime I have moved to a new machine with a different graphics system, so I will not be able to reproduce this bug in the future.  Looking at https://bugs.freedesktop.org/page.cgi?id=fields.html#bug_status, I am unsure which status would be appropriate for this report, so I'll leave that to someone else.  Thanks!
Comment 10 Jani Nikula 2015-10-19 08:02:53 UTC
(In reply to Tim Landscheidt from comment #9)
> I was unable to test a new kernel, but in the mean time I have moved to a
> new machine with a different graphic system, so I will not be able to
> reproduce this bug in the future.  Looking at
> https://bugs.freedesktop.org/page.cgi?id=fields.html#bug_status, I am unsure
> which status would be appropriate for this report, so I'll leave that to
> someone else.  Thanks!

Thanks for the follow-up, and sorry that we weren't able to get to the bottom of this. Since the bug has been without updates for so long, I'm closing it (using an arbitrarily chosen WORKSFORME status).

