Bug 76928 - [BDW] GPU HANG: ecode 0:0x4cb94c99 in kernel log
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2014-04-02 06:12 UTC by Noor Manseel Mohammed
Modified: 2017-07-24 22:55 UTC
CC List: 2 users



Attachments
dmesg log (229.12 KB, text/plain)
2014-04-02 06:12 UTC, Noor Manseel Mohammed
xorg.0.log (21.76 KB, text/plain)
2014-04-02 06:14 UTC, Noor Manseel Mohammed
cat /sys/class/drm/card0/error > /tmp/error (2.81 MB, application/x-bzip2)
2014-04-03 05:47 UTC, Noor Manseel Mohammed
dmesg for ecode 0:0x43c6e6cf (81.50 KB, text/plain)
2014-04-24 12:29 UTC, Noor Manseel Mohammed
cat /sys/class/drm/card0/error > /tmp/error2 (243.86 KB, application/x-bzip2)
2014-05-23 06:35 UTC, Noor Manseel Mohammed
dmesg2 (82.47 KB, text/plain)
2014-05-23 06:35 UTC, Noor Manseel Mohammed
dmesg3.log is for 3.15-rc7 (222ccbc) drm-intel-nightly (78.91 KB, text/plain)
2014-06-11 07:05 UTC, Noor Manseel Mohammed
error dump (267.57 KB, text/plain)
2014-06-11 07:07 UTC, Noor Manseel Mohammed
dmesg (78.81 KB, text/plain)
2014-07-25 10:36 UTC, Noor Manseel Mohammed
cat /sys/class/drm/card0/error > error4.log (260.41 KB, text/plain)
2014-07-25 10:37 UTC, Noor Manseel Mohammed

Description Noor Manseel Mohammed 2014-04-02 06:12:58 UTC
Created attachment 96758 [details]
dmesg log

Getting a GPU hang error in dmesg on a Broadwell Y SKU.

"GPU HANG: ecode 0:0x4cb94c99, reason: Ring hung, action: reset"

But the system still continues to operate.

The crash dump file "/sys/class/drm/card0" is empty.

OS: Ubuntu 13.10 (64 bit)
Linux Kernel: 3.14.0-rc7
Processor: BDW Y 
RAM: 4 GB
Comment 1 Noor Manseel Mohammed 2014-04-02 06:14:17 UTC
Created attachment 96759 [details]
xorg.0.log
Comment 2 Chris Wilson 2014-04-02 06:38:00 UTC
Use "cat /sys/class/drm/card0/error > /tmp/error"
Comment 3 Noor Manseel Mohammed 2014-04-03 05:47:49 UTC
Created attachment 96823 [details]
cat /sys/class/drm/card0/error > /tmp/error
Comment 4 Gordon Jin 2014-04-13 03:18:24 UTC
Noor, please clear "NEEDINFO" status when you provide the answer.
Comment 5 Noor Manseel Mohammed 2014-04-24 12:27:50 UTC
Upgraded the kernel to 3.15.0-rc2 drm-intel-nightly.

Now getting a different error code with GPU hang:

[   80.502986] [drm:gen8_irq_handler] *ERROR* Pipe A FIFO underrun
[   97.778228] [drm] stuck on render ring
[   97.780656] [drm] GPU HANG: ecode 0:0x43c6e6cf, reason: Ring hung, action: reset
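
To pull just the hang signatures out of a long dmesg capture, a small filter along these lines may help. It is only a sketch: the function name `extract_ecodes` is made up for illustration, and the pattern assumes the `GPU HANG: ecode <n>:0x<hex>` format seen in the log lines quoted in this report.

```shell
# Hypothetical helper (not part of any i915 tooling): print each distinct
# "ecode" value found in dmesg-style text read from stdin.
extract_ecodes() {
    # grep -o prints only the matching part of each line;
    # sed then strips the fixed "GPU HANG: ecode " prefix.
    grep -o 'GPU HANG: ecode [0-9]*:0x[0-9a-f]*' \
        | sed 's/^GPU HANG: ecode //' \
        | sort -u
}

# Typical use: dmesg | extract_ecodes
```

Comparing the resulting list between kernel versions makes it easier to see whether a new build produces the same hang signature or a different one.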
Comment 6 Noor Manseel Mohammed 2014-04-24 12:29:16 UTC
Created attachment 97899 [details]
dmesg for ecode 0:0x43c6e6cf
Comment 7 Noor Manseel Mohammed 2014-04-24 12:34:05 UTC
cat /sys/class/drm/card0/error > /tmp/error for this error is too big to attach here (above 3 MB). Kindly let me know if I need to share it in any other way.
Comment 8 Daniel Vetter 2014-05-19 13:57:04 UTC
(In reply to comment #7)
> cat /sys/class/drm/card0/error > /tmp/error for this error is too big to
> attach here (above 3 MB). Kindly let me know if I need to share it in any
> other way.

You might need to compress it with gzip or something like that.
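
One way to capture and compress the error state in a single step is sketched below. The function name `save_error_state` and its two arguments are assumptions made for illustration; the sysfs path and the "no error state collected" message are the ones quoted elsewhere in this report.

```shell
# Hypothetical helper: copy the i915 error state to a gzip-compressed file.
# $1 = output file, $2 = source (defaults to the path used in this report).
save_error_state() {
    out=$1
    src=${2:-/sys/class/drm/card0/error}

    # With no hang recorded, the file only says "no error state collected".
    if head -n 1 "$src" | grep -qi 'no error state collected'; then
        echo "no error state collected in $src" >&2
        return 1
    fi

    # Stream through gzip so no uncompressed copy is written to disk.
    gzip -c < "$src" > "$out" || return 1

    # Show the resulting size, to check it against the attachment limit.
    ls -l "$out"
}

# Typical use (as root): save_error_state /tmp/error.gz
```

Checking the first line before compressing avoids attaching a file that contains nothing but the "no error state collected" placeholder.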
Comment 9 Noor Manseel Mohammed 2014-05-22 10:21:19 UTC
We upgraded the kernel to the latest drm-intel-nightly:

commit f79ba79cf037eea9ee757ad37730b00f43d5ef80
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri May 16 21:54:54 2014 +0200

    drm-intel-nightly: 2014y-05m-16d-21h-54m-36s integration manifest

We no longer see the GPU hang error in dmesg.

'cat /sys/class/drm/card0/error > /tmp/error' shows "no error state collected" in the output file.
Comment 10 Noor Manseel Mohammed 2014-05-23 06:32:18 UTC
Looks like the issue is still there on a long run.

I am attaching the dmesg (dmesg2.txt) as well as the 'cat /sys/class/drm/card0/error > /tmp/error2' output (error2.bz2).
Comment 11 Noor Manseel Mohammed 2014-05-23 06:35:16 UTC
Created attachment 99615 [details]
cat /sys/class/drm/card0/error > /tmp/error2
Comment 12 Noor Manseel Mohammed 2014-05-23 06:35:57 UTC
Created attachment 99616 [details]
dmesg2
Comment 13 Ben Widawsky 2014-05-30 23:09:32 UTC
From the dmesg2, it looks like the system is hanging really quickly. Please try to update your kernel to the rc7 based one (it has the null context), update your BIOS, and update your silicon to the latest.

Then report back if the bug still persists with the error state. Thanks.
Comment 14 Noor Manseel Mohammed 2014-06-11 07:05:44 UTC
Created attachment 100863 [details]
dmesg3.log is for 3.15-rc7 (222ccbc) drm-intel-nightly
Comment 15 Noor Manseel Mohammed 2014-06-11 07:07:34 UTC
Created attachment 100864 [details]
error dump

cat /sys/class/drm/card0/error > error3.log
Comment 16 Noor Manseel Mohammed 2014-06-11 07:09:32 UTC
I am able to reproduce the GPU HANG on 3.15-rc7 (222ccbc - drm-intel-nightly) as well. 
I have attached dmesg3.log and error3.log.bz2

Looks like this crash happened while I played a video in VLC player.
Comment 17 Noor Manseel Mohammed 2014-07-25 10:35:15 UTC
I am able to reproduce this issue on the mainline kernel (3.16.0-rc6-mainline, master-82e13c7) while opening VLC player. I am attaching the logs dmesg4.log and error4.log.bz2.
Comment 18 Noor Manseel Mohammed 2014-07-25 10:36:22 UTC
Created attachment 103435 [details]
dmesg

dmesg4
Comment 19 Noor Manseel Mohammed 2014-07-25 10:37:42 UTC
Created attachment 103436 [details]
cat /sys/class/drm/card0/error > error4.log
Comment 20 Gavin Hindman 2014-08-19 15:19:06 UTC
Ben, are you still looking at this issue? Has anyone in OTC been able to reproduce this?
Comment 21 Gavin Hindman 2014-08-19 15:20:23 UTC
Noor, please also update the bug with steps to reproduce the issue.
Comment 22 Glenn Williamson 2014-08-19 16:00:55 UTC
Per your request .. 

Steps:
---------
1. Execute commands:
sudo -s
echo disk > /sys/power/state
2. Wait 60 seconds
3. Resume the DUT using keyboard
4. Wait a moment
Actual result:
-----------------
4. I tried several times, with several different results:
   - DUT resumes but is frozen with the screen on (Terminal is restored); mouse and keyboard are not responding
   - DUT reboots (image is not restored)
   - Boot/Resume is stopped with some logs on the screen
Expected result:
-------------------
4. DUT successfully suspends to disk and resumes
Comment 23 Noor Manseel Mohammed 2014-08-21 11:24:57 UTC
Gavin,

The hang seems to happen quite randomly, especially when I am running graphics operations such as playing videos in VLC player.

With regard to Glenn's comment, I can confirm that the GPU hang happens when resuming from S4. However, I also get the GPU hang without any S4 cycle.
Comment 24 Rodrigo Vivi 2014-09-24 20:00:37 UTC
Noor, sorry, it's been a while with no update on this. Could you please check the state of the latest -nightly?

Let's set the suspend-resume case aside for now, or please report it as a separate bug. I'm afraid the hang Glenn was getting was related to PSR suspend-resume.
Comment 25 Vijayakannan Ayyathurai 2014-11-05 06:51:12 UTC
I did not hit the GPU hang with the configuration below in more than 20 hours of playing a high-resolution video with mplayer.

But the issue is reproducible (10/10) when playing the video with VLC player.
VLC player - VLC media player 2.1.4 Rincewind (revision 2.1.4-0-g2a072be)

"""
 [148638.947640] drm/i915: Resetting chip after gpu hang
 [148644.978512] [drm] GPU HANG: ecode 0:0x85dffffb, in Xorg [1066], reason: Ring hung, action: reset
 [148644.979107] [drm:i915_context_is_banned [i915]] *ERROR* gpu hanging too fast, banning!
 [148644.985697] drm/i915: Resetting chip after gpu hang
"""

The VLC player used is up to date, so the issue appears to be specific to VLC player. Do let me know if more information is needed.

Configuration used:

Board  : BDW Y LPDDR3 (PV) CRB FAB2 WITH MEMORY H11021-201 
CPU    : BROADWELL 2+2 ULX E3 MOBILE QDF: QNJ5 

Libdrm                   :               (master)libdrm-2.4.58-4-g00847fa48b83a85b0cb882594a12ed1511f780db
Mesa                     :               (master) 600066af93fe60abbfff5be82527b529e1e44916
Xserver                  :               (master) e9db7682028bb0464c211c1f7bb6983fcfb6f37b
Xf86-video-intel         :               61436c2fabe117b85404eecb06158ba0a63a7741 
Cairo                    :               (master) b4e218c3e8402e149115a59406796b751118237f
Libva                    :               (master) ccd93de5a707e92a629cccd595757c8d436fa3cc
Libva_intel_driver       :               (master) 24cba20a119c96556ae4dc9a90043896ea70e567
Kernel                   :               (drm-intel-nightly) 32cefad9992e67b4ee7487adf465bd7e189c9c1c

Thanks,
Vijay
Comment 26 Rodrigo Vivi 2015-01-21 23:08:27 UTC
Thanks for the test. Let's consider it fixed.

Feel free to reopen in case you still face the issue with the latest -nightly.

