Bug 93967 - [IVB] Hangup and Graphics Reset
Summary: [IVB] Hangup and Graphics Reset
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-02-02 16:52 UTC by Jonathan Schmidt
Modified: 2017-08-31 19:05 UTC
CC List: 2 users

See Also:
i915 platform: IVB
i915 features: GPU hang


Attachments
compressed version of [drm] GPU crash dump saved to /sys/class/drm/card0/error (317.15 KB, text/plain)
2016-02-02 16:52 UTC, Jonathan Schmidt
screenshot of Plasma/Kwin (14.42 KB, image/png)
2016-02-02 16:53 UTC, Jonathan Schmidt
output of xrandr --verbose (10.22 KB, text/plain)
2016-02-02 16:54 UTC, Jonathan Schmidt
121461: compressed version of [drm] GPU crash dump saved to /sys/class/drm/card0/error (243.21 KB, application/octet-stream)
2016-02-02 19:41 UTC, Jonathan Schmidt

Description Jonathan Schmidt 2016-02-02 16:52:11 UTC
Created attachment 121461
compressed version of [drm] GPU crash dump saved to /sys/class/drm/card0/error

Several actions, such as playing Steam games in a window for a while or doing almost anything in a Chromium window, can trigger this. It happens often, up to several times a day. My system used to hit a kernel panic, which I am fairly sure has now been "replaced" by this error, in which the system only freezes for a short time. The freeze seems to be interrupted by ALT+TAB. A KWin Window Manager notification then appears, telling me "Desktop effects were restarted due to a graphics reset". Inspecting that notification for further detail shows "Compositing has been suspended | Another application has requested to suspend compositing" and "Graphics reset | A graphics reset event has occurred".

The applications themselves usually do not crash. Sometimes the system is able to continue; other times windows cannot be raised, lowered, or minimized, which blocks any further user interaction other than rebooting.

I know of no easy way to reproduce the problem. dmesg reported "[  137.796122] [drm] GPU crash dump saved to /sys/class/drm/card0/error", so I am attaching that file.

My architecture is x86_64. My kernel is 4.2.0-27-generic. My Linux distribution is Kubuntu. My motherboard is an ASRock Z77 Pro3. Display connector is HDMI2 connected primary 1920x1200+0+0 (normal left inverted right x axis y axis) 160mm x 90mm.

Please note that I tried to gather a lot of other information according to https://01.org/linuxgraphics/documentation/how-report-bugs, but the only other information I am able to provide is the output of xrandr --verbose. Trying to get the Intel register dump, the VBIOS dump, or the GPU error state has resulted in "Permission denied" messages. I am not comfortable with editing the kernel command line myself yet.
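
For anyone collecting the same data: the error state file named in the dmesg line above is normally readable only by root, which would explain the "Permission denied" messages. The following is a minimal sketch, not an official tool, of copying that file into a gzip archive small enough to attach. The function name save_error_state and the output file name are hypothetical; only the sysfs path comes from the dmesg message.

```python
import gzip
import shutil
from pathlib import Path


def save_error_state(src="/sys/class/drm/card0/error",
                     dst="gpu-error-state.txt.gz"):
    """Copy the i915 error state to a gzip file and return the output path.

    The default source path is the one printed by dmesg in this report.
    Reading it usually requires root, so expect PermissionError otherwise.
    """
    src_path = Path(src)
    dst_path = Path(dst)
    # Stream the copy: sysfs files do not report a reliable size up front.
    with src_path.open("rb") as fin, gzip.open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst_path
```

Run from a root shell (or a script started with sudo), save_error_state() with no arguments should leave gpu-error-state.txt.gz in the current directory.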
Comment 1 Jonathan Schmidt 2016-02-02 16:53:18 UTC
Created attachment 121462
screenshot of Plasma/Kwin
Comment 2 Jonathan Schmidt 2016-02-02 16:54:00 UTC
Created attachment 121463
output of xrandr --verbose
Comment 3 Jonathan Schmidt 2016-02-02 17:03:01 UTC
I should add that my current Kubuntu version is 15.10. Kernel crashes used to happen when I was still on 14.
Comment 4 Jonathan Schmidt 2016-02-02 19:41:37 UTC
Created attachment 121465
121461: compressed version of [drm] GPU crash dump saved to /sys/class/drm/card0/error
Comment 5 yann 2016-09-13 13:43:14 UTC
The hang is happening in a blt ring batch, with the active head at 0x011f7004:

batchbuffer:
batch buffer (blitter ring (submitted by Xorg [1066])) at 0x00000000_011f7000
0x011f7000:      0x54f08806: XY_SRC_COPY_BLT (rgb enabled, alpha enabled, src tile 1, dst tile 1)
0x011f7004:      0x03cc0780:    format 8888, pitch 1920, rop 0xcc, clipping disabled,
0x011f7008:      0x035a013d:    dst (317,858)
0x011f700c:      0x048c030f:    dst (783,1164)
0x011f7010:      0x01ad9000:    dst offset 0x01ad9000
0x011f7014:      0x035a013d:    src (317,858)
0x011f7018:      0x00000780:    src pitch 1920
0x011f701c:      0x0d0df000:    src offset 0x0d0df000
0x011f7020:      0x05000000: MI_BATCH_BUFFER_END
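
As a cross-check of the decode above, here is a small sketch that unpacks the same eight dwords by hand. The field positions (opcode in bits 28:22 of the first dword, length in bits 7:0, rop in bits 23:16 and pitch in bits 15:0 of the second, coordinates packed as y in the high half and x in the low half) are assumptions inferred from the decoder output in this comment, and coordinates are treated as unsigned. The function names are hypothetical.

```python
# Dwords copied from the batch buffer dump (0x011f7000 .. 0x011f701c).
BATCH = [
    0x54F08806, 0x03CC0780, 0x035A013D, 0x048C030F,
    0x01AD9000, 0x035A013D, 0x00000780, 0x0D0DF000,
]


def unpack_xy(dword):
    """Split a packed coordinate dword into (x, y): x low half, y high half."""
    return dword & 0xFFFF, dword >> 16


def decode_xy_src_copy_blt(dwords):
    """Decode the 8 dwords of an XY_SRC_COPY_BLT into a dict of fields."""
    dw0, dw1, dst_tl, dst_br, dst_off, src_tl, src_pitch, src_off = dwords
    x1, y1 = unpack_xy(dst_tl)
    x2, y2 = unpack_xy(dst_br)
    return {
        "opcode": (dw0 >> 22) & 0x7F,
        "total_dwords": (dw0 & 0xFF) + 2,  # length field excludes 2 dwords
        "rop": (dw1 >> 16) & 0xFF,
        "dst_pitch": dw1 & 0xFFFF,
        "dst_rect": (x1, y1, x2, y2),
        "width": x2 - x1,
        "height": y2 - y1,
        "src_origin": unpack_xy(src_tl),
        "src_pitch": src_pitch & 0xFFFF,
        "dst_offset": dst_off,
        "src_offset": src_off,
    }


decoded = decode_xy_src_copy_blt(BATCH)
for name, value in decoded.items():
    print(name, value)
```

With these assumptions the command describes a 466x306 copy whose source and destination origins are both (317,858), between two different surfaces.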

Related fences (dst & src):
fence[1] = 23a203b01ad9001
    valid, x-tiled, pitch: 7680, start: 0x01ad9000, size: 9216000

fence[5] = d9a803b0d0df001
    valid, x-tiled, pitch: 7680, start: 0x0d0df000, size: 9216000
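
The fence values can be checked the same way. This is a sketch under an assumed register layout: low dword holds the start page plus a valid bit (bit 0) and a Y-tiling bit (bit 1); high dword holds the last page and, in its low 12 bits, the pitch in 128-byte units minus one. The layout is an assumption chosen because it reproduces the decoded lines above; decode_fence is a hypothetical helper.

```python
def decode_fence(value):
    """Decode a 64-bit fence register value under the layout assumed above."""
    lo = value & 0xFFFFFFFF
    hi = value >> 32
    start = lo & 0xFFFFF000       # first page covered by the fence
    last_page = hi & 0xFFFFF000   # last page covered by the fence
    return {
        "valid": bool(lo & 1),
        "tiling": "Y" if lo & 2 else "X",
        "start": start,
        "pitch": ((hi & 0xFFF) + 1) * 128,  # bytes per row
        "size": last_page + 4096 - start,   # bytes, inclusive of last page
    }


fence1 = decode_fence(0x023A203B01AD9001)
fence5 = decode_fence(0x0D9A803B0D0DF001)
for fence in (fence1, fence5):
    rows = fence["size"] // fence["pitch"]
    print(hex(fence["start"]), fence["tiling"], fence["pitch"],
          fence["size"], rows)
```

Both fences come out as valid, X-tiled, 7680 bytes per row and 1200 rows, so the blit rows 858 to 1164 lie inside each fenced surface. The 7680-byte pitch also agrees with the blit pitch of 1920 if that field is counted in dwords for tiled surfaces, which is my reading rather than something stated in this report.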

Chris, any advice on this hang?
Comment 6 Chris Wilson 2017-03-28 13:03:57 UTC
The problem here is a TLB miss (or something like that) as the command the GPU executed does not match the command we submitted. The error state lacks the evidence for the usual seqno reordering errors, but it does hint towards VT-d. Unfortunately, it is too old to include that information explicitly.
Comment 7 Elizabeth 2017-07-27 20:50:25 UTC
Hello Jonathan, 
Sorry for the very long delay. As mentioned in comment #6, the information available for this bug is quite old, and the last kernel reported is 4.2. Is this bug still valid with the newest kernel versions (https://www.kernel.org/)? Is it still reproducible?
Thank you.
Comment 8 Elizabeth 2017-08-31 19:05:00 UTC
Closing, as no new occurrences of this problem have been reported since February 2016. If the problem arises with the latest kernel versions, please file a new bug with HW and SW information and relevant logs. Thank you.

