Bug 88641

Summary: [snb 3.17] ringbuffer wrap overshoot? batchbuffer incoherence?
Product: DRI
Reporter: Arjan Bruin <arjanbruin86>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform: SNB
i915 features: GPU hang
Attachments:
Description                Flags
GPU crash dump             none
GPU crash dump 2           none
Two more gpu crash dumps   none

Description Arjan Bruin 2015-01-20 22:07:55 UTC
Created attachment 112564 [details]
GPU crash dump

Errors occur consistently in firefox/chromium and are sometimes recoverable by closing dwm (as with the attached error). Usually black blocks are visible beforehand.
Comment 1 Chris Wilson 2015-01-20 22:21:14 UTC
It looks very much like the ringbuffer was overwritten with new commands whilst the GPU was still executing them, and the CS parser went off into never-never land.
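
To illustrate the failure mode described in this comment, here is a minimal sketch of the space accounting a producer has to do before writing new commands into a ring buffer. It is a generic model and not the i915 driver code; the names `ring_space` and `can_emit` and the sizes used with them are hypothetical, chosen only for illustration.

```python
def ring_space(head: int, tail: int, size: int) -> int:
    """Bytes the producer may write without catching up to the consumer.

    head: offset the consumer (here, the GPU) will read next.
    tail: offset the producer (here, the driver) will write next.
    One byte is always left unused so that head == tail means "empty".
    """
    return (head - tail - 1) % size


def can_emit(head: int, tail: int, size: int, nbytes: int) -> bool:
    """True if nbytes fit at tail without overwriting commands not yet read."""
    return nbytes <= ring_space(head, tail, size)
```

With a 4096-byte ring, `head=100` and `tail=4000`, only 195 bytes are free across the wrap point. Writing more than that without waiting for the consumer would overwrite commands it has not yet executed, which is the kind of overshoot suspected here.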
Comment 2 Chris Wilson 2015-01-20 22:22:13 UTC
Could you please grab a few more error states from firefox/chromium to see if they are all of this type?
Comment 3 Arjan Bruin 2015-01-20 23:03:08 UTC
Created attachment 112568 [details]
GPU crash dump 2
Comment 4 Chris Wilson 2015-01-21 09:39:05 UTC
(In reply to Arjan Bruin from comment #3)
> Created attachment 112568 [details]
> GPU crash dump 2

This looks different. Here there isn't the evidence of confusion in the ringbuffer, but that the CS kept reading past the batchbuffer - indicative that the batchbuffer is incoherent.

Did this issue start occurring recently? Perhaps a change in 3.17.y? If so, is there any chance you could bisect it?
Comment 5 Arjan Bruin 2015-01-21 19:25:45 UTC
I haven't had this install for very long. Before Gentoo the machine ran Arch Linux and didn't have the same problem, but so much can differ between the two.

By opening 15 firefox youtube windows using a script I can reproduce quite reliably. Is this too artificial to base the bisect on? I'll attach two error dumps I got this way.
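
The reporter's script is not attached, so the following is only a sketch of what such a reproducer might look like. The URL is a placeholder (the report does not say which pages were opened), and `build_commands` and `launch` are hypothetical helper names. It relies on Firefox's `--new-window` command-line option.

```python
import subprocess
from typing import List

# Placeholder: the report does not say which pages were opened.
VIDEO_URL = "https://www.youtube.com/"


def build_commands(count: int = 15, url: str = VIDEO_URL) -> List[List[str]]:
    """Build one 'firefox --new-window URL' command line per window."""
    return [["firefox", "--new-window", url] for _ in range(count)]


def launch(commands: List[List[str]]) -> List[subprocess.Popen]:
    """Start every command without waiting for any of them to exit."""
    return [subprocess.Popen(cmd) for cmd in commands]
```

Calling `launch(build_commands())` would open the 15 windows. Keeping command construction separate from launching makes it easy to change the window count or the URL when tuning the reproduction.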
Comment 6 Arjan Bruin 2015-01-21 19:28:22 UTC
Created attachment 112616 [details]
Two more gpu crash dumps
Comment 7 Chris Wilson 2015-01-21 21:52:54 UTC
Both of those errors show a mismatch between the values in the batch and the values read by the GPU (what I call incoherence). Looks like you have a reasonable method of reproducing.
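
As a rough illustration of the check described in this comment, the sketch below compares two views of the same batch word by word: the contents as written to memory and the contents as seen by the reader. It does not parse the actual error state format; `find_incoherent_words` is a hypothetical helper and the inputs are plain lists of integers.

```python
def find_incoherent_words(batch_words, fetched_words):
    """Return (index, expected, seen) for each word where the two views differ."""
    return [
        (i, expected, seen)
        for i, (expected, seen) in enumerate(zip(batch_words, fetched_words))
        if expected != seen
    ]
```

An empty result means the two views agree over the compared range. Any entry marks a word where the reader saw something other than what was written.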

I think this is most likely a kernel issue, and I would start investigating with different kernel versions.
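
Testing different kernel versions can be organised as a binary search, which is what `git bisect` automates at the commit level. The sketch below shows the idea on an ordered list of versions. `first_bad` and `is_bad` are hypothetical names, and the search assumes the first version is known good, the last is known bad, and every version after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad version.

    Assumes versions[0] is good, versions[-1] is bad, and badness is
    monotonic (no version after the first bad one is good again).
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]
```

In practice `is_bad` would mean booting the candidate kernel and running the reproduction for long enough to trust a "no hang" result, so each step is slow. The binary search keeps the number of such steps logarithmic in the number of candidates.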
Comment 8 Arjan Bruin 2015-01-25 12:13:37 UTC
linux-stable 3.18.0 (b2776bf) hasn't shown any problems so far.
3.19-rc5 does hang but that may be another issue. I'll try to capture an error state.
Comment 9 Jani Nikula 2016-04-21 12:17:20 UTC
(In reply to Arjan Bruin from comment #8)
> linux-stable 3.18.0 (b2776bf) hasn't shown any problems so far.
> 3.19-rc5 does hang but that may be another issue. I'll try to capture an
> error state.

Long time no updates, closing.

If the problem still persists with latest kernels, please reopen.
Comment 10 Jari Tahvanainen 2017-07-03 10:45:43 UTC
Closing >1 year old resolved+fixed.
