Bug 88641 - [snb 3.17] ringbuffer wrap overshoot? batchbuffer incoherence?
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
 
Reported: 2015-01-20 22:07 UTC by Arjan Bruin
Modified: 2017-07-03 10:45 UTC

See Also:
i915 platform: SNB
i915 features: GPU hang


Attachments
GPU crash dump (2.09 MB, text/plain)
2015-01-20 22:07 UTC, Arjan Bruin
GPU crash dump 2 (2.07 MB, text/plain)
2015-01-20 23:03 UTC, Arjan Bruin
Two more gpu crash dumps (639.57 KB, application/octet-stream)
2015-01-21 19:28 UTC, Arjan Bruin

Description Arjan Bruin 2015-01-20 22:07:55 UTC
Created attachment 112564
GPU crash dump

The error occurs consistently in Firefox/Chromium and is sometimes recoverable by closing dwm (as with the attached error). Black blocks are usually visible beforehand.
Comment 1 Chris Wilson 2015-01-20 22:21:14 UTC
It looks very much like the ringbuffer was overwritten with new commands whilst the GPU was still executing them, and the CS parser went off into never-never land.
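As an illustration of the failure mode described in this comment (not taken from the i915 driver), here is a minimal toy model of a command ring. The `Ring` class, its method names, and the slot counts are all hypothetical; the sketch only shows why a writer that skips the free-space check can overwrite commands the reader has not consumed yet.

```python
class Ring:
    """Toy command ring: the CPU writes at `tail`, the GPU reads at `head`."""

    def __init__(self, size):
        self.size = size
        self.head = 0  # next slot the GPU will read
        self.tail = 0  # next slot the CPU will write

    def unread(self):
        # Commands written by the CPU but not yet consumed by the GPU.
        return (self.tail - self.head) % self.size

    def free_space(self):
        # One slot stays unused so that head == tail always means "empty".
        return (self.head - self.tail - 1) % self.size

    def emit(self, count, check_space=True):
        """Write `count` commands; return how many unread slots were overwritten."""
        if check_space and count > self.free_space():
            raise BufferError("ring full: wait for the GPU to advance head")
        # Without the check, anything beyond the physically free slots
        # lands on top of commands the GPU has not executed yet.
        clobbered = max(0, count - (self.size - self.unread()))
        self.tail = (self.tail + count) % self.size
        return clobbered

    def consume(self, count):
        # The GPU executes `count` commands and advances its read pointer.
        self.head = (self.head + count) % self.size
```

With an 8-slot ring, emitting 6 commands and consuming 2 leaves 4 unread commands. A checked `emit(5)` then raises `BufferError`, while `emit(5, check_space=False)` wraps around and overwrites one command that is still pending.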
Comment 2 Chris Wilson 2015-01-20 22:22:13 UTC
Could you please grab a few more error states from firefox/chromium to see if they are all of this type?
Comment 3 Arjan Bruin 2015-01-20 23:03:08 UTC
Created attachment 112568
GPU crash dump 2
Comment 4 Chris Wilson 2015-01-21 09:39:05 UTC
(In reply to Arjan Bruin from comment #3)
> Created attachment 112568
> GPU crash dump 2

This looks different. Here there is no evidence of confusion in the ringbuffer; instead, the CS kept reading past the end of the batchbuffer, which indicates that the batchbuffer is incoherent.

Did this issue start occurring recently? Perhaps a change in 3.17.y? If so, is there any chance you could bisect it?
Comment 5 Arjan Bruin 2015-01-21 19:25:45 UTC
I haven't had this install for very long. Before Gentoo, the machine ran Arch Linux and didn't have the same problem, but so much else could be different.

By opening 15 Firefox YouTube windows using a script, I can reproduce it quite reliably. Is this too artificial to base the bisect on? I'll attach two error dumps I got this way.
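The reporter's script is not included in this report. A minimal sketch of what such a stress script might look like follows; the helper names `build_commands` and `launch` and the URL are assumptions, and only Firefox's `--new-window` option is relied on.

```python
import subprocess


def build_commands(url, count=15, browser="firefox"):
    """Build one command line per browser window to open (hypothetical helper)."""
    return [[browser, "--new-window", url] for _ in range(count)]


def launch(commands):
    """Start every command without waiting, so the windows load concurrently."""
    return [subprocess.Popen(cmd) for cmd in commands]


# Example use (assumed URL; any page with video playback would do):
#   launch(build_commands("https://www.youtube.com/"))
```

Keeping command construction separate from launching makes it easy to vary the window count during a bisect without touching the process handling.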
Comment 6 Arjan Bruin 2015-01-21 19:28:22 UTC
Created attachment 112616
Two more gpu crash dumps
Comment 7 Chris Wilson 2015-01-21 21:52:54 UTC
In both of those errors, the values in the batch differ from the values read by the GPU (what I call incoherence). It looks like you have a reasonable method of reproducing the problem.
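To make the notion of incoherence concrete, the comparison can be sketched as a dword-by-dword diff between what the CPU wrote into the batch and what the GPU reports having read. The function name and the sample values below are hypothetical and do not come from the attached dumps.

```python
def find_incoherent_dwords(written, read):
    """Return (index, written_value, read_value) for every dword that differs."""
    return [
        (i, w, r)
        for i, (w, r) in enumerate(zip(written, read))
        if w != r
    ]


# Hypothetical example: the GPU saw stale data in the second dword.
cpu_view = [0x7A000003, 0x00101400, 0x00000000]
gpu_view = [0x7A000003, 0x00000000, 0x00000000]
mismatches = find_incoherent_dwords(cpu_view, gpu_view)
```

An empty result means the two views agree; any entry marks a location where the GPU read something other than what was written.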

I think this is most likely a kernel issue, and I would start by testing different kernel versions.
Comment 8 Arjan Bruin 2015-01-25 12:13:37 UTC
linux-stable 3.18.0 (b2776bf) hasn't shown any problems so far.
3.19-rc5 does hang, but that may be another issue. I'll try to capture an error state.
Comment 9 Jani Nikula 2016-04-21 12:17:20 UTC
(In reply to Arjan Bruin from comment #8)
> linux-stable 3.18.0 (b2776bf) hasn't shown any problems so far.
> 3.19-rc5 does hang but that may be another issue. I'll try to capture an
> error state.

Long time no updates, closing.

If the problem still persists with latest kernels, please reopen.
Comment 10 Jari Tahvanainen 2017-07-03 10:45:43 UTC
Closing >1 year old resolved+fixed.