90259 – [gen2] Incoherent batch buffers, kernel w/a is n/a and no pin-ioctl to w/a

Summary: [gen2] Incoherent batch buffers, kernel w/a is n/a and no pin-ioctl to w/a

Status:	CLOSED WONTFIX

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:
Keywords:

Duplicates (1):	93056
Depends on:
Blocks:

Reported:	2015-04-30 18:29 UTC by tka
Modified:	2017-03-03 16:49 UTC
CC List:	2 users

See Also:
i915 platform:	I865G
i915 features:	GPU hang

Attachments
dmesg (33.45 KB, text/plain) 2015-04-30 18:29 UTC, tka
/sys/class/drm/card0/error (695.03 KB, text/plain) 2015-04-30 18:30 UTC, tka
New /sys/class/drm/card0/error (695.29 KB, text/plain) 2015-05-03 14:01 UTC, tka
/sys/class/drm/card0/error (698.06 KB, text/plain) 2015-05-30 19:27 UTC, tka
dmesg (33.51 KB, text/plain) 2015-05-30 19:28 UTC, tka
/sys/class/drm/card0/error [20151115] (695.41 KB, text/plain) 2015-11-15 15:27 UTC, tka

Description tka 2015-04-30 18:29:26 UTC

Created attachment 115484
dmesg

When I try to log in after boot-up (using slim), the system hangs for one or two seconds. From dmesg:

[   26.704033] [drm] stuck on render ring
[   26.708701] [drm] GPU HANG: ecode 2:0:0x40907fc1, in X [1310], reason: Ring hung, action: reset
[   26.708705] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   26.708707] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   26.708708] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   26.708710] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   26.708712] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   26.708809] drm/i915: Resetting chip after gpu hang
[   26.708823] [drm:i915_reset] *ERROR* Failed to reset chip: -19

This happens with kernel versions 4.0.0 and 4.0.1. With 3.19.x kernels the hang does not occur.

X.Org 1.17.1

DDX 2.99.197 and git (e7016d30f3a0ae817c77ccbd962f776ac3e7e100)
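
For anyone comparing several of these reports, the two informative lines in the dmesg excerpt above (the "GPU HANG" summary and the failed reset) can be pulled apart mechanically. The following is only a minimal sketch: the regular expressions are modelled on the exact lines quoted in this report and may need adjusting for other kernel versions, and `summarize_hang` is a hypothetical helper name, not part of any i915 tooling. The reset failure code -19 maps to ENODEV.

```python
import errno
import re

# Patterns modelled on the two dmesg lines quoted in this report (assumption:
# other kernel versions may word these messages differently).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>[^,\s]+), in (?P<comm>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)
RESET_RE = re.compile(r"Failed to reset chip: (?P<err>-?\d+)")


def summarize_hang(dmesg_text):
    """Return a dict describing the first GPU hang found, or None."""
    hang = HANG_RE.search(dmesg_text)
    if hang is None:
        return None
    info = hang.groupdict()
    info["pid"] = int(info["pid"])
    reset = RESET_RE.search(dmesg_text)
    if reset is not None:
        # The kernel reports a negative errno value; look up its symbolic name.
        code = abs(int(reset.group("err")))
        info["reset_errno"] = errno.errorcode.get(code, str(code))
    return info


sample = """\
[   26.708701] [drm] GPU HANG: ecode 2:0:0x40907fc1, in X [1310], reason: Ring hung, action: reset
[   26.708823] [drm:i915_reset] *ERROR* Failed to reset chip: -19
"""
print(summarize_hang(sample))
```

The ecode is kept as an opaque string here because it differs between hangs (compare comment 3), so it is mainly useful for telling reports apart.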

Comment 1 tka 2015-04-30 18:30:09 UTC

Created attachment 115485
/sys/class/drm/card0/error

Comment 2 Chris Wilson 2015-05-03 11:02:28 UTC

commit b400dd22c28a08d7644b4ede076be9d8c2b8ca9d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun May 3 11:58:25 2015 +0100

    sna: Clear has-pinned-batches if we can no longer actually pin
    
    Insert rant about useful kernel interfaces being removed without
    justification.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=88411
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Comment 3 tka 2015-05-03 13:58:46 UTC

The error is still there:

[   42.704035] [drm] stuck on render ring
[   42.708714] [drm] GPU HANG: ecode 2:0:0x067fffc1, in X [1296], reason: Ring hung, action: reset
[   42.708719] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   42.708720] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   42.708722] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   42.708724] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   42.708726] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   42.708804] drm/i915: Resetting chip after gpu hang
[   42.708817] [drm:i915_reset] *ERROR* Failed to reset chip: -19

There are now also rendering issues after logging in: the desktop background is not drawn, the cursor is invisible until an application is opened (I guess it becomes visible once it changes shape), and there are some other glitches.

Comment 4 tka 2015-05-03 14:01:28 UTC

Created attachment 115524
New /sys/class/drm/card0/error

GPU crash dump taken after applying the patch. I can't tell whether anything important has changed.

Comment 5 Chris Wilson 2015-05-03 14:37:22 UTC

Something overwrote at least the first 32MiB of the GTT with ~0.
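
How this was spotted in the dump is not stated here. One rough way to look for the same symptom in a saved /sys/class/drm/card0/error is to count how many dumped words are all ones. This is only a sketch under a loudly stated assumption: that buffer contents in the dump are printed one 32-bit word per line as "<offset> :  <value>" in hexadecimal. `count_all_ones` is a hypothetical helper, and the sample lines are made up for illustration, not taken from the attachments.

```python
import re

# ASSUMPTION: dumped buffer contents appear one 32-bit word per line in the
# form "<8 hex digits> :  <8 hex digits>". Adjust if the dump format differs.
WORD_RE = re.compile(r"^\s*([0-9a-fA-F]{8})\s*:\s*([0-9a-fA-F]{8})\s*$")


def count_all_ones(lines):
    """Return (words_seen, words_equal_to_0xffffffff) over the given lines."""
    seen = 0
    ones = 0
    for line in lines:
        match = WORD_RE.match(line)
        if match is None:
            continue  # header, register, or other non-data line
        seen += 1
        if int(match.group(2), 16) == 0xFFFFFFFF:
            ones += 1
    return seen, ones


# Made-up sample lines, for illustration only.
sample = [
    "00000000 :  ffffffff",
    "00000004 :  ffffffff",
    "00000008 :  02000000",
    "some header line that is not buffer data",
]
seen, ones = count_all_ones(sample)
print(f"{ones} of {seen} dumped words are ~0")
```

Because the function accepts any iterable of lines, an open file handle for a saved error dump can be passed in directly. A high share of ~0 words would be consistent with the overwrite described above, but it does not prove it.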

Comment 6 tka 2015-05-30 19:27:33 UTC

Created attachment 116186
/sys/class/drm/card0/error

(In reply to Chris Wilson from comment #5)
> Something overwrote at least the first 32MiB of the GTT with ~0.

I can now say that this overwriting does happen, but only very rarely; most of the time it does not. The GPU hang, however, still occurs every time. Here is a current /sys/class/drm/card0/error (Linux 4.0.4, xf86-video-intel git-fb1643f0f904eb258da71cd0b8deb8d3ec6dafed).

Comment 7 tka 2015-05-30 19:28:22 UTC

Created attachment 116187
dmesg

And the corresponding dmesg output.

Comment 8 tka 2015-06-27 22:39:14 UTC

The GPU still hangs with kernel version 4.1.

Comment 9 tka 2015-11-15 15:23:44 UTC

The GPU still hangs right after logging in with Linux 4.3.0 and the current git version of xf86-video-intel (0340718366d7cb168a46930eb7be22f2d88354d8).

Do you need any additional information? If so, let me know.

Or is the hardware so old that no one cares anymore?

Comment 10 tka 2015-11-15 15:27:53 UTC

Created attachment 119684
/sys/class/drm/card0/error [20151115]

Here is a fresh GPU crash dump (linux: 4.3.0, xf86-video-intel: git-0340718366d7cb168a46930eb7be22f2d88354d8).

Comment 11 Chris Wilson 2015-11-21 17:15:14 UTC

*** Bug 93056 has been marked as a duplicate of this bug. ***

Comment 12 yann 2017-02-24 08:21:53 UTC

We seem to have neglected the bug a bit, apologies.

tka, improvements have since been pushed to the kernel and xf86-video-intel that should benefit your system, so please re-test with the latest kernel and xf86-video-intel (alternatively, you may use modesetting/glamor with the latest Mesa version). Mark the bug as REOPENED if you can reproduce it (and attach a fresh GPU error dump and kernel log), or RESOLVED/* if you cannot.

Comment 13 yann 2017-03-03 16:49:26 UTC

(In reply to yann from comment #12)
Timeout. Assuming that this no longer occurs. If the issue happens again, re-test with the latest kernel and xf86-video-intel (alternatively, you may use modesetting/glamor with the latest Mesa version) and REOPEN if you can reproduce it (attaching a fresh GPU error dump and kernel log).