Bug 91126 - [i865G] stuck on render ring. Failed to reset chip: -19
Summary: [i865G] stuck on render ring. Failed to reset chip: -19
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: low normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-27 16:17 UTC by Götz
Modified: 2017-07-24 22:46 UTC

See Also:
i915 platform: I865G
i915 features: GPU hang


Attachments
drm-card0-error (47.32 KB, application/octet-stream)
2015-06-27 16:17 UTC, Götz
dmesg (13.44 KB, application/octet-stream)
2015-06-27 16:18 UTC, Götz
Xorg.0.log (4.89 KB, application/x-bzip2)
2015-06-27 16:19 UTC, Götz
graphics corruption (287.26 KB, text/plain)
2015-07-03 16:47 UTC, Götz

Description Götz 2015-06-27 16:17:09 UTC
Created attachment 116749
drm-card0-error

After updating xf86-video-intel from 2.99.917-5 to 1:2.99.917+364+gb24e758-1 (in Arch Linux), a GPU hang occurs when playing a video with XV (the fastest way with this card), followed by 'Failed to reset chip: -19'.

GPU crash dump attached. If you need, I can try to narrow it down to the culprit commit.


[  919.990039] [drm] stuck on render ring
[  919.995225] [drm] GPU HANG: ecode 2:0:0x017300c1, in Xorg [374], reason: Ring hung, action: reset
[  919.995230] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  919.995232] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  919.995234] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  919.995236] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  919.995238] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  919.995341] drm/i915: Resetting chip after gpu hang
[  919.995425] [drm:i915_reset [i915]] *ERROR* Failed to reset chip: -19
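
For anyone triaging similar logs, here is a minimal Python sketch that pulls the fields out of the 'GPU HANG' line and decodes the reset failure code. The regular expressions and the function names (parse_hang, decode_reset_error) are illustrative assumptions derived only from the two log lines quoted above, not a format the kernel guarantees. The kernel logs errors as negative errno values, and 19 is ENODEV ("No such device") on Linux.

```python
import errno
import re

# Patterns are assumptions based on the log lines in this report only.
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)
RESET_RE = re.compile(r"Failed to reset chip: (?P<code>-?\d+)")


def parse_hang(line):
    """Return a dict of fields from a 'GPU HANG' line, or None if no match."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields


def decode_reset_error(line):
    """Return (code, errno name) from a 'Failed to reset chip' line, or None."""
    m = RESET_RE.search(line)
    if m is None:
        return None
    code = int(m.group("code"))
    # The kernel reports failures as negative errno values, so look up abs(code).
    return code, errno.errorcode.get(abs(code), "UNKNOWN")


hang = parse_hang(
    "[  919.995225] [drm] GPU HANG: ecode 2:0:0x017300c1, in Xorg [374], "
    "reason: Ring hung, action: reset"
)
reset = decode_reset_error(
    "[  919.995425] [drm:i915_reset [i915]] *ERROR* Failed to reset chip: -19"
)
print(hang)
print(reset)
```

The symbolic name only says which errno the reset path returned; it does not by itself explain why the reset failed on this chipset.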


System environment:
-- chipset: i865G
-- system architecture: i686
-- xf86-video-intel: 2.99.917+364+gb24e758-1
-- xserver: 1.17.2-1
-- mesa: 10.6.0-1
-- libdrm: 2.4.61-1
-- kernel: 4.0.6-1-ARCH
-- Linux distribution: Arch Linux
Comment 1 Götz 2015-06-27 16:18:36 UTC
Created attachment 116750
dmesg
Comment 2 Götz 2015-06-27 16:19:51 UTC
Created attachment 116751
Xorg.0.log
Comment 3 Chris Wilson 2015-06-27 16:26:57 UTC
Hmm. I haven't touched gen2 specific paths since February. Could you do a bisect and see what pops up? First check by compiling 2.99.917 for yourself.
Comment 4 Götz 2015-07-01 23:47:53 UTC
I have trouble testing some commits. 
From version 2.99.917 to 2.99.917-200-gbacaf7f (March 17) I get a segmentation fault when starting the X server:

[   846.289] (EE) Backtrace:
[   846.290] (EE) 0: /usr/lib/xorg-server/Xorg (OsSigHandler+0x32) [0x81dfdd2]
[   846.291] (EE) 1: linux-gate.so.1 (?+0x32) [0xb7734be1]
[   846.291] (EE) 
[   846.291] (EE) Segmentation fault at address 0x0
[   846.291] (EE) 
Fatal server error:
[   846.291] (EE) Caught signal 11 (Segmentation fault). Server aborting

For versions from 2.99.917 up until about February 25, I had to include the patch "udev integration depends on fstat and sys/stat.h" [1].

Versions from 2.99.917-201-g7fe2b29 (March 19) up to the latest commit work, but the GPU hangs and the reset fails.

I'm not sure what to do. Is it a bug in the server? Should I report a bug for the server crash, even though the problem no longer happens with the latest code?

[1] http://lists.x.org/archives/xorg-commit/2015-February/037661.html
[2] http://lists.x.org/archives/xorg-commit/2015-February/037668.html
Comment 5 Götz 2015-07-02 00:00:31 UTC
Applying commit 7fe2b29 "sna: Protect against ABI breakage in recent versions of libdrm" (March 19) to 2.99.917 avoids the server segfault :)

But why does the Arch package work? It doesn't contain that patch.[1]

I will report back when the bisect process finishes.

[1] https://projects.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/xf86-video-intel&id=b58c23d9fa101cc0d9b7778edf47497ea17cfc74
Comment 6 Chris Wilson 2015-07-02 06:38:21 UTC
(In reply to Götz from comment #5)
> Applying commit 7fe2b29 "sna: Protect against ABI breakage in recent
> versions of libdrm" (March 19) to 2.99.917 avoids the server segfault :)
> 
> But why does the Arch package work? It doesn't contain that patch.[1]

It will just depend on which version of the libdrm headers it was built against. A fresh build of -intel will die, but at the time the package was built it was fine.
Comment 7 Götz 2015-07-03 16:47:53 UTC
Created attachment 116932
graphics corruption

(In reply to Chris Wilson from comment #6)
> It will just depend on which version of the libdrm headers it was built
> against. A fresh build of -intel will die, but at the time the package was
> built it was fine.

Ah, interesting, thanks!

The bisect finished and points to 2.99.917-4-g986cb23 (Jan 6), "sna: Enable mmap(wc) support by default", as the cause of the issue.

Graphics corruption also appeared with that commit and persisted through 2.99.917-17-ge53087e, but it was not present with 2.99.917-35-g6a6efd3 or later.
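
The version strings quoted in this thread follow the `git describe` form `<tag>-<commits since tag>-g<abbreviated hash>`. As a rough sketch, the Python below splits them into their parts and orders them by commit count, which makes the tested range easier to read. The helper name parse_describe is hypothetical, and ordering by count is only meaningful under the assumption that all builds share the same base tag and sit on one linear branch.

```python
import re

# `git describe` output: <tag>-<commits since tag>-g<abbreviated hash>
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(version):
    """Split a git-describe string into (tag, commits_since_tag, short_hash)."""
    m = DESCRIBE_RE.match(version)
    if m is None:
        # A bare tag such as "2.99.917" has no count or hash suffix.
        return version, 0, None
    return m.group("tag"), int(m.group("count")), m.group("sha")


# Versions mentioned in the comments above, in no particular order.
versions = [
    "2.99.917-35-g6a6efd3",
    "2.99.917-4-g986cb23",
    "2.99.917",
    "2.99.917-201-g7fe2b29",
    "2.99.917-17-ge53087e",
    "2.99.917-200-gbacaf7f",
]

# Sort by the number of commits since the tag (assumes a linear history).
ordered = sorted(versions, key=lambda v: parse_describe(v)[1])
for v in ordered:
    print(parse_describe(v))
```

The abbreviated hash after the `g` is what can be passed to `git bisect good` or `git bisect bad` when repeating the bisect.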
Comment 8 Götz 2016-07-07 19:49:52 UTC
I haven't seen this anymore with the latest software versions, and xf86-video-intel from git master.

