107311 – seemingly random GPU hangs, no input

Summary: seemingly random GPU hangs, no input

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/AMDgpu
Version:	DRI git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium major
Assignee:	Default DRI bug account

Reported:	2018-07-20 17:46 UTC by Roshless
Modified:	2018-10-27 10:48 UTC
CC List:	0 users

Attachments
dmesg output from last crash (21.41 KB, text/plain) 2018-07-20 17:46 UTC, Roshless
different message (16.61 KB, text/plain) 2018-07-20 17:55 UTC, Roshless
different order if it means anything (5.13 KB, text/plain) 2018-07-20 17:58 UTC, Roshless

Description Roshless 2018-07-20 17:46:25 UTC

Created attachment 140736
dmesg output from last crash

Hello. I have been following issues here for more than a month now, looking for fixes or at least workarounds for my problem. I've seen several different errors, but most of them now look like this:

[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=648885, last emitted seq=648887
[drm] GPU recovery disabled.
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=94380, last emitted seq=94383
[drm] GPU recovery disabled.
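
The timeout lines above follow a regular pattern, so they can be pulled out of a saved dmesg log mechanically and compared across crashes. A minimal Python sketch, assuming the log lines have exactly the format quoted here; the names `RingTimeout` and `parse_ring_timeouts` are hypothetical and not part of any amdgpu tooling:

```python
import re
from typing import List, NamedTuple


class RingTimeout(NamedTuple):
    """One 'ring ... timeout' line from an amdgpu dmesg log."""
    ring: str       # ring name as printed by the driver, e.g. "gfx" or "sdma0"
    signaled: int   # last signaled sequence number
    emitted: int    # last emitted sequence number

    @property
    def pending(self) -> int:
        # Number of emitted sequence numbers that had not been signaled yet.
        return self.emitted - self.signaled


# Assumption: the message format matches the lines quoted in this report.
_TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"last signaled seq=(?P<signaled>\d+), "
    r"last emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(log_text: str) -> List[RingTimeout]:
    """Return every ring timeout found in the given dmesg text, in log order."""
    return [
        RingTimeout(m["ring"], int(m["signaled"]), int(m["emitted"]))
        for m in _TIMEOUT_RE.finditer(log_text)
    ]
```

Feeding it the contents of a saved log (for example the attached dmesg output) gives one record per timeout, which makes it easy to see which ring timed out first in each crash. It does not by itself identify the program that submitted the hanging job.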

Setup: Arch Linux, mesa-git and other packages from this repository (https://pkgbuild.com/~lcarlier/mesa-git/), kernel 4.18.0-rc5, Sapphire Radeon RX 580 8G NITRO+. I've tested multiple configurations. With stable mesa and the stable kernel I got crashes at least twice a day. Every configuration has crashed for me, but with the current one I at least get a full day between crashes. For some reason, all crashes occurred during one of:

- watching videos on YouTube
- normal, light web browsing
- watching videos with mpv (acceleration disabled)

I previously used amdgpu with an HD 7850 on the same setup for a few months (probably without DC) and never had a crash.

If a particular program is at fault, please at least point me to a way of finding out which one.

Comment 1 Roshless 2018-07-20 17:55:04 UTC

Created attachment 140737
different message

Comment 2 Roshless 2018-07-20 17:58:15 UTC

Created attachment 140738
different order if it means anything

Comment 3 dwagner 2018-07-20 19:28:40 UTC

From how you describe it, you are experiencing the same bugs that I reported in https://bugs.freedesktop.org/show_bug.cgi?id=102322

Comment 4 Roshless 2018-07-29 10:02:00 UTC

(In reply to dwagner from comment #3)
> From how you describe it, you are experiencing the same bugs that I reported
> in https://bugs.freedesktop.org/show_bug.cgi?id=102322

Indeed, it seems this is the same bug. amdgpu.vm_update_mode=3 prevents the crashes, though I'd rather find the application that is crashing my system, or have a working driver.
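
When testing a workaround like this, it helps to confirm after a reboot that the parameter really is in effect. A minimal Python sketch, assuming the kernel exposes module parameters under `/sys/module/<module>/parameters/` (the usual sysfs layout) and that `vm_update_mode` is readable there on the running kernel; the helper name `read_module_param` is hypothetical:

```python
from pathlib import Path
from typing import Optional


def read_module_param(module: str, name: str,
                      sysfs_root: str = "/sys/module") -> Optional[str]:
    """Return the current value of a kernel module parameter as a string.

    Returns None when the file cannot be read, for example because the
    module is not loaded or the parameter is not exported via sysfs.
    """
    path = Path(sysfs_root) / module / "parameters" / name
    try:
        return path.read_text().strip()
    except OSError:
        return None


# Example call on a machine with amdgpu loaded:
#   read_module_param("amdgpu", "vm_update_mode")
# A result of "3" would match the amdgpu.vm_update_mode=3 boot option.
```

The `sysfs_root` argument exists only so the helper can be pointed at a different directory for testing.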

*** This bug has been marked as a duplicate of bug 102322 ***

Comment 5 Roshless 2018-07-29 12:50:08 UTC

Spoke too soon, I just got another crash with the classic message:

[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=2423708, last emitted seq=2423710

With amdgpu.vm_update_mode=3 I got 3 days (about 12 h per day) plus 2-3 h today without a crash.

Comment 6 Roshless 2018-10-27 10:48:08 UTC

I resolved my problem by connecting only the 8-pin power connector and leaving the additional 6-pin connector unplugged. At least it's not crashing now.
