Bug 106666

Summary: amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:56 vmid:3 pas_id:0), [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=327845, last emitted seq=327847
Product: DRI
Component: DRM/AMDgpu
Status: RESOLVED FIXED
Severity: normal
Priority: medium
Version: unspecified
Hardware: Other
OS: All
Reporter: udo <udovdh>
Assignee: Default DRI bug account <dri-devel>
QA Contact:
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description                                  Flags
dmesg                                        none
messages                                     none
dmesg                                        none
xorg log                                     none
dmesg                                        none
dmesg w/mesa 18.0.2-1.fc28                   none
dmesg w/mesa 18.2.0-0.11.git41dabdc.fc28     none

Description udo 2018-05-26 11:30:19 UTC
Created attachment 139789 [details]
dmesg

While watching a YouTube video the screen froze, but the YouTube audio continued for a while. The box was still reachable via ssh. Restarting Xorg did not fix the situation.
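
For anyone triaging similar logs, here is a minimal sketch of how the two messages from the summary line could be pulled out of a saved dmesg or messages file. The regular expressions are modelled only on the wording quoted in this report and may not match other kernel versions; the function name `scan_log` and the `pending` field (simply emitted minus signaled) are illustrative assumptions, not anything provided by amdgpu.

```python
import re

# Patterns modelled on the two messages quoted in this bug's summary.
# Assumption: other kernels may word these lines differently.
PAGE_FAULT = re.compile(
    r"VMC page fault \(src_id:(\d+) ring:(\d+) vmid:(\d+) pas_id:(\d+)\)"
)
RING_TIMEOUT = re.compile(
    r"ring (\w+) timeout, last signaled seq=(\d+), last emitted seq=(\d+)"
)


def scan_log(lines):
    """Return (page_faults, timeouts) found in an iterable of log lines."""
    faults, timeouts = [], []
    for line in lines:
        m = PAGE_FAULT.search(line)
        if m:
            faults.append({
                "src_id": int(m[1]),
                "ring": int(m[2]),
                "vmid": int(m[3]),
                "pas_id": int(m[4]),
            })
        m = RING_TIMEOUT.search(line)
        if m:
            signaled, emitted = int(m[2]), int(m[3])
            timeouts.append({
                "ring": m[1],
                "signaled": signaled,
                "emitted": emitted,
                # Hypothetical convenience field: sequence numbers emitted
                # but not yet signaled when the timeout was reported.
                "pending": emitted - signaled,
            })
    return faults, timeouts
```

It can be fed an open file, for example `scan_log(open("dmesg.txt"))`, where `dmesg.txt` is a placeholder name for one of the attached logs.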
Comment 1 udo 2018-05-26 11:40:27 UTC
Created attachment 139790 [details]
messages

no xorg log messages for this one
Comment 2 udo 2018-05-30 15:53:37 UTC
Created attachment 139863 [details]
dmesg

Another hang.
I was reading Slashdot at the time.
Comment 3 udo 2018-05-30 15:55:03 UTC
messages file has similar messages.
Comment 4 udo 2018-05-30 15:55:32 UTC
Created attachment 139864 [details]
xorg log
Comment 5 udo 2018-06-02 03:56:29 UTC
Also happens on 4.17-rc7.
Comment 6 udo 2018-06-02 03:58:18 UTC
These issues happen multiple times per day.
They do not happen when I am away, only while I am using the PC.
This makes the system too unreliable to use.
Comment 7 udo 2018-06-02 03:58:59 UTC
The bug can also happen when no video activity is going on.
Comment 8 udo 2018-06-02 03:59:17 UTC
By video I mean YouTube, VLC, xine, etc.
Comment 9 udo 2018-06-03 02:40:27 UTC
This bug also happens in amd-staging-drm-next kernel of 02-jun-2018.
Comment 10 udo 2018-06-03 02:50:17 UTC
Created attachment 139982 [details]
dmesg
Comment 11 dwagner 2018-06-03 20:58:07 UTC
The symptoms you describe sound very much the same as the ones I experience - I reported them in https://bugs.freedesktop.org/show_bug.cgi?id=102322
Comment 12 Michel Dänzer 2018-06-05 16:37:38 UTC
Per https://bugs.freedesktop.org/show_bug.cgi?id=105251#c9 , make sure you have current microcode and LLVM
Comment 13 udo 2018-06-06 02:25:30 UTC
Thanks.
For my Ryzen 5 2400g that means all vega* files from https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amdgpu ?

LLVM is at 6.0.0; it is the Fedora 28 llvm.
Should I switch to git llvm (again)?
Comment 14 Michel Dänzer 2018-06-06 07:08:36 UTC
(In reply to udo from comment #13)
> For my Ryzen 5 2400g that means all vega* files from
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/
> tree/amdgpu ?

Mostly the raven* ones, but it doesn't hurt to grab the vega* ones as well, or indeed all of them.


> LLVM is at 6.0.0, it is the fedora 28 llvm.
> I should switch to git llvm? (again)

The referenced comment says 6.0.0 is fine. YMMV.
Comment 15 udo 2018-06-06 13:05:05 UTC
Only the file from this commit https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu/vega10_vce.bin?id=1fa9ce33895f3634398a360715331cc222d243d6 was different from the Fedora rpm linux-firmware-20180525-85.git7518922b.fc28.noarch.

I'll test with 4.16.14.
Comment 16 Sebastian Frysztak 2018-06-10 10:28:06 UTC
Hi everyone,

I have the same issue with 2400g, and I believe I have found a way to reproduce it.

Please try the following apitrace of a game called Deadly Premonition [1]. Just decompress it (it's almost 1.5 GB) and run it with 32-bit version of apitrace:

apitrace replay deadly_premonition.trace

Please let me know if it works.

[1] https://mega.nz/#!O3JSCbZI!TEYmOxpvjCosO3AhW1kvdcxdMCEkpqNTxrl9tWdDVYE
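
Since the trace has to be replayed with a 32-bit apitrace, one way to check which build a given binary is could be to read the ELF class byte (offset 4 of the ELF header: 1 means 32-bit, 2 means 64-bit). This is only a sketch; the helper name `elf_class` is made up for illustration and is not part of apitrace.

```python
def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if the file is not ELF."""
    with open(path, "rb") as f:
        header = f.read(5)
    # An ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'.
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # Byte 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
    return {1: 32, 2: 64}.get(header[4])
```

For example, `elf_class(shutil.which("apitrace"))` should return 32 for a 32-bit build. If the installed `apitrace` is a wrapper script rather than an ELF binary, the function returns None and the real binary has to be located by hand.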
Comment 17 udo 2018-06-10 14:31:08 UTC
The latest firmware does improve the situation.
Comment 18 udo 2018-06-13 15:41:00 UTC
I think that, after two runs of 3 days of amdgpu-problem-free uptime, we can say that the updated firmware does fix this issue.
Comment 19 Christian König 2018-06-14 08:10:50 UTC
Ok then let's mark this as resolved for now.
Comment 20 sam 2018-06-14 15:18:37 UTC
Created attachment 140158 [details]
dmesg w/mesa 18.0.2-1.fc28

I am still suffering from this issue, even with latest firmware from comment 15.
Kernel: 4.16.14-300.fc28.x86_64
Mesa: tested both 18.0.2-1.fc28 (current Fedora release) and 18.2.0-0.11.git41dabdc.fc28 (from che/mesa copr repo). Bug occurred on both.
Firmware: 20180525-85.git7518922b.fc28 (current Fedora release), but with vega10_vce.bin from comment 15 replacing the one installed by the package.

My graphics card is a PowerColor RX Vega 64.

I am attaching two dmesgs: one with each mentioned tested Mesa version.
Comment 21 sam 2018-06-14 15:19:46 UTC
Created attachment 140159 [details]
dmesg w/mesa 18.2.0-0.11.git41dabdc.fc28
Comment 22 Michel Dänzer 2018-06-14 15:43:35 UTC
(In reply to sam.psylo from comment #20)
> I am still suffering from this issue, even with latest firmware from comment
> 15.

Please file your own report. This report is resolved.


> Firmware: 20180525-85.git7518922b.fc28 (current Fedora release), but with
> vega10_vce.bin from comment 15 replacing the one installed by the package.

Did you double-check that's the only file which differs from the ones you have?
Comment 23 sam 2018-06-14 16:28:29 UTC
(In reply to Michel Dänzer from comment #22)
> (In reply to sam.psylo from comment #20)
> > I am still suffering from this issue, even with latest firmware from comment
> > 15.
> 
> Please file your own report. This report is resolved.

Will do. Sorry about that.

> > Firmware: 20180525-85.git7518922b.fc28 (current Fedora release), but with
> > vega10_vce.bin from comment 15 replacing the one installed by the package.
> 
> Did you double-check that's the only file which differs from the ones you
> have?

Not directly, but I did take a look at the linux-firmware git history and that was the only firmware changed after May 25 (the date that the fedora package was based on).

I'll make a new report for the bug, and continue there.
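
The "is that the only file which differs?" question can also be checked directly rather than through the git history. Below is a minimal sketch that compares two directories of firmware files by SHA-256. It assumes the installed files live somewhere like /lib/firmware/amdgpu and that a checkout of linux-firmware is available locally; both paths, the `*.bin` pattern, and the function names are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large firmware blobs are not read at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def differing_files(installed_dir, reference_dir, pattern="*.bin"):
    """Map file name -> reason for every file that differs between the dirs."""
    installed = {p.name: p for p in Path(installed_dir).glob(pattern)}
    reference = {p.name: p for p in Path(reference_dir).glob(pattern)}
    report = {}
    for name in sorted(installed.keys() | reference.keys()):
        if name not in installed:
            report[name] = "missing from installed"
        elif name not in reference:
            report[name] = "missing from reference"
        elif sha256_of(installed[name]) != sha256_of(reference[name]):
            report[name] = "content differs"
    return report
```

A call such as `differing_files("/lib/firmware/amdgpu", "linux-firmware/amdgpu")` returns an empty dict when the two sets are identical. Note that this only looks at uncompressed `.bin` files; a distribution that ships compressed firmware would need a different pattern and a decompression step.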
