Bug 100437

Summary: IO_PAGE_FAULT is spammed in dmesg
Product: DRI
Component: DRM/AMDgpu
Status: RESOLVED NOTOURBUG
Severity: normal
Priority: medium
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Reporter: Christian Lanig <freedesktop>
Assignee: Default DRI bug account <dri-devel>
QA Contact:
CC: j.gjorgji
Whiteboard:
i915 platform:
i915 features:
Attachments:
Excerpt of DMESG (flags: none)

Description Christian Lanig 2017-03-28 21:18:58 UTC
Created attachment 130516
Excerpt of DMESG

Hardware:
AMD Ryzen 7 1700X
ASRock AB350 Pro4, BIOS v1.40 (because 2.20 breaks ECC)
AMD RX 480
Crucial CT2K16G4XFD824A 2x16GiB ECC

OS: Ubuntu 17.04, Kernel 4.11 RC4, Padoka PPA

After installing my new motherboard and CPU, I went through the system messages to see how well the hardware is supported and whether there are issues to be sorted out.
Unfortunately, there appear to be issues with AMDGPU: dmesg contains a large number of warnings and errors.

It starts with these messages, repeated several times:
[    1.482118] AMD-Vi: Event logged [
[    1.482118] IO_PAGE_FAULT device=0c:00.0 domain=0x0003 address=0x000000f4001e6a00 flags=0x0010]

And it ends by telling me that something is wrong with PowerPlay:
[    1.526878] amdgpu: [powerplay] [AVFS] Something is broken. See log!
[    1.528706] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table!

I hope that this is the right location to report this.
Comment 1 Michel Dänzer 2017-03-29 01:46:21 UTC
The IOMMU page faults and PowerPlay issue are probably not directly related and should be tracked separately.
Comment 2 Christian Lanig 2017-03-29 08:05:47 UTC
Thanks for your answer. I have created a separate report:
https://bugs.freedesktop.org/show_bug.cgi?id=100443
Comment 3 Christian Lanig 2017-03-29 20:07:39 UTC
This issue went away after I updated my OS; I don't know whether the update is related. I can no longer reproduce it, even after several reboots.

However, the PowerPlay issue remains, so it is definitely unrelated.
Comment 4 Gjorgji Jankovski 2017-05-07 06:15:24 UTC
This is definitely not fixed yet, at least for me.

Ryzen 7 1700
MSI B350 TOMAHAWK
RX 480

Fedora 26
Kernel: 4.11.0-1.fc26.x86_64

My error message is this:

[    2.339023] AMD-Vi: Event logged [
[    2.339024] IO_PAGE_FAULT device=23:00.0 domain=0x0003 address=0x000000f4001f0400 flags=0x0010]


It reappears many times in dmesg.
Comment 5 Christian Lanig 2017-05-08 11:42:58 UTC
The bug is not fixed, but for me it only happens on a cold boot. When I reboot the PC, these messages disappear.

There also seem to be reports of comparable behavior on older AMD systems. It is perhaps a BIOS bug, because the problem goes away as long as the PC has not been shut down entirely.
It seems unlikely to be related to the GPU driver, because the driver should not depend on whether the computer was powered off or just rebooted.

It might still be helpful to know if your issues disappear by rebooting your machine as well.
Comment 6 Gjorgji Jankovski 2017-05-08 21:37:32 UTC
Indeed, it doesn't happen on reboots, and it's definitely not related to the graphics error: I'm pretty sure I had the exact same one in a Sandy Bridge system with the same GPU.
Comment 7 Corbin 2017-05-17 14:12:37 UTC
Same error as well.

System Specs :
Gentoo Linux x86_64, Kernel 4.9.x
AMDGPU v1.2 Driver
AMD 990FX Chipset
AMD FX-9590 CPU
AMD RX480 Video Card
AMDGPU Firmware blobs compiled into the kernel.
GART IOMMU support not included in the kernel / modules.

errors copied from '/var/log/dmesg' :
----
[    0.987710] pci 0000:01:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
----
[    0.988027] PCI: CLS 64 bytes, default 64
[    0.988362] iommu: Adding device 0000:00:00.0 to group 0
[    0.988530] iommu: Adding device 0000:00:02.0 to group 1
[    0.988689] iommu: Adding device 0000:00:0a.0 to group 2
[    0.988872] iommu: Adding device 0000:00:0d.0 to group 3
[    0.989029] iommu: Adding device 0000:00:11.0 to group 4
[    0.989194] iommu: Adding device 0000:00:12.0 to group 5
[    0.989318] iommu: Adding device 0000:00:12.2 to group 5
[    0.989484] iommu: Adding device 0000:00:13.0 to group 6
[    0.989615] iommu: Adding device 0000:00:13.2 to group 6
[    0.989780] iommu: Adding device 0000:00:14.0 to group 7
[    0.989943] iommu: Adding device 0000:00:14.1 to group 8
[    0.990128] iommu: Adding device 0000:00:14.3 to group 9
[    0.990292] iommu: Adding device 0000:00:14.4 to group 10
[    0.990458] iommu: Adding device 0000:00:14.5 to group 11
[    0.990637] iommu: Adding device 0000:00:15.0 to group 12
[    0.990763] iommu: Adding device 0000:00:15.1 to group 12
[    0.990935] iommu: Adding device 0000:00:16.0 to group 13
[    0.991058] iommu: Adding device 0000:00:16.2 to group 13
[    0.991252] iommu: Adding device 0000:01:00.0 to group 14
[    0.991387] iommu: Adding device 0000:01:00.1 to group 14
[    0.991560] iommu: Adding device 0000:02:00.0 to group 15
[    0.991742] iommu: Adding device 0000:03:00.0 to group 16
[    0.991863] iommu: Adding device 0000:06:00.0 to group 12
[    0.991982] iommu: Adding device 0000:07:04.0 to group 12
[    1.063849] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[    1.063962] AMD-Vi: Interrupt remapping enabled
[    1.064145] AMD-Vi: Lazy IO/TLB flushing enabled
[    1.065331] perf: AMD NB counters detected
[    1.065622] LVT offset 0 assigned for vector 0x400
[    1.065836] perf: AMD IBS detected (0x000000ff)
----
[    7.337130] [drm] amdgpu kernel modesetting enabled.
[    7.337326] [drm] initializing kernel modesetting (POLARIS10 0x1002:0x67DF 0x1682:0x9480 0xC7).
[    7.337334] [drm] register mmio base: 0xFEA00000
[    7.337334] [drm] register mmio size: 262144
[    7.337336] [drm] doorbell mmio base: 0xD0000000
[    7.337336] [drm] doorbell mmio size: 2097152
[    7.337341] [drm] probing gen 2 caps for device 1002:5a16 = 31cd02/0
[    7.337342] [drm] probing mlw for device 1002:5a16 = 31cd02
[    7.337348] [drm] UVD is enabled in VM mode
[    7.337348] [drm] VCE enabled in VM mode
----
[    7.350857] ATOM BIOS: D00901
[    7.350863] [drm] GPU post is not needed
[    7.351266] [TTM] Zone  kernel: Available graphics memory: 16464010 kiB
[    7.351266] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[    7.351266] [TTM] Initializing pool allocator
[    7.351270] [TTM] Initializing DMA pool allocator
[    7.351281] amdgpu 0000:01:00.0: VRAM: 8192M 0x0000000000000000 - 0x00000001FFFFFFFF (8192M used)
[    7.351283] amdgpu 0000:01:00.0: GTT: 16078M 0x0000000200000000 - 0x00000005ECE227FF
[    7.351285] [drm] Detected VRAM RAM=8192M, BAR=256M
[    7.351285] [drm] RAM width 256bits GDDR5
[    7.351292] [drm] amdgpu: 8192M of VRAM memory ready
[    7.351293] [drm] amdgpu: 16078M of GTT memory ready.
[    7.351297] [drm] GART: num cpu pages 4116002, num gpu pages 4116002
[    7.352437] AMD-Vi: Event logged [
[    7.352438] IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f400100000 flags=0x0010]
[    7.352438] AMD-Vi: Event logged [
[    7.352439] IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f400101400 flags=0x0010]
[    7.352440] AMD-Vi: Event logged [
[    7.352440] IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f400100040 flags=0x0010]
[    7.352440] AMD-Vi: Event logged [
[    7.352441] IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f400100080 flags=0x0010]
[    7.352441] AMD-Vi: Event logged [
[    7.352442] IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f400100240 flags=0x0010]
[    7.352442] AMD-Vi: Event logged [
[    7.352443] IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f400100280 flags=0x0010]
[    7.352443] AMD-Vi: Event logged [
[    7.352444] IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f4001000c0 flags=0x0010]
( the errors just continue, skipped in this posting )
[    7.352866] AMD-Vi: Event logged [
[    7.352867] IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f40010ddc0 flags=0x0010]
[    7.352867] AMD-Vi: Event logged [
[    7.352868] IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f40010df80 flags=0x0010]
[    7.352868] AMD-Vi: Event logged [
[    7.352868] IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f40010de00 flags=0x0010]
[    7.352869] AMD-Vi: Event logged [
[    7.352869] IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f40010de40 flags=0x0010]
[    7.352869] AMD-Vi: Event logged [
[    7.352870] IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f40010cb40 flags=0x0010]
[    7.352870] AMD-Vi: Event logged [
[    7.352871] IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x000000f40010cbc0 flags=0x0010]
[    7.352910] [drm] PCIE GART of 16078M enabled (table at 0x0000000000040000).
[    7.352917] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    7.352918] [drm] Driver supports precise vblank timestamp query.
[    7.352947] amdgpu 0000:01:00.0: amdgpu: using MSI.
[    7.352960] [drm] amdgpu: irq initialized.
[    7.352964] Can't find requested voltage id in vdd_dep_on_sclk table!
[    7.353108] amdgpu: powerplay initialized
[    7.353428] [drm] AMDGPU Display Connectors
----
Notice that the errors start right after :
[    7.351297] [drm] GART: num cpu pages 4116002, num gpu pages 4116002
And end after :
[    7.352910] [drm] PCIE GART of 16078M enabled (table at 0x0000000000040000).

I hope more debug info helps find the problem.
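
The observation above (the faults begin right after the "GART: num cpu pages" line and stop before the "PCIE GART ... enabled" line) can be checked against a saved log with a short script. This is only a minimal sketch and not part of the original report. It assumes the two-line message format shown in this log, where the IO_PAGE_FAULT record sits on its own timestamped line; other kernel versions or journal output may format the message differently. The names FAULT_RE and summarize_faults are hypothetical.

```python
import re
from collections import Counter

# Assumed format, taken from the log excerpt in this report:
# [    7.352438] IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x... flags=0x0010]
FAULT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+IO_PAGE_FAULT\s+"
    r"device=(?P<device>[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s+"
    r"domain=(?P<domain>0x[0-9a-fA-F]+)\s+"
    r"address=(?P<address>0x[0-9a-fA-F]+)\s+"
    r"flags=(?P<flags>0x[0-9a-fA-F]+)"
)


def summarize_faults(dmesg_text):
    """Return (fault count per device, first fault timestamp, last fault timestamp)."""
    counts = Counter()
    timestamps = []
    for line in dmesg_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            counts[match.group("device")] += 1
            timestamps.append(float(match.group("ts")))
    if not timestamps:
        # No faults found in this log.
        return counts, None, None
    return counts, min(timestamps), max(timestamps)


if __name__ == "__main__":
    import sys

    # Usage: python summarize_faults.py /var/log/dmesg
    with open(sys.argv[1], errors="replace") as log:
        counts, first, last = summarize_faults(log.read())
    for device, count in counts.most_common():
        print(f"{device}: {count} IO_PAGE_FAULT events")
    print(f"first fault at {first}, last fault at {last}")
```

Comparing the reported first and last timestamps with the timestamps of the two GART lines shows whether all faults fall inside that window.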
Comment 8 Öyvind Saether 2017-05-18 15:43:34 UTC
Motherboard: MSI-A88X-G45-GAMING
APU: AMD A8-7600
Graphics card: MSI RX 470 8GB
Kernel: 4.11.0-2.fc26.x86_64
Screens: 3 (1080 1440 1080)

I can confirm that on a cold boot I also get "AMD-Vi: Event logged" and "IO_PAGE_FAULT device" messages in dmesg, and various programs that use the GPU hang and stutter (Chromium with GPU acceleration turned on, mpv, etc.).

This goes away after rebooting the powered-on system. In both cases (cold boot and reboot), these messages appear in dmesg:

[    1.795617] amdgpu: [powerplay] [AVFS] Something is broken. See log!
[    1.797502] amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table!

It seems a bit odd that the "workaround" for this bug is to turn the machine on and then reboot it.

If there are useful commands I can run to dump something that would help, or logs I should paste, feel free to let me know.
Comment 9 Greg Turner 2017-05-22 22:00:12 UTC
Just another datapoint on this...

I have this disease with a (non-reference) Asus RX480 in an Asus Sabertooth Gen3 R2.0 (or something like that, I can never remember exactly what it's called).  It does not fix itself when I reboot.

If I turn off IOMMU entirely in the BIOS (not just the memory hole), then it goes away (along, of course, with my iommu functionality, which I'm not thrilled about).

I can also boot and avoid the spam with iommu=pt or iommu=soft.
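
To confirm which of these options a running system was actually booted with, the kernel command line can be inspected. The following is a minimal sketch, assuming a Linux system that exposes the command line at /proc/cmdline; the function names iommu_options and read_cmdline are hypothetical. It only reports the generic iommu= parameter and deliberately ignores other parameters such as amd_iommu=.

```python
def iommu_options(cmdline):
    """Return the values passed via iommu= on a kernel command line string."""
    values = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Match only the exact key "iommu", not e.g. "amd_iommu".
        if key == "iommu" and sep:
            # Split on commas in case several options were given in one parameter.
            values.extend(value.split(","))
    return values


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    options = iommu_options(read_cmdline())
    print("iommu options:", options if options else "none set")
```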

I have also seen brief "flashing screen" artifacts in xorg, which get more frequent if I stress the system (doesn't need to be graphics related).  Whether these correlate with the log spam, I'm not entirely sure.  I think it may, but it might instead have been related to a firmware loading problem that I've since resolved... I'd have to try it again to be sure.

Like the OP, when I get the log spam I also see this:

May 22 14:23:21 moneypit kernel: amdgpu: [powerplay] [AVFS] Something is broken. See log!
May 22 14:23:21 moneypit kernel: amdgpu: [powerplay] Can't find requested voltage id in vdd_dep_on_sclk table!
Comment 10 Erno K 2017-05-23 09:23:02 UTC
I started seeing the "AMD-Vi: Event logged" and IO_PAGE_FAULT spam after updating the Asus B350 Plus BIOS from 0406 to 0609. Before that, it only showed something like "Southbridge I/O Apic not found, firmware bug!"
Comment 11 Christian Lanig 2017-06-12 11:31:57 UTC
It seems to have disappeared with the newest BIOS, which includes AGESA 1.0.0.6, on my mainboard.
Comment 12 Christian Lanig 2017-12-15 08:04:53 UTC
I am closing this issue because it was a bug in the early BIOS firmware of Ryzen motherboards.
