Created attachment 143905: The error log
We have an Acer Squirtle_SR laptop with an AMD A9-9420e RADEON R5, 5 COMPUTE CORES 2C+3G APU and an [AMD/ATI] Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445] [1002:6900] discrete GPU. We tested it with Linux kernel 5.1.0-rc4. The system hits the following error and then hangs:
Apr 09 11:28:57 endless kernel: AMD-Vi: Completion-Wait loop timed out
Apr 09 11:28:57 endless kernel: iwlwifi 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0xff814000 flags=0x0050]
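When scanning a longer journal for these events, a minimal Python sketch like the one below can pull the fields out of the IO_PAGE_FAULT line. The regular expression is an assumption fitted to the single line quoted above, not to the kernel's own format string, and `parse_io_page_fault` is a hypothetical helper. It leaves `flags` as a raw integer rather than guessing at the meaning of individual bits.

```python
import re
from typing import Optional

# Hypothetical pattern fitted to the IO_PAGE_FAULT line quoted in this report.
# Assumption: the kernel prints driver, PCI address, then domain/address/flags.
IO_PAGE_FAULT_RE = re.compile(
    r"(?P<driver>\S+) "
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) "
    r"flags=(?P<flags>0x[0-9a-f]+)\]",
    re.IGNORECASE,
)


def parse_io_page_fault(line: str) -> Optional[dict]:
    """Return the logged device and raw event fields, or None if no match."""
    m = IO_PAGE_FAULT_RE.search(line)
    if m is None:
        return None
    return {
        "driver": m.group("driver"),            # driver bound to the device
        "bdf": m.group("bdf"),                  # PCI domain:bus:device.function
        "domain": int(m.group("domain"), 16),   # IOMMU protection domain
        "address": int(m.group("address"), 16), # faulting I/O virtual address
        "flags": int(m.group("flags"), 16),     # raw flags, not decoded here
    }
```

Applied to the log line above, it reports the event against the iwlwifi device at 0000:03:00.0, with address 0xff814000 and flags 0x50; the Completion-Wait line does not match and yields None.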
In the worst case, disk blocks may be corrupted, and we have to reinstall the system if fsck cannot recover it.
If we blacklist the amdgpu module, the system does not hit the error, but there is no GUI; only the console is shown.
If iommu=soft is appended to the kernel command line, the system works fine.
lspci output for the [AMD/ATI] Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445] [1002:6900]:
01:00.0 Display controller : Advanced Micro Devices, Inc. [AMD/ATI] Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445] [1002:6900] (rev c3)
Subsystem: Acer Incorporated [ALI] Topaz XT [Radeon R7 M260/M265 / M340/M360 / M440/M445] [1025:1217]
Physical Slot: 0
Flags: bus master, fast devsel, latency 0, IRQ 44
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d0000000 (64-bit, prefetchable) [size=2M]
I/O ports at 3000 [size=256]
Memory at d1400000 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at d1440000 [disabled] [size=128K]
Capabilities:  Vendor Specific Information: Len=08 <?>
Capabilities:  Power Management version 3
Capabilities:  Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities:  Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities:  Advanced Error Reporting
Capabilities:  #19
Capabilities: [2b0] Address Translation Service (ATS)
Capabilities: [2c0] Page Request Interface (PRI)
Capabilities: [2d0] Process Address Space ID (PASID)
Kernel driver in use: amdgpu
Kernel modules: amdgpu
Does booting with amdgpu.runpm=0 on the kernel command line in grub help?
Created attachment 143916: dmesg with amdgpu runtime PM disabled (amdgpu.runpm=0)
(In reply to Alex Deucher from comment #2)
> Does booting with amdgpu.runpm=0 on the kernel command line in grub help?
The system boots correctly with amdgpu.runpm=0 on the kernel command line.
Created attachment 143930: dmesg with PCI ATS disabled (pci=noats)
We also tested with 'pci=noats' on the kernel command line, as suggested in https://bugzilla.kernel.org/show_bug.cgi?id=194521#c24.
The system also boots fine.
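To confirm which of the workaround parameters tried in this thread (iommu=soft, amdgpu.runpm=0, pci=noats) were actually active for a given boot, a small Python check of the kernel command line could look like the sketch below. `active_workarounds` is a hypothetical helper; it assumes each parameter appears as a whole whitespace-separated token and does not handle quoted values.

```python
from pathlib import Path

# Boot parameters reported in this thread to avoid the hang.
WORKAROUNDS = ("iommu=soft", "amdgpu.runpm=0", "pci=noats")


def active_workarounds(cmdline: str) -> list[str]:
    """Return the workaround parameters present as whole tokens in cmdline."""
    tokens = set(cmdline.split())
    return [param for param in WORKAROUNDS if param in tokens]


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    proc = Path("/proc/cmdline")
    if proc.exists():
        found = active_workarounds(proc.read_text())
        print("active workarounds:", ", ".join(found) if found else "none")
```

Matching whole tokens rather than substrings keeps, for example, amdgpu.runpm=1 from being mistaken for amdgpu.runpm=0.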
Please attach the output of lspci -vnn
Created attachment 143946
Is there anything else I can help with? Do you need more tests, information, or logs? :)
(In reply to Alex Deucher from comment #8)
I think it's actually a problem with runtime PM and some PCI state. I may ask you to help debug that when I get a chance.
Thanks Alex. We will have to return this unit to the vendor at some point, but we will try to hold onto it for another month so that we can run any tests you request.
Alternatively, we may be able to get an affected unit shipped to you on a 1-month loan. Would that be useful?