Hi, I have an OpenCL program which causes a steady stream of errors:

kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x03f84801
kernel: VM fault (0x01, vmid 6) at page 135290494, read from 'TC7' (0x54433700) (132)
kernel: amdgpu 0000:05:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C084001
kernel: amdgpu 0000:05:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x08105E7E
kernel: amdgpu 0000:05:00.0: GPU fault detected: 147 0x03f08401
kernel: VM fault (0x01, vmid 3) at page 135290494, read from 'TC2' (0x54433200) (200)
kernel: amdgpu 0000:04:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x060C8001
kernel: amdgpu 0000:04:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x08105E7E
kernel: amdgpu 0000:04:00.0: GPU fault detected: 147 0x03f0c801
kernel: VM fault (0x01, vmid 6) at page 135290495, read from 'TC1' (0x54433100) (4)
kernel: amdgpu 0000:05:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C004001
kernel: amdgpu 0000:05:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x08105E7F

AMDGPU-Pro: 16.30.3.306809
Kernel: 4.7.0-rc6-mainline
GPU: RX480
The program in question is "Claymore's Dual Ethereum+Decred/Siacoin GPU Miner v5.0 (Windows/Linux)": https://bitcointalk.org/index.php?topic=1433925.0

I have it running on 5 computers with 8 cards in total. Some single-GPU setups hit the error only sporadically, while on one dual-GPU system the program is basically unusable. In general the program still works, though it slows down because thousands of errors are generated and printed.
I should also mention that the same program works on Fiji/Hawaii hardware with catalyst/fglrx 15.12 and, from various reports, seems to work error-free on Windows.
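For anyone comparing these records by hand, here is a minimal Python sketch that pulls the fields out of a "VM fault" line as quoted in the report above. The regular expression and the helper names (`parse_vm_fault`, `decode_client_tag`) are illustrative only and are not part of any driver tooling. In the quoted samples, the page number is the decimal form of VM_CONTEXT1_PROTECTION_FAULT_ADDR, and the hex value after the client name is that name in ASCII.

```python
import re

# Pattern for the "VM fault" line as it appears in the logs quoted in this thread.
# Allowing "write" as well as "read" is an assumption; the thread only shows reads.
VM_FAULT_RE = re.compile(
    r"VM fault \((0x[0-9a-fA-F]+), vmid (\d+)\) at page (\d+), "
    r"(read|write) from '([^']*)' \((0x[0-9a-fA-F]+)\) \((\d+)\)"
)


def decode_client_tag(raw: int) -> str:
    """Interpret a 32-bit value as up to four big-endian ASCII bytes."""
    data = raw.to_bytes(4, "big").rstrip(b"\x00")
    return data.decode("ascii", errors="replace")


def parse_vm_fault(line: str):
    """Return the fields of a 'VM fault' line as a dict, or None if no match."""
    m = VM_FAULT_RE.search(line)
    if m is None:
        return None
    return {
        "protections": int(m.group(1), 16),
        "vmid": int(m.group(2)),
        "page": int(m.group(3)),
        "access": m.group(4),
        "client": m.group(5),
        "client_raw": int(m.group(6), 16),
        "client_id": int(m.group(7)),
    }


sample = ("kernel: VM fault (0x01, vmid 6) at page 135290494, "
          "read from 'TC7' (0x54433700) (132)")
fault = parse_vm_fault(sample)
# Compare the page with the FAULT_ADDR value logged for the same fault.
print(fault["page"] == 0x08105E7E)
# Decode the hex tag back into the client name.
print(decode_client_tag(fault["client_raw"]))
```

Feeding whole `dmesg` output through `parse_vm_fault` line by line makes it easier to see whether the faulting pages cluster around the same addresses across cards.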
This seems to be happening with mining. Different mining software creates the same error logs. Jolan Luff is mining with Claymore's, I'm using ethminer. In this forum thread, "langxxl" is also using ethminer and getting the same errors as I am:
https://forum.ethereum.org/discussion/8250/ubuntu-16-04-lts-rx-480-mining-ethereum-confirmed-working?

It still hashes, just not very well: an average of 16 Mh/s when it should be around 25 Mh/s.

ASUS 8GB RX480
Ubuntu 16.04 LTS
AMDGPU-PRO v16.3
parity v1.2.2-beta
ethminer v1.2.9

[ 1084.587016] VM fault (0x01, vmid 3) at page 47326788, read from 'TC0' (0x54433000) (8)
[ 1084.587016] amdgpu 0000:07:00.0: GPU fault detected: 147 0x05708801
[ 1084.587016] amdgpu 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00E0983C
[ 1084.587016] amdgpu 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x06088002
[ 1084.587016] VM fault (0x02, vmid 3) at page 14719036, read from 'TC6' (0x54433600) (136)
[ 1084.587016] amdgpu 0000:07:00.0: GPU fault detected: 147 0x0d78c401
[ 1084.587016] amdgpu 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x03B9780A
[ 1084.587016] amdgpu 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x060C4001
[ 1084.587016] VM fault (0x01, vmid 3) at page 62486538, read from 'TC3' (0x54433300) (196)
[ 1084.587016] amdgpu 0000:07:00.0: GPU fault detected: 147 0x0c88c801
[ 1084.587016] amdgpu 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x016D8DF6
[ 1084.587016] amdgpu 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x06044001
[ 1084.587016] VM fault (0x01, vmid 3) at page 23956982, read from 'TC5' (0x54433500) (68)
[ 1084.587016] amdgpu 0000:07:00.0: GPU fault detected: 147 0x09e08801
[ 1084.587016] amdgpu 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x006F7D1B
[ 1084.587016] amdgpu 0000:07:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x060C4002
etc. etc...
Is this still happening with the 16.50 driver?
I haven't tested with anything newer than 16.30.x yet. (I use Arch Linux and the unofficial package hasn't been updated.) I didn't see any mention of OpenCL fixes in the changelog, so I haven't tried updating it myself. I did just check and it looks like 16.50.X may be coming soon. Will report back if no one else beats me to it.
Still happens for me with 16.50.

Ubuntu 16.04.1
Radeon R9 380
claymore dualminer

The exact same PC/setup works fine with an RX 470.

[ 211.556980] VM fault (0x01, vmid 5) at page 135286044, read from 'TC2' (0x54433200) (0)
[ 211.557252] amdgpu 0000:02:00.0: GPU fault detected: 147 0x08e0c001
[ 211.557253] amdgpu 0000:02:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x08104D1C
[ 211.557253] amdgpu 0000:02:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A0C0001
[ 211.557254] VM fault (0x01, vmid 5) at page 135286044, read from 'TC5' (0x54433500) (192)
[ 211.557257] amdgpu 0000:02:00.0: GPU fault detected: 147 0x08e84401
[ 211.557257] amdgpu 0000:02:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x08104D1D
[ 211.557258] amdgpu 0000:02:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A044001
I don't think this is related to amdgpu-pro at all. I am getting the very same error when running ethminer via Mesa/Clover.

> [OpenCL] Device: AMD Radeon R9 380 Series (AMD TONGA / DRM 3.15.0 / 4.12.0, LLVM 3.9.1) / OpenCL 1.1 Mesa 17.2.0-devel (git-038c45a40e)
> [ 50.993264] amdgpu 0000:01:00.0: GPU fault detected: 147 0x02e8c001
> [ 50.993264] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x03ABDF7A
> [ 50.993264] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A080001
> [ 50.993265] amdgpu 0000:01:00.0: VM fault (0x01, vmid 5) at page 61595514, read from 'TC11' (0x54433131) (128)
> [ 50.993267] amdgpu 0000:01:00.0: GPU fault detected: 147 0x04200001
> [ 50.993268] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x025E46F9
> [ 50.993268] amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A008001
> [ 50.993269] amdgpu 0000:01:00.0: VM fault (0x01, vmid 5) at page 39732985, read from 'TC0' (0x54433000) (8)
> [ 50.994135] amdgpu 0000:01:00.0: IH ring buffer overflow (0x000C7820, 0x0000BE00, 0x00007830)

I ran the following code (mesa-compatible version)
https://github.com/EoD/ethminer/tree/fix_mesa_compilation
with these parameters (demo mode):
./ethminer -G -Z
I'm getting the same output when running vertminer with amdgpu + the OpenCL lib from amdgpu-pro (an "unsupported" setup, since it mixes the binary library with the open-source amdgpu driver). I believe the issue is in the kernel driver, as that's the common link between all these setups.

>[ 1228.836110] amdgpu 0000:03:00.0: GPU fault detected: 147 0x09460402
>[ 1228.836110] amdgpu 0000:03:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001F0B43
>[ 1228.836111] amdgpu 0000:03:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x06008002
>[ 1228.836112] amdgpu 0000:03:00.0: VM fault (0x02, vmid 3) at page 2034499, read from '' (0x00000000) (8)
>[ 1228.836122] amdgpu 0000:03:00.0: GPU fault detected: 147 0x0fe60402
>[ 1228.836123] amdgpu 0000:03:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
>[ 1228.836123] amdgpu 0000:03:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x06008002
>[ 1228.836124] amdgpu 0000:03:00.0: VM fault (0x02, vmid 3) at page 0, read from '' (0x00000000) (8)
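Since the same records show up regardless of the userspace stack, it may help to read the STATUS word directly. The sketch below splits VM_CONTEXT1_PROTECTION_FAULT_STATUS into the values that the "VM fault" line prints. The bit positions are an assumption inferred from the samples quoted in this thread (they reproduce the protections, vmid and client id shown in each record) and have not been checked against the driver headers here. The function name `decode_fault_status` is hypothetical.

```python
def decode_fault_status(status: int) -> dict:
    """Split a VM_CONTEXT1_PROTECTION_FAULT_STATUS value into fields.

    Assumed layout, inferred from the log samples in this thread:
      bits 0-7   protections
      bits 12-19 memory client id
      bit  24    access type (0 = read, 1 = write)
      bits 25-28 vmid
    """
    return {
        "protections": status & 0xFF,
        "client_id": (status >> 12) & 0xFF,
        "access": "write" if (status >> 24) & 0x1 else "read",
        "vmid": (status >> 25) & 0xF,
    }


# STATUS value from the vertminer record above; the matching log line reads
# "VM fault (0x02, vmid 3) ... read from '' (0x00000000) (8)".
print(decode_fault_status(0x06008002))
```

If the assumed layout holds, the STATUS and ADDR lines alone are enough to reconstruct a fault, which is useful when the "VM fault" summary line is missing or interleaved with output from another card.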
It's not just miners that can cause it. I get similar messages when running Luxmark and the LuxVR mode. In my case, the device is a Radeon Pro WX 4100 ("Baffin"). Once these errors occur, all sorts of OpenGL applications freeze, including Chrome tabs. After that, it seems like I can only fix it by rebooting.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/8.