Using recent AMDGPU source code and recent firmware files for the E9260 GPU (Polaris11, device ID 67E8h), the GPU fails to initialise correctly. Debugging shows that the RLC times out entering safe mode; further debugging shows this is because the SMU has uploaded garbage firmware. I've had a PCIe interposer/bus analyser inline with this GPU and can see that the issue is that GTT addresses are not being translated. When the SMU request to load firmware is sent, the PCIe read operations contain the GPU MC addresses, not the system page addresses that have been programmed into the GART table. If we shunt the firmware BO and the SMU's firmware header allocations into VRAM rather than the GTT, the firmware is uploaded correctly. However, the GFX ring tests then fail in a similar way: the GPU can be observed attempting to read from the GFX ring, but the addresses used across the PCIe bus are the MC ring addresses, not the system page addresses.
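To make the expected behaviour concrete, here is a minimal sketch of the translation that should happen before a read reaches the PCIe bus. It is only a model, not the amdgpu implementation: the struct, the function names, the flat single-level table, the 4 KiB page size and the valid bit are all assumptions made for illustration.

```c
#include <stdint.h>
#include <stddef.h>

#define GART_PAGE_SHIFT 12u                       /* assumed 4 KiB GART pages */
#define GART_PAGE_SIZE  (1ull << GART_PAGE_SHIFT)
#define GART_PTE_VALID  0x1ull                    /* hypothetical valid bit */

/* Hypothetical model of a GTT aperture backed by a flat, single-level table. */
struct gart_model {
    uint64_t aperture_start;   /* first MC address covered by the GTT */
    size_t num_entries;        /* number of pages mapped by the table */
    const uint64_t *table;     /* entry = system page address | flags */
};

/*
 * Translate an MC address into the system address expected on the bus.
 * Returns 0 on success, -1 if the address lies outside the aperture or
 * the table entry is not marked valid.
 */
static int gart_translate(const struct gart_model *g, uint64_t mc_addr,
                          uint64_t *bus_addr)
{
    if (mc_addr < g->aperture_start)
        return -1;

    uint64_t offset = mc_addr - g->aperture_start;
    uint64_t index = offset >> GART_PAGE_SHIFT;
    if (index >= g->num_entries)
        return -1;

    uint64_t pte = g->table[index];
    if (!(pte & GART_PTE_VALID))
        return -1;

    /* Page base from the table entry, byte offset from the MC address. */
    *bus_addr = (pte & ~(GART_PAGE_SIZE - 1)) | (offset & (GART_PAGE_SIZE - 1));
    return 0;
}

/*
 * Returns 1 if the address captured on the bus is the raw MC address
 * rather than its translation, which is the symptom described above.
 */
static int looks_untranslated(const struct gart_model *g, uint64_t mc_addr,
                              uint64_t observed)
{
    uint64_t expected;

    if (gart_translate(g, mc_addr, &expected) != 0)
        return 0;
    return observed == mc_addr && observed != expected;
}
```

In terms of this model, the analyser traces correspond to looks_untranslated() returning 1 for both the SMU firmware reads and the GFX ring reads.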
(In reply to john.alexander from comment #0)
> Using recent AMDGPU source code [...]

Which commit of which branch exactly? If it's a regression, can you bisect?

Please attach the corresponding output of dmesg.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/651.