Bit late with this. I bisected on drm-next-4.13-wip: on an R9 285 (Tonga) I have been getting corrupted output from VCE encode, with both OMX and VA-API, since the commit below. To reproduce this you need to use GStreamer and encode something "fast and large", e.g. 2160p from raw NV12. Slower pipelines, such as ffmpeg or gst-vaapi without "! queue !", seem to hide the issue somewhat.

26d4ac55d2260f8685475b3f6e76e276a238cca7 is the first bad commit
commit 26d4ac55d2260f8685475b3f6e76e276a238cca7
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Tue Nov 1 13:08:33 2016 -0400

    drm/amdgpu/gmc8: use the vram location programmed by the vbios

    This makes mc programming much simpler in future patches. Since evergreen, the vbios has been programming the fb location to the proper vram size. The only reason to reprogram it would be to change the location.
Please attach your dmesg output. Are there any error messages in the output?
A diff of (trimmed) dmesg-good against dmesg-bad shows, amongst other things:

< amdgpu 0000:01:00.0: VRAM: 2048M 0x0000000000000000 - 0x000000007FFFFFFF (2048M used)
< amdgpu 0000:01:00.0: GTT: 3072M 0x0000000080000000 - 0x000000013FFFFFFF
---
> amdgpu 0000:01:00.0: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
> amdgpu 0000:01:00.0: GTT: 3072M 0x0000000000000000 - 0x00000000BFFFFFFF
826c826
< [drm] PCIE GART of 3072M enabled (table at 0x0000000000040000).
---
> [drm] PCIE GART of 3072M enabled (table at 0x000000F400040000).
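The address ranges in that diff can be checked with a little arithmetic. The sketch below uses the values copied from the two logs; the helper names are illustrative only and do not come from the driver. It confirms that the VRAM and GTT sizes are unchanged and that only the placement moved: VRAM goes from address 0 to 0xF400000000 (976 GiB), entirely above the 4 GiB line, while the GART table stays at the same 0x40000 offset into VRAM.

```python
FOUR_GIB = 1 << 32

# Aperture ranges copied from the dmesg diff (inclusive end addresses).
LAYOUTS = {
    "good": {"VRAM": (0x0000000000000000, 0x000000007FFFFFFF),
             "GTT": (0x0000000080000000, 0x000000013FFFFFFF)},
    "bad": {"VRAM": (0x000000F400000000, 0x000000F47FFFFFFF),
            "GTT": (0x0000000000000000, 0x00000000BFFFFFFF)},
}

# GPU address of the GART table reported in each log.
GART_TABLE = {"good": 0x0000000000040000, "bad": 0x000000F400040000}


def size_mib(start: int, end: int) -> int:
    """Size of an inclusive [start, end] range in MiB."""
    return (end - start + 1) >> 20


def below_4gib(start: int, end: int) -> bool:
    """True if every address in the range fits in 32 bits."""
    return end < FOUR_GIB


for name, layout in LAYOUTS.items():
    vram_start = layout["VRAM"][0]
    for aperture, (start, end) in layout.items():
        print(f"{name:4} {aperture:4} {size_mib(start, end):5}M "
              f"starts at {start >> 30} GiB, below 4 GiB: {below_4gib(start, end)}")
    # Offset of the GART table from the start of VRAM.
    print(f"{name:4} GART table at VRAM + 0x{GART_TABLE[name] - vram_start:x}")
```

Note that in the good layout the GTT aperture already extends above 4 GiB (0x100000000 to 0x13FFFFFFF), so "something above 4 GiB" alone does not distinguish the two cases; what changes is that VRAM itself moves there.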
Created attachment 133608: dmesg on bad
Created attachment 133609: dmesg on good (the commit before the bad one)
Created attachment 133610: possible fix

Does this patch help?
Created attachment 133611: dmesg with patch

No, though the encode now fails differently, throwing lots of

amdgpu: The CS has been cancelled because the context is lost.

and in dmesg

[ 103.116736] [drm:amdgpu_vce_cs_reloc [amdgpu]] *ERROR* BO to small for addr 0x010cf1e000 156 155

This actually looks familiar: current Mesa + VA-API has been doing this since a patch from March. I am testing with OMX, though, and have never seen it do this before. The issue I bisected produced no errors from the encoder, just a corrupt stream: it was playable and looked good to start with, but degraded over time, with the decoder throwing H.264 errors.
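For context on that error: as far as I can tell, amdgpu_vce_cs_reloc combines a low and a high 32-bit dword from the command stream into one 64-bit GPU address, finds the buffer object (BO) mapped there, and rejects the submission if the access would run past the end of that BO. The sketch below is a simplified Python model of that kind of check, not the amdgpu code; `Mapping`, `reloc_addr` and `check_reloc` are hypothetical names. It also shows that the address in the log, 0x010cf1e000, splits into a high dword of 0x1 and a low dword of 0x0cf1e000.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    """Hypothetical stand-in for a BO mapping in the GPU address space."""
    start: int  # first GPU address covered by the mapping
    size: int   # mapped size in bytes


def reloc_addr(lo: int, hi: int, size: int = 0, index: int = 0) -> int:
    """Build a 64-bit address from the low and high 32-bit dwords,
    then step to element `index` of `size` bytes."""
    return ((hi << 32) | lo) + size * index


def check_reloc(mapping: Mapping, addr: int, size: int) -> None:
    """Reject an access of `size` bytes at `addr` that the mapping cannot hold."""
    end = mapping.start + mapping.size
    if not mapping.start <= addr < end:
        raise LookupError(f"no BO mapped at addr 0x{addr:010x}")
    if addr + size > end:
        raise ValueError(f"BO too small for addr 0x{addr:010x}")
```

In this model a wrong high dword (or a stale base address) would make an otherwise valid access fail the lookup or the size check, which is the kind of symptom the patched kernel shows.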
Created attachment 133612: possible fix v2

Whoops, the original patch had a typo in it. Does this simplified version work any better?
Created attachment 133613: dmesg with v2 patch

No luck with v2. The errors are gone, but the original issue is the same.
Already following this, Alex, but I don't have the slightest idea either. Andy, could you, as a test, disable multiple-instance support in VCE? (I'd need to dig through the Mesa source as well, but I think Leo has asked for that multiple times, so you might know offhand how.) Apart from that, I would say let's dump all the calculated addresses on good and bad and see what is different.
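Comparing the addresses from the two kernels can be done mechanically rather than by eye. A minimal sketch, assuming the "fence driver on ring N use gpu addr ..." line format that appears in the dmesg output quoted later in this thread; the regex and function names are my own, and other address lines (VRAM, GTT, GART) would need their own patterns.

```python
import re

# Matches dmesg lines of the form
# "... fence driver on ring 12 use gpu addr 0x0000000000821f40, cpu addr 0x..."
FENCE_RE = re.compile(r"fence driver on ring (\d+) use gpu addr (0x[0-9a-fA-F]+)")


def fence_addrs(log: str) -> dict[int, int]:
    """Map ring number -> fence GPU address found in a dmesg dump."""
    return {int(ring): int(addr, 16) for ring, addr in FENCE_RE.findall(log)}


def changed_rings(good: str, bad: str) -> dict[int, tuple[int, int]]:
    """Rings present in both logs whose fence GPU address differs.

    Only the GPU address is compared; the CPU address is ignored.
    """
    g, b = fence_addrs(good), fence_addrs(bad)
    return {r: (g[r], b[r]) for r in sorted(g.keys() & b.keys()) if g[r] != b[r]}
```

Usage would be along the lines of `changed_rings(open("dmesg-good").read(), open("dmesg-bad").read())`, where the file names are placeholders for the two saved logs.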
Disabling dual instance does avoid it.
This seems to work OK on current drm-next-4.15-wip, though I don't know yet whether that's luck. Performance is very slightly lower, and I haven't been testing every iteration of new kernels because I've been busy testing VCE stuff. There is also a powerplay/display regression on this kernel, unrelated to VCE, which I'll try to track down later and file a bug for.
Oops, the issue does still exist. I pasted the wrong command line, which also explains why it was slightly slower.
Re-reading this, I notice I didn't paste the full diff between good and bad, so here's a bit more (diff good bad). Although other rings vary a bit in the second field (cpu addr), ring 12 (VCE?) is the only one that differs in the first field (gpu addr):

< amdgpu 0000:01:00.0: fence driver on ring 12 use gpu addr 0x0000000000821f40, cpu addr 0xffffc9000364ef40
---
> amdgpu 0000:01:00.0: fence driver on ring 12 use gpu addr 0x000000f400821f40, cpu addr 0xffffc9000104ef40
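The ring 12 fence address only changes in its upper half. The sketch below, using the values from the logs, shows that the fence keeps the same 0x821f40 offset into VRAM in both layouts, and that in the bad layout dropping the upper 32 bits would yield an address inside the GTT aperture instead. This only illustrates one failure mode suggested by the numbers; the thread does not say that 32-bit truncation is the root cause, and `low32` is a hypothetical helper.

```python
# Values copied from the ring 12 fence lines and the "bad" VRAM/GTT ranges.
GOOD_FENCE = 0x0000000000821F40
BAD_FENCE = 0x000000F400821F40
BAD_VRAM_BASE = 0x000000F400000000
BAD_GTT = (0x0000000000000000, 0x00000000BFFFFFFF)  # inclusive range


def low32(addr: int) -> int:
    """The address left over if only its low 32 bits survive."""
    return addr & 0xFFFFFFFF


# The fence sits at the same offset into VRAM in both layouts; only the base moved.
offset_into_vram = BAD_FENCE - BAD_VRAM_BASE

# Hypothetical failure mode: if something kept only the low 32 bits of the
# bad-layout address, it would point into the GTT range instead of VRAM.
truncated = low32(BAD_FENCE)
lands_in_gtt = BAD_GTT[0] <= truncated <= BAD_GTT[1]

print(f"offset into VRAM: 0x{offset_into_vram:x}")
print(f"truncated: 0x{truncated:08x}, inside bad GTT: {lands_in_gtt}")
```

In the good layout VRAM starts at 0, so the same truncation would leave the address unchanged, which would be consistent with the problem only appearing after the bisected commit.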
OK with current 4.17-wip
Good that we finally found the root cause and thanks for testing.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.