| Summary: | [amd-staging-drm-next] SDDM screen corruption (not usable) with RX580, amdgpu, dc=1 (of course), regression - [bisected] | | |
|---|---|---|---|
| Product: | DRI | Reporter: | Dieter Nützel <Dieter> |
| Component: | DRM/AMDgpu | Assignee: | Default DRI bug account <dri-devel> |
| Status: | CLOSED FIXED | QA Contact: | |
| Severity: | critical | | |
| Priority: | medium | CC: | alexdeucher, andrey.grodzovsky, ckoenig.leichtzumerken, sndirsch |
| Version: | unspecified | | |
| Hardware: | x86-64 (AMD64) | | |
| OS: | Linux (All) | | |
Description
Dieter Nützel
2018-09-29 03:38:40 UTC
Created attachment 141787: SDDM corruption 4.18.0-rc1 (with some recognizable parts)
Created attachment 141788: SDDM corruption 4.19.0-rc1
Created attachment 141789: dmesg-4.18.0-rc1-1.g7262353-default+.log
Created attachment 141790: dmesg-4.18.0-rc1-1.g7262353-default+.log3
Created attachment 141791: dmesg-4.19.0-rc1-1.g7262353-default+.log-25.10
Created attachment 141792: dmesg-4.19.0-rc1-1.g7262353-default+.log-25.09
Created attachment 141793: Xorg.0.log.25.09

(In reply to Dieter Nützel from comment #6)
> Created attachment 141792 [details]
> dmesg-4.19.0-rc1-1.g7262353-default+.log-25.09

Hi, could this be a hint?

    [ 6.716492] [drm] Fence fallback timer expired on ring kiq_2.1.0

Should I send a log with 'amdgpu.dc_log=1 drm.debug=6'?

*** Bug 108533 has been marked as a duplicate of this bug. ***

You have plenty of display managers and desktops in the Linux world. KDE stuff is slow, buggy and uses a lot of hardware resources. The Xfce desktop with lightdm and the Whisker menu is stable, fast, light and freely configurable. I have never had desktop problems with the amdgpu driver in four years of using it. I have an RX560, and the mainline kernel 4.19.0 from kernel.org works fine. Use Mesa git too, like the Oibaf PPA Mesa.

Can you bisect 4.19?

(In reply to Alex Deucher from comment #11)
> Can you bisect 4.19?

Well, I'll try that, too.
I'm currently trying amd-staging-drm-next, again.
I have some trouble with 4.19 final on my main home server (32 bit, pae), too.
Now I'm back to 4.18.16 on all systems.
Maybe I'll have some results tomorrow.

Thanks, Alex!

DONE - amd-staging-drm-next

    964d0fbf6301d3dc8dfad19ffab5a06d002d27f1 is the first bad commit
    commit 964d0fbf6301d3dc8dfad19ffab5a06d002d27f1
    Author: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
    Date:   Fri Jul 6 14:16:54 2018 -0400

        drm/amdgpu: Allow to create BO lists in CS ioctl v3

        This change is to support MESA performace optimization.
        Modify CS IOCTL to allow its input as command buffer and an array of
        buffer handles to create a temporay bo list and then destroy it
        when IOCTL completes.
        This saves on calling for BO_LIST create and destry IOCTLs in MESA
        and by this improves performance.

        v2: Avoid inserting the temp list into idr struct.

        v3:
        Remove idr alloation from amdgpu_bo_list_create.
        Remove useless argument from amdgpu_cs_parser_fini
        Minor cosmetic stuff.
        v4: Revert amdgpu_bo_list_destroy back to static

        Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
        Reviewed-by: Christian König <christian.koenig@amd.com>
        Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
        Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

    :040000 040000 d621cc2fb523ffcaa877faa8ae682878c268478e 6a758c959d69df05023339fa981b067fa027875c M drivers
    :040000 040000 7f32e65fd49cb9305b5c7440b161771f429aad09 9137d92f44e0b34512b3104141412595da64ce96 M include

But 'git revert 964d0fbf6301' does NOT work on amd-staging-drm-next:

    error: could not revert 964d0fbf6301... drm/amdgpu: Allow to create BO lists in CS ioctl v3
    hint: after resolving the conflicts, mark the corrected paths
    hint: with 'git add <paths>' or 'git rm <paths>'
    hint: and commit the result with 'git commit'

    SOURCE/amd-staging-drm-next> git status
    On branch amd-staging-drm-next
    Your branch is up to date with 'origin/amd-staging-drm-next'.

    You are currently reverting commit '964d0fbf6301'.
      (fix conflicts and run "git revert --continue")
      (use "git revert --abort" to cancel the revert operation)

    Changes to be committed:
      (use "git reset HEAD <file>..." to unstage)

        modified:   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
        modified:   include/uapi/drm/amdgpu_drm.h

    Unmerged paths:
      (use "git reset HEAD <file>..." to unstage)
      (use "git add/rm <file>..." as appropriate to mark resolution)

        both modified:   drivers/gpu/drm/amd/amdgpu/amdgpu.h
        both modified:   drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
        both modified:   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

Which Git commit of Mesa are you using? Any local patches on top?

(In reply to Michel Dänzer from comment #14)
> Which Git commit of Mesa are you using? Any local patches on top?

Every, since ever, as always (even, since _before_ Aug 22, 2018) ...;-)

But kidding aside, _currently_ #0ff1ccca25 (with merged branch from Marek for testing purposes)

    04ba4eae68 (HEAD -> ext_gpu_shader4) Merge branch 'ext_gpu_shader4' of git://people.freedesktop.org/~mareko/mesa into ext_gpu_shader4
    0ff1ccca25 (origin/master, origin/HEAD, master) radv: call nir_link_xfb_varyings()

Has it something to do with the DRM version?
DRM 3.26.0 (4.18) vs. DRM 3.27.0 (4.19)?

[-]
    diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
    index 06aede1..529500c 100644
    --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
    +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
    @@ -69,9 +69,10 @@
      * - 3.24.0 - Add high priority compute support for gfx9
      * - 3.25.0 - Add support for sensor query info (stable pstate sclk/mclk).
      * - 3.26.0 - GFX9: Process AMDGPU_IB_FLAG_TC_WB_NOT_INVALIDATE.
    + * - 3.27.0 - Add new chunk to to AMDGPU_CS to enable BO_LIST creation.
      */
     #define KMS_DRIVER_MAJOR 3
    -#define KMS_DRIVER_MINOR 26
    +#define KMS_DRIVER_MINOR 27
     #define KMS_DRIVER_PATCHLEVEL 0
[-]

But this commit is _in_ since

    author    Andrey Grodzovsky <andrey.grodzovsky@amd.com> 2018-07-06 14:16:54 -0400
    committer Alex Deucher <alexander.deucher@amd.com> 2018-07-16 15:29:47 -0500

and I had it running (on stable and amd-staging-drm-next, daily), even with AMD testing code (Huang Rui ray.huang at amd.com), Aug 16, 2018.
https://lists.freedesktop.org/archives/amd-gfx/2018-August/025411.html

Do you need more logs?
With which kernel parameter?
System _is_ running, but with unusable gfx/dri screen.

Question is do you have " winsys/amdgpu: pass the BO list via the CS ioctl on DRM >= 3.27.0" commit in your MESA tree ? I am not clear on that.

(In reply to Andrey Grodzovsky from comment #16)
> Question is do you have " winsys/amdgpu: pass the BO list via the CS ioctl
> on DRM >= 3.27.0" commit in your MESA tree ? I am not clear on that.

461a864316 winsys/amdgpu: pass the BO list via the CS ioctl on DRM >= 3.27.0

    commit 461a864316d5b70ea99c9e1dba7d71973af2aacc
    Author: Marek Olšák <marek.olsak@amd.com>
    Date:   Thu Jul 12 00:50:52 2018 -0400

        winsys/amdgpu: pass the BO list via the CS ioctl on DRM >= 3.27.0

Any other ideas?
Thank you, Andrey, for looking into it!

Andrey, can you work with Dieter to figure out where the error is coming from? E.g. by attaching patches adding debugging printks.

(In reply to Michel Dänzer from comment #18)
> Andrey, can you work with Dieter to figure out where the error is coming
> from? E.g. by attaching patches adding debugging printks.

Yes, I will look into it.

Make sure to load the right version of libdrm (i.e. 2.4.93 or more recent). I had this problem today because I was loading an old version of libdrm. Something was installed in the wrong place.

Please load the driver in debug mode so I can see the error code value in dmesg. When loading the kernel, add drm.debug=0xff

Also, to trace where exactly the error originated, please install trace-cmd and, before starting X (assuming you get the failure and the dmesg error right on start), run
    sudo trace-cmd start -p function_graph -l amdgpu_cs_ioctl
and get the output from /sys/kernel/debug/tracing/trace

(In reply to Andrey Grodzovsky from comment #21)
> Please load the driver in debug mode so I can see the error code value in
> dmesg -
> when loading the kernel add drm.debug=0xff
>
> Also to trace where exactly the error originated from please install
> trace-cmd and before starting X (assuming you get the failure and the dmesg
> error right on start)
> sudo trace-cmd start -p function_graph -l amdgpu_cs_ioctl
> and get the output from /sys/kernel/debug/tracing/trace

My bad, the correct command is
    sudo trace-cmd start -p function_graph -g amdgpu_cs_ioctl

(In reply to Samuel Pitoiset from comment #20)
> Make sure to load the right version of libdrm (ie. 2.4.93 or more recent).
> I had this problem today because I was loading an old version of libdrm.
> Something was installed in the wrong place.

Doh!
Sorry!

Ugh. Development systems...

My latest AMDGPU-PRO OpenCL (amdgpu-pro-18.30-635379-sle-12.tar.xz) installation brought me

    /opt/amdgpu/lib64/
    total 268
    drwxr-xr-x 2 root root  4096 18. Aug 06:31 .
    drwxr-xr-x 4 root root  4096 18. Aug 06:31 ..
    lrwxrwxrwx 1 root root    22  8. Aug 18:33 libdrm_amdgpu.so.1 -> libdrm_amdgpu.so.1.0.0
    -rwxr-xr-x 1 root root 69192  8. Aug 18:33 libdrm_amdgpu.so.1.0.0
    lrwxrwxrwx 1 root root    22  8. Aug 18:33 libdrm_radeon.so.1 -> libdrm_radeon.so.1.0.1
    -rwxr-xr-x 1 root root 68968  8. Aug 18:33 libdrm_radeon.so.1.0.1
    lrwxrwxrwx 1 root root    15  8. Aug 18:33 libdrm.so.2 -> libdrm.so.2.4.0
    -rwxr-xr-x 1 root root 99600  8. Aug 18:33 libdrm.so.2.4.0
    lrwxrwxrwx 1 root root    15  8. Aug 18:33 libkms.so.1 -> libkms.so.1.0.0
    -rwxr-xr-x 1 root root 22096  8. Aug 18:33 libkms.so.1.0.0

but the screen corruption first appeared around Aug 22, 2018.
So it worked 'halfway' with upstream _and_ AMDGPU-PRO libdrm.

Maybe the AMD developers could include a 'tag' or something like that to differentiate both versions?!

CONCLUSION
After deleting /opt/amdgpu/lib64/ ALL is fine, again.

(In reply to Andrey Grodzovsky from comment #22)
> (In reply to Andrey Grodzovsky from comment #21)
> > Please load the driver in debug mode so I can see the error code value in
> > dmesg -
> > when loading the kernel add drm.debug=0xff
> >
> > Also to trace where exactly the error originated from please install
> > trace-cmd and before starting X (assuming you get the failure and the dmesg
> > error right on start)
> > sudo trace-cmd start -p function_graph -l amdgpu_cs_ioctl
> > and get the output from /sys/kernel/debug/tracing/trace
>
> My bad, the correct command is
> sudo trace-cmd start -p function_graph -g amdgpu_cs_ioctl

Andrey, do you need these logs even after my comment #23?
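
The conclusion above (a second libdrm copy under /opt/amdgpu/lib64/ being picked up) can be checked on a running system by looking at which libdrm files a process has actually mapped. The following is a minimal sketch, assuming a Linux `/proc/<pid>/maps` layout; the function names `libdrm_paths` and `libdrm_paths_for_pid` are hypothetical helpers made up for illustration and are not part of libdrm, Mesa, or any tool mentioned in this report.

```python
import re
from pathlib import Path

# Matches file names such as libdrm.so.2, libdrm.so.2.4.0,
# libdrm_amdgpu.so.1.0.0 or libdrm_radeon.so.1 (hypothetical pattern).
_LIBDRM_RE = re.compile(r"^libdrm(_[a-z0-9]+)?\.so(\.\d+)*$")


def libdrm_paths(maps_text):
    """Return the sorted, de-duplicated libdrm paths found in maps-format text."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a pathname
        path = fields[5].strip()
        if _LIBDRM_RE.match(Path(path).name):
            found.add(path)
    return sorted(found)


def libdrm_paths_for_pid(pid="self"):
    """Read /proc/<pid>/maps (Linux only) and list the libdrm copies in use."""
    return libdrm_paths(Path(f"/proc/{pid}/maps").read_text())
```

Calling `libdrm_paths_for_pid()` with the PID of the X server or the display manager (this usually needs root) would then show whether a path such as /opt/amdgpu/lib64/libdrm_amdgpu.so.1.0.0 appears instead of the distribution's copy.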

(In reply to Dieter Nützel from comment #24)
> (In reply to Andrey Grodzovsky from comment #22)
> > (In reply to Andrey Grodzovsky from comment #21)
> > > Please load the driver in debug mode so I can see the error code value in
> > > dmesg -
> > > when loading the kernel add drm.debug=0xff
> > >
> > > Also to trace where exactly the error originated from please install
> > > trace-cmd and before starting X (assuming you get the failure and the dmesg
> > > error right on start)
> > > sudo trace-cmd start -p function_graph -l amdgpu_cs_ioctl
> > > and get the output from /sys/kernel/debug/tracing/trace
> >
> > My bad, the correct command is
> > sudo trace-cmd start -p function_graph -g amdgpu_cs_ioctl
>
> Andrey, do you need these logs even after my comment #23?

If everything is working fine after whatever you did, then no.

Fixed with

    winsys/amdgpu: Stop using amdgpu_bo_handle_type_kms_noimport

    It only behaves any different from amdgpu_bo_handle_type_kms with
    libdrm 2.4.93, and it breaks if an older version is picked up.

https://cgit.freedesktop.org/mesa/mesa/commit/?id=32b0eb51a310ef3d6605cdb31c70a10202463e6d
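
One reading of this thread is that the corruption needed two conditions at once: a kernel exposing DRM 3.27.0 or newer (so that Mesa passes the BO list via the CS ioctl) and a libdrm older than 2.4.93 being loaded at run time. The sketch below only illustrates that reasoning with the two version numbers quoted in the comments; it is an interpretation, not code from Mesa, libdrm, or the kernel, and the names `parse_version` and `likely_affected` are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version such as '3.27.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Thresholds quoted in the thread (assumed to be the relevant ones).
DRM_BO_LIST_IN_CS = (3, 27, 0)  # DRM version from the Mesa commit title
LIBDRM_REQUIRED = (2, 4, 93)    # libdrm release named in the comments


def likely_affected(drm_version, libdrm_version):
    """Hypothetical check: new CS path active while an older libdrm is loaded."""
    uses_new_cs_path = parse_version(drm_version) >= DRM_BO_LIST_IN_CS
    libdrm_too_old = parse_version(libdrm_version) < LIBDRM_REQUIRED
    return uses_new_cs_path and libdrm_too_old
```

Comparing integer tuples rather than raw strings matters here, because a string comparison would rank "2.4.100" below "2.4.93".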