Created attachment 144901
Model: Lenovo Ideapad S340 15"
CPU: AMD Ryzen 5 3500U
Starting with kernel 5.2, the laptop has a blank display after resuming from suspend. The problem doesn't appear with recent kernels up to 5.1.16. Attached are a kernel log and git bisect logs.
Created attachment 144902
first bisect log
Created attachment 144903
second bisect log
The fact that you got two different bisection results indicates that the problem might not be 100% reproducible, and that you accidentally marked some affected commits as good. Please test a given commit longer / more often before declaring it "good".
The first bisect pointed to a merge commit so the second was done to bisect within the merged commits.
(In reply to cspack from comment #4)
> The first bisect pointed to a merge commit so the second was done to bisect
> within the merged commits.
That doesn't invalidate my previous comment. :) git bisect identifying a merge commit already indicates the same thing by itself. In particular, the fact that it identified a merge commit means that you declared all of its parent commits good.
(There *are* rare cases where a problem is actually introduced by a merge commit itself, but then the second bisection should have either identified the same merge commit again (if you tested it again), or failed, because all the other commits you tested should have been good again.)
I see your point, and you are correct. It seems the issue is not 100% reproducible. I will redo the bisect and test more thoroughly. Thank you.
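For anyone repeating the bisect, here is a rough sketch of how many clean suspend/resume cycles it takes before marking a commit "good" is reasonably safe. It assumes each cycle on an affected commit fails independently with a fixed probability. That is an assumption, and the real failure rate in this bug is unknown. The function names and the example rates are hypothetical.

```python
import math


def false_good_probability(failure_rate: float, trials: int) -> float:
    """Chance that an affected commit survives `trials` independent test cycles.

    `failure_rate` is the assumed per-cycle probability that the bug shows up.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if trials < 0:
        raise ValueError("trials must be non-negative")
    return (1.0 - failure_rate) ** trials


def trials_needed(failure_rate: float, max_false_good: float) -> int:
    """Smallest number of clean cycles that keeps the chance of a wrong
    'git bisect good' at or below `max_false_good`."""
    if not 0.0 < failure_rate <= 1.0:
        raise ValueError("failure_rate must be in (0, 1]")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be in (0, 1)")
    if failure_rate == 1.0:
        return 1  # a fully reproducible bug is caught by a single cycle
    return math.ceil(math.log(max_false_good) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # Hypothetical failure rates, for illustration only.
    for rate in (0.3, 0.5, 0.9):
        print(rate, trials_needed(rate, 0.05))
```

Under these assumptions, a bug that shows up on half of all resumes needs at least five clean cycles per commit to keep the risk of a wrong "good" below 5%.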
@cspack I am currently repeating your bisection on similar hardware, however I have found 27eaa4927dc3be669ed70670241597ac73595caf to be bad. Could you please retest that commit as well?
Created attachment 144928
Result of git bisect
Model: HP EliteBook 745 G5
CPU/GPU: AMD Ryzen 7 PRO 2700U
I completed my bisection and this is the log.
The first bad commit seems to be this one. It's actually a fairly innocent commit, so it's probably causing a bug somewhere else.
df8368be1382b442384507a5147c89978cd60702 is the first bad commit
Author: Nicholas Kazlauskas <email@example.com>
Date: Wed Feb 27 12:56:36 2019 -0500
drm/amdgpu: Bump amdgpu version for per-flip plane tiling updates
To help xf86-video-amdgpu and mesa know DC supports updating the
tiling attributes for a framebuffer per-flip.
Cc: Michel Dänzer <firstname.lastname@example.org>
Signed-off-by: Nicholas Kazlauskas <email@example.com>
Acked-by: Alex Deucher <firstname.lastname@example.org>
Reviewed-by: Marek Olšák <email@example.com>
Signed-off-by: Alex Deucher <firstname.lastname@example.org>
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Created attachment 144929
Kernel log displaying issue
Compressed here is my kernel log, which shows repeating stack traces.
Interestingly, the same commit is blamed for another issue.
@Samuele Yes, after redoing the bisect I got the same result as you did. Thanks.
One thing to note, the problem doesn't seem to occur for me if a compositor isn't running. In my case, after disabling compton I could not reproduce the problem.
Similar thing for me: disabling composition in Plasma makes suspend/resume work again.
Another workaround is to switch to the text terminal (Ctrl+Alt+F2) before suspending.
Occurring on ThinkPad T495, Ryzen 3700U, openSUSE Tumbleweed (kernel 5.2.2-1)
Are there any other known workarounds? Perhaps some kernel options?
This commit in xf86-video-amdgpu seems to be where things break: https://github.com/freedesktop/xorg-xf86-video-amdgpu/commit/a2b32e72fdaff3007a79b84929997d8176c2d512
Adding amdgpu.dc=1 to kernel options seems to fix the issue for me.
Created attachment 144967
xf86-video-amdgpu git bisect
(In reply to cspack from comment #16)
> Adding amdgpu.dc=1 to kernel options seems to fix the issue for me.
Presumably you mean amdgpu.dc=0 ?
Your findings indicate that the kernel driver DC code doesn't handle flipping between buffers with different tiling parameters correctly in some cases.
With amdgpu.dc=0, X doesn't start ((EE) AMDGPU(0): No modes.)
(In reply to cspack from comment #19)
> With amdgpu.dc=0, X doesn't start ((EE) AMDGPU(0): No modes.)
Right (I realized the amdgpu kernel driver doesn't support display with your GPU without DC), but amdgpu.dc=1 is the default. It was probably just luck that it worked once, which is why your first bisect attempts failed.
The default is -1 according to the docs and /sys/module/amdgpu/parameters/dc. I assume it should effectively be the same but it seems to result in different behavior vs. setting it to 1. DC is enabled in both cases (the log shows "Display Core initialized"), but leaving it at the default results in a suspend/resume failure 100% of the time, whereas setting it to 1 results in success most of the time, although it did eventually fail after several reboots. Very strange.
(In reply to cspack from comment #21)
> The default is -1 according to the docs and
What I meant is it's enabled by default for you, so amdgpu.dc=1 has no effect.
> I assume it should effectively be the same but it seems to result in different
> behavior vs. setting it to 1.
The different behaviour is just luck, which is why you had trouble bisecting initially, not related to amdgpu.dc=1.
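To make the -1 / 0 / 1 discussion above concrete, here is a minimal sketch that reads the parameter file mentioned in this thread and labels the value. The meanings follow the amdgpu module parameter description (-1 automatic, 0 disable, 1 enable). The helper names are hypothetical. Note that "auto" only reflects what was requested: whether DC is actually in use still has to be confirmed from the kernel log ("Display Core initialized").

```python
from pathlib import Path
from typing import Optional

# Path taken from the comment above; only present when amdgpu is loaded.
DC_PARAM_PATH = Path("/sys/module/amdgpu/parameters/dc")


def describe_dc_setting(raw: str) -> str:
    """Translate the raw amdgpu.dc value into a human-readable label."""
    value = int(raw.strip())
    if value == -1:
        return "auto (driver decides per ASIC)"
    if value == 0:
        return "disabled"
    if value == 1:
        return "enabled"
    raise ValueError(f"unexpected amdgpu.dc value: {value}")


def read_dc_setting(path: Path = DC_PARAM_PATH) -> Optional[str]:
    """Return the described setting, or None if the file cannot be read."""
    try:
        return describe_dc_setting(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_dc_setting())
```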
amdgpu.dc=1 had no effect on my machine. On my computer resume fails quite consistently.
Any idea what should be done to fix this, or even what the cause is?
Having the same issue on a ThinkPad T495s (BIOS 1.06) with a Ryzen 7 PRO 3700U, Kernel 5.2.8-arch1-1-ARCH, Mesa 19.1.4-1 and running sway (wayland) as a window manager.
dmesg shows me:
[drm] Fence fallback timer expired on ring sdma0
amdgpu 0000:05:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
[drm:amdgpu_device_ip_late_init_func_handler [amdgpu]] *ERROR* ib ring test failed (-110).
One thing to note is that setting amd_iommu=off as a kernel parameter makes this issue really rare but it'll still sometimes happen, maybe it's also just luck.
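As a side note on the numbers in those messages: -110 is the negated Linux errno ETIMEDOUT, meaning the ring tests timed out rather than returning wrong results. Below is a small sketch for pulling such errors out of a saved dmesg log. The regular expression and the function name are hypothetical and only match the `*ERROR* ... (-N)` shape seen in the lines above.

```python
import re
from typing import List, Tuple

# Linux errno values; only the one seen in this report is listed.
LINUX_ERRNO_NAMES = {110: "ETIMEDOUT"}

# Matches "*ERROR* <message> (-<code>)" as in the dmesg lines quoted above.
ERROR_RE = re.compile(r"\*ERROR\*\s*(?P<msg>.*?)\s*\((?P<code>-\d+)\)")


def decode_amdgpu_errors(log_text: str) -> List[Tuple[str, str]]:
    """Return (message, errno name) pairs for each *ERROR* line with a code."""
    results = []
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match is None:
            continue  # e.g. the "Fence fallback timer expired" line has no code
        code = -int(match.group("code"))
        name = LINUX_ERRNO_NAMES.get(code, f"errno {code}")
        results.append((match.group("msg"), name))
    return results


if __name__ == "__main__":
    sample = (
        "[drm] Fence fallback timer expired on ring sdma0\n"
        "amdgpu 0000:05:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] "
        "*ERROR* IB test failed on gfx (-110).\n"
        "[drm:amdgpu_device_ip_late_init_func_handler [amdgpu]] "
        "*ERROR* ib ring test failed (-110).\n"
    )
    for message, name in decode_amdgpu_errors(sample):
        print(f"{name}: {message}")
```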
(In reply to miba_c from comment #24)
> Having the same issue on a ThinkPad T495s (BIOS 1.06) with a Ryzen 7 PRO
> 3700U, Kernel 5.2.8-arch1-1-ARCH, Mesa 19.1.4-1 and running sway (wayland)
> as a window manager.
> dmesg shows me:
> [drm] Fence fallback timer expired on ring sdma0
> amdgpu 0000:05:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test
> failed on gfx (-110).
> [drm:amdgpu_device_ip_late_init_func_handler [amdgpu]] *ERROR* ib ring test
> failed (-110).
> One thing to note is that setting amd_iommu=off as a kernel parameter makes
> this issue really rare but it'll still sometimes happen, maybe it's also
> just luck.
Please attach the full log, also it looks log.
Created attachment 145065
failed suspend log
Attached the full log.
FWIW, downgrading to 5.1.16 seems to fix the issue here too.