Bug 106940 - Black screen on KMS with 4.18.0-rc1 with Kaveri+Topaz, amdgpu, dc=1
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium blocker
Assignee: Default DRI bug account
Reported: 2018-06-17 11:47 UTC by SET
Modified: 2019-11-19 08:41 UTC (History)

Attachments
Bad .config (191.56 KB, text/plain)
2018-06-18 11:10 UTC, SET
Bad dmesg (83.54 KB, text/plain)
2018-06-18 11:11 UTC, SET
Bad Xorg log (32.24 KB, text/plain)
2018-06-18 11:12 UTC, SET
Good .config (189.89 KB, text/plain)
2018-06-18 11:13 UTC, SET
Good dmesg (83.42 KB, text/plain)
2018-06-18 11:13 UTC, SET
Good Xorg log (32.81 KB, text/x-log)
2018-06-18 11:13 UTC, SET
Good and bad git commits (95 bytes, text/plain)
2018-06-18 11:14 UTC, SET
Bisection results (2.47 KB, text/plain)
2018-06-20 21:00 UTC, SET
patch to test (650 bytes, patch)
2018-08-15 18:22 UTC, Alex Deucher
dmesg output after patch (4.76 KB, text/plain)
2018-08-15 19:55 UTC, SET
kernel ooops with attached patch on amd-staging-drm-next branch. (7.70 KB, text/plain)
2018-08-16 17:55 UTC, Przemek
disable eDP optimization on DCE8 (1.35 KB, patch)
2018-08-16 20:37 UTC, Alex Deucher

Description SET 2018-06-17 11:47:23 UTC
Since kernel 4.18.0-rc1, my laptop (Kaveri+Topaz GPUs, Xorg 1.20.0) boots to a black screen. The system boots normally in the background: I can log in blindly in sddm, press CTRL+ALT+DEL once KDE has started invisibly, and shut down the laptop with ENTER. I'm using the following module options:

install radeon /bin/false
options amdgpu dc=1 si_support=1 cik_support=1
options radeon si_support=0 cik_support=0

Everything worked fine until 4.18.0-rc1.

If I boot on another partition, I can inspect Xorg.0.log of the main partition, which contains no errors.

With no options at all, or with dc=0, I get a usable KDE session, except that suspend fails quite often.

Thanks for any input to resolve this. I can give more information upon your instructions.
Comment 1 SET 2018-06-17 11:52:44 UTC
This happens on both early and late KMS (Arch Linux).
Comment 2 Michel Dänzer 2018-06-18 08:30:16 UTC
Can you bisect?
Comment 3 Michel Dänzer 2018-06-18 08:47:43 UTC
Please attach the following for 4.18.0-rc1 and the last working kernel: The dmesg output, the .config file and the Xorg log file.
Comment 4 SET 2018-06-18 11:10:56 UTC
Created attachment 140195 [details]
Bad .config
Comment 5 SET 2018-06-18 11:11:54 UTC
Created attachment 140197 [details]
Bad dmesg
Comment 6 SET 2018-06-18 11:12:29 UTC
Created attachment 140198 [details]
Bad Xorg log
Comment 7 SET 2018-06-18 11:13:05 UTC
Created attachment 140199 [details]
Good .config
Comment 8 SET 2018-06-18 11:13:32 UTC
Created attachment 140200 [details]
Good dmesg
Comment 9 SET 2018-06-18 11:13:57 UTC
Created attachment 140201 [details]
Good Xorg log
Comment 10 SET 2018-06-18 11:14:33 UTC
Created attachment 140202 [details]
Good and bad git commits
Comment 11 SET 2018-06-18 14:23:48 UTC
I could bisect incompletely, until this error happens: 

../lib/str_error_r.c:25:3: error: passing argument 1 to restrict-qualified parameter aliases with argument 5 [-Werror=restrict]

So far :

bad : ce397d215ccd07b8ae3f71db689aedb85d56ab40
bad : 1c8c5a9d38f607c0b6fd12c91cbe1a4418762a21
bad : 135c5504a600ff9b06e321694fbcac78a9530cd4
bad : 315852b422972e6ebb1dfddaadada09e46a2681a

testing : 13b75aac5dd9a6448417769c43d21b2343ce1cc8 - can't compile

good : 6f2db7dc901a1b89fbc50f7b38f0f7ee17205703
good : e71a82d8c1fa28ab048227df929e4f07d98f1656
good : 5231804cf9e584f3e7e763a0d6d2fffe011c1bce
good : 29dcea88779c856c7dc92040a0c01233263101d4

In all the good commits, CONFIG_DRM_AMD_DC_PRE_VEGA was set to y. In all the bad commits, CONFIG_DRM_AMD_DC_PRE_VEGA is no longer an available option.
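
A quick way to confirm this difference between two saved configs is sketched below. The helper name config_state and the file names good.config and bad.config are placeholders chosen for illustration, not names taken from the attachments.

```shell
# Hypothetical helper: show how one Kconfig symbol is set in a .config file.
# Prints the matching line ("SYM=y" or "# SYM is not set"),
# or "SYM is absent" when the option does not appear at all.
config_state() {
    sym="$1"
    file="$2"
    grep -E "^(# )?${sym}(=| is not set)" "$file" || echo "${sym} is absent"
}

# Example calls (file names are placeholders):
# config_state CONFIG_DRM_AMD_DC_PRE_VEGA good.config
# config_state CONFIG_DRM_AMD_DC_PRE_VEGA bad.config
```

A symbol that is reported as absent in the bad config, rather than "is not set", suggests the option was removed from Kconfig rather than merely disabled.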
Comment 12 Felix Schwarz 2018-06-18 15:31:30 UTC
(In reply to SET from comment #11)
> I could bisect incompletely, until this error happens: 
> 
> ../lib/str_error_r.c:25:3: error: passing argument 1 to restrict-qualified
> parameter aliases with argument 5 [-Werror=restrict]

use "git bisect skip" for that version.
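
When part of a bisection step can be scripted, "git bisect run" offers the same skip behaviour through exit codes: 0 marks a commit good, 125 asks git to skip it, and any other status from 1 to 127 marks it bad. A minimal sketch follows; the function name bisect_verdict and both command arguments are assumptions for illustration. A black-screen check still needs a manual boot, so in this bug the manual "git bisect skip" remains the practical route.

```shell
# Sketch of the decision a "git bisect run" script has to make.
# $1 is a build command and $2 a test command; both are placeholders.
bisect_verdict() {
    build_cmd="$1"
    test_cmd="$2"
    sh -c "$build_cmd" || return 125   # build failed: commit cannot be tested, skip it
    sh -c "$test_cmd"  || return 1     # test failed: commit is bad
    return 0                           # build and test passed: commit is good
}
```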
Comment 13 SET 2018-06-19 20:50:09 UTC
I'm giving up on bisecting, it's getting out of control, sorry.
Comment 14 Michel Dänzer 2018-06-20 07:50:18 UTC
(In reply to SET from comment #13)
> I'm giving up on bisecting, it's getting out of control, sorry.

What's the problem?
Comment 15 SET 2018-06-20 08:39:25 UTC
After narrowing good and bad commits between June 4th and 6th, it started to test commits in April, then December. Some commits don't compile and must be skipped. This is going endless, can't continue.
Comment 16 Michel Dänzer 2018-06-20 09:33:44 UTC
(In reply to SET from comment #15)
> After narrowing good and bad commits between June 4th and 6th, it started to
> test commits in April, then December.

That's normal, due to the non-linear Git history of the Linux kernel.

> Some commits don't compile and must be skipped. This is going endless, can't
> continue.

It will finish eventually.

Can you attach the current output of

 git bisect log

and tell us the commit it wants to test next?
Comment 17 SET 2018-06-20 12:35:29 UTC
This is where I'm up to :

git bisect start
# bad: [135c5504a600ff9b06e321694fbcac78a9530cd4] Merge tag 'drm-next-2018-06-06-1' of git://anongit.freedesktop.org/drm/drm
git bisect bad 135c5504a600ff9b06e321694fbcac78a9530cd4
# good: [c76f0b2cc2f1be1a8a20f0fe2c0f30919bc559fb] Merge tag 'drm-amdkfd-next-2018-05-14' of git://people.freedesktop.org/~gabbayo/linux into drm-next
git bisect good c76f0b2cc2f1be1a8a20f0fe2c0f30919bc559fb
# good: [92400b8c8b42e53abb0fcb4ac75cb85d4177a891] Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 92400b8c8b42e53abb0fcb4ac75cb85d4177a891
# good: [07c4dd3435aa387d3b58f4e941dc516513f14507] Merge tag 'usb-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
git bisect good 07c4dd3435aa387d3b58f4e941dc516513f14507
# bad: [cac18c82e0c5b39b69648942576dbd1d6f9d056e] drm/amdgpu: Specify vega20 uvd firmware
git bisect bad cac18c82e0c5b39b69648942576dbd1d6f9d056e
# bad: [cac18c82e0c5b39b69648942576dbd1d6f9d056e] drm/amdgpu: Specify vega20 uvd firmware
git bisect bad cac18c82e0c5b39b69648942576dbd1d6f9d056e
# bad: [cac18c82e0c5b39b69648942576dbd1d6f9d056e] drm/amdgpu: Specify vega20 uvd firmware
git bisect bad cac18c82e0c5b39b69648942576dbd1d6f9d056e
# bad: [cac18c82e0c5b39b69648942576dbd1d6f9d056e] drm/amdgpu: Specify vega20 uvd firmware
git bisect bad cac18c82e0c5b39b69648942576dbd1d6f9d056e
# skip: [7ab3fdde04218c4733e96712b651751c413d51c3] drm/amd/display: Update MST edid property every time
git bisect skip 7ab3fdde04218c4733e96712b651751c413d51c3
# skip: [e930793280799e66c3197e2ee6e70b1129f8aa12] drm/amdgpu: add VEGAM pci ids
git bisect skip e930793280799e66c3197e2ee6e70b1129f8aa12
# good: [53f071e19d566e7d0a4eada1bd8313a4cdb660a4] Merge drm/drm-next into drm-intel-next-queued
git bisect good 53f071e19d566e7d0a4eada1bd8313a4cdb660a4
# good: [53f071e19d566e7d0a4eada1bd8313a4cdb660a4] Merge drm/drm-next into drm-intel-next-queued
git bisect good 53f071e19d566e7d0a4eada1bd8313a4cdb660a4
# skip: [ba8f7ad0e5b25851299cd45a63b57d843e50b577] drm/amdgpu: add VEGAM UVD firmware support
git bisect skip ba8f7ad0e5b25851299cd45a63b57d843e50b577
# skip: [d10fb4a6f382474025f326bf90ee3b64396486ea] drm/amd/pp: Change pstate_clk frequency unit to 10KHz on Rv
git bisect skip d10fb4a6f382474025f326bf90ee3b64396486ea

I'll keep trying, hoping it ends before it is fixed by some other changes.
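
A log like the one above repeats some entries, so it can help to count the distinct commits per verdict. A minimal sketch, assuming the log was saved with "git bisect log > bisect.log"; the function name bisect_summary is made up for this example.

```shell
# Count the distinct commits marked good, bad and skipped in a saved bisect log.
# $1 is the path to the saved log file.
bisect_summary() {
    for verdict in good bad skip; do
        # sort -u drops repeated entries; grep -c exits non-zero on a zero count,
        # hence the "|| true" so the loop keeps going.
        count=$(sort -u "$1" | grep -c "^git bisect $verdict " || true)
        printf '%s: %s\n' "$verdict" "$count"
    done
}

# Example call (file name is a placeholder):
# bisect_summary bisect.log
```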
Comment 18 Michel Dänzer 2018-06-20 15:12:53 UTC
What's the compile error you're getting with the current commit?
Comment 19 SET 2018-06-20 17:11:43 UTC
It's the same error for every commit I had to skip :

../lib/str_error_r.c:25:3: error: passing argument 1 to restrict-qualified parameter aliases with argument 5 [-Werror=restrict]
   snprintf(buf, buflen, "INTERNAL ERROR: strerror_r(%d, %p, %zd)=%d", errnum, buf, buflen, err);
   ^~~~~~~~
cc1: all warnings being treated as errors (this line is translated by me)
Comment 20 Alex Deucher 2018-06-20 17:25:24 UTC
It's a bug in new gcc.  You can apply this patch as a workaround:
https://github.com/torvalds/linux/commit/854e55ad289ef8888e7991f0ada85d5846f5afb9#diff-0b8e91d818ef68ac30763b79d9fabbad
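
One common way to carry such a workaround through a bisection is to apply it without committing, build and test, then drop it again before marking the commit. The sketch below assumes the fix is available as a commit in the same repository; the function names with_fix and undo_fix are invented for illustration, and a cherry-pick may still conflict on some older commits.

```shell
# Temporarily apply a fix commit on top of the commit currently being tested.
# $1 is the commit that carries the workaround.
with_fix() {
    git cherry-pick --no-commit "$1"
}

# Discard the temporary change so git bisect can move cleanly to the next commit.
undo_fix() {
    git reset --hard
}

# Typical sequence at each step (build and boot test are manual here):
#   with_fix 854e55ad289ef8888e7991f0ada85d5846f5afb9
#   ... build, boot, check the screen ...
#   undo_fix
#   git bisect good    # or: git bisect bad
```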
Comment 21 SET 2018-06-20 20:59:40 UTC
After applying the patch, bisection moved on reasonably. See the attached file bisect.log.
Comment 22 SET 2018-06-20 21:00:38 UTC
Created attachment 140251 [details]
Bisection results
Comment 23 Server Angels 2018-06-26 16:44:40 UTC
Just to add - I had the same experience with rc1 on a Vega / Ryzen system. System is responsive over the network but black screen. 

This is with Fedora and rawhide kernels. 

What exact debug output would be useful apart from what the OP provided?
Comment 24 Alex Deucher 2018-06-26 17:23:16 UTC
first bad commit: [f0c0761b38ac30b04d4fed436ff10e894ec0e525] drm/amd/display: Use dig enable to determine fast boot optimization.
Comment 25 Server Angels 2018-07-01 18:46:05 UTC
FYI this bug is still present in rc2.
Comment 26 Server Angels 2018-08-06 10:08:09 UTC
This is still not fixed with rc7 / Fedora Rawhide kernel. Is there any chance this patch could be reverted, as it has made 4.18 unusable with Vega10?
Comment 27 Przemek 2018-08-13 12:26:39 UTC
Now it is in stable 4.18.
Same situation here. Gentoo ~amd64 on Lenovo G50-45 A6-6310 APU with R4 Mullins.

The laptop panel (eDP) is off, but there is a picture on an external monitor connected through the HDMI port.

Reverting the troublesome commit (drm/amd/display: Use dig enable to determine fast boot optimization.) makes things work again.
Comment 28 Alex Deucher 2018-08-15 18:22:18 UTC
Created attachment 141123 [details] [review]
patch to test

Does this patch fix the issue?
Comment 29 SET 2018-08-15 19:52:41 UTC
With the patch, the screen shows a normal boot.

However, the laptop hangs on suspend. Please see the dmesg attachment. Switching back to radeon.
Comment 30 Server Angels 2018-08-15 19:54:15 UTC
Actually rc8 from fedora/rawhide fixed it for me. I wasn't sure at first as the initial upgrade didn't work, but a full power off / power off then did, which was bizarre.

I did add the kernel option amdgpu.dc=1 to make sure it was using the correct code. All been fine since.
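
To double-check which value the loaded driver actually uses, the parameter can usually be read back from sysfs. A minimal sketch, assuming the module exposes the parameter under /sys/module; the function name module_param and the SYSFS_ROOT override are hypothetical and exist only to make the helper easy to try out.

```shell
# Report a module parameter as the running kernel sees it.
# $1 is the module name, $2 the parameter name.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
module_param() {
    path="${SYSFS_ROOT:-/sys}/module/$1/parameters/$2"
    if [ -r "$path" ]; then
        cat "$path"
    else
        echo "unavailable"
    fi
}

# Example call:
# module_param amdgpu dc
```

A value of 1 would mean DC is forced on and 0 forced off; "unavailable" means the module is not loaded or does not expose that parameter.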
Comment 31 SET 2018-08-15 19:55:07 UTC
Created attachment 141124 [details]
dmesg output after patch
Comment 32 Alex Deucher 2018-08-15 20:21:23 UTC
(In reply to Server Angels from comment #30)
> Actually rc8 from fedora/rawhide fixed it for me. I wasn't sure at first as
> the initial upgrade didn't work, but a full power off / power off then did,
> which was bizarre.
> 
> I did add the kernel option amdgpu.dc=1 to make sure it was using the
> correct code. All been fine since.

Are you saying the black screen is fixed with rc8, or that rc8 fixes the suspend issue and the patch is still required to fix the black screen?
Comment 33 SET 2018-08-15 20:30:28 UTC
(In reply to Alex Deucher from comment #32)
> (In reply to Server Angels from comment #30)
> > Actually rc8 from fedora/rawhide fixed it for me. I wasn't sure at first as
> > the initial upgrade didn't work, but a full power off / power off then did,
> > which was bizarre.
> > 
> > I did add the kernel option amdgpu.dc=1 to make sure it was using the
> > correct code. All been fine since.
> 
> > Are you saying the black screen is fixed with rc8 or that rc8 fixes the
> suspend issue and the patch is still required to fix the black screen?

You are mentioning the 'suspend issue' which is not mentioned in comment #30.
Just in case it's a click mistake :

For me, the black screen issue is fixed with the patch applied to -rc8. Without the patch to -rc8, the black screen issue is still here.
Then there is the suspend issue, which is not related to this thread.
Comment 34 Przemek 2018-08-15 22:00:37 UTC
(In reply to Alex Deucher from comment #28)
> Created attachment 141123 [details] [review]
> patch to test
> 
> Does this patch fix the issue?

Yes, this patch resolves the "eDP display turned off at boot time" issue on my machine (kernel: gentoo 4.18.1).
Thanks Alex,
Przemek.
Comment 35 Alex Deucher 2018-08-16 06:36:17 UTC
does the screen light up properly on my amd-staging-drm-next branch?
https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
Comment 36 Server Angels 2018-08-16 09:21:33 UTC
To be clear - rc8 (without any other patches) resolves my black screen at boot time issue.

I did have a black screen when resuming from suspend, which is now resolved, but I didn't think this was driver related as it also happened in Windows on the same machine. There has been an AMD Windows driver update recently too, so maybe the issue was resolved on both?
Comment 37 Przemek 2018-08-16 17:53:13 UTC
(In reply to Alex Deucher from comment #35)
> does the screen light up properly on my amd-staging-drm-next branch?
> https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next

No, it doesn't, moreover, I have kernel oops with amd-staging-drm-next, but have no idea if I should open another bug report or not.

OOPS log attached (from enabling amd modesetting).
Comment 38 Przemek 2018-08-16 17:55:40 UTC
Created attachment 141145 [details]
kernel ooops with attached patch on amd-staging-drm-next branch.
Comment 39 Alex Deucher 2018-08-16 18:05:44 UTC
(In reply to Przemek from comment #37)
> (In reply to Alex Deucher from comment #35)
> > does the screen light up properly on my amd-staging-drm-next branch?
> > https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
> 
> No, it doesn't, moreover, I have kernel oops with amd-staging-drm-next, but
> have no idea if I should open another bug report or not.

Yes, please file a new bug for that.  thanks.
Comment 40 Alex Deucher 2018-08-16 20:37:46 UTC
Created attachment 141151 [details] [review]
disable eDP optimization on DCE8

Not sure why this is causing a problem on DCE8.  Try this patch.
Comment 41 Andrey Arapov 2018-09-26 08:39:10 UTC
Not sure if that helps, but could you try reverting this commit?
https://github.com/torvalds/linux/commit/e03fd3f300f6184c1264186a4c815e93bf658abb

My MacBookPro Mid 2017 has shown the black screen issue from 4.18.0-rc1 up to 4.19.0-rc4 (and probably later). Reverting that commit resolved the black screen issue for me, allowing me to use amdgpu.dc=1 again.


Refs:

- https://github.com/Dunedan/mbp-2016-linux/issues/73#issuecomment-422397681
Comment 42 SET 2018-09-26 19:30:41 UTC
(In reply to Andrey Arapov from comment #41)
> Not sure if that helps, but could you try reverting this commit?
> https://github.com/torvalds/linux/commit/e03fd3f300f6184c1264186a4c815e93bf658abb
> 

Reverted that commit, still a black screen with 4.19-rc4 in the above context.
I'm using radeon to keep the laptop usable.
Comment 43 Nicholas Kazlauskas 2018-09-27 13:20:26 UTC
Did you get around to trying Alex's patch? Does the black screen still occur with it?
Comment 44 SET 2018-09-27 15:26:39 UTC
(In reply to Nicholas Kazlauskas from comment #43)
> Did you get around to trying Alex's patch? Does the black screen still occur
> with it?

Please see comment #29.

No black screen with the patch. But suspend will subsequently fail.
Comment 45 Martin Peres 2019-11-19 08:41:32 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe to and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/417