Bug 101499 - Black screen when detaching HDMI cable (AMD A10-9620P)
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2017-06-19 16:27 UTC by Carlo Caione
Modified: 2019-11-19 08:19 UTC

Attachments
journal_from_boot_to_HDMI_cable_detaching (311.72 KB, text/plain)
2017-06-19 16:27 UTC, Carlo Caione
dmps_off_on (2.33 KB, text/plain)
2017-06-19 16:29 UTC, Carlo Caione
Corruption using xf86-video-amdgpu HEAD (3.51 MB, image/jpeg)
2017-06-20 09:46 UTC, Carlo Caione
journal_HDMI_detaching_corruption (68.80 KB, text/plain)
2017-06-21 15:28 UTC, Carlo Caione
dm_plane_helper_prepare_fb with AMDGPU_GEM_DOMAIN_GTT (3.75 MB, image/jpeg)
2017-06-22 09:00 UTC, Carlo Caione
dm_plane_helper_prepare_fb with AMDGPU_GEM_DOMAIN_GTT (249.09 KB, text/plain)
2017-06-27 10:29 UTC, Carlo Caione

Description Carlo Caione 2017-06-19 16:27:19 UTC
Created attachment 132061 [details]
journal_from_boot_to_HDMI_cable_detaching

We are working with new laptops that have the AMD Bristol Ridge
chipset with this SoC:

AMD A10-9620P RADEON R5, 10 COMPUTE CORES 4C+6G

When the HDMI cable is attached and then detached, the internal panel goes black. The only way to get it back is to switch to another TTY and then back to Xorg, or to run 'xset dpms force off; xset dpms force on'.

This is also reproducible on the ~agd5f/drm-next-4.13-wip branch.

The attachment contains the whole journal from booting the kernel compiled from ~agd5f/drm-next-4.13-wip, landing in Xorg, and then attaching and detaching the HDMI cable.
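
For anyone who wants to script the DPMS workaround quoted above, here is a minimal sketch in Python. It only wraps the two xset commands from the description; the function name and the injectable `run` parameter are illustrative, not part of any existing tool, and it assumes it is run inside the affected X session.

```python
import subprocess

# The two commands quoted in the description, in the order they were run.
DPMS_COMMANDS = [
    ["xset", "dpms", "force", "off"],
    ["xset", "dpms", "force", "on"],
]


def toggle_dpms(run=subprocess.run):
    """Force DPMS off and then on again, as in the reported workaround.

    `run` is injectable (hypothetical convenience) so the sequence can be
    exercised without an X server.
    """
    for argv in DPMS_COMMANDS:
        # check=True raises CalledProcessError if xset exits non-zero.
        run(argv, check=True)
```

Calling `toggle_dpms()` with no arguments runs the real commands, so DISPLAY must point at the affected session.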
Comment 1 Carlo Caione 2017-06-19 16:29:19 UTC
Created attachment 132062 [details]
dmps_off_on

This is what appears in the log when we run 'xset dpms force off; xset dpms force on'.
Comment 2 Michel Dänzer 2017-06-20 04:02:13 UTC
> Jun 19 17:09:49 endless kernel: [drm] Detected VRAM RAM=32M, BAR=32M

I suspect the core problem is that there's only 32 MB of VRAM available. Is it possible to increase this in the BIOS setup?
Comment 3 Carlo Caione 2017-06-20 06:11:43 UTC
> I suspect the core problem is that there's only 32 MB of VRAM available.
> Is it possible to increase this in the BIOS setup?
It is not. There is nothing in the BIOS related to VRAM.
Comment 4 Michel Dänzer 2017-06-20 06:15:35 UTC
(In reply to Carlo Caione from comment #3)
> There is nothing in the BIOS related to VRAM.

FWIW, it wouldn't say "VRAM" but rather "integrated graphics memory" or something like that.
Comment 5 Carlo Caione 2017-06-20 06:21:48 UTC
> FWIW, it wouldn't say "VRAM" but rather "integrated graphics memory" or
> something like that.
Yeah :) Let me put it this way: there is nothing in the BIOS related to the graphics controller / GPU / video in general.

FWIW the BIOS is InsydeH2O v0.09
Comment 6 Carlo Caione 2017-06-20 06:24:26 UTC
Also I guess you are looking at the wrong controller:

[    2.111381] amdgpu 0000:00:01.0: VRAM: 32M 0x000000F400000000 - 0x000000F401FFFFFF (32M used)
[    2.111390] [drm] Detected VRAM RAM=32M, BAR=32M
[    2.111511] [drm] amdgpu: 32M of VRAM memory ready
[    6.560772] amdgpu 0000:03:00.0: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
[    6.560785] [drm] Detected VRAM RAM=2048M, BAR=256M
[    6.560805] [drm] amdgpu: 2048M of VRAM memory ready

So I guess in this laptop there is an integrated controller with 32MB of VRAM and the GPU with 2GB?
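
A quick way to see which PCI device reports which VRAM size is to pull the "VRAM:" lines out of the kernel log. The sketch below is only an illustration in Python: the regular expression is written against the two lines pasted above and may need adjusting for other kernel versions, and the function name is made up for this example.

```python
import re

# Matches kernel lines such as:
#   amdgpu 0000:00:01.0: VRAM: 32M 0x... - 0x... (32M used)
VRAM_LINE = re.compile(r"amdgpu (?P<pci>[0-9a-f:.]+): VRAM: (?P<size_mb>\d+)M ")


def vram_per_device(log_text):
    """Return {PCI address: VRAM size in MB} for every amdgpu VRAM line."""
    return {
        m.group("pci"): int(m.group("size_mb"))
        for m in VRAM_LINE.finditer(log_text)
    }


# The two VRAM lines from this comment.
LOG = """\
[    2.111381] amdgpu 0000:00:01.0: VRAM: 32M 0x000000F400000000 - 0x000000F401FFFFFF (32M used)
[    6.560772] amdgpu 0000:03:00.0: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
"""

print(vram_per_device(LOG))
```

Here 0000:00:01.0 is the device with 32M and 0000:03:00.0 the one with 2048M.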
Comment 7 Michel Dänzer 2017-06-20 06:38:44 UTC
There are two GPUs, the integrated one in the APU (Carrizo family) and a dedicated one (Polaris 12 family). Xorg is using the integrated one, and there's no way around that, because only the integrated GPU has display outputs hooked up. The dedicated GPU could only be used via PRIME render offloading.
Comment 8 Carlo Caione 2017-06-20 06:45:03 UTC
Interesting. Ok then, back to square one: no BIOS options to tweak the integrated graphics memory / controller.

On a side note: is it normal to see so many error messages in the journal? Like:

kernel: amdgpu 0000:00:01.0: ffff9cccade4d800 pin failed
kernel: [drm:amdgpu_crtc_page_flip_target [amdgpu]] *ERROR* failed to pin new abo buffer before flip
gdm-Xorg-:0[672]: (WW) AMDGPU(0): flip queue failed: Cannot allocate memory
gdm-Xorg-:0[672]: (WW) AMDGPU(0): Page flip failed: Cannot allocate memory
gdm-Xorg-:0[672]: (EE) AMDGPU(0): present flip failed
...
gdm-Xorg-:0[672]: (WW) AMDGPU(0): get vblank counter failed: Invalid argument

or

kernel: amdgpu: [powerplay] min_core_set_clock not set
Comment 9 Michel Dänzer 2017-06-20 07:24:44 UTC
(In reply to Carlo Caione from comment #8)
> On a side note: is it normal to see so many error messages in the journal? Like:
> 
> kernel: amdgpu 0000:00:01.0: ffff9cccade4d800 pin failed
> kernel: [drm:amdgpu_crtc_page_flip_target [amdgpu]] *ERROR* failed to pin
> new abo buffer before flip
> gdm-Xorg-:0[672]: (WW) AMDGPU(0): flip queue failed: Cannot allocate memory
> gdm-Xorg-:0[672]: (WW) AMDGPU(0): Page flip failed: Cannot allocate memory
> gdm-Xorg-:0[672]: (EE) AMDGPU(0): present flip failed
> ...
> gdm-Xorg-:0[672]: (WW) AMDGPU(0): get vblank counter failed: Invalid argument

I think all of these are triggered by VRAM being too small to fit the scanout buffers covering the laptop panel + external monitor.


> kernel: amdgpu: [powerplay] min_core_set_clock not set

Not sure about this one, might be harmless.
Comment 10 Carlo Caione 2017-06-20 09:10:28 UTC
> I think all of these are triggered by VRAM being too small to fit the
> scanout buffers covering the laptop panel + external monitor.
Probably I'm missing something, but when the HDMI is connected everything works fine, with the scanout buffer correctly displayed on the laptop panel + external monitor. The problem starts when we _disconnect_ the HDMI cable.

Also, if the problem were VRAM being too small, why does toggling DPMS make the laptop panel work fine again?
Comment 11 Michel Dänzer 2017-06-20 09:25:30 UTC
(In reply to Carlo Caione from comment #10)
> Probably I'm missing something, but when the HDMI is connected everything
> works fine, with the scanout buffer correctly displayed on the laptop panel
> + external monitor. The problem starts when we _disconnect_ the HDMI cable.

At least some of the errors you referenced in comment 8 already happen before that. They're related to failed attempts at page flipping. xf86-video-amdgpu manages to chug along regardless.

When you unplug the HDMI cable is presumably when

> Jun 19 17:10:31 endless gdm-Xorg-:0[672]: (EE) AMDGPU(0): failed to set mode: Invalid argument

appears, i.e. drmModeSetCrtc() fails, presumably (not 100% sure about this part though) because the new, smaller scanout buffer cannot fit into VRAM while the old, larger one is still being scanned out.

> Also, if the problem were VRAM being too small, why does toggling DPMS
> make the laptop panel work fine again?

Toggling DPMS off disables scanout, which allows the old scanout buffer to be moved out of VRAM, so the new one can be moved in.


Some details might differ from the above, but that should be roughly what's happening.
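
To put rough numbers on the explanation above, here is a back-of-the-envelope sketch in Python. The resolutions are an assumption (the report does not state them): both the panel and the HDMI monitor are taken to be 1920x1080 at 4 bytes per pixel, laid out side by side. It ignores pitch alignment, tiling, the cursor and anything else pinned in VRAM, so the real headroom is smaller than what it computes.

```python
VRAM_BYTES = 32 * 1024 * 1024  # "Detected VRAM RAM=32M" from the log


def scanout_bytes(width, height, bytes_per_pixel=4):
    """Size of a linear scanout buffer, ignoring pitch alignment and tiling."""
    return width * height * bytes_per_pixel


# ASSUMPTION: panel and HDMI monitor are both 1920x1080, side by side.
both = scanout_bytes(2 * 1920, 1080)    # panel + external monitor
panel_only = scanout_bytes(1920, 1080)  # after unplugging HDMI

# A page flip needs the new buffer pinned while the old one is still scanned out.
flip_needs = 2 * both
# A mode set after unplugging needs the new, smaller buffer next to the old one.
modeset_needs = both + panel_only

for label, need in [("page flip", flip_needs), ("mode set", modeset_needs)]:
    print(f"{label}: {need / 2**20:.1f} MiB needed, "
          f"{(VRAM_BYTES - need) / 2**20:.1f} MiB of VRAM left")
```

Under these assumed resolutions, two full-size buffers already take almost all of the 32 MiB, which is consistent with the page flip pin failures. Whether the smaller buffer fits at mode set time depends on what else is pinned and where, which this sketch does not model.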
Comment 12 Carlo Caione 2017-06-20 09:46:37 UTC
Created attachment 132081 [details]
Corruption using xf86-video-amdgpu HEAD

Interesting. Thank you for explaining this and for your time.

I just tried the HEAD of xf86-video-amdgpu and now instead of having a black screen I have the image corruption as shown in the picture.

Anything I can do to debug / have this fixed? Is it something I need to fix with the ODM (Acer)?
Comment 13 Carlo Caione 2017-06-20 16:15:40 UTC
> I just tried the HEAD of xf86-video-amdgpu and now instead of
> having a black screen I have the image corruption as shown
> in the picture.
Just FYI this is due to commit b09fde0d81 ("Use reference counting for tracking KMS framebuffer lifetimes").
Comment 14 Michel Dänzer 2017-06-21 08:14:26 UTC
(In reply to Carlo Caione from comment #12)
> I just tried the HEAD of xf86-video-amdgpu and now instead of having a black
> screen I have the image corruption as shown in the picture.

Without seeing the corresponding Xorg log, I guess that's just a different symptom triggered by the same issue.


> Anything I can do to debug / have this fixed?

We need to make scanout work with buffers outside of VRAM somehow. I've kicked off an internal discussion about this.

With an amd-staging-* kernel branch and DC enabled, you can try tweaking dce_v11_0_crtc_do_set_base to pass AMDGPU_GEM_DOMAIN_GTT instead of / in addition to AMDGPU_GEM_DOMAIN_VRAM to amdgpu_bo_pin. The DC code should already handle this correctly, but we're not sure whether or not there are additional constraints on system memory used for scanout. If there are, it probably won't work correctly yet.
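
To make "instead of / in addition to" concrete: the domain argument is a bitmask, so the two variants are GTT alone and GTT OR-ed with VRAM. The Python sketch below only illustrates the masks; the numeric values are the ones I recall from include/uapi/drm/amdgpu_drm.h and should be checked against the tree being built, and `describe` is a helper invented for this example, not a driver function.

```python
# Values as recalled from include/uapi/drm/amdgpu_drm.h (verify in your tree).
AMDGPU_GEM_DOMAIN_CPU = 0x1
AMDGPU_GEM_DOMAIN_GTT = 0x2
AMDGPU_GEM_DOMAIN_VRAM = 0x4

_NAMES = {
    AMDGPU_GEM_DOMAIN_CPU: "CPU",
    AMDGPU_GEM_DOMAIN_GTT: "GTT",
    AMDGPU_GEM_DOMAIN_VRAM: "VRAM",
}


def describe(domain):
    """List the placement domains a mask allows (illustrative helper)."""
    return [name for bit, name in _NAMES.items() if domain & bit]


instead_of = AMDGPU_GEM_DOMAIN_GTT                               # GTT only
in_addition_to = AMDGPU_GEM_DOMAIN_GTT | AMDGPU_GEM_DOMAIN_VRAM  # either one

print(describe(instead_of), describe(in_addition_to))
```

In the kernel source the change itself is just the domain argument passed to amdgpu_bo_pin; nothing else in this sketch corresponds to driver code.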


> Is it something I need to fix with the ODM (acer)?

If you can get a BIOS which allows setting up larger VRAM, that might allow you to move forward faster.
Comment 15 Carlo Caione 2017-06-21 15:28:47 UTC
Created attachment 132118 [details]
journal_HDMI_detaching_corruption

> Without seeing the corresponding Xorg log, I guess that's just
> a different symptom triggered by the same issue.
Attached the log. Yeah, not much different.

> With an amd-staging-* kernel branch and DC enabled, you can try tweaking
> dce_v11_0_crtc_do_set_base to pass AMDGPU_GEM_DOMAIN_GTT instead of / in
> addition to AMDGPU_GEM_DOMAIN_VRAM to amdgpu_bo_pin. The DC code should
> already handle this correctly, but we're not sure whether or not there are
> additional constraints on system memory used for scanout. If there are, it
> probably won't work correctly yet.
I tried amd-staging-4.11 and interestingly dce_v11_0_crtc_do_set_base is called only when DC is disabled. When DRM_AMD_DC=y the function is never called.

I also tried making the s/AMDGPU_GEM_DOMAIN_VRAM/AMDGPU_GEM_DOMAIN_GTT/ change with DC disabled. With that, I get some kind of intermittent display corruption on both screens when _connecting_ the HDMI cable, but at least everything is fine on the laptop panel after detaching.
Comment 16 Michel Dänzer 2017-06-22 08:23:38 UTC
(In reply to Carlo Caione from comment #15)
> > With an amd-staging-* kernel branch and DC enabled, you can try tweaking
> > dce_v11_0_crtc_do_set_base to pass AMDGPU_GEM_DOMAIN_GTT instead of / in
> > addition to AMDGPU_GEM_DOMAIN_VRAM to amdgpu_bo_pin. [...]
> I tried amd-staging-4.11 and interestingly dce_v11_0_crtc_do_set_base is
> called only when DC is disabled.

Right, sorry, with DC you need to tweak dm_plane_helper_prepare_fb instead.
Comment 17 Carlo Caione 2017-06-22 09:00:59 UTC
Created attachment 132130 [details]
dm_plane_helper_prepare_fb with AMDGPU_GEM_DOMAIN_GTT

> Right, sorry, with DC you need to tweak dm_plane_helper_prepare_fb instead.
Yup, I tried this but without much luck. The attachment shows what I get when using AMDGPU_GEM_DOMAIN_GTT. Using AMDGPU_GEM_DOMAIN_GTT | AMDGPU_GEM_DOMAIN_VRAM is pretty much the same, but the cursor is displayed correctly (even though I cannot move it).

The log is filled with:

[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer

/me scratches his head
Comment 18 Carlo Caione 2017-06-27 08:21:14 UTC
We have found another laptop with exactly the same issue (and again 32MB of VRAM for the embedded video controller). We have also asked Acer for a new BIOS with a larger VRAM size and are waiting to receive it (hopefully).

Any news about the internal discussion about this issue? We are available to test any fix / workaround / proposal :)

Thanks,
Comment 19 Michel Dänzer 2017-06-27 08:29:19 UTC
No news I'm afraid.

(In reply to Carlo Caione from comment #17)
> In attachment what I get when using AMDGPU_GEM_DOMAIN_GTT.
[...]
> [drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer

Do you also get those errors with AMDGPU_GEM_DOMAIN_GTT? If so, it might be interesting to track down the origin of the errors. Otherwise, it looks like there's still something missing.
Comment 20 Carlo Caione 2017-06-27 10:29:32 UTC
Created attachment 132281 [details]
dm_plane_helper_prepare_fb with AMDGPU_GEM_DOMAIN_GTT

> Do you also get those errors with AMDGPU_GEM_DOMAIN_GTT? If so, it might be 
> interesting to track down the origin of the errors. Otherwise, it looks like
> there's still something missing.

The error is no longer reproducible with the latest amd-staging-4.11 branch and the master HEAD of xf86-video-amdgpu when using AMDGPU_GEM_DOMAIN_GTT in dm_plane_helper_prepare_fb, but I still get the weird corruption: a white background screen and a corrupted square as the pointer, as in the previous attachment. The attachment contains the journal from attaching and detaching the HDMI cable.

The only concerning errors I can see in there are:

[drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:2!
[drm:hwss_wait_for_blank_complete [amdgpu]] *ERROR* DC: failed to blank crtc!
[drm:dc_create [amdgpu]] *ERROR* DC: Number of connectors is zero!
Comment 21 Michel Dänzer 2017-06-30 07:43:48 UTC
Sounds like either DC is still missing something for scanning out from GTT, or there may indeed be additional constraints on the system memory used for it.

Harry, have you guys tested GTT scanout?
Comment 22 Carlo Caione 2017-06-30 09:44:43 UTC
Just to keep you updated: we have verified that with a bigger VRAM this is no longer reproducible.
Comment 23 Harry Wentland 2017-07-02 19:21:15 UTC
Michel, I don't remember trying GTT scanout. Not sure even what that really means for DC. Is that scatter/gather?
Comment 24 Michel Dänzer 2017-07-03 03:23:11 UTC
(In reply to Harry Wentland from comment #23)
> Not sure even what that really means for DC. Is that scatter/gather?

Yes, exactly.
Comment 25 Martin Peres 2019-11-19 08:19:45 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/194.

