Summary: | [HSW] linux next-20170407+ boots to a blank screen | ||
---|---|---|---|
Product: | DRI | Reporter: | Rafael Ristovski <rafael.ristovski> |
Component: | DRM/Intel | Assignee: | Clinton Taylor <clinton.a.taylor> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | critical | ||
Priority: | high | CC: | dhinakaran.pandiyan, intel-gfx-bugs, jani.nikula, rafael.ristovski |
Version: | DRI git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | HSW | i915 features: | display/Other |
Attachments: |
Created attachment 130973 [details]
next-20170704 with fastboot=1, shows suspend/resume, display turns off after resume
The link m_n and data m_n values are different before and after suspend/resume. My hunch is when you booted with fastboot=1, commit[1] did not reduce data m_n but did so after suspend/resume. Before: [drm:intel_dump_pipe_config] dp m_n: lanes: 2; gmch_m: 2428257, gmch_n: 8388608, link_m: 134903, link_n: 524288, tu: 64 After: [drm:intel_dump_pipe_config] dp m_n: lanes: 2; gmch_m: 302776, gmch_n: 1048576, link_m: 33641, link_n: 131072, tu: 64 Try reverting this - commit 9a86cda07af2c63649932f0a4fc757701ef54c42 Author: Jani Nikula <jani.nikula@intel.com> Date: Mon Mar 27 14:33:25 2017 +0300 drm/i915/dp: reduce link M/N parameters Several major vendor USB-C->HDMI converters, in particular the DA200, fail to recover a 5.4 GHz 1 lane signal if the link N is greater than 0x80000. Created attachment 130979 [details] [review] test_revert (In reply to Rafael Ristovski from comment #0) > Created attachment 130972 [details] > next-20170324 with fastboot=1, shows suspend/resume, works fine > > Ever since next-20170407 (last good next-20170324, versions in-between not > tested) my laptops screen stopped turning on after resuming. > Upon removing i915.fastboot=1 from cmdline, the display now turns off > immediately after boot, which means enabling fastboot only covered up the > real issue. > > HW: > i7-4500U Haswell with HD 4400 I think you are right, fastboot adjusts the m_n values, which might explain why it worked with fastboot=1. I have attached the revert for you test. Please let me know if that works. I can confirm that reverting 9a86cda07af2c63649932f0a4fc757701ef54c42 fixes the issue. Both booting with fastboot off and suspend/resume work as intended. Feel free to send me any future patches to test, although this seems to be some weird edge-case scenario with my laptop. Unless of course I'm the only one using -next on a Haswell i7 :) Curious thing.. >>> 2428257. / 8388608 0.28947 >>> 302776. / 1048576 0.28874 >>> 2428257. / 8 303532.125 So this might be some rounding error? same for link_m/n, expected 33725, got 33641. (In reply to Maarten Lankhorst from comment #6) > Curious thing.. > > >>> 2428257. / 8388608 > 0.28947 > > >>> 302776. / 1048576 > 0.28874 > > >>> 2428257. / 8 > 303532.125 > > So this might be some rounding error? > > same for link_m/n, expected 33725, got 33641. 33641/131072=0.2567 is closer to the original pixel_clock/link_clock ratio (69300/270000=0.2567) than 33725/131072=0.2573. Also, in the attachment (next-20170324 with fastboot=1, shows suspend/resume, works fine) bios programs link_m: 134903 whereas driver programs link_m: 134567. Because both work fine, I guess the problem is elsewhere. I wonder if we should not be scaling down data m/n at all. The patch scales both data and link m/n to workaround a dongle issue, we could probably just scale down only link m/n since data m/n is for internal use. Created attachment 130986 [details] [review] scale down only link m/n Can you please test this? Sadly patch is a no-go. Same issue as before :/ I suppose it's the sink that's failing to recover the stream with smaller m/n values then. I'll send the revert to the mailing list for now. For extra lols, the BIOS sets up a pixel clock that is different from the mode specified in the VBT. Rafael, please remove any fastboot parameter, and provide dmesgs from boot, with timestamps and everything, for both 1) drm-next or whatever your baseline was, 2) same with 9a86cda07af2 ("drm/i915/dp: reduce link M/N parameters") reverted. The DP CTS defines this as the margin of error for M/N: | (Mvid / Nvid) * Ls - Fp | <= 0.006 * Fp Where Ls is the measured average link rate and Fp is the required pixel rate for the mode. So let's first ignore the fastboot=1 case, and any modes set up by the BIOS, to get an apples to apples comparison before/after the regressing commit. Rafael, please also try diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 85b9e2f521a0..6d608e3df597 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -6288,7 +6288,7 @@ static void compute_m_n(unsigned int m, unsigned int n, } *ret_n = min_t(unsigned int, roundup_pow_of_two(n), DATA_LINK_N_MAX); - *ret_m = div_u64((uint64_t) m * *ret_n, n); + *ret_m = DIV_ROUND_CLOSEST_ULL((uint64_t) m * *ret_n, n); intel_reduce_m_n_ratio(ret_m, ret_n); } (In reply to Jani Nikula from comment #13) > Rafael, please also try > > diff --git a/drivers/gpu/drm/i915/intel_display.c > b/drivers/gpu/drm/i915/intel_display.c > index 85b9e2f521a0..6d608e3df597 100644 > --- a/drivers/gpu/drm/i915/intel_display.c > +++ b/drivers/gpu/drm/i915/intel_display.c > @@ -6288,7 +6288,7 @@ static void compute_m_n(unsigned int m, unsigned int n, > } > > *ret_n = min_t(unsigned int, roundup_pow_of_two(n), DATA_LINK_N_MAX); > - *ret_m = div_u64((uint64_t) m * *ret_n, n); > + *ret_m = DIV_ROUND_CLOSEST_ULL((uint64_t) m * *ret_n, n); > intel_reduce_m_n_ratio(ret_m, ret_n); > } Again patch is a no-go, attaching logs of working vs bad+patched now. Created attachment 131053 [details]
Dmesg of working version
Created attachment 131054 [details]
Dmesg of broken version (after patches)
You can see a clear difference: Good: [1.695231] [drm:intel_dump_pipe_config] dp m_n: lanes: 2; gmch_m: 2428257, gmch_n: 8388608, link_m: 134903, link_n: 524288, tu: 64 [1.710939] [drm:intel_dump_pipe_config] dp m_n: lanes: 2; gmch_m: 2422210, gmch_n: 8388608, link_m: 134567, link_n: 524288, tu: 64 Bad: [1.692824] [drm:intel_dump_pipe_config] dp m_n: lanes: 2; gmch_m: 2428257, gmch_n: 8388608, link_m: 134903, link_n: 524288, tu: 64 [1.708518] [drm:intel_dump_pipe_config] dp m_n: lanes: 2; gmch_m: 302776, gmch_n: 1048576, link_m: 33642, link_n: 131072, tu: 64 (In reply to Jani Nikula from comment #13) > Rafael, please also try > > diff --git a/drivers/gpu/drm/i915/intel_display.c > b/drivers/gpu/drm/i915/intel_display.c > index 85b9e2f521a0..6d608e3df597 100644 > --- a/drivers/gpu/drm/i915/intel_display.c > +++ b/drivers/gpu/drm/i915/intel_display.c > @@ -6288,7 +6288,7 @@ static void compute_m_n(unsigned int m, unsigned int n, > } > > *ret_n = min_t(unsigned int, roundup_pow_of_two(n), DATA_LINK_N_MAX); > - *ret_m = div_u64((uint64_t) m * *ret_n, n); > + *ret_m = DIV_ROUND_CLOSEST_ULL((uint64_t) m * *ret_n, n); > intel_reduce_m_n_ratio(ret_m, ret_n); > } I meant, try this *alone* on top of drm-next, without other patches. (In reply to Rafael Ristovski from comment #17) > You can see a clear difference: Yeah. The link M and N values are used to regenerate the pixel clock from the link clock on the sink side. Regenrated pixel clock = M / N * link clock. They are 24 bit values, and per the DP spec the source can choose them freely, as long as they are within the error boundaries described in comment #12. Except now we have sinks that fail on some values, and some other sinks that fail on some other values. I am not amused. Pixel clock 69300: > Good: > [1.710939] [drm:intel_dump_pipe_config] dp m_n: lanes: 2; gmch_m: 2422210, > gmch_n: 8388608, link_m: 134567, link_n: 524288, tu: 64 Regenerated pixel clock 69299.869537354. We don't know whether the sink rounds that to 69299 or 69300. > Bad: > [1.708518] [drm:intel_dump_pipe_config] dp m_n: lanes: 2; gmch_m: 302776, > gmch_n: 1048576, link_m: 33642, link_n: 131072, tu: 64 Regenerated pixel clock with the rounding patch 69300.384521484 and without the rounding patch 69298.324584961. They should be rounded to 69300 and 69298, respectively. I don't see a whole lot of difference here. Except that the sinks don't know how to count. The rounding should not change anything. In the attachment (next-20170324 with fastboot=1, shows suspend/resume, works fine) bios programs link_m: 134903 whereas driver programs link_m: 134567 and both work fine. This shows the sink is able to recover even with small changes in the values. I think, it's the bigger change in the down scaled values of link_m and link_n that seems to have caused the problem. (In reply to Dhinakaran Pandiyan from comment #20) > The rounding should not change anything. In the attachment (next-20170324 > with fastboot=1, shows suspend/resume, works fine) bios programs link_m: > 134903 whereas driver programs link_m: 134567 and both work fine. This > shows the sink is able to recover even with small changes in the values. I > think, it's the bigger change in the down scaled values of link_m and link_n > that seems to have caused the problem. No, this is a false conclusion, as the BIOS uses a different pixel clock. Created attachment 131098 [details]
next-20170407 with only round closest patch
[ 1.694577] [drm:intel_dump_pipe_config] dp m_n: lanes: 2; gmch_m: 2428257, gmch_n: 8388608, link_m: 134903, link_n: 524288, tu: 64
[ 1.710255] [drm:intel_dump_pipe_config] dp m_n: lanes: 2; gmch_m: 302776, gmch_n: 1048576, link_m: 33642, link_n: 131072, tu: 64
Patch is still a no-go, screen doesn't recover and remains black. (In reply to Jani Nikula from comment #21) > (In reply to Dhinakaran Pandiyan from comment #20) > > The rounding should not change anything. In the attachment (next-20170324 > > with fastboot=1, shows suspend/resume, works fine) bios programs link_m: > > 134903 whereas driver programs link_m: 134567 and both work fine. This > > shows the sink is able to recover even with small changes in the values. I > > think, it's the bigger change in the down scaled values of link_m and link_n > > that seems to have caused the problem. > > No, this is a false conclusion, as the BIOS uses a different pixel clock. You are right, but if you look at the link_m and pixel_rate values for the first kernel modeset: BIOS [drm:intel_dump_pipe_config] dp m_n: lanes: 2; gmch_m: 2428257, gmch_n: 8388608, link_m: 134903, link_n: 524288, tu: 64 [drm:intel_dump_pipe_config] port clock: 270000, pipe src size: 720x400, pixel rate 69472 Kernel - before suspend-resume [drm:intel_dp_compute_config] DP link computation with max lane count 2 max bw 270000 pixel clock 69300KHz [drm:intel_dump_pipe_config] dp m_n: lanes: 2; gmch_m: 2428257, gmch_n: 8388608, link_m: 134903, link_n: 524288, tu: 64 -> Regenerated pixel clock - 69472.904205322 Kernel - after suspend-resume [drm:intel_dp_compute_config] DP link computation with max lane count 2 max bw 270000 pixel clock 69300KHz [drm:intel_dump_pipe_config] dp m_n: lanes: 2; gmch_m: 2422210, gmch_n: 8388608, link_m: 134567, link_n: 524288, tu: 64 -> Regenerated pixel clock - 69299.869537354 It does look like the sink can tolerate small differences in link_m for the same pixel clock. (In reply to Dhinakaran Pandiyan from comment #24) > It does look like the sink can tolerate small differences in link_m for the > same pixel clock. Agreed, and it should. See comment #12. It's just that the sinks screw this up based on the absolute values of M and N, not based on M/N or M/N*Ls. Adding laptop and panel information from Rafael. Laptop: Dell Inspiron 3537 with a i7-4500U Haswell and HD 4400 Panel: Manufacturer: AUO Model 45ec Serial Number 0 Adding tag into "Whiteboard" field - ReadyForDev The bug still active *Status is correct *Platform is included *Feature is included *Priority and Severity correctly set *Logs included moving status to REOPENED since there is no pending information needed Upstreamed a patch that detects the Analogix 7737 DP->HDMI converter chip inside the DA200 dongle. If the chip is detected the M and N optimization is executed, if not the M and N values are not changed. This should prevent this issue from happening on this or any other panel/monitor. https://patchwork.freedesktop.org/patch/155478/ (In reply to Clinton Taylor from comment #28) > Upstreamed a patch that detects the Analogix 7737 DP->HDMI converter chip > inside the DA200 dongle. If the chip is detected the M and N optimization is > executed, if not the M and N values are not changed. This should prevent > this issue from happening on this or any other panel/monitor. > > https://patchwork.freedesktop.org/patch/155478/ After some thorough testing, I can firmly say that the patch works. I've tested both normal boot and suspend and both work fine. Only remark is that it seems kind of awkward having a check/workaround for an issue caused by a single dongle. But I guess this is the best way to do it. New series from Jani which includes a rebased version of Clint's fix https://patchwork.freedesktop.org/series/24282/ (In reply to Dhinakaran Pandiyan from comment #30) > New series from Jani which includes a rebased version of Clint's fix > https://patchwork.freedesktop.org/series/24282/ Good afternoon, Do you have any update on this case? Has the new patch been tested? Thank you. Patch has been tested and merged into drm-tip. This issue should now be closed. Please reopen if you have further issues. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.
Created attachment 130972 [details] next-20170324 with fastboot=1, shows suspend/resume, works fine Ever since next-20170407 (last good next-20170324, versions in-between not tested) my laptops screen stopped turning on after resuming. Upon removing i915.fastboot=1 from cmdline, the display now turns off immediately after boot, which means enabling fastboot only covered up the real issue. HW: i7-4500U Haswell with HD 4400