Summary: | Screen flickering after getting hpd irq
---|---
Product: | DRI
Reporter: | Ethan Hsieh <ethan.hsieh>
Component: | DRM/Intel
Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: | CLOSED FIXED
QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | major
Priority: | high
CC: | intel-gfx-bugs, manasi.d.navare, perry_yuan, rodrigo.vivi, sheirys2, tjaalton
Version: | DRI git
Hardware: | x86-64 (AMD64)
OS: | Linux (All)
Whiteboard: |
i915 platform: | CFL
i915 features: | display/eDP
Description
Ethan Hsieh
2018-02-27 07:20:33 UTC
Created attachment 137635 [details]
photo - screen_goes_blurry
Created attachment 137636 [details]
video - screen goes blurry
The video looks like a flicker more than being blurry. To me blurry is static while flicker is constantly changing. Would you describe it as a flicker?

Is this the full dmesg? There's information missing.

Does the screen recover if you switch the display off and back on?

Please try without nvidia module loaded.

> The video looks like a flicker more than being blurry. To me blurry is static while flicker is constantly changing. Would you describe it as a flicker?

Sure... I'm not a graphics expert.

> Is this the full dmesg? There's information missing.

Yes. It's the full dmesg.

> Does the screen recover if you switch the display off and back on?

The issue can be recovered by suspend/resume (the display goes off and back on).

> Please try without nvidia module loaded.

I can still reproduce the issue after removing nvidia 390.25.
http://www.nvidia.com/Download/driverResults.aspx/130646/en-us

Seems to be a Coffeelake with a Cannonlake PCH. (Mysteriously the device info is missing from the logs.)

I have no clue about the root cause yet, but what happens is that the eDP panel signals a short pulse hotplug. This indicates the panel requests a link status check. We do this, and find the link status is in fact not good. To remedy, we try to re-train the link. The link training succeeds. This is all according to the DP spec. From the logs, there is nothing out of the ordinary. But apparently the link retraining leads to flickering.

The hotplug occurs at about 71 seconds into the boot. If this is typical, this should give you enough time to do a modeset (disable/enable display, but don't suspend/resume) before this happens due to the hotplug. Does this cause flicker? Or only when the panel requests link status check?

The simple thing (for us) to do is double check the Coffeelake DDI buffer translations and link training values, especially see if there have been any recent updates to the specs.

Curiously the VBT refers to Skylake. I'm wondering where the VBT comes from and whether it's been updated for the platform at hand.

My first instinct is that the speculation (outside the bug report) about this being an eDP 1.4 related thing is a red herring.

That said, there haven't been all that many eDP 1.4 panels around yet, and it's of course possible we may have overlooked a DP spec change wrt link (re)training that's specific to eDP 1.4.

(In reply to Jani Nikula from comment #6)
> My first instinct is that the speculation (outside the bug report) about
> this being an eDP 1.4 related thing is a red herring.
>
> That said, there haven't been all that many eDP 1.4 panels around yet, and
> it's of course possible we may have overlooked a DP spec change wrt link
> (re)training that's specific to eDP 1.4.

Hi Nikula,
Based on isolation testing, the panel issue only happens on the eDP 1.4 panel. It cannot be reproduced with an eDP 1.3 panel.
So I think we need to check whether something in the eDP 1.4 protocol handling or the i915 driver needs to be fixed.
Thanks.
Perry

(In reply to Perry Yuan from comment #7)
> Based on isolation testing, the panel issue only happens on the eDP 1.4
> panel. It cannot be reproduced with an eDP 1.3 panel.

Do you get short hotplug pulses on the eDP 1.3 panel? Does that lead to link retraining? If not, then it's inconclusive.

Hi Nikula,
I tried to reproduce the issue on a laptop with an eDP 1.3 panel. I cannot reproduce the issue and didn't get short hotplug pulses.

Shot in the dark, please try [1]. We can't apply that as-is, but it's a data point.
[1] http://patchwork.freedesktop.org/patch/msgid/1520579339-14745-1-git-send-email-manasi.d.navare@intel.com

Created attachment 138093 [details]
kern.log (drm.debug=0x14)
Hi Nikula,
The issue is gone after applying the patch. However, the issue is not easy to reproduce; sometimes it takes more than 1 hr. So, I'll do more tests to confirm it.
Here is the test result:
1. With patch in [1]: Pass (0/7)
2. Without patch in [1]: Fail (4/6)
Here are the reproduction steps:
1. Run glxgears for 30mins with patched kernel
2. Check if screen is flickering or not
3. Reboot
4. Run glxgears for 30mins
5. Check if screen is flickering or not
6. Reboot
7. Go to 1.
Please check log as attached.
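
A minimal sketch of how a set of kernel logs could be scanned for the hotplug message named in the summary, assuming Python is available on the test machine. The names `HPD_RE`, `count_hpd_pulses` and `count_hpd_pulses_in_file` are hypothetical and introduced only for this illustration; the regular expression relies solely on the `got hpd irq on port A - short` wording of the i915 debug message and may need adjusting for other kernel versions.

```python
import re
from collections import Counter

# Matches the i915 debug message, for example:
#   [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port A - short
# (assumption: the message wording is the same in the kernel under test)
HPD_RE = re.compile(r"got hpd irq on port (\w+) - (short|long)")


def count_hpd_pulses(lines):
    """Count HPD pulses per (port, kind) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = HPD_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def count_hpd_pulses_in_file(path):
    """Same as count_hpd_pulses, reading the lines from a log file."""
    with open(path, errors="replace") as log:
        return count_hpd_pulses(log)
```

Calling `count_hpd_pulses_in_file("kern.log")` on each run's log gives a quick way to separate runs in which the panel signalled a short pulse from runs in which it did not, without reading several gigabytes of debug output by hand.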
Hi Nikula,
All logs are around 5GB. So, I only uploaded log12&13 (attached file in comment#11).
I can always get the following kernel message in the failing cases:
$ grep -r -e "got hpd" .
./08_fail/kern.log:kernel:[ 621.812487][drm:intel_dp_hpd_pulse [i915]] got hpd irq on port A - short
./06_fail/kern.log:kernel:[2428.068094][drm:intel_dp_hpd_pulse [i915]] got hpd irq on port A - short
./12_fail/kern.log:kernel:[1318.258726][drm:intel_dp_hpd_pulse [i915]] got hpd irq on port A - short
./10_fail/kern.log:kernel:[ 723.448265][drm:intel_dp_hpd_pulse [i915]] got hpd irq on port A - short

Hi Nikula,
The issue seems to be gone after applying the patch. I ran the stress test (glxgears) for 1 hour three times and cannot reproduce the issue.
Here is the test result:
1. With patch in [1] (1 hour): Pass (0/3)
2. Without patch in [1] (30 mins): Fail (2/2)
BTW, when the issue occurs, it can be recovered with the following commands:
$ DISPLAY=:0 xset dpms force off
$ DISPLAY=:0 xset dpms force on

Hi Jani,
The patched kernel passed a 3hr test. May I know what the next action is?

Thanks for testing.

The problem with the patch referenced in comment #10 is that it will regress older platforms. We can't do that.

The background is that we have tried to use optimal link parameters, and we have tried to optimize for both fewer lanes with higher rate, and more lanes with lower rate. All of this failed, until we learned that, uh, a certain other OS always used the maximum link parameters reported by the display. Apparently that was the only configuration that the panel/laptop vendors then ended up validating. We switched to using max link rate and lane count, which presumably correspond to the native resolution of the display anyway, and we haven't had issues with that approach until now.

Arguably all the displays should work with all the lane counts and rates they report, but sadly this appears not to be the case. In this bug, the display does not work with the maximum parameters it reports.

Apparently nobody has double checked the DDI buffer translation and voltage swing etc. parameters that I suggested in comment #5. :(

eDP 1.4 also adds two somewhat related features. Link rate select to support more intermediate rates between what's available for DP. DSC to support stream compression.

1) Please attach /sys/kernel/debug/dri/0/i915_vbt. The VBT is supposed to contain the port specific maximums, but perhaps that's not being used.

2) Please see that you have CONFIG_DRM_DP_AUX_CHARDEV=y, and try to use /dev/drm_dp_auxN node to hexdump the DPCD, and attach them.

Created attachment 138241 [details]
i915_vbt.log
cat /sys/kernel/debug/dri/0/i915_vbt > i915_vbt.log
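
The trade-off described in comment #15 (always use the maximum link parameters versus choosing a smaller configuration that still fits the mode) can be illustrated with a small calculation. This is only a sketch under stated assumptions, not the i915 implementation: the function names are hypothetical, the "fast_and_narrow" ordering (fewest lanes first, then the lowest sufficient rate) is one plausible reading of that phrase, and the pixel clocks in the usage note are illustrative rather than taken from the reporter's panel.

```python
# DP link rates per lane in Gbit/s: RBR, HBR, HBR2, HBR3.
LINK_RATES_GBPS = (1.62, 2.7, 5.4, 8.1)
LANE_COUNTS = (1, 2, 4)


def required_gbps(pixel_clock_mhz, bits_per_pixel=24):
    """Payload bandwidth a mode needs, in Gbit/s."""
    return pixel_clock_mhz * bits_per_pixel / 1000.0


def usable_gbps(rate_gbps, lanes):
    """Payload bandwidth of a link; 8b/10b coding leaves 80% of the raw rate."""
    return rate_gbps * lanes * 0.8


def pick_link_config(pixel_clock_mhz, max_rate_gbps, max_lanes, policy):
    """Return (rate_gbps, lanes) for the mode, or None if nothing fits.

    policy "max": only the maximum rate and lane count are considered.
    policy "fast_and_narrow" (hypothetical reading): try the fewest lanes
    first and, within a lane count, the lowest rate that is sufficient.
    """
    need = required_gbps(pixel_clock_mhz)
    if policy == "max":
        candidates = [(max_rate_gbps, max_lanes)]
    elif policy == "fast_and_narrow":
        candidates = [
            (rate, lanes)
            for lanes in LANE_COUNTS if lanes <= max_lanes
            for rate in LINK_RATES_GBPS if rate <= max_rate_gbps
        ]
    else:
        raise ValueError(f"unknown policy: {policy}")
    for rate, lanes in candidates:
        if usable_gbps(rate, lanes) >= need:
            return rate, lanes
    return None
```

For an illustrative 148.5 MHz mode on a sink reporting HBR2 and 4 lanes, `pick_link_config(148.5, 5.4, 4, "max")` selects 4 lanes at 5.4 Gbit/s, while the `"fast_and_narrow"` policy selects a single lane at 5.4 Gbit/s. Which of these a given panel has actually been validated with is exactly the question this bug raises.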
Created attachment 138242 [details]
hexdump.log
Yes. CONFIG_DRM_DP_AUX_CHARDEV=y
hexdump /dev/drm_dp_aux0
hexdump /dev/drm_dp_aux1
hexdump /dev/drm_dp_aux2
hexdump /dev/drm_dp_aux3
Please refer to attached file.
First of all, sorry about the spam.
This is a mass update for our bugs. Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!
If you think this is no longer valid, please comment on the bug that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.

(In reply to Jani Nikula from comment #15)
> Thanks for testing.
>
> The problem with the patch referenced in comment #10 is that it will regress
> older platforms. We can't do that.
>
> The background is that we have tried to use optimal link parameters, and we
> have tried to optimize for both fewer lanes with higher rate, and more lanes
> with lower rate. All of this failed, until we learned that, uh, a certain
> other OS always used the maximum link parameters reported by the display.
> Apparently that was the only configuration that the panel/laptop vendors
> then ended up validating. We switched to using max link rate and lane count,
> which presumably correspond to the native resolution of the display anyway,
> and we haven't had issues with that approach until now.
>
> Arguably all the displays should work with all the lane counts and rates
> they report, but sadly this appears not to be the case. In this bug, the
> display does not work with the maximum parameters it reports.
>
> Apparently nobody has double checked the DDI buffer translation and voltage
> swing etc. parameters that I suggested in comment #5. :(
>
> eDP 1.4 also adds two somewhat related features. Link rate select to support
> more intermediate rates between what's available for DP. DSC to support
> stream compression.
>
> 1) Please attach /sys/kernel/debug/dri/0/i915_vbt. The VBT is supposed to
> contain the port specific maximums, but perhaps that's not being used.
>
> 2) Please see that you have CONFIG_DRM_DP_AUX_CHARDEV=y, and try to use
> /dev/drm_dp_auxN node to hexdump the DPCD, and attach them.

Hi Jani,
If the patch causes regressions, then what can we do to fix the issue?
Perry

Please try this patch to debug. Is the issue reproducible with this? Either way, please attach dmesg running this, with drm.debug=14 module parameter set.

diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
index 62f82c4298ac..78ee270fefc3 100644
--- a/drivers/gpu/drm/i915/intel_dp.c
+++ b/drivers/gpu/drm/i915/intel_dp.c
@@ -1806,7 +1806,7 @@ intel_dp_compute_config(struct intel_encoder *encoder,
 		 * configuration, and typically these values correspond to the
 		 * native resolution of the panel.
 		 */
-		min_lane_count = max_lane_count;
+		min_lane_count = max_lane_count = 2;
 		min_clock = max_clock;
 	}

For Rodrigo, Manasi, et al: the purpose in comment #20 is to ensure we can indeed reach the highest clock.

Created attachment 138719 [details]
kern.log (drm.debug=0x14)
Hi Jani,
With the patch in comment#20, the screen goes black after booting into the kernel.

Jani, any advice to progress here?
Highest priority for consideration

Please try drm-tip branch of [1]. I presume that will still fail, but I've been wrong before, so let's make sure.
After that, please try patch [2] on top of drm-tip. I presume this will fix the issue. But let's make sure. ;)
If all this helps, we'll still need to figure out how to backport this to older kernels as needed. But first things first, let's figure this out on current drm-tip.
[1] https://cgit.freedesktop.org/drm/drm-tip
[2] http://patchwork.freedesktop.org/patch/msgid/20180509071321.28563-1-jani.nikula@intel.com

Cannot reproduce the issue with either [1] or [2].
Run glxgears for 1 hour.
[1]: Pass (2/2)
[2]: Pass (2/2)

(In reply to Ethan Hsieh from comment #26)
> Cannot reproduce the issue with both of [1] and [2].
> Run glxgears for 1 hour.
> [1]: Pass (2/2)
> [2]: Pass (2/2)

That's a surprise. I expected [1] to fail and [2] to fix it.

It appears we already have something that fixes the issue in drm-tip. I'm inclined to mark this resolved if there is no objection.

Please post the dmesg with drm.debug=14 for running drm-tip. Is this for sure the same configuration that fails on older kernels?

Created attachment 139606 [details]
kern.log (drm.debug=0xe)
Please refer to the attached file for drm-tip's kernel log.
I always use same machine and configuration to reproduce the issue.
(In reply to Ethan Hsieh from comment #30)
> Created attachment 139606 [details]
> kern.log (drm.debug=0xe)
>
> Please refer to the attached file for drm-tip's kernel log.
> I always use same machine and configuration to reproduce the issue.

Well, for some reason or another, we don't get the hotplug irq from the panel here like we do in the failing case. That's what the panel uses to indicate the link is not good, and sets the failure in motion.

Created attachment 140249 [details]
with drm.debug=0xe
Hello,
I think I am also affected by this bug, or a similar one. The screen flickers constantly at random intervals. Flickering does not appear while running Windows or in the BIOS.
To describe a "flick": the display or part of it becomes black or filled with random pixels for a very short time. Sometimes a "flick" appears more than once per second.
When a "flick" appears, dmesg (with drm.debug=0xe) produces:
...
birž. 19 21:29:04 localhost.localdomain kernel: [drm:drm_mode_addfb2 [drm]] [FB:74]
birž. 19 21:29:04 localhost.localdomain kernel: [drm:gen8_irq_handler [i915]] hotplug event received, stat 0x01000000, dig 0x11101010, pins 0x00000010
birž. 19 21:29:04 localhost.localdomain kernel: [drm:intel_hpd_irq_handler [i915]] digital hpd port A - short
birž. 19 21:29:04 localhost.localdomain kernel: [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port A - short
birž. 19 21:29:04 localhost.localdomain kernel: [drm:intel_dp_read_dpcd [i915]] DPCD: 12 0a 84 41 00 00 01 01 02 00 00 00 00 0b 00
birž. 19 21:29:04 localhost.localdomain kernel: [drm:drm_mode_addfb2 [drm]] [FB:72]
...
Arch: x86_64
Kern: 4.16.15-300.fc28.x86_64
Dist: fedora 28
Machine: Lenovo yoga 900s-12isk
xrandr --verbose:
Screen 0: minimum 320 x 200, current 2560 x 1440, maximum 8192 x 8192
XWAYLAND0 connected 2560x1440+0+0 (0x22) normal (normal left inverted right x axis y axis) 280mm x 160mm
	Identifier: 0x21
	Timestamp: 21014
	Subpixel: unknown
	Gamma: 1.0:1.0:1.0
	Brightness: 0.0
	Clones:
	CRTC: 0
	CRTCs: 0
	Transform: 1.000000 0.000000 0.000000
	           0.000000 1.000000 0.000000
	           0.000000 0.000000 1.000000
	          filter:
  2560x1440 (0x22) 312.250MHz -HSync +VSync *current +preferred
        h: width 2560 start 2752 end 3024 total 3488 skew 0 clock 89.52KHz
        v: height 1440 start 1443 end 1448 total 1493 clock 59.96Hz

Created attachment 140250 [details]
screen flickering
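
The `DPCD:` line in the dmesg excerpt above carries the start of the sink's receiver capability field, and its first three bytes already give the maximum link parameters the panel reports. A minimal sketch of decoding them, assuming the standard DisplayPort layout for DPCD addresses 0x000 to 0x002 (revision, maximum link rate in units of 0.27 Gbit/s, maximum lane count in the low five bits with the enhanced framing flag in bit 7). The function name `decode_dpcd_caps` is hypothetical.

```python
def decode_dpcd_caps(hex_bytes):
    """Decode DPCD bytes 0x000-0x002 from a space-separated hex string."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) < 3:
        raise ValueError("need at least the first three DPCD bytes")
    rev, rate_code, lane_byte = raw[0], raw[1], raw[2]
    return {
        # 0x000: major revision in the high nibble, minor in the low nibble.
        "dpcd_rev": f"{rev >> 4}.{rev & 0x0F}",
        # 0x001: link rate code in units of 270 Mbit/s per lane.
        "max_link_rate_mbps": rate_code * 270,
        # 0x002: lane count in bits 4:0, enhanced framing capability in bit 7.
        "max_lane_count": lane_byte & 0x1F,
        "enhanced_framing": bool(lane_byte & 0x80),
    }


caps = decode_dpcd_caps("12 0a 84 41 00 00 01 01 02 00 00 00 00 0b 00")
print(caps)
```

For the bytes logged above this reports DPCD revision 1.2, a maximum of 2700 Mbit/s per lane and 4 lanes. The same function could be applied to the first bytes of a /dev/drm_dp_auxN hexdump after converting them to a hex string.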
sheirys2 it seems this got fixed on latest kernels.
Could you please try with latest drm-tip?

Reporter, can you check if this issue is still reproducible with latest drm-tip?

Ethan, Ping?

Hi, sorry for late reply.
>> sheirys2 it seems this got fixed on latest kernels.
>> Could you please try with latest drm-tip?
Can you provide information how to do that?
Also, I do not know if it is related, I installed archlinux with gnome3 and by default screen rotation does not work and screen flickering is gone. But after I installed `iio-sensor-proxy` rotation starts working and screen starts to flicker again. So for now fix for me is to remove `iio-sensor-proxy` package.
Hi Lakshmi,
I used DVT1 device to reproduce the issue and the issue can be reproduced on DVT1 easily.
But, I only have DVT2 on hand now. The failure rate is very low on DVT2.
Even though latest drm-tip can pass 8 hours test, I have no confidence that the issue is fixed really by latest drm-tip.

I'm assuming this is fixed by

commit 7769db5883841b03de544a35a71ff528d4131c17
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Wed Sep 5 12:53:21 2018 +0300

    drm/i915/dp: optimize eDP 1.4+ link config fast and narrow

that I just pushed. Please reopen if the problem still persists with that commit or current drm-tip.

I just updated to Linux 5.0 and got a black screen when starting X.

I did a git bisect and found out this patch seems to be the culprit.

I'm running an XPS 15 9570 (several other people seem to be having the same problem).

Anything i can do to help diagnose what's wrong?

FWIW this "fixes" it for me https://invent.kde.org/snippets/44

(In reply to Albert Astals Cid from comment #41)
> I just updated to Linux 5.0 and got a black screen when starting X.
>
> I did a git bisect and found out this patch seems to be the culprit.
>
> I'm running an XPS 15 9570 (several other people seem to be having the same
> problem).
>
> Anything i can do to help diagnose what's wrong?

Please file a new bug, attach dmesg all the way from boot with drm.debug=14 module parameter set.

And bugzilla decided not to email me so i didn't see the answer.

Ok, so i will recompile the kernel without my workaround and give you that debug log.