Summary: | two external screens permanently go blank on HP EliteBook Folio G1 | ||
---|---|---|---|
Product: | DRI | Reporter: | Johannes Berg <johannes> |
Component: | DRM/Intel | Assignee: | Karthik B S <karthik.b.s> |
Status: | RESOLVED MOVED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | major | ||
Priority: | high | CC: | goodmirek, imre.deak, intel-gfx-bugs, nutello, russianneuromancer, shtetldik, thomas, ville.syrjala |
Version: | unspecified | Keywords: | regression |
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | Triaged, ReadyForDev | ||
i915 platform: | SKL | i915 features: | display/watermark |
Attachments: |
Description
Johannes Berg
2018-10-16 19:49:51 UTC
Created attachment 142053 [details]
vbios dump
Created attachment 142054 [details]
dmesg
Note that these attachments are the same as for bug 108460, both problems occurred there, just at different times.
Created attachment 142055 [details]
register dump
Created attachment 142056 [details]
video showing the issue
FWIW, occasionally it only happens to one screen ...
Created attachment 142057 [details]
dmesg covering just one or two instances of the problem
(In reply to Johannes Berg from comment #0)
> DRM tip tree, commit 90b59df999a1 ("drm-tip: 2018y-10m-15d-20h-57m-27s UTC
> integration manifest")
>
> Fedora 28 system on an HP EliteBook Folio G1 with Intel(R) Core(TM) m7-6Y75
> CPU, x86_64 of course.
>
> I have two external monitors connected:
> * one directly by USB-C display port cable
> * one via docking station (https://www.iogear.com/product/GUD3C01/),
> connected with HDMI
>
> Both screens frequently go completely blank. Of course now while I was
> waiting for it with the camera turned on it didn't happen, but basically it
> just all goes completely black and the displays turn off the backlight
> temporarily.

How the screen comes back? Any particular actions will make the screen turn on again?

> This is a regression, but I cannot exactly say when it was introduced. I
> know it works on 4.13 (which I installed because of this bug...), and I
> believe it still worked on 4.15, but I don't have that installed now to test.

(In reply to Lakshmi from comment #7)
> > Both screens frequently go completely blank. Of course now while I was
> > waiting for it with the camera turned on it didn't happen, but basically it
> > just all goes completely black and the displays turn off the backlight
> > temporarily.
>
> How the screen comes back? Any particular actions will make the screen turn
> on again?

Oh, they just come back automatically and pretty much immediately, but it's super annoying to work with a system that just decides to turn your screen off and on occasionally :-)

The log at
https://bugs.freedesktop.org/attachment.cgi?id=142054
has pipe underruns on all 3 pipes, so I suspect some watermark problem.

The log at
https://bugs.freedesktop.org/attachment.cgi?id=142057
doesn't have any obvious issues, but that could just be due to underrun reporting being disabled at that time.

Any chance that you could do a bisect?
(In reply to Imre Deak from comment #9)
> The log at
> https://bugs.freedesktop.org/attachment.cgi?id=142054
> has pipe underruns on all 3 pipes, so I suspect some watermark problem.
>
> The log at
> https://bugs.freedesktop.org/attachment.cgi?id=142057
> doesn't have any obvious issues,

I'm pretty sure that the second log had the issue at least once.

> but that could just be due to underrun
> reporting being disabled at that time.

but I suppose that's possible.

> Any chance that you could do a bisect?

Technically yes, since I know it was fine around 4.15 time-frame, but it'll take ... forever, especially on this machine. Any other ideas would be nicer... :-)
> Technically yes, since I know it was fine around 4.15 time-frame, but it'll
> take ... forever, especially on this machine. Any other ideas would be
> nicer... :-)
That said, any idea which paths I can restrict the bisect to? Maybe I'll try to run it at some point.
Ok... I started to bisect, but instead of compiling the fedora config I used "make localmodconfig". I'm on 4.17-rc5 now and the issue isn't happening, though I was reasonably sure that it would happen here. I'm compiling 4.19-rc again with my current config to see if it's just the config ... or if it reproduces there.
Any ideas how the config might affect it? Like I said, I'm not 100% certain it previously occurred on 4.17 with Fedora config, but I thought it did.

Ok, hmmm. This does seem to depend on the kernel .config, now with the current config ("make localmodconfig") on the same DRM tip tree (commit 90b59df999a1) it hasn't happened yet in a few minutes, which would be almost impossible with the broken kernel...
One (perhaps significant) difference that I notice is that this kernel now shows the 4 boot-time penguins, which is not the case on Fedora's config.
Any thoughts as to what Kconfig knobs might affect this that I can play with? I can't really bisect if it's a Kconfig issue, and only happens on Fedora's config - that's too big to bisect with. If I can reproduce with a smaller config (and then not reproduce on older kernels) I can attempt the bisect again.

Not sure what Kconfig option would affect this issue.
As another approach to narrow down the problem, could you try - right after triggering the problem - disabling the low-power fifo mode and see if the problem is still reproducible? Please also provide the output for the script:
# cd /sys/kernel/debug/dri/0
# for plane in pri cur spr; do
> cat i915_${plane}_wm_latency
> wm0=$(head -1 i915_${plane}_wm_latency|cut -d' ' -f2)
> echo $wm0 1000 1000 1000 1000 > i915_${plane}_wm_latency
> done
Johannes, Have you tried Imre's suggestion?

(In reply to Lakshmi from comment #15)
> Johannes, Have you tried Imre's suggestion?

Not yet, unfortunately - I'd still been trying to figure out why my new kernel .config doesn't exhibit the issue, but yeah, I should do that. Perhaps tonight, when I'm off work.

FWIW, I don't actually know of a way of *triggering* this. It seems to just happen all by itself, sometimes with lots of screen activity, sometimes with none at all.
I noticed that this message *sometimes* seems to coincide with the issue:
[drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe C FIFO underrun
but I suppose that'd just be a symptom of the issue, rather than the cause, since it doesn't *always* happen.
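To see how often the message shows up and on which pipes, the kernel log can be tallied per pipe. This is a minimal sketch, not something used in this report: the function name `count_underruns` is made up for illustration, and it assumes the message text is exactly the "CPU pipe X FIFO underrun" form quoted above.

```shell
# Count "FIFO underrun" errors per pipe in kernel log text read from stdin.
# Prints one "<pipe> <count>" line per pipe, sorted by pipe letter.
# count_underruns is a hypothetical helper name, not part of any tool.
count_underruns() {
    grep -o 'CPU pipe [A-Z] FIFO underrun' |
        awk '{ print $3 }' |
        sort | uniq -c |
        awk '{ print $2, $1 }'
}

# Possible usage (needs permission to read the kernel log):
#   dmesg | count_underruns
```

Since underrun reporting may be disabled after the first report on a pipe, a low count does not by itself mean the blanking happened only that often.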
Your script doesn't actually work:
jberg1-mobl2:/sys/kernel/debug/dri/0# for plane in pri cur spr; do
> cat i915_${plane}_wm_latency
> wm0=$(head -1 i915_${plane}_wm_latency|cut -d' ' -f2)
> echo $wm0 1000 1000 1000 1000 > i915_${plane}_wm_latency
> done
WM0 2 (2.0 usec)
WM1 19 (19.0 usec)
WM2 28 (28.0 usec)
WM3 32 (32.0 usec)
WM4 63 (63.0 usec)
WM5 77 (77.0 usec)
WM6 83 (83.0 usec)
WM7 99 (99.0 usec)
bash: echo: write error: Invalid argument
WM0 2 (2.0 usec)
WM1 19 (19.0 usec)
WM2 28 (28.0 usec)
WM3 32 (32.0 usec)
WM4 63 (63.0 usec)
WM5 77 (77.0 usec)
WM6 83 (83.0 usec)
WM7 99 (99.0 usec)
bash: echo: write error: Invalid argument
WM0 2 (2.0 usec)
WM1 19 (19.0 usec)
WM2 28 (28.0 usec)
WM3 32 (32.0 usec)
WM4 63 (63.0 usec)
WM5 77 (77.0 usec)
WM6 83 (83.0 usec)
WM7 99 (99.0 usec)
bash: echo: write error: Invalid argument
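The output above shows eight levels (WM0 to WM7) while the script writes only five values, which is a plausible reason for the "Invalid argument" error. A minimal sketch of a variant that emits one value per level actually reported follows; the helper name `build_wm_latency` and its default of 1000 are assumptions for illustration, and it assumes the "WMn value (x usec)" line format shown above.

```shell
# Build a latency string with one value per watermark level listed on stdin.
# Keeps the WM0 value and replaces every higher level with $1 (default 1000).
# build_wm_latency is a hypothetical helper name.
build_wm_latency() {
    high="${1:-1000}"
    awk -v high="$high" '
        /^WM[0-9]+/ { out = (n == 0) ? $2 : out " " high; n++ }
        END { if (n > 0) print out }
    '
}

# Possible usage as root, from /sys/kernel/debug/dri/0 (paths as in the
# session above); read first, then write, so the file is not reopened
# for writing while it is still being read:
#   for plane in pri cur spr; do
#       new=$(build_wm_latency 1000 < i915_${plane}_wm_latency)
#       echo "$new" > i915_${plane}_wm_latency
#   done
```

Whether the driver accepts a given set of values is up to the kernel; the sketch only gets the number of fields right.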
(In reply to Johannes Berg from comment #17)
> FWIW, I don't actually know of a way of *triggering* this. It seems to just
> happen all by itself, sometimes with lots of screen activity, sometimes with
> none at all.
>
> I noticed that this message *sometimes* seems to coincide with the issue:
>
> [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe C FIFO
> underrun
>
> but I suppose that's just be a symptom of the issue, rather than the cause,
> since it doesn't *always* happen.

It could be a marker that we program wrong watermark levels (which I'd like to test with my script).

> Your script doesn't actually work:
>
> jberg1-mobl2:/sys/kernel/debug/dri/0# for plane in pri cur spr; do
> > cat i915_${plane}_wm_latency
> > echo $wm 1000 1000 1000 1000 > i915_${plane}_wm_latency
> > done

Ah sorry, you have SKL and so 8 watermark levels not 5. The following should work better, could you try it?

Forgot to say, but after running the script you also have to force a display modeset, for example by making both displays blank then unblank. Then just leave it running and see if any FIFO underrun message shows up or if the displays flicker. Thanks.

# for plane in pri cur spr; do
> echo 20 500 500 500 500 500 500 500 > i915_${plane}_wm_latency
> done

Created attachment 142237 [details]
config without this problem
Created attachment 142238 [details]
config that doesn't boot
So I've been trying to figure out why one .config works, and another doesn't.
In the process, I've arrived at the previously attached .config (config-working.txt) that doesn't exhibit the issue, but I don't have working sound. Note that this config also doesn't exhibit bug 108462.
Now, since I was trying to _also_ make sound work (since I'm actually fairly happy to have a working .config, except it didn't have sound which is annoying), I've slowly been enabling sound options. My previously attached config-working.txt had some sound options enabled, but still no sound.
Now ... with config-notbooting.txt that I just attached, I've really only changed sound configuration:
+CONFIG_REGMAP_I2C=m
+CONFIG_SND_HDA_EXT_CORE=m
+CONFIG_SND_SOC=m
+CONFIG_SND_SOC_TOPOLOGY=y
+CONFIG_SND_SOC_ACPI=m
+CONFIG_SND_SOC_INTEL_SST_TOPLEVEL=y
+CONFIG_SND_SOC_INTEL_SST=m
+CONFIG_SND_SOC_INTEL_SKYLAKE=m
+CONFIG_SND_SOC_ACPI_INTEL_MATCH=m
+CONFIG_SND_SOC_INTEL_MACH=y
+CONFIG_SND_SOC_I2C_AND_SPI=m
REGMAP_I2C got pulled in extra, apparently.
Here's the weird part: this .config doesn't even boot properly. The external screens are not initialized at all, and even the internal panel doesn't get the right resolution in the fedora boot/disk password screen!! It also doesn't boot fully, I get to enter my password but it doesn't get to the display manager.
Looks like I have a choice between
* kernel with working graphics and no sound
* kernel with flickering graphics but sound
* kernel 4.13 (or perhaps 4.15?)
Can anyone explain to me why selecting the sound options should have any impact on the graphics? Clearly it does though ... I can file a separate bug on that though, if you prefer.

(In reply to Johannes Berg from comment #20)
> In the process, I've arrived at the previously attached .config
> (config-working.txt) that doesn't exhibit the issue, but I don't have
> working sound. Note that this config also doesn't exhibit bug 108462.
Sorry, I meant bug 108460.
johannes

> Ah sorry, you have SKL and so 8 watermark levels not 5. The following should
> work better, could you try it? Forgot to say, but after running the script
> you also have to force a display modeset, for example by making both
> displays blank then unblank. Then just leave it running and see if any FIFO
> underrun message shows up or if the displays flicker. Thanks.

It looks like that did indeed help. It's been running for a few minutes without showing the error, and that would've been highly unlikely with the situation before. Still on the same DRM commit mentioned in comment #1, fwiw.
Tell me how _this_ is related to kernel .config though?

Imre, any comments here?

(In reply to Lakshmi from comment #23)
> Imre, any comments here?

I think we should check for missing SKL workarounds related to watermark programming. Ville has said that we are missing a few of those.

Ville, any changes are pushed to drm-tip that helps this issue?

Hi Johannes,
I tried to reproduce the issue at my end with ubuntu16.04 using DRM-TIP(4.20_rc1) kernel, with 2 displays(eDP+HDMI) connected. I also set the audio parameters in the config file as mentioned in the bug, but I'm unable to reproduce the issue.
Could you please provide the ftrace together with register trace enabled.
(echo 1 > /sys/kernel/debug/tracing/events/i915/i915_reg_rw/enable)

> Ah sorry, you have SKL and so 8 watermark levels not 5. The following should work better, could you try it?

Imre, your workaround script from Comment 18 helps with bug 103229 (internal screen flicker on same laptop).
Karthik and Imre, if possible, could you please look into bug 103229?

Created attachment 142533 [details]
trace-cmd recording of i915_reg_rw
Sorry for the delay, Karthik, here's the trace you requested. I think. Only one of the screens went blank towards the end of the file.
If you have something else in mind, I'd appreciate a full trace-cmd record command line.
FWIW, I'm not surprised you're not able to reproduce this, I myself am having a very hard time reproducing on a kernel that doesn't use fedora's configuration.
Hi,
Sorry for the delay in reply. I actually tried to reproduce the bug at our end multiple times, but have not been successful till now.
Also I'm having some issue with the .dat file format, the file I have seems partially corrupted. A .txt file would suffice.
I've narrowed the ftrace to 4 functions so that the buffer doesn't get overwritten.
Could you please run the below steps.
echo 0 > /sys/kernel/debug/tracing/tracing_on
echo nop > /sys/kernel/debug/tracing/current_tracer
echo "intel_atomic_commit" "intel_atomic_commit_tail" "intel_cpu_fifo_underrun_irq_handler" "gen8_de_irq_handler" > /sys/kernel/debug/tracing/set_ftrace_filter
echo function > /sys/kernel/debug/tracing/current_tracer
echo 0 > /sys/kernel/debug/tracing/events/enable
echo 1 > /sys/kernel/debug/tracing/events/i915/i915_reg_rw/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on
And once you've seen the flicker, dump the trace into a log file.
cat /sys/kernel/debug/tracing/trace > trace.log

Ok, I'll try to do that - in the meantime I'm attaching the parsed version of the trace, but it looks like the function stuff didn't get recorded?!

Created attachment 142709 [details]
trace.dat parsed
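The tracing steps Karthik lists above could be wrapped in one function so they are easy to repeat after a reboot. This is only a sketch: the function name `setup_i915_trace` is made up here, and taking the tracing directory as a parameter is an assumption added so the same steps can be tried against a scratch directory first. The files written and the values are exactly those from the comment.

```shell
# Configure the function tracer with the steps requested above.
# $1: tracing directory (defaults to /sys/kernel/debug/tracing, as in the comment).
# setup_i915_trace is a hypothetical helper name.
setup_i915_trace() {
    t="${1:-/sys/kernel/debug/tracing}"
    echo 0 > "$t/tracing_on" &&
    echo nop > "$t/current_tracer" &&
    echo "intel_atomic_commit" "intel_atomic_commit_tail" \
         "intel_cpu_fifo_underrun_irq_handler" "gen8_de_irq_handler" \
         > "$t/set_ftrace_filter" &&
    echo function > "$t/current_tracer" &&
    echo 0 > "$t/events/enable" &&
    echo 1 > "$t/events/i915/i915_reg_rw/enable" &&
    echo 1 > "$t/tracing_on"
}

# Possible usage as root; dump the trace once the flicker has been seen:
#   setup_i915_trace
#   cat /sys/kernel/debug/tracing/trace > trace.log
```

Chaining the writes with `&&` makes the function stop at the first step the kernel rejects, which helps spot a filter name that does not exist on the running kernel.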
(In reply to Johannes Berg from comment #30)
> Ok, I'll try to do that - in the meantime I'm attaching the parsed version
> of the trace, but it looks like the function stuff didn't get recorded?!

Yea, looks like it didn't get recorded. I believe it would be easier to pin point the set of reg read/write which actually caused the error if we have the function calls recorded as well.

(In reply to Karthik B S from comment #32)
> Yea, looks like it didn't get recorded. I believe it would be easier to pin
> point the set of reg read/write which actually caused the error if we have
> the function calls recorded as well.

Yep, fair enough. I'll work on it later, too busy with other things now to reboot etc.

Created attachment 142784 [details]
requested trace.log
Sorry for the delay, finally here's the requested trace.log.
I can't really understand anything from it though, tbh :-)
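A trace.log recorded with the four-function filter can be scanned for each underrun handler call together with the last atomic commit traced before it. The sketch below is an assumption-laden illustration: the helper name `underrun_with_last_commit` is invented, and it relies only on the function names appearing somewhere in each trace line, not on any particular column layout.

```shell
# For every underrun handler call in a function-trace log read from stdin,
# print the most recent intel_atomic_commit_tail line seen before it,
# followed by the underrun line itself.
# underrun_with_last_commit is a hypothetical helper name.
underrun_with_last_commit() {
    awk '
        /intel_atomic_commit_tail/ { last = $0 }
        /intel_cpu_fifo_underrun_irq_handler/ {
            print "commit:   " (last == "" ? "(none seen)" : last)
            print "underrun: " $0
        }
    '
}

# Possible usage:
#   underrun_with_last_commit < trace.log
```

If it prints nothing, the handler was never traced in the captured window, either because no underrun was reported or because the buffer had already wrapped.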
Hi,
We have logs only for some 20s, looks like the buffer is getting overwritten. So I'm not getting the function call I wanted.
Assuming that the error occurred at some point, I'm looking for "intel_cpu_fifo_underrun_irq_handler" and the commit just before this, causing the underrun.
I tried to debug regardless, but it is very difficult to find the commit causing this issue considering that there are 500 odd commits in the trace.
Can you please try having a script running in background to keep dumping the trace to a log file (or maybe different files if there's too much logs) just to ensure we're not missing any trace and with this we'll try to catch the function call for fifo underrun.

Created attachment 142977 [details]
continuous trace log
Here's a file captured via trace_pipe. Only one of the screens flickered during this time, I'll see if I can capture one where both do.
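One way to keep such a continuous capture manageable is to stream trace_pipe into fixed-size chunk files. This is a sketch under stated assumptions, not the command used for the attachment: the helper name `dump_trace_chunks`, the output prefix, and the 50m default chunk size are all made up for illustration.

```shell
# Copy a trace stream into fixed-size chunk files (<prefix>aa, <prefix>ab, ...)
# so that nothing is lost when the kernel ring buffer wraps.
# $1: source file, e.g. /sys/kernel/debug/tracing/trace_pipe
# $2: output file prefix
# $3: chunk size passed to split -b (default 50m)
# dump_trace_chunks is a hypothetical helper name.
dump_trace_chunks() {
    split -b "${3:-50m}" - "$2" < "$1"
}

# Possible usage as root; trace_pipe blocks until data arrives and never
# reaches end of file, so run it in the background and kill it afterwards:
#   dump_trace_chunks /sys/kernel/debug/tracing/trace_pipe trace.log. &
```

Reading trace_pipe consumes the entries it returns, so the chunks together form the only copy of the captured trace.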
(In reply to Johannes Berg from comment #36)
> Created attachment 142977 [details]
> continuous trace log
>
> Here's a file captured via trace_pipe. Only one of the screens flickered
> during this time, I'll see if I can capture one where both do.

Note that I *didn't* see a corresponding "FIFO underrun" message in dmesg for this ... so maybe the reason here is something else?

Created attachment 142980 [details]
continuous trace log - both screens going blank
Apologies for shifting the goal-posts, but I just realized that these latest traces were actually captured on 4.19.13-200.fc28.x86_64 (Fedora), rather than the DRM tip or the original Fedora kernel. Let me know if you need me to reproduce on a particular kernel version.

I went through both the logs and checked the DDB/watermark register write's. They look fine, although I see the WM registers only for Pipe A. Somehow it seems like many reg read/write's are missing.
Also I don't see any read/write of the Plane control registers and without that it would not be possible to verify the correctness of DDB allocation.
I would need to bug you for more logs again and this would be never ending, instead I think it would be better if I'm able to reproduce the issue locally. So we'll give one last try for the same.
Firstly it would be good for me if it would be possible to reproduce the issue on DRM-TIP. And also please share the config file you used with the kernel together with the resolutions of both the displays.
And I believe the issue is only reproduced on the display config mentioned by you right?
* one directly by USB-C display port cable
* one via docking station connected with HDMI
Hopefully I'm able to reproduce the issue locally this time and then root cause the issue asap.

(In reply to Karthik B S from comment #40)
> I went through both the logs and checked the DDB /watermark register
> write's. They look fine, although I see the WM registers only for Pipe A.
> Somehow it seems like many reg read/write's are missing.
> Also I don't see any read/write of the Plane control registers and without
> that it would not be possible to verify the correctness of DDB allocation.

We no longer have the tracepoint for most plane register writes. It was (somewhat unintentionally) removed as part of dd584fc0711a ("drm/i915: Use I915_READ_FW for plane updates").

Karthik, any further updates here?

(In reply to Lakshmi from comment #42)
> Karthik, any further updates here?
No, we're not able to reproduce it locally. As mentioned earlier, it would be helpful if we are able get config file used and the details for reproducing the bug, as logs aren't helping much.

Johannes, can you please attach the config file which caused the issue as mentioned in the description? Also, can you mention the display resolution that was set when the issue occurred?

I have a seemingly related bug, when switching to tty causes system to hang for some period of time. Sometimes tty comes up, and you can see
[drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
in dmesg.
Configuration:
Dell Latitude 7490
GPU: Intel UHD Graphics 620 (Kabylake GT2).
Kernel: 4.20.0-trunk-amd64 (from Debian experimental).
Displays: 2 external monitors, one connected over USB-C (display port connection), and other daisy chained from the first through DisplayPort cable. I keep laptop screen off, so using only two external monitors (KDE Plasma 5.14.5).
You can find config here: https://salsa.debian.org/kernel-team/linux/blob/debian/4.20-1_exp1/debian/config/config
Often the freeze is hard, and the only way to unfreeze it is Alt+SysRq+REISUB or complete hard reboot.

Some more details for my case above.
External monitors resolution (both): 2560x1440
Laptop screen resolution: 1920x1080 (kept off).
Displays are daisy chained like this: Laptop -> (USB-C) -> Dell U2715H -> (DP) -> Dell U2713HM.
2560x1440 DisplayPort 1.2 mode is enabled on Dell U2715H, while Dell U2713HM doesn't have such setting in on-screen UI.

To add to the above, this also happens sometimes when external monitors simply go to sleep normally due to inactivity (after set KDE time period). After that it's not possible to wake them up (or even laptop screen), without rebooting the system.

Johannes, can you please attach the config file which caused the issue as mentioned in the description? Can you address Comment 43? This information would be very helpful to proceed with the investigation.
I'm sorry, I've been completely swamped.
I'll attach the config file, as for other details, I think you asked for details on the configuration.
So, I have:
1) A Dell U2312HM running at 1920x1080 @ 60Hz
2) A Dell U2311H running at 1920x1080 @ 60Hz
3) internal display running at 1920x1080
Both are connected using DP, but one is connected directly to the laptop (USB-C connector) and the other is connected via a dock as mentioned before. I said before it was connected on HDMI, but it is connected on DP now. Didn't change anything.
I've been running with the workaround:
for plane in pri cur spr; do echo 20 500 500 500 500 500 500 500 > i915_${plane}_wm_latency ; done
which makes it not be an issue for me.

Created attachment 143673 [details]
kernel config file with the issue
This is the config file I use right now. Previously, when I rebuilt drm-tip, I was able to reproduce it with this config, but not with an arbitrary locally generated one.
(In reply to Johannes Berg from comment #49)
> I've been running with the workaround:
>
> for plane in pri cur spr; do echo 20 500 500 500 500 500 500 500 >
> i915_${plane}_wm_latency ; done
>
> which makes it not be an issue for me.

What is the correct format for setting those values? That's what I see there now:
for plane in pri cur spr; do cat /sys/kernel/debug/dri/0/i915_${plane}_wm_latency; done
WM0 2 (2.0 usec)
WM1 19 (19.0 usec)
WM2 28 (28.0 usec)
WM3 32 (32.0 usec)
WM4 63 (63.0 usec)
WM5 77 (77.0 usec)
WM6 83 (83.0 usec)
WM7 99 (99.0 usec)
WM0 2 (2.0 usec)
WM1 19 (19.0 usec)
WM2 28 (28.0 usec)
WM3 32 (32.0 usec)
WM4 63 (63.0 usec)
WM5 77 (77.0 usec)
WM6 83 (83.0 usec)
WM7 99 (99.0 usec)
WM0 2 (2.0 usec)
WM1 19 (19.0 usec)
WM2 28 (28.0 usec)
WM3 32 (32.0 usec)
WM4 63 (63.0 usec)
WM5 77 (77.0 usec)
WM6 83 (83.0 usec)
WM7 99 (99.0 usec)
That doesn't match the format in your example above.

I tried this workaround:
for plane in pri cur spr; do echo 20 500 500 500 500 500 500 500 > i915_${plane}_wm_latency ; done
But it's causing problems in my setup, (daisy chained monitor becomes unstable, and flickers). I had to revert to previous values:
for plane in pri cur spr; do echo 2 19 28 32 63 77 83 99 > i915_${plane}_wm_latency ; done

(In reply to Johannes Berg from comment #49)
> I'm sorry, I've been completely swamped.
>
> I'll attach the config file, as for other details, I think you asked for
> details on the configuration.
>
> So, I have:
> 1) A Dell U2312HM running at 1920x1080 @ 60Hz
> 2) A Dell U2311H running at 1920x1080 @ 60Hz
> 3) internal display running at 1920x1080
>
> Both are connected using DP, but one is connected directly to the laptop
> (USB-C connector) and the other is connected via a dock as mentioned before.
> I said before it was connected on HDMI, but it is connected on DP now.
> Didn't change anything.
>
> I've been running with the workaround:
>
> for plane in pri cur spr; do echo 20 500 500 500 500 500 500 500 >
> i915_${plane}_wm_latency ; done
>
> which makes it not be an issue for me.

Tried to repro the bug with 2 external DP panels at 2k@60 and one eDP panel at 2k@60, using the config file provided on DRM-TIP(5.0.0-rc5+). Still not seeing any under runs.
I tried video playback on 2 displays for 4-5 hours and also the suspend resume scenario without any success.
Any particular workload or sequence you can suggest, which might cause this issue in particular?

(In reply to Shmerl from comment #45)
> I have a seemingly related bug, when switching to tty causes system to hang
> for some period of time. Sometimes tty comes up, and you can see
>
> [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO
> underrun
>
> in dmesg.
>
> Configuration:
> Dell Latitude 7490
> GPU: Intel UHD Graphics 620 (Kabylake GT2).
> Kernel: 4.20.0-trunk-amd64 (from Debian experimental).
>
> Displays: 2 external monitors, one connected over USB-C (display port
> connection), and other daisy chained from the first through DisplayPort
> cable. I keep laptop screen off, so using only two external monitors (KDE
> Plasma 5.14.5).
>
> You can find config here:
> https://salsa.debian.org/kernel-team/linux/blob/debian/4.20-1_exp1/debian/
> config/config
>
> Often the freeze is hard, and the only way to unfreeze it is
> Alt+SysRq+REISUB or complete hard reboot.

Yet to try this out on Kabylake, will check it with the mentioned config and get back.

(In reply to Karthik B S from comment #53)
> Tried to repro the bug with 2 external DP panels at 2k@60 and one eDP panel
> at 2k@60, using the config file provided on DRM-TIP(5.0.0-rc5+). Still not
> seeing any under runs.

Hmm. Maybe it's a machine-specific thing? Or maybe it's important that there's the DP-hub inbetween?

> I tried video playback on 2 displays for 4-5 hours and also the suspend
> resume scenario without any success.
>
> Any particular workload or sequence you can suggest, which might cause this
> issue in particular?

Nope, sorry. For me it usually starts showing up as soon as I log in, sometimes before, sometimes a little later. But it's not even that I need to play video or do something else. I'm on Fedora but still run X (not wayland) with gnome shell, but that's it...

> Maybe it's as machine-specific thing?

With bug 103229 this indeed looks like machine-specific thing.
Question regarding bugreport status NEEDINFO - what other information is needed?

@Karthik, any updates on this issue?

(In reply to Lakshmi from comment #57)
> @Karthik, any updates on this issue?

Unfortunately no. The last update is that I'm not able to reproduce it.
After all the futile attempts at reproducing the issue, it looks like this is a machine specific issue.

> it looks like this is a machine specific issue

What further action is possible in this case? Is it possible to request HP to get access to certain hardware? It seems like this was done for example in this case: https://bugzilla.kernel.org/show_bug.cgi?id=201579#c8

(In reply to Karthik B S from comment #58)
> (In reply to Lakshmi from comment #57)
> > @Karthik, any updates on this issue?
>
> Unfortunately no. The last update is that I'm not able to reproduce it.
> After all the futile attempts at reproducing the issue, it looks like this
> is a machine specific issue.

Let's take this internally then (look me up in the Intel address book), maybe we have more such machines and can provide one, or maybe I can swap machines and send you this one, or something like that.
Or maybe I can put the system on the VPN and let you look at it that way.

(In reply to Johannes Berg from comment #60)
> (In reply to Karthik B S from comment #58)
> > (In reply to Lakshmi from comment #57)
> > > @Karthik, any updates on this issue?
> >
> > Unfortunately no. The last update is that I'm not able to reproduce it.
> > After all the futile attempts at reproducing the issue, it looks like this
> > is a machine specific issue.
>
> Let's take this internally then (look me up in the Intel address book),
> maybe we have more such machines and can provide one, or maybe I can swap
> machines and send you this one, or something like that.
>
> Or maybe I can put the system on the VPN and let you look at it that way.

Sure.

Karthik, when you get access to hardware, please also look into Bug 111201 - this is another screen flicker issue specific to HP EliteBook Folio G1.

If you have a chance, check also Dell Latitude 7490 please, which I mentioned above. It also has the same issue.

I'm almost done doing a long and tiring config bisect now, using Steven's awesome script tools/testing/ktest/config-bisect.pl ...
But one thing I just noticed, on the drm-tip kernel I got this WARN_ON a lot?
[ 289.339478] WARNING: CPU: 2 PID: 2200 at drivers/gpu/drm/i915/intel_pm.c:4395 skl_allocate_pipe_ddb+0xa2f/0xb60 [i915]
[ 289.339481] ---[ end trace f63ed9bc71cfc7ae ]---
[ 289.339496] ------------[ cut here ]------------
[ 289.339498] WARN_ON(wm->wm[level].min_ddb_alloc > total[PLANE_CURSOR])
it's possible, however, that this occurred because I set the watermark levels as per a comment above...

The warnings seem quite possibly unrelated, but this is a new print I hadn't seen before:
[ 5138.452734] [drm] HPD interrupt storm detected on connector eDP-1: switching from hotplug detection to polling
(but then again, I never let it flicker for this long!)

Ah, the warnings do happen when I enter the script from comment #18...

Just updated to kernel 5.2.6 on my Dell Latitude 7490, also using latest i915 firmware. This hang still happens, as soon as KDE puts monitors to sleep, they never wake up after that, and I need to reboot the computer.

(In reply to Shmerl from comment #67)
> Just updated to kernel 5.2.6 on my Dell Latitude 7490, also using latest
> i915 firmware.
> This hang still happens, as soon as KDE puts monitors to
> sleep, they never wake up after that, and I need to reboot the computer.

Can you attach the dmesg from boot with kernel parameters drm.debug=0x1e log_buf_len=4M?

(In reply to Lakshmi from comment #68)
>
> Can you attach the dmesg from boot with kernel parameters drm.debug=0x1e
> log_buf_len=4M?

Just installed kernel 5.3-rc5, and looks like sleep mode for the monitors worked OK at least one time. I'll see if it works consistently now, and will comment. If it hangs again, I'll post dmesg. Thanks!

Yep, after using for a while, I can confirm that the issue is gone for me (kernel 5.3-rc5).

(In reply to Shmerl from comment #70)
> Yep, after using for a while, I can confirm that the issue is gone for me
> (kernel 5.3-rc5).

Johannes, can you please confirm if this issue is still happening to you? If not, can I close this bug?

(In reply to Lakshmi from comment #71)
> Johannes, can you please confirm if this issue is still happening to you? If
> not, can I close this bug?

I rebuilt drm-tip as of an hour ago or so (commit 244c5c8116c0) and the issue definitely *is* still happening.

Actually, it got MUCH worse.
Not only do screens (all, including the internal) keep flickering/turning on and off with that kernel like before, but it also loses one of the external screens entirely.
What happens is that with this new kernel, going into gdm3 turns off both external displays. This does NOT happen with gdm3 with a different kernel, including rc1, there they are turned on (and keep flickering) in gdm3. When I log in, only one external display is turned on again.
Note that a similar behaviour happened before - if I were warm-rebooting the machine, only one external display would turn on after the reboot. A cold reboot turned on both external screens reliably.
So the "can only turn on one of the two after two had been turned off" was already there, but what's new is that without any changes other than the kernel, gdm3 now turns off both external screens on the login screen.
Much worse, because now with this kernel I can only use one external screen at all (except during boot, well...)

Just for the reference, I'm using KDE Plasma / sddm.

(In reply to Johannes Berg from comment #73)
> Actually, it got MUCH worse.
>
> Not only do screens (all, including the internal) keep flickering/turning on
> and off with that kernel like before, but it also loses one of the external
> screens entirely.
>
> What happens is that with this new kernel, going into gdm3 turns off both
> external displays. This does NOT happen with gdm3 with a different kernel,
> including rc1, there they are turned on (and keep flickering) in gdm3. When
> I log in, only one external display is turned on again.
>
> Note that a similar behaviour happened before - if I were warm-rebooting the
> machine, only one external display would turn on after the reboot. A cold
> reboot turned on both external screens reliably. So the "can only turn on
> one of the two after two had been turned off" was already there, but what's
> new is that without any changes other than the kernel, gdm3 now turns off
> both external screens on the login screen.
>
> Much worse, because now with this kernel I can only use one external screen
> at all (except during boot, well...)

Can you please attach the logs as you verified the issue on drmtip?

(In reply to Lakshmi from comment #75)
> Can you please attach the logs as you verified the issue on drmtip?

Sure, but can you remind me which logs you'd want in this case, and how to capture them? Like in comments 26ff?

Oh, and also, unless you can tell me how to capture this from boot, I'm not sure I could capture the gdm3 issue since it boots into that directly?
(In reply to Johannes Berg from comment #76)
> (In reply to Lakshmi from comment #75)
>
> > Can you please attach the logs as you verified the issue on drmtip?
>
> Sure, but can you remind me which logs you'd want in this case, and how to
> capture them? Like in comments 26ff?

For now, the dmesg from boot is required. You can find it under /var/log.

Created attachment 145242 [details]
requested dmesg from new drm-tip commit 244c5c8116c0042d61455697a9d85e899e2d9267
This is from drm-tip 244c5c8116c0042d61455697a9d85e899e2d9267 that I compiled the other day, sorry for the delay.
Nothing really stands out in the log though.
Remember this is now with the second external display turned off, as gdm3 disabled both external displays on this kernel and only one came back after logging in - nothing really is shown though in the log pertaining to this.
(In reply to Johannes Berg from comment #79)
> Created attachment 145242 [details]
> requested dmesg from new drm-tip commit
> 244c5c8116c0042d61455697a9d85e899e2d9267
>
> This is from drm-tip 244c5c8116c0042d61455697a9d85e899e2d9267 that I
> compiled the other day, sorry for the delay.
>
> Nothing really stands out in the log though.
>
> Remember this is now with the second external display turned off, as gdm3
> disabled both external displays on this kernel and only one came back after
> logging in - nothing really is shown though in the log pertaining to this.

Thanks. I can see underruns in the logs:

*ERROR* CPU pipe B FIFO underrun

Can you please attach the same logs with kernel parameters drm.debug=0x1e log_buf_len=4M? This will show more information about the issue.

Actually, while it's better, the issue is not totally gone in my case. At least monitors wake up now most of the time, but one time I got only one monitor waking up and not the other. I also see this in dmesg (relatively recent messages):

[155348.188820] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[169094.717384] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[331635.239381] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun

I'll post detailed dmesg from boot with drm.debug=0x1e a bit later.

Created attachment 145261 [details]
requested dmesg from new drm-tip commit 244c5c8116c0042d61455697a9d85e899e2d9267 (with drm.debug=0x1e)
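For readers wondering what the requested drm.debug=0x1e actually turns on: the value is a bitmask of DRM debug categories. The following is a minimal sketch that decodes such a mask. The category table is written from memory of the kernel's include/drm/drm_print.h and is an assumption here, so it should be checked against the kernel tree actually in use; `DRM_DEBUG_BITS` and `decode_drm_debug` are hypothetical names used only for illustration.

```python
# Debug category bits, as recalled from include/drm/drm_print.h.
# ASSUMPTION: verify against your own kernel tree; newer kernels define
# additional bits that are not listed here.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
}


def decode_drm_debug(mask):
    """Return the names of the categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    # 0x1e is the value requested in this bug.
    print(decode_drm_debug(0x1E))
```

Under that assumed table, 0x1e selects driver, KMS, PRIME and atomic messages while leaving out the very noisy per-vblank output.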
(In reply to Johannes Berg from comment #82)
> Created attachment 145261 [details]
> requested dmesg from new drm-tip commit
> 244c5c8116c0042d61455697a9d85e899e2d9267 (with drm.debug=0x1e)

Thanks for gathering the logs from drmtip. There are some underruns here:

[ 7.734338] [drm:gen8_de_irq_handler [i915]] hotplug event received, stat 0x00200000, dig 0x10101011, pins 0x00000020, long 0x00000000
[ 7.734378] [drm:intel_hpd_irq_handler [i915]] digital hpd port B - short
[ 7.734437] [drm:intel_dp_hpd_pulse [i915]] got hpd irq on port B - short
[ 7.736611] [drm:intel_dp_hpd_pulse [i915]] got esi 01 10 00
[ 7.744269] [drm:intel_dp_hpd_pulse [i915]] got esi2 01 00 00
[ 7.744307] [drm:intel_dp_hpd_pulse [i915]] got esi 01 00 00
[ 7.750579] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[ 7.750676] [drm:intel_fbc_underrun_work_fn [i915]] Disabling FBC due to FIFO underrun.

@Ville, help here?

Just a reminder: there is exactly the same laptop in Bug 111201 with exactly the same error in the log:

[ 152.025695] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[ 152.025858] [drm:intel_fbc_underrun_work_fn [i915]] Disabling FBC due to FIFO underrun.

Created attachment 145335 [details]
Debug log with underrun errors
Attaching log from /var/log/debug with underrun messages. Produced after booting with drm.debug=0x1e
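When comparing logs like the ones attached here, it can help to tally the underrun reports per pipe instead of eyeballing them. The following is a minimal sketch that does so for the message format quoted in this bug; `UNDERRUN_RE` and `count_underruns` are hypothetical helper names, not part of any existing tool. As noted earlier in this bug, underrun reporting can be disabled at the time of a glitch, so the counts are best read as a lower bound.

```python
import re
from collections import Counter

# Matches the i915 message quoted in this bug, for example:
# [ 7.750579] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
UNDERRUN_RE = re.compile(r"\*ERROR\* CPU pipe ([A-Z]) FIFO underrun")


def count_underruns(log_text):
    """Return a Counter mapping pipe letter to the number of underrun reports."""
    return Counter(match.group(1) for match in UNDERRUN_RE.finditer(log_text))


# Sample lines copied from the dmesg excerpts posted in this bug.
SAMPLE = """\
[155348.188820] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[169094.717384] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[331635.239381] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
"""

if __name__ == "__main__":
    for pipe, count in sorted(count_underruns(SAMPLE).items()):
        print(f"pipe {pipe}: {count} underrun report(s)")
```

To run it on a real log, read the file (for instance a saved dmesg) into a string and pass that to `count_underruns` in place of `SAMPLE`.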
(In reply to Karthik B S from comment #61)
> (In reply to Johannes Berg from comment #60)
> > (In reply to Karthik B S from comment #58)
> > > (In reply to Lakshmi from comment #57)
> > > > @Karthik, any updates on this issue?
> > >
> > > Unfortunately no. The last update is that I'm not able to reproduce it.
> > > After all the futile attempts at reproducing the issue, it looks like this
> > > is a machine specific issue.
> >
> > Let's take this internally then (look me up in the Intel address book),
> > maybe we have more such machines and can provide one, or maybe I can swap
> > machines and send you this one, or something like that.
> >
> > Or maybe I can put the system on the VPN and let you look at it that way.
>
> Sure.

@Karthik, any further updates here? There are more updated logs attached under this bug.

I was trying to debug it on the system provided by Johannes, but couldn't make much progress,

(In reply to Lakshmi from comment #86)
> (In reply to Karthik B S from comment #61)
> > (In reply to Johannes Berg from comment #60)
> > > (In reply to Karthik B S from comment #58)
> > > > (In reply to Lakshmi from comment #57)
> > > > > @Karthik, any updates on this issue?
> > > >
> > > > Unfortunately no. The last update is that I'm not able to reproduce it.
> > > > After all the futile attempts at reproducing the issue, it looks like this
> > > > is a machine specific issue.
> > >
> > > Let's take this internally then (look me up in the Intel address book),
> > > maybe we have more such machines and can provide one, or maybe I can swap
> > > machines and send you this one, or something like that.
> > >
> > > Or maybe I can put the system on the VPN and let you look at it that way.
> >
> > Sure.
>
> @Karthik, any further updates here? There are more updated logs attached
> under this bug.

I tried to debug it over VPN on the system provided by Johannes, but couldn't make much progress. I'll look into the new logs and provide an update.
Just updated to the latest UEFI firmware for my Dell Latitude 7490. The problem is still present. Just got this error in dmesg (this time there is an additional detail about the DisplayPort payload):

[ 877.679147] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[ 885.611061] [drm:intel_mst_enable_dp [i915]] *ERROR* Timed out waiting for ACT sent
[ 901.774749] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to get link status
[ 907.540929] [drm:intel_mst_disable_dp [i915]] *ERROR* failed to update payload -22

Could be because I was plugging in and unplugging the USB-C cable that routes the DP signal to two external monitors. Sometimes it helps to work around the problem.

Ping, anything we should do here? If you happen to have any colleagues in Israel who might be able to take a look - I'll be going there in a month or so, ping me internally.

Interesting idea. I will visit Germany in a couple of months. Can anyone from Intel take a look at the issue described in Bug 111201 in Germany in December?

Well, I live in Germany and work for Intel, but so far that hasn't helped me ;-)

There have been changes to DP-MST lately, so would you be able to test with the latest drm-tip and, if the problem still exists, attach the dmesg from drm-tip with drm.debug=0x1e log_buf_len=4M?

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/175.