Bug 111517

Summary: No screen issues for extended period on xorg/lxde (HP systems)
Product: DRI
Reporter: Ferry <freaky>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED MOVED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: high
CC: intel-gfx-bugs
Version: XOrg git
Hardware: Other
OS: All
Whiteboard: Triaged, ReadyForDev
i915 platform: SNB
i915 features: display/Other
Attachments:
dmesg with drm.debug=0x1e -- XZ compressed
Log on 20190926 with drm-tip (5.3.0-rc8+) before X is displayed
Log on 20190926 with drm-tip (5.3.0-rc8+) after X is displayed (did move the mouse after a while)

Description Ferry 2019-08-29 12:03:21 UTC
Created attachment 145206 [details]
dmesg with drm.debug=0x1e -- XZ compressed

Hi there,

I'm not sure where to file this I'm afraid, I think it's supposed to go here.

We build a kiosk-like distribution for taking exams. It's currently based on Fedora 30 and runs Xorg with LXDE. We have no issues with stock Fedora 30 (which uses Wayland/GNOME) on these machines. Besides our application (a modified browser), all packages are from Fedora. Some config files have been modified, and Xorg/LXDM isn't the default on Fedora, but both are in its repos.

Our modified Fedora works fine nearly everywhere, but on some older HP machines the GUI does not appear for an extended amount of time (Xorg only; the console is fine). After you see the LXDM service starting, the console seems stuck there. After waiting 10-20 minutes the GUI appears and all seems well.

On some of the systems the GUI instantly comes to life when the mouse is moved a few seconds after LXDM has started. I presume the mouse forces a redraw or something similar, which makes it come to life.

The affected systems are all older HP machines: 8200, 8300 and 6560B so far, with various BIOS versions. Newer BIOS versions actually seem to have more issues.

On the 8200s moving the mouse activates the GUI; from what we have heard it does not on the 8300s. We have an 8200 to test with. We don't have 8300s ourselves, but I can ask people to test those (more cumbersome though, as there are several parties in between).

To gather logs we have a script active: if users plug in a USB stick that contains a file with a certain name, it writes logs to the stick. This process also runs xrandr, which also seems to unfreeze the GUI.
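
A minimal sketch of what such a trigger-file log collector might look like. This is not the reporter's actual script, which is not shown in this report. The function name `collect_logs`, the trigger file name and the command set are all hypothetical.

```python
import subprocess
from pathlib import Path


def collect_logs(mount_point, trigger_name, commands):
    """Write each command's output to the stick if the trigger file exists.

    Returns the list of files written (empty when the trigger is absent).
    """
    mount = Path(mount_point)
    if not (mount / trigger_name).is_file():
        return []  # not a logging stick, leave it alone

    written = []
    for label, argv in commands.items():
        try:
            result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
            body = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Record the failure instead of aborting the whole collection.
            body = f"failed to run {argv!r}: {exc}\n"
        target = mount / f"{label}.log"
        target.write_text(body)
        written.append(target)
    return written


# Hypothetical command set; xrandr needs DISPLAY set to reach the X server.
DEFAULT_COMMANDS = {
    "dmesg": ["dmesg"],
    "xrandr": ["xrandr", "--query"],
}
```

Because a collector like this queries the X server through xrandr, gathering logs can itself change the behaviour being observed, as noted above.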

We have tested with a lot of kernel versions. All are stock Fedora kernels, although some come from older Fedora versions; we only replace the kernel (and the modules in /lib/modules, of course) and nothing else.

With 4.16.x they all seem to work without any issues.
With 4.17.x about half work without issues, most of the time.
With 4.18 and higher they all exhibit issues.
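
The observations above can be condensed into a regression window. The sketch below is only an illustration: the status labels ('good', 'flaky', 'bad') and function names are made up here, and "4.18 and higher" is represented by the single entry 4.18.

```python
def version_key(version):
    """Turn '4.17' into (4, 17) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def regression_window(results):
    """Return (last_good, first_affected) from a {version: status} mapping.

    Any status other than 'good' counts as affected.
    """
    last_good = None
    for version in sorted(results, key=version_key):
        if results[version] == "good":
            last_good = version
        else:
            return last_good, version
    return last_good, None


# Statuses as reported in this description.
observed = {"4.16": "good", "4.17": "flaky", "4.18": "bad"}
print(regression_window(observed))  # ('4.16', '4.17')
```

If the intermittent 4.17.x failures are the same problem, this would point at the changes between 4.16 and 4.17 as the range to search, though intermittent results make any single test run less conclusive.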

I'm not sure where to start as the systems work fine when using Fedora 30 live, but then Wayland and gnome are used which probably address the GPU very differently.

As reverting the kernel to 4.16 solves it, my first guess would be the kernel driver. There's nothing in the logs that is indicative of the source, not to me anyway.

Hope we can tackle this. I've attached logs from a system; I moved the mouse on it to activate the GUI after waiting some time. I did pass along drm.debug=0x1e log_buf_len=1M as mentioned on https://01.org/linuxgraphics/documentation/how-report-bugs

I can provide logging, test stuff, compile things, etc., but I have no clue where to start here, I'm afraid; C driver code is beyond my skills. Hope someone can help us narrow this down.
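
For reference, drm.debug is a bitmask of debug categories. The sketch below decodes it, assuming the bit assignments given in the kernel's drm.debug module parameter description (0x01 CORE, 0x02 DRIVER, 0x04 KMS, 0x08 PRIME, 0x10 ATOMIC, 0x20 VBL). The helper name is made up for illustration, and the table should be checked against the kernel version under test.

```python
# Assumed bit-to-category table for the drm.debug module parameter.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask):
    """List the debug categories enabled by a drm.debug bitmask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


print(decode_drm_debug(0x1e))  # ['DRIVER', 'KMS', 'PRIME', 'ATOMIC']
```

Under that assumption, 0x1e enables driver, modesetting, PRIME and atomic messages while leaving out the noisier core and vblank categories.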

Thanks in advance!
Comment 1 Lakshmi 2019-08-30 08:02:01 UTC
Is the external monitor connected through DP?
How did you recover from the situation? What steps did you follow to make the GUI appear?
Can you verify the issue with drm-tip (https://cgit.freedesktop.org/drm-tip)?
If the issue appears, can you please attach the logs?
Comment 2 Ferry 2019-09-10 08:34:53 UTC
Hi,

I'll be there this Thursday. I'll test with the drm tip branch. Anything in particular you'd like me to try / debug parameters?

Most systems have the monitor connected through DP cables, but there are also customers with VGA or DVI connections that have the issue.

I don't have to follow any steps to make the GUI appear. Either wait 10-20 minutes and it will come up on its own (perhaps due to something triggering a redraw, I don't know) or move the mouse.

On some systems moving the mouse doesn't help; they just need to wait a long time.
Comment 3 Lakshmi 2019-09-11 08:00:37 UTC
(In reply to Ferry from comment #2)
> Hi,
> I'll be there this Thursday. I'll test with the drm tip branch. Anything in
> particular you'd like me to try / debug parameters?
You can collect dmesg logs from boot with kernel parameters drm.debug=0x1e log_buf_len=4M. This will show more information.
Comment 4 Ferry 2019-09-11 11:04:14 UTC
Hi, thanks.

I was already using those (although the buffer is listed as 1M on the debug page, not 4M). The bigger problem is that I'm only there once every 2 weeks, so unfortunately there is quite an interval between the times I can test.

I'll build drm-tip tomorrow and use the drm.debug=0x1e log_buf_len=4M parameters and report back.
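
Before collecting logs it can help to confirm that the parameters actually reached the booted kernel. A minimal sketch, assuming a Linux system where /proc/cmdline holds the boot command line. The function names are hypothetical, and the parser is deliberately naive (it ignores quoted values containing spaces).

```python
def parse_cmdline(text):
    """Split a kernel command line into {key: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_debug_params(text, wanted):
    """Return the wanted parameters that are absent or set to another value."""
    params = parse_cmdline(text)
    return {key: value for key, value in wanted.items() if params.get(key) != value}


# Values requested in comment 3.
WANTED = {"drm.debug": "0x1e", "log_buf_len": "4M"}
```

On the test machine this could be called as `missing_debug_params(open("/proc/cmdline").read(), WANTED)`; an empty result means both parameters are set as requested.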
Comment 5 Ferry 2019-09-26 12:13:23 UTC
Created attachment 145524 [details]
Log on 20190926 with drm-tip (5.3.0-rc8+) before X is displayed
Comment 6 Ferry 2019-09-26 12:13:47 UTC
Created attachment 145525 [details]
Log on 20190926 with drm-tip (5.3.0-rc8+) after X is displayed (did move the mouse after a while)
Comment 7 Ferry 2019-09-26 12:17:08 UTC
It took a while longer, unfortunately. I had built the kernel last time, so the drm-tip branch used is 2 weeks old today.

It didn't make any difference, unfortunately. I exported the logs twice, once before we had a screen and once after. The latter contains all the info in the first one as well.

Our software outputs some logging back to syslog in case you're wondering why messages seem to reappear after a while (after screen became active). This is done because it normally logs to a network syslog server after the network has become active and the client has connected to an exam server. As it takes a while before the network is active, and the syslog is reconfigured after that, some logs are re-outputted to syslog in order to ship them to the remote end. It's not an issue with the system.

TIA & kind regards :)
Comment 8 Martin Peres 2019-11-29 19:25:28 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/intel/issues/387.
