Created attachment 145206
dmesg with drm.debug=0x1e -- XZ compressed
I'm not sure where to file this, I'm afraid; I think it's supposed to go here.
We build a kiosk-like distribution for taking exams. It is currently based on Fedora 30 and runs Xorg with LXDE. We have no issues with stock Fedora 30 (which uses Wayland/GNOME) on these machines. Apart from our application (a modified browser), all packages come from Fedora. Some config files have been modified, and Xorg/LXDM isn't the default on Fedora, but both are in its repos.
Our modified Fedora works fine nearly everywhere, but on some older HP machines the GUI (Xorg) does not appear for an extended amount of time; the console is fine. After the LXDM service is shown starting, the console seems stuck there. After waiting 10-20 minutes the GUI appears and all seems well.
On some of the systems the GUI instantly comes to life when the mouse is moved a few seconds after LXDM has started. I presume the mouse forces a redraw or something similar, which makes it come to life.
The affected systems are all older HP machines: 8200, 8300 and 6560B so far, with various BIOS versions. Newer BIOS versions actually seem to have more issues.
On the 8200s moving the mouse activates the GUI; on the 8300s it does not, from what we have heard. We have an 8200 to test with. We don't have 8300s ourselves, but I can ask people to test those (more cumbersome though, as there are several parties in between).
To gather logs we have a script active: if a user plugs in a USB stick that contains a file with a certain name, the script writes logs to the stick. This process also runs xrandr, which also seems to 'unfreeze' the GUI.
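For readers who want to reproduce the log export described above, the following is a minimal Python sketch of such a trigger-file collector. It is not the reporter's actual script: the trigger file name, the output file names, and the function names are all hypothetical, and the only commands assumed are `dmesg` and `xrandr --query`.

```python
import subprocess
from pathlib import Path

# Hypothetical names: the report does not say what the trigger file is called
# or which files the real script writes.
TRIGGER_NAME = "export-logs.txt"
COMMANDS = {
    "dmesg.txt": ["dmesg"],
    "xrandr.txt": ["xrandr", "--query"],  # needs DISPLAY set when run for real
}


def run_command(argv):
    """Run a command and return its combined stdout/stderr as text."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return result.stdout


def export_logs(mount_point, trigger_name=TRIGGER_NAME,
                commands=COMMANDS, runner=run_command):
    """Write command output to the stick if the trigger file is present.

    Returns the list of files written (empty if no trigger file was found).
    """
    mount = Path(mount_point)
    if not (mount / trigger_name).is_file():
        return []
    written = []
    for filename, argv in commands.items():
        target = mount / filename
        target.write_text(runner(argv))
        written.append(target)
    return written
```

Because the xrandr call is part of the export, plugging in the stick changes the state being observed. That matches the report that collecting logs itself appears to bring the GUI up.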
We have tested with a lot of kernel versions. All are stock Fedora kernels, although some come from older Fedora versions; we replace only the kernel (and, of course, the modules in /lib/modules) and nothing else.
With 4.16.x they all seem to work without any issues.
With 4.17.x about half of them work without issues, most of the time.
With 4.18 and higher they all exhibit issues.
I'm not sure where to start, as the systems work fine when using the Fedora 30 live image, but there Wayland and GNOME are used, which probably drive the GPU very differently.
As reverting the kernel to 4.16 solves it, my first guess would be the kernel driver. There's nothing in the logs that points to the source, at least not to me.
Hope we can tackle this. I've attached logs from one system; I moved the mouse on it to activate the GUI after waiting some time. I did pass drm.debug=0x1e log_buf_len=1M as mentioned on https://01.org/linuxgraphics/documentation/how-report-bugs
I can provide logs, test things, compile things, etc., but I have no clue where to start here, I'm afraid; C driver code is beyond my skills. Hope someone can help us narrow this down.
Thanks in advance!
Is the external monitor connected through DP?
How did you recover from the situation? What steps did you follow to make the GUI appear?
Can you verify the issue with drm-tip (https://cgit.freedesktop.org/drm-tip)?
If the issue appears, can you please attach the logs?
I'll be there this Thursday. I'll test with the drm tip branch. Anything in particular you'd like me to try / debug parameters?
Most systems have the monitor connected through DP cables, but there are also customers with VGA or DVI connections that have the issue.
I don't have to follow any steps to make the GUI appear. Either wait 10-20 minutes and it will come up on its own (perhaps due to something triggering a redraw, I don't know) or move the mouse.
On some systems moving the mouse doesn't help; they just need to wait a long time.
(In reply to Ferry from comment #2)
> I'll be there this Thursday. I'll test with the drm tip branch. Anything in
> particular you'd like me to try / debug parameters?
You can collect dmesg logs from boot with kernel parameters drm.debug=0x1e log_buf_len=4M. This will show more information.
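For anyone wondering what drm.debug=0x1e switches on, here is a minimal Python sketch that decodes the mask. The bit names follow the DRM_UT_* debug categories in the kernel's include/drm/drm_print.h for kernels of roughly this era. The helper name `decode_drm_debug` is made up for illustration, and the table should be checked against the tree actually being built, since categories have been added over time.

```python
# DRM debug category bits (DRM_UT_* in include/drm/drm_print.h).
# Assumption: this table matches kernels around 5.3; verify against your tree.
DRM_DEBUG_CATEGORIES = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
    0x100: "DP",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [
        name
        for bit, name in sorted(DRM_DEBUG_CATEGORIES.items())
        if mask & bit
    ]


print(decode_drm_debug(0x1E))  # ['DRIVER', 'KMS', 'PRIME', 'ATOMIC']
```

So 0x1e enables driver, KMS, PRIME and atomic messages, and leaves out CORE (0x01) and the per-vblank VBL (0x20) output.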
I was already using those (well, the buffer is listed as 1M on the debug page, not 4M). The bigger problem is that I'm only there once every 2 weeks, so there's quite an interval between the times I can test, unfortunately.
I'll build drm-tip tomorrow and use the drm.debug=0x1e log_buf_len=4M parameters and report back.
Created attachment 145524
Log on 20190926 with drm-tip (5.3.0-rc8+) before X is displayed
Created attachment 145525
Log on 20190926 with drm-tip (5.3.0-rc8+) after X is displayed (did move the mouse after a while)
It took a while longer, unfortunately. I had built the kernel last time, so the drm-tip branch used is 2 weeks old today.
It didn't make any difference, unfortunately. I exported the logs twice, once before we had a screen and once after. The latter contains all the info in the first one as well.
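Since the second export repeats the first, it may help to look only at what was logged between the two. A minimal Python sketch follows. It assumes both files were decompressed to plain text, come from the same boot, and that the kernel ring buffer had not wrapped, so the first log is a line-for-line prefix of the second. The function name `lines_added` is hypothetical.

```python
def lines_added(before_text, after_text):
    """Return the lines of after_text that follow the common leading part.

    Assumes after_text starts with the same lines as before_text; lines are
    compared from the top and everything after the first difference is kept.
    """
    before = before_text.splitlines()
    after = after_text.splitlines()
    common = 0
    for old, new in zip(before, after):
        if old != new:
            break
        common += 1
    return after[common:]
```

If the buffer did wrap between the exports, the two logs no longer share a prefix and this returns nearly the whole second log. In that case, matching on the dmesg timestamps would be the safer approach.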
Our software writes some logging back to syslog, in case you're wondering why messages seem to reappear after a while (after the screen became active). This is done because it normally logs to a network syslog server once the network is up and the client has connected to an exam server. As it takes a while before the network is active, and syslog is reconfigured after that, some logs are written to syslog again in order to ship them to the remote end. It's not an issue with the system.
TIA & kind regards :)