Created attachment 124583
output of `journalctl -k` after freeze with option drm.debug=0xe

During or directly after boot, before the tty login, the system freezes. It can only be shut down using Magic SysRq. The system log contains these errors (full log attached):

[drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting
[drm:intel_psr_work [i915]] *ERROR* Timed out waiting for PSR Idle for re-enable

The second error repeats multiple times per second until the system is shut down.

The problem can be reproduced using Linux v4.7-rc3, drm-intel-nightly (8bf2b76) and the current git/torvalds version (g9cbbef4).

Hardware: Toshiba Portege Z30-A notebook, no additional devices connected
Linux version: Arch

$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)
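For anyone triaging a similar log, here is a minimal sketch of how the `*ERROR*` lines could be tallied per reporting function, which makes a flood like the PSR timeout easy to spot. It assumes the `[drm:function [module]] *ERROR* message` layout of the two messages quoted in this report; the helper name `count_drm_errors` is made up for illustration, and the sample repeats the PSR line only to show the counting.

```python
import re
from collections import Counter

# Layout assumed from the two messages quoted in this report, e.g.
#   [drm:intel_psr_work [i915]] *ERROR* Timed out waiting for PSR Idle for re-enable
DRM_ERROR = re.compile(
    r"\[drm:(?P<func>\w+) \[(?P<module>\w+)\]\] \*ERROR\* (?P<msg>.*)"
)


def count_drm_errors(lines):
    """Return a Counter keyed by (function, message) for each DRM *ERROR* line."""
    counts = Counter()
    for line in lines:
        match = DRM_ERROR.search(line)
        if match:
            counts[(match.group("func"), match.group("msg").strip())] += 1
    return counts


# The two messages from the report; the second is duplicated for illustration.
sample = [
    "[drm:intel_dp_start_link_train [i915]] *ERROR* failed to train DP, aborting",
    "[drm:intel_psr_work [i915]] *ERROR* Timed out waiting for PSR Idle for re-enable",
    "[drm:intel_psr_work [i915]] *ERROR* Timed out waiting for PSR Idle for re-enable",
]

for (func, msg), n in count_drm_errors(sample).most_common():
    print(f"{n:6d}  {func}: {msg}")
```

To run it on a real log, the lines of a saved `journalctl -k` output can be passed in place of `sample`.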
This is silly.

Based on the logs, I presume this happens before we enable eDP for the first time:

-> intel_dp_detect
-> intel_dp_long_pulse
-> intel_dp_check_link_status (since it's eDP, connector status remains connected)
-> apparently the crtc is active, but the channel eq is not okay, because well, we've never trained before, and then we go on trying to retrain

Ander, any ideas when we caused this to happen?
Created attachment 125056
Workaround: don't retrain the link from long pulse

(In reply to Jani Nikula from comment #1)
> This is silly.
>
> Based on the logs, I presume this happens before we enable eDP for the first
> time:
>
> -> intel_dp_detect
> -> intel_dp_long_pulse
> -> intel_dp_check_link_status (since it's eDP, connector status remains
> connected)
> -> apparently the crtc is active, but the channel eq is not okay, because
> well, we've never trained before, and then we go on trying to retrain
>
> Ander, any ideas when we caused this to happen?

Probably in the commit below. We had logic to retrain the link from long hpd, and that series changed intel_dp_detect() to share code with the long pulse handling. The problem is that the long pulse handling is also called from output polling during boot and resume from suspend, so it ends up in that retraining logic there too.

I'm not sure what the proper fix would be, but the attached patch would confirm that's the issue.

commit 7d23e3c37bb3fc6952dc84007ee60cb533fd2d5c
Author: Shubhangi Shrivastava <shubhangi.shrivastava@intel.com>
Date: Wed Mar 30 18:05:23 2016 +0530

    drm/i915: Cleaning up intel_dp_hpd_pulse
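The reasoning in comments #1 and #2 can be illustrated with a toy model. This is not driver code: `EdpState`, `long_pulse` and the `retrain_from_long_pulse` switch are hypothetical names invented here, and the model only captures the condition described above (eDP always reports connected, so a link-status check that sees an active crtc with channel EQ not OK goes on to retrain).

```python
from dataclasses import dataclass


@dataclass
class EdpState:
    """Hypothetical, simplified stand-in for the eDP state discussed above."""
    crtc_active: bool    # reported as active during boot, per comment #1
    channel_eq_ok: bool  # False until the link has been trained once


def long_pulse(state, retrain_from_long_pulse=True):
    """Toy model of the shared detect / long-pulse path.

    Returns the list of actions taken. Because eDP always stays
    "connected", the link-status check runs on every call, including
    calls that come from output polling rather than a real hotplug.
    """
    actions = ["report connected"]
    if retrain_from_long_pulse and state.crtc_active and not state.channel_eq_ok:
        actions.append("retrain link")
    return actions


# Polling during boot, before the link has ever been trained.
boot_state = EdpState(crtc_active=True, channel_eq_ok=False)
print(long_pulse(boot_state))
# Same call with retraining from this path switched off, as the workaround does.
print(long_pulse(boot_state, retrain_from_long_pulse=False))
```

In this model the first call attempts a retrain and the second does not, which mirrors what the workaround patch is meant to confirm.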
(In reply to Ander Conselvan de Oliveira from comment #2)
> Created attachment 125056
> Workaround: don't retrain the link from long pulse
>
> […]
>
> I'm not sure what the proper fix would be, but the attached patch would
> confirm that's the issue.

Thanks! Unfortunately the patch (applied against v4.7-rc1) does not solve the problem in the original setup.

But I found out by accident that apparently the system does not actually freeze, it’s just the internal monitor that is not updated. If I connect an external monitor, I can log in and work (though the tty is flooded with the timeout errors).

Should I provide logs from the patched kernel, or test whether the external monitor also works with the unpatched kernel?
(In reply to Robin Krahl from comment #3)
> (In reply to Ander Conselvan de Oliveira from comment #2)
> > Created attachment 125056
> > Workaround: don't retrain the link from long pulse
> >
> > […]
> >
> > I'm not sure what the proper fix would be, but the attached patch would
> > confirm that's the issue.
>
> Thanks! Unfortunately the patch (applied against v4.7-rc1) does not solve
> the problem in the original setup.
>
> But I found out by accident that apparently the system does not actually
> freeze, it’s just the internal monitor that is not updated. If I connect an
> external monitor, I can log in and work (though the tty is flooded with the
> timeout errors).
>
> Should I provide logs from the patched kernel, or test whether the external
> monitor also works with the unpatched kernel?

Please provide the logs for the patched kernel. There were two different error messages in the log, and the patch should fix only the first one: "*ERROR* failed to train DP, aborting". That error may or may not be related to the second error message and the frozen screen.
Created attachment 125282
output of `journalctl -k` with patched kernel

Okay, I added the log of the patched kernel (4.7-rc1).
(In reply to Robin Krahl from comment #5)
> Created attachment 125282
> output of `journalctl -k` with patched kernel
>
> Okay, I added the log of the patched kernel (4.7-rc1).

So, it does seem like the two issues are not related. The link training errors went away.

Rodrigo, do you know what could cause the PSR wait-for-idle timeout?
I also had this problem, and bisected it to the following kernel commit:

03b7b5f983091bca1, drm/i915/psr: Try to program link training times correctly

which was done as a fix for https://bugs.freedesktop.org/show_bug.cgi?id=95176

I have a Toshiba Portege Z30-A-15M with an i7-4500U, and in the linked bug another Toshiba user also had a problem.

I have now tested with an external HDMI screen and can confirm it is just the laptop display not redrawing, not a complete hang.
Hi again,

Is there anything we can help with on this bug? It seems to be Toshiba-specific.
I think I have the same issue. I have posted some info here: https://bbs.archlinux.org/viewtopic.php?pid=1648897
for me the external display also hangs, and the bug only occurs when using a linux kernel version >4.5.4

the newest versions i tried it with are

linux 4.8.13
xf86-video-intel 1:2.99.917+746+g169c74f-1

using i915.enable_psr=0 doesn’t help.
(In reply to flying-sheep from comment #10)
> for me the external display also hangs, and the bug only occurs when using a
> linux kernel version >4.5.4
>
> the newest versions i tried it with are
>
> linux 4.8.13
> xf86-video-intel 1:2.99.917+746+g169c74f-1
>
> using i915.enable_psr=0 doesn’t help.

If that's the case, please open a new bug report. Boot the latest kernel from drm-tip (https://cgit.freedesktop.org/drm-tip) with drm.debug=0xe log_buf_len=1M, then attach the dmesg output to the bug report. PSR was just disabled by default on drm-tip, so any error messages related to PSR should just go away now.
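Before capturing dmesg for a new report, it can help to confirm that the requested parameters actually reached the running kernel. The following is a minimal sketch that compares the kernel command line against the two parameters named above; the function names are hypothetical, and it assumes a Linux system where the command line is exposed at `/proc/cmdline`.

```python
def missing_kernel_params(cmdline, required=("drm.debug=0xe", "log_buf_len=1M")):
    """Return the required parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in required if param not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return handle.read().strip()
```

Calling `missing_kernel_params(read_cmdline())` on the booted system returns an empty list when both parameters are present, and otherwise lists the ones still missing from the boot loader entry.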
We just merged a patch to disable PSR by default:

commit 2ee7dc497e348eecbb82adbb1ea9e9a7e29fe921
    drm/i915: disable PSR by default on HSW/BDW

This commit is marked for inclusion in the stable kernels, so it should reach your Linux distribution at some point soon.

Thank you for your bug report. In case you think the problem still happens, please feel free to reopen the bug. Please also make sure to re-generate the log files with the latest drm-tip tree and attach them here, since at least the PSR-related error messages should be gone now.