Bug 91438 - hard lockup after switching between laptop's panel and external monitor
Summary: hard lockup after switching between laptop's panel and external monitor
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Rodrigo Vivi
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-23 11:12 UTC by ivan
Modified: 2016-12-09 09:05 UTC
CC List: 1 user

See Also:
i915 platform: BDW
i915 features: display/DP MST


Attachments
relevant dmesg and laptop info (163.38 KB, text/plain)
2015-07-23 11:12 UTC, ivan
dmesg (227.97 KB, text/plain)
2015-08-21 07:04 UTC, ivan
oops on reboot. (453.75 KB, image/jpeg)
2015-08-21 07:04 UTC, ivan
debug info (2.71 KB, patch)
2015-08-21 16:49 UTC, Rodrigo Vivi
dmesg with debug patch (1010.63 KB, text/plain)
2015-08-21 18:04 UTC, ivan
dmesg with debug patch - redone. (187.30 KB, application/octet-stream)
2015-08-21 18:24 UTC, ivan
dmesg with debug patch after a suspend/resume cycle (472.11 KB, text/plain)
2015-08-21 18:24 UTC, ivan
dmesg/xrandr - retry with 2015-08-29 drm-intel-nightly (83.52 KB, application/octet-stream)
2015-08-29 08:48 UTC, ivan

Description ivan 2015-07-23 11:12:53 UTC
Created attachment 117314
relevant dmesg and laptop info

When docking my laptop, I usually use xrandr to turn off the laptop's display so that my taskbar and network applets are displayed properly on my external monitor.
While debugging something unrelated, I found out that I could hard-lock the laptop by switching back and forth between the two monitors a few times.

It took quite a bit of time to reproduce the problem. It happens only when the laptop is docked.

How to reproduce:
- laptop docked, external monitor on VGA port (no dongle)
- the following steps trigger a hard lockup:

# "try1"
xrandr --output $PANEL --off
xrandr --output $EXT --auto
xrandr --output $PANEL --auto
xrandr --output $EXT --off

# "try2"
xrandr --output $PANEL --off
xrandr --output $EXT --auto # the external monitor stays blank
xrandr --output $PANEL --auto # locks up shortly after this
xrandr --output $EXT --off

Note that the two blocks are identical, i.e. there's no problem the first time, but running the same commands a second time triggers the lockup. Both displays are black/blank, so there's no way to even see what's happening.
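Because the lockup only shows up when the identical sequence is replayed, it can help to script the two "tries" so every run is exactly the same. A minimal sketch, assuming eDP1 as the internal panel name (a common name, not taken from this report; check `xrandr` output) and DP2-3 as the docked external monitor, as noted in the report. The XRANDR and DELAY variables are hypothetical knobs added here for dry runs and pacing.

```shell
#!/usr/bin/env bash
# Sketch: replay the panel <-> external-monitor switch sequence from the report.
# Assumptions (not from the report): eDP1 as the internal panel name; XRANDR
# and DELAY are hypothetical variables for dry runs and pacing.
set -u

PANEL=${PANEL:-eDP1}      # internal panel (assumed name, check `xrandr`)
EXT=${EXT:-DP2-3}         # external monitor when docked, per the report
XRANDR=${XRANDR:-xrandr}  # command used to switch outputs
DELAY=${DELAY:-2}         # seconds to wait between tries (arbitrary)

# One "try": the same four commands as try1/try2, stopping at the first failure.
switch_once() {
    "$XRANDR" --output "$PANEL" --off &&
    "$XRANDR" --output "$EXT" --auto &&
    "$XRANDR" --output "$PANEL" --auto &&
    "$XRANDR" --output "$EXT" --off
}

# reproduce [tries]: run the sequence N times (default 2, since the
# lockup appeared on the 2nd try).
reproduce() {
    local tries=${1:-2} i=1
    while [ "$i" -le "$tries" ]; do
        echo "try$i" >&2
        switch_once || return 1
        sleep "$DELAY"
        i=$((i + 1))
    done
}
```

Usage, after saving it as repro.sh (file name arbitrary): `. ./repro.sh && reproduce 2`. Running it for real risks the same hard lockup, so save any open work first.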

The xrandrvga script in the attached tgz saves dmesg and xrandr --verbose output after each switch. The debug files show that the xrandr --verbose output captured after setting the panel to auto in the 2nd "try" is empty (0 bytes). So I'm not sure whether that's caused by running xrandr itself, or whether the next command (turning off the external monitor) triggers the hard lockup before the output file is flushed to disk.
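The xrandrvga script itself is only in the attached tgz, so the following is not its actual content but a sketch of the same idea with one change aimed at the 0-byte question: calling `sync` after every capture, so data already written is pushed to disk before the next (possibly fatal) switch. LOGDIR, XRANDR, DMESG and the function names are hypothetical. Even with `sync`, a lockup that hits mid-capture can still lose the last file.

```shell
#!/usr/bin/env bash
# Sketch in the spirit of the report's "xrandrvga" script (names hypothetical):
# after each switch, save dmesg and `xrandr --verbose`, then sync to disk.

LOGDIR=${LOGDIR:-./xrandr-logs}   # hypothetical output directory
XRANDR=${XRANDR:-xrandr}
DMESG=${DMESG:-dmesg}

# capture <label>: write <label>.dmesg and <label>.xrandr into $LOGDIR.
capture() {
    local label=$1
    mkdir -p "$LOGDIR" || return 1
    "$DMESG" > "$LOGDIR/$label.dmesg" 2>&1
    "$XRANDR" --verbose > "$LOGDIR/$label.xrandr" 2>&1
    # Flush dirty pages to disk before the next (possibly fatal) switch.
    sync
}

# run_step <label> <xrandr args...>: perform one switch, then capture state.
run_step() {
    local label=$1
    shift
    "$XRANDR" "$@"
    capture "$label"
}
```

For example, `run_step try2-panel-auto --output "$PANEL" --auto` would mirror the step after which the report saw an empty file. If the .xrandr file still comes out empty with this, that points more towards xrandr itself not returning than towards unflushed output.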

note:
- when docked, the external monitor shows up as DP2-3; when undocked (= plugged into the laptop's VGA port), it's DP2.
- i915 options: enable_psr=0, enable_fbc=0
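enable_psr and enable_fbc are i915 module parameters, so it is worth confirming which values the running kernel actually picked up. A small sketch: the comments list the two usual ways to set them, and the function reads the values back from sysfs. The modprobe.d file name and the function name are only examples; the optional directory argument exists to make the function testable.

```shell
#!/usr/bin/env bash
# Ways to pass the options used in this report (either one is enough):
#   kernel command line:       i915.enable_psr=0 i915.enable_fbc=0
#   /etc/modprobe.d/i915.conf: options i915 enable_psr=0 enable_fbc=0
#     (example file name; modprobe reads any *.conf file in that directory)

# show_i915_params [dir]: print the values the loaded i915 module is using.
# dir defaults to the module's sysfs parameter directory.
show_i915_params() {
    local dir=${1:-/sys/module/i915/parameters} p
    for p in enable_psr enable_fbc; do
        if [ -r "$dir/$p" ]; then
            printf '%s=%s\n' "$p" "$(cat "$dir/$p")"
        else
            printf '%s=unavailable\n' "$p"
        fi
    done
}
```

If i915 is loaded from the initramfs, a modprobe.d change may only take effect after the initramfs is rebuilt, which is one reason the kernel command line is often the simpler choice for this kind of testing.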

Hardware: lenovo T450s, model 20BX000TBM (see attached dmidecode and lspci_vv).
OS: Fedora 22, fully updated (as of 2015-07-23)
Kernel: git drm-intel-nightly (as of 2015-07-22 evening). .config attached (config.drm-intel).
Xorg:
- xorg-x11-server-Xorg-1.17.2-1.fc22.x86_64
- xorg-x11-drv-intel-2.99.917-12.20150615.fc22.x86_64
- mesa-dri-drivers-10.6.1-1.20150629.fc22.x86_64

Attached is a tgz with the relevant files:
- docked: tests with the laptop docked (lockup)
- undocked_on_AC: tests with the laptop undocked, on AC, with the external monitor plugged into the laptop's VGA port (in that case everything works fine).
Comment 1 ivan 2015-07-23 11:17:21 UTC
note: there's a chance that this bug is related to two other bugs (bugs 91436 and 91437) that I found when trying to debug flickering with Rodrigo Vivi when PSR was enabled.
Comment 2 Rodrigo Vivi 2015-08-20 16:55:52 UTC
Yes, this is probably related to those 2 other bugs.

Could you please check whether those patches alone solve this issue as well?

Along with those patches, I'm sending other patches to fix disable sequences, reduce delays, etc., but I believe those 2 patches are the key to the issues you are facing.

Anyway, if you are interested in trying all of them:
http://cgit.freedesktop.org/~vivijim/drm-intel/log/?h=psr-delayed-enable

Thanks,
Rodrigo.
Comment 3 ivan 2015-08-21 07:04:13 UTC
Created attachment 117833
dmesg

Hi Rodrigo,

I compiled/tested a kernel from the psr-delayed-enable branch.

Laptop on AC, undocked, psr=1, fbc=0

- at boot after modeset, screen freezes, like in bug 91436
- a suspend/resume cycle doesn't help with "unfreezing" the display. At resume the display will show the current laptop state (eg. if I suspend the laptop when the screen is frozen at the luks password prompt, at resume I'll see X's login screen), but the screen will freeze instantly again (can't see the mouse pointer moving).
- laptop is responsive though, ssh'ed into it; see attached dmesg with debug output (boot + 2 suspend/resume cycles).
- reboot -> oops
Comment 4 ivan 2015-08-21 07:04:56 UTC
Created attachment 117834
oops on reboot.
Comment 5 Rodrigo Vivi 2015-08-21 16:49:30 UTC
Created attachment 117840
debug info

Hi Ivan,

Could you please apply this patch to psr-delayed-enable, verify the issues are still there, and grab the dmesg for me?

One thing I noticed on your platform is that DRRS was enabled, but I'd also like to investigate the PSR flow there.

Regarding the oops, that doesn't look PSR-related. Does it happen only when PSR is enabled?

Thanks,
Rodrigo.
Comment 6 ivan 2015-08-21 18:04:35 UTC
Created attachment 117843
dmesg with debug patch

Hi Rodrigo,

Same issue with the patched kernel, dmesg attached.

I couldn't reproduce the oops, maybe because this time I did only one suspend/resume cycle. I'll try again next time.

Re: DRRS: I didn't set any options other than enable_psr=1 and enable_fbc=0, and there's nothing relevant in modprobe.d/, so it looks like the kernel enables it by default.

Kind regards,
Ivan
Comment 7 Rodrigo Vivi 2015-08-21 18:08:41 UTC
Hi Ivan,

thanks, but could you please enlarge the kernel log buffer with log_buf_len=32M so we get the beginning of the log as well?

In this specific dmesg, do you have an idea around what time it froze?

Thanks,
Rodrigo.
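log_buf_len is a kernel command-line parameter that sizes the kernel's printk ring buffer; with verbose DRM debug output, a default-sized buffer can wrap and drop the start of boot. A small sketch to confirm that the running kernel was actually booted with it (function name hypothetical; the file argument only exists to make testing easy):

```shell
#!/usr/bin/env bash
# has_log_buf_len [cmdline-file]: print any log_buf_len= argument the running
# kernel was booted with; exit status is non-zero if there is none.
has_log_buf_len() {
    local cmdline=${1:-/proc/cmdline}
    # Put each whitespace-separated argument on its own line, then filter.
    tr -s ' \t' '\n\n' < "$cmdline" | grep '^log_buf_len='
}
```

Once it reports log_buf_len=32M, a plain `dmesg > boot.dmesg` taken soon after the freeze should start from the first boot messages, as long as the enlarged buffer has not filled up again.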
Comment 8 ivan 2015-08-21 18:24:13 UTC
Created attachment 117844
dmesg with debug patch - redone.

Sorry, I didn't pay attention to log_buf_len.

- dmesg -c > boot.dmesg.2: the screen freezes between 5 and 6 seconds after boot, but I used a phone's stopwatch, so it's not very accurate.

- dmesg > boot.dmesg.2.after_resume: dmesg after a suspend/resume cycle
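Since dmesg timestamps are seconds since boot, the rough 5 to 6 second estimate can be used to cut the relevant slice out of a large log. A sketch (function name hypothetical) that keeps lines whose timestamp t satisfies start <= t < end. A stopwatch started at power-on will run ahead of the kernel clock, so in practice the window should be widened.

```shell
#!/usr/bin/env bash
# dmesg_window <start> <end> [file]: print dmesg lines whose timestamp t
# satisfies start <= t < end (seconds since boot). Reads the file if given,
# otherwise stdin, e.g. `dmesg | dmesg_window 5 6`.
dmesg_window() {
    local start=$1 end=$2
    shift 2
    awk -v s="$start" -v e="$end" '
        /^\[ *[0-9]+\.[0-9]+]/ {
            # Text between "[" and the first "]", converted to a number.
            ts = substr($0, 2, index($0, "]") - 2) + 0
            if (ts >= s + 0 && ts < e + 0) print
        }' "$@"
}
```

For example, `dmesg_window 3 8 boot.dmesg.2` would narrow the redone log down to the seconds around the reported freeze.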
Comment 9 ivan 2015-08-21 18:24:52 UTC
Created attachment 117845
dmesg with debug patch after a suspend/resume cycle
Comment 10 Rodrigo Vivi 2015-08-25 21:05:19 UTC
Ah, oops, I was confusing this issue with the other ones (#91436 and #91437)... This one happens even with PSR disabled, so it would block you from testing the other 2 PSR-related issues...
Comment 11 ivan 2015-08-27 10:15:18 UTC
Yes, since you said the bugs were related, I assumed you wanted to have the discussion about the other 2 bugs (91436 and 91437) here only.

The dmesg outputs I've sent a few days ago are valid for bug 91436. I can re-attach them on that bug's page if you'd like.

I can't test anything for bug 91437 since I can't get a working (= unfrozen) screen to see whether powering off the sound controller freezes the screen like it did before.

I'll try to see what's happening without PSR for this bug (#91438).
Comment 12 Rodrigo Vivi 2015-08-27 18:22:43 UTC
Thank you very much Ivan.

Please forgive my confusion... I had forgotten this one was also happening with PSR disabled. But this is bad, because it blocks the PSR ones, right?!
Comment 13 ivan 2015-08-28 07:24:05 UTC
Hi Rodrigo,

No problem. I've added the corresponding comments/attachments to bugs 91436 and 91437 (only #91437 is blocked by this bug).

I'll try to find time to redo the dock/undock tests with the psr-delayed-enable branch, but I doubt it'll change anything since it happened with psr=0. Maybe I should try with the drm-intel branch instead.
Comment 14 ivan 2015-08-29 08:48:46 UTC
Created attachment 117977
dmesg/xrandr - retry with 2015-08-29 drm-intel-nightly

So, I retried with today's drm-intel-nightly branch (330b171ec685987e04aa1c942661d12c8da79b10).
Laptop docked, on AC, with the external monitor; psr=0, fbc=0. There's still the same problem: the laptop hangs on the 2nd try.

note: the laptop doesn't seem to hang immediately: I can hear the fan kicking in, then slowing. At some point the display is restored on the internal panel, but it's not responsive. Then the fan reaches max speed and a hard reboot is needed.
Comment 15 Jani Nikula 2016-04-25 09:15:02 UTC
ivan, please try the latest kernels, including v4.6-rc5, and report back. The logs indicate your dock has DP MST, and there have been issues with it until very recently. I don't think this has anything to do with PSR.
Comment 16 ivan 2016-05-30 16:06:10 UTC
Sorry for the very late reply, I somehow overlooked bugzilla's email.
I can't do any tests at the moment - I don't have the docking station and I switched HDs and don't have the build environment in which I used to build kernels.
If you'd like, you can close the bug, I'll re-open if needed.
Comment 17 Jani Saarinen 2016-12-09 09:00:32 UTC
I will close this based on the comment above. Not sure what fixed it, but please re-open if it occurs again.

