Bug 98257 - [SKL] Crash while intel_fbdev_restore_mode and freeze
Summary: [SKL] Crash while intel_fbdev_restore_mode and freeze
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-14 13:11 UTC by Dennis Wassenberg
Modified: 2017-08-09 19:20 UTC
CC List: 3 users

See Also:
i915 platform: SKL
i915 features: display/Other


Attachments
crtc info cleanup (528 bytes, patch)
2017-03-17 08:54 UTC, Mika Kahola

Description Dennis Wassenberg 2016-10-14 13:11:40 UTC
Hi,

I have observed an issue that is often, but not always, reproducible.

I am able to reproduce this with Ubuntu kernels 4.7 and 4.8 on a Lenovo ThinkPad X1 Tablet with the additional Productivity Module and a OneLink+ docking station. In addition, an external display has to be connected to the docking station (VGA or DP; it happens more often with VGA).

I configured both displays (internal and external) in the X server on console F7. After that, I started a second X server on another console (e.g. F1) and configured it to use both displays. Then I unplugged the external display from the docking station. Finally, I terminated the X server on console F1 and switched to the X server on console F7.

After these steps, the screen went black and the system froze.

On a system where debugging is much easier for me, I got the following debug output in that case:

[ 5593.858748] general protection fault: 0000 [#1] SMP
[ 5593.858842] Modules linked in: ...
[ 5593.858885] CPU: 2 PID: 4008 Comm: Xorg Tainted: P        W  O    4.7.3-grsec+ #1
[ 5593.858888] Hardware name: LENOVO 20GHS0D600/20GHS0D600, BIOS N1LET55W (1.55 ) 08/10/2016
[ 5593.858892] task: ffff8802174bad00 ti: ffff8802174bb5c0 task.ti: ffff8802174bb5c0
[ 5593.858908] RIP: 0010:[<ffffffff81084d72>]  [<ffffffff81084d72>] mutex_optimistic_spin+0x42/0x1b0
[ 5593.858911] RSP: 0018:ffff8800d15ab870  EFLAGS: 00010282
[ 5593.858914] RAX: fefefefefefefefe RBX: 0000000000000001 RCX: 0000000000000005
[ 5593.858917] RDX: 0000000000000001 RSI: ffff8802158051c0 RDI: ffff8800d12e2258
[ 5593.858920] RBP: ffff8800d15ab8c0 R08: 0000000000000000 R09: 00000000d14c7000
[ 5593.858922] R10: 0000000000000780 R11: 0000000000000000 R12: ffff8802174bad00
[ 5593.858925] R13: ffff8802158051c0 R14: ffff8800d1102800 R15: ffff8800d12e2258
[ 5593.858929] FS:  000003551c86d100(0000) GS:ffff880221480000(0000) knlGS:0000000000000000
[ 5593.858933] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5593.858935] CR2: 0000000000000000 CR3: 00000000028a2000 CR4: 00000000003606b0
[ 5593.858938] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5593.858940] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 5593.858942] Stack:
[ 5593.858951]  00000001024000c0 ffff8802174bad00 ffff880217aa9890 0000000000000001
[ 5593.858956]  0000000000099e2a ffff8802158051c0 ffff8802174bad00 ffff8800d12e2000
[ 5593.858962]  ffff8800d1102800 ffff8800d12e2258 ffff8800d15ab930 ffffffff815f151c
[ 5593.858963] Call Trace:
[ 5593.858978]  [<ffffffff815f151c>] __ww_mutex_lock_slowpath+0x3c/0x1d0
[ 5593.858986]  [<ffffffff815f1714>] __ww_mutex_lock+0x64/0xa0
[ 5593.859058]  [<ffffffffa008f5d0>] drm_modeset_lock+0x30/0xd0 [drm]
[ 5593.859118]  [<ffffffffa0090037>] drm_atomic_get_connector_state+0x37/0x3d0 [drm]
[ 5593.859154]  [<ffffffffa010ee64>] __drm_atomic_helper_set_config+0x274/0x370 [drm_kms_helper]
[ 5593.859183]  [<ffffffffa0112efa>] drm_fb_helper_restore_fbdev_mode_unlocked+0x28a/0x2c0 [drm_kms_helper]
[ 5593.859205]  [<ffffffffa0112f58>] drm_fb_helper_set_par+0x28/0x50 [drm_kms_helper]
[ 5593.859306]  [<ffffffffa01ff9c5>] intel_fbdev_set_par+0x15/0x60 [i915]
[ 5593.859316]  [<ffffffff813285b8>] fb_set_var+0x248/0x450
[ 5593.859339]  [<ffffffff8106f57a>] ? check_preempt_curr+0x8a/0xa0
[ 5593.859346]  [<ffffffff812c062f>] ? rb_erase+0x10f/0x610
[ 5593.859352]  [<ffffffff81321c1d>] fbcon_blank+0x20d/0x2e0
[ 5593.859361]  [<ffffffff8138f9a2>] do_unblank_screen+0xc2/0x1d0
[ 5593.859371]  [<ffffffff81383ed4>] complete_change_console+0x54/0xe0
[ 5593.859377]  [<ffffffff813852d1>] vt_ioctl+0x1371/0x17e0
[ 5593.859429]  [<ffffffffa00740e0>] ? drm_ioctl+0x160/0x630 [drm]
[ 5593.859474]  [<ffffffffa0078630>] ? drm_setmaster_ioctl+0x130/0x130 [drm]
[ 5593.859483]  [<ffffffff813776b5>] tty_ioctl+0x4a5/0xf60
[ 5593.859491]  [<ffffffff8113f4ef>] do_vfs_ioctl+0x9f/0x9c0
[ 5593.859499]  [<ffffffff81057892>] ? recalc_sigpending+0x12/0x50
[ 5593.859506]  [<ffffffff8105852c>] ? __set_task_blocked+0x2c/0x80
[ 5593.859514]  [<ffffffff8105aaa5>] ? __set_current_blocked+0x35/0x60
[ 5593.859520]  [<ffffffff8113fe8a>] sys_ioctl+0x7a/0x90
[ 5593.859527]  [<ffffffff8105ad59>] ? sys_rt_sigprocmask+0x149/0x1e0
[ 5593.859537]  [<ffffffff815f389f>] entry_SYSCALL_64_fastpath+0x13/0x93
[ 5593.859617] Code: 83 ec 28 48 89 45 b8 89 55 c8 65 48 8b 04 25 48 b4 00 00 48 8b 00 a8 08 75 18 48 8b 47 18 49 89 ff 49 89 f5 89 d3 48 85 c0 74 38 <8b> 50 28 85 d2 75 31 65 48 8b 04 25 48 b4 00 00 48 8b 00 c6 45
[ 5593.859624] RIP  [<ffffffff81084d72>] mutex_optimistic_spin+0x42/0x1b0
[ 5593.859626]  RSP <ffff8800d15ab870>
[ 5593.859660] ---[ end trace 902e07127626f91b ]---

The general protection fault was caused by dereferencing the value 0xfefefefefefefefe (see RAX). grsec overwrites all freed memory with this value, so this looks like a use-after-free. Without grsec, the crash is still reproducible, but not every time. However, if I instrument kfree() to overwrite freed buffers with 0x0, it becomes reproducible every time again. So I assume that when the default Ubuntu kernel runs without crashing, the freed memory simply had not been reused before the use-after-free access.
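The register dump is consistent with this reading. RAX holds 0xfefefefefefefefe, and the faulting instruction (<8b> 50 28 in the Code: line, i.e. mov 0x28(%rax),%edx) reads from RAX + 0x28. Such an address is non-canonical on x86-64, which would explain why the oops is a general protection fault rather than a page fault (CR2 is 0). The following userspace C sketch illustrates that reasoning. It assumes 48-bit virtual addresses (4-level paging), and the helper names are made up for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* The sketch reasons about 64-bit pointers only. */
static_assert(sizeof(void *) == 8, "sketch assumes a 64-bit target");

/* Hypothetical helper: the value a pointer field reads back as once every
 * byte of the freed object has been overwritten with the 0xfe poison byte. */
uint64_t poisoned_pointer_value(void)
{
    uint64_t v;
    memset(&v, 0xfe, sizeof(v));
    return v; /* 0xfefefefefefefefe */
}

/* Hypothetical helper. Assumes 48-bit virtual addresses: an address is
 * canonical only if bits 63..47 are all 0 or all 1. Accessing a
 * non-canonical address raises #GP rather than a page fault. */
bool is_canonical_x86_64(uint64_t addr)
{
    uint64_t top = addr >> 47; /* bits 63..47: 17 bits */
    return top == 0 || top == 0x1ffff;
}
```

With these helpers, is_canonical_x86_64(poisoned_pointer_value()) is false, and so is the check for poisoned_pointer_value() + 0x28, while a kernel pointer taken from the dump, such as 0xffff8802174bad00, is canonical. A faulting pointer made of repeated 0xfe bytes is therefore a strong hint that the code read a pointer out of freed memory.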

After some debugging, I found that drm_fb_helper_restore_fbdev_mode_unlocked() restores the fbdev mode and accesses the fb_helper structure in drm_fb_helper.c. At that point, the drm_connector had already been removed from fb_helper->connector_info: the unplug was detected, the connector was unregistered (drm_connector_unregister()), and drm_fb_helper_remove_one_connector() was called. Just before the fbdev mode was restored, the last reference to the drm_connector was dropped and the connector was cleaned up (drm_connector_cleanup()).

However, fb_helper->crtc_info[i].mode_set still holds a pointer to this connector, because it was not removed from there during the unplug. This stale pointer is dereferenced during the fbdev mode restore, which triggers the general protection fault.
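To make this sequence concrete, here is a minimal userspace model in C. The struct and function names are hypothetical, heavily simplified stand-ins that only borrow the field names quoted above (connector_info, crtc_info[i].mode_set); they are not the real drm_fb_helper definitions. Removing a connector from connector_info alone leaves a pointer to it in a mode_set, which in the kernel would dangle once drm_connector_cleanup() has run and the memory has been freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; NOT the real kernel structures. */
#define MAX_CONN 4
#define MAX_CRTC 2

struct model_connector { int id; };

struct model_mode_set {
    struct model_connector *connectors[MAX_CONN];
    size_t num_connectors;
};

struct model_crtc_info { struct model_mode_set mode_set; };

struct model_fb_helper {
    struct model_connector *connector_info[MAX_CONN];
    size_t connector_count;
    struct model_crtc_info crtc_info[MAX_CRTC];
    size_t crtc_count;
};

/* Mirrors the behaviour described in the report: the connector is dropped
 * from connector_info, but crtc_info[i].mode_set is left untouched. */
bool model_remove_one_connector(struct model_fb_helper *h,
                                const struct model_connector *c)
{
    for (size_t i = 0; i < h->connector_count; i++) {
        if (h->connector_info[i] != c)
            continue;
        /* Shift the remaining entries down to close the gap. */
        for (size_t j = i + 1; j < h->connector_count; j++)
            h->connector_info[j - 1] = h->connector_info[j];
        h->connector_info[--h->connector_count] = NULL;
        return true;
    }
    return false;
}

/* Counts mode_set entries that still point at a connector no longer listed
 * in connector_info, i.e. pointers that dangle once the connector is freed. */
size_t model_count_stale_refs(const struct model_fb_helper *h)
{
    size_t stale = 0;
    assert(h->crtc_count <= MAX_CRTC);
    for (size_t i = 0; i < h->crtc_count; i++) {
        const struct model_mode_set *ms = &h->crtc_info[i].mode_set;
        for (size_t k = 0; k < ms->num_connectors; k++) {
            bool listed = false;
            for (size_t j = 0; j < h->connector_count; j++)
                listed |= (h->connector_info[j] == ms->connectors[k]);
            if (!listed)
                stale++;
        }
    }
    return stale;
}
```

For example, with an internal connector on CRTC 0 and the VGA connector on CRTC 1, calling model_remove_one_connector() for the VGA connector shrinks connector_info to one entry, yet model_count_stale_refs() then returns 1: CRTC 1 still points at the removed connector.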

The backtrace of this call is:


mutex_optimistic_spin
__mutex_lock_common
__ww_mutex_lock_slowpath
__ww_mutex_lock
ww_mutex_lock
drm_modeset_lock
drm_atomic_get_connector_state
drm_atomic_add_affected_connectors
update_output_state
__drm_atomic_helper_set_config
restore_fbdev_mode
drm_fb_helper_restore_fbdev_mode_unlocked
drm_fb_helper_set_par
intel_fbdev_set_par
fb_set_var
...
Comment 1 Mika Kahola 2017-03-17 08:54:08 UTC
Created attachment 130284
crtc info cleanup

It has been quite a while since this bug was reported, so my question is: are you still able to reproduce it with the latest drm-tip kernel?

Anyway, the bug report got me thinking that maybe we should also clean up the crtc info when removing a connector. The attached patch does that; I hope you can give it a try and let me know the outcome.
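The contents of the attached patch are not reproduced in this report, so the following is not that patch. It is a hypothetical C sketch of the idea described above, using a simplified, made-up model of fb_helper (not the real kernel structures): when a connector is removed, it is also dropped from every crtc_info[i].mode_set.connectors array, so no CRTC keeps pointing at it after cleanup. Locking, which the real code would have to respect, is ignored here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; NOT the real kernel structures. */
#define MAX_CONN 4
#define MAX_CRTC 2

struct model_connector { int id; };

struct model_mode_set {
    struct model_connector *connectors[MAX_CONN];
    size_t num_connectors;
};

struct model_crtc_info { struct model_mode_set mode_set; };

struct model_fb_helper {
    struct model_connector *connector_info[MAX_CONN];
    size_t connector_count;
    struct model_crtc_info crtc_info[MAX_CRTC];
    size_t crtc_count;
};

/* Removes every occurrence of c from a pointer array, compacting it in
 * place and clearing the freed tail slots; returns the new length. */
static size_t remove_ptr(struct model_connector **arr, size_t n,
                         const struct model_connector *c)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        if (arr[i] != c)
            arr[out++] = arr[i];
    for (size_t i = out; i < n; i++)
        arr[i] = NULL;
    return out;
}

/* Drops the connector from connector_info AND from every
 * crtc_info[i].mode_set, so nothing keeps pointing at it. */
void model_remove_one_connector_and_crtc_refs(struct model_fb_helper *h,
                                              const struct model_connector *c)
{
    assert(h->connector_count <= MAX_CONN && h->crtc_count <= MAX_CRTC);
    h->connector_count = remove_ptr(h->connector_info, h->connector_count, c);
    for (size_t i = 0; i < h->crtc_count; i++) {
        struct model_mode_set *ms = &h->crtc_info[i].mode_set;
        ms->num_connectors = remove_ptr(ms->connectors, ms->num_connectors, c);
    }
}
```

Clearing the tail slots to NULL, rather than only shrinking the count, is a design choice of this sketch: any code that ignores the count and walks the whole array sees NULL instead of a pointer to freed memory.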
Comment 2 Jani Saarinen 2017-04-09 16:39:59 UTC
Reporter, are you able to test whether the fix mentioned by Mika resolves the issues you saw?
Comment 3 Elizabeth 2017-08-09 19:20:31 UTC
Good afternoon,
Since there has been no answer from the reporter for a long period, I'm closing this bug. If the problem arises again, please file a new bug with HW/SW information and logs.
Thank you.

