Bug 72765 - [ivb efi] Switching from efifb to i915.ko causes a hard hang with fbcon=vc:2-6
Status: CLOSED WONTFIX
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: high blocker
Assignee: Daniel Vetter
QA Contact: Intel GFX Bugs mailing list
Reported: 2013-12-16 18:04 UTC by Chris Wilson
Modified: 2017-03-03 18:27 UTC

See Also:
i915 platform: IVB
i915 features: display/Other


Attachments
Avoid corrupting active workqueue entries (1.65 KB, patch)
2013-12-17 13:11 UTC, Chris Wilson

Description Chris Wilson 2013-12-16 18:04:56 UTC
And of course the machine (an IVB i3 NUC) I am using has no wired ethernet, and the usbnet is too flaky to use netconsole to see if there is a panic. Manual short-circuiting of the code initially led me to suspect that the hard hang is caused by touching VGA IO. However, having worked around those hangs, my machine still hangs upon a cold boot.
Comment 1 Daniel Vetter 2013-12-16 22:28:20 UTC
So vgacon is now out of the picture, but the efifb->i915 takeover still fails?

Can you confirm this by disabling vgacon in .config, or does that paper over the issues?
Comment 2 Chris Wilson 2013-12-17 11:26:02 UTC
I'm now convinced that my earlier attempts to disable CONFIG_VGA_CONSOLE were a fraud. Hacking .config directly (don't ask), I failed to notice that the build restored CONFIG_VGA_CONSOLE=y.
Comment 3 Chris Wilson 2013-12-17 12:23:37 UTC
# CONFIG_VGA_CONSOLE is not set

and still the hard hang (down to about 1 out of every 2 boots). The hang is with either hdmi or dp.
Comment 4 Chris Wilson 2013-12-17 12:47:36 UTC
And now for something different:

[    7.372961] WARNING: at kernel/workqueue.c:1365 __queue_work+0x216/0x292()
[    7.372964] Modules linked in: coretemp arc4 kvm_intel kvm iwldvm crc32c_intel mac80211 ghash_clmulni_intel cryptd joydev hid_lenovo_tpkbd lib80211 iTCO_wdt iwlwifi iTCO_vendor_support i915(+) btusb snd_hda_codec_hdmi bluetooth evdev snd_hda_intel usbhid snd_hda_codec pcspkr hid cfg80211 microcode snd_hwdep i2c_i801 snd_pcm drm_kms_helper lpc_ich drm mfd_core snd_page_alloc rfkill snd_timer snd soundcore mei_me i2c_algo_bit video mei acpi_cpufreq mperf i2c_core button processor ext4 crc16 jbd2 mbcache sg sd_mod crc_t10dif ahci libahci libata scsi_mod thermal fan ehci_pci ehci_hcd thermal_sys usbcore usb_common
[    7.373068] CPU: 0 PID: 660 Comm: ps Not tainted 3.10.9+ #55
[    7.373071] Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0025.2012.1011.1534 10/11/2012
[    7.373075]  ffffffff81596a1e ffff88045f203d38 ffffffff813eaef6 ffff88045f203d78
[    7.373083]  ffffffff81041027 ffff88045f203d78 0000000000000000 ffff88045f217f00
[    7.373091]  ffff88044a89c800 ffff88042b473aa0 0000000000000000 ffff88045f203d88
[    7.373098] Call Trace:
[    7.373101]  <IRQ>  [<ffffffff813eaef6>] dump_stack+0x19/0x1b
[    7.373115]  [<ffffffff81041027>] warn_slowpath_common+0x62/0x7b
[    7.373121]  [<ffffffff81041055>] warn_slowpath_null+0x15/0x17
[    7.373127]  [<ffffffff8105aa82>] __queue_work+0x216/0x292
[    7.373133]  [<ffffffff8105ab65>] queue_work_on+0x4c/0x7c
[    7.373140]  [<ffffffff8123cebb>] ? fbcon_add_cursor_timer+0xfb/0xfb
[    7.373146]  [<ffffffff8123cee1>] cursor_timer_handler+0x26/0x42
[    7.373153]  [<ffffffff8104ee1f>] call_timer_fn+0xcc/0x1ea
[    7.373160]  [<ffffffff8104ed53>] ? detach_if_pending+0x7a/0x7a
[    7.373166]  [<ffffffff8123cebb>] ? fbcon_add_cursor_timer+0xfb/0xfb
[    7.373172]  [<ffffffff8104f27b>] run_timer_softirq+0x19c/0x1e4
[    7.373178]  [<ffffffff8104874e>] ? __do_softirq+0x9e/0x2a7
[    7.373183]  [<ffffffff810487e9>] __do_softirq+0x139/0x2a7
[    7.373189]  [<ffffffff81048a7a>] irq_exit+0x56/0x9b
[    7.373196]  [<ffffffff8102af31>] smp_apic_timer_interrupt+0x77/0x85
[    7.373203]  [<ffffffff813f5ff2>] apic_timer_interrupt+0x72/0x80
[    7.373206]  <EOI>  [<ffffffff8113ea70>] ? spin_lock+0x9/0xb
[    7.373217]  [<ffffffff8120d8c1>] ? do_raw_spin_trylock+0x42/0x42
[    7.373223]  [<ffffffff813ef2e0>] ? _raw_spin_unlock+0x23/0x36
[    7.373229]  [<ffffffff8113ea7b>] spin_unlock+0x9/0xb
[    7.373235]  [<ffffffff8113fd25>] dput+0xd9/0xf8
[    7.373241]  [<ffffffff8113685e>] path_put+0x13/0x20
[    7.373247]  [<ffffffff8113a6f3>] do_last+0x925/0xa0d
[    7.373253]  [<ffffffff81137fa4>] ? inode_permission+0x40/0x42
[    7.373259]  [<ffffffff8113a89c>] path_openat+0xc1/0x325
[    7.373265]  [<ffffffff8113ae0c>] do_filp_open+0x33/0x81
[    7.373271]  [<ffffffff811455bd>] ? __alloc_fd+0x169/0x17b
[    7.373279]  [<ffffffff8112d78f>] do_sys_open+0x67/0xf4
[    7.373285]  [<ffffffff8112d839>] SyS_open+0x1d/0x1f
[    7.373290]  [<ffffffff813f5369>] system_call_fastpath+0x16/0x1b
[    7.373294] ---[ end trace 78bba0b9776072a9 ]---
[    7.538936] fbcon: inteldrmfb (fb0) is primary device
[    7.539446] Console: switching consoles 2-6 to frame buffer device
[    7.539463] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[    7.539468] i915 0000:00:02.0: registered panic notifier

A clue?
Comment 5 Chris Wilson 2013-12-17 12:55:41 UTC
Nah, that's just some very dodgy looking code in drivers/video/console/fbcon.c
Comment 6 Chris Wilson 2013-12-17 13:11:59 UTC
Created attachment 90876
Avoid corrupting active workqueue entries

Maybe not so innocent... The warning says that we are corrupting the workqueues, so could that effectively kill the machine? Anyway, check this patch for sanity.
Comment 7 Daniel Vetter 2014-01-10 08:52:48 UTC
Is this still a thing with the recent vga frobbing?
Comment 8 Chris Wilson 2014-01-10 09:00:58 UTC
Yes. The bug is a race inside fbcon.c between freeing the work struct and that work still executing.
Comment 9 Gordon Jin 2014-09-15 06:34:17 UTC
Is this a true highest blocker?
Comment 10 Chris Wilson 2014-11-14 09:46:31 UTC
(In reply to Gordon Jin from comment #9)
> Is this a true highest blocker?

It was hitting a production/retail system.
Comment 11 Daniel Vetter 2014-11-18 12:34:41 UTC
Please retest with latest drm-intel-nightly, specifically

commit 0485c9dc24ec0939b42ca5104c0373297506b555
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Nov 14 10:09:49 2014 +0100

    drm/i915: Kick fbdev before vgacon

Yes, ever the optimist here ;-)

For the original bug, not the efifb fail ofc.
Comment 12 Jani Nikula 2015-01-29 14:36:05 UTC
Chris, retest?
Comment 13 Chris Wilson 2015-01-29 15:02:27 UTC
Code review says it is still buggy, e.g.:

diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
index ea437245562e..90979e32bf4a 100644
--- a/drivers/video/console/fbcon.c
+++ b/drivers/video/console/fbcon.c
@@ -368,7 +368,7 @@ static void fbcon_update_softback(struct vc_data *vc)
 static void fb_flashcursor(struct work_struct *work)
 {
        struct fb_info *info = container_of(work, struct fb_info, queue);
-       struct fbcon_ops *ops = info->fbcon_par;
+       struct fbcon_ops *ops;
        struct vc_data *vc = NULL;
        int c;
        int mode;
@@ -381,6 +381,7 @@ static void fb_flashcursor(struct work_struct *work)
        if (ret == 0)
                return;
 
+       ops = info->fbcon_par;
        if (ops && ops->currcon != -1)
                vc = vc_cons[ops->currcon].d;

has a similar unbind race as the original bug.
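
The ordering that the diff above argues for can be sketched in userspace as follows. This is only a model under stated assumptions, not kernel code: a pthread mutex stands in for the console lock, and `struct ops_model`, `struct fb_model`, `flash_cursor_model()` and `unbind_model()` are hypothetical names invented for the illustration. The point it shows is that the pointer is loaded only after the lock has been taken, so a concurrent unbind cannot free it between the load and the use.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for fbcon_ops and fb_info; not the kernel structs. */
struct ops_model {
    int currcon;
};

struct fb_model {
    pthread_mutex_t lock;   /* plays the role of the console lock */
    struct ops_model *par;  /* plays the role of info->fbcon_par */
};

/* Returns the current console, or -1 if the lock is busy or par is gone. */
static int flash_cursor_model(struct fb_model *info)
{
    struct ops_model *ops;
    int con = -1;

    /* Models the "if (ret == 0) return;" early exit in the diff. */
    if (pthread_mutex_trylock(&info->lock) != 0)
        return -1;

    ops = info->par;        /* loaded only while the lock is held */
    if (ops && ops->currcon != -1)
        con = ops->currcon;

    pthread_mutex_unlock(&info->lock);
    return con;
}

/* Models unbind: free the per-console state under the same lock. */
static void unbind_model(struct fb_model *info)
{
    pthread_mutex_lock(&info->lock);
    free(info->par);
    info->par = NULL;
    pthread_mutex_unlock(&info->lock);
}
```

Had `ops` been read before the trylock, `unbind_model()` could free it in the gap and the later dereference would touch freed memory. The sketch does not cover the separate question of the work item itself being freed while still queued.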
Comment 14 Ricardo 2017-02-21 01:26:32 UTC
Chris, does this problem still exist? There have not been any changes for a couple of years; should we close it?
Comment 15 Ricardo 2017-03-03 18:26:55 UTC
Chatted with Chris and the suggestion was to close it.

