Created attachment 108923 [details]
dmesg

==System Environment==
--------------------------
Regression: not sure
Non-working platforms: HSW

==kernel==
--------------------------
drm-intel-nightly/782bafb46cc12737b16e5007583bd7b534c6202a

==Bug detailed description==
The test causes a system hang. It happens on only one HSW machine (same as bug 85541, bug 85787). Both the -nightly and -fixes kernels have this issue.

output:
IGT-Version: 1.8-ge622850 (x86_64) (Linux: 3.18.0-rc3_drm-intel-nightly_782baf_20141104_debug+ x86_64)
hang-read-crc-pipe-B: Testing connector VGA-1 using pipe B

dmesg:
[ 176.520445] Kernel panic - not syncing: Timeout synchronizing machine check over CPUs
[ 177.551604] Shutting down cpus with NMI
[ 177.562777] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 177.563666] drm_kms_helper: panic occurred, switching back to text console
[ 177.564624]
[ 177.565511] =============================================
[ 177.566389] [ INFO: possible recursive locking detected ]
[ 177.567268] 3.18.0-rc3_drm-intel-nightly_782baf_20141104_debug+ #1164 Not tainted
[ 177.568162] ---------------------------------------------
[ 177.569045] kms_pipe_crc_ba/4247 is trying to acquire lock:
[ 177.569922] (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa002563a>] __drm_modeset_lock_all+0x6c/0x100 [drm]
[ 177.570937]
[ 177.570937] but task is already holding lock:
[ 177.572671] (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa002563a>] __drm_modeset_lock_all+0x6c/0x100 [drm]
[ 177.573707]
[ 177.573707] other info that might help us debug this:
[ 177.575478] Possible unsafe locking scenario:
[ 177.575478]
[ 177.577206] CPU0
[ 177.578064] ----
[ 177.578982] lock(&dev->mode_config.mutex);
[ 177.579880] lock(&dev->mode_config.mutex);
[ 177.580761]
[ 177.580761] *** DEADLOCK ***
[ 177.580761]
[ 177.583085] May be due to missing lock nesting notation
[ 177.583085]
[ 177.584602] 5 locks held by kms_pipe_crc_ba/4247:
[ 177.585370] #0: (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa002563a>] __drm_modeset_lock_all+0x6c/0x100 [drm]
[ 177.586305] #1: (crtc_ww_class_acquire){+.+.+.}, at: [<ffffffffa0025644>] __drm_modeset_lock_all+0x76/0x100 [drm]
[ 177.587243] #2: (crtc_ww_class_mutex){+.+.+.}, at: [<ffffffffa0024f82>] drm_modeset_lock+0x5c/0xbc [drm]
[ 177.588175] #3: (&(&dev_priv->uncore.lock)->rlock){-.-.+.}, at: [<ffffffffa00d3c16>] hsw_write32+0x90/0x124 [i915]
[ 177.589124] #4: (panic_lock){....+.}, at: [<ffffffff8183635e>] panic+0x3d/0x1f5
[ 177.590055]
[ 177.590055] stack backtrace:
[ 177.591583] CPU: 7 PID: 4247 Comm: kms_pipe_crc_ba Not tainted 3.18.0-rc3_drm-intel-nightly_782baf_20141104_debug+ #1164
[ 177.592394] Hardware name: /DZ87KLT75K, BIOS KLZ8711D.86A.0336.2013.0516.1957 05/16/2013
[ 177.593214] ffffffff83eb7cd0 ffff88025fbca878 ffffffff8183ae58 0000000000000000
[ 177.594136] ffffffff83eb7cd0 ffff88025fbca948 ffffffff81074813 ffff88025fbca990
[ 177.595041] ffff88025340c000 0000000183e78bb0 ffff880200000000 28414289aca22785
[ 177.595939] Call Trace:
[ 177.596721] <#MC> [<ffffffff8183ae58>] dump_stack+0x46/0x58
[ 177.597537] [<ffffffff81074813>] __lock_acquire+0x8b2/0x1803
[ 177.598326] [<ffffffff81138dc5>] ? create_object+0x17c/0x291
[ 177.599108] [<ffffffff81075c74>] lock_acquire+0xd3/0x10d
[ 177.599975] [<ffffffffa002563a>] ? __drm_modeset_lock_all+0x6c/0x100 [drm]
[ 177.600749] [<ffffffff81071e23>] ? trace_hardirqs_off+0xd/0xf
[ 177.601505] [<ffffffff8183ec91>] mutex_lock_nested+0x4b/0x2d2
[ 177.602252] [<ffffffffa002563a>] ? __drm_modeset_lock_all+0x6c/0x100 [drm]
[ 177.602998] [<ffffffffa002560c>] ? __drm_modeset_lock_all+0x3e/0x100 [drm]
[ 177.603717] [<ffffffff818341e1>] ? kmemleak_alloc+0x25/0x41
[ 177.604417] [<ffffffff81133ed3>] ? kmem_cache_alloc_trace+0xb8/0x13c
[ 177.605118] [<ffffffffa002563a>] __drm_modeset_lock_all+0x6c/0x100 [drm]
[ 177.605822] [<ffffffffa0025724>] drm_modeset_lock_all+0x10/0x28 [drm]
[ 177.606522] [<ffffffffa0070307>] drm_fb_helper_pan_display+0x36/0xc5 [drm_kms_helper]
[ 177.607219] [<ffffffff813da179>] fb_pan_display+0xed/0x131
[ 177.607904] [<ffffffff813d52ec>] bit_update_start+0x20/0x49
[ 177.608586] [<ffffffff813d33f7>] fbcon_switch+0x452/0x469
[ 177.609252] [<ffffffff8142befe>] redraw_screen+0x112/0x1e3
[ 177.609900] [<ffffffff813d2956>] fbcon_blank+0x1e5/0x26e
[ 177.610550] [<ffffffff810902b6>] ? mod_timer+0x12a/0x184
[ 177.611180] [<ffffffff81071e23>] ? trace_hardirqs_off+0xd/0xf
[ 177.611796] [<ffffffff81841f4e>] ? _raw_spin_unlock_irqrestore+0x38/0x46
[ 177.612417] [<ffffffff810902df>] ? mod_timer+0x153/0x184
[ 177.613035] [<ffffffff8142d011>] do_unblank_screen+0xfa/0x173
[ 177.613656] [<ffffffff8142d09a>] unblank_screen+0x10/0x12
[ 177.614281] [<ffffffff8139cfdc>] bust_spinlocks+0x14/0x28
[ 177.614911] [<ffffffff81836429>] panic+0x108/0x1f5
[ 177.615541] [<ffffffff8101bac1>] mce_panic+0x159/0x18b
[ 177.616177] [<ffffffff8101bb39>] mce_timed_out+0x46/0x67
[ 177.616812] [<ffffffff8101befd>] do_machine_check+0x192/0x766
[ 177.617462] [<ffffffffa00d07e1>] ? hsw_unclaimed_reg_detect.isra.6+0x20/0x44 [i915]
[ 177.618122] [<ffffffff818441ee>] machine_check+0x1e/0x30
[ 177.618785] [<ffffffffa00d07e1>] ? hsw_unclaimed_reg_detect.isra.6+0x20/0x44 [i915]
[ 177.619451] <<EOE>> [<ffffffffa00d3c7f>] hsw_write32+0xf9/0x124 [i915]
[ 177.620172] [<ffffffffa00fa6fc>] hsw_fdi_link_train+0xf6/0x34a [i915]
[ 177.620844] [<ffffffffa00e5f8a>] haswell_crtc_enable+0x4a4/0x8f5 [i915]
[ 177.621496] [<ffffffff810764aa>] ? trace_hardirqs_on+0xd/0xf
[ 177.622142] [<ffffffffa00e7bfa>] __intel_set_mode+0x12f4/0x1426 [i915]
[ 177.622779] [<ffffffffa00e9ffe>] intel_set_mode+0x16/0x2f [i915]
[ 177.623397] [<ffffffffa00eac81>] intel_crtc_set_config+0x77c/0xae0 [i915]
[ 177.624013] [<ffffffffa0018d34>] drm_mode_set_config_internal+0x57/0xe4 [drm]
[ 177.624637] [<ffffffffa001ca8e>] drm_mode_setcrtc+0x3ef/0x499 [drm]
[ 177.625239] [<ffffffffa0010c29>] drm_ioctl+0x2be/0x423 [drm]
[ 177.625824] [<ffffffffa001c69f>] ? drm_mode_setplane+0x1d9/0x1d9 [drm]
[ 177.626392] [<ffffffff81076441>] ? trace_hardirqs_on_caller+0x142/0x19e
[ 177.626962] [<ffffffff8114c559>] do_vfs_ioctl+0x455/0x49f
[ 177.627526] [<ffffffff810bca34>] ? __audit_syscall_entry+0xbf/0xe1
[ 177.628087] [<ffffffff8100d3b0>] ? do_audit_syscall_entry+0x63/0x65
[ 177.628645] [<ffffffff8114c5f6>] SyS_ioctl+0x53/0x81
[ 177.629198] [<ffffffff81842552>] system_call_fastpath+0x12/0x17

==Reproduce steps==
----------------------------
1. ./kms_pipe_crc_basic --run-subtest hang-read-crc-pipe-B
Running ./kms_flip --run-subtest flip-vs-panning also causes a system hang, with the same call trace.

IGT-Version: 1.8-g50d539e (x86_64) (Linux: 3.18.0-rc3_drm-intel-nightly_9a7620_20141112_debug+ x86_64)
Using monotonic timestamps
Beginning flip-vs-panning on crtc 8, connector 18
1024x768 60 1024 1048 1184 1344 768 771 777 806 0xa 0x40 65000
Running ./kms_pipe_crc_basic --run-subtest hang-read-crc-pipe-C also hangs the system, with a similar dmesg.
Oh fuck, fbdev panic handling killed the oops here. /me cries Please rebuild the kernel with CONFIG_DRM_I915_FBDEV=n (only for this bug here, it will kill the console), reproduce the issue and attach a new dmesg (with debugging, as usual). That way we should be able to capture the oops correctly.
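Before reproducing, it may help to confirm that the rebuilt kernel really has the option turned off. The following is a minimal sketch, assuming the standard kernel .config text format (a disabled option appears as "# CONFIG_... is not set" or is absent). The helper name option_enabled is hypothetical, and where the config text comes from (for example the .config in the build tree) is left to the caller.

```python
# Hypothetical helper: checks a kernel .config text for an enabled option.
OPTION = "CONFIG_DRM_I915_FBDEV"


def option_enabled(config_text, option=OPTION):
    """Return True if `option` is set to y or m in the given .config text.

    A line such as '# CONFIG_DRM_I915_FBDEV is not set', or no line at
    all for the option, is treated as disabled.
    """
    prefix = option + "="
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith(prefix):
            return line[len(prefix):] in ("y", "m")
    return False
```

For the rebuild requested here, option_enabled(open(".config").read()) should return False before the kernel is installed.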
Oh and since this looks super-nasty: Please try to figure out whether older stable kernels work properly or whether there's a different behaviour. I'm pretty sure that this worked once and is a regression.
Created attachment 109918 [details] dmesg(CONFIG_DRM_I915_FBDEV=n)
[ 4483.626048] Kernel panic - not syncing: Timeout synchronizing machine check over CPUs
[ 4484.664475] Shutting down cpus with NMI

That's a fatal MCE, which means either busted hw or a broken bios. It's not pretty that the i915 panic handler then kills the box, but fixing that is a much larger problem (and atm not at the top of the list). The underlying MCE issue which kills the machine otoh is a plain hw issue, so closing as invalid. I guess you need to decommission/replace this machine if a bios upgrade doesn't fix this.
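The triage step above (find the first panic line and see whether it is a machine check) can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the timestamp pattern assumes the "[ seconds.micros] message" layout seen in the logs in this report, and matching on the substring "machine check" is a heuristic, not a complete MCE classifier.

```python
import re

# Matches dmesg lines of the form "[ 4483.626048] message".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def first_panic(dmesg_text):
    """Return (timestamp, message) of the first 'Kernel panic' line, or None."""
    for raw in dmesg_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if m and m.group(2).startswith("Kernel panic"):
            return float(m.group(1)), m.group(2)
    return None


def is_machine_check_panic(dmesg_text):
    """Heuristic: True if the first panic message mentions a machine check."""
    hit = first_panic(dmesg_text)
    return hit is not None and "machine check" in hit[1].lower()
```

Applied to either dmesg in this report, the first panic line is the "Timeout synchronizing machine check over CPUs" message, so anything logged after it (such as the lockdep report from the fbdev panic path) is a follow-on effect rather than the root cause.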