Created attachment 80986 [details]
dmesg

System Environment:
--------------------------
Arch: x86_64
Platform: Haswell
Kernel: (drm-intel-next-queued)cab8b5862acd55019fbeede6940d1a601912d6b8

Bug detailed description:
-----------------------------
module_reload randomly causes a system hang. It happens in about 1 in 5 runs on Haswell with the drm-intel-next-queued kernel. It works well on the drm-intel-fixes kernel, and it also works well on Ivybridge.

output:
module successfully unloaded

Call trace:
[ 168.675415] ---[ end trace 51d2d549a433189e ]---
[ 168.675416] Kernel panic - not syncing: Fatal exception in interrupt
[ 168.686053] ------------[ cut here ]------------
[ 168.686158] WARNING: at arch/x86/kernel/smp.c:123 update_process_times+0x50/0x5c()
[ 168.686318] Modules linked in: i915(+) drm_kms_helper drm netconsole configfs ipv6 dm_mod acpi_cpufreq coretemp kvm_intel kvm snd_hda_codec_realtek microcode pcspkr i2c_i801 iTCO_wdt iTCO_vendor_support snd_hda_intel lpc_ich snd_hda_codec mfd_core snd_hwdep snd_pcm snd_page_alloc snd_timer snd soundcore video button mperf freq_table [last unloaded: drm]
[ 168.687894] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G D 3.10.0-rc5_nightlytop_c18149_20130618_+ #4155
[ 168.688052] Hardware name: Intel Corporation Shark Bay Client platform/Flathead Creek Crb, BIOS HSWLPTU1.86C.0109.R03.1301282055 01/28/2013
[ 168.688221] ffffffff816deda2 0000000000000000 ffffffff8102c68d ffff88024e20d218
[ 168.688535] 0000000000000000 ffff88024e20d120 ffffffff81a58410 0000000000000000
[ 168.688848] 0000000000000000 ffff88024e20d218 ffffffff81037010 ffff88024e20d350
[ 168.689155] Call Trace:
[ 168.689252] <IRQ> [<ffffffff816deda2>] ? dump_stack+0xd/0x17
[ 168.689438] [<ffffffff8102c68d>] ? warn_slowpath_common+0x5f/0x77
[ 168.689539] [<ffffffff81037010>] ? update_process_times+0x50/0x5c
[ 168.689647] [<ffffffff81064db5>] ? tick_sched_handle+0x30/0x3b
[ 168.689751] [<ffffffff81065073>] ? tick_sched_timer+0x30/0x4c
[ 168.689858] [<ffffffff8104778b>] ? __run_hrtimer.isra.25+0x4a/0xa2
[ 168.689962] [<ffffffff81047d47>] ? hrtimer_interrupt+0xe3/0x1de
[ 168.690067] [<ffffffff8101bc48>] ? smp_apic_timer_interrupt+0x7e/0x91
[ 168.690171] [<ffffffff816e960a>] ? apic_timer_interrupt+0x6a/0x70
[ 168.690275] [<ffffffff81048dcd>] ? up+0xb/0x36
[ 168.690379] [<ffffffff816daca8>] ? panic+0x184/0x1bb
[ 168.690481] [<ffffffff816dac1b>] ? panic+0xf7/0x1bb
[ 168.690585] [<ffffffff816e45bd>] ? oops_end+0x99/0xa6
[ 168.690685] [<ffffffff816da716>] ? no_context+0x24a/0x275
[ 168.690788] [<ffffffff816e67b5>] ? __atomic_notifier_call_chain+0xa/0xc
[ 168.690896] [<ffffffff816e66d1>] ? __do_page_fault+0x3cd/0x449
[ 168.691001] [<ffffffff8105294b>] ? enqueue_task_fair+0x7af/0x85b
[ 168.691107] [<ffffffff810508f0>] ? select_task_rq_fair+0x271/0x558
[ 168.691212] [<ffffffff8104c1f0>] ? check_preempt_curr+0x36/0x62
[ 168.691318] [<ffffffff8105294b>] ? enqueue_task_fair+0x7af/0x85b
[ 168.691424] [<ffffffff810410ac>] ? wq_worker_waking_up+0xb/0x51
[ 168.691530] [<ffffffff816e3c32>] ? page_fault+0x22/0x30
[ 168.691641] [<ffffffffa007751c>] ? ivb_can_enable_err_int+0x20/0x33 [i915]
[ 168.691755] [<ffffffffa007ccde>] ? ivybridge_irq_handler+0x429/0x491 [i915]
[ 168.691863] [<ffffffff81082bbb>] ? handle_irq_event_percpu+0x24/0x119
[ 168.691970] [<ffffffff81082cde>] ? handle_irq_event+0x2e/0x4c
[ 168.692075] [<ffffffff81085019>] ? handle_edge_irq+0xbb/0xde
[ 168.692175] [<ffffffff810038c1>] ? handle_irq+0x15/0x1d
[ 168.692279] [<ffffffff810035bd>] ? do_IRQ+0x41/0xa6
[ 168.692383] [<ffffffff816e3a2a>] ? common_interrupt+0x6a/0x6a
[ 168.692484] <EOI> [<ffffffff815ff34c>] ? cpuidle_enter_state+0x43/0xa6
[ 168.692665] [<ffffffff815ff345>] ? cpuidle_enter_state+0x3c/0xa6
[ 168.692770] [<ffffffff815ff46e>] ? cpuidle_idle_call+0xbf/0x10b
[ 168.692875] [<ffffffff810085cb>] ? arch_cpu_idle+0x6/0x17
[ 168.692979] [<ffffffff8105dd7c>] ? cpu_startup_entry+0xa1/0xfd
[ 168.693082] [<ffffffff81b08c6a>] ? start_kernel+0x378/0x383
[ 168.693186] [<ffffffff81b08708>] ? repair_env_string+0x57/0x57
[ 168.693290] ---[ end trace 51d2d549a433189f ]---

Reproduce steps:
----------------------------
1. run ./module_reload 5 cycles
The panic suggests that an interrupt occurred after the code segment was removed (module unload). Can you please bisect this? I suspect some recent interrupt reworking.
bisect shows: commit eda63ffb906c2fb3b609a0e87aeb63c0f25b9e6b is the first bad commit.

commit eda63ffb906c2fb3b609a0e87aeb63c0f25b9e6b
Author: Ben Widawsky <ben@bwidawsk.net>
AuthorDate: Tue May 28 19:22:26 2013 -0700
Commit: Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Fri May 31 20:54:16 2013 +0200

    drm/i915: Add PM regs to pre/post install

    At the moment, these values are wiped out anyway by the rps enable/disable. That will be changed in the next patch though.

    v2: Add post install setup to address issue found by Damien in the next patch.

    replaced WARN_ON(dev_priv->rps.pm_iir != 0); with rps.pm_iir = 0;

    With the v2 of this patch and the deferred pm enabling (which changed since the original patches) we're now able to get PM interrupts before we've brought up enabled rps. At this point in boot, we don't want to do anything about it, so we simply ignore it. Since writing the original assertion, the code has changed quite a bit, and I believe removing this assertion is perfectly safe.

    Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
    Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
    [danvet: I don't agree with the justification to drop the WARN and added a FIXME to that effect.]
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
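Since the hang was reported to hit only about 1 in 5 runs, a bisect step can only be marked "good" after several clean runs in a row. The sketch below estimates how many runs that takes, assuming runs are independent and the 1-in-5 rate holds on every bad commit. The function name and the confidence thresholds are illustrative and do not come from this report.

```python
import math


def runs_needed(failure_rate: float, max_false_good: float) -> int:
    """Return the smallest n such that a bad commit survives n clean runs
    with probability at most max_false_good.

    Assumes each run fails independently with probability failure_rate,
    so the chance of n clean runs on a bad commit is (1 - failure_rate)**n.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < max_false_good < 1.0:
        raise ValueError("max_false_good must be strictly between 0 and 1")
    return math.ceil(math.log(max_false_good) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # With the reported rate of roughly 1 in 5:
    print(runs_needed(0.2, 0.05))  # clean runs for at most a 5% chance of a wrong "good"
    print(runs_needed(0.2, 0.01))  # clean runs for at most a 1% chance of a wrong "good"
```

A single failing run is enough to mark a commit "bad"; only the "good" verdict needs the repeated runs.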
Can you please test the irq-review branch from my personal git repo: http://cgit.freedesktop.org/~danvet/drm/log/?h=irq-review
Quick note on bug filing BKMs: When pasting a backtrace, always paste the first backtrace, not the last one as was done here. Later backtraces are usually just follow-up fallout. So in this case the right paste would be:

[ 168.675293] IP: [<ffffffffa007751c>] ivb_can_enable_err_int+0x20/0x33 [i915]
[ 168.675306] PGD 0
[ 168.675308] Oops: 0000 [#1] SMP
[ 168.675309] Modules linked in: i915(+) drm_kms_helper drm netconsole configfs ipv6 dm_mod acpi_cpufreq coretemp kvm_intel kvm snd_hda_codec_realtek microcode pcspkr i2c_i801 iTCO_wdt iTCO_vendor_support snd_hda_intel lpc_ich snd_hda_codec mfd_core snd_hwdep snd_pcm snd_page_alloc snd_timer snd soundcore video button mperf freq_table [last unloaded: drm]
[ 168.675321] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-rc5_nightlytop_c18149_20130618_+ #4155
[ 168.675322] Hardware name: Intel Corporation Shark Bay Client platform/Flathead Creek Crb, BIOS HSWLPTU1.86C.0109.R03.1301282055 01/28/2013
[ 168.675323] task: ffffffff81a58410 ti: ffffffff81a48000 task.ti: ffffffff81a48000
[ 168.675324] RIP: 0010:[<ffffffffa007751c>] [<ffffffffa007751c>] ivb_can_enable_err_int+0x20/0x33 [i915]
[ 168.675334] RSP: 0018:ffff88024e203ea0 EFLAGS: 00010002
[ 168.675335] RAX: 0000000000000001 RBX: ffff88023e118000 RCX: 0000000000000003
[ 168.675336] RDX: ffff88023e118000 RSI: 0000000000000000 RDI: ffff880243ed5800
[ 168.675336] RBP: ffff880243ed5800 R08: ffffffff816f42f0 R09: 000000000000b5a1
[ 168.675337] R10: 00000000000000f6 R11: 00000000000000f6 R12: 0000000000000010
[ 168.675337] R13: 0000000000000001 R14: 00000000ffffffff R15: 0000000000000002
[ 168.675338] FS: 0000000000000000(0000) GS:ffff88024e200000(0000) knlGS:0000000000000000
[ 168.675339] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 168.675340] CR2: 0000000000000920 CR3: 000000024316f000 CR4: 00000000001407f0
[ 168.675341] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 168.675341] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 168.675341] Stack:
[ 168.675342] ffffffffa007ccde 000000000000012c f400252900000040 ffff88024e212150
[ 168.675343] ffff88023f137c00 ffff88023f3a0500 000000000000002b 0000000000000000
[ 168.675345] 0000000000000000 000000000000f000 ffffffff81082bbb ffff880245400000
[ 168.675346] Call Trace:
[ 168.675347] <IRQ> [<ffffffffa007ccde>] ? ivybridge_irq_handler+0x429/0x491 [i915]
[ 168.675358] [<ffffffff81082bbb>] ? handle_irq_event_percpu+0x24/0x119
[ 168.675362] [<ffffffff81082cde>] ? handle_irq_event+0x2e/0x4c
[ 168.675364] [<ffffffff81085019>] ? handle_edge_irq+0xbb/0xde
[ 168.675366] [<ffffffff810038c1>] ? handle_irq+0x15/0x1d
[ 168.675368] [<ffffffff810035bd>] ? do_IRQ+0x41/0xa6
[ 168.675370] [<ffffffff816e3a2a>] ? common_interrupt+0x6a/0x6a
[ 168.675372] <EOI> [<ffffffff815ff34c>] ? cpuidle_enter_state+0x43/0xa6
[ 168.675376] [<ffffffff815ff345>] ? cpuidle_enter_state+0x3c/0xa6
[ 168.675377] [<ffffffff815ff46e>] ? cpuidle_idle_call+0xbf/0x10b
[ 168.675378] [<ffffffff810085cb>] ? arch_cpu_idle+0x6/0x17
[ 168.675380] [<ffffffff8105dd7c>] ? cpu_startup_entry+0xa1/0xfd
[ 168.675382] [<ffffffff81b08c6a>] ? start_kernel+0x378/0x383
[ 168.675384] [<ffffffff81b08708>] ? repair_env_string+0x57/0x57
[ 168.675386] Code: 05 e1 b8 01 00 00 00 c3 0f 1f 00 48 8b 97 30 03 00 00 48 8b 42 10 8a 48 04 31 c0 83 e1 07 eb 14 48 8b b4 c2 d8 22 00 00 48 ff c0 <80> be 20 09 00 00 00 75 07 39 c1 77 e8 b0 01 c3 31 c0 c3 8b 97
[ 168.675404] RIP [<ffffffffa007751c>] ivb_can_enable_err_int+0x20/0x33 [i915]
[ 168.675413] RSP <ffff88024e203ea0>
[ 168.675414] CR2: 0000000000000920
[ 168.675415] ---[ end trace 51d2d549a433189e ]---
[ 168.675416] Kernel panic - not syncing: Fatal exception in interrupt

In this case, though, it looks as if two backtraces are somewhat interleaved, so the situation is more complicated. CC'ing Sun Yi and Gordon Jin to take note of this BKM.
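The "paste the first backtrace" rule can be roughly automated. The sketch below scans dmesg text for the first line that looks like the start of an oops and returns everything up to the matching "end trace" marker. The list of start markers is an assumption and certainly not exhaustive, and the function name is made up for illustration. It also does nothing to untangle interleaved traces like the ones in this report.

```python
import re
from typing import List

# Lines that commonly open an oops/BUG report. This marker list is an
# assumption for illustration; real logs may start differently.
START = re.compile(r"BUG:|Oops:|kernel BUG at|IP: \[<")
# Closing marker printed by the kernel at the end of each trace.
END = re.compile(r"---\[ end trace [0-9a-f]+ \]---")


def first_backtrace(dmesg: str) -> List[str]:
    """Return the lines of the first oops-like report found in dmesg text.

    Capturing starts at the first line matching START and stops at the
    first following line matching END (inclusive). If no END marker is
    found, everything from the start marker to the end of input is kept.
    """
    captured: List[str] = []
    capturing = False
    for line in dmesg.splitlines():
        if not capturing and START.search(line):
            capturing = True
        if capturing:
            captured.append(line)
            if END.search(line):
                break
    return captured


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python first_backtrace.py
    print("\n".join(first_backtrace(sys.stdin.read())))
```

The full dmesg should still be attached to the report; the extracted trace is only meant for the comment text.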
(In reply to comment #3)
> Can you please test the irq-review branch from my personal git repo:
>
> http://cgit.freedesktop.org/~danvet/drm/log/?h=irq-review

It still happens with this patch.
(In reply to comment #5)
> (In reply to comment #3)
> > Can you please test the irq-review branch from my personal git repo:
> >
> > http://cgit.freedesktop.org/~danvet/drm/log/?h=irq-review
>
> It still happens with this patch.

Just to clarify: Have you tested with just the patch applied, or the entire branch? The branch contains about 20 patches which all try to improve correctness around irq handling, so we need to test the entire pile.
I've updated my irq-review branch with a quick hack which hopefully resolves the issue here:

commit 196ae5b931fd693d90fde6042d0294f5e652760f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Jun 25 14:32:18 2013 +0200

    HACK: setup interrupts only after the crtcs are registered properly

But please test the entire branch, not just this little hack, thanks.
Oops, new branch tip, the old one had a few missing hunks:

commit be6881b4a756d3c238077a7da298278d8dcb0135
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Jun 25 14:32:18 2013 +0200

    HACK: setup interrupts only after the crtcs are registered properly

    v2: Actually git add everything.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65880
I seem to be especially incompetent today. New irq-review branch tip:

commit 8d1647eabb428bccbf6584115b1ecbe63b18112a
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Tue Jun 25 14:32:18 2013 +0200

    HACK: setup interrupts only after the crtcs are registered properly

    v2: Actually git add everything.

    v3: _Really_ git add everything.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=6588
(In reply to comment #9)
> I seem to be especially incompetent today. New irq-review branch tip:
>
> commit 8d1647eabb428bccbf6584115b1ecbe63b18112a
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date: Tue Jun 25 14:32:18 2013 +0200
>
> HACK: setup interrupts only after the crtcs are registered properly
>
> v2: Actually git add everything.
>
> v3: _Really_ git add everything.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=6588

The system fails to boot with this commit on the irq-review branch.
I've tested the irq-review branch on my own haswell, and it boots fine. Do you have any details on where it dies?
Created attachment 81781 [details] boot serial log
Created attachment 82050 [details] [review] hack to reorder irq setup
(In reply to comment #12)
> Created attachment 81781 [details]
> boot serial log

I haven't found any backtrace in here. Can you please retest with the hack patch I've just attached and attach a new dmesg with backtrace (if it's still broken) to the bug report?
Created attachment 82056 [details] boot log
(In reply to comment #15)
> Created attachment 82056 [details]
> boot log

Can you please attach the full dmesg (with debugging enabled), not just the backtrace? Cutting out the backtrace is for pasting into bug report comments for a quick overview in the initial report.
I added "drm.debug=0xe" to the kernel command line in grub. The following dmesg is the entire log.

[ 2.368840] [drm:intel_dp_i2c_aux_ch],
[ 2.379442] usb 1-1: new high-speed USB device number 2 using ehci-pci aux_i2c nack
[ 2.507776] ------------[ cut here ]------------
[ 2.563940] kernel BUG at drivers/gpu/drm/i915/i915_irq.c:122!
[ 2.634889] invalid opcode: 0000 [#1] SMP
[ 2.685015] Modules linked in: i915(+) drm_kms_helper drm button video dm_mirror dm_region_hash dm_log
[ 2.790642] tsc: Refined TSC clocksource calibration: 2095.148 MHz
[ 2.790646] Switching to clocksource tsc dm_mod
[ 2.929994] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-rc5_irq-review_20130705+ #1
[ 3.028413] Hardware name: Intel Corporation Shark Bay Client platform/SawTooth Peak, BIOS HSWLPTU1.86C.0126.R00.1305121957 05/12/2013
[ 3.175432] task: ffffffff81a5c410 ti: ffffffff81a4c000 task.ti: ffffffff81a4c000
[ 3.266451] RIP: 0010:[<ffffffffa008c3b4>] [<ffffffffa008c3b4>] ivb_can_enable_err_int+0x20/0x41 [i915]
[ 3.381965] RSP: 0018:ffff88015ee03ea0 EFLAGS: 00010046
[ 3.446570] RAX: ffffffffa00d0101 RBX: ffff880158350000 RCX: 000000000000000a
[ 3.533430] RDX: ffff880158350000 RSI: 0000000000044028 RDI: ffff880158224000
[ 3.620291] RBP: ffff880158224000 R08: 0000000000000000 R09: ffff88015abe07d8
[ 3.707157] R10: ffff88015abe07d8 R11: ffff88015abe07d8 R12: 0000000000000000
[ 3.794017] R13: 0000000000000001 R14: 00000000ffffffff R15: 0000000000000001
[ 3.880878] FS: 0000000000000000(0000) GS:ffff88015ee00000(0000) knlGS:0000000000000000
[ 3.979439] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.049401] CR2: 00007fff2da2f6b8 CR3: 000000015800b000 CR4: 00000000001407f0
[ 4.136260] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4.223122] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 4.309986] Stack:
[ 4.334513] ffffffffa008fbba ffff88015ee0d218 f4002529812b5a59 ffffffff81c30040
[ 4.425218] ffff88015a571380 ffff880158196a00 0000000000000000 000000000000003b
[ 4.515934] 0000000000000000 000000000000f000 ffffffff81082ef3 ffff880158196a00
[ 4.606649] Call Trace:
[ 4.636458] <IRQ>
[ 4.659852] [<ffffffffa008fbba>] ? ivybridge_irq_handler+0x42b/0x4b2 [i915]
[ 4.748179] [<ffffffff81082ef3>] ? handle_irq_event_percpu+0x24/0x117
[ 4.827650] [<ffffffff81083014>] ? handle_irq_event+0x2e/0x4e
[ 4.898668] [<ffffffff81085326>] ? handle_edge_irq+0xbb/0xdc
[ 4.968625] [<ffffffff810038b6>] ? handle_irq+0x1a/0x1e
[ 5.033300] [<ffffffff8100359f>] ? do_IRQ+0x42/0xa7
[ 5.093751] [<ffffffff816e85aa>] ? common_interrupt+0x6a/0x6a
[ 5.164765] <EOI>
[ 5.188165] [<ffffffff816029de>] ? poll_idle+0x2a/0x67
[ 5.254270] [<ffffffff816029c6>] ? poll_idle+0x12/0x67
[ 5.317893] [<ffffffff81603d92>] ? menu_select+0x3a/0x41e
[ 5.384684] [<ffffffff81602c3d>] ? cpuidle_enter_state+0x43/0xac
[ 5.458874] [<ffffffff8105ef81>] ? ktime_get+0x4a/0xa7
[ 5.522500] [<ffffffff81602c31>] ? cpuidle_enter_state+0x37/0xac
[ 5.596691] [<ffffffff81602d72>] ? cpuidle_idle_call+0xcc/0x10a
[ 5.669826] [<ffffffff810085ec>] ? arch_cpu_idle+0x6/0x17
[ 5.736620] [<ffffffff8105e00d>] ? cpu_startup_entry+0x9f/0xfe
[ 5.808698] [<ffffffff81b0cc8c>] ? start_kernel+0x378/0x383
[ 5.877604] [<ffffffff81b0c72d>] ? repair_env_string+0x54/0x54
[ 5.949674] Code: 10 10 10 00 89 c2 e9 41 a9 ff ff 48 8b 97 30 03 00 00 66 8b 82 d4 17 00 00 38 c4 74 0e 48 8b 42 10 8a 48 04 31 c0 83 e1 07 eb 17 <0f> 0b 89 c6 48 8b b4 f2 b8 22 00 00 80 be 18 09 00 00 00 75 09
[ 6.189150] RIP [<ffffffffa008c3b4>] ivb_can_enable_err_int+0x20/0x41 [i915]
[ 6.276182] RSP <ffff88015ee03ea0>
[ 6.318671] ---[ end trace 64fc785fba98d319 ]---
[ 6.374888] Kernel panic - not syncing: Fatal exception in interrupt
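For readers unsure what the drm.debug=0xe value used for this log enables: it is a bitmask of DRM debug categories. The sketch below decodes it, assuming the bit assignments used by the DRM core headers of kernels from around this time (core, driver, KMS, prime). The table and function name are illustrative assumptions, and newer kernels define additional bits.

```python
from typing import List

# Assumed bit assignments for drm.debug in kernels of this era; this
# table is an assumption for illustration and should be checked against
# the headers of the kernel actually under test.
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
}


def decode_drm_debug(mask: int) -> List[str]:
    """Return the names of the debug categories enabled by mask,
    ordered from the lowest bit to the highest."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0xE))
```

Under that assumption, 0xe enables every category in the table except the core messages.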
Can you please add the output of

addr2line -e drivers/gpu/drm/i915/i915.ko ivb_can_enable_err_int+0x20

for that _exact_ kernel build of the last backtrace? Of course you might need to adjust the path to the i915.ko module.
addr2line -e /lib/modules/3.10.0-rc5_irq-review_20130705+/kernel/drivers/gpu/drm/i915/i915.ko ivb_can_enable_err_int+0x20

output:
i915_drv.c:0
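The "i915_drv.c:0" result is not very informative. One possible reason is that addr2line expects numeric addresses rather than a symbol+offset expression, although that is a guess and not something confirmed in this report. A commonly used alternative is gdb's `list *(symbol+offset)`, which requires the module to be built with debug info. The sketch below only builds that gdb command line from a backtrace frame; the helper names are hypothetical.

```python
import re
from typing import List, Tuple

# Matches "symbol+0xoff" with an optional "/0xlen" suffix, as printed in
# kernel backtraces (for example "ivb_can_enable_err_int+0x20/0x41").
FRAME = re.compile(
    r"(?P<sym>[A-Za-z_][\w.]*)\+(?P<off>0x[0-9a-f]+)(?:/0x[0-9a-f]+)?"
)


def parse_frame(text: str) -> Tuple[str, int]:
    """Extract (symbol, offset) from a backtrace frame string."""
    match = FRAME.search(text)
    if match is None:
        raise ValueError(f"no symbol+offset found in {text!r}")
    return match.group("sym"), int(match.group("off"), 16)


def gdb_command(module_path: str, frame: str) -> List[str]:
    """Build an argv list that asks gdb to list the source around a frame.

    The module at module_path must match the crashing kernel build
    exactly and must contain debug info for the listing to be useful.
    """
    sym, off = parse_frame(frame)
    return ["gdb", "-batch", "-ex", f"list *({sym}+{off:#x})", module_path]


if __name__ == "__main__":
    # Hypothetical usage; adjust the path to the i915.ko of the exact build.
    print(" ".join(gdb_command(
        "drivers/gpu/drm/i915/i915.ko",
        "ivb_can_enable_err_int+0x20/0x41",
    )))
```

The resulting list can be passed to `subprocess.run` on a machine that has the matching build tree.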
This should be -fixed (quite a while ago actually). QA, please confirm.
Fixed on latest commit.
Verified. Fixed.
Closing old verified.