Bug 65652 - [HSW] It can't enter to OS due to i915 hang up when warm boot or cold boot sometimes
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other Linux (All)
Importance: medium critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2013-06-11 13:41 UTC by EvaWang
Modified: 2017-07-24 22:58 UTC

Attachments
messages (67.14 KB, text/plain)
2013-06-11 13:41 UTC, EvaWang
/var/log/messages after add drm.debug=7 (96.91 KB, text/plain)
2013-08-15 03:28 UTC, EvaWang

Description EvaWang 2013-06-11 13:41:23 UTC
Created attachment 80688
messages

The system sometimes fails to boot into the OS because i915 hangs during a warm or cold boot on Haswell chipsets. The failure rate is high, especially on Ultrabooks.
Kernel: 3.9.2
Please give us some suggestions for this issue.

The relevant messages are below. If you need any other information, please let me know. Thanks a lot!

[    3.623726] i915 0000:00:02.0: setting latency timer to 64
[    3.649096] [drm:i915_write32] *ERROR* Unknown unclaimed register before writing to c5100
[    3.649246] i915 0000:00:02.0: irq 63 for MSI/MSI-X
[    3.649252] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[    3.649253] [drm] Driver supports precise vblank timestamp query.
[    3.649305] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    3.694679] fbcon: inteldrmfb (fb0) is primary device
[    3.694769] Console: switching to colour frame buffer device 200x56
[    3.694778] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[    3.694780] i915 0000:00:02.0: registered panic notifier
[    3.708855] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[    3.708972] IP: [<ffffffffa06e1371>] i915_driver_load+0xe51/0xe90 [i915]
[    3.709086] PGD 14c18e067 PUD 14f482067 PMD 0 
[    3.709176] Oops: 0000 [#1] SMP 
[    3.709231] Modules linked in: i915(+) bnep bluetooth iTCO_wdt iTCO_vendor_support coretemp crc32c_intel joydev ghash_clmulni_intel microcode pcspkr wl(PO) r8169 mii cfg80211 lib80211 lpc_ich mfd_core i2c_i801 snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_intel wmi snd_hda_codec battery ideapad_laptop rfkill snd_hwdep sparse_keymap video i2c_algo_bit drm_kms_helper snd_pcm snd_page_alloc snd_timer drm snd soundcore i2c_core mac_hid
[    3.709910] CPU 1 
[    3.709953] Pid: 340, comm: modprobe Tainted: P          IO 3.9.2-8.1.1.lp19.x86_64 #1 LENOVO SharkBay/INVALID
[    3.710109] RIP: 0010:[<ffffffffa06e1371>]  [<ffffffffa06e1371>] i915_driver_load+0xe51/0xe90 [i915]
[    3.710230] RSP: 0018:ffff88014c9bd918  EFLAGS: 00010246
[    3.710293] RAX: ffff88014c914c80 RBX: ffff88014c914800 RCX: ffff88014c914c80
[    3.710376] RDX: ffff88014c914c78 RSI: ffff88014c0d6438 RDI: ffff88015a802700
[    3.710457] RBP: ffff88014c9bdaa8 R08: 0000000000016fa0 R09: ffff88015f256fa0
[    3.710538] R10: ffffea0005300a80 R11: ffffffff8141a370 R12: 0000000000000000
[    3.710617] R13: 0000000010000000 R14: 0000000000000000 R15: ffff8801421dc000
[    3.710713] FS:  00007f4a13e0f740(0000) GS:ffff88015f240000(0000) knlGS:0000000000000000
[    3.710807] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.710880] CR2: 0000000000000048 CR3: 000000014c02b000 CR4: 00000000001407e0
[    3.710968] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.711051] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    3.711141] Process modprobe (pid: 340, threadinfo ffff88014c9bc000, task ffff88014c88aec0)
[    3.711245] Stack:
[    3.711273]  ffffffffa074f5f5 ffffffffa074f4cb ffffffffa074f4cb ffffffffa074f4cb
[    3.711372]  ffffffffa074f4cb ffffffffa074f5c1 ffffffffa074f4cb ffffffffa074f4cb
[    3.711479]  ffffffffa074f4cb ffffffffa074f4cb ffffffffa074f4cb ffffffffa074f4cb
[    3.711582] Call Trace:
[    3.711627]  [<ffffffffa0047a96>] drm_get_pci_dev+0x186/0x2d0 [drm]
[    3.711723]  [<ffffffffa06dc33c>] i915_pci_probe+0x3c/0x90 [i915]
[    3.711799]  [<ffffffff813c6c3b>] local_pci_probe+0x4b/0x80
[    3.711867]  [<ffffffff813c6f51>] pci_device_probe+0x111/0x120
[    3.711939]  [<ffffffff8148700b>] driver_probe_device+0x8b/0x390
[    3.712012]  [<ffffffff814873bb>] __driver_attach+0xab/0xb0
[    3.712082]  [<ffffffff81487310>] ? driver_probe_device+0x390/0x390
[    3.712162]  [<ffffffff8148505d>] bus_for_each_dev+0x5d/0xa0
[    3.712231]  [<ffffffff8148696e>] driver_attach+0x1e/0x20
[    3.712297]  [<ffffffff8148650e>] bus_add_driver+0x11e/0x2a0
[    3.712366]  [<ffffffffa080c000>] ? 0xffffffffa080bfff
[    3.712428]  [<ffffffffa080c000>] ? 0xffffffffa080bfff
[    3.712491]  [<ffffffff81487a87>] driver_register+0x77/0x170
[    3.712557]  [<ffffffffa080c000>] ? 0xffffffffa080bfff
[    3.712618]  [<ffffffff813c5edc>] __pci_register_driver+0x4c/0x50
[    3.712720]  [<ffffffffa0047cfa>] drm_pci_init+0x11a/0x130 [drm]
[    3.712803]  [<ffffffffa080c000>] ? 0xffffffffa080bfff
[    3.712887]  [<ffffffffa080c066>] i915_init+0x66/0x68 [i915]
[    3.712968]  [<ffffffff8100215a>] do_one_initcall+0x12a/0x180
[    3.713045]  [<ffffffff810c433e>] load_module+0x1c1e/0x27b0
[    3.713124]  [<ffffffff813bac70>] ? ddebug_proc_open+0xc0/0xc0
[    3.713205]  [<ffffffff810c4fa7>] sys_init_module+0xd7/0x120
[    3.713282]  [<ffffffff816ced19>] system_call_fastpath+0x16/0x1b
[    3.713361] Code: 80 1a 00 00 00 00 00 00 e9 01 f7 ff ff 48 c7 c6 00 f6 74 a0 48 c7 c7 50 c5 73 a0 41 bc fb ff ff ff e8 b4 3d 96 ff e9 f4 f8 ff ff <48> 8b 3c 25 48 00 00 00 48 85 ff 0f 84 ce fb ff ff e9 bd fb ff 
[    3.713781] RIP  [<ffffffffa06e1371>] i915_driver_load+0xe51/0xe90 [i915]
[    3.713877]  RSP <ffff88014c9bd918>
[    3.713920] CR2: 0000000000000048
[    4.739363] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[    5.215475] ---[ end trace 363ca17e08482316 ]---
[    7.719131] fuse init (API version 7.21)
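The `Code:` line in the oops above wraps the first byte of the faulting instruction in angle brackets. As a rough sketch (the helper name `faulting_bytes` is made up for illustration, and the sample line is abbreviated from the log), the bytes from that marker onward can be pulled out as follows. Reading the last four bytes of `48 8b 3c 25 48 00 00 00` as a little-endian address is a hand decoding that assumes this specific absolute-address load encoding; under that assumption the result agrees with the CR2 value reported in the oops.

```python
def faulting_bytes(code_line):
    """Return the bytes of a 'Code:' line from the <xx> marker onward."""
    tokens = code_line.split("Code:", 1)[1].split()
    # The kernel wraps the byte at the faulting RIP in angle brackets.
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:])


# Abbreviated copy of the Code: line from the oops.
code_line = ("Code: e8 b4 3d 96 ff e9 f4 f8 ff ff "
             "<48> 8b 3c 25 48 00 00 00 48 85 ff 0f 84 ce fb ff ff")
insn = faulting_bytes(code_line)

# Assumption: the first 8 bytes form one instruction whose last 4 bytes
# are a 32-bit little-endian absolute address (holds for 48 8b 3c 25 ...).
address = int.from_bytes(insn[4:8], "little")
print(hex(address))
```

A load from a small fixed address like this is what a read of a structure member through a NULL pointer typically looks like.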
Comment 1 Chris Wilson 2013-06-11 13:51:55 UTC
gdb drivers/gpu/drm/i915.ko
list *i915_driver_load+0xe51
Comment 2 EvaWang 2013-06-13 05:09:44 UTC
(gdb) list *i915_driver_load+0xe51
0x5ac1 is in i915_driver_load (include/linux/pci.h:818).
813		return pci_bus_write_config_word(dev->bus, dev->devfn, where, val);
814	}
815	static inline int pci_write_config_dword(const struct pci_dev *dev, int where,
816						 u32 val)
817	{
818		return pci_bus_write_config_dword(dev->bus, dev->devfn, where, val);
819	}
820	
821	int pcie_capability_read_word(struct pci_dev *dev, int pos, u16 *val);
822	int pcie_capability_read_dword(struct pci_dev *dev, int pos, u32 *val);
(gdb)
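The `i915_driver_load+0xe51/0xe90` notation in the oops is `symbol+offset/size`: the faulting instruction lies 0xe51 bytes into a function that is 0xe90 bytes long. The following is a minimal sketch of extracting those fields from a log line so they can be passed to the `list *symbol+offset` command from Comment 1. The helper name `parse_oops_location` and the regular expression are illustrative and not part of any kernel tool.

```python
import re

# Matches e.g. "i915_driver_load+0xe51/0xe90 [i915]" (module part optional).
_LOCATION = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


def parse_oops_location(line):
    """Return (symbol, offset, size, module) from an oops line, or None."""
    m = _LOCATION.search(line)
    if m is None:
        return None
    return (m.group("symbol"), int(m.group("offset"), 16),
            int(m.group("size"), 16), m.group("module"))


line = "[    3.708972] IP: [<ffffffffa06e1371>] i915_driver_load+0xe51/0xe90 [i915]"
symbol, offset, size, module = parse_oops_location(line)
# Build the gdb command that resolves the offset to a source line.
print(f"list *{symbol}+{offset:#x}")
```

Since the value after the slash is the function's length, it is the offset `0xe51` that locates the faulting instruction.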
Comment 3 EvaWang 2013-06-13 09:27:50 UTC
(gdb) list *i915_driver_load+0xe90
0x5b00 is in i915_driver_load (drivers/gpu/drm/i915/i915_dma.c:1163).
1158          PCIBIOS_MIN_MEM,
1159          0, pcibios_align_resource,
1160          dev_priv->bridge_dev);
1161  if (ret) {
1162   DRM_DEBUG_DRIVER("failed bus alloc: %d\n", ret);
1163   dev_priv->mch_res.start = 0;
1164   return ret;
1165  }
1166 
1167  if (INTEL_INFO(dev)->gen >= 4)
(gdb)
Comment 4 Daniel Vetter 2013-07-11 19:23:09 UTC
Can you please retest with intel_iommu=igfx_off added to the kernel cmdline?
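To confirm that an option such as `intel_iommu=igfx_off` actually reached the kernel, the running command line can be read from `/proc/cmdline`. A minimal sketch, assuming simple space-separated parameters without quoting; the function name `parse_cmdline` and the sample string are illustrative.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# Hypothetical command line; on a live system read /proc/cmdline instead.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet intel_iommu=igfx_off"
params = parse_cmdline(sample)
print(params.get("intel_iommu"))
```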
Comment 5 EvaWang 2013-07-16 07:39:39 UTC
After adding intel_iommu=igfx_off to the kernel cmdline, it still fails. The error is the same as before.



Jul 27 15:28:26 localhost kernel: [    3.785949] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 27 15:28:26 localhost kernel: [    3.786038] Process udevd (pid: 129, threadinfo ffff88006e606000, task ffff88006dc14620)
Jul 27 15:28:26 localhost kernel: [    3.786138] Stack:
Jul 27 15:28:26 localhost kernel: [    3.786165]  ffffffffa08dc8f5 ffffffffa08dc7cb ffffffffa08dc7cb ffffffffa08dc7cb
Jul 27 15:28:26 localhost kernel: [    3.786266]  ffffffffa08dc7cb ffffffffa08dc8c1 ffffffffa08dc7cb ffffffffa08dc7cb
Jul 27 15:28:26 localhost kernel: [    3.786369]  ffffffffa08dc7cb ffffffffa08dc7cb ffffffffa08dc7cb ffffffffa08dc7cb
Jul 27 15:28:26 localhost kernel: [    3.786470] Call Trace:
Jul 27 15:28:26 localhost kernel: [    3.786520]  [<ffffffffa0124a96>] drm_get_pci_dev+0x186/0x2d0 [drm]
Jul 27 15:28:26 localhost kernel: [    3.786616]  [<ffffffffa086933c>] i915_pci_probe+0x3c/0x90 [i915]
Jul 27 15:28:26 localhost kernel: [    3.786698]  [<ffffffff813c6c3b>] local_pci_probe+0x4b/0x80
Jul 27 15:28:26 localhost kernel: [    3.786772]  [<ffffffff813c6f51>] pci_device_probe+0x111/0x120
Jul 27 15:28:26 localhost kernel: [    3.786849]  [<ffffffff8148700b>] driver_probe_device+0x8b/0x390
Jul 27 15:28:26 localhost kernel: [    3.786925]  [<ffffffff814873bb>] __driver_attach+0xab/0xb0
Jul 27 15:28:26 localhost kernel: [    3.786997]  [<ffffffff81487310>] ? driver_probe_device+0x390/0x390
Jul 27 15:28:26 localhost kernel: [    3.787076]  [<ffffffff8148505d>] bus_for_each_dev+0x5d/0xa0
Jul 27 15:28:26 localhost kernel: [    3.787149]  [<ffffffff8148696e>] driver_attach+0x1e/0x20
Jul 27 15:28:26 localhost kernel: [    3.787218]  [<ffffffff8148650e>] bus_add_driver+0x11e/0x2a0
Jul 27 15:28:26 localhost kernel: [    3.787293]  [<ffffffffa044a000>] ? 0xffffffffa0449fff
Jul 27 15:28:26 localhost kernel: [    3.787360]  [<ffffffffa044a000>] ? 0xffffffffa0449fff
Jul 27 15:28:26 localhost kernel: [    3.787426]  [<ffffffff81487a87>] driver_register+0x77/0x170
Jul 27 15:28:26 localhost kernel: [    3.787499]  [<ffffffffa044a000>] ? 0xffffffffa0449fff
Jul 27 15:28:26 localhost kernel: [    3.787567]  [<ffffffff813c5edc>] __pci_register_driver+0x4c/0x50
Jul 27 15:28:26 localhost kernel: [    3.787654]  [<ffffffffa0124cfa>] drm_pci_init+0x11a/0x130 [drm]
Jul 27 15:28:26 localhost kernel: [    3.787733]  [<ffffffffa044a000>] ? 0xffffffffa0449fff
Jul 27 15:28:26 localhost kernel: [    3.787812]  [<ffffffffa044a066>] i915_init+0x66/0x68 [i915]
Jul 27 15:28:26 localhost kernel: [    3.787886]  [<ffffffff8100215a>] do_one_initcall+0x12a/0x180
Jul 27 15:28:26 localhost kernel: [    3.787962]  [<ffffffff810c433e>] load_module+0x1c1e/0x27b0
Jul 27 15:28:26 localhost kernel: [    3.788034]  [<ffffffff813bac70>] ? ddebug_proc_open+0xc0/0xc0
Jul 27 15:28:26 localhost kernel: [    3.788110]  [<ffffffff810c4fa7>] sys_init_module+0xd7/0x120
Jul 27 15:28:26 localhost kernel: [    3.788183]  [<ffffffff816cf659>] system_call_fastpath+0x16/0x1b
Jul 27 15:28:26 localhost kernel: [    3.788258] Code: 80 1a 00 00 00 00 00 00 e9 01 f7 ff ff 48 c7 c6 00 c9 8d a0 48 c7 c7 50 98 8c a0 41 bc fb ff ff ff e8 b4 3d 8b ff e9 f4 f8 ff ff <48> 8b 3c 25 48 00 00 00 48 85 ff 0f 84 ce fb ff ff e9 bd fb ff
Jul 27 15:28:26 localhost kernel: [    3.788656] RIP  [<ffffffffa086e371>] i915_driver_load+0xe51/0xe90 [i915]
Jul 27 15:28:26 localhost kernel: [    3.788757]  RSP <ffff88006e607918>
Jul 27 15:28:26 localhost kernel: [    3.788802] CR2: 0000000000000048
Jul 27 15:28:27 localhost kdumpctl[275]: E: Dracut module "rpmversion" cannot be found.
Jul 27 15:28:27 localhost kdumpctl[275]: E: Dracut module "rpmversion" cannot be found.
Jul 27 15:28:27 localhost kdumpctl[275]: i18n
Jul 27 15:28:27 localhost kdumpctl[275]: convertfs
Jul 27 15:28:28 localhost kdumpctl[275]: kernel-modules
Jul 27 15:28:28 localhost kernel: [    5.389082] ---[ end trace a23fd74953912f8f ]---
Jul 27 15:28:28 localhost kernel: [    5.888161] [drm] Enabling RC6 states: RC6 off, RC6p off, RC6pp off
Jul 27 15:28:28 localhost udevd[106]: worker [129] terminated by signal 9 (Killed)
Comment 6 Paulo Zanoni 2013-07-31 14:07:20 UTC
Does this problem also happen with newer kernels? Could you please test 3.10 or 3.11-rc3?


The first error message while booting is this one:

[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: at drivers/iommu/dmar.c:483 warn_invalid_dmar+0x86/0xa0()
[    0.000000] Hardware name: SharkBay
[    0.000000] Your BIOS is broken; DMAR reported at address 0!
[    0.000000] BIOS vendor: LENOVO; Ver: 7CCN12WW; Product Version: INVALID                         
[    0.000000] Modules linked in:
[    0.000000] Pid: 0, comm: swapper Not tainted 3.9.2-8.1.1.lp19.x86_64 #1
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff8105ed6f>] warn_slowpath_common+0x7f/0xc0
[    0.000000]  [<ffffffff8105ee0f>] warn_slowpath_fmt_taint+0x3f/0x50
[    0.000000]  [<ffffffff81d19f51>] ? early_ioremap+0x13/0x15
[    0.000000]  [<ffffffff81d1160a>] ? __acpi_map_table+0x13/0x1a
[    0.000000]  [<ffffffff81594b26>] warn_invalid_dmar+0x86/0xa0
[    0.000000]  [<ffffffff81d42f89>] check_zero_address+0x57/0xf7
[    0.000000]  [<ffffffff81d43040>] detect_intel_iommu+0x17/0xb9
[    0.000000]  [<ffffffff81d0be50>] pci_iommu_alloc+0x4a/0x72
[    0.000000]  [<ffffffff81d19920>] mem_init+0x15/0x133
[    0.000000]  [<ffffffff81d03cc9>] start_kernel+0x1e3/0x3ff
[    0.000000]  [<ffffffff81d038e5>] ? repair_env_string+0x5e/0x5e
[    0.000000]  [<ffffffff81d035de>] x86_64_start_reservations+0x2a/0x2c
[    0.000000]  [<ffffffff81d036d1>] x86_64_start_kernel+0xf1/0x100
[    0.000000] ---[ end trace 363ca17e08482314 ]---

I wonder if this is related to the gfx problems later.
Comment 7 EvaWang 2013-08-01 02:19:46 UTC
After delaying the i915 module load, the issue can no longer be reproduced.
Comment 8 Chris Wilson 2013-08-11 11:36:40 UTC
[    3.708972] IP: [<ffffffffa06e1371>] i915_driver_load+0xe51/0xe90 [i915] is very likely in the error path. You might need to boot with drm.debug=7 so that we can catch whatever the driver load is racing against.
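The value given to `drm.debug` is a bitmask of message categories. The sketch below decodes it, assuming the bit assignments of the `DRM_UT_*` definitions in 3.x-era kernels (core, driver, KMS, PRIME); later kernels define additional bits, and the names `DRM_DEBUG_BITS` and `decode_drm_debug` are illustrative.

```python
# Bit assignments assumed from the DRM_UT_* definitions of 3.x-era kernels.
DRM_DEBUG_BITS = {0x01: "core", 0x02: "driver", 0x04: "kms", 0x08: "prime"}


def decode_drm_debug(value):
    """Return the names of the debug categories enabled by a drm.debug value."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if value & bit]


print(decode_drm_debug(7))
```

Under that assumption, 7 turns on the core, driver, and KMS messages together.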
Comment 9 EvaWang 2013-08-15 03:28:17 UTC
Created attachment 84082
/var/log/messages after add drm.debug=7
Comment 10 Rodrigo Vivi 2013-12-17 14:12:46 UTC
Hi EvaWang, do you still see this issue with newer kernels?

Thanks,
Rodrigo.
Comment 11 EvaWang 2013-12-18 01:39:18 UTC
Hi Rodrigo, we no longer see the issue on the newer 3.12 kernel. Thanks!
Comment 12 Chris Wilson 2013-12-30 13:24:25 UTC
Gone. Not surprising since it was an impossible bug.