Bug 41705

Summary:

[ILK] Multiple hibernate/thaw cycles cause kernel errors with Intel KMS (Ironlake graphics on ThinkPad T510)

Product:

DRI

Reporter:

Bojan Smojver <bojan>

Component:

DRM/Intel

Assignee:

Chris Wilson <chris>

Status:

CLOSED FIXED

QA Contact:

Severity:

major

Priority:

medium

CC:

ben, chris, daniel, eugeni, jbarnes

Version:

unspecified

Hardware:

x86-64 (AMD64)

OS:

Linux (All)

Whiteboard:

i915 platform:

i915 features:

Attachments:

Description	Flags
Output of lspci -vv	none

Description Bojan Smojver 2011-10-11 20:08:53 UTC

Created attachment 52250
Output of lspci -vv

As per discussion on intel-gfx list, repeated hibernate/thaw cycles cause kernel errors, which eventually crash the machine.

The threads:

http://lists.freedesktop.org/archives/intel-gfx/2011-September/012276.html
http://lists.freedesktop.org/archives/intel-gfx/2011-October/012402.html
http://lists.freedesktop.org/archives/intel-gfx/2011-October/012548.html

The only thing that is known for sure is that when nomodeset is passed into the kernel (tested with 3.1.0-rc9), the problem does not occur.

One suggestion was to try the patch from bug #40241, i.e.
https://bugs.freedesktop.org/attachment.cgi?id=50648. This did not help.

Another suggestion was to add memmap=2M#512M memmap=2M#1024M to the kernel command line. It did not help. A typical kernel dump happened after 20 or so hibernate/thaw cycles.
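For anyone repeating these tests, it may help to confirm which of the options above were actually in effect for a given boot. The following is a minimal sketch, not part of the original report; the helper name parse_cmdline is made up for illustration, and it assumes the command line is available as plain text (on Linux, /proc/cmdline).

```python
def parse_cmdline(cmdline):
    """Return (nomodeset_present, memmap_values) for a kernel command line.

    Hypothetical helper for illustration only: it splits on whitespace
    and does not handle quoted arguments.
    """
    tokens = cmdline.split()
    nomodeset = "nomodeset" in tokens
    # Keep only the value part of each memmap=... option, in order.
    memmaps = [t.split("=", 1)[1] for t in tokens if t.startswith("memmap=")]
    return nomodeset, memmaps
```

On a running system the input would be the contents of /proc/cmdline, e.g. parse_cmdline(open("/proc/cmdline").read()).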

One example of what happens is below:
-----------------------
[  175.770300] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[  175.774934] IP: [<ffffffff81243038>] prio_tree_replace+0x4b/0x66
[  175.779296] PGD 1f88d0067 PUD 1f88af067 PMD 0 
[  175.783593] Oops: 0002 [#1] SMP 
[  175.788025] CPU 2 
[  175.788055] Modules linked in: fuse ppdev parport_pc lp parport sunrpc bnep bluetooth cpufreq_ondemand acpi_cpufreq freq_table mperf arc4 iwlagn snd_hda_codec_hdmi mac80211 snd_hda_codec_conexant uvcvideo snd_hda_intel snd_hda_codec videodev snd_hwdep media snd_seq qcserial v4l2_compat_ioctl32 usb_wwan snd_seq_device snd_pcm cfg80211 thinkpad_acpi snd_timer e1000e intel_ips iTCO_wdt iTCO_vendor_support joydev mxm_wmi snd_page_alloc snd i2c_i801 wmi rfkill pcspkr microcode soundcore ipv6 firewire_ohci sdhci_pci sdhci mmc_core firewire_core crc_itu_t i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
[  175.810238] 
[  175.814616] Pid: 3763, comm: gcm-apply Not tainted 3.1.0-rc9+ #105 LENOVO 4313CTO/4313CTO
[  175.819031] RIP: 0010:[<ffffffff81243038>]  [<ffffffff81243038>] prio_tree_replace+0x4b/0x66
[  175.823327] RSP: 0018:ffff8801f8bbfce8  EFLAGS: 00010207
[  175.827099] RAX: ffff880229b84100 RBX: ffff8801f8bfb100 RCX: 0000000000000000
[  175.829260] RDX: ffff880229b84050 RSI: ffff880229b84100 RDI: ffff88022c12b318
[  175.831374] RBP: ffff8801f8bbfce8 R08: ffff880229b84100 R09: 0000000000000000
[  175.833440] R10: ffff8801f8bfbd48 R11: ffff8801f8bfbd10 R12: ffff880229b84050
[  175.835388] R13: ffff88022c12b318 R14: 0000000000000080 R15: 0000000000000000
[  175.837373] FS:  0000000000000000(0000) GS:ffff88023bd00000(0000) knlGS:0000000000000000
[  175.839407] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  175.841446] CR2: 0000000000000010 CR3: 00000002085a1000 CR4: 00000000000006e0
[  175.843471] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  175.845556] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  175.847616] Process gcm-apply (pid: 3763, threadinfo ffff8801f8bbe000, task ffff8801a864ae40)
[  175.849738] Stack:
[  175.851809]  ffff8801f8bbfd48 ffffffff8124327a ffff880229b84100 000000000000003d
[  175.853961]  000000000000003f 0000000000000000 000000000000003d ffff8801f8bfb0b0
[  175.856109]  ffff8801f8bfb100 ffff8801f8bfbd50 ffff88022c12b2f8 ffff8801f8bfbd48
[  175.858296] Call Trace:
[  175.860435]  [<ffffffff8124327a>] prio_tree_insert+0x16b/0x216
[  175.862599]  [<ffffffff810f12bf>] vma_prio_tree_insert+0x26/0x3c
[  175.864759]  [<ffffffff810fdc3f>] __vma_link_file+0x64/0x66
[  175.866899]  [<ffffffff810fe32e>] vma_link+0x75/0x95
[  175.869014]  [<ffffffff810ffd9a>] mmap_region+0x30a/0x46b
[  175.871114]  [<ffffffff81100194>] do_mmap_pgoff+0x299/0x2f3
[  175.873205]  [<ffffffff81100303>] sys_mmap_pgoff+0x115/0x164
[  175.875334]  [<ffffffff810126d0>] sys_mmap+0x22/0x24
[  175.877409]  [<ffffffff8149dd42>] system_call_fastpath+0x16/0x1b
[  175.879523] Code: 0f 0b 48 89 17 eb 16 48 89 4a 10 48 8b 4e 10 48 39 31 75 05 48 89 11 eb 04 48 89 51 08 48 8b 08 48 39 c1 74 0a 48 89 0a 48 8b 08 
[  175.879735]  89 51 10 48 8b 48 08 48 39 c1 74 0c 48 89 4a 08 48 8b 48 08 
[  175.884169] RIP  [<ffffffff81243038>] prio_tree_replace+0x4b/0x66
[  175.886469]  RSP <ffff8801f8bbfce8>
[  175.888702] CR2: 0000000000000010
[  176.196577] ---[ end trace 75df9d9a11de8acd ]---
[  178.928408] PM: Marking nosave pages: 000000000009e000 - 0000000000100000
[  178.928418] PM: Marking nosave pages: 00000000bb27c000 - 00000000bb282000
[  178.928422] PM: Marking nosave pages: 00000000bb35f000 - 00000000bb40f000
[  178.928430] PM: Marking nosave pages: 00000000bb46f000 - 00000000bb70f000
[  178.928446] PM: Marking nosave pages: 00000000bb717000 - 00000000bb71f000
[  178.928451] PM: Marking nosave pages: 00000000bb76c000 - 00000000bb7ff000
[  178.928457] PM: Marking nosave pages: 00000000bb800000 - 0000000100000000
[  178.930551] PM: Marking nosave pages: 00000001fc000000 - 0000000200000000
[  178.930855] PM: Basic memory bitmaps created
[  178.930859] PM: Syncing filesystems ... done.
[  179.006362] Freezing user space processes ... 
[  198.991114] Freezing of tasks failed after 20.00 seconds (1 tasks refusing to freeze, wq_busy=0):
[  198.991165] gcm-apply       D 0000000000000000     0  3763      1 0x00800084
[  198.991175]  ffff8801f8bbf860 0000000000000086 0000000000000000 ffff880100000000
[  198.991185]  ffff8801a864ae40 ffff8801f8bbffd8 ffff8801f8bbffd8 0000000000012d00
[  198.991194]  ffff88022e5d1720 ffff8801a864ae40 ffff8801a864b2c0 0000000100000002
[  198.991203] Call Trace:
[  198.991213]  [<ffffffff8149612f>] schedule+0x5a/0x5c
[  198.991218]  [<ffffffff81497514>] rwsem_down_failed_common+0xd3/0x105
[  198.991222]  [<ffffffff814975fc>] ? _raw_spin_unlock_irqrestore+0x17/0x19
[  198.991225]  [<ffffffff8149756d>] rwsem_down_read_failed+0x12/0x14
[  198.991231]  [<ffffffff8124a844>] call_rwsem_down_read_failed+0x14/0x30
[  198.991235]  [<ffffffff81496ca4>] ? down_read+0x21/0x25
[  198.991240]  [<ffffffff81091d21>] acct_collect+0x4a/0x182
[  198.991246]  [<ffffffff8105af7c>] do_exit+0x21e/0x722
[  198.991249]  [<ffffffff810591ac>] ? kmsg_dump+0x4b/0xd7
[  198.991253]  [<ffffffff8149876e>] oops_end+0xbc/0xc5
[  198.991256]  [<ffffffff8148ddad>] no_context+0x203/0x212
[  198.991259]  [<ffffffff8148df87>] __bad_area_nosemaphore+0x1cb/0x1ec
[  198.991263]  [<ffffffff8108a8cf>] ? search_module_extables+0x3f/0x69
[  198.991266]  [<ffffffff8148dfbb>] bad_area_nosemaphore+0x13/0x15
[  198.991270]  [<ffffffff8149a716>] do_page_fault+0x1b8/0x37e
[  198.991276]  [<ffffffff811242ae>] ? lookup_page_cgroup+0x28/0x3e
[  198.991282]  [<ffffffff8116ed27>] ? dquot_file_open+0x1b/0x3e
[  198.991285]  [<ffffffff81497c75>] page_fault+0x25/0x30
[  198.991289]  [<ffffffff81243038>] ? prio_tree_replace+0x4b/0x66
[  198.991292]  [<ffffffff8124327a>] prio_tree_insert+0x16b/0x216
[  198.991297]  [<ffffffff810f12bf>] vma_prio_tree_insert+0x26/0x3c
[  198.991302]  [<ffffffff810fdc3f>] __vma_link_file+0x64/0x66
[  198.991305]  [<ffffffff810fe32e>] vma_link+0x75/0x95
[  198.991309]  [<ffffffff810ffd9a>] mmap_region+0x30a/0x46b
[  198.991312]  [<ffffffff81100194>] do_mmap_pgoff+0x299/0x2f3
[  198.991315]  [<ffffffff81100303>] sys_mmap_pgoff+0x115/0x164
[  198.991322]  [<ffffffff810126d0>] sys_mmap+0x22/0x24
[  198.991326]  [<ffffffff8149dd42>] system_call_fastpath+0x16/0x1b
[  198.991330] 
[  198.991331] Restarting tasks ... done.
[  198.992966] PM: Basic memory bitmaps freed
[  198.996762] video LNXVIDEO:00: Restoring backlight state
[  204.166989] PM: Marking nosave pages: 000000000009e000 - 0000000000100000
[  204.166994] PM: Marking nosave pages: 00000000bb27c000 - 00000000bb282000
[  204.166997] PM: Marking nosave pages: 00000000bb35f000 - 00000000bb40f000
[  204.167001] PM: Marking nosave pages: 00000000bb46f000 - 00000000bb70f000
[  204.167014] PM: Marking nosave pages: 00000000bb717000 - 00000000bb71f000
[  204.167016] PM: Marking nosave pages: 00000000bb76c000 - 00000000bb7ff000
[  204.167021] PM: Marking nosave pages: 00000000bb800000 - 0000000100000000
[  204.168816] PM: Marking nosave pages: 00000001fc000000 - 0000000200000000
[  204.169060] PM: Basic memory bitmaps created
[  204.169061] PM: Syncing filesystems ... done.
[  204.242358] Freezing user space processes ... 
[  224.228897] Freezing of tasks failed after 20.00 seconds (3 tasks refusing to freeze, wq_busy=0):
[  224.230914] gnome-settings- D 0000000000000000     0  1716   1553 0x00800084
[  224.232935]  ffff88022a76bcd0 0000000000000086 0000000008100073 ffffea0000000000
[  224.234996]  ffff88022d059720 ffff88022a76bfd8 ffff88022a76bfd8 0000000000012d00
[  224.237043]  ffffffff81a0d020 ffff88022d059720 00000000817c1372 0000000100000001
[  224.239067] Call Trace:
[  224.241040]  [<ffffffff810f7b97>] ? pmd_offset+0x19/0x3f
[  224.243014]  [<ffffffff8149612f>] schedule+0x5a/0x5c
[  224.244981]  [<ffffffff81496843>] __mutex_lock_common+0x102/0x163
[  224.246946]  [<ffffffff814969dc>] __mutex_lock_slowpath+0x1b/0x1d
[  224.248883]  [<ffffffff81496970>] mutex_lock+0x23/0x37
[  224.250799]  [<ffffffff81055b52>] dup_mm+0x2da/0x488
[  224.252704]  [<ffffffff810566db>] copy_process+0x9b1/0x119c
[  224.254592]  [<ffffffff811fc8dd>] ? security_file_alloc+0x16/0x18
[  224.256473]  [<ffffffff81056ff0>] do_fork+0xef/0x22d
[  224.258329]  [<ffffffff81497596>] ? _raw_spin_lock+0xe/0x10
[  224.260184]  [<ffffffff811309b6>] ? path_put+0x1f/0x23
[  224.262024]  [<ffffffff81016336>] sys_clone+0x28/0x2a
[  224.263836]  [<ffffffff8149e063>] stub_clone+0x13/0x20
[  224.265631]  [<ffffffff8149dd42>] ? system_call_fastpath+0x16/0x1b
[  224.267423] gnome-settings- D ffff88022ab2c700     0  1721   1553 0x00800084
[  224.269227]  ffff8802286c3890 0000000000000086 0000000000000000 0000000000000000
[  224.271071]  ffff88022a6a2e40 ffff8802286c3fd8 ffff8802286c3fd8 0000000000012d00
[  224.272912]  ffff880219774560 ffff88022a6a2e40 0000000000000000 0000000000000000
[  224.274740] Call Trace:
[  224.276530]  [<ffffffff8149612f>] schedule+0x5a/0x5c
[  224.278320]  [<ffffffff81497514>] rwsem_down_failed_common+0xd3/0x105
[  224.280105]  [<ffffffff8149756d>] rwsem_down_read_failed+0x12/0x14
[  224.281874]  [<ffffffff8124a844>] call_rwsem_down_read_failed+0x14/0x30
[  224.283615]  [<ffffffff81497a2d>] ? restore_args+0x30/0x30
[  224.285331]  [<ffffffff81496ca4>] ? down_read+0x21/0x25
[  224.287026]  [<ffffffff81497a2d>] ? restore_args+0x30/0x30
[  224.288707]  [<ffffffff8149a723>] do_page_fault+0x1c5/0x37e
[  224.290389]  [<ffffffff81495e9e>] ? __schedule+0x63b/0x669
[  224.292060]  [<ffffffff81075d75>] ? __remove_hrtimer+0x5c/0x83
[  224.293718]  [<ffffffff8149612f>] ? schedule+0x5a/0x5c
[  224.295365]  [<ffffffff81497c75>] page_fault+0x25/0x30
[  224.297025]  [<ffffffff811377e4>] ? do_sys_poll+0x32c/0x389
[  224.298694]  [<ffffffff811377cf>] ? do_sys_poll+0x317/0x389
[  224.300340]  [<ffffffff811368f4>] ? poll_freewait+0xaa/0xaa
[  224.301971]  [<ffffffff811369c0>] ? __pollwait+0xcc/0xcc
[  224.303588]  [<ffffffff814975fc>] ? _raw_spin_unlock_irqrestore+0x17/0x19
[  224.305195]  [<ffffffff8105137e>] ? select_task_rq_fair+0x3cc/0x658
[  224.306785]  [<ffffffff8102ab9c>] ? _flat_send_IPI_mask+0x7b/0x84
[  224.308363]  [<ffffffff8105c41f>] ? current_fs_time+0x37/0x3e
[  224.309931]  [<ffffffff8113afec>] ? touch_atime+0xf8/0x113
[  224.311483]  [<ffffffff81082322>] ? get_futex_key+0x8e/0x274
[  224.313019]  [<ffffffff81082a62>] ? futex_wake+0xfe/0x110
[  224.314539]  [<ffffffff811559d4>] ? fsnotify+0x1eb/0x217
[  224.316046]  [<ffffffff811378e4>] sys_poll+0x51/0xbb
[  224.317618]  [<ffffffff8149dd42>] system_call_fastpath+0x16/0x1b
[  224.319148] gcm-apply       D 0000000000000000     0  3763      1 0x00800084
[  224.320644]  ffff8801f8bbf860 0000000000000086 0000000000000000 ffff880100000000
[  224.322166]  ffff8801a864ae40 ffff8801f8bbffd8 ffff8801f8bbffd8 0000000000012d00
[  224.323680]  ffff88022e5d1720 ffff8801a864ae40 ffff8801a864b2c0 0000000100000002
[  224.325181] Call Trace:
[  224.326640]  [<ffffffff8149612f>] schedule+0x5a/0x5c
[  224.328098]  [<ffffffff81497514>] rwsem_down_failed_common+0xd3/0x105
[  224.329552]  [<ffffffff814975fc>] ? _raw_spin_unlock_irqrestore+0x17/0x19
[  224.330990]  [<ffffffff8149756d>] rwsem_down_read_failed+0x12/0x14
[  224.332418]  [<ffffffff8124a844>] call_rwsem_down_read_failed+0x14/0x30
[  224.333831]  [<ffffffff81496ca4>] ? down_read+0x21/0x25
[  224.335243]  [<ffffffff81091d21>] acct_collect+0x4a/0x182
[  224.336650]  [<ffffffff8105af7c>] do_exit+0x21e/0x722
[  224.338052]  [<ffffffff810591ac>] ? kmsg_dump+0x4b/0xd7
[  224.339455]  [<ffffffff8149876e>] oops_end+0xbc/0xc5
[  224.340861]  [<ffffffff8148ddad>] no_context+0x203/0x212
[  224.342356]  [<ffffffff8148df87>] __bad_area_nosemaphore+0x1cb/0x1ec
[  224.342361]  [<ffffffff8108a8cf>] ? search_module_extables+0x3f/0x69
[  224.342366]  [<ffffffff8148dfbb>] bad_area_nosemaphore+0x13/0x15
[  224.342373]  [<ffffffff8149a716>] do_page_fault+0x1b8/0x37e
[  224.342378]  [<ffffffff811242ae>] ? lookup_page_cgroup+0x28/0x3e
[  224.342382]  [<ffffffff8116ed27>] ? dquot_file_open+0x1b/0x3e
[  224.342385]  [<ffffffff81497c75>] page_fault+0x25/0x30
[  224.342387]  [<ffffffff81243038>] ? prio_tree_replace+0x4b/0x66
[  224.342389]  [<ffffffff8124327a>] prio_tree_insert+0x16b/0x216
[  224.342392]  [<ffffffff810f12bf>] vma_prio_tree_insert+0x26/0x3c
[  224.342395]  [<ffffffff810fdc3f>] __vma_link_file+0x64/0x66
[  224.342397]  [<ffffffff810fe32e>] vma_link+0x75/0x95
[  224.342399]  [<ffffffff810ffd9a>] mmap_region+0x30a/0x46b
[  224.342402]  [<ffffffff81100194>] do_mmap_pgoff+0x299/0x2f3
[  224.342404]  [<ffffffff81100303>] sys_mmap_pgoff+0x115/0x164
[  224.342409]  [<ffffffff810126d0>] sys_mmap+0x22/0x24
[  224.342411]  [<ffffffff8149dd42>] system_call_fastpath+0x16/0x1b
[  224.342414] 
[  224.342414] Restarting tasks ... done.
[  224.344753] PM: Basic memory bitmaps freed
[  224.349953] video LNXVIDEO:00: Restoring backlight state
-----------------------

Output of lspci -vv is attached.
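
When counting failures over many hibernate/thaw cycles, a saved dmesg capture can be scanned for the two signatures visible in the log above. This is only a rough sketch under the assumption that the messages are worded exactly as in this report (other kernel versions may phrase them differently); summarize_log is a hypothetical name.

```python
import re

# Signatures copied from the log in this report; treat them as assumptions
# for any other kernel version.
OOPS_RE = re.compile(
    r"BUG: unable to handle kernel NULL pointer dereference at ([0-9a-f]+)"
)
FREEZE_RE = re.compile(
    r"Freezing of tasks failed after [\d.]+ seconds "
    r"\((\d+) tasks refusing to freeze"
)


def summarize_log(text):
    """Return (faulting addresses, number of stuck tasks per failed freeze)."""
    oops_addresses = OOPS_RE.findall(text)
    stuck_tasks = [int(n) for n in FREEZE_RE.findall(text)]
    return oops_addresses, stuck_tasks
```

Fed the text of the example above, this would report one NULL pointer dereference and two failed freeze attempts.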

Comment 1 Bojan Smojver 2011-10-11 20:10:27 UTC

Smolt profile:

http://www.smolts.org/client/show/pub_ebd16c9b-ba21-4d39-964a-cfd361713146

Comment 2 Bojan Smojver 2011-10-13 18:31:25 UTC

I tested hibernate/thaw cycles on one of my old machines, an HP Pavilion ZE4201 notebook, which has integrated Radeon graphics (IGP 340M, which is an RS200 chip). This box is a 32-bit machine, as opposed to my ThinkPad T510, which is running a 64-bit system.

Similar behaviour on hibernate/thaw - with nomodeset, no trouble. With KMS, trouble after a few hibernate/thaw cycles (NULL pointers and other kernel dumps onto the console). Interesting. Maybe the problem is not Intel specific after all.

Also, removing intel_ips module does not help.

Comment 3 Eugeni Dodonov 2011-10-19 06:20:47 UTC

Looks like bug #40241, which is also caused by KMS.

Comment 4 Bojan Smojver 2011-10-19 14:30:22 UTC

(In reply to comment #3)
> Looks like bug #40241, which is also caused by KMS.

Yeah, it is. You can see my posts there too (unfortunately).

I opened a separate bug, because I was asked to do so on the intel-gfx list.

As you can see from comment #2, I am suspecting now that this is not Intel specific, but rather something common to KMS code, because I got very similar symptoms on a machine that uses radeon driver (also integrated graphics, BTW). Of course, I have no proof of this, just a feeling.

Comment 5 Eugeni Dodonov 2011-10-19 15:25:23 UTC

If it is coming through KMS, it could probably be not intel-specific indeed.

I think Jesse is working on a patch to disable KMS before suspending, this way we'll be able to rule out this possibility.

Thanks for those reports, I hope we'll be able to fix it soon!

Comment 6 Bojan Smojver 2011-10-19 15:30:19 UTC

(In reply to comment #5)
> If it is coming through KMS, it could probably be not intel-specific indeed.

Obviously, I do not understand this code enough to tell - I'm just guessing. So, use salt in abundance. :-)

All I know is that on two of my machines, with different graphics hardware, hibernation works properly when I pass nomodeset to the kernel. If I leave that option out, I get trouble after several hibernate/thaw cycles.

> I think Jesse is working on a patch to disable KMS before suspending, this way
> we'll be able to rule out this possibility.
> 
> Thanks for those reports, I hope we'll be able to fix it soon!

Cool. I'm ready to test anything you may have!

Comment 7 Bojan Smojver 2011-10-20 21:40:53 UTC

In an effort to confirm that this really is a regression, I remembered an old bug, that was fixed during Fedora 13:

https://bugzilla.redhat.com/show_bug.cgi?id=537494

You will see from https://bugzilla.redhat.com/show_bug.cgi?id=537494#c67 that another person confirmed that the problem was indeed fixed with a particular kernel release.

So, I downloaded kernel-2.6.34.9-69.fc13.x86_64.rpm from Fedora 13 updates, installed it (into Fedora 15 - yeah, I know, crazy) and did over 50 hibernate/thaw cycles on my ThinkPad T510. I am writing these comments from that session. No kernel errors, no segfaults - works fine.

So, it does look like a regression, or at least looks like things have gotten worse (much worse) with newer kernels.

Comment 8 Bojan Smojver 2011-10-25 20:45:37 UTC

Just to follow up on this a bit, I cloned Linus' tree as of today (i.e.
currently staged stuff for 3.2) then pulled Keith's tree
(git://people.freedesktop.org/~keithp/linux drm-intel-next) over the top
and compiled. Did 26 hibernate/thaw cycles and then went to check the
machine.

Unfortunately, I then got:
---------------------------
[  729.195407] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[  729.199345] IP: [<ffffffff8125121d>] __list_add+0x14/0x7f
[  729.203288] PGD 0 
[  729.207051] Oops: 0000 [#1] SMP 
[  729.210874] CPU 0 
[  729.210901] Modules linked in: fuse ppdev parport_pc lp parport bnep bluetooth sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm thinkpad_acpi snd_timer e1000e uvcvideo rfkill snd videodev media v4l2_compat_ioctl32 qcserial usb_wwan mxm_wmi wmi snd_page_alloc microcode i2c_i801 iTCO_wdt iTCO_vendor_support pcspkr intel_ips soundcore joydev ipv6 firewire_ohci firewire_core crc_itu_t sdhci_pci sdhci mmc_core i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
[  729.231392] 
[  729.235351] Pid: 897, comm: dbus-daemon Tainted: G        W   3.1.0+ #2 LENOVO 4313CTO/4313CTO
[  729.239316] RIP: 0010:[<ffffffff8125121d>]  [<ffffffff8125121d>] __list_add+0x14/0x7f
[  729.243143] RSP: 0018:ffff88022ce57d60  EFLAGS: 00010286
[  729.246941] RAX: ffff8801ab1454d0 RBX: 0000000000000000 RCX: 0000000000000054
[  729.250770] RDX: 0000000000000000 RSI: ffff880229777100 RDI: ffff8801ab145520
[  729.254604] RBP: ffff88022ce57d80 R08: ffff88020c7f28e8 R09: 00007f0aaeda4000
[  729.258294] R10: 0000000000015ff8 R11: 0000000000015fa8 R12: ffff880229777100
[  729.261820] R13: ffff8801ab145520 R14: ffff8802297770b0 R15: ffff8801ab1450b0
[  729.265401] FS:  00007f0aaed80800(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000
[  729.269032] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  729.272674] CR2: 0000000000000008 CR3: 000000022d275000 CR4: 00000000000006f0
[  729.276338] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  729.279988] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  729.283616] Process dbus-daemon (pid: 897, threadinfo ffff88022ce56000, task ffff88022ca55c80)
[  729.287278] Stack:
[  729.290885]  ffff88020c7f28e8 ffff88022c192d80 ffff88022a0bd500 ffff8801ab1454d0
[  729.294593]  ffff88022ce57d90 ffffffff810f1fe5 ffff88022ce57e10 ffffffff81055b6b
[  729.298289]  0000000000000000 ffff8801ab1450e8 ffff8801ab1450f0 ffff8801ab1450c8
[  729.301966] Call Trace:
[  729.305664]  [<ffffffff810f1fe5>] vma_prio_tree_add+0x81/0x95
[  729.309405]  [<ffffffff81055b6b>] dup_mm+0x2f3/0x488
[  729.313143]  [<ffffffff810566db>] copy_process+0x9b1/0x119c
[  729.316888]  [<ffffffff811fdca6>] ? security_file_alloc+0x16/0x18
[  729.320631]  [<ffffffff81129db5>] ? get_empty_filp+0xa4/0x133
[  729.324351]  [<ffffffff81056ff0>] do_fork+0xef/0x22d
[  729.328029]  [<ffffffff813daf9e>] ? sock_alloc_file+0xb3/0x114
[  729.331672]  [<ffffffff810440eb>] ? should_resched+0xe/0x2d
[  729.335300]  [<ffffffff8149a9bd>] ? _cond_resched+0xe/0x22
[  729.338914]  [<ffffffff813d9388>] ? might_fault+0xe/0x10
[  729.342532]  [<ffffffff81016336>] sys_clone+0x28/0x2a
[  729.346128]  [<ffffffff814a2ae3>] stub_clone+0x13/0x20
[  729.349661]  [<ffffffff814a27c2>] ? system_call_fastpath+0x16/0x1b
[  729.353249] Code: ad de 48 b9 00 02 20 00 00 00 ad de 48 89 13 48 89 4b 08 5e 5b 5d c3 55 48 89 e5 41 55 49 89 fd 41 54 49 89 f4 53 48 89 d3 41 50 <4c> 8b 42 08 49 39 f0 74 20 49 89 d1 48 89 f1 48 c7 c2 98 15 7e 
[  729.361220] RIP  [<ffffffff8125121d>] __list_add+0x14/0x7f
[  729.365218]  RSP <ffff88022ce57d60>
[  729.369206] CR2: 0000000000000008
[  729.439968] ---[ end trace a0f13f2533f6746a ]---
---------------------------

After that, the machine started misbehaving and threw many more kernel
errors, which I could no longer capture.

The only other error was unrelated. It looked like this:
---------------------------
[  272.029435] ------------[ cut here ]------------
[  272.029441] WARNING: at drivers/net/ethernet/intel/e1000e/ich8lan.c:870 e1000_acquire_swflag_ich8lan+0x4f/0x143 [e1000e]()
[  272.029443] Hardware name: 4313CTO
[  272.029445] e1000e: eth0: contention for Phy access
[  272.029446] Modules linked in: fuse ppdev parport_pc lp parport bnep bluetooth sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm thinkpad_acpi snd_timer e1000e uvcvideo rfkill snd videodev media v4l2_compat_ioctl32 qcserial usb_wwan mxm_wmi wmi snd_page_alloc microcode i2c_i801 iTCO_wdt iTCO_vendor_support pcspkr intel_ips soundcore joydev ipv6 firewire_ohci firewire_core crc_itu_t sdhci_pci sdhci mmc_core i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
[  272.029480] Pid: 5712, comm: kworker/1:4 Tainted: G        W   3.1.0+ #2
[  272.029482] Call Trace:
[  272.029485]  [<ffffffff81057a36>] warn_slowpath_common+0x83/0x9b
[  272.029488]  [<ffffffff81057af1>] warn_slowpath_fmt+0x46/0x48
[  272.029492]  [<ffffffff81084f41>] ? smp_call_function_single+0x97/0xfd
[  272.029499]  [<ffffffffa0225767>] e1000_acquire_swflag_ich8lan+0x4f/0x143 [e1000e]
[  272.029508]  [<ffffffffa022e5d0>] __e1000_read_phy_reg_hv+0x4d/0x157 [e1000e]
[  272.029518]  [<ffffffffa022ee8a>] e1000_read_phy_reg_hv+0x13/0x15 [e1000e]
[  272.029527]  [<ffffffffa02327b3>] e1000_phy_read_status+0xf6/0x163 [e1000e]
[  272.029537]  [<ffffffffa0236df2>] e1000_watchdog_task+0x104/0x5d2 [e1000e]
[  272.029540]  [<ffffffff8149a93e>] ? __schedule+0x63b/0x669
[  272.029550]  [<ffffffffa0236cee>] ? e1000_update_mng_vlan+0x68/0x68 [e1000e]
[  272.029554]  [<ffffffff8106eab0>] process_one_work+0x176/0x2a9
[  272.029559]  [<ffffffff8106f5be>] worker_thread+0xda/0x15d
[  272.029562]  [<ffffffff8106f4e4>] ? manage_workers+0x176/0x176
[  272.029565]  [<ffffffff81072a0b>] kthread+0x84/0x8c
[  272.029568]  [<ffffffff814a4934>] kernel_thread_helper+0x4/0x10
[  272.029572]  [<ffffffff81072987>] ? kthread_worker_fn+0x148/0x148
[  272.029575]  [<ffffffff814a4930>] ? gs_change+0x13/0x13
[  272.029577] ---[ end trace a0f13f2533f67469 ]---
---------------------------

So, yeah, still there with the latest code.

Comment 9 Daniel Vetter 2012-02-22 04:54:59 UTC

Please try this patch

https://bugs.freedesktop.org/attachment.cgi?id=57170

Comment 10 Bojan Smojver 2012-02-22 14:20:59 UTC

(In reply to comment #9)
> Please try this patch
> 
> https://bugs.freedesktop.org/attachment.cgi?id=57170

Same. See:

https://bugs.freedesktop.org/show_bug.cgi?id=40241#c14

Comment 11 Daniel Vetter 2012-03-31 14:26:22 UTC

Most likely this is caused by fbcon writes after device suspend. Fixed with

commit 3fa016a0b5c5237e9c387fc3249592b2cb5391c6
Author: Dave Airlie <airlied@redhat.com>
Date:   Wed Mar 28 10:48:49 2012 +0100

    drm/i915: suspend fbdev device around suspend/hibernate

... which is included in 3.4-rc1. Please test that and reopen if you still experience issues.
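
Before retesting, a quick sanity check that the running kernel is at least 3.4 can be sketched as below. This is an illustration only: kernel_at_least is a hypothetical helper, it compares just the major.minor prefix of the release string, and it cannot tell whether a distribution has backported the commit to an older kernel.

```python
import re


def kernel_at_least(release, minimum=(3, 4)):
    """True if a release string such as '3.1.0-rc9+' has major.minor >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return (int(match.group(1)), int(match.group(2))) >= minimum
```

On Linux, the release string of the running kernel is available from platform.release() in the Python standard library, or from uname -r.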
