82864 – [HSW i915 MSI-7817] S4 resume on Haswell causes memory corruption (OOM, ext4_, ...)

Status:	CLOSED WORKSFORME

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	XOrg git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

Reported:	2014-08-20 11:46 UTC by Jens
Modified:	2016-10-19 11:13 UTC
CC List:	3 users

Attachments
3.17-rc1 Crash during hibernate/resume, OOM failures (28.43 KB, text/plain) 2014-08-20 11:46 UTC, Jens
3.17-rc1 crash "Watchdog detected hard LOCKUP on CPU #x" (126.46 KB, text/plain) 2014-08-20 11:47 UTC, Jens
dmesg output of "preload: Corrupted page table at address 1dd4146" (9.18 KB, text/plain) 2014-10-27 20:35 UTC, Jens

Description Jens 2014-08-20 11:46:24 UTC

Created attachment 104974
3.17-rc1 Crash during hibernate/resume, OOM failures

I own several Haswell-based PCs with MSI-7817 chipset, i5-4570 CPU and 8G RAM. On all of these, before 3.17-rc1 I had the problems described in Bug #78424. Since 3.17-rc1 (commit 09fcefee..), the symptoms have changed:

* No more WARNING: messages upon resume
* Multiple resumes work fine, but:
* About one in every five resumes, the machine grinds to a halt with various errors (OOM killer or ext4 related functions in the backtrace or ...)

Setup:
* Ubuntu 14.04 LTS, Kernel 3.16.0+ (3.17rc1 as of now)
* MSI-7817 chipset with i5-4570
* Boot Lubuntu desktop
* start "make -j4" in the kernel git checkout
* start Firefox with Youtube video
* hibernate and resume in a loop
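
The hibernate/resume loop from the setup above could be automated roughly as follows. This is only a sketch under stated assumptions: the report does not say how the loop was driven (later comments mention pm-hibernate), and `rtcwake -m disk` from util-linux is swapped in here because it can schedule its own wake-up alarm. The function names, the 60-second wake delay and the dry-run default are all illustrative.

```python
import subprocess


def rtcwake_command(wake_after_s=60):
    """Build an rtcwake call that hibernates (S4) and wakes via the RTC alarm."""
    return ["rtcwake", "-m", "disk", "-s", str(wake_after_s)]


def hibernate_loop(cycles, wake_after_s=60, dry_run=True):
    """Issue `cycles` hibernate/resume commands; with dry_run only collect them."""
    issued = []
    for _ in range(cycles):
        cmd = rtcwake_command(wake_after_s)
        issued.append(cmd)
        if not dry_run:
            # Returns only after the machine has resumed; requires root.
            subprocess.run(cmd, check=True)
    return issued


if __name__ == "__main__":
    # Dry run: print what would be executed for five cycles.
    for cmd in hibernate_loop(cycles=5):
        print(" ".join(cmd))
```

Running it with `dry_run=False` while "make -j4" and a browser are active would approximate the workload described in the list, assuming the RTC wake alarm works on the board.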

I attached dmesg output of a few cases. Also, just today I got the following via netconsole after five successful hibernation/resume cycles:

--- 8< --- cut here --- 8< ---
[  976.247858] CPU: 2 PID: 7160 Comm: rm Tainted: G        W   E  3.17.0-rc1+ #5
[  976.247876] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5 05/30/2014
[  976.247895] task: ffff8800c89ee400 ti: ffff880133e08000 task.ti: ffff880133e08000
[  976.247914] RIP: 0010:[<ffffffff811ba875>]  [<ffffffff811ba875>] kmem_cache_alloc+0x75/0x1e0
[  976.247938] RSP: 0018:ffff880133e0bbb8  EFLAGS: 00010286
[  976.247951] RAX: 0000000000000000 RBX: ffff88021ea5d340 RCX: 0000000000000338
[  976.247969] RDX: 0000000000000337 RSI: 0000000000000050 RDI: ffff88020fdf6600
[  976.247986] RBP: ffff880133e0bbe8 R08: 000000000001b200 R09: ffffffff8128e632
[  976.248002] R10: ffff880133e0bb38 R11: ffffea00083c3040 R12: 0031000000300000
[  976.248019] R13: 0000000000000050 R14: ffff88020fdf6600 R15: ffff88020fdf6600
[  976.248037] FS:  00002b3156a34b80(0000) GS:ffff88021eb00000(0000) knlGS:0000000000000000
[  976.248056] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  976.248070] CR2: 00002b3156d5dfa0 CR3: 000000011799e000 CR4: 00000000001407e0
[  976.248087] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  976.248103] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  976.248120] Stack:
[  976.248125]  ffffffff8128e632 ffff88021ea5d340 ffff88020fe9d800 0000000000000000
[  976.248145]  0000000004cd599c 000000000000599c ffff880133e0bcc0 ffffffff8128e632
[  976.248166]  ffffea00083c3040 0000000000000000 0000000100000000 0000000004cd59a0
[  976.248186] Call Trace:
[  976.248194]  [<ffffffff8128e632>] ? ext4_free_blocks+0x6d2/0xb40
[  976.248210]  [<ffffffff8128e632>] ext4_free_blocks+0x6d2/0xb40
[  976.248226]  [<ffffffff81281369>] ext4_ext_remove_space+0x7d9/0x1050
[  976.248242]  [<ffffffff812954c9>] ? ext4_es_free_extent+0x59/0x60
[  976.248258]  [<ffffffff81283bd0>] ext4_ext_truncate+0xb0/0xe0
[  976.248274]  [<ffffffff8125cf57>] ext4_truncate+0x387/0x3d0
[  976.248289]  [<ffffffff8125db11>] ext4_evict_inode+0x491/0x4f0
[  976.248305]  [<ffffffff811f23b4>] evict+0xb4/0x180
[  976.248317]  [<ffffffff811f2bf5>] iput+0xf5/0x180
[  976.248330]  [<ffffffff811e4db3>] do_unlinkat+0x193/0x2c0
[  976.248344]  [<ffffffff81021d25>] ? syscall_trace_enter+0x145/0x250
[  976.248360]  [<ffffffff811e859b>] SyS_unlinkat+0x1b/0x40
[  976.248375]  [<ffffffff817513ff>] tracesys+0xe1/0xe6
[  976.248387] Code: dd 00 00 49 8b 50 08 4d 8b 20 49 8b 40 10 4d 85 e4 0f 84 17 01 00 00 48 85 c0 0f 84 0e 01 00 00 49 63 46 20 48 8d 4a 01 4d 8b 06 <49> 8b 1c 04 4c 89 e0 65 49 0f c7 08 0f 94 c0 84 c0 74 b9 49 63 
[  976.248475] RIP  [<ffffffff811ba875>] kmem_cache_alloc+0x75/0x1e0
[  976.248491]  RSP <ffff880133e0bbb8>
[  976.254794] ---[ end trace 9b598b75bf3f05bd ]---
[  985.828721] r8169 0000:02:00.0 eth0: link up
--- 8< --- cut here --- 8< ---
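
When comparing several such dumps, it can help to reduce each one to its list of function names. The following is a minimal sketch, assuming the `[<address>] symbol+0xoff/0xlen` frame layout seen in the trace above; the helper name `trace_functions` is hypothetical. Note that it also picks up the `RIP` line if one is present, since that line uses the same layout.

```python
import re

# Matches frames such as "[<ffffffff8128e632>] ? ext4_free_blocks+0x6d2/0xb40".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def trace_functions(log_text, include_unreliable=False):
    """Return the function names of the stack frames in a dump, in order."""
    names = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if not match:
            continue
        if match.group(1) and not include_unreliable:
            # Frames prefixed with "?" are not confirmed by the unwinder.
            continue
        names.append(match.group(2))
    return names


if __name__ == "__main__":
    sample = """\
[  976.248186] Call Trace:
[  976.248194]  [<ffffffff8128e632>] ? ext4_free_blocks+0x6d2/0xb40
[  976.248210]  [<ffffffff8128e632>] ext4_free_blocks+0x6d2/0xb40
[  976.248226]  [<ffffffff81281369>] ext4_ext_remove_space+0x7d9/0x1050
"""
    print(trace_functions(sample))
```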

Comment 1 Jens 2014-08-20 11:47:48 UTC

Created attachment 104975
3.17-rc1 crash "Watchdog detected hard LOCKUP on CPU #x"

Here is another crash, seemingly not OOM-related, on the same kernel during resume from hibernation. I was only able to capture it with a digital camera (sorry for the file format).

Comment 2 Imre Deak 2014-10-24 10:09:53 UTC

This was also tracked at:
https://bugzilla.kernel.org/show_bug.cgi?id=59321

Waiting now for feedback from Jens.

Comment 3 Jens 2014-10-26 17:46:08 UTC

In response to the comment posted at bugzilla.kernel.org:

> Jens, have you seen the problem since your last report (with or w/o the fixes)?

No.

> Could you still try if you can reproduce the problem with the latest -nightly
> kernel and the same tree with the fixes reverted (resetting to 598ae05fd937 - 
> "drm/i915: Emit even number of dwords when emitting LRIs").

I have checked out the linux-drm-nightly tree as of yesterday (3.18rc1, 88a443f45) and have tried several suspend/resume cycles on one machine during "make -j4". Apart from one watchdog timeout (networking), I have not experienced any problems, although the resume seems to take a little longer(?) than with 3.17rc1.

Regarding 598ae05fd937, "git show" does not find it, but I found this:
http://cgit.freedesktop.org/drm-intel/commit/?id=22a916aaa187946e8df724ab7838a0c13b45a9f4
Is this the same commit? Do you want me to reverse patch it onto 598ae05fd937 and retest?

Regards

Comment 4 Jens 2014-10-26 17:48:57 UTC

> Jens, have you seen the problem since your last report (with or w/o the fixes)?

No.

Specifically, not since I compiled 3.17rc1 (see https://bugzilla.kernel.org/show_bug.cgi?id=59321).

It was reproducible with all kernels I tested before.

Comment 5 Jens 2014-10-26 17:59:50 UTC

> Jens, have you seen the problem since your last report (with or w/o the fixes)?

No. But that's because I had a solution that worked (the patch for 3.17rc1 from the BKO bug report) and thus I did not test any later kernels.

I'm happy to test any other kernels and/or patches though, just tell me.


/* Note to self: think before posting. Sorry for answering this three times. */

Comment 6 Imre Deak 2014-10-27 11:13:18 UTC

(In reply to Jens from comment #3)
> In response to the comment posted at bugzilla.kernel.org:
> 
> > Jens, have you seen the problem since your last report (with or w/o the fixes)?
> 
> No.
> 
> > Could you still try if you can reproduce the problem with the latest -nightly
> > kernel and the same tree with the fixes reverted (resetting to 598ae05fd937 - 
> > "drm/i915: Emit even number of dwords when emitting LRIs").
> 
> I have checked out the linux-drm-nightly tree as of yesterday (3.18rc1,
> 88a443f45) and have tried several suspend/resume on one machine during "make
> -j4". Except for one watchdog timeout (networking) I have not experienced
> any problems except for the fact that the resume seems to take a little
> longer(?) than with 3.17rc1.

Ok. Not sure about the extra delay. Perhaps it is connected to the network timeout? But for me the important thing now is that with -nightly you don't see the original (more serious) problem.

> Regarding 598ae05fd937, "git show" does not find it, but I found this:
> http://cgit.freedesktop.org/drm-intel/commit/
> ?id=22a916aaa187946e8df724ab7838a0c13b45a9f4
> Is this the same commit? Do you want me to reverse patch it onto
> 598ae05fd937 and retest?

Yes, that's the right commit, just before the suspend-fix patchset went in. Maybe I gave the wrong SHA1, or -nightly got rebased (it gets rebased regularly). It would help if you could git reset to that commit and see if you can reproduce the problem. I think I will close this bug in any case, but it would help to know whether it got fixed by the suspend-fix patchset (which you revert with the git reset) or by something else since 3.17rc1.

Thanks!

Comment 7 Imre Deak 2014-10-27 12:11:23 UTC

(In reply to Imre Deak from comment #6)
> (In reply to Jens from comment #3)
> > In response to the comment posted at bugzilla.kernel.org:
> > 
> > > Jens, have you seen the problem since your last report (with or w/o the fixes)?
> > 
> > No.
> > 
> > > Could you still try if you can reproduce the problem with the latest -nightly
> > > kernel and the same tree with the fixes reverted (resetting to 598ae05fd937 - 
> > > "drm/i915: Emit even number of dwords when emitting LRIs").
> > 
> > I have checked out the linux-drm-nightly tree as of yesterday (3.18rc1,
> > 88a443f45) and have tried several suspend/resume on one machine during "make
> > -j4". Except for one watchdog timeout (networking) I have not experienced
> > any problems except for the fact that the resume seems to take a little
> > longer(?) than with 3.17rc1.
> 
> Ok. Not sure about the extra delay. Perhaps connected to the network
> timeout? But for me the important now is that with -nightly you don't see
> the original (more serious) problem.

Btw, if you want to further debug this delay, you could boot with initcall_debug which shows you how long the resume/suspend handler for each driver ran and see if anything took unusually long. Or compare the times with those you get running on 3.17rc1.
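
To find the slow handlers in such a log, the per-callback timings could be sorted by duration. This is a rough sketch, assuming the kernel prints lines of the form `call <device>+ returned <rc> after <N> usecs` (and `initcall ... returned ... after ... usecs` during boot) when `initcall_debug` is set; the exact wording may differ between kernel versions, and the helper name and the sample lines are made up for illustration.

```python
import re

# Assumed line shape: "call 0000:00:02.0+ returned 0 after 152340 usecs"
CALL_RE = re.compile(
    r"\b(?:init)?call (\S+) returned (-?\d+) after (\d+) usecs"
)


def slowest_callbacks(log_text, top=5):
    """Return up to `top` (name, usecs) pairs, slowest first."""
    timings = []
    for line in log_text.splitlines():
        match = CALL_RE.search(line)
        if match:
            name = match.group(1).rstrip("+")
            timings.append((name, int(match.group(3))))
    timings.sort(key=lambda item: item[1], reverse=True)
    return timings[:top]


if __name__ == "__main__":
    # Hypothetical excerpt, not taken from this report.
    sample = """\
[  100.000001] calling  0000:00:02.0+ @ 2912, parent: pci0000:00
[  100.152341] call 0000:00:02.0+ returned 0 after 152340 usecs
[  100.200000] call 0000:02:00.0+ returned 0 after 9021 usecs
"""
    for name, usecs in slowest_callbacks(sample):
        print(name, usecs)
```

Running the same function over a 3.17rc1 log and a -nightly log would give two lists that can be compared side by side.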

Comment 8 Jens 2014-10-27 20:35:09 UTC

Created attachment 108530
dmesg output of "preload: Corrupted page table at address 1dd4146"

Unfortunately I have just experienced another error upon resume. This prevented the machine from shutting down cleanly (although it was still working when I resumed). This is the first time this has happened, though.

[32938.478751] video LNXVIDEO:00: Restoring backlight state
[32993.327894] preload: Corrupted page table at address 1dd4146
[32993.327919] PGD 36141067 PUD c66a8067 PMD 4341434143414341 BAD
[32993.327936] Bad pagetable: 000b [#1] SMP
... (see attachment)

I will try the above commit tonight and see if I can reproduce the above error.
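
One way to look at the bad PMD value printed above is to decode it as raw bytes. The sketch below assumes only that x86-64 stores the 64-bit entry in little-endian order; the helper name is hypothetical. The result is a repeating printable ASCII pattern, which might suggest that the entry was overwritten with data rather than with another pointer, but that is an observation only, not a diagnosis made anywhere in this report.

```python
def entry_as_ascii(hex_value):
    """Decode a 64-bit page-table entry as the 8 bytes it occupies in memory.

    x86-64 is little-endian, so the in-memory byte order is the reverse of
    the order in which the hex digits are printed.
    """
    raw = int(hex_value, 16).to_bytes(8, "little")
    return raw.decode("ascii", errors="replace")


if __name__ == "__main__":
    print(entry_as_ascii("4341434143414341"))  # prints "ACACACAC"
```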

Comment 9 Jens 2014-10-27 22:32:16 UTC

With 3.17rc5+ (the above commit) I can not reproduce the same error as above, but after a couple attempts the system simply refuses to suspend - the 'pm-hibernate' process freezes. Also it is not possible to shut down cleanly any more.

There is nothing in the logs or in dmesg to hint at what might be happening (that I could recognize).

I will try again tomorrow to see if I can reproduce this.

Comment 10 Imre Deak 2014-10-28 12:16:11 UTC

(In reply to Jens from comment #8)
> Created attachment 108530
> dmesg output of "preload: Corrupted page table at address 1dd4146"
> 
> Unfortunately I have just experienced another error upon resume. This
> prevented the machine from shutting down cleanly (although it was still
> working when I resumed). This is the first time this happened, though.
> 
> [32938.478751] video LNXVIDEO:00: Restoring backlight state
> [32993.327894] preload: Corrupted page table at address 1dd4146
> [32993.327919] PGD 36141067 PUD c66a8067 PMD 4341434143414341 BAD
> [32993.327936] Bad pagetable: 000b [#1] SMP
> ... (see attachment)

Ok, this doesn't look good. Could you check whether you can reproduce it with the same kernel when booting with nomodeset?

(In reply to Jens from comment #9)
> With 3.17rc5+ (the above commit) I can not reproduce the same error as
> above, but after a couple attempts the system simply refuses to suspend -
> the 'pm-hibernate' process freezes. Also it is not possible to shut down
> cleanly any more.
> 
> There is nothing in the logs or in dmesg to hint at what might be happening
> (that I could recognize).
> 
> I will try again tomorrow to see if I can reproduce this.

Actually now trying this commit is less important, since even -nightly has the problem, so let's try to narrow down things in -nightly.

Comment 11 Jens 2014-10-29 18:10:23 UTC

I have not yet managed to reproduce it with "nomodeset", but I also cannot start X with this parameter, no matter if cleanly or after a resume. The lightdm background starts up, but that is it.

Even more strange, when I switch to VT1 (Ctrl-Alt-F1) and back, the VT1 text with black background stays on the screen but I can reveal the X lightdm wallpaper bit by bit by "drawing" on the screen with the mouse cursor.

Restarting lightdm after a resume crashes the system (keyboard frozen). No errors in dmesg.

Comment 12 Jens 2014-10-29 19:15:26 UTC

Correction: keyboard is frozen, but system can still be shut down by pressing the power button (i.e. kernel seems to be unharmed).

Comment 13 Imre Deak 2014-10-29 19:53:54 UTC

(In reply to Jens from comment #11)
> I have not yet managed to reproduce it with "nomodeset", but I also cannot
> start X with this parameter, no matter if cleanly or after a resume. The
> lightdm background starts up, but that is it.
> 
> Even more strange, when I switch to VT1 (Ctrl-Alt-F1) and back, the VT1 text
> with black background stays on the screen but I can reveal the X lightdm
> wallpaper bit by bit by "drawing" on the screen with the mouse cursor.
> 
> Restarting lightdm after a resume crashes the system (keyboard frozen). No
> errors in dmesg.

Ok, but I assume you tried to reproduce it by suspending while running something exercising the VM/filesystem like the make -j4 you did earlier?

(In reply to Jens from comment #12)
> Correction: keyboard is frozen, but system can still be shut down by
> pressing the power button (i.e. kernel seems to be unharmed).

Could you still get the dmesg somehow at this point, by ssh'ing in or using netconsole?

Comment 14 Jens 2014-10-29 22:06:58 UTC

Summary:

1)
I could not reproduce the above corruption error with "nomodeset" and only using the console (no X) after about 10 suspend/resume cycles during a parallel make -j4.

2)
Upon restarting lightdm and logging in, this is printed to the console when using "nomodeset":
[+21.57s] WARNING: Error activating login1 session: GDBus.Error:org.freedesktop.DBus.Error.Failed: Operation not supported

I do not get this when not using this boot parameter.

Full boot command line: BOOT_IMAGE=/vmlinuz-3.18.0-rc1+ root=/dev/mapper/ubuntu--vg-root ro quiet splash no_console_suspend=1 netconsole=6666@192.168.178.59/eth0,6666@192.168.178.62/ crashkernel=384M-:128M vt.handoff=7

3)
After console freezes, there is nothing strange in the logs (at least no backtrace, oops or anything obvious.)
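
For reference, the `netconsole=` value on the boot command line above can be taken apart as in the following sketch. It assumes the documented layout `[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]`; the function name and the returned dictionary keys are illustrative, and optional prefixes that newer kernels may accept are not handled.

```python
def parse_netconsole(value):
    """Split a netconsole= value into its source and target parts."""
    source, target = value.split(",", 1)
    src_addr, _, dev = source.partition("/")
    src_port, _, src_ip = src_addr.partition("@")
    tgt_addr, _, tgt_mac = target.partition("/")
    tgt_port, _, tgt_ip = tgt_addr.partition("@")
    return {
        "src_port": int(src_port) if src_port else None,
        "src_ip": src_ip or None,
        "dev": dev or None,
        "tgt_port": int(tgt_port) if tgt_port else None,
        "tgt_ip": tgt_ip or None,
        # An empty MAC field means none was given on the command line.
        "tgt_mac": tgt_mac or None,
    }


if __name__ == "__main__":
    cfg = parse_netconsole("6666@192.168.178.59/eth0,6666@192.168.178.62/")
    for key, val in cfg.items():
        print(key, val)
```

With the value from this report, the sender is 192.168.178.59 on eth0 and the log receiver is 192.168.178.62, both on UDP port 6666, with no target MAC address specified.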

Comment 15 Imre Deak 2014-10-30 11:12:03 UTC

(In reply to Jens from comment #14)
> Summary:
> 
> 1)
> I could not reproduce the above corruption error with "nomodeset" and only
> using the console (no X) after about 10 suspend/resume cycles during a
> parallel make -j4.
> 
> 2)
> Upon restarting lightdm and logging in, this is printed to the console when
> using "nomodeset":
> [+21.57s] WARNING: Error activating login1 session:
> GDBus.Error:org.freedesktop.DBus.Error.Failed: Operation not supported
> 
> I do not get this when not using this boot parameter.
> 
> Full boot command line: BOOT_IMAGE=/vmlinuz-3.18.0-rc1+
> root=/dev/mapper/ubuntu--vg-root ro quiet splash no_console_suspend=1
> netconsole=6666@192.168.178.59/eth0,6666@192.168.178.62/
> crashkernel=384M-:128M vt.handoff=7
> 
> 3)
> After console freezes, there is nothing strange in the logs (at least no
> backtrace, oops or anything obvious.)

Ok. At the moment I don't have many ideas about what could cause this; I'll try to reproduce it on my Haswell here. One more thing: is i915 built-in or a module for you? We end up with a different resume sequence in the two cases, so checking whether you can reproduce the bug both ways could be useful.
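
Whether a driver is built-in or a module can usually be read off two files. The sketch below assumes that `/proc/modules` lists loaded modules by name in its first column and that `/lib/modules/<release>/modules.builtin` lists built-in drivers as `.ko` paths; the function name and return values are hypothetical, and the file contents are passed in as text so the logic can be tried without a running target system.

```python
def driver_linkage(name, proc_modules_text, modules_builtin_text):
    """Classify a driver as 'module', 'built-in' or 'absent'."""
    # /proc/modules: the first whitespace-separated field is the module name.
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    if name in loaded:
        return "module"
    # modules.builtin: one path per line, e.g. kernel/drivers/gpu/drm/i915/i915.ko
    for path in modules_builtin_text.splitlines():
        if path.strip().rsplit("/", 1)[-1] == name + ".ko":
            return "built-in"
    return "absent"


if __name__ == "__main__":
    # Hypothetical file contents for illustration.
    proc_modules = "i915 1011712 5 - Live 0xffffffffa00b0000\n"
    builtin = "kernel/drivers/ata/ahci.ko\n"
    print(driver_linkage("i915", proc_modules, builtin))
    print(driver_linkage("ahci", proc_modules, builtin))
```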

Comment 16 Jens 2014-11-03 18:45:08 UTC

On 3.18.0rc1 (as of 2014.10.25, 88a443f454a4d):

I just got another watchdog failure for eth0 (r8169) upon resume. This time, though, the ethernet managed to reconnect after some delay. After the previous two resumes I had to reboot because the network was dead.

[13729.752788] ------------[ cut here ]------------
[13729.752796] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x24f/0x260()
[13729.752807] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[13729.752807] Modules linked in: bnep(E) rfcomm(E) bluetooth(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) snd_hda_controller(E) snd_hda_codec(E) snd_hwdep(E) snd_pcm(E) snd_seq_midi(E) snd_seq_midi_event(E) intel_rapl(E) snd_rawmidi(E) snd_seq(E) snd_seq_device(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_timer(E) shpchp(E) coretemp(E) lpc_ich(E) mei_me(E) snd(E) mei(E) kvm_intel(E) soundcore(E) kvm(E) serio_raw(E) tpm_infineon(E) 8250_fintek(E) intel_smartconnect(E) mac_hid(E) parport_pc(E) ppdev(E) lp(E) parport(E) dm_crypt(E) netconsole(E) configfs(E) hid_generic(E) usbhid(E) hid(E) mxm_wmi(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) i915(E) ahci(E) libahci(E) i2c_algo_bit(E) drm_kms_helper(E) r8169(E) mii(E) drm(E) wmi(E) video(E)
[13729.752846] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G            E  3.18.0-rc1+ #1
[13729.752847] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5 05/30/2014
[13729.752847]  0000000000000009 ffff88021eb83d48 ffffffff8176091c 000000000000c56a
[13729.752851]  ffff88021eb83d98 ffff88021eb83d88 ffffffff8106e1c1 0000000000000005
[13729.752852]  0000000000000000 ffff8802130e4000 0000000000000001 ffff8800c67db280
[13729.752855] Call Trace:
[13729.752856]  <IRQ>  [<ffffffff8176091c>] dump_stack+0x46/0x58
[13729.752864]  [<ffffffff8106e1c1>] warn_slowpath_common+0x81/0xa0
[13729.752867]  [<ffffffff8106e226>] warn_slowpath_fmt+0x46/0x50
[13729.752869]  [<ffffffff8168246f>] dev_watchdog+0x24f/0x260
[13729.752872]  [<ffffffff81682220>] ? dev_graft_qdisc+0x80/0x80
[13729.752874]  [<ffffffff810d271a>] call_timer_fn+0x3a/0x110
[13729.752877]  [<ffffffff81682220>] ? dev_graft_qdisc+0x80/0x80
[13729.752879]  [<ffffffff810d3ebf>] run_timer_softirq+0x20f/0x310
[13729.752882]  [<ffffffff81072295>] __do_softirq+0xf5/0x2d0
[13729.752884]  [<ffffffff81072765>] irq_exit+0x115/0x120
[13729.752887]  [<ffffffff8176bc3a>] smp_apic_timer_interrupt+0x4a/0x60
[13729.752890]  [<ffffffff81769d1d>] apic_timer_interrupt+0x6d/0x80
[13729.752891]  <EOI>  [<ffffffff81608470>] ? cpuidle_enter_state+0x70/0x170
[13729.752899]  [<ffffffff8160845d>] ? cpuidle_enter_state+0x5d/0x170
[13729.752902]  [<ffffffff81608627>] cpuidle_enter+0x17/0x20
[13729.752904]  [<ffffffff810adb65>] cpu_startup_entry+0x2f5/0x390
[13729.752908]  [<ffffffff81046446>] start_secondary+0x156/0x180
[13729.752910] ---[ end trace 2316ebb6a7713aa7 ]---

This happens quite often, more often than the machine crashing or failing to resume properly now.
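
To put a number on "quite often", the watchdog warnings could be counted per log. A minimal sketch, assuming the `NETDEV WATCHDOG` message format shown in the trace above; the function name is hypothetical.

```python
import re

WATCHDOG_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] NETDEV WATCHDOG: (\S+) \((\S+)\): "
    r"transmit queue (\d+) timed out"
)


def watchdog_timeouts(log_text):
    """Return (timestamp, device, driver, queue) for each watchdog timeout."""
    return [
        (float(stamp), dev, drv, int(queue))
        for stamp, dev, drv, queue in WATCHDOG_RE.findall(log_text)
    ]


if __name__ == "__main__":
    line = ("[13729.752807] NETDEV WATCHDOG: eth0 (r8169): "
            "transmit queue 0 timed out")
    print(watchdog_timeouts(line))
```

Dividing the number of returned events by the number of resume cycles in the same log would give a rough failure rate per kernel version.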

Comment 17 Jens 2014-11-27 18:06:39 UTC

I compiled 3.18.0rc6+ / linux-drm-nightly as of yesterday (a834a782adf3ab4b508cd80e9082960263bcc4ed) and did one pm-hibernate/resume cycle during "make -j4" in the kernel tree. Upon resume I get this:

[   40.501301] init: samba-ad-dc main process (1405) terminated with status 1
[   55.521833] ------------[ cut here ]------------
[   55.521853] WARNING: CPU: 3 PID: 1943 at drivers/gpu/drm/i915/i915_gem_execbuffer.c:125 eb_lookup_vmas.isra.15+0x363/0x400 [i915]()
[   55.521854] GPU use of dumb buffer is illegal.
[   55.521855] Modules linked in: bnep(E) rfcomm(E) bluetooth(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) snd_hda_controller(E) snd_hda_codec(E) snd_hwdep(E) intel_rapl(E) snd_pcm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_seq_midi(E) snd_seq_midi_event(E) coretemp(E) snd_rawmidi(E) snd_seq(E) kvm_intel(E) snd_seq_device(E) kvm(E) snd_timer(E) snd(E) soundcore(E) mei_me(E) shpchp(E) mei(E) lpc_ich(E) serio_raw(E) tpm_infineon(E) intel_smartconnect(E) mac_hid(E) parport_pc(E) ppdev(E) lp(E) parport(E) dm_crypt(E) netconsole(E) configfs(E) hid_generic(E) usbhid(E) hid(E) mxm_wmi(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) i915(E) ahci(E) i2c_algo_bit(E) libahci(E) drm_kms_helper(E) r8169(E) mii(E) drm(E) wmi(E) video(E)
[   55.521873] CPU: 3 PID: 1943 Comm: Xorg Tainted: G            E  3.18.0-rc6+ #7
[   55.521874] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5 05/30/2014
[   55.521875]  0000000000000009 ffff8802108efb48 ffffffff81762cfc 0000000000000000
[   55.521876]  ffff8802108efb98 ffff8802108efb88 ffffffff8106f0b1 ffff8802108efc18
[   55.521877]  ffff8802108efc38 ffff880210e73780 0000000000000001 ffff880210e737b8
[   55.521879] Call Trace:
[   55.521882]  [<ffffffff81762cfc>] dump_stack+0x46/0x58
[   55.521885]  [<ffffffff8106f0b1>] warn_slowpath_common+0x81/0xa0
[   55.521887]  [<ffffffff8106f116>] warn_slowpath_fmt+0x46/0x50
[   55.521896]  [<ffffffffa00e56b3>] eb_lookup_vmas.isra.15+0x363/0x400 [i915]
[   55.521904]  [<ffffffffa00e5c6d>] i915_gem_do_execbuffer.isra.22+0x51d/0xd90 [i915]
[   55.521906]  [<ffffffff811bf12c>] ? kmem_cache_alloc_trace+0x3c/0x1f0
[   55.521915]  [<ffffffffa00eca05>] ? i915_gem_object_get_pages+0x45/0xc0 [i915]
[   55.521923]  [<ffffffffa00e7601>] i915_gem_execbuffer2+0xb1/0x2c0 [i915]
[   55.521930]  [<ffffffffa001aa54>] drm_ioctl+0x1a4/0x630 [drm]
[   55.521933]  [<ffffffff81123f0c>] ? acct_account_cputime+0x1c/0x20
[   55.521934]  [<ffffffff811f0520>] do_vfs_ioctl+0x2e0/0x4c0
[   55.521937]  [<ffffffff8109e304>] ? vtime_account_user+0x54/0x60
[   55.521938]  [<ffffffff811f0781>] SyS_ioctl+0x81/0xa0
[   55.521940]  [<ffffffff8176b3b4>] ? int_check_syscall_exit_work+0x34/0x3d
[   55.521942]  [<ffffffff8176b12d>] system_call_fastpath+0x16/0x1b
[   55.521943] ---[ end trace 853866804709104b ]---
[   55.832915] init: plymouth-upstart-bridge main process ended, respawning
[   55.835816] init: plymouth-upstart-bridge main process (2918) terminated with status 1
[   55.835831] init: plymouth-upstart-bridge main process ended, respawning
[   58.563397] audit: type=1400 audit(1416991047.231:77): apparmor="STATUS" operation="profile_replace" name="/usr/lib/cups/backend/cups-pdf" pid=2981 comm="apparmor_parser"
[   58.563401] audit: type=1400 audit(1416991047.231:78): apparmor="STATUS" operation="profile_replace" name="/usr/sbin/cupsd" pid=2981 comm="apparmor_parser"
[   58.563595] audit: type=1400 audit(1416991047.231:79): apparmor="STATUS" operation="profile_replace" name="/usr/sbin/cupsd" pid=2981 comm="apparmor_parser"
[  815.742431] init: anacron main process (1210) killed by TERM signal
[  819.770858] PM: Syncing filesystems ... done.
[  820.315110] Freezing user space processes ... (elapsed 0.001 seconds) done.

However, no more crashes, freezes or Oopses.

Also, after a few suspend/resume cycles (twice in 12) I still have the problem that the network does not come up again after a resume. When it does, I get

[ 3846.934341] r8169 0000:02:00.0 eth0: link up

in dmesg. When it doesn't, I get

[ 6221.007206] show_signal_msg: 120 callbacks suppressed
[ 6221.007209] Watchdog[2700]: segfault at 0 ip 00007ffe51c623e8 sp 00007ffe41dc7560 error 6 in libcontent.so[7ffe513e8000+11d8000]
[ 6243.712345] Watchdog[29313]: segfault at 0 ip 00007f49e1a3d3e8 sp 00007f49d1ba2560 error 6 in libcontent.so[7f49e11c3000+11d8000]

but I don't know if these are related. I also occasionally get this

[ 6520.964686] Restarting tasks ... 
[ 6520.964841] pci_bus 0000:04: Allocating resources
[ 6520.964855] pci 0000:03:00.0: PCI bridge to [bus 04]
[ 6520.964859] pci 0000:03:00.0:   bridge window [io  0x3000-0x3fff]
[ 6520.964866] pci 0000:03:00.0:   bridge window [mem 0xdf600000-0xdf7fffff]
[ 6520.964870] pci 0000:03:00.0:   bridge window [mem 0xdf800000-0xdf9fffff 64bit pref]
[ 6520.968218] done.
[ 6520.968224] video LNXVIDEO:00: Restoring backlight state
[ 6528.107156] r8169 0000:02:00.0 eth0: link down
[ 6528.107204] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 6528.107448] r8169 0000:02:00.0 eth0: link down
[ 6531.536977] r8169 0000:02:00.0 eth0: link up
[ 6531.536983] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 6543.357696] ------------[ cut here ]------------
[ 6543.357703] WARNING: CPU: 0 PID: 20681 at net/sched/sch_generic.c:303 dev_watchdog+0x24f/0x260()
[ 6543.357704] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
[ 6543.357705] Modules linked in: bnep(E) rfcomm(E) bluetooth(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) snd_hda_controller(E) snd_hda_codec(E) snd_hwdep(E) intel_rapl(E) snd_pcm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) snd_seq_midi(E) snd_seq_midi_event(E) coretemp(E) snd_rawmidi(E) snd_seq(E) kvm_intel(E) snd_seq_device(E) kvm(E) snd_timer(E) snd(E) soundcore(E) mei_me(E) shpchp(E) mei(E) lpc_ich(E) serio_raw(E) tpm_infineon(E) intel_smartconnect(E) mac_hid(E) parport_pc(E) ppdev(E) lp(E) parport(E) dm_crypt(E) netconsole(E) configfs(E) hid_generic(E) usbhid(E) hid(E) mxm_wmi(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) i915(E) ahci(E) i2c_algo_bit(E) libahci(E) drm_kms_helper(E) r8169(E) mii(E) drm(E) wmi(E) video(E)
[ 6543.357738] CPU: 0 PID: 20681 Comm: cc1 Tainted: G        W   E  3.18.0-rc6+ #7
[ 6543.357739] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5 05/30/2014
[ 6543.357740]  0000000000000009 ffff88021ea03d48 ffffffff81762cfc 0000000000000000
[ 6543.357741]  ffff88021ea03d98 ffff88021ea03d88 ffffffff8106f0b1 ffff88021ea03d70
[ 6543.357743]  0000000000000000 ffff88020fb08000 0000000000000001 ffff8800c65e1e80
[ 6543.357744] Call Trace:
[ 6543.357745]  <IRQ>  [<ffffffff81762cfc>] dump_stack+0x46/0x58
[ 6543.357751]  [<ffffffff8106f0b1>] warn_slowpath_common+0x81/0xa0
[ 6543.357753]  [<ffffffff8106f116>] warn_slowpath_fmt+0x46/0x50
[ 6543.357755]  [<ffffffff8168469f>] dev_watchdog+0x24f/0x260
[ 6543.357756]  [<ffffffff81684450>] ? dev_graft_qdisc+0x80/0x80
[ 6543.357759]  [<ffffffff810d39fa>] call_timer_fn+0x3a/0x110
[ 6543.357760]  [<ffffffff81684450>] ? dev_graft_qdisc+0x80/0x80
[ 6543.357762]  [<ffffffff810d519f>] run_timer_softirq+0x20f/0x310
[ 6543.357763]  [<ffffffff810731b5>] __do_softirq+0xf5/0x2d0
[ 6543.357764]  [<ffffffff81073685>] irq_exit+0x115/0x120
[ 6543.357766]  [<ffffffff8176dfaa>] smp_apic_timer_interrupt+0x4a/0x60
[ 6543.357769]  [<ffffffff8176c07d>] apic_timer_interrupt+0x6d/0x80
[ 6543.357769]  <EOI> 
[ 6543.357770] ---[ end trace 853866804709104c ]---
[ 6543.375603] r8169 0000:02:00.0 eth0: link up

after which the network works again.

Is the network issue being worked on actively? If so, I can try on a second machine and report back.

Comment 18 Imre Deak 2014-11-28 08:18:13 UTC

(In reply to Jens from comment #17)
> I compiled 3.18.0rc6+ / linux-drm-nightly as of yesterday
> (a834a782adf3ab4b508cd80e9082960263bcc4ed) and did one pm-hibernate/resume
> cycle during "make -j4" in the kernel tree. Upon resume I get this:
> 
> [   40.501301] init: samba-ad-dc main process (1405) terminated with status 1
> [   55.521833] ------------[ cut here ]------------
> [   55.521853] WARNING: CPU: 3 PID: 1943 at
> drivers/gpu/drm/i915/i915_gem_execbuffer.c:125
> eb_lookup_vmas.isra.15+0x363/0x400 [i915]()
> [   55.521854] GPU use of dumb buffer is illegal.
> [   55.521855] Modules linked in: bnep(E) rfcomm(E) bluetooth(E)
> snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E)
> snd_hda_intel(E) snd_hda_controller(E) snd_hda_codec(E) snd_hwdep(E)
> intel_rapl(E) snd_pcm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E)
> snd_seq_midi(E) snd_seq_midi_event(E) coretemp(E) snd_rawmidi(E) snd_seq(E)
> kvm_intel(E) snd_seq_device(E) kvm(E) snd_timer(E) snd(E) soundcore(E)
> mei_me(E) shpchp(E) mei(E) lpc_ich(E) serio_raw(E) tpm_infineon(E)
> intel_smartconnect(E) mac_hid(E) parport_pc(E) ppdev(E) lp(E) parport(E)
> dm_crypt(E) netconsole(E) configfs(E) hid_generic(E) usbhid(E) hid(E)
> mxm_wmi(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E)
> aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E)
> ablk_helper(E) cryptd(E) i915(E) ahci(E) i2c_algo_bit(E) libahci(E)
> drm_kms_helper(E) r8169(E) mii(E) drm(E) wmi(E) video(E)
> [   55.521873] CPU: 3 PID: 1943 Comm: Xorg Tainted: G            E 
> 3.18.0-rc6+ #7
> [   55.521874] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5
> 05/30/2014
> [   55.521875]  0000000000000009 ffff8802108efb48 ffffffff81762cfc
> 0000000000000000
> [   55.521876]  ffff8802108efb98 ffff8802108efb88 ffffffff8106f0b1
> ffff8802108efc18
> [   55.521877]  ffff8802108efc38 ffff880210e73780 0000000000000001
> ffff880210e737b8
> [   55.521879] Call Trace:
> [   55.521882]  [<ffffffff81762cfc>] dump_stack+0x46/0x58
> [   55.521885]  [<ffffffff8106f0b1>] warn_slowpath_common+0x81/0xa0
> [   55.521887]  [<ffffffff8106f116>] warn_slowpath_fmt+0x46/0x50
> [   55.521896]  [<ffffffffa00e56b3>] eb_lookup_vmas.isra.15+0x363/0x400
> [i915]
> [   55.521904]  [<ffffffffa00e5c6d>]
> i915_gem_do_execbuffer.isra.22+0x51d/0xd90 [i915]
> [   55.521906]  [<ffffffff811bf12c>] ? kmem_cache_alloc_trace+0x3c/0x1f0
> [   55.521915]  [<ffffffffa00eca05>] ? i915_gem_object_get_pages+0x45/0xc0
> [i915]
> [   55.521923]  [<ffffffffa00e7601>] i915_gem_execbuffer2+0xb1/0x2c0 [i915]
> [   55.521930]  [<ffffffffa001aa54>] drm_ioctl+0x1a4/0x630 [drm]
> [   55.521933]  [<ffffffff81123f0c>] ? acct_account_cputime+0x1c/0x20
> [   55.521934]  [<ffffffff811f0520>] do_vfs_ioctl+0x2e0/0x4c0
> [   55.521937]  [<ffffffff8109e304>] ? vtime_account_user+0x54/0x60
> [   55.521938]  [<ffffffff811f0781>] SyS_ioctl+0x81/0xa0
> [   55.521940]  [<ffffffff8176b3b4>] ? int_check_syscall_exit_work+0x34/0x3d
> [   55.521942]  [<ffffffff8176b12d>] system_call_fastpath+0x16/0x1b
> [   55.521943] ---[ end trace 853866804709104b ]---
> [   55.832915] init: plymouth-upstart-bridge main process ended, respawning
> [   55.835816] init: plymouth-upstart-bridge main process (2918) terminated
> with status 1
> [   55.835831] init: plymouth-upstart-bridge main process ended, respawning
> [   58.563397] audit: type=1400 audit(1416991047.231:77): apparmor="STATUS"
> operation="profile_replace" name="/usr/lib/cups/backend/cups-pdf" pid=2981
> comm="apparmor_parser"
> [   58.563401] audit: type=1400 audit(1416991047.231:78): apparmor="STATUS"
> operation="profile_replace" name="/usr/sbin/cupsd" pid=2981
> comm="apparmor_parser"
> [   58.563595] audit: type=1400 audit(1416991047.231:79): apparmor="STATUS"
> operation="profile_replace" name="/usr/sbin/cupsd" pid=2981
> comm="apparmor_parser"
> [  815.742431] init: anacron main process (1210) killed by TERM signal
> [  819.770858] PM: Syncing filesystems ... done.
> [  820.315110] Freezing user space processes ... (elapsed 0.001 seconds)
> done.

This looks like a problem in X, trying to use an invalid GEM buffer for rendering. Does it really happen only after S4 resume, or also during normal booting? CC'ing Chris.

> However, no more crashes, freezes or Oopses.
> 
> Also, after a few suspend/resume cycles (twice in 12) I still have the
> problem that the network does not come up again after a resume. When it
> does, I get
> 
> [ 3846.934341] r8169 0000:02:00.0 eth0: link up
> 
> in dmesg. When it doesn't, I get
> 
> [ 6221.007206] show_signal_msg: 120 callbacks suppressed
> [ 6221.007209] Watchdog[2700]: segfault at 0 ip 00007ffe51c623e8 sp
> 00007ffe41dc7560 error 6 in libcontent.so[7ffe513e8000+11d8000]
> [ 6243.712345] Watchdog[29313]: segfault at 0 ip 00007f49e1a3d3e8 sp
> 00007f49d1ba2560 error 6 in libcontent.so[7f49e11c3000+11d8000]
> 
> but I don't know if these are related. I also occasionally get this
> 
> [ 6520.964686] Restarting tasks ... 
> [ 6520.964841] pci_bus 0000:04: Allocating resources
> [ 6520.964855] pci 0000:03:00.0: PCI bridge to [bus 04]
> [ 6520.964859] pci 0000:03:00.0:   bridge window [io  0x3000-0x3fff]
> [ 6520.964866] pci 0000:03:00.0:   bridge window [mem 0xdf600000-0xdf7fffff]
> [ 6520.964870] pci 0000:03:00.0:   bridge window [mem 0xdf800000-0xdf9fffff
> 64bit pref]
> [ 6520.968218] done.
> [ 6520.968224] video LNXVIDEO:00: Restoring backlight state
> [ 6528.107156] r8169 0000:02:00.0 eth0: link down
> [ 6528.107204] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> [ 6528.107448] r8169 0000:02:00.0 eth0: link down
> [ 6531.536977] r8169 0000:02:00.0 eth0: link up
> [ 6531.536983] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [ 6543.357696] ------------[ cut here ]------------
> [ 6543.357703] WARNING: CPU: 0 PID: 20681 at net/sched/sch_generic.c:303
> dev_watchdog+0x24f/0x260()
> [ 6543.357704] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
> [ 6543.357705] Modules linked in: bnep(E) rfcomm(E) bluetooth(E)
> snd_hda_codec_realtek(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E)
> snd_hda_intel(E) snd_hda_controller(E) snd_hda_codec(E) snd_hwdep(E)
> intel_rapl(E) snd_pcm(E) x86_pkg_temp_thermal(E) intel_powerclamp(E)
> snd_seq_midi(E) snd_seq_midi_event(E) coretemp(E) snd_rawmidi(E) snd_seq(E)
> kvm_intel(E) snd_seq_device(E) kvm(E) snd_timer(E) snd(E) soundcore(E)
> mei_me(E) shpchp(E) mei(E) lpc_ich(E) serio_raw(E) tpm_infineon(E)
> intel_smartconnect(E) mac_hid(E) parport_pc(E) ppdev(E) lp(E) parport(E)
> dm_crypt(E) netconsole(E) configfs(E) hid_generic(E) usbhid(E) hid(E)
> mxm_wmi(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E)
> aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E)
> ablk_helper(E) cryptd(E) i915(E) ahci(E) i2c_algo_bit(E) libahci(E)
> drm_kms_helper(E) r8169(E) mii(E) drm(E) wmi(E) video(E)
> [ 6543.357738] CPU: 0 PID: 20681 Comm: cc1 Tainted: G        W   E 
> 3.18.0-rc6+ #7
> [ 6543.357739] Hardware name: MSI MS-7817/CSM-B85M-E45 (MS-7817), BIOS V10.5
> 05/30/2014
> [ 6543.357740]  0000000000000009 ffff88021ea03d48 ffffffff81762cfc
> 0000000000000000
> [ 6543.357741]  ffff88021ea03d98 ffff88021ea03d88 ffffffff8106f0b1
> ffff88021ea03d70
> [ 6543.357743]  0000000000000000 ffff88020fb08000 0000000000000001
> ffff8800c65e1e80
> [ 6543.357744] Call Trace:
> [ 6543.357745]  <IRQ>  [<ffffffff81762cfc>] dump_stack+0x46/0x58
> [ 6543.357751]  [<ffffffff8106f0b1>] warn_slowpath_common+0x81/0xa0
> [ 6543.357753]  [<ffffffff8106f116>] warn_slowpath_fmt+0x46/0x50
> [ 6543.357755]  [<ffffffff8168469f>] dev_watchdog+0x24f/0x260
> [ 6543.357756]  [<ffffffff81684450>] ? dev_graft_qdisc+0x80/0x80
> [ 6543.357759]  [<ffffffff810d39fa>] call_timer_fn+0x3a/0x110
> [ 6543.357760]  [<ffffffff81684450>] ? dev_graft_qdisc+0x80/0x80
> [ 6543.357762]  [<ffffffff810d519f>] run_timer_softirq+0x20f/0x310
> [ 6543.357763]  [<ffffffff810731b5>] __do_softirq+0xf5/0x2d0
> [ 6543.357764]  [<ffffffff81073685>] irq_exit+0x115/0x120
> [ 6543.357766]  [<ffffffff8176dfaa>] smp_apic_timer_interrupt+0x4a/0x60
> [ 6543.357769]  [<ffffffff8176c07d>] apic_timer_interrupt+0x6d/0x80
> [ 6543.357769]  <EOI> 
> [ 6543.357770] ---[ end trace 853866804709104c ]---
> [ 6543.375603] r8169 0000:02:00.0 eth0: link up
> 
> after which the network works again.
> 
> Is the network issue being worked on actively? If so, I can try on a second
> machine and report back.

I'm not sure, but this is a network driver problem, so could you let the maintainers of it know about this? IIRC you opened a bug about this already.
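
To quantify how often the link fails to come back, something like the following sketch could be run over saved dmesg output. It assumes the "Restarting tasks" line marks each resume (as in the log quoted above) and that the interface is eth0; the function name `link_up_delays` is made up for illustration and is not part of any existing tool.

```python
import re

# Matches the bracketed dmesg timestamp, e.g. "[ 6520.964686] message".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def link_up_delays(lines, iface="eth0"):
    """For each resume, return seconds until the first 'link up', or None.

    Assumption: a resume is marked by a 'Restarting tasks' line, and the
    NIC driver logs '<iface>: link up' once the link is back.
    """
    delays = []
    resume_at = None  # timestamp of a resume still waiting for link up
    for line in lines:
        match = TIMESTAMP.match(line.strip())
        if not match:
            continue  # continuation lines carry no timestamp
        stamp, msg = float(match.group(1)), match.group(2)
        if msg.startswith("Restarting tasks"):
            if resume_at is not None:
                delays.append(None)  # previous resume never saw link up
            resume_at = stamp
        elif resume_at is not None and f"{iface}: link up" in msg:
            delays.append(round(stamp - resume_at, 3))
            resume_at = None
    if resume_at is not None:
        delays.append(None)
    return delays


if __name__ == "__main__":
    import sys
    # Usage: dmesg | python3 this_script.py
    print(link_up_delays(sys.stdin))
```

A `None` entry would mean the link never came up before the next resume (or the end of the log), which is the failing case described above.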

Comment 19 Mauro Molinari 2014-11-28 08:25:23 UTC

I also have this problem with thawing after hibernation (using Linux Mint 17, based on Ubuntu 14.04, hence kernel version is 3.13.0-39-generic #66-Ubuntu SMP). 
I read the whole original bug report on kernel.org and this one, but I'm not sure I understand the state of this bug. Is it supposed to be fixed in some newer kernel versions?
If not, could the network problem be related to the memory corruption problem mentioned here? In my case I see random processes failing after thawing, before the system completely crashes or freezes, so the network layer may also be affected.

Or is the original memory corruption problem fixed? If so, why is this "reopened"?

Comment 20 Jens 2014-11-28 09:46:51 UTC

@Mauro, the thawing issues were resolved for me only after 3.17rc1. All kernels before that were unstable, especially when hibernating under load. Check which kernel Linux Mint 17 is based on and upgrade to a newer version if necessary.

@Imre: the "i915_gem" bug only occurred on the first resume after the first hibernation, and it was the first time I saw it. However, it was also the first time I booted this kernel. I will keep my eyes open.

Will update the bko report at https://bugzilla.kernel.org/show_bug.cgi?id=84681. 


Another issue occurred (something that has been happening since 3.17rc1+):
Occasionally the system will simply refuse to hibernate when calling "sudo pm-hibernate". No error, no syslog or dmesg output, nothing. When hibernating via desktop, the screensaver will enable but the system won't hibernate. This happened about once in ten hibernation attempts so far. Looking at the processes I see stuck processes:

16859 ?        S      0:00 sh -c /usr/sbin/pm-hibernate
16860 ?        S      0:00 /bin/sh /usr/sbin/pm-hibernate

Waiting for ten minutes doesn't change anything. But yesterday the moment I called 'sudo strace -p 16860' to look at what is happening, the hibernation process woke up and continued 8-)

I'll try and reproduce this. Here's the strace output (just the beginning):


jens@linuxkiste:~$ sudo strace -p 16860
Process 16860 attached
dup2(11, 1)                             = 1
close(11)                               = 0
open("/sys/power/disk", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
fcntl(1, F_DUPFD, 10)                   = 11
close(1)                                = 0
fcntl(11, F_SETFD, FD_CLOEXEC)          = 0
dup2(4, 1)                              = 1
close(4)                                = 0
dup2(11, 1)                             = 1
close(11)                               = 0
pipe([4, 5])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa4f1076a10) = 24552
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24552, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn()                          = 24552
close(5)                                = 0
read(4, "Fri Nov 28 08:53:51 CET 2014\n", 128) = 29
read(4, "", 128)                        = 0
close(4)                                = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 24552
write(1, "Fri Nov 28 08:53:51 CET 2014: Aw"..., 37) = 37
pipe([4, 5])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa4f1076a10) = 24554
close(5)                                = 0
read(4, "Fri Nov 28 08:53:51 CET 2014\n", 128) = 29
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24554, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn()                          = 29
read(4, "", 128)                        = 0
close(4)                                = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 24554
write(1, "Fri Nov 28 08:53:51 CET 2014: Ru"..., 53) = 53
open("/dev/null", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4
fcntl(1, F_DUPFD, 10)                   = 11
close(1)                                = 0
fcntl(11, F_SETFD, FD_CLOEXEC)          = 0
dup2(4, 1)                              = 1
close(4)                                = 0
fcntl(2, F_DUPFD, 10)                   = 12
close(2)                                = 0
fcntl(12, F_SETFD, FD_CLOEXEC)          = 0
dup2(1, 2)                              = 2
write(1, "before_hooks is a shell function"..., 33) = 33
dup2(11, 1)                             = 1
close(11)                               = 0
dup2(12, 2)                             = 2
close(12)                               = 0
pipe([4, 5])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa4f1076a10) = 24555
close(5)                                = 0
read(4, "novatel_3g_suspend\n99video\n99_ch"..., 128) = 128
read(4, "60_wpa_supplicant\n50unload_alx\n1"..., 128) = 128
read(4, "change\n", 128)                = 7
read(4, "", 128)                        = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24555, si_status=0, si_utime=0, si_stime=0} ---
rt_sigreturn()                          = 0
close(4)                                = 0
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 24555
stat("/var/run/pm-utils/pm-suspend/storage/parameters.new", 0x7fff52a38ad0) = -1 ENOENT (No such file or directory)
stat("/etc/pm/sleep.d/novatel_3g_suspend", {st_mode=S_IFREG|0755, st_size=1260, ...}) = 0
write(1, "Running hook /etc/pm/sleep.d/nov"..., 64) = 64
stat("/var/run/pm-utils/pm-suspend/storage/disable_hook:novatel_3g_suspend", 0x7fff52a38530) = -1 ENOENT (No such file or directory)
stat("/var/run/pm-utils/pm-suspend/storage/disable_hook:novatel_3g_suspend", 0x7fff52a38560) = -1 ENOENT (No such file or directory)
geteuid()                               = 0
stat("/etc/pm/sleep.d/novatel_3g_suspend", {st_mode=S_IFREG|0755, st_size=1260, ...}) = 0
faccessat(AT_FDCWD, "/etc/pm/sleep.d/novatel_3g_suspend", X_OK) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa4f1076a10) = 24559
...
...


Does this make any sense to you at all?

Thank you!
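
When the script is stuck like this, it may be less intrusive to look at /proc than to attach strace, since attaching appears to wake the process up. A minimal sketch, assuming a Linux /proc filesystem; the helper names `parse_stat` and `describe` are hypothetical:

```python
def parse_stat(text):
    """Extract pid, comm, state and ppid from the text of /proc/<pid>/stat.

    The comm field is wrapped in parentheses and may itself contain
    spaces or ')', so split on the first '(' and the LAST ')'.
    """
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    fields = text[close_paren + 1:].split()
    return {
        "pid": int(text[:open_paren].strip()),
        "comm": text[open_paren + 1:close_paren],
        "state": fields[0],   # e.g. S = sleeping, D = uninterruptible
        "ppid": int(fields[1]),
    }


def describe(pid):
    """Read the state of a process plus the kernel symbol it waits in."""
    with open(f"/proc/{pid}/stat") as f:
        info = parse_stat(f.read())
    try:
        with open(f"/proc/{pid}/wchan") as f:
            info["wchan"] = f.read().strip() or None
    except OSError:
        info["wchan"] = None  # wchan may be unavailable or unreadable
    return info


if __name__ == "__main__":
    import sys
    # Usage: sudo python3 this_script.py 16860
    print(describe(int(sys.argv[1])))
```

Comparing the reported state and wchan value across several stuck attempts might show whether the script is always blocked at the same point.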

Comment 21 Jens 2014-12-06 09:30:24 UTC

I just experienced another crash after about 20 suspend/resume cycles. I don't know if the crash is related to this bug, so I submitted it at https://bugzilla.kernel.org/show_bug.cgi?id=89321. Please have a look.

Thanks!

Comment 22 Jens 2015-01-22 20:55:56 UTC

Cannot reproduce any more with 3.19.0-rc2+ (drm-intel-nightly as of Jan 3, 2015).

Comment 23 Jari Tahvanainen 2016-10-19 11:13:10 UTC

Closing resolved+worksforme. Verified by Reporter.
