Summary: | [HSW regression] resume from s4 sporadically causes call trace and system hang, with warm boot | ||
---|---|---|---|
Product: | DRI | Reporter: | shui yangwei <yangweix.shui> |
Component: | DRM/Intel | Assignee: | Paulo Zanoni <przanoni> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | critical | ||
Priority: | highest | CC: | huax.lu, jinxianx.guo, lei.a.liu, ming.yao, przanoni, rodrigo.vivi, yi.sun |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=82340 | ||
Description
shui yangwei
2013-06-07 07:51:01 UTC
The important part of the oops has scrolled off the screen already :(

Can you please boot with pause_on_oops=60 so that the kernel waits 1 minute once the first oops shows up before it continues? That way you should be able to catch it.

I'll add this to our QA bug filing BKMs.

Any system hang on resume should be considered critical - upgrading.

(In reply to comment #1)
> The important part of the oops has scrolled off the screen already :(
>
> Can you please boot with pause_on_oops=60 so that the kernel waits 1 minute
> once the first oops shows up until it continues? That way you should be able
> to catch it.
>
> I'll add this to our QA bug filing BKMs.

I added pause_on_oops=60 in "/boot/grub2/grub.cfg" and tested with the 3.9.5 RC2 release kernel. The issue also came up, at the 13th S4 test: the machine shows a call trace and hangs.

(In reply to comment #3)
> I added pause_on_oops=60 in "/boot/grub2/grub.cfg", test with 3.9.5 RC2
> release kernel, this issue also come up at 13th S4 tests. Machine will call
> trace and hang.

Yangwei,
You may have misunderstood what Daniel said. The option just makes the system pause when the error happens, so that you can take a picture and attach it here. BTW, I think you only tested this issue with the dinq branch, not the 3.9.y branch.
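Since the option was added by editing grub.cfg by hand, it may be worth confirming that it actually reached the running kernel before starting a long S4 loop. The following is a minimal sketch, not part of the original report: the helper names are hypothetical, and it assumes a Linux system where the booted command line is readable from /proc/cmdline.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}.

    Bare flags (no '=') map to None. Quoted values containing spaces
    are not handled; this is only a rough check.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def pause_on_oops_seconds(cmdline: str):
    """Return the pause_on_oops delay in seconds, or None if it is not set."""
    value = parse_cmdline(cmdline).get("pause_on_oops")
    return int(value) if value and value.isdigit() else None


def read_running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the currently booted kernel (Linux only)."""
    with open(path) as f:
        return f.read()
```

Calling pause_on_oops_seconds(read_running_cmdline()) on the test machine should return 60 if the grub.cfg edit took effect, and None otherwise.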
Created attachment 80757 [details]
picture: S4 sporadically causes call trace and system hang (-next-queued kernel)

Environment:
--------------------
Kernel: (drm-intel-next-queued) a43adf0747ecde0d211f29adcbebf067f92e9cbb
Some additional commit info:
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Jun 10 11:20:22 2013 +0100

    drm/i915: Eliminate the addr/seqno from the hangcheck warning

Description:
--------------------
I find that this bug is much easier to reproduce with the latest -next-queued kernel. I ran the reliability test three times, and every run reproduced it within 4 rounds of S4 testing. The attached picture was taken using the oops option.

Just to make sure: this never happens if i915.ko is blacklisted, right?

(In reply to comment #6)
> Just to make sure: this never happens if i915.ko is blacklisted, right?

Yes. With i915.ko blacklisted, I ran S4 in a loop 100 times and it passed 100% of the time.

I'll see if I can reproduce this. Will provide more information as available.

Make sure you're testing the latest BIOS too; there have been fixes for suspend/resume issues for recent regressions and failures.

(In reply to comment #9)
> Make sure you're testing the latest BIOS too; there have been fixes for
> suspend/resume issues for recent regressions and failures.

Upgraded the BIOS to v128.
The call trace appeared at the 23rd S4 cycle.

(In reply to comment #10)
> Upgrade BIOS to v128.
> Call trace appears at the 23rd time while doing a S4 cycle.

Latest BIOS is 131.3. Does it solve the issue?

Thus far, I have not seen the call trace on my ULT in the testing I have done. I've tried this with kernels built from linux-stable, linux-next and Torvalds' tree. That said, the machine has failed to fully resume from sleep, although there were no indicators in dmesg or syslog as to the cause of the failure.

At this point, I'm going to update the BIOS per the recommendation above and retest the same kernels to see if the machine either a) produces the call trace, b) no longer fails to resume or c) fails in a way that yields something useful. I'll update when I have the information from that testing.

(In reply to comment #12)
> Thus far, I have not seen the call trace on my ULT in the testing I have
> done. I've tried this with kernels built from linux-stable, linux-next and
> torvald's tree. That said, the machine has failed to fully resume from
> sleep, although there were no indicators in dmesg or syslog as to the cause
> of the failure.
>
> At this point, I'm going to update the BIOS per the recommendation above and
> retest the same kernels to see if I can either a) get the call trace, b) no
> longer fails to resume or c) fails in a way that yields something useful.
> I'll update when I have the information from that testing.

A piece of advice: please note that we run S4 in a loop as a reliability test, about 100 times.

(In reply to comment #12)
> Thus far, I have not seen the call trace on my ULT in the testing I have
> done. I've tried this with kernels built from linux-stable, linux-next and
> torvald's tree. That said, the machine has failed to fully resume from
> sleep, although there were no indicators in dmesg or syslog as to the cause
> of the failure.
>
> At this point, I'm going to update the BIOS per the recommendation above and
> retest the same kernels to see if I can either a) get the call trace, b) no
> longer fails to resume or c) fails in a way that yields something useful.
> I'll update when I have the information from that testing.

I just used the latest -next-queued kernel on a HSW desktop to test S4, and this issue happened at the 4th cycle. I will test ULT later, and you can test on a desktop too. I will upgrade our BIOS version (now v128) to see if that makes any difference. I took a picture of this call trace. :)

Created attachment 83802 [details]
call trace and hang on hsw desktop

Created attachment 83804 [details]
call trace of S4 at 4th time on HSW ULT

I ran the S4 cycle on HSW ULT using the latest -next-queued kernel, and the call trace and hang happened at the 4th cycle. I attached the picture as hsw_ULT_S4.jpg.
HSW ULT BIOS version: 131.1

I ran the S4 cycle again. This time I captured the dmesg output over the serial port, which should be useful for you. :)

(In reply to comment #18)
> I did S4 cycle again. This time I grabbed the dmesg info by serial port that
> will be useful for you. :)

Forgot to say: this was on HSW ULT.

Created attachment 83806 [details]
ult S4 dmesg by using serial port
Hm, the ULT backtraces might be something else, since it's just the NMI handler which takes forever to run. But maybe that's just because the machine is dead already and the useful backtraces scrolled off the screen already ...

(In reply to comment #21)
> Hm, the ULT backtraces might be something else since its just the NMI
> handler which takes forever to run. But maybe that's just because the
> machine is dead already and the useful backtraces scrolled off the screen
> already ...

We upgraded the HSW desktop's BIOS to the newest version, 131.3, and it resulted in no system output and a boot failure. The machine can't start up even with the original BIOS. This happened on two of our HSW desktops, so I'm afraid I won't upgrade a third desktop for now.

I updated the BIOS to 131.3 and did not experience any of the boot or system failure issues reported. After the BIOS update, I performed multiple test sequences of varying counts of suspend/resume cycles, none of which resulted in a call trace or system hang. I also did not see any of the failures to resume that I previously saw with earlier BIOS revisions. I'm going to run a 150-cycle test today and see if I can get something to happen.

Finally, with a fresh clone of the linux-next kernel and BIOS 131.3, I'm seeing some call traces on resume in fewer than 10 suspend/resume cycles. The call traces are different from those posted here, so I need to do more investigation into what's going on. I'll post clean log captures when available for comparison.

Todd, what is the VBIOS version on this 131.3 BIOS you are using?
Thanks

VBIOS version is 2173

Using Daniel's top of tree, I'm also able to reproduce the problem in short order. Multiple call traces in the logs after 11 or 12 runs.

Thanks Todd.
For the log: I asked because SuSE is still facing S4 errors on machines with VBIOS 2175. AFAIK they work around the issue in their image, but there is no real fix yet.
This appears to be a memory corruption issue, based on the fact that the call traces I'm seeing are different each time it happens. I've enabled some of the built-in memory check facilities in the kernel and built kernels from drm-intel-nightly and drm-intel-fixes. None of the kernels built from these two branches successfully boot the machine. Investigation continues.

I've been doing some investigation and although I see backtraces mostly for file-system operations, they always seem to happen when we're doing gmbus/dp-aux. I also get "general protection fault: 0000" messages.

These errors don't crash my machine.

(In reply to comment #29)
> I've been doing some investigation and although I see backtraces mostly for
> file-system operations, they always seem to happen when we're doing
> gmbus/dp-aux. I also get "general protection fault: 0000" messages.
>
> These errors don't crash my machine.

I spent even more time debugging this today, and again: most of the error messages I see come when we're doing gmbus/dp-aux. Example:

[ 206.436517] [drm:gmbus_xfer], GMBUS [i915 gmbus dpb] NAK for addr: 0050 r(1)
[ 206.496851] [drm:drm_do_probe_ddc_edid], drm: skipping non-existent adapter i915 gmbus dpb
[ 206.529782] ------------[ cut here ]------------
[ 206.529787] WARNING: CPU: 3 PID: 24 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
[ 206.529788] list_del corruption. prev->next should be ffff880142da7010, but was 00d5000000d40000
[ 206.529796] Modules linked in: parport_pc bnep ppdev rfcomm bluetooth lp parport ext2 dm_crypt e1000e ptp i915 i2c_algo_bit drm_kms_helper pps_core drm video
[ 206.529798] CPU: 3 PID: 24 Comm: ksoftirqd/3 Not tainted 3.11.0-rc7.1309041757+ #2159
[ 206.529799] Hardware name: Intel Corporation Shark Bay Client platform/WhiteTip Mountain 1, BIOS HSWLPTU1.86C.0124.R02.1305030131 05/03/2013
[ 206.529801] 0000000000000009 ffff880147001bc0 ffffffff816195dd ffff880147001c08
[ 206.529803] ffff880147001bf8 ffffffff81041c92 ffff880142da6fd0 0000000000000286
[ 206.529804] ffffea0004db7200 ffffffff811699e6 0000000000000000 ffff880147001c58
[ 206.529805] Call Trace:
[ 206.529807] [<ffffffff816195dd>] dump_stack+0x54/0x74
[ 206.529810] [<ffffffff81041c92>] warn_slowpath_common+0x82/0xb0
[ 206.529812] [<ffffffff811699e6>] ? __d_free+0x46/0x70
[ 206.529814] [<ffffffff81041d77>] warn_slowpath_fmt+0x47/0x50
[ 206.529815] [<ffffffff811699e6>] ? __d_free+0x46/0x70
[ 206.529817] [<ffffffff812b7801>] __list_del_entry+0xa1/0xd0
[ 206.529819] [<ffffffff8114eb82>] __delete_object+0x32/0xb0
[ 206.529821] [<ffffffff8114f46c>] delete_object_full+0x1c/0x30
[ 206.529824] [<ffffffff8160c4c1>] kmemleak_free+0x21/0x50
[ 206.529827] [<ffffffff81141190>] kmem_cache_free+0x140/0x1a0
[ 206.529828] [<ffffffff811699e6>] __d_free+0x46/0x70
[ 206.529831] [<ffffffff810e1b8a>] rcu_process_callbacks+0x1ea/0x5a0
[ 206.529834] [<ffffffff8104727a>] __do_softirq+0xda/0x1b0
[ 206.529836] [<ffffffff8104737d>] run_ksoftirqd+0x2d/0x60
[ 206.529839] [<ffffffff81070ff6>] smpboot_thread_fn+0x156/0x1f0
[ 206.529840] [<ffffffff81070ea0>] ? lg_global_unlock+0xb0/0xb0
[ 206.529843] [<ffffffff810687d5>] kthread+0xe5/0xf0
[ 206.529845] [<ffffffff810686f0>] ? kthread_create_on_node+0x140/0x140
[ 206.529847] [<ffffffff8162936c>] ret_from_fork+0x7c/0xb0
[ 206.529849] [<ffffffff810686f0>] ? kthread_create_on_node+0x140/0x140
[ 206.529850] ---[ end trace 86de9cb15e206270 ]---
[ 208.249590] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:20:HDMI-A-1] disconnected
[ 208.323782] [drm:drm_mode_getconnector], [CONNECTOR:22:?]
[ 208.369864] [drm:drm_helper_probe_single_connector_modes], [CONNECTOR:22:DP-2]

I really think our best bet is to try to bisect the bug. Does it happen with 3.11.0? Does it happen with 3.10.0? Does it happen with 3.10.10? We should probably try to find some version that works and then bisect from there.

Thanks,
Paulo

Agreed. This was reported 6/7, so I'm going to start with a kernel from that era to see if I can find where this began occurring.

-T

(In reply to comment #32)
> Agreed. This was reported 6/7, so I'm going to start with a kernel from that
> era to see if I can find where this began occurring.
>
> -T

The exact report date was 4/16, and we found that this issue exists on kernels much earlier than that. You can have a look at the original Bug #63586; I mentioned this when reporting.

Hi

I did some tests, and it seems that if I disable fbcon, vgacon and their friends I can't reproduce the problem. Can you please confirm that?

Also, my tests show that the problem happens even if we don't start X. Can you also confirm that?

In the meantime, I'll keep testing.

Thanks,
Paulo

*** Bug 66301 has been marked as a duplicate of this bug. ***

Hi

I did some more investigation and I discovered the following:

- It seems that, after resuming, if you run "slabinfo -v" (from tools/vm/), there's a good chance you'll see dmesg messages saying we detected corruption on our slabs. It seems to me that it is much, much easier to reproduce the bug with "hibernate, resume, run slabinfo -v, check dmesg, hibernate, resume, etc" than with just "hibernate, resume". Can you confirm that?

- It also seems that the bug goes away if the kernel that resumes the machine doesn't load i915.ko. So an experiment you can try is: boot the machine normally, with i915.ko loaded, and tell it to hibernate. Then wake the machine up, and use the "modprobe.blacklist=i915" option when loading the kernel that will resume the machine. After it resumes, check if the bug is there (possibly with slabinfo -v). The bug should be gone. Can you please confirm that?

Thanks,
Paulo

Hi

Can you please try the patches from comments 22 and 23 from bug https://bugzilla.kernel.org/show_bug.cgi?id=59321 ?
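The "hibernate, resume, run slabinfo -v, check dmesg" loop described in this thread leaves the "check dmesg" step to be done by eye. Below is a rough sketch of how that step could be automated. It is an illustration only: the function name is hypothetical, and the marker strings are simply the ones that appear in the logs pasted in this bug ("list_del corruption", "general protection fault", "Call Trace:"), not an exhaustive list of what the kernel's slab checks can print.

```python
# Marker strings taken from the log excerpts in this bug report.
# This list is an assumption about what is worth flagging, not a complete set.
MARKERS = ("list_del corruption", "general protection fault", "Call Trace:")


def find_corruption_markers(dmesg_text: str, markers=MARKERS):
    """Return (line_number, marker) pairs for every marker found in dmesg_text.

    Line numbers are 1-based. A line containing several markers is
    reported once per marker.
    """
    hits = []
    for lineno, line in enumerate(dmesg_text.splitlines(), start=1):
        for marker in markers:
            if marker in line:
                hits.append((lineno, marker))
    return hits
```

After each resume, the captured dmesg (or serial/netconsole) output can be passed to find_corruption_markers(); an empty list means none of the listed markers showed up in that cycle, which is weaker than saying no corruption occurred.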
Thanks,
Paulo

(In reply to comment #37)
> Hi
>
> Can you please try the patches from comments 22 and 23 from bug
> https://bugzilla.kernel.org/show_bug.cgi?id=59321 ?
>
> Thanks,
> Paulo

With these two patches applied on the latest -next-queued, the call trace and hang occur at the first round of S4 testing.

(In reply to comment #37)
> Hi
>
> Can you please try the patches from comments 22 and 23 from bug
> https://bugzilla.kernel.org/show_bug.cgi?id=59321 ?
>
> Thanks,
> Paulo

Addition:
------------------
Whether I apply only the patch from comment 22 or both 22 and 23, my HSW desktop fails to suspend for S4: the indicator light shows 0004 and the fan does not stop. I also tried the latest -next-queued without the patches; it can resume, but with a call trace and hang at the first round.

CCing Ben since he wrote the patches.

(In reply to comment #39)
> Addition:
> ------------------
> No matter patches from comment 22 only or with 23, I find my HSW Desktop
> failed to suspend from S4, I saw indicator light output is 0004, and the fan
> isn't stop. I also tried the latest -next-queued without patches, it can
> resume but with call trace and hang at first round.

Can you please push the branch you tested somewhere so I can confirm the patches are indeed correct.

Also, can you collect the error state?

Created attachment 87366 [details]
netconsole grab information

(In reply to comment #41)
> Can you please push the branch you tested somewhere so I can confirm the
> patches are indeed correct.

Oh, all my tests are based on the latest -next-queued, with the patches applied on top of it.

The commit used yesterday:
--------------------
commit a94b013b91de055572183c6772865123fa955027
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date: Thu Sep 19 17:03:06 2013 -0300

    drm/i915: wait for IPS_ENABLE when enabling IPS

    At the end of haswell_crtc_enable we have an intel_wait_for_vblank
    with a big comment, and the message suggests it's a workaround for
    something we don't really understand. So I removed that wait and
    started getting HW state readout error messages saying that the IPS
    state is not what we expected.

> Also, can you collect the error state?

OK, I captured the errors through netconsole and found a call trace. You can find the messages in the attachment.

[ 65.311787] [ BUG: systemd-udevd/2856 still has locks held! ]
[ 65.311810] 3.12.0-rc3_drm-intel-next-queued_a94b01_20131009+ #1 Not tainted
[ 65.311836] -------------------------------------
[ 65.311854] 2 locks held by systemd-udevd/2856:
[ 65.311872] Freezing user space processes ...
[ 65.311872] #0: (microcode_mutex){+.+.+.}, at: [<ffffffffa039b0a7>] microcode_init+0xa7/0x1b4 [microcode]
[ 65.311945] #1: (subsys mutex#4){+.+.+.}, at: [<ffffffff81415ee2>] subsys_interface_register+0x51/0xd9
[ 65.311992]
[ 65.311992] stack backtrace:
[ 65.312010] CPU: 1 PID: 2856 Comm: systemd-udevd Not tainted 3.12.0-rc3_drm-intel-next-queued_a94b01_20131009+ #1
[ 65.312048] Hardware name: Intel Corporation Shark Bay Client platform/SthiPpvRsvd2, BIOS HSWLPTU1.86C.0120.R00.1303312001 03/31/2013
[ 65.312092] ffff880438dbdee0 ffff88003731da78 ffffffff817f313c 0000000000000006
[ 65.312126] ffff880438dbdee0 ffff88003731da98 ffffffff8108df33 0000000000000004
[ 65.312160] 0000000000000000 ffff88003731db08 ffffffff8104df35 ffff88003731dad8
[ 65.312194] Call Trace:
[ 65.312208] [<ffffffff817f313c>] dump_stack+0x46/0x58
[ 65.312230] [<ffffffff8108df33>] debug_check_no_locks_held+0x8f/0x93
[ 65.312255] [<ffffffff8104df35>] usermodehelper_read_trylock+0xa9/0xfa
[ 65.312282] [<ffffffff810581e5>] ? __init_waitqueue_head+0x50/0x50
[ 65.312308] [<ffffffff814211bc>] _request_firmware+0x285/0x880
[ 65.312331] [<ffffffff81421847>] request_firmware+0x38/0x4c

(In reply to comment #42)
> Created attachment 87366 [details]
> netconsole grab information
>
> Oh, all my tests are based on -next-queued latest and the patches also
> applied on it.
>
> The commit be used yesterday:
> --------------------
> commit a94b013b91de055572183c6772865123fa955027
> Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
> Date: Thu Sep 19 17:03:06 2013 -0300
>
>     drm/i915: wait for IPS_ENABLE when enabling IPS
>
>     At the end of haswell_crtc_enable we have an intel_wait_for_vblank
>     with a big comment, and the message suggests it's a workaround for
>     something we don't really understand. So I removed that wait and
>     started getting HW state readout error messages saying that the IPS
>     state is not what we expected.

Can you please get me a link to the code which you've tested so I can make sure it was applied correctly. This patch fixes the issue for others, so it's surprising it doesn't fix it for you.

[remote "origin"]
    fetch = +refs/heads/*:refs/remotes/origin/*
    url = git://people.freedesktop.org/~danvet/drm-intel
[branch "drm-intel-next-queued"]
    remote = origin
    merge = refs/heads/drm-intel-next-queued

(In reply to comment #44)
> [remote "origin"]
>     fetch = +refs/heads/*:refs/remotes/origin/*
>     url = git://people.freedesktop.org/~danvet/drm-intel
> [branch "drm-intel-next-queued"]
>     remote = origin
>     merge = refs/heads/drm-intel-next-queued

I want to see the patched code. Can you please push it somewhere so I can see it?

Created attachment 87496 [details]
patch: Ben's patch in outside bugzilla comment 22

(In reply to comment #45)
> I want to see the patched code. Can you please push it somewhere so I can
> see it?

I'm a little puzzled here: I just applied the patch you gave on Daniel's tree (current -next-queued commit).
The patch of yours that I used is in the attachment, and I would appreciate a more detailed description of what you need. Quite sorry. :)

(In reply to comment #45)
> I want to see the patched code. Can you please push it somewhere so I can
> see it?

Ben, these are the two patches you attached in comments 22 and 23 of https://bugzilla.kernel.org/show_bug.cgi?id=59321 :
https://bugzilla.kernel.org/attachment.cgi?id=110401 and
https://bugzilla.kernel.org/attachment.cgi?id=110411

(In reply to comment #45)
> I want to see the patched code. Can you please push it somewhere so I can
> see it?

You can visit http://tinderbox.sh.intel.com/drivers to access the drivers directory of the patched kernel source. We just applied your two patches on drm-intel-next-queued (540b5d02766863c561afe9f9d56ce0425022a731). With these patches the issue still occurs.

Please retest on latest nightly.

Created attachment 87976 [details]
Picture: S4 resume hang and call trace output on screen

(In reply to comment #49)
> Please retest on latest nightly.

This issue still exists on one HSW desktop, and it is unique to that desktop: S4 resume hangs 100% of the time. Another HSW desktop can resume correctly; reliability tests are under way. Other platforms such as ULT and mobile behave differently: ULT is good, while mobile reboots on S4.

bad desktop machine:
----------------
00:02.0 VGA compatible controller [0300]: Intel Corporation Haswell Integrated Graphics Controller [8086:0412] (rev 06)
CPU i5-4570 3.2GHz, GT2 1150MHz;
BIOS version: V120;

good desktop machine:
----------------
00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:0412] (rev 06)
CPU i5-4670T 2.3GHz, GT2 1200MHz;
BIOS version: V131.3(FC_Production_Q87_5MB_NXP_BIOS-131.3_ME-9.0.20.1447.v2)

Created attachment 88135 [details] [review]
Fault PDEs too

Please test this patch on the latest nightly.

(In reply to comment #51)
> Created attachment 88135 [details] [review]
> Fault PDEs too
>
> Please test this patch on the latest nightly.

With this patch on the latest -nightly, the machine hangs during the suspend part. After starting S4, the indicator turns to "00FF" and the machine is not suspended; I can still ssh into it. About 30 seconds later, the indicator turns to "0004", the screen goes black, and the machine becomes unreachable.

The latest -nightly kernel without the patch still has the issue from comment 50.

Netconsole capture:
---------------------
[ 202.605258] netpoll: netconsole: local IP 10.239.47.103
[ 202.605340] console [netcon0] enabled
[ 202.605355] netconsole: network logging started
[ 219.684843] PM: Syncing filesystems ... done.
[ 245.422619] Freezing user space processes ...
[ 245.422763] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
(elapsed 0.001 seconds) done.
[ 245.424184] PM: Marking nosave pages: [mem 0x0009a000-0x000fffff]
[ 245.424212] PM: Marking nosave pages: [mem 0x95963000-0x95963fff]
[ 245.424235] PM: Marking nosave pages: [mem 0x9f904000-0x9f907fff]
[ 245.424258] PM: Marking nosave pages: [mem 0x9fa9d000-0x9fa9efff]
[ 245.424281] PM: Marking nosave pages: [mem 0x9fcbb000-0x9fcbbfff]
[ 245.424305] PM: Marking nosave pages: [mem 0x9fcbe000-0x9fcbefff]
[ 245.424327] PM: Marking nosave pages: [mem 0x9fcc5000-0x9fcc5fff]
[ 245.424350] PM: Marking nosave pages: [mem 0x9fd17000-0x9fd19fff]
[ 245.424373] PM: Marking nosave pages: [mem 0xa2807000-0xa2fe9fff]
[ 245.424416] PM: Marking nosave pages: [mem 0xa3000000-0xffffffff]
[ 245.425157] PM: Basic memory bitmaps created
[ 245.425544] PM: Preallocating image memory... done (allocated 162884 pages)
[ 245.700877] PM: Allocated 651536 kbytes in 0.27 seconds (2413.09 MB/s)
[ 245.700903] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 245.702972] Suspending console(s) (use no_console_suspend to debug)

Could you clarify? Are you saying the behavior with this patch and without it (on nightly) is identical?

(In reply to comment #53)
> Could you clarify. Are you saying the behavior with this patch, and without
> (on nightly) is identical?

Sorry, I didn't describe it clearly.

with patch:
----------------
After starting S4, the indicator turns to "00FF" and the machine is not suspended; I can still ssh into it. About 30 seconds later, the indicator turns to "0004", the screen goes black, and the machine becomes unreachable. The machine hangs there.

without patch:
----------------
The machine can resume, but there is a call trace and a hang, just as described in comment 50.

(In reply to comment #54)
> Sorry, I haven't described clearly.
>
> with patch:
> ----------------
> After execute S4, I find the indicator turn to "00FF" and machine is not
> suspend,I can ssh on the machine,then about 30sec later, the indicator turn
> to "0004" and screen turn to black, machine will be unconnected. machine
> hangs there.
>
> without patch:
> ----------------
> Machine can resume, but there's call trace and hang. Just like comment 50
> described.

Does anything appear in dmesg during the 30 second period? Also, can you try to get an exact time (instead of "about 30sec"), and see if it's repeatably the same time every test? This may give some clues.

Thanks.

BIOS version updated from "HSWLPTU1.86C.0120.R00.1303312001" to "HSWPTU1.86C.0134.R00.1310022130". On the latest nightly, S4 now resumes well on the problematic HSW desktop. A stress test is under way. I will test the kernel without the patch first, and update the status later.

Ran S4 in a loop 100 times; all passed. Maybe this issue really has been fixed. Or do you think we need more reliability testing for a while? If S4 keeps working stably, we will let you know so that you can close this bug.

Papered over.

Reopening, since we applied "[PATCH] drm/i915: Undo gtt scratch pte unmapping again".

*** Bug 78056 has been marked as a duplicate of this bug. ***

Paulo, any update on this issue?

This is the hsw pte sanitizing thing which regressed on earlier platforms. My proposal is to remap to the stolen range (if we can) on all platforms.

This needs a new round of retesting. Hua & Jinxian, please give feedback.

On resume from S4, the system hang can still occur sporadically. I didn't find a call trace like the one in this bug's description. I tested on the drm-intel-testing branch.

This is a regression from a regression revert, so upgrading.

This issue still exists on the latest -nightly.

Update: S4 still causes a system hang, but I didn't find a call trace. This matches the description in Bug 82340.

Update: this issue exists on the release kernel (62de88e8e65811010deac5375f8f0d8b14dc4d94).
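On the 100-cycle reliability runs used throughout this thread: because the hang is sporadic, a clean run only rules the bug out up to some probability. The sketch below works through that arithmetic under strong assumptions that the thread does not establish: each S4 cycle is treated as an independent trial with a fixed failure probability, and the example rates (1 in 13, 1 in 23) are taken from single observations reported earlier in this bug, so they are illustrative only. The function names are hypothetical.

```python
import math


def prob_all_pass(p_fail: float, cycles: int) -> float:
    """Probability that `cycles` independent S4 cycles all pass
    when each cycle fails with probability p_fail (assumed constant)."""
    return (1.0 - p_fail) ** cycles


def cycles_needed(p_fail: float, alpha: float) -> int:
    """Smallest number of consecutive passing cycles after which the
    chance of having merely been lucky drops to alpha or below."""
    return math.ceil(math.log(alpha) / math.log(1.0 - p_fail))
```

Under these assumptions, with a per-cycle failure rate between 1/23 and 1/13, the chance of 100 clean cycles on a still-broken kernel is on the order of 1% or less, so a 100-cycle pass is meaningful for a bug that reproduces this readily. It says much less about a variant that fails far more rarely, and nothing about whether the underlying cause was fixed or only masked.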
The feedback I'm waiting for in bug 82340 could clarify things here, so setting that as a dependency (also based on comment 68).

Lei, please check with Ming to see if https://bugs.freedesktop.org/show_bug.cgi?id=82340#c10 (system hang gone) applies here.

(In reply to comment #71)
> Lei, please check with Ming to see if
> https://bugs.freedesktop.org/show_bug.cgi?id=82340#c10 (system hang gone)
> applies here.

Actually, I used the same machine to test these two bugs. No call trace has been found since bug 82340 appeared, so I think bug 65496 can no longer be reproduced.

Closing this one. Moving discussion to bug 82340.

Closing verified+fixed.