Bug 86697 - [BSW] GPU hang at the second cycle to execute S4 command
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: high blocker
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-25 11:11 UTC by wendy.wang
Modified: 2017-10-06 14:33 UTC

See Also:
i915 platform:
i915 features:


Attachments
s4-fail-dmesg (112.60 KB, text/plain)
2014-11-25 11:11 UTC, wendy.wang
s4-good-dmesg-after-i915- disable (71.47 KB, text/plain)
2014-11-25 11:12 UTC, wendy.wang
s4faildmesg-fab2 (118.83 KB, text/plain)
2014-11-26 06:16 UTC, wendy.wang
s4gooddmesg-i915disable-fab2 (70.95 KB, text/plain)
2014-11-26 06:17 UTC, wendy.wang
gpuhang_after_s4-after_xinit.log (392.96 KB, application/zip)
2014-11-27 03:06 UTC, wendy.wang
i915_error_state_2nd_S4_execute.log (388.19 KB, application/zip)
2014-11-27 03:07 UTC, wendy.wang
[PATCH] drm/i915: Don't frob Gunit registers on CHV (1.45 KB, patch)
2014-11-28 14:18 UTC, Ville Syrjala
v47_s4_dmesg.log (119.54 KB, text/plain)
2014-12-01 12:03 UTC, wendy.wang
v47_S4_i915_error.log (389.11 KB, text/plain)
2014-12-01 12:31 UTC, wendy.wang
v47_S4_i915_error--reattach (389.11 KB, text/plain)
2014-12-02 02:54 UTC, wendy.wang
patch_s4_calltrace_dmesg2.log (120.34 KB, text/plain)
2014-12-02 06:00 UTC, wendy.wang
patch_s4_dmesg1.log (120.34 KB, text/plain)
2014-12-02 06:02 UTC, wendy.wang
Patch_S4_Resume-V45-dmesg3.log (119.91 KB, text/plain)
2014-12-02 06:02 UTC, wendy.wang
Patch_S4_Resume-V45-dmesg4.log (120.16 KB, text/plain)
2014-12-02 06:03 UTC, wendy.wang
With HDMI connected dmesg log (454.27 KB, text/plain)
2014-12-24 07:08 UTC, Jeff Zheng

Description wendy.wang 2014-11-25 11:11:45 UTC
Created attachment 109999 [details]
s4-fail-dmesg

==System Environment==
BSW RVP FAB1 with B1 CPU
BIOS: V41.0
KSC: 1.05

==Failed Kernel==
BSW alpha release kernel: tag: drm-intel-testing 2014-11-21

==Bug detailed description==
-----------------------------
With the i915 driver loaded, the failure symptom is as follows:
1. First cycle: execute "echo disk > /sys/power/state".
2. The test machine enters S4 successfully, then automatically resumes from S4.
3. Second cycle: execute "echo disk > /sys/power/state" again; the system fails to enter S4.
The s4-fail-dmesg log file is attached.

If i915 is disabled with the "modprobe.blacklist=i915" kernel parameter, the S4 issue does not occur: the system can still enter S4 the second time, and it does not wake up automatically.
A good dmesg file is attached for comparison: s4-good-dmesg-i915-disable log
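
The two-cycle reproduction above can be sketched as a small shell helper. This is a minimal sketch, not the script used for this report: the function name s4_cycles and the STATE_FILE override are hypothetical, added so the loop can be dry-run against an ordinary file. On a real system it writes to /sys/power/state and needs root.

```shell
#!/bin/sh
# Hypothetical helper: request hibernation (S4) a given number of times and
# report the first cycle whose write to the state file fails.
# STATE_FILE is an assumption added for dry runs; it defaults to the real
# sysfs node used in the steps above.
s4_cycles() {
    count=$1
    state_file=${STATE_FILE:-/sys/power/state}
    i=1
    while [ "$i" -le "$count" ]; do
        # On real hardware this write blocks until the system has resumed.
        if ! echo disk > "$state_file"; then
            echo "cycle $i: failed to enter S4"
            return 1
        fi
        echo "cycle $i: S4 request accepted"
        i=$((i + 1))
    done
    return 0
}
```

For example, "s4_cycles 2" as root would attempt the two cycles described here; with STATE_FILE pointing at a scratch file it only exercises the loop.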
Comment 1 wendy.wang 2014-11-25 11:12:33 UTC
Created attachment 110000 [details]
s4-good-dmesg-after-i915- disable
Comment 2 wendy.wang 2014-11-25 11:15:05 UTC
I will report the behavior of the -fixes, -next-queued, and -nightly branches for this bug tomorrow.
Comment 3 Ville Syrjala 2014-11-25 12:45:06 UTC
Works for me on fab2 w/ V43 BIOS. Can you try the same?
Comment 4 wendy.wang 2014-11-26 06:13:26 UTC
Hello Ville,
We are having problems upgrading the BIOS to V43 on BSW fab1 and fab2: once the BIOS is upgraded to V43, the board debug LED always shows 0000 and the system cannot boot. We are still analyzing this.

I double-checked the S4 behavior again:

With Fab2 + V41.0 BIOS or Fab2 + V40.0 BIOS,
on the second cycle of the S4 command "echo disk > /sys/power/state", the system fails to enter S4.

With the i915 display disabled via the "modprobe.blacklist=i915" parameter, the system can enter S4 multiple times successfully; there is no S4 problem.

I attached S4-fail-dmesg.log and S4-good-dmesg-i915-disable.log for your analysis.

The Fab1 board + V41.0 BIOS shows the same S4 failures as Fab2.

I'm not sure whether this is a regression, as I have not found a working kernel yet.
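
When comparing good and bad runs, it can help to confirm that the "good" boot really had i915 blacklisted. The following is a minimal sketch under stated assumptions: the function name i915_blacklisted is hypothetical, and the optional file argument exists only so the check can be tried on a saved command line instead of /proc/cmdline.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel command line blacklists i915 via
# the modprobe.blacklist= parameter (a comma-separated module list).
i915_blacklisted() {
    cmdline_file=${1:-/proc/cmdline}
    for arg in $(cat "$cmdline_file"); do
        case $arg in
            modprobe.blacklist=*)
                # Strip the parameter name, then look for i915 as a whole entry.
                list=${arg#modprobe.blacklist=}
                case ",$list," in
                    *,i915,*) return 0 ;;
                esac
                ;;
        esac
    done
    return 1
}
```

Usage: "i915_blacklisted && echo blacklisted" on the running system.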
Comment 5 wendy.wang 2014-11-26 06:16:50 UTC
Created attachment 110032 [details]
s4faildmesg-fab2
Comment 6 wendy.wang 2014-11-26 06:17:33 UTC
Created attachment 110033 [details]
s4gooddmesg-i915disable-fab2
Comment 7 Ville Syrjala 2014-11-26 07:31:29 UTC
There's a GPU hang in there. Are you able to run any GPU workload after a single S4 cycle?
Comment 8 wendy.wang 2014-11-27 03:04:35 UTC
Hello Ville,

Tested on BSW FAB2 B1 CPU with V40 BIOS.
KSC is 1.05
Kernel tag: drm-intel-testing 2014-11-21

Reproduce scenario 1:
1. Boot the system and run "xinit &".
2. Put the system into S4 with the command "echo disk > /sys/power/state".
3. Resume the system by pressing the power button.
4. Check /sys/kernel/debug/dri/0/i915_error_state; there is no error.
5. Kill X with pkill and restart X; a GPU hang error occurs. Please see the attached gpuhang_after_s4-after_xinit.log.

Reproduce scenario 2:
1. Boot the system and put it into S4 with the command "echo disk > /sys/power/state".
2. Resume the system by pressing the power button.
3. There is no GPU hang error in /sys/kernel/debug/dri/0/i915_error_state.
4. Execute "echo disk > /sys/power/state" a second time; the GPU hangs. Please refer to the attached i915_error_state_2nd_S4_execute.log.
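
The error-state check used in both scenarios can be sketched as follows. This is a rough sketch, not an official tool: the function name gpu_error_captured is hypothetical, and it assumes that the driver prints the literal line "no error state collected" when nothing has been captured, which should be verified against the kernel in use.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the error-state file holds a captured GPU
# error, 1 if it reports none (or is empty), 2 if it cannot be read.
# Assumes an idle i915 prints "no error state collected".
gpu_error_captured() {
    error_file=${1:-/sys/kernel/debug/dri/0/i915_error_state}
    if [ ! -r "$error_file" ]; then
        echo "cannot read $error_file" >&2
        return 2
    fi
    if grep -q "no error state collected" "$error_file"; then
        return 1
    fi
    # Treat a file with no content at all as "no error" too.
    if ! grep -q . "$error_file"; then
        return 1
    fi
    return 0
}
```

Run as root after each resume, for example "gpu_error_captured && echo 'GPU error state present'".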
Comment 9 wendy.wang 2014-11-27 03:06:30 UTC
Created attachment 110101 [details]
gpuhang_after_s4-after_xinit.log
Comment 10 wendy.wang 2014-11-27 03:07:28 UTC
Created attachment 110102 [details]
i915_error_state_2nd_S4_execute.log
Comment 11 Ville Syrjala 2014-11-28 13:05:01 UTC
Based on the error state it just hung on the first command it tried to execute from the blitter ring, which in this case was an LRI. So the CS seems pretty much dead here if even a simple LRI doesn't work.

What does 'intel_reg_read 0x9400' say?
Comment 12 Ville Syrjala 2014-11-28 14:18:54 UTC
Created attachment 110163 [details] [review]
[PATCH] drm/i915: Don't frob Gunit registers on CHV

Random idea of the day. Please try this and report back.
Comment 13 wendy.wang 2014-12-01 06:31:02 UTC
(In reply to Ville Syrjala from comment #11)
> Based on the error state it just hung on the first command it tried to
> execute from the blitter ring, which in this case was an LRI. So the CS
> seems pretty much dead here if even a simple LRI doesn't work.
> 
> What does 'intel_reg_read 0x9400' say?

After the GPU hang, I checked as follows:
root@x-bsw03:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tools# intel_reg_read 0x9400
0x9400 : 0x80
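
To read a value like this at a glance, a small helper can list which bit positions are set. This is only an illustrative sketch: the function name set_bits is hypothetical, and what each bit of register 0x9400 means must be taken from the hardware documentation, not from this snippet.

```shell
#!/bin/sh
# Hypothetical helper: print the positions of the bits set in a register value,
# lowest bit first, separated by spaces.
set_bits() {
    value=$(( $1 ))
    bit=0
    out=""
    while [ "$value" -gt 0 ]; do
        if [ $(( value & 1 )) -eq 1 ]; then
            # Append the bit index, adding a separator after the first entry.
            out="${out:+$out }$bit"
        fi
        value=$(( value >> 1 ))
        bit=$(( bit + 1 ))
    done
    echo "$out"
}
```

For the value above, "set_bits 0x80" prints 7, i.e. only bit 7 is set.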
Comment 14 wendy.wang 2014-12-01 11:59:37 UTC
(In reply to Ville Syrjala from comment #12)
> Created attachment 110163 [details] [review] [review]
> [PATCH] drm/i915: Don't frob Gunit registers on CHV
> 
> Random idea of the day. Please try this and report back.

Hello Ville, I applied your patch on top of the latest drm-intel-nightly branch kernel; the system still cannot enter S4 on the 2nd cycle.

I will send you the dmesg file tomorrow.
Comment 15 wendy.wang 2014-12-01 12:03:01 UTC
Created attachment 110293 [details]
v47_s4_dmesg.log
Comment 16 wendy.wang 2014-12-01 12:07:39 UTC
Tested S4 with BIOS v47. With the i915 driver loaded, we still observe the GPU hang on the 2nd S4 entry cycle. Log files attached:
v47_s4_dmesg.log
v47_S4_i915_error.log

Configuration:
Platform Board: Braswell RVP Fab2
CPU : B1 1.36GHz 2Cores/4Thread 6/12/2 E6XC
Software 
Linux distribution: Ubuntu 14.04 LTS 64 bits 
GFX Kernel tag: drm-intel-testing 2014-11-21
BIOS : BSW_SPI_1_r8_BRASWEL_X64_R_0047_00_ME-2.0.0.1033
Ksc : 1.05
Comment 17 wendy.wang 2014-12-01 12:31:56 UTC
Created attachment 110294 [details]
v47_S4_i915_error.log
Comment 18 Ville Syrjala 2014-12-01 13:04:13 UTC
(In reply to wendy.wang from comment #17)
> Created attachment 110294 [details]
> v47_S4_i915_error.log

The mime types of your attachments are wrong, please fix.
Comment 19 wendy.wang 2014-12-02 02:54:07 UTC
Created attachment 110337 [details]
v47_S4_i915_error--reattach

Re-attached v47_S4_i915_error.log; please check. Thanks.
Comment 20 wendy.wang 2014-12-02 03:20:11 UTC
(In reply to Ville Syrjala from comment #18)
> (In reply to wendy.wang from comment #17)
> > Created attachment 110294 [details]
> > v47_S4_i915_error.log
> 
> The mime types of your attachments are wrong, please fix.

I sent you an email with the logs for the V47 BIOS + S4 test results.
v47_S4_i915_error.log is a zip file.
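
A quick way to catch this kind of mismatch before attaching is to look at the file's first bytes. The sketch below is an assumption-laden illustration: looks_like_zip is a hypothetical name, it relies on "head -c" being available, and it only checks for the two-byte "PK" signature that zip archives start with.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file starts with the two-byte "PK"
# signature used by zip archives.
looks_like_zip() {
    magic=$(head -c 2 "$1" 2>/dev/null)
    [ "$magic" = "PK" ]
}
```

Usage: "looks_like_zip v47_S4_i915_error.log && echo 'attach as application/zip'".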
Comment 21 wendy.wang 2014-12-02 05:57:49 UTC
(In reply to Ville Syrjala from comment #12)
> Created attachment 110163 [details] [review] [review]
> [PATCH] drm/i915: Don't frob Gunit registers on CHV
> 
> Random idea of the day. Please try this and report back.

Hello Ville,

The S4 symptoms I observed with this patch are hard to describe, so I list them here as two kinds of status:

Status 1:
After booting the kernel with this patch, on the 2nd attempt at S4 the system hung, with no response from the keyboard.

Status 2: I encountered a different call trace on the 2nd or 3rd S4 command, which seems unrelated to i915. In this scenario I did not see the GPU hang problem.
I captured some dmesg logs, in case you are interested in them.
Comment 22 wendy.wang 2014-12-02 06:00:58 UTC
Created attachment 110343 [details]
patch_s4_calltrace_dmesg2.log
Comment 23 wendy.wang 2014-12-02 06:02:06 UTC
Created attachment 110344 [details]
patch_s4_dmesg1.log
Comment 24 wendy.wang 2014-12-02 06:02:56 UTC
Created attachment 110345 [details]
Patch_S4_Resume-V45-dmesg3.log
Comment 25 wendy.wang 2014-12-02 06:03:33 UTC
Created attachment 110346 [details]
Patch_S4_Resume-V45-dmesg4.log
Comment 26 Jeff Zheng 2014-12-24 07:08:29 UTC
Created attachment 111261 [details]
With HDMI connected dmesg log

I connected an HP2309P monitor and tried S3/S4. I was able to suspend/resume with S4 4 times. The dmesg log is attached.
Comment 27 ye.tian 2015-02-06 09:11:40 UTC
I manually tested S4 ten times with the latest nightly kernel (75ce8a) and drm-intel-testing-2015-01-30; S4 works well. BIOS version: v55.
Verified this bug.
Comment 28 Elizabeth 2017-10-06 14:33:24 UTC
Closing old verified bug.

