105539 – rc6 enablement fails in suspend resume stress test

Summary: rc6 enablement fails in suspend resume stress test

Status:	CLOSED NOTOURBUG

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	DRI git
Hardware:	Other (OS: All)

Importance:	high (priority), major (severity)
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

Reported:	2018-03-16 08:04 UTC by Abhijeet Kumar
Modified:	2018-04-20 10:56 UTC
CC List:	5 users

i915 platform:	KBL
i915 features:	power/GT

Description Abhijeet Kumar 2018-03-16 08:04:52 UTC

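while true; do
    # Clear any pending alarm, then arm the RTC to wake 3 seconds from now
    echo 0 > /sys/class/rtc/rtc0/wakealarm
    echo $(date '+%s' -d '+ 3 seconds') > /sys/class/rtc/rtc0/wakealarm
    # Show the most recent SLP S0 message (from the previous cycle)
    dmesg | grep "SLP" | tail -1
    # sleep 1   # tried both with and without this line
    echo freeze > /sys/power/state   # enter suspend-to-idle
done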

During the suspend stress test, suspend-to-idle entry fails with the warning message below. The script above reproduces the issue. The failure is sporadic, occurring in about 20-30% of attempts.

2018-02-23T03:12:24.347693-08:00 DEBUG kernel: [ 479.756025] PM: suspend-to-idle
2018-02-23T03:12:24.347710-08:00 WARNING kernel: [ 480.756211] CPU did not enter SLP S0 for suspend-to-idle.
2018-02-23T03:12:24.348021-08:00 DEBUG kernel: [ 480.756576] PM: resume from suspend-to-idle

Comment 1 Abhijeet Kumar 2018-03-16 08:32:03 UTC

To reproduce on Ubuntu, use the script below together with the attached patch, which reports the RC6 and DC6 counter status:


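while true; do
    # Clear any pending alarm, then arm the RTC to wake 3 seconds from now
    echo 0 > /sys/class/rtc/rtc0/wakealarm
    echo $(date '+%s' -d '+ 3 seconds') > /sys/class/rtc/rtc0/wakealarm
    # Show the latest residency-counter line printed by the debug patch
    dmesg | grep "DC6" | tail -1
    # sleep 1   # tried both with and without this line
    echo freeze > /sys/power/state   # enter suspend-to-idle
done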

In the failure case, neither the DC6 nor the RC6 residency counter increments. Example:
2018-02-23T03:12:24.348124-08:00 INFO kernel: [ 480.779917] i915 0000:00:02.0: Abhijeet: PM residency counters DC5=(0001bc43->0001bc45) DC6=(0001bbbb->0001bbbb) RC6=(15b1c57c->15b1c57c)


In the above stress test, RC6 gets disabled, which leads to the S0ix failure. With the changes below, the system is able to enter RC6. Our analysis is that resume was called, so RC6 was disabled, and the system then tried to enter suspend again before RC6 had been re-enabled from i915_gem_do_execbuffer.

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 5fdd2414ca31..cebf0fb67f81 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1843,6 +1843,7 @@ static int i915_pm_suspend(struct device *kdev)
 {
 	struct pci_dev *pdev = to_pci_dev(kdev);
 	struct drm_device *dev = pci_get_drvdata(pdev);
+	struct drm_i915_private *dev_priv = to_i915(dev);
 
 	if (!dev) {
 		dev_err(kdev, "DRM not initialized, aborting suspend.\n");
@@ -1852,13 +1853,28 @@ static int i915_pm_suspend(struct device *kdev)
 	if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
 		return 0;
 
+     printk(KERN_ERR "Abhijeet RC6 state =  0x%08x\n", (I915_READ(GEN6_RC_CONTROL) & GEN6_RC_CTL_HW_ENABLE));
+	intel_enable_gt_powersave(dev_priv);
+
+
 	return i915_drm_suspend(dev);
 }

Comment 2 Chris Wilson 2018-03-16 10:37:29 UTC

The explanation does not match the current code base. Please test against upstream.

Comment 3 Jani Saarinen 2018-03-29 07:11:45 UTC

First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether the bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), please do so to see whether the issue is still present there; if you cannot see the issue there, please comment on the bug.

Comment 4 marc.herbert 2018-04-11 17:35:17 UTC

Just for the record, a newer version of the patch in comment #1 has been merged into the Chromium OS kernel:

https://chromium-review.googlesource.com/q/Ia399473bc20773c0bc3
https://chromium.googlesource.com/chromiumos/third_party/kernel/+log/chromeos-4.4/drivers/gpu/drm/i915/i915_drv.c

CHROMIUM: drm/i915: Configure GPU PM in ->suspend() if not configured
    
    The patch implements workaround for a scenario, where the GPU Power
    Management, if not configured prior to platform suspend entry, will block
    SoC S0ix entry in suspend-to-idle...

Comment 5 Abhijeet Kumar 2018-04-12 18:46:50 UTC

The current codebase has changed. There is no need to load the context to enable rc6 :)


[  153.475078] Abhijeet i'm enabling rc6
[  153.475085] Abhijeet addr = 0000A090 value= 00000000
[  153.475094] Abhijeet addr = 0000A090 value= 88040000
[  153.475100] CPU: 2 PID: 3057 Comm: kworker/u8:67 Tainted: G        W        4.16.0-rc5-31709-g0e5bad01e4a9-dirty #17
[  153.475103] Hardware name: HP Soraka/Soraka, BIOS Google_Soraka.10086.0.0 10/30/2017
[  153.475110] Workqueue: events_unbound async_run_entry_fn
[  153.475114] Call Trace:
[  153.475122]  dump_stack+0x4f/0x81
[  153.475127]  intel_enable_gt_powersave+0x1057/0x1941
[  153.475133]  ? __pm_runtime_resume+0x5f/0x8a
[  153.475138]  i915_request_alloc+0xc8/0x40e
[  153.475143]  i915_gem_switch_to_kernel_context+0xbf/0x144
[  153.475150]  i915_gem_resume+0x70/0xc9
[  153.475155]  ? pci_pm_suspend+0x1ac/0x1ac
[  153.475160]  i915_drm_resume+0x75/0x10f
[  153.475165]  ? pci_pm_suspend+0x1ac/0x1ac
[  153.475168]  i915_pm_resume+0x1e/0x22
[  153.475172]  dpm_run_callback+0x45/0x80
[  153.475177]  device_resume+0x1f1/0x25c
[  153.475181]  async_resume+0x1c/0x42
[  153.475187]  async_run_entry_fn+0x3f/0xd2
[  153.475193]  process_one_work+0x18d/0x2de
[  153.475194] call usb2+ returned 0 after 218 usecs
[  153.475202]  worker_thread+0x194/0x329
[  153.475208]  ? worker_clr_flags+0x52/0x52
[  153.475213]  kthread+0xf1/0x101
[  153.475215] call 00:02+ returned 0 after 220 usecs
[  153.475218] calling  00:03+ @ 2742, parent: pnp0
[  153.475220] call 00:03+ returned 0 after 0 usecs
[  153.475223] calling  00:04+ @ 2742, parent: pnp0
[  153.475225] calling  2-2+ @ 3020, parent: usb2
[  153.475230]  ? worker_clr_flags+0x52/0x52
[  153.475235] call 00:04+ returned 0 after 0 usecs
[  153.475237]  ? rcu_read_unlock_sched_notrace+0x4d/0x4d
[  153.475242]  ret_from_fork+0x35/0x40



Closing the bug since it's no longer applicable.

Comment 6 Jani Saarinen 2018-04-20 10:56:59 UTC

Thanks for the feedback.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.