Bug 102102

Summary:	[BAT][CTG] rtcwake failed with 256
Product:	DRI	Reporter:	Martin Peres <martin.peres>
Component:	DRM/Intel	Assignee:	Intel GFX Bugs mailing list <intel-gfx-bugs>
Status:	CLOSED WORKSFORME	QA Contact:	Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity:	critical
Priority:	high	CC:	intel-gfx-bugs
Version:	DRI git
Hardware:	Other
OS:	All
Whiteboard:	ReadyForDev
i915 platform:	GM45	i915 features:	power/Other

Description Martin Peres 2017-08-08 06:59:40 UTC

On CI_DRM_2931, the machine fi-ctg-p8600 produced the following error when running igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
	
rtcwake: write error
(kms_pipe_crc_basic:3915) igt-aux-CRITICAL: Test assertion failure function suspend_via_rtcwake, file igt_aux.c:771:
(kms_pipe_crc_basic:3915) igt-aux-CRITICAL: Failed assertion: ret == 0
(kms_pipe_crc_basic:3915) igt-aux-CRITICAL: rtcwake failed with 256


Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2931/fi-ctg-p8600/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b.html
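
A note on the number itself: if the 256 in the assertion message is a raw POSIX wait status (as returned by system() or waitpid()), which this report does not confirm, then it decodes to an ordinary exit code of 1, matching rtcwake bailing out after its "write error". A minimal Python sketch of that decoding, with describe_wait_status being a hypothetical helper name:

```python
import os


def describe_wait_status(status):
    """Describe a raw POSIX wait status (assumed to come from system()/waitpid())."""
    if os.WIFEXITED(status):
        # Normal termination: the exit code lives in bits 8..15.
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        # Abnormal termination: the low 7 bits hold the signal number.
        return f"killed by signal {os.WTERMSIG(status)}"
    return f"unrecognised status {status}"


print(describe_wait_status(256))
```

On Linux this prints "exited with code 1", so under that assumption the failure is simply rtcwake returning a generic error rather than a crash or a signal.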

Comment 1 Martin Peres 2017-08-08 07:02:48 UTC

Can we make sure that the change in IGT's code related to rtcwake handling did not introduce this regression?

Comment 2 Chris Wilson 2017-08-08 09:05:54 UTC

[  598.531479] PM: Syncing filesystems ... done.
[  598.590879] PM: Preparing system for sleep (mem)
[  598.591612] Freezing user space processes ...
[  618.594711] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
[  618.594741] fstrim          D    0  3896   3893 0x00000004
[  618.594764] Call Trace:
[  618.594775]  __schedule+0x3c3/0xb00
[  618.594781]  ? queue_unplugged+0x61/0x1b0
[  618.594789]  ? wait_for_common_io.constprop.1+0xe5/0x180
[  618.594793]  schedule+0x3b/0x90
[  618.594798]  schedule_timeout+0x24c/0x490
[  618.594804]  ? trace_hardirqs_on_caller+0xe3/0x1b0
[  618.594809]  ? trace_hardirqs_on+0xd/0x10
[  618.594817]  ? wait_for_common_io.constprop.1+0xe5/0x180
[  618.594822]  io_schedule_timeout+0x19/0x40
[  618.594826]  ? io_schedule_timeout+0x19/0x40
[  618.594831]  wait_for_common_io.constprop.1+0x104/0x180
[  618.594837]  ? wake_up_q+0x80/0x80
[  618.594844]  wait_for_completion_io+0x13/0x20
[  618.594848]  submit_bio_wait+0x54/0x60
[  618.594858]  blkdev_issue_discard+0x6c/0xb0
[  618.594862]  ? ext4_trim_fs+0x424/0xc30
[  618.594871]  ext4_trim_fs+0x4b2/0xc30
[  618.594875]  ? ext4_trim_fs+0x4b2/0xc30
[  618.594891]  ext4_ioctl+0xc36/0x10f0
[  618.594901]  do_vfs_ioctl+0x8f/0x660
[  618.594906]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
[  618.594912]  ? __this_cpu_preempt_check+0x13/0x20
[  618.594916]  ? trace_hardirqs_on_caller+0xe3/0x1b0
[  618.594922]  SyS_ioctl+0x3c/0x70
[  618.594928]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  618.594933] RIP: 0033:0x7ff0c0be2587
[  618.594936] RSP: 002b:00007ffd5fe7eac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  618.594943] RAX: ffffffffffffffda RBX: ffffffff8146cce3 RCX: 00007ff0c0be2587
[  618.594946] RDX: 00007ffd5fe7ead0 RSI: 00000000c0185879 RDI: 0000000000000004
[  618.594950] RBP: ffffc9000112ff88 R08: 000000159f84e9a0 R09: 0000000000000000
[  618.594953] R10: 000000000000000a R11: 0000000000000246 R12: 00007ff0c12f8250
[  618.594957] R13: 0000000000000000 R14: 000000159f849800 R15: 0000000000000004
[  618.594963]  ? __this_cpu_preempt_check+0x13/0x20
[  618.594976] OOM killer enabled.
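
For triaging logs like the one above, the task that blocked the freeze can be pulled out mechanically. The following is only a sketch: it assumes the dmesg layout shown in this excerpt (a "Freezing of tasks failed" line, then one header line per stuck task immediately before its "Call Trace:"), and tasks_refusing_to_freeze is a hypothetical helper, not part of IGT or the CI tooling.

```python
import re

# Strips the leading "[  618.594741] " timestamp from a dmesg line.
STAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s?")

SAMPLE = """\
[  598.591612] Freezing user space processes ...
[  618.594711] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
[  618.594741] fstrim          D    0  3896   3893 0x00000004
[  618.594764] Call Trace:
[  618.594775]  __schedule+0x3c3/0xb00
[  618.594976] OOM killer enabled.
"""


def tasks_refusing_to_freeze(dmesg):
    """Return the names of tasks listed after a freeze-failure line."""
    lines = [STAMP_RE.sub("", line) for line in dmesg.splitlines()]
    names = []
    in_report = False
    for current, following in zip(lines, lines[1:]):
        if "Freezing of tasks failed" in current:
            in_report = True
            continue
        # A task header is the line directly before "Call Trace:".
        if in_report and following.strip() == "Call Trace:":
            fields = current.split()
            if fields:
                # Assumes the task name contains no spaces.
                names.append(fields[0])
    return names


print(tasks_refusing_to_freeze(SAMPLE))
```

Run on the excerpt, this reports fstrim, the task stuck in blkdev_issue_discard in the trace above.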

Comment 3 Chris Wilson 2017-08-08 09:06:05 UTC

(In reply to Martin Peres from comment #1)
> Can we make sure that the change in IGT's code related to rtcwake handling
> did not introduce this regression?

It did not.

Comment 4 Martin Peres 2017-08-08 09:51:11 UTC

[12:48:39] <ickle> but we could do with automatically tracing over suspend
[12:49:10] <mupuf> right, so we can fail, but at least have data to report bugs
[12:49:32] <ickle> find bug, fix bug, test again, that is the way

Comment 5 Maarten Lankhorst 2017-08-08 13:16:22 UTC

Other hosts set scsi_mod.use_blk_mq=0 on the kernel command line to work around this bug, so perhaps set it for this host too?
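
Whether a given host already carries this setting can be checked from its kernel command line. A rough sketch follows; kernel_param and blk_mq_disabled are hypothetical helper names, the parser splits on whitespace only (no quoted values), and it assumes the last occurrence of a parameter is the one that takes effect.

```python
def kernel_param(cmdline, name):
    """Return the last value given for `name` on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and sep:
            value = val
    return value


def blk_mq_disabled(cmdline):
    """True if scsi_mod.use_blk_mq is explicitly set to a false value."""
    # Assumes the usual boolean spellings for a disabled kernel parameter.
    return kernel_param(cmdline, "scsi_mod.use_blk_mq") in ("0", "n", "N")


print(blk_mq_disabled("root=/dev/sda1 ro quiet scsi_mod.use_blk_mq=0"))
```

On a live machine the string would typically be read from /proc/cmdline.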

Comment 6 Maarten Lankhorst 2017-08-08 13:16:52 UTC

Oops, I overwrote mupuf's changes.

Comment 7 Chris Wilson 2017-08-08 16:29:39 UTC

Similar wait-on-io hang to 5912148.iRCpNe8Dyb@natalenko.name

"Hello Jens, Christoph.

Unfortunately, even with "block: disable runtime-pm for blk-mq" patch applied
blk-mq breaks suspend to RAM for me. It is reproducible on my laptop as well
as in a VM."

Disabling blk-mq seems like something to try, see Maarten's reply.

Comment 8 Martin Peres 2017-08-16 13:41:04 UTC

(In reply to Chris Wilson from comment #7)
> Similar wait-on-io hang to 5912148.iRCpNe8Dyb@natalenko.name
> 
> "Hello Jens, Christoph.
> 
> Unfortunately, even with "block: disable runtime-pm for blk-mq" patch applied
> blk-mq breaks suspend to RAM for me. It is reproducible on my laptop as well
> as in a VM."
> 
> Disabling blk-mq seems like something to try, see Maarten's reply.

We already have this workaround set on all our machines, so this is not the issue.

The bug is also seen here:
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw3/igt@drv_suspend@debugfs-reader-hibernate.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw4/igt@drv_suspend@fence-restore-tiled2untiled-hibernate.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw5/igt@drv_suspend@fence-restore-untiled-hibernate.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw3/igt@gem_exec_suspend@basic-S4.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw6/igt@drv_suspend@sysfs-reader-hibernate.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw3/igt@drv_suspend@forcewake-hibernate.html

Comment 9 Jani Saarinen 2017-09-29 14:41:57 UTC

Seen only once. Let's resolve.
