102102 – [BAT][CTG] rtcwake failed with 256

Status:	CLOSED WORKSFORME

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	DRI git
Hardware:	Other All

Importance:	high critical
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:	ReadyForDev
Keywords:

Depends on:
Blocks:

Reported:	2017-08-08 06:59 UTC by Martin Peres
Modified:	2017-09-29 14:42 UTC
CC List:	1 user

See Also:
i915 platform:	GM45
i915 features:	power/Other

Description Martin Peres 2017-08-08 06:59:40 UTC

On CI_DRM_2931, the machine fi-ctg-p8600 produced the following error when running igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
	
rtcwake: write error
(kms_pipe_crc_basic:3915) igt-aux-CRITICAL: Test assertion failure function suspend_via_rtcwake, file igt_aux.c:771:
(kms_pipe_crc_basic:3915) igt-aux-CRITICAL: Failed assertion: ret == 0
(kms_pipe_crc_basic:3915) igt-aux-CRITICAL: rtcwake failed with 256


Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2931/fi-ctg-p8600/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b.html
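
For reference, the value 256 in the assertion message looks like a raw POSIX wait status of the kind returned by system(3) rather than an exit code. That is an assumption based on the number alone; the log does not say how igt_aux.c obtains `ret`. If it holds, rtcwake exited with status 1, which would match the "rtcwake: write error" line. A minimal sketch of the decoding (POSIX only; the helper name is made up for illustration):

```python
import os


def describe_wait_status(status: int) -> str:
    """Decode a raw POSIX wait status (assumed to come from system(3))."""
    if os.WIFEXITED(status):
        # The exit code lives in bits 8..15 of the wait status.
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    return f"unrecognised status {status}"


print(describe_wait_status(256))
```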

Comment 1 Martin Peres 2017-08-08 07:02:48 UTC

Can we make sure that the change in IGT's code related to rtcwake handling did not introduce this regression?

Comment 2 Chris Wilson 2017-08-08 09:05:54 UTC

[  598.531479] PM: Syncing filesystems ... done.
[  598.590879] PM: Preparing system for sleep (mem)
[  598.591612] Freezing user space processes ...
[  618.594711] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
[  618.594741] fstrim          D    0  3896   3893 0x00000004
[  618.594764] Call Trace:
[  618.594775]  __schedule+0x3c3/0xb00
[  618.594781]  ? queue_unplugged+0x61/0x1b0
[  618.594789]  ? wait_for_common_io.constprop.1+0xe5/0x180
[  618.594793]  schedule+0x3b/0x90
[  618.594798]  schedule_timeout+0x24c/0x490
[  618.594804]  ? trace_hardirqs_on_caller+0xe3/0x1b0
[  618.594809]  ? trace_hardirqs_on+0xd/0x10
[  618.594817]  ? wait_for_common_io.constprop.1+0xe5/0x180
[  618.594822]  io_schedule_timeout+0x19/0x40
[  618.594826]  ? io_schedule_timeout+0x19/0x40
[  618.594831]  wait_for_common_io.constprop.1+0x104/0x180
[  618.594837]  ? wake_up_q+0x80/0x80
[  618.594844]  wait_for_completion_io+0x13/0x20
[  618.594848]  submit_bio_wait+0x54/0x60
[  618.594858]  blkdev_issue_discard+0x6c/0xb0
[  618.594862]  ? ext4_trim_fs+0x424/0xc30
[  618.594871]  ext4_trim_fs+0x4b2/0xc30
[  618.594875]  ? ext4_trim_fs+0x4b2/0xc30
[  618.594891]  ext4_ioctl+0xc36/0x10f0
[  618.594901]  do_vfs_ioctl+0x8f/0x660
[  618.594906]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
[  618.594912]  ? __this_cpu_preempt_check+0x13/0x20
[  618.594916]  ? trace_hardirqs_on_caller+0xe3/0x1b0
[  618.594922]  SyS_ioctl+0x3c/0x70
[  618.594928]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  618.594933] RIP: 0033:0x7ff0c0be2587
[  618.594936] RSP: 002b:00007ffd5fe7eac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  618.594943] RAX: ffffffffffffffda RBX: ffffffff8146cce3 RCX: 00007ff0c0be2587
[  618.594946] RDX: 00007ffd5fe7ead0 RSI: 00000000c0185879 RDI: 0000000000000004
[  618.594950] RBP: ffffc9000112ff88 R08: 000000159f84e9a0 R09: 0000000000000000
[  618.594953] R10: 000000000000000a R11: 0000000000000246 R12: 00007ff0c12f8250
[  618.594957] R13: 0000000000000000 R14: 000000159f849800 R15: 0000000000000004
[  618.594963]  ? __this_cpu_preempt_check+0x13/0x20
[  618.594976] OOM killer enabled.
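
The log above shows the suspend being aborted because one task (fstrim, blocked in a discard ioctl) refused to freeze. A rough sketch of how this signature could be picked out of a dmesg dump automatically follows. The regular expressions are assumptions modelled on the exact lines in this log (message wording and the "comm state stack pid ppid flags" task line); other kernel versions may print these differently, and `find_freeze_failure` is a hypothetical helper, not part of IGT.

```python
import re

# Matches the freezer failure message as it appears in this log.
FREEZE_FAIL = re.compile(
    r"Freezing of tasks failed after (?P<secs>[\d.]+) seconds "
    r"\((?P<count>\d+) tasks refusing to freeze"
)
# Matches a task dump line: "<comm> <state> <stack> <pid> <ppid> <flags>".
TASK_LINE = re.compile(
    r"^\[\s*[\d.]+\]\s+(?P<comm>\S+)\s+(?P<state>[A-Z])\s+\d+\s+"
    r"(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+"
)


def find_freeze_failure(dmesg: str):
    """Return (seconds, [(comm, pid), ...]) for the first freeze failure, or None."""
    lines = dmesg.splitlines()
    for i, line in enumerate(lines):
        failed = FREEZE_FAIL.search(line)
        if failed is None:
            continue
        wanted = int(failed.group("count"))
        tasks = []
        # Collect the task dump lines that follow the failure message.
        for follow in lines[i + 1:]:
            task = TASK_LINE.match(follow)
            if task:
                tasks.append((task.group("comm"), int(task.group("pid"))))
            if len(tasks) == wanted:
                break
        return float(failed.group("secs")), tasks
    return None


SAMPLE = """\
[  598.591612] Freezing user space processes ...
[  618.594711] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
[  618.594741] fstrim          D    0  3896   3893 0x00000004
[  618.594764] Call Trace:
"""

print(find_freeze_failure(SAMPLE))
```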

Comment 3 Chris Wilson 2017-08-08 09:06:05 UTC

(In reply to Martin Peres from comment #1)
> Can we make sure that the change in IGT's code related to rtcwake handling
> did not introduce this regression?

It did not.

Comment 4 Martin Peres 2017-08-08 09:51:11 UTC

[12:48:39] <ickle> but we could do with automatically tracing over suspend
[12:49:10] <mupuf> right, so we can fail, but at least have data to report bugs
[12:49:32] <ickle> find bug, fix bug, test again, that is the way

Comment 5 Maarten Lankhorst 2017-08-08 13:16:22 UTC

Other hosts set scsi_mod.use_blk_mq=0 on the kernel command line to work around this bug, so perhaps set it for this host too?
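
One way to confirm whether a given host actually has this parameter set is to look at its kernel command line. The sketch below parses a command line string for it. The helper names are hypothetical, and the parsing is deliberately simple (whitespace-separated tokens, no quoting, last occurrence wins), so treat it as an illustration rather than a faithful copy of the kernel's own parser.

```python
def cmdline_param(cmdline: str, name: str):
    """Return the value of `name` on a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            # A bare flag without "=" is reported as an empty string.
            value = val if sep else ""
    return value


def blk_mq_disabled(cmdline: str) -> bool:
    """True if scsi_mod.use_blk_mq is explicitly turned off."""
    # Kernel boolean parameters accept n/N as well as 0.
    return cmdline_param(cmdline, "scsi_mod.use_blk_mq") in ("0", "n", "N")


print(blk_mq_disabled("root=/dev/sda1 quiet scsi_mod.use_blk_mq=0"))
```

On a live machine the input would typically come from /proc/cmdline, for example `blk_mq_disabled(open("/proc/cmdline").read())`.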

Comment 6 Maarten Lankhorst 2017-08-08 13:16:52 UTC

Oops, overwrote mupuf's changes.

Comment 7 Chris Wilson 2017-08-08 16:29:39 UTC

Similar wait-on-io hang to 5912148.iRCpNe8Dyb@natalenko.name

"Hello Jens, Christoph.

Unfortunately, even with "block: disable runtime-pm for blk-mq" patch applied
blk-mq breaks suspend to RAM for me. It is reproducible on my laptop as well
as in a VM."

Disabling blk-mq seems like something to try, see Maarten's reply.

Comment 8 Martin Peres 2017-08-16 13:41:04 UTC

(In reply to Chris Wilson from comment #7)
> Similar wait-on-io hang to 5912148.iRCpNe8Dyb@natalenko.name
> 
> "Hello Jens, Christoph.
> 
> Unfortunately, even with "block: disable runtime-pm for blk-mq" patch applied
> blk-mq breaks suspend to RAM for me. It is reproducible on my laptop as well
> as in a VM."
> 
> Disabling blk-mq seems like something to try, see Maarten's reply.

We already have this workaround set on all our machines, so this is not the issue.

The bug is also seen here:
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw3/igt@drv_suspend@debugfs-reader-hibernate.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw4/igt@drv_suspend@fence-restore-tiled2untiled-hibernate.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw5/igt@drv_suspend@fence-restore-untiled-hibernate.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw3/igt@gem_exec_suspend@basic-S4.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw6/igt@drv_suspend@sysfs-reader-hibernate.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw3/igt@drv_suspend@forcewake-hibernate.html

Comment 9 Jani Saarinen 2017-09-29 14:41:57 UTC

Seen only once. Let's resolve.
