Bug 102102 - [BAT][CTG] rtcwake failed with 256
Summary: [BAT][CTG] rtcwake failed with 256
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: high critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-08-08 06:59 UTC by Martin Peres
Modified: 2017-09-29 14:42 UTC
CC List: 1 user

See Also:
i915 platform: GM45
i915 features: power/Other


Description Martin Peres 2017-08-08 06:59:40 UTC
On CI_DRM_2931, the machine fi-ctg-p8600 produced the following error when running igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
	
rtcwake: write error
(kms_pipe_crc_basic:3915) igt-aux-CRITICAL: Test assertion failure function suspend_via_rtcwake, file igt_aux.c:771:
(kms_pipe_crc_basic:3915) igt-aux-CRITICAL: Failed assertion: ret == 0
(kms_pipe_crc_basic:3915) igt-aux-CRITICAL: rtcwake failed with 256


Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2931/fi-ctg-p8600/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b.html
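The "256" in the assertion message is most likely a raw wait status, not rtcwake's own exit code. Assuming suspend_via_rtcwake() in igt_aux.c reports the value returned by system(3) unchanged (an assumption, not confirmed by the log), 256 decodes to exit code 1. That is consistent with rtcwake printing "rtcwake: write error" and exiting with a failure status. A minimal C sketch of the decoding follows. rtcwake_exit_code() and explain_status() are hypothetical helpers, not IGT functions:

```c
#include <stdio.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: turn the raw value returned by system(3) into the
 * child's exit code. Returns -1 if system() itself failed or if the child
 * did not exit normally (for example, it was killed by a signal).
 */
int rtcwake_exit_code(int status)
{
    if (status == -1)          /* system() could not fork/wait at all */
        return -1;
    if (!WIFEXITED(status))    /* terminated by a signal, not by exit() */
        return -1;
    return WEXITSTATUS(status);
}

/* Hypothetical helper: print a readable explanation of a system(3) status. */
void explain_status(int status)
{
    int code = rtcwake_exit_code(status);

    if (code >= 0)
        printf("status %d: command exited with code %d\n", status, code);
    else if (status != -1 && WIFSIGNALED(status))
        printf("status %d: command killed by signal %d\n",
               status, WTERMSIG(status));
    else
        printf("status %d: command could not be run\n", status);
}
```

On Linux the exit code lives in bits 8-15 of the status, so explain_status(256) reports exit code 1.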
Comment 1 Martin Peres 2017-08-08 07:02:48 UTC
Can we make sure that the change in IGT's code related to rtcwake handling did not introduce this regression?
Comment 2 Chris Wilson 2017-08-08 09:05:54 UTC
[  598.531479] PM: Syncing filesystems ... done.
[  598.590879] PM: Preparing system for sleep (mem)
[  598.591612] Freezing user space processes ...
[  618.594711] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
[  618.594741] fstrim          D    0  3896   3893 0x00000004
[  618.594764] Call Trace:
[  618.594775]  __schedule+0x3c3/0xb00
[  618.594781]  ? queue_unplugged+0x61/0x1b0
[  618.594789]  ? wait_for_common_io.constprop.1+0xe5/0x180
[  618.594793]  schedule+0x3b/0x90
[  618.594798]  schedule_timeout+0x24c/0x490
[  618.594804]  ? trace_hardirqs_on_caller+0xe3/0x1b0
[  618.594809]  ? trace_hardirqs_on+0xd/0x10
[  618.594817]  ? wait_for_common_io.constprop.1+0xe5/0x180
[  618.594822]  io_schedule_timeout+0x19/0x40
[  618.594826]  ? io_schedule_timeout+0x19/0x40
[  618.594831]  wait_for_common_io.constprop.1+0x104/0x180
[  618.594837]  ? wake_up_q+0x80/0x80
[  618.594844]  wait_for_completion_io+0x13/0x20
[  618.594848]  submit_bio_wait+0x54/0x60
[  618.594858]  blkdev_issue_discard+0x6c/0xb0
[  618.594862]  ? ext4_trim_fs+0x424/0xc30
[  618.594871]  ext4_trim_fs+0x4b2/0xc30
[  618.594875]  ? ext4_trim_fs+0x4b2/0xc30
[  618.594891]  ext4_ioctl+0xc36/0x10f0
[  618.594901]  do_vfs_ioctl+0x8f/0x660
[  618.594906]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
[  618.594912]  ? __this_cpu_preempt_check+0x13/0x20
[  618.594916]  ? trace_hardirqs_on_caller+0xe3/0x1b0
[  618.594922]  SyS_ioctl+0x3c/0x70
[  618.594928]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  618.594933] RIP: 0033:0x7ff0c0be2587
[  618.594936] RSP: 002b:00007ffd5fe7eac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  618.594943] RAX: ffffffffffffffda RBX: ffffffff8146cce3 RCX: 00007ff0c0be2587
[  618.594946] RDX: 00007ffd5fe7ead0 RSI: 00000000c0185879 RDI: 0000000000000004
[  618.594950] RBP: ffffc9000112ff88 R08: 000000159f84e9a0 R09: 0000000000000000
[  618.594953] R10: 000000000000000a R11: 0000000000000246 R12: 00007ff0c12f8250
[  618.594957] R13: 0000000000000000 R14: 000000159f849800 R15: 0000000000000004
[  618.594963]  ? __this_cpu_preempt_check+0x13/0x20
[  618.594976] OOM killer enabled.
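The register dump shows what fstrim was doing. On x86-64, ORIG_RAX 0x10 is syscall 16 (ioctl), RDI 4 is the file descriptor, and RSI 0xc0185879 is the ioctl request. Assuming the generic Linux ioctl encoding used on x86-64, that request decodes to _IOWR('X', 121, struct fstrim_range), i.e. FITRIM. This matches the ext4_ioctl → ext4_trim_fs → blkdev_issue_discard frames: the task refusing to freeze was blocked waiting for a discard to complete. A small C check against the kernel UAPI headers follows. The helper names and TRACE_IOCTL_NR are illustrative only:

```c
#include <stdio.h>
#include <linux/fs.h>      /* FITRIM, struct fstrim_range */
#include <linux/ioctl.h>   /* _IOC_DIR, _IOC_TYPE, _IOC_NR, _IOC_SIZE */

/* Request number copied from RSI in the fstrim backtrace above. */
#define TRACE_IOCTL_NR 0xc0185879UL

/* Returns 1 if the given ioctl request is FITRIM, 0 otherwise. */
int is_fitrim(unsigned long request)
{
    return request == (unsigned long)FITRIM;
}

/* Split an ioctl request number into its encoded fields and print them. */
void decode_ioctl(unsigned long request)
{
    printf("dir=%lu type='%c' nr=%lu size=%lu\n",
           (unsigned long)_IOC_DIR(request),
           (int)_IOC_TYPE(request),
           (unsigned long)_IOC_NR(request),
           (unsigned long)_IOC_SIZE(request));
}
```

On x86-64, decode_ioctl(TRACE_IOCTL_NR) should print dir=3 (read|write), type 'X', nr=121 and size=24, the size of struct fstrim_range.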
Comment 3 Chris Wilson 2017-08-08 09:06:05 UTC
(In reply to Martin Peres from comment #1)
> Can we make sure that the change in IGT's code related to rtcwake handling
> did not introduce this regression?

It did not.
Comment 4 Martin Peres 2017-08-08 09:51:11 UTC
[12:48:39] <ickle> but we could do with automatically tracing over suspend
[12:49:10] <mupuf> right, so we can fail, but at least have data to report bugs
[12:49:32] <ickle> find bug, fix bug, test again, that is the way
Comment 5 Maarten Lankhorst 2017-08-08 13:16:22 UTC
Other hosts set this on the kernel command line for this bug, so perhaps set it for this host too? scsi_mod.use_blk_mq=0
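Comment 8 states that this workaround is already set on all CI machines. A test harness could confirm that on a given host before suspecting blk-mq. The sketch below assumes the workaround is passed on the kernel command line exactly as written above, and looks for it as a whole space-separated token in /proc/cmdline. cmdline_has_param() and running_kernel_has_param() are hypothetical names, not part of IGT:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if `param` appears as a whole
 * space-separated token in a kernel command line string (such as the
 * contents of /proc/cmdline), 0 otherwise.
 */
int cmdline_has_param(const char *cmdline, const char *param)
{
    size_t len = strlen(param);
    const char *p = cmdline;

    if (len == 0)
        return 0;

    while ((p = strstr(p, param)) != NULL) {
        int starts_token = (p == cmdline || p[-1] == ' ');
        char end = p[len];
        int ends_token = (end == '\0' || end == ' ' || end == '\n');

        if (starts_token && ends_token)
            return 1;
        p += 1;   /* partial match only: keep searching further on */
    }
    return 0;
}

/*
 * Hypothetical helper: check the running kernel's command line.
 * Returns 1 if found, 0 if not, -1 if /proc/cmdline cannot be read.
 */
int running_kernel_has_param(const char *param)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");
    int found = 0;

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) != NULL)
        found = cmdline_has_param(buf, param);
    fclose(f);
    return found;
}
```

Matching whole tokens avoids false positives such as scsi_mod.use_blk_mq=00 or a longer parameter name that merely contains the string.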
Comment 6 Maarten Lankhorst 2017-08-08 13:16:52 UTC
Oops, overwrote mupuf's changes.
Comment 7 Chris Wilson 2017-08-08 16:29:39 UTC
Similar wait-on-io hang to 5912148.iRCpNe8Dyb@natalenko.name

"Hello Jens, Christoph.

Unfortunately, even with "block: disable runtime-pm for blk-mq" patch applied
blk-mq breaks suspend to RAM for me. It is reproducible on my laptop as well
as in a VM."

Disabling blk-mq seems like something to try, see Maarten's reply.
Comment 8 Martin Peres 2017-08-16 13:41:04 UTC
(In reply to Chris Wilson from comment #7)
> Similar wait-on-io hang to 5912148.iRCpNe8Dyb@natalenko.name
> 
> "Hello Jens, Christoph.
> 
> Unfortunately, even with "block: disable runtime-pm for blk-mq" patch applied
> blk-mq breaks suspend to RAM for me. It is reproducible on my laptop as well
> as in a VM."
> 
> Disabling blk-mq seems like something to try, see Maarten's reply.

We already have this workaround set on all our machines, so this is not the issue.

The bug is also seen here:
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw3/igt@drv_suspend@debugfs-reader-hibernate.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw4/igt@drv_suspend@fence-restore-tiled2untiled-hibernate.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw5/igt@drv_suspend@fence-restore-untiled-hibernate.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw3/igt@gem_exec_suspend@basic-S4.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw6/igt@drv_suspend@sysfs-reader-hibernate.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw3/igt@drv_suspend@forcewake-hibernate.html
Comment 9 Jani Saarinen 2017-09-29 14:41:57 UTC
Seen only once. Let's resolve.

