Bug 102102

Summary: [BAT][CTG] rtcwake failed with 256
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED WORKSFORME
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical
Priority: high
CC: intel-gfx-bugs
Version: DRI git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: GM45
i915 features: power/Other

Description Martin Peres 2017-08-08 06:59:40 UTC
On CI_DRM_2931, the machine fi-ctg-p8600 produced the following error when running igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
	
rtcwake: write error
(kms_pipe_crc_basic:3915) igt-aux-CRITICAL: Test assertion failure function suspend_via_rtcwake, file igt_aux.c:771:
(kms_pipe_crc_basic:3915) igt-aux-CRITICAL: Failed assertion: ret == 0
(kms_pipe_crc_basic:3915) igt-aux-CRITICAL: rtcwake failed with 256


Full logs: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2931/fi-ctg-p8600/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b.html
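
A note on reading the value 256 in the assertion above. This is a minimal sketch that assumes the number is a raw wait status, as returned by system() or waitpid(), rather than a plain exit code; the report does not confirm how IGT obtains it. Under that assumption the exit code lives in the high byte, so 256 decodes to exit code 1, which would be consistent with the "rtcwake: write error" line. The helper name describe_wait_status is hypothetical.

```python
import os


def describe_wait_status(status: int) -> str:
    """Describe a raw POSIX wait status (assumed origin: system()/waitpid())."""
    if os.WIFEXITED(status):
        # Normal termination: the exit code is stored in bits 8..15.
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        # Abnormal termination: the low bits hold the signal number.
        return f"killed by signal {os.WTERMSIG(status)}"
    return f"unrecognized status {status}"


if __name__ == "__main__":
    # 256 is the value reported by the failed assertion in this bug.
    print(describe_wait_status(256))
```

If the value were instead a plain exit code, this decoding would not apply.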
Comment 1 Martin Peres 2017-08-08 07:02:48 UTC
Can we make sure that the change in IGT's code related to rtcwake handling did not introduce this regression?
Comment 2 Chris Wilson 2017-08-08 09:05:54 UTC
[  598.531479] PM: Syncing filesystems ... done.
[  598.590879] PM: Preparing system for sleep (mem)
[  598.591612] Freezing user space processes ...
[  618.594711] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
[  618.594741] fstrim          D    0  3896   3893 0x00000004
[  618.594764] Call Trace:
[  618.594775]  __schedule+0x3c3/0xb00
[  618.594781]  ? queue_unplugged+0x61/0x1b0
[  618.594789]  ? wait_for_common_io.constprop.1+0xe5/0x180
[  618.594793]  schedule+0x3b/0x90
[  618.594798]  schedule_timeout+0x24c/0x490
[  618.594804]  ? trace_hardirqs_on_caller+0xe3/0x1b0
[  618.594809]  ? trace_hardirqs_on+0xd/0x10
[  618.594817]  ? wait_for_common_io.constprop.1+0xe5/0x180
[  618.594822]  io_schedule_timeout+0x19/0x40
[  618.594826]  ? io_schedule_timeout+0x19/0x40
[  618.594831]  wait_for_common_io.constprop.1+0x104/0x180
[  618.594837]  ? wake_up_q+0x80/0x80
[  618.594844]  wait_for_completion_io+0x13/0x20
[  618.594848]  submit_bio_wait+0x54/0x60
[  618.594858]  blkdev_issue_discard+0x6c/0xb0
[  618.594862]  ? ext4_trim_fs+0x424/0xc30
[  618.594871]  ext4_trim_fs+0x4b2/0xc30
[  618.594875]  ? ext4_trim_fs+0x4b2/0xc30
[  618.594891]  ext4_ioctl+0xc36/0x10f0
[  618.594901]  do_vfs_ioctl+0x8f/0x660
[  618.594906]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
[  618.594912]  ? __this_cpu_preempt_check+0x13/0x20
[  618.594916]  ? trace_hardirqs_on_caller+0xe3/0x1b0
[  618.594922]  SyS_ioctl+0x3c/0x70
[  618.594928]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  618.594933] RIP: 0033:0x7ff0c0be2587
[  618.594936] RSP: 002b:00007ffd5fe7eac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  618.594943] RAX: ffffffffffffffda RBX: ffffffff8146cce3 RCX: 00007ff0c0be2587
[  618.594946] RDX: 00007ffd5fe7ead0 RSI: 00000000c0185879 RDI: 0000000000000004
[  618.594950] RBP: ffffc9000112ff88 R08: 000000159f84e9a0 R09: 0000000000000000
[  618.594953] R10: 000000000000000a R11: 0000000000000246 R12: 00007ff0c12f8250
[  618.594957] R13: 0000000000000000 R14: 000000159f849800 R15: 0000000000000004
[  618.594963]  ? __this_cpu_preempt_check+0x13/0x20
[  618.594976] OOM killer enabled.
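
For triaging similar logs, the task that blocked the freezer can be pulled out of the dmesg text mechanically. The following is a rough sketch, not part of IGT or the CI tooling: the function name find_unfrozen_tasks and both regular expressions are assumptions written against the line layout shown in this particular trace (command name, state letter, free stack, PID, parent PID, flags), and other kernel versions may print the task line differently.

```python
import re

# Matches the freezer failure banner, e.g.
# "Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, ..."
FREEZE_FAIL = re.compile(r"Freezing of tasks failed after [\d.]+ seconds")

# Matches a task summary line, e.g.
# "[  618.594741] fstrim          D    0  3896   3893 0x00000004"
TASK_LINE = re.compile(
    r"\[\s*[\d.]+\]\s+(?P<comm>\S+)\s+(?P<state>[A-Z])\s+\d+\s+"
    r"(?P<pid>\d+)\s+\d+\s+0x[0-9a-f]+"
)


def find_unfrozen_tasks(log: str) -> list[tuple[str, int]]:
    """Return (command, pid) for each task listed after a freeze failure."""
    tasks = []
    seen_failure = False
    for line in log.splitlines():
        if FREEZE_FAIL.search(line):
            seen_failure = True
            continue
        if seen_failure:
            m = TASK_LINE.match(line.strip())
            if m:
                tasks.append((m.group("comm"), int(m.group("pid"))))
    return tasks


SAMPLE = """\
[  598.591612] Freezing user space processes ...
[  618.594711] Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
[  618.594741] fstrim          D    0  3896   3893 0x00000004
[  618.594764] Call Trace:
[  618.594775]  __schedule+0x3c3/0xb00
[  618.594976] OOM killer enabled.
"""

if __name__ == "__main__":
    print(find_unfrozen_tasks(SAMPLE))
```

On the excerpt above this points at fstrim (PID 3896), sitting in uninterruptible sleep (state D) while waiting on a discard request.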
Comment 3 Chris Wilson 2017-08-08 09:06:05 UTC
(In reply to Martin Peres from comment #1)
> Can we make sure that the change in IGT's code related to rtcwake handling
> did not introduce this regression?

It did not.
Comment 4 Martin Peres 2017-08-08 09:51:11 UTC
[12:48:39] <ickle> but we could do with automatically tracing over suspend
[12:49:10] <mupuf> right, so we can fail, but at least have data to report bugs
[12:49:32] <ickle> find bug, fix bug, test again, that is the way
Comment 5 Maarten Lankhorst 2017-08-08 13:16:22 UTC
Other hosts set this in cmdline for this bug, so perhaps set it for this host too? scsi_mod.use_blk_mq=0
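
To verify whether a given host already carries this workaround, the kernel command line can be inspected. This is a minimal sketch with a hypothetical helper name, blk_mq_setting; it only parses a command-line string, and it assumes that when a parameter appears more than once the last occurrence is the one that takes effect. When the parameter is absent the function returns None, since the effective default then depends on the kernel configuration.

```python
from typing import Optional

PARAM = "scsi_mod.use_blk_mq"


def blk_mq_setting(cmdline: str) -> Optional[bool]:
    """Return True/False if scsi_mod.use_blk_mq is set on cmdline, else None."""
    result = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key != PARAM or not sep:
            continue
        # Kernel boolean parameters accept 0/1 and y/n (either case).
        if value in ("0", "n", "N"):
            result = False
        elif value in ("1", "y", "Y"):
            result = True
    return result


if __name__ == "__main__":
    # On a live machine the string would come from /proc/cmdline.
    example = "root=/dev/sda1 ro quiet scsi_mod.use_blk_mq=0"
    print(blk_mq_setting(example))
```

A result of False means blk-mq is disabled for SCSI, which is the state suggested in this comment.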
Comment 6 Maarten Lankhorst 2017-08-08 13:16:52 UTC
Oops, overwrote mupuf's changes.
Comment 7 Chris Wilson 2017-08-08 16:29:39 UTC
Similar wait-on-io hang to 5912148.iRCpNe8Dyb@natalenko.name

"Hello Jens, Christoph.

Unfortunately, even with "block: disable runtime-pm for blk-mq" patch applied
blk-mq breaks suspend to RAM for me. It is reproducible on my laptop as well
as in a VM."

Disabling blk-mq seems like something to try, see Maarten's reply.
Comment 8 Martin Peres 2017-08-16 13:41:04 UTC
(In reply to Chris Wilson from comment #7)
> Similar wait-on-io hang to 5912148.iRCpNe8Dyb@natalenko.name
> 
> "Hello Jens, Christoph.
> 
> Unfortunately, even with "block: disable runtime-pm for blk-mq" patch applied
> blk-mq breaks suspend to RAM for me. It is reproducible on my laptop as well
> as in a VM."
> 
> Disabling blk-mq seems like something to try, see Maarten's reply.

We already have this workaround set on all our machines, so this is not the issue.

The bug is also seen here:
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw3/igt@drv_suspend@debugfs-reader-hibernate.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw4/igt@drv_suspend@fence-restore-tiled2untiled-hibernate.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw5/igt@drv_suspend@fence-restore-untiled-hibernate.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw3/igt@gem_exec_suspend@basic-S4.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw6/igt@drv_suspend@sysfs-reader-hibernate.html
 - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2968/shard-hsw3/igt@drv_suspend@forcewake-hibernate.html
Comment 9 Jani Saarinen 2017-09-29 14:41:57 UTC
Seen only once. Let's resolve.