Bug 105762

Summary:	[CI][CFL only] igt@kms_@suspend* - dmesg-warn - Delta way too big! 18446743856563626466 ts=18446744054496180323 write stamp = 197932553857 - WARNING: CPU: 0 PID: 1596 at kernel/trace/ring_buffer.c:2640 rb_handle_timestamp.isra.33+0x6c/0x70
Product:	DRI	Reporter:	Marta Löfstedt <marta.lofstedt>
Component:	DRM/Intel	Assignee:	Intel GFX Bugs mailing list <intel-gfx-bugs>
Status:	CLOSED DUPLICATE	QA Contact:	Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity:	normal
Priority:	medium	CC:	imre.deak, intel-gfx-bugs
Version:	DRI git
Hardware:	Other
OS:	All
Whiteboard:	ReadyForDev
i915 platform:	CFL, GLK	i915 features:	power/suspend-resume

Description Marta Löfstedt 2018-03-27 09:30:06 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_7/fi-cfl-s3/igt@kms_vblank@pipe-c-ts-continuation-dpms-suspend.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_7/fi-cfl-s3/igt@kms_cursor_crc@cursor-256x256-suspend.html

<4>[  364.610611] Delta way too big! 18446743856563626466 ts=18446744054496180323 write stamp = 197932553857
                  If you just came from a suspend/resume,
                  please switch to the trace global clock:
                    echo global > /sys/kernel/debug/tracing/trace_clock
<4>[  364.610617] WARNING: CPU: 0 PID: 1596 at kernel/trace/ring_buffer.c:2640 rb_handle_timestamp.isra.33+0x6c/0x70
<4>[  364.610618] Modules linked in: vgem i915 cdc_ether usbnet x86_pkg_temp_thermal r8152 intel_powerclamp mii coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel prime_numbers e1000e mei_me mei
<4>[  364.610634] CPU: 0 PID: 1596 Comm: kworker/u24:11 Tainted: G     U           4.16.0-rc6-g94f5d9189e61-drmtip_7+ #1
<4>[  364.610634] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X118.B19.1802080131 02/08/2018
<4>[  364.610637] Workqueue: events_unbound async_run_entry_fn
<4>[  364.610639] RIP: 0010:rb_handle_timestamp.isra.33+0x6c/0x70
<4>[  364.610640] RSP: 0018:ffffaaaf00de3c40 EFLAGS: 00010286
<4>[  364.610642] RAX: 00000000000000e1 RBX: ffffaaaf00de3c58 RCX: 0000000000000003
<4>[  364.610643] RDX: 0000000080000003 RSI: ffffffffa50eaa49 RDI: 00000000ffffffff
<4>[  364.610644] RBP: ffff91f31b42fb08 R08: 0000000008db1ed0 R09: 0000000000000000
<4>[  364.610645] R10: ffffaaaf00de3c50 R11: 0000000000000000 R12: ffff91f31b42fb28
<4>[  364.610646] R13: 00000000000003e8 R14: ffffcaaeffa08e6c R15: 0000000000000001
<4>[  364.610647] FS:  0000000000000000(0000) GS:ffff91f31d200000(0000) knlGS:0000000000000000
<4>[  364.610648] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  364.610649] CR2: 000055e6ac6130f6 CR3: 00000002e6210002 CR4: 00000000003606f0
<4>[  364.610650] Call Trace:
<4>[  364.610652]  ring_buffer_lock_reserve+0x1a4/0x370
<4>[  364.610673]  ? intel_gpu_reset+0x17d/0x200 [i915]
<4>[  364.610676]  trace_vbprintk+0xb4/0x220
<4>[  364.610678]  ? sched_clock_local+0x12/0x80
<4>[  364.610697]  ? gen6_reset_engines+0x90/0x90 [i915]
<4>[  364.610700]  __trace_bprintk+0x4a/0x60
<4>[  364.610720]  intel_gpu_reset+0x198/0x200 [i915]
<4>[  364.610739]  ? gen6_reset_engines+0x90/0x90 [i915]
<4>[  364.610752]  ? vlv_resume_prepare+0x670/0x670 [i915]
<4>[  364.610768]  i915_gem_sanitize+0x36/0x80 [i915]
<4>[  364.610782]  i915_pm_resume_early+0xba/0x150 [i915]
<4>[  364.610785]  dpm_run_callback+0x5d/0x2f0
<4>[  364.610787]  device_resume_early+0x86/0xc0
<4>[  364.610790]  async_resume_early+0x14/0x40
<4>[  364.610791]  async_run_entry_fn+0x34/0x160
<4>[  364.610794]  process_one_work+0x224/0x680
<4>[  364.610797]  worker_thread+0x35/0x380
<4>[  364.610800]  ? process_one_work+0x680/0x680
<4>[  364.610801]  kthread+0x119/0x130
<4>[  364.610803]  ? _kthread_create_on_node+0x30/0x30
<4>[  364.610807]  ret_from_fork+0x3a/0x50
<4>[  364.610811] Code: 24 48 8b 73 08 85 c0 48 8b 13 48 c7 c0 b8 b0 06 a5 49 c7 c0 ae 6c 07 a5 4c 0f 44 c0 48 8b 0f 48 c7 c7 40 b1 06 a5 e8 34 85 f3 ff <0f> 0b eb a8 41 55 41 54 55 53 48 83 ec 28 65 ff 05 2f ca ec 5b 
<4>[  364.610862] WARNING: CPU: 0 PID: 1596 at kernel/trace/ring_buffer.c:2640 rb_handle_timestamp.isra.33+0x6c/0x70
<4>[  364.610863] ---[ end trace 855bffddab9ab224 ]---

Comment 1 Marta Löfstedt 2018-04-03 10:04:58 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_8/fi-cfl-s3/igt@kms_vblank@pipe-b-ts-continuation-suspend.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_11/fi-cfl-s3/igt@gem_workarounds@suspend-resume.html

Comment 2 Chris Wilson 2018-04-03 10:40:50 UTC

Patches heading upstream; merged to topic/core-for-CI for safe keeping.

commit 344cbd8805a44f2692623eb29b7231d3d52902ad
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Mar 27 10:44:52 2018 +0100

    trace: Default to using trace_global_clock if sched_clock is unstable
    
    Across suspend, we may see a very large drift in timestamps if the sched
    clock is unstable, prompting the global trace's ringbuffer code to warn
    and suggest switching to the global clock. Preempt this request by
    detecting when the sched clock is unstable (determined during
    late_initcall) and automatically switching the default clock over to
    trace_global_clock.
    
    This should prevent requiring user interaction to resolve warnings such
    as:
    
        Delta way too big! 18446743856563626466 ts=18446744054496180323 write stamp = 197932553857
        If you just came from a suspend/resume,
        please switch to the trace global clock:
        echo global > /sys/kernel/debug/tracing/trace_clock
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
    ---
    v2: Tell the user what's happening and what they can do to correct it.
    Blessed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180330150132.16903-1-chris@chris-wilson.co.uk

Comment 3 Marta Löfstedt 2018-04-04 07:09:47 UTC

Need to wait for more drmtip runs to be able to close

Comment 4 Marta Löfstedt 2018-04-06 11:14:28 UTC

The issue was not reproduced on drmtip_12 on CFL-s3. I will close

Comment 5 Marta Löfstedt 2018-04-09 11:20:32 UTC

Meh:
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_16/fi-cfl-s3/igt@gem_softpin@noreloc-s3.html

Comment 6 Marta Löfstedt 2018-04-09 11:21:25 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_16/fi-cfl-s3/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_13/fi-cfl-s3/igt@drv_suspend@forcewake.html

Comment 7 Chris Wilson 2018-04-09 11:46:11 UTC

Please file cfl separately as TSC is not mean to be broken on new hardware, and please let peterz/hpj know.

<4>[  134.519360] clocksource: timekeeping watchdog on CPU4: Marking clocksource 'tsc' as unstable because the skew is too large:
<4>[  134.519436] clocksource:                       'hpet' wd_now: c05a4d67 wd_last: bfc4584b mask: ffffffff
<4>[  134.519450] clocksource:                       'tsc' cs_now: 72c02e8d6e cs_last: 7261d1ce0f mask: ffffffffffffffff
<6>[  134.520117] tsc: Marking TSC unstable due to clocksource watchdog
<4>[  134.520482] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.

Comment 8 Marta Löfstedt 2018-04-09 12:04:11 UTC

(In reply to Chris Wilson from comment #7)
> Please file cfl separately as TSC is not mean to be broken on new hardware,
> and please let peterz/hpj know.
> 
> <4>[  134.519360] clocksource: timekeeping watchdog on CPU4: Marking
> clocksource 'tsc' as unstable because the skew is too large:
> <4>[  134.519436] clocksource:                       'hpet' wd_now: c05a4d67
> wd_last: bfc4584b mask: ffffffff
> <4>[  134.519450] clocksource:                       'tsc' cs_now:
> 72c02e8d6e cs_last: 7261d1ce0f mask: ffffffffffffffff
> <6>[  134.520117] tsc: Marking TSC unstable due to clocksource watchdog
> <4>[  134.520482] TSC found unstable after boot, most likely due to broken
> BIOS. Use 'tsc=unstable'.

but it is only CFL.

Comment 9 Chris Wilson 2018-04-09 12:08:59 UTC

Oh disregard me, I was thought I was fixing the elk issue where we knew the clocksource was unstable from boot...

Comment 10 Marta Löfstedt 2018-04-12 10:30:27 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_17/fi-cfl-s3/igt@gem_eio@in-flight-suspend.html

Comment 11 Marta Löfstedt 2018-04-13 05:52:21 UTC

Also, on GLK NUC shards

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4050/shard-glk2/igt@kms_vblank@pipe-c-ts-continuation-dpms-suspend.html

[  441.435413]  cache: parent cpu1 should not be sleeping
[  441.492889] ------------[ cut here ]------------
[  441.492896] Delta way too big! 18446743614755384942 ts=18446744055966652910 write stamp = 441211267968
               If you just came from a suspend/resume,
               please switch to the trace global clock:
                 echo global > /sys/kernel/debug/tracing/trace_clock
               or add trace_clock=global to the kernel command line
[  441.492919] WARNING: CPU: 1 PID: 1498 at kernel/trace/ring_buffer.c:2641 rb_handle_timestamp.isra.31+0x6c/0x70
[  441.492924] Modules linked in: vgem btusb btrtl btbcm btintel x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul bluetooth ghash_clmulni_intel ecdh_generic snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic r8169 mii i915 i2c_hid snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm pinctrl_geminilake pinctrl_intel mei_me prime_numbers mei
[  441.493066] CPU: 1 PID: 1498 Comm: kworker/u8:17 Tainted: G     U           4.16.0-rc7-CI-CI_DRM_4050+ #1
[  441.493071] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0027.2018.0125.1347 01/25/2018
[  441.493079] Workqueue: events_unbound async_run_entry_fn
[  441.493090] RIP: 0010:rb_handle_timestamp.isra.31+0x6c/0x70
[  441.493095] RSP: 0000:ffffc9000075fc00 EFLAGS: 00010286
[  441.493104] RAX: 0000000000000000 RBX: ffffc9000075fc18 RCX: 0000000000000003
[  441.493109] RDX: 0000000080000003 RSI: ffffffff820b3117 RDI: 00000000ffffffff
[  441.493114] RBP: ffff880276f67068 R08: ffff8801f24f5720 R09: 0000000025a6c509
[  441.493118] R10: ffffc9000075fd28 R11: 0000000000000000 R12: ffff880276f67088
[  441.493123] R13: 00000000000003e8 R14: ffffe8ffffc8a11c R15: 0000000000000001
[  441.493128] FS:  0000000000000000(0000) GS:ffff88027fc80000(0000) knlGS:0000000000000000
[  441.493134] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  441.493138] CR2: 0000000000000000 CR3: 000000024131a000 CR4: 0000000000340ee0
[  441.493143] Call Trace:
[  441.493155]  ring_buffer_lock_reserve+0x227/0x3a0
[  441.493262]  ? intel_gpu_reset+0x170/0x210 [i915]
[  441.493262]  trace_vbprintk+0xba/0x240
[  441.493262]  ? gen6_reset_engines+0xc0/0xc0 [i915]
[  441.493262]  __trace_bprintk+0x57/0x80
[  441.493262]  intel_gpu_reset+0x189/0x210 [i915]
[  441.493262]  ? gen6_reset_engines+0xc0/0xc0 [i915]
[  441.493262]  ? i915_pm_thaw_early+0x10/0x10 [i915]
[  441.493262]  i915_gem_sanitize+0x36/0x80 [i915]
[  441.493262]  i915_pm_restore_early+0xad/0x140 [i915]
[  441.493262]  dpm_run_callback+0x5f/0x310
[  441.493262]  device_resume_early+0xa6/0xe0
[  441.493262]  async_resume_early+0x14/0x40
[  441.493262]  async_run_entry_fn+0x2e/0x160
[  441.493262]  process_one_work+0x21a/0x640
[  441.493262]  worker_thread+0x48/0x3a0
[  441.493262]  kthread+0xfb/0x130
[  441.493262]  ? process_one_work+0x640/0x640
[  441.493262]  ? _kthread_create_on_node+0x60/0x60
[  441.493262]  ret_from_fork+0x3a/0x50
[  441.493262] Code: 24 48 8b 73 08 85 c0 48 8b 13 48 c7 c0 a0 83 08 82 49 c7 c0 17 31 0b 82 4c 0f 44 c0 48 8b 0f 48 c7 c7 60 84 08 82 e8 54 28 f3 ff <0f> 0b eb a8 41 55 41 54 55 53 48 83 ec 30 65 48 8b 04 25 28 00 
[  441.493262] WARNING: CPU: 1 PID: 1498 at kernel/trace/ring_buffer.c:2641 rb_handle_timestamp.isra.31+0x6c/0x70
[  441.493262] ---[ end trace b50f43c86b8cffe4 ]---

Comment 12 Marta Löfstedt 2018-04-18 10:07:48 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_22/fi-cfl-s3/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c.html

Comment 13 Chris Wilson 2018-05-25 11:56:16 UTC


*** This bug has been marked as a duplicate of bug 106612 ***

Comment 14 Jani Saarinen 2018-06-19 14:26:03 UTC

Closing, thanks.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.