105560 – [CI] igt@gem_exec_suspend@basic-s3 - dmesg-warn - Delta way too big!

Bug 105560 - [CI] igt@gem_exec_suspend@basic-s3 - dmesg-warn - Delta way too big!

Summary: [CI] igt@gem_exec_suspend@basic-s3 - dmesg-warn - Delta way too big!

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel (show other bugs)
Version:	XOrg git
Hardware:	Other All

Importance:	medium normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:	ReadyForDev
Keywords:

Depends on:
Blocks:

Reported:	2018-03-16 17:50 UTC by Martin Peres
Modified:	2018-04-20 13:00 UTC (History)
CC List:	1 user (show)

See Also:
i915 platform:	GM45, I915G
i915 features:	power/suspend-resume

Attachments

Description Martin Peres 2018-03-16 17:50:42 UTC

[  180.968583] hpet1: lost 7571 rtc interrupts
[  180.976416] ------------[ cut here ]------------
[  180.976420] Delta way too big! 18446743896193258192 ts=18446744055554634448 write stamp = 159361376256
               If you just came from a suspend/resume,
               please switch to the trace global clock:
                 echo global > /sys/kernel/debug/tracing/trace_clock
[  180.976432] WARNING: CPU: 0 PID: 2799 at kernel/trace/ring_buffer.c:2640 rb_handle_timestamp.isra.31+0x6c/0x70
[  180.976435] Modules linked in: vgem i915 coretemp snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm lpc_ich e1000e mei_me mei prime_numbers
[  180.976467] CPU: 0 PID: 2799 Comm: kworker/u8:50 Tainted: G     U           4.16.0-rc5-CI-CI_DRM_3937+ #1
[  180.976469] Hardware name: Hewlett-Packard HP Compaq 8000 Elite CMT PC/3647h, BIOS 786G7 v01.02 10/22/2009
[  180.976473] Workqueue: events_unbound async_run_entry_fn
[  180.976478] RIP: 0010:rb_handle_timestamp.isra.31+0x6c/0x70
[  180.976480] RSP: 0018:ffffc900007d7b68 EFLAGS: 00010292
[  180.976484] RAX: 00000000000000e1 RBX: ffffc900007d7b80 RCX: 0000000000000003
[  180.976486] RDX: 0000000080000003 RSI: ffffffff82110241 RDI: 00000000ffffffff
[  180.976488] RBP: ffff880117027b08 R08: ffff880106d0d668 R09: 0000000028f324eb
[  180.976490] R10: ffffc900007d7c80 R11: 0000000000000000 R12: ffff880117027b28
[  180.976492] R13: 00000000000003e8 R14: ffffe8ffffc023dc R15: 0000000000000005
[  180.976495] FS:  0000000000000000(0000) GS:ffff88011bc00000(0000) knlGS:0000000000000000
[  180.976497] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  180.976499] CR2: 00007f1df82a9440 CR3: 0000000114a0a000 CR4: 00000000000406f0
[  180.976501] Call Trace:
[  180.976507]  ring_buffer_lock_reserve+0x1ff/0x370
[  180.976549]  ? i915_request_await_object+0x330/0x330 [i915]
[  180.976552]  trace_vbprintk+0xba/0x240
[  180.976560]  __trace_bprintk+0x4a/0x60
[  180.976604]  __i915_request_add+0x4f/0x760 [i915]
[  180.976644]  i915_gem_switch_to_kernel_context+0x22e/0x270 [i915]
[  180.976652]  ? pci_pm_suspend_late+0x30/0x30
[  180.976689]  i915_gem_resume+0x99/0xc0 [i915]
[  180.976721]  i915_pm_restore+0x7c/0x130 [i915]
[  180.976726]  dpm_run_callback+0x5f/0x310
[  180.976732]  device_resume+0xa3/0x1b0
[  180.976737]  ? dpm_watchdog_set+0x60/0x60
[  180.976744]  async_resume+0x14/0x40
[  180.976747]  async_run_entry_fn+0x2e/0x160
[  180.976753]  process_one_work+0x215/0x620
[  180.976762]  worker_thread+0x48/0x3a0
[  180.976769]  kthread+0xfb/0x130
[  180.976772]  ? process_one_work+0x620/0x620
[  180.976775]  ? _kthread_create_on_node+0x30/0x30
[  180.976780]  ret_from_fork+0x3a/0x50
[  180.976793] Code: 24 48 8b 73 08 85 c0 48 8b 13 48 c7 c0 30 7d 08 82 49 c7 c0 bf 29 0b 82 4c 0f 44 c0 48 8b 0f 48 c7 c7 b8 7d 08 82 e8 44 89 f3 ff <0f> 0b eb a8 41 55 41 54 55 53 48 83 ec 28 65 ff 05 af 0e ed 7e 
[  180.976906] WARNING: CPU: 0 PID: 2799 at kernel/trace/ring_buffer.c:2640 rb_handle_timestamp.isra.31+0x6c/0x70
[  180.976908] ---[ end trace aa37328ed9e770e9 ]---

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3937/fi-elk-e7500/igt@gem_exec_suspend@basic-s3.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3937/fi-ctg-p8600/igt@gem_exec_suspend@basic-s3.html

Comment 1 Martin Peres 2018-03-16 17:54:45 UTC

This seems to come from one of the following commits as it is reproduced on multiple machines, and for 3 kernels in a row:

ac697ae8013a drm/i915: Stop engines when declaring the machine wedged
d9b13c4dde6c drm/i915: Trace GEM steps between submit and wedging

Comment 2 Chris Wilson 2018-03-28 20:51:11 UTC

Just an issue in kernel/trace/trace.c assuming its default clock is stable. It only showed up recently due to using trace_printk on more machines by default.

https://patchwork.freedesktop.org/series/40728/

Comment 3 Chris Wilson 2018-04-04 21:30:34 UTC

On the second attempt:

commit eaf328ff6661fa9be7d3d2ea3a9dff2eeb91348e
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Mar 30 15:58:19 2018 +0100

    trace: Mention trace_clock=global when warning about unstable clocks
    
    Mention the alternative of adding trace_clock=global to the kernel
    command line when we detect that we've used an unstable clock across a
    suspend/resume cycle.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    ---
    Blessed-by: Steven Rostedt <rostedt@goodmis.org>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180330150132.16903-2-chris@chris-wilson.co.uk

Upstream linux-trace.git:

commit 932066a153350b4af7350f4a9ccbc93a0b9cc86b
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Mar 30 16:01:31 2018 +0100

    tracing: Default to using trace_global_clock if sched_clock is unstable
    
    Across suspend, we may see a very large drift in timestamps if the sched
    clock is unstable, prompting the global trace's ringbuffer code to warn
    and suggest switching to the global clock. Preempt this request by
    detecting when the sched clock is unstable (determined during
    late_initcall) and automatically switching the default clock over to
    trace_global_clock.

+ fixup

    trace: Fixup logic inversion on setting trace_global_clock defaults
    
    In commit 932066a15335 ("tracing: Default to using trace_global_clock if
    sched_clock is unstable"), the logic for deciding to override the
    default clock if unstable was reversed from the earlier posting. I was
    trying to reduce the width of the message by using an early return
    rather than a if-block, but reverted back to using the if-block and
    accidentally left the predicate inverted.
    
    Fixes: 932066a15335 ("tracing: Default to using trace_global_clock if sched_
clock is unstable")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>

Comment 4 Jani Saarinen 2018-04-20 10:57:43 UTC

Martin, do you see this in CI still?

Comment 5 Martin Peres 2018-04-20 13:00:19 UTC

Not seen in 10 runs when it used to be reproducible in all runs, closing.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.