On CI https://intel-gfx-ci.01.org/CI/CI_DRM_2254/fi-ctg-l9400/igt@gem_exec_suspend@basic-s3.html Dmesg: https://intel-gfx-ci.01.org/CI/CI_DRM_2254/fi-ctg-l9400/dmesg-during.log Igt log: https://intel-gfx-ci.01.org/CI/CI_DRM_2254/fi-ctg-l9400/igt.log
[ 254.327753] =============================== [ 254.327753] [ ERR: suspicious RCU usage. ] [ 254.327753] 4.10.0-CI-CI_DRM_2254+ #1 Not tainted [ 254.327753] ------------------------------- [ 254.327753] kernel/workqueue.c:712 sched RCU or wq_pool_mutex should be held! [ 254.327753] other info that might help us debug this: [ 254.327753] RCU used illegally from offline CPU! rcu_scheduler_active = 2, debug_locks = 0 [ 254.327753] no locks held by swapper/1/0. [ 254.327753] stack backtrace: [ 254.327753] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.10.0-CI-CI_DRM_2254+ #1 [ 254.327753] Hardware name: LENOVO 7465CTO/7465CTO, BIOS 6DET44WW (2.08 ) 04/22/2009 [ 254.327753] Call Trace: [ 254.327753] dump_stack+0x67/0x92 [ 254.327753] lockdep_rcu_suspicious+0xd7/0x110 [ 254.327753] get_work_pool+0xb1/0xc0 [ 254.327753] __queue_work+0x89/0x620 [ 254.327753] queue_work_on+0x33/0x70 [ 254.327753] clear_sched_clock_stable+0x33/0x40 [ 254.327753] early_init_intel+0x10a/0x310 [ 254.327753] init_intel+0x11/0x310 [ 254.327753] identify_cpu+0x309/0x520 [ 254.327753] identify_secondary_cpu+0x18/0x80 [ 254.327753] smp_store_cpu_info+0x3a/0x40 [ 254.327753] start_secondary+0x53/0x120 [ 254.327753] start_cpu+0x14/0x14 [ 254.337007] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/1:2/105 [ 254.337007] caller is debug_smp_processor_id+0x17/0x20 [ 254.337007] CPU: 0 PID: 105 Comm: kworker/1:2 Not tainted 4.10.0-CI-CI_DRM_2254+ #1 [ 254.337007] Disabled fast string operations [ 254.337007] Hardware name: LENOVO 7465CTO/7465CTO, BIOS 6DET44WW (2.08 ) 04/22/2009 [ 254.337007] Workqueue: events __clear_sched_clock_stable [ 254.337007] Call Trace: [ 254.337007] dump_stack+0x67/0x92 [ 254.337007] check_preemption_disabled+0xd7/0xe0 [ 254.337007] debug_smp_processor_id+0x17/0x20 [ 254.337007] __clear_sched_clock_stable+0x11/0x60 [ 254.337007] process_one_work+0x1f4/0x6d0 [ 254.337007] ? process_one_work+0x16e/0x6d0 [ 254.337007] worker_thread+0x49/0x4a0 [ 254.337007] kthread+0x107/0x140 [ 254.337007] ? process_one_work+0x6d0/0x6d0 [ 254.337007] ? kthread_create_on_node+0x40/0x40 [ 254.337007] ret_from_fork+0x2e/0x40 [ 254.337007] sched_clock: Marking unstable (254336007317, -321214989128)<-(-66878981811, 0) [ 254.758282] cache: parent cpu1 should not be sleeping
Date: Wed, 8 Mar 2017 14:16:56 -0800 From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Subject: RCU used on incoming CPU before rcu_cpu_starting() called Message-Id: <20170308221656.GA11949@linux.vnet.ibm.com>
(In reply to Chris Wilson from comment #2) > Date: Wed, 8 Mar 2017 14:16:56 -0800 > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > Subject: RCU used on incoming CPU before rcu_cpu_starting() called > Message-Id: <20170308221656.GA11949@linux.vnet.ibm.com> According to <20170320125000.GG3637@linux.vnet.ibm.com>, the commit below (present in 4.11-rc2) should have fixed this condition: f94c8d116997 ("sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface")
Until it happens again...
(In reply to Chris Wilson from comment #4) > Until it happens again... It did: https://intel-gfx-ci.01.org/CI/CI_DRM_2610/fi-ctg-l9400/igt@gem_exec_suspend@basic-s3.html
(In reply to Martin Peres from comment #5) > (In reply to Chris Wilson from comment #4) > > Until it happens again... > > It did: > https://intel-gfx-ci.01.org/CI/CI_DRM_2610/fi-ctg-l9400/ > igt@gem_exec_suspend@basic-s3.html In the last 6 runs, it failed 5 times.
gem_exec_suspend@basic-s3 incomplete on fi-skl-6260u. https://intel-gfx-ci.01.org/CI/Patchwork_4753/fi-skl-6260u/igt@gem_exec_suspend@basic-s3.html
Forgot to add that the patch in question does not touch skl codepaths, so it is in CI_DRM_2635.
As Martin noted, very likely that skl is a separate. So made a new bug to track skl. https://bugs.freedesktop.org/show_bug.cgi?id=101098
So, we found that this particular machine had a very weird clock drift when suspended. We swapped this machine for another one which does not show this behaviour. I'll re-open if the problem re-appears.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.