Bug 106085

Summary: [CI] igt@drv_selftest@mock_breadcrumbs - breadcrumbs returned 10000, conflicting with selftest's magic values!
Product: DRI
Reporter: Martin Peres <martin.peres>
Component: DRM/Intel
Assignee: Francesco Balestrieri <francesco.balestrieri>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: highest
CC: intel-gfx-bugs
Version: XOrg git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: ALL
i915 features: GEM/Other
Bug Depends on:
Bug Blocks: 106319

Description Martin Peres 2018-04-16 18:01:54 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4055/shard-snb1/igt@drv_selftest@mock_breadcrumbs.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4055/shard-kbl1/igt@drv_selftest@mock_breadcrumbs.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4055/shard-glkb4/igt@drv_selftest@mock_breadcrumbs.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4055/shard-glk2/igt@drv_selftest@mock_breadcrumbs.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4055/shard-apl5/igt@drv_selftest@mock_breadcrumbs.html

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4055/shard-hsw1/igt@drv_selftest@mock_breadcrumbs.html

[   53.726304] Setting dangerous option mock_selftests - tainting kernel
[   54.693605] Timed out waiting for 0 remaining waiters
[   54.913754] i915/intel_breadcrumbs_mock_selftests: igt_wakeup failed with error 10000
[   54.930617] ------------[ cut here ]------------
[   54.930620] breadcrumbs returned 10000, conflicting with selftest's magic values!
[   54.930682] WARNING: CPU: 3 PID: 1320 at drivers/gpu/drm/i915/selftests/i915_selftest.c:149 __run_selftests+0x13a/0x1b0 [i915]
[   54.930684] Modules linked in: i915(+) snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic btusb btrtl btbcm x86_pkg_temp_thermal intel_powerclamp btintel coretemp crct10dif_pclmul crc32_pclmul snd_hda_codec bluetooth snd_hwdep snd_hda_core ghash_clmulni_intel e1000e snd_pcm ecdh_generic mei_me mei prime_numbers [last unloaded: i915]
[   54.930713] CPU: 3 PID: 1320 Comm: drv_selftest Tainted: G     U            4.17.0-rc1-CI-CI_DRM_4055+ #1
[   54.930714] Hardware name:  /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
[   54.930752] RIP: 0010:__run_selftests+0x13a/0x1b0 [i915]
[   54.930753] RSP: 0018:ffffc900002c3c60 EFLAGS: 00010282
[   54.930756] RAX: 0000000000000000 RBX: ffffffffa0758370 RCX: 0000000000000001
[   54.930757] RDX: 0000000080000001 RSI: ffffffff820c21ce RDI: 00000000ffffffff
[   54.930758] RBP: ffffffffa0758448 R08: 0000000000000001 R09: 0000000000000001
[   54.930759] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   54.930760] R13: ffff880271370040 R14: ffff880271370040 R15: ffffffffa076d450
[   54.930762] FS:  00007f02609b3980(0000) GS:ffff88027ed80000(0000) knlGS:0000000000000000
[   54.930763] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   54.930764] CR2: 000055e2776f6140 CR3: 0000000271158004 CR4: 00000000003606e0
[   54.930765] Call Trace:
[   54.930767]  ? 0xffffffffa07cc000
[   54.930801]  i915_mock_selftests+0x27/0x50 [i915]
[   54.930834]  i915_init+0x7/0x68 [i915]
[   54.930836]  ? 0xffffffffa07cc000
[   54.930838]  do_one_initcall+0x9f/0x370
[   54.930842]  ? rcu_read_lock_sched_held+0x6f/0x80
[   54.930844]  ? kmem_cache_alloc_trace+0x264/0x2d0
[   54.930848]  do_init_module+0x56/0x1ea
[   54.930850]  load_module+0x2431/0x2e00
[   54.930853]  ? show_coresize+0x20/0x20
[   54.930862]  ? __se_sys_finit_module+0x95/0xe0
[   54.930864]  __se_sys_finit_module+0x95/0xe0
[   54.930870]  do_syscall_64+0x4f/0x180
[   54.930873]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   54.930874] RIP: 0033:0x7f0260062839
[   54.930876] RSP: 002b:00007ffc8935b2b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   54.930878] RAX: ffffffffffffffda RBX: 000055aad6e4dcf0 RCX: 00007f0260062839
[   54.930879] RDX: 0000000000000000 RSI: 000055aad6e522c0 RDI: 0000000000000004
[   54.930880] RBP: 000055aad6e522c0 R08: 0000000000000004 R09: 0000000000000000
[   54.930881] R10: 00007ffc8935b420 R11: 0000000000000246 R12: 0000000000000000
[   54.930882] R13: 000055aad6e51af0 R14: 0000000000000000 R15: 000000000000003f
[   54.930888] Code: 74 4e 4c 89 e7 ff 53 10 83 f8 fc 74 4b 83 f8 00 74 b9 7f 05 83 f8 e7 75 4f 48 8b 73 08 89 c2 48 c7 c7 98 6a 6e a0 e8 b6 37 a2 e0 <0f> 0b b8 ff ff ff ff eb 34 48 8b 53 08 48 c7 c6 44 d2 6b a0 48 
[   54.930980] WARNING: CPU: 3 PID: 1320 at drivers/gpu/drm/i915/selftests/i915_selftest.c:149 __run_selftests+0x13a/0x1b0 [i915]
[   54.930981] irq event stamp: 711186
[   54.930984] hardirqs last  enabled at (711185): [<ffffffff810f7bcf>] console_unlock+0x47f/0x650
[   54.930986] hardirqs last disabled at (711186): [<ffffffff81a0111c>] error_entry+0x7c/0x100
[   54.930987] softirqs last  enabled at (710182): [<ffffffff81c003a1>] __do_softirq+0x3a1/0x4aa
[   54.930989] softirqs last disabled at (710161): [<ffffffff8108b0f4>] irq_exit+0xa4/0xb0
[   54.930991] ---[ end trace c21e682f6ab1bf11 ]---
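The warning suggests that the selftest runner reserves some return values for itself. The `Code:` bytes above appear to compare the subtest's result against -4 (`-EINTR`), 0, and -25 (`-ENOTTY`), and to jump to the WARN (`0f 0b`, `ud2`) when the result is positive or equal to -25. A minimal sketch of that classification follows. It assumes, based only on that disassembly and the message text, that positive values and `-ENOTTY` are the reserved ones. `selftest_result_conflicts()` is a hypothetical helper, not the i915 source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical sketch of the check behind "returned %d, conflicting with
 * selftest's magic values!". Assumption: a subtest must return 0 or a
 * negative errno, and -ENOTTY is reserved by the harness itself.
 */
static bool selftest_result_conflicts(int err)
{
	/* Positive values (e.g. leftover jiffies from a wait helper) are not
	 * error codes at all. */
	if (err > 0)
		return true;

	/* -ENOTTY appears to carry a special meaning for the harness. */
	return err == -ENOTTY;
}
```

Under this reading, the 10000 returned by `igt_wakeup` trips the warning simply because it is positive; its particular value does not matter.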
Comment 1 Martin Peres 2018-04-16 18:02:51 UTC
This appeared with the new 4.17.0-rc1 kernel, so it is likely a core regression.
Comment 2 Chris Wilson 2018-04-17 15:27:02 UTC
For reference,

commit d224985a5e312ab05b624143a3fd9bb91b53e52a
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Thu Mar 15 11:41:39 2018 +0100

    sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new wait_var_event() API
    
    The old wait_on_atomic_t() is going to get removed, use the more
    flexible wait_var_event() API instead.
    
    Unlike wake_up_atomic_t(), wake_up_var() will issue the wakeup
    even if the variable is not 0.
    
    No change in functionality.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Daniel Vetter <daniel.vetter@intel.com>
    Cc: David Airlie <airlied@linux.ie>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

did affect mock_breadcrumbs. Whether that alone is the cause...
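The commit message notes that, unlike `wake_up_atomic_t()`, `wake_up_var()` issues the wakeup even while the variable is non-zero, so a waiter must re-check its condition after every wakeup. The kernel primitives are not available in userspace, so the following is a hypothetical POSIX-threads analogue. All names are invented for illustration: `pthread_cond_broadcast()` stands in for `wake_up_var()`, and the loop in `wait_for_zero()` stands in for `wait_var_event(var, value == 0)`.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a variable watched with wait_var_event(). */
struct watched_var {
	pthread_mutex_t lock;
	pthread_cond_t wq;
	int value;
};

/* Count down to zero, waking all waiters after *every* step, the way
 * wake_up_var() wakes regardless of the variable's current value. */
static void *count_down(void *arg)
{
	struct watched_var *v = arg;

	for (;;) {
		pthread_mutex_lock(&v->lock);
		if (v->value == 0) {
			pthread_mutex_unlock(&v->lock);
			return NULL;
		}
		v->value--;
		pthread_cond_broadcast(&v->wq);
		pthread_mutex_unlock(&v->lock);
	}
}

/* Analogue of wait_var_event(&v->value, v->value == 0): wakeups may arrive
 * while the value is still non-zero, so the condition is re-tested in a loop. */
static void wait_for_zero(struct watched_var *v)
{
	pthread_mutex_lock(&v->lock);
	while (v->value != 0)
		pthread_cond_wait(&v->wq, &v->lock);
	pthread_mutex_unlock(&v->lock);
}

/* Run one countdown from `start` (must be >= 0). Returns the value seen
 * after the waiter wakes for good (0), or -1 if the thread can't start. */
static int run_countdown(int start)
{
	struct watched_var v;
	pthread_t thread;
	int result;

	assert(start >= 0);
	pthread_mutex_init(&v.lock, NULL);
	pthread_cond_init(&v.wq, NULL);
	v.value = start;

	if (pthread_create(&thread, NULL, count_down, &v) != 0) {
		result = -1;
	} else {
		wait_for_zero(&v);
		pthread_join(thread, NULL);
		result = v.value;
	}

	pthread_cond_destroy(&v.wq);
	pthread_mutex_destroy(&v.lock);
	return result;
}
```

Because the waiter only leaves the loop once `value == 0` holds under the lock, the extra wakeups for non-zero values are harmless. With an `if` instead of `while`, the first wakeup would end the wait early.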
Comment 3 Chris Wilson 2018-04-17 15:46:57 UTC
Testing that commit confirms that it introduced the error in the test (as opposed to a later change in sched/wait.c). Still, a bug in wait_var is more likely than a bug in that patch (AFAICT), since the patch is a very simple replacement.
Comment 4 Chris Wilson 2018-04-23 07:35:02 UTC
*** Bug 106184 has been marked as a duplicate of this bug. ***
Comment 5 Chris Wilson 2018-05-02 10:29:05 UTC
commit 77cbe925bf77bd3159f49c4db0ea89a2045d9071
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 17 18:06:38 2018 +0100

    drm/i915/selftests: Fix error checking for wait_var_timeout
    
    The old wait_on_atomic_t used a custom callback to perform the
    schedule(), which used my return semantics of reporting an error code on
    timeout. wait_var_event_timeout() uses the schedule() return semantics
    of reporting the remaining jiffies (1 if it timed out with 0 jiffies
    remaining!) and 0 on failure. This semantic mismatch led to us falsely
    claiming a time out occurred.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106085
    Fixes: d224985a5e31 ("sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new wait_var_event() API")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180417170638.20550-1-chris@chris-wilson.co.uk
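The mismatch described in the fix can be modelled in a few lines. This is a hypothetical userspace sketch, not the selftest source. `mock_wait_timeout()` imitates the `wait_event_timeout()`-family return convention: 0 if the condition is still false when the timeout elapses, 1 if it holds only once the timeout has elapsed, otherwise the remaining jiffies. `check_old_style()` applies the old "non-zero means error" test, which assumes the callback returned an error code such as `-ETIMEDOUT` on timeout. `check_fixed()` applies the corrected test. The report does not state `CONFIG_HZ` or the timeout used. Assuming HZ=1000 and a 10 * HZ timeout, a condition that already holds (0 waiters remaining) yields 10000, the same value reported in the log.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of the wait_var_event_timeout() return value:
 *   0          condition still false when the timeout elapsed
 *   1          condition became true only as the timeout elapsed
 *   remaining  condition true before the timeout (always >= 1)
 * `true_after` is how many jiffies pass before the condition holds,
 * or a negative value if it never does.
 */
static long mock_wait_timeout(long true_after, long timeout)
{
	assert(timeout >= 0);
	if (true_after < 0 || true_after > timeout)
		return 0;
	return timeout - true_after > 0 ? timeout - true_after : 1;
}

/* Old-style check carried over from the wait_on_atomic_t() callback
 * (assumed to return 0 or -ETIMEDOUT): any non-zero value is treated as
 * a timeout and passed up unchanged. On the new return value, success
 * becomes a positive "error" and a genuine timeout (0) slips through. */
static int check_old_style(long ret)
{
	if (ret)
		return (int)ret;
	return 0;
}

/* Corrected check: only a zero return means the wait timed out. */
static int check_fixed(long ret)
{
	return ret ? 0 : -ETIMEDOUT;
}
```

For example, `check_old_style(mock_wait_timeout(0, 10 * 1000))` returns 10000 instead of 0. `check_fixed()` returns 0 for the same input and `-ETIMEDOUT` only when the model reports a genuine timeout.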
Comment 6 Martin Peres 2018-05-03 16:34:48 UTC
Thanks! It seems to be doing the trick. Closing :)
