Bug 108789 - [CI][BAT] igt@i915_selftest@live_sanitycheck - incomplete - general protection fault: 0000 [#1] PREEMPT SMP PTI
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-19 14:42 UTC by Martin Peres
Modified: 2018-12-28 08:43 UTC

See Also:
i915 platform:
i915 features:


Description Martin Peres 2018-11-19 14:42:18 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4720/fi-gdg-551/igt@i915_selftest@live_sanitycheck.html

<4> [491.786829] general protection fault: 0000 [#1] PREEMPT SMP PTI
<4> [491.786852] CPU: 0 PID: 6114 Comm: systemd-udevd Tainted: G     U            4.20.0-rc3-CI-CI_DRM_5159+ #1
<4> [491.786873] Hardware name: Dell Inc.                 OptiPlex GX280               /0G8310, BIOS A04 02/09/2005
<4> [491.786900] RIP: 0010:kernfs_find_ns+0x51/0x100
<4> [491.786913] Code: 20 41 0f 95 c7 85 c0 75 6d 4d 85 e4 0f 95 c0 44 38 f8 0f 85 88 00 00 00 4c 89 e6 4c 89 f7 e8 a6 f9 ff ff 89 c5 48 85 db 74 39 <3b> 6b 20 72 41 77 29 4c 3b 63 18 72 39 77 21 48 8b 73 f8 4c 89 f7
<4> [491.786948] RSP: 0018:ffffc900001a3b90 EFLAGS: 00010202
<4> [491.786962] RAX: 000000004684866e RBX: 0701200107012001 RCX: ffff888039f26ebe
<4> [491.786978] RDX: 0000000000000720 RSI: 000000004684866e RDI: 0000000000000006
<4> [491.786993] RBP: 000000004684866e R08: 0000000082c49ee6 R09: 0000000000000001
<4> [491.787009] R10: ffffc900001a3bc8 R11: ffffffff8226cde0 R12: 0000000000000000
<4> [491.787024] R13: ffff88803ca949b8 R14: ffff888039f26eb8 R15: 0000000000000000
<4> [491.787040] FS:  00007f6384e29680(0000) GS:ffff88803e000000(0000) knlGS:0000000000000000
<4> [491.787058] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [491.787071] CR2: 000055e0ef98a130 CR3: 000000002f97c000 CR4: 00000000000006f0
<4> [491.787087] Call Trace:
<4> [491.787100]  kernfs_iop_lookup+0x46/0xa0
<4> [491.787115]  __lookup_slow+0xfc/0x1d0
<4> [491.787135]  lookup_slow+0x30/0x50
<4> [491.787148]  walk_component+0x1ba/0x2d0
<4> [491.787161]  ? path_init+0x3db/0x510
<4> [491.787173]  ? getname_flags+0x2d/0x180
<4> [491.787185]  ? set_track+0x90/0x140
<4> [491.787199]  path_lookupat+0x69/0x200
<4> [491.787212]  ? ___slab_alloc.constprop.34+0x21c/0x380
<4> [491.787227]  ? ___slab_alloc.constprop.34+0x21c/0x380
<4> [491.787243]  filename_lookup+0xb1/0x140
<4> [491.787259]  ? getname_flags+0x2d/0x180
<4> [491.787274]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [491.787287]  ? kmem_cache_alloc+0x24d/0x280
<4> [491.787304]  ? do_readlinkat+0x58/0x110
<4> [491.787316]  do_readlinkat+0x58/0x110
<4> [491.787331]  __x64_sys_readlinkat+0x15/0x20
<4> [491.787344]  do_syscall_64+0x55/0x190
<4> [491.787359]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [491.787373] RIP: 0033:0x7f638493fd1a
<4> [491.787385] Code: 48 8b 0d 71 91 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 0b 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3e 91 2d 00 f7 d8 64 89 01 48
<4> [491.787421] RSP: 002b:00007fff1824ef48 EFLAGS: 00000202 ORIG_RAX: 000000000000010b
<4> [491.787440] RAX: ffffffffffffffda RBX: 000055e0ef98b9b0 RCX: 00007f638493fd1a
<4> [491.787457] RDX: 000055e0ef98b9b0 RSI: 00007fff1824efd0 RDI: 00000000ffffff9c
<4> [491.787473] RBP: 0000000000000064 R08: fefefefefefefeff R09: fefefeff6b6073ff
<4> [491.787489] R10: 0000000000000063 R11: 0000000000000202 R12: 00007fff1824efd0
<4> [491.787505] R13: 00000000ffffff9c R14: 00007fff1824efa0 R15: 0000000000000063
<4> [491.787527] Modules linked in: i915(+) amdgpu chash gpu_sched ttm snd_hda_codec snd_hwdep snd_hda_core snd_pcm vgem i2c_i801 lpc_ich tg3 prime_numbers [last unloaded: i915]
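
The marked bytes in the Code: line (`<3b> 6b 20`) appear to decode to `cmp 0x20(%rbx),%ebp`, so the fault would be a read through RBX. A minimal sketch of why RBX = 0701200107012001 cannot be dereferenced, assuming 48-bit virtual addresses (4-level paging), where bits 63..47 must all be equal. The helper name `is_canonical` is made up for illustration.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if bits 63..(va_bits - 1) of addr are all 0 or all 1."""
    top = addr >> (va_bits - 1)                 # bits 63..47 when va_bits == 48
    all_ones = (1 << (64 - va_bits + 1)) - 1    # 17 one-bits when va_bits == 48
    return top == 0 or top == all_ones


RBX = 0x0701200107012001   # register used as the base of the faulting access
R14 = 0xFFFF888039F26EB8   # an ordinary-looking kernel pointer from the same dump

print(is_canonical(RBX))   # RBX mixes zero and non-zero upper bits
print(is_canonical(R14))   # R14 is sign-extended from bit 47
```

A non-canonical address raises a general protection fault rather than a page fault, which would be consistent with the oops type reported here.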
Comment 1 Chris Wilson 2018-11-19 14:46:18 UTC
I'm sure we've filed that stack trace before. Superficially it's not our bug; the unlikely case would be that it is some previously unknown memory corruption of ours.
Comment 2 Chris Wilson 2018-11-19 15:11:08 UTC
Speaking of memory corruption, what followed was:

<3> [491.818914] =============================================================================
<3> [491.818942] BUG kmalloc-16 (Tainted: G     UD          ): Padding overwritten. 0x00000000ee2a4830-0x000000006d9ceb26
<3> [491.818966] -----------------------------------------------------------------------------\x0a
<3> [491.818992] INFO: Slab 0x00000000fe774f07 objects=23 used=23 fp=0x          (null) flags=0x10200
<4> [491.819016] CPU: 0 PID: 6110 Comm: i915_selftest Tainted: G    BUD           4.20.0-rc3-CI-CI_DRM_5159+ #1
<4> [491.819037] Hardware name: Dell Inc.                 OptiPlex GX280               /0G8310, BIOS A04 02/09/2005
<4> [491.819058] Call Trace:
<4> [491.819074]  dump_stack+0x67/0x9b
<4> [491.819088]  slab_err+0xa8/0xd0
<4> [491.819104]  ? _raw_spin_unlock+0x29/0x40
<4> [491.819117]  ? get_partial_node.isra.29+0x1f1/0x460
<4> [491.819134]  slab_pad_check.part.11+0xe0/0x160
<4> [491.819291]  ? i915_pmu_register+0x17b/0x5c0 [i915]
<4> [491.819306]  check_slab+0x5c/0xb0
<4> [491.819318]  alloc_debug_processing+0x97/0x190
<4> [491.819333]  ___slab_alloc.constprop.34+0x355/0x380
<4> [491.819427]  ? i915_pmu_register+0x17b/0x5c0 [i915]
<4> [491.819444]  ? lock_acquire+0xa6/0x1c0
<4> [491.819541]  ? i915_pmu_register+0x17b/0x5c0 [i915]
<4> [491.819556]  ? __slab_alloc.isra.27.constprop.33+0x3d/0x70
<4> [491.819571]  __slab_alloc.isra.27.constprop.33+0x3d/0x70
<4> [491.819666]  ? i915_pmu_register+0x17b/0x5c0 [i915]
<4> [491.819680]  __kmalloc_track_caller+0x29c/0x2e0
<4> [491.819697]  kstrdup+0x28/0x50
<4> [491.819789]  i915_pmu_register+0x17b/0x5c0 [i915]
<4> [491.819883]  i915_driver_load+0x907/0x1550 [i915]
<4> [491.819900]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
<4> [491.819914]  ? lockdep_hardirqs_on+0xe0/0x1b0
<4> [491.820005]  i915_pci_probe+0x29/0xa0 [i915]
<4> [491.820021]  pci_device_probe+0xa1/0x130
<4> [491.820036]  really_probe+0xf3/0x3e0
<4> [491.820050]  driver_probe_device+0x10a/0x120
<4> [491.820063]  __driver_attach+0xdb/0x100
<4> [491.820075]  ? driver_probe_device+0x120/0x120
<4> [491.820089]  ? driver_probe_device+0x120/0x120
<4> [491.820102]  bus_for_each_dev+0x74/0xc0
<4> [491.820116]  bus_add_driver+0x15f/0x250
<4> [491.820128]  ? 0xffffffffa0974000
<4> [491.820140]  driver_register+0x56/0xe0
<4> [491.820152]  ? 0xffffffffa0974000
<4> [491.820163]  do_one_initcall+0x58/0x2e0
<4> [491.820177]  ? do_init_module+0x1d/0x1ea
<4> [491.820190]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [491.820203]  ? kmem_cache_alloc_trace+0x264/0x290
<4> [491.820219]  do_init_module+0x56/0x1ea
<4> [491.820232]  load_module+0x2714/0x29f0
<4> [491.820256]  ? __se_sys_finit_module+0xd3/0xf0
<4> [491.820268]  __se_sys_finit_module+0xd3/0xf0
<4> [491.820287]  do_syscall_64+0x55/0x190
<4> [491.820300]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [491.820314] RIP: 0033:0x7f2b17b62839
<4> [491.820327] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
<4> [491.820364] RSP: 002b:00007ffc4d7a24e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4> [491.820383] RAX: ffffffffffffffda RBX: 0000564b16aa6b50 RCX: 00007f2b17b62839
<4> [491.820398] RDX: 0000000000000000 RSI: 0000564b16aab280 RDI: 0000000000000006
<4> [491.820414] RBP: 0000564b16aab280 R08: 0000000000000004 R09: 0000000000000000
<4> [491.820430] R10: 00007ffc4d7a2660 R11: 0000000000000246 R12: 0000000000000000
<4> [491.820445] R13: 0000564b16aa1700 R14: 0000000000000020 R15: 000000000000003f
<3> [491.820469] Padding 00000000ee2a4830: 01 20 01 07 01 20 01 07 01 20 01 07 01 20 01 07  . ... ... ... ..
<3> [491.820490] Padding 00000000576e7a7d: 01 20 01 07 01 20 01 07 01 20 01 07 01 20 01 07  . ... ... ... ..
<3> [491.820511] Padding 0000000068eccb55: 01 20 01 07 01 20 01 07 01 20 01 07 01 20 01 07  . ... ... ... ..
<3> [491.820531] Padding 000000003bb3f0f3: 01 20 01 07 01 20 01 07 01 20 01 07 01 20 01 07  . ... ... ... ..
<3> [491.820551] Padding 000000000fa445d6: 01 20 01 07 01 20 01 07 01 20 01 07 01 20 01 07  . ... ... ... ..
<3> [491.820572] Padding 000000009589c07d: 01 20 01 07 01 20 01 07 01 20 01 07 01 20 01 07  . ... ... ... ..
<3> [491.820593] FIX kmalloc-16: Restoring 0x00000000ee2a4830-0x000000006d9ceb26=0x5a\x0a
<3> [491.820614] =============================================================================
<3> [491.820632] BUG kmalloc-16 (Tainted: G    BUD          ): Redzone overwritten
<3> [491.820647] -----------------------------------------------------------------------------\x0a
<3> [491.820670] INFO: 0x00000000d994a43d-0x00000000b6634830. First byte 0x1 instead of 0xbb
<3> [491.820689] INFO: Allocated in 0x701200107012001 age=17942024255977933694 cpu=117514241 pid=117514241
<3> [491.820710] \x090x701200107012001
<3> [491.820719] \x090x701200107012001
<3> [491.820728] \x090x701200107012001
<3> [491.820737] \x090x701200107012001
<3> [491.820747] \x090x701200107012001
<3> [491.820756] \x090x701200107012001
<3> [491.820765] \x090x701200107012001
<3> [491.820774] \x090x701200107012001
<3> [491.820783] \x090x701200107012001
<3> [491.820792] \x090x701200107012001
<3> [491.820801] \x090x701200107012001
<3> [491.820810] \x090x701200107012001
<3> [491.820819] \x090x701200107012001
<3> [491.820828] \x090x701200107012001
<3> [491.820837] \x090x701200107012001
<3> [491.820846] \x090x701200107012001
<3> [491.820857] INFO: Freed in 0x701200107012001 age=17942024255977933694 cpu=117514241 pid=117514241
<3> [491.820876] \x090x701200107012001
<3> [491.820885] \x090x701200107012001
<3> [491.820894] \x090x701200107012001
<3> [491.820902] \x090x701200107012001
<3> [491.820911] \x090x701200107012001
<3> [491.820920] \x090x701200107012001
<3> [491.820929] \x090x701200107012001
<3> [491.820938] \x090x701200107012001
<3> [491.820947] \x090x701200107012001
<3> [491.820956] \x090x701200107012001
<3> [491.820965] \x090x701200107012001
<3> [491.820974] \x090x701200107012001
<3> [491.820982] \x090x701200107012001
<3> [491.820991] \x090x701200107012001
<3> [491.821000] \x090x701200107012001
<3> [491.821009] \x090x701200107012001
<3> [491.821019] INFO: Slab 0x00000000fe774f07 objects=23 used=23 fp=0x          (null) flags=0x10200
<3> [491.821039] INFO: Object 0x00000000edaed870 @offset=3880 fp=0x0000000044883889\x0a
<3> [491.821060] Redzone 00000000d994a43d: 01 20 01 07 01 20 01 07                          . ... ..
<3> [491.821080] Object 00000000edaed870: 01 20 01 07 01 20 01 07 01 20 01 07 01 20 01 07  . ... ... ... ..
<3> [491.821103] Redzone 00000000662a40fe: 01 20 01 07 01 20 01 07                          . ... ..
<3> [491.821125] Padding 000000008d5fd6ed: 01 20 01 07 01 20 01 07                          . ... ..
<4> [491.821146] CPU: 0 PID: 6110 Comm: i915_selftest Tainted: G    BUD           4.20.0-rc3-CI-CI_DRM_5159+ #1
<4> [491.821166] Hardware name: Dell Inc.                 OptiPlex GX280               /0G8310, BIOS A04 02/09/2005
<4> [491.821186] Call Trace:
<4> [491.821197]  dump_stack+0x67/0x9b
<4> [491.821210]  check_bytes_and_report+0xbd/0x100
<4> [491.821225]  check_object+0x184/0x280
<4> [491.821319]  ? i915_pmu_register+0x17b/0x5c0 [i915]
<4> [491.821333]  alloc_debug_processing+0x183/0x190
<4> [491.821348]  ___slab_alloc.constprop.34+0x355/0x380
<4> [491.821440]  ? i915_pmu_register+0x17b/0x5c0 [i915]
<4> [491.821456]  ? lock_acquire+0xa6/0x1c0
<4> [491.821548]  ? i915_pmu_register+0x17b/0x5c0 [i915]
<4> [491.821562]  ? __slab_alloc.isra.27.constprop.33+0x3d/0x70
<4> [491.821577]  __slab_alloc.isra.27.constprop.33+0x3d/0x70
<4> [491.821671]  ? i915_pmu_register+0x17b/0x5c0 [i915]
<4> [491.821685]  __kmalloc_track_caller+0x29c/0x2e0
<4> [491.821701]  kstrdup+0x28/0x50
<4> [491.821791]  i915_pmu_register+0x17b/0x5c0 [i915]
<4> [491.821884]  i915_driver_load+0x907/0x1550 [i915]
<4> [491.821900]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
<4> [491.821915]  ? lockdep_hardirqs_on+0xe0/0x1b0
<4> [491.822005]  i915_pci_probe+0x29/0xa0 [i915]
<4> [491.822019]  pci_device_probe+0xa1/0x130
<4> [491.822033]  really_probe+0xf3/0x3e0
<4> [491.822046]  driver_probe_device+0x10a/0x120
<4> [491.822060]  __driver_attach+0xdb/0x100
<4> [491.822072]  ? driver_probe_device+0x120/0x120
<4> [491.822085]  ? driver_probe_device+0x120/0x120
<4> [491.822098]  bus_for_each_dev+0x74/0xc0
<4> [491.822112]  bus_add_driver+0x15f/0x250
<4> [491.822124]  ? 0xffffffffa0974000
<4> [491.822135]  driver_register+0x56/0xe0
<4> [491.822147]  ? 0xffffffffa0974000
<4> [491.822157]  do_one_initcall+0x58/0x2e0
<4> [491.822170]  ? do_init_module+0x1d/0x1ea
<4> [491.822182]  ? rcu_read_lock_sched_held+0x6f/0x80
<4> [491.822195]  ? kmem_cache_alloc_trace+0x264/0x290
<4> [491.822211]  do_init_module+0x56/0x1ea
<4> [491.822224]  load_module+0x2714/0x29f0
<4> [491.822247]  ? __se_sys_finit_module+0xd3/0xf0
<4> [491.822260]  __se_sys_finit_module+0xd3/0xf0
<4> [491.822278]  do_syscall_64+0x55/0x190
<4> [491.822291]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [491.822305] RIP: 0033:0x7f2b17b62839
<4> [491.822315] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
<4> [491.822351] RSP: 002b:00007ffc4d7a24e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4> [491.822370] RAX: ffffffffffffffda RBX: 0000564b16aa6b50 RCX: 00007f2b17b62839
<4> [491.822385] RDX: 0000000000000000 RSI: 0000564b16aab280 RDI: 0000000000000006
<4> [491.822401] RBP: 0000564b16aab280 R08: 0000000000000004 R09: 0000000000000000
<4> [491.822417] R10: 00007ffc4d7a2660 R11: 0000000000000246 R12: 0000000000000000
<4> [491.822432] R13: 0000564b16aa1700 R14: 0000000000000020 R15: 000000000000003f
<3> [491.822456] FIX kmalloc-16: Restoring 0x00000000d994a43d-0x00000000b6634830=0xbb\x0a
<3> [491.822476] FIX kmalloc-16: Marking all objects used
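
The repeating `01 20 01 07` bytes in the Padding, Redzone and Object dumps look like the same value seen in RBX in the first oops and in the bogus `cpu=`/`pid=` fields of the track info. A small sketch to check that, assuming little-endian byte order. The fill-byte constants are believed to match include/linux/poison.h and are an assumption here, and `first_bad_offset` is a hypothetical helper, not the kernel's checker.

```python
import struct

# One 16-byte row copied from the "Padding" dump in the log above.
row = bytes.fromhex("01 20 01 07 01 20 01 07 01 20 01 07 01 20 01 07")

# Interpret the row as little-endian 64-bit and 32-bit words (x86-64).
q0, q1 = struct.unpack("<2Q", row)
d0 = struct.unpack("<4I", row)[0]

RBX_FROM_OOPS = 0x0701200107012001   # RBX in the general protection fault
CPU_PID_FROM_TRACK = 117514241       # "cpu=" and "pid=" in the track info

print(hex(q0), q0 == q1 == RBX_FROM_OOPS)
print(d0, d0 == CPU_PID_FROM_TRACK)

# Assumed SLUB debug fill bytes (believed to match include/linux/poison.h).
POISON_INUSE = 0x5A        # value the padding is restored to in the log
SLUB_RED_INACTIVE = 0xBB   # value the redzone is restored to in the log


def first_bad_offset(buf: bytes, expected: int) -> int:
    """Offset of the first byte that differs from expected, or -1 if intact."""
    for i, b in enumerate(buf):
        if b != expected:
            return i
    return -1


# The redzone row starts with 0x01, matching "First byte 0x1 instead of 0xbb".
print(first_bad_offset(row[:8], SLUB_RED_INACTIVE))
```

If the values line up, the kernfs pointer, the slab padding, the redzones and the allocation tracking data were all overwritten with one pattern, which points at a single stray write over the slab page rather than several unrelated bugs.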
Comment 3 Chris Wilson 2018-11-19 19:08:47 UTC
commit dafdf69736d66075836b7bc291584cd0889e7601 (HEAD -> topic/core-for-CI, drm-intel/topic/core-for-CI)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri May 12 12:45:25 2017 +0100

    perf/core: Avoid removing shared pmu_context on unregister
    
    In commit 1fd7e4169954 ("perf/core: Remove perf_cpu_context::unique_pmu"),
    the search for another user of the pmu_cpu_context was removed, and so
    we unconditionally free it during perf_pmu_unregister. This leads to
    random corruption later and a BUG at mm/percpu.c:689.
    
    v2: Check for shared pmu_contexts under the mutex.
    
    Fixes: 1fd7e4169954 ("perf/core: Remove perf_cpu_context::unique_pmu")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: David Carrillo-Cisneros <davidcc@google.com>
    Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: <stable@vger.kernel.org> # v4.11+
    Link: http://patchwork.freedesktop.org/patch/msgid/20170512114525.17575-1-chris@chris-wilson.co.uk
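
The idea described in the commit message, only releasing a shared context once no other registered PMU still points at it and doing the check under the mutex, can be modelled roughly as below. This is a toy user-space model, not the kernel patch: `PmuRegistry`, `ctx_key` and the dict-based context are all invented for illustration.

```python
import threading


class PmuRegistry:
    """Toy model: PMUs registered with the same key share one context."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pmus = {}   # PMU name -> shared context (a plain dict here)

    def register(self, name, ctx_key):
        with self._lock:
            # Reuse an existing context when another PMU has the same key.
            for ctx in self._pmus.values():
                if ctx["key"] == ctx_key:
                    self._pmus[name] = ctx
                    return ctx
            ctx = {"key": ctx_key, "freed": False}
            self._pmus[name] = ctx
            return ctx

    def unregister(self, name):
        """Drop a PMU; free its context only if nobody else shares it."""
        with self._lock:
            ctx = self._pmus.pop(name)
            # The search for another user happens while holding the lock.
            if any(other is ctx for other in self._pmus.values()):
                return False          # still shared, must not free
            ctx["freed"] = True       # stands in for releasing the memory
            return True


reg = PmuRegistry()
a = reg.register("pmu_a", ctx_key=0)
b = reg.register("pmu_b", ctx_key=0)
print(a is b)                    # both PMUs share one context
print(reg.unregister("pmu_a"))   # context survives, pmu_b still uses it
print(reg.unregister("pmu_b"))   # last user gone, context is released
```

Freeing unconditionally in `unregister` would leave `pmu_b` holding a released context, which is the kind of later, seemingly random corruption the commit message describes.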
Comment 4 Francesco Balestrieri 2018-12-28 08:43:44 UTC
No more occurrences, closing.

