Bug 106379 - [CI] igt@perf@oa-formats - dmesg-warn - BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1342
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-05-03 13:51 UTC by Martin Peres
Modified: 2018-06-04 14:31 UTC
CC List: 1 user

See Also:
i915 platform: HSW
i915 features: Perf/OA


Description Martin Peres 2018-05-03 13:51:29 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_28/fi-hsw-4770/igt@perf@oa-formats.html

[   38.237985] [drm] Skipping spurious, invalid OA report
[   38.237986] [drm] Skipping spurious, invalid OA report
[   38.237987] [drm] Skipping spurious, invalid OA report
[   38.237991] [drm] Skipping spurious, invalid OA report
[   38.238251] BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1342
[   38.238365] in_atomic(): 0, irqs_disabled(): 1, pid: 209, name: systemd-journal
[   38.238383] 4 locks held by systemd-journal/209:
[   38.238384]  #0: 00000000030d6f36 (&mm->mmap_sem){++++}, at: __do_page_fault+0x116/0x590
[   38.238396]  #1: 0000000054155f5b (sb_pagefaults){.+.+}, at: ext4_page_mkwrite+0x56/0x4f0
[   38.238404]  #2: 000000003e29ab5e (&ei->i_mmap_sem){++++}, at: ext4_page_mkwrite+0x6a/0x4f0
[   38.238411]  #3: 00000000030d6f36 (&mm->mmap_sem){++++}, at: __do_page_fault+0x116/0x590
[   38.238419] irq event stamp: 955744
[   38.238423] hardirqs last  enabled at (955743): [<ffffffffb794769c>] _raw_spin_unlock_irqrestore+0x4c/0x60
[   38.238426] hardirqs last disabled at (955744): [<ffffffffb71f9c49>] __slab_alloc.isra.27.constprop.33+0x19/0x70
[   38.238428] softirqs last  enabled at (955292): [<ffffffffb7c0032b>] __do_softirq+0x32b/0x4e1
[   38.238431] softirqs last disabled at (955271): [<ffffffffb708f6c4>] irq_exit+0xa4/0xb0
[   38.238434] CPU: 5 PID: 209 Comm: systemd-journal Not tainted 4.17.0-rc3-g1d2a421b1f9b-drmtip_28+ #1
[   38.238435] Hardware name: LENOVO 10AGS00601/SHARKBAY, BIOS FBKT34AUS 04/24/2013
[   38.238436] Call Trace:
[   38.238441]  dump_stack+0x67/0x9b
[   38.238445]  ___might_sleep+0x167/0x250
[   38.238448]  __do_page_fault+0x133/0x590
[   38.238454]  page_fault+0x1e/0x30
[   38.238456] RIP: 0010:deactivate_slab.isra.26+0x1bb/0x8d0
[   38.238458] RSP: 0000:ffffb6ad803b7990 EFLAGS: 00010086
[   38.238460] RAX: 0000000080000000 RBX: 000000000005099d RCX: 0000000000000001
[   38.238463] RDX: 0000000080000001 RSI: 0000000000000070 RDI: 00000000ffffffff
[   38.238464] RBP: ffffb6ad803b7a90 R08: ffff96a3deb66c00 R09: ffff96a3924203e0
[   38.238465] R10: ffffb6ad803b7ab0 R11: 0000000000000000 R12: fffff7ef8f490800
[   38.238467] R13: ffff96a392421138 R14: ffff96a3ccd89a40 R15: 000000018025001d
[   38.238475]  ? __kernel_text_address+0x9/0x30
[   38.238479]  ? __save_stack_trace+0x8d/0xf0
[   38.238486]  ? alloc_buffer_head+0x18/0x80
[   38.238488]  ? set_track+0x90/0x140
[   38.238490]  ? init_object+0x66/0x80
[   38.238494]  ? ___slab_alloc.constprop.34+0x232/0x3e0
[   38.238496]  ___slab_alloc.constprop.34+0x232/0x3e0
[   38.238498]  ? alloc_buffer_head+0x18/0x80
[   38.238502]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
[   38.238506]  ? trace_hardirqs_on_caller+0xe0/0x1b0
[   38.238511]  ? alloc_buffer_head+0x18/0x80
[   38.238513]  ? __slab_alloc.isra.27.constprop.33+0x3d/0x70
[   38.238515]  __slab_alloc.isra.27.constprop.33+0x3d/0x70
[   38.238518]  ? alloc_buffer_head+0x18/0x80
[   38.238520]  kmem_cache_alloc+0x234/0x2c0
[   38.238524]  alloc_buffer_head+0x18/0x80
[   38.238526]  alloc_page_buffers+0x6b/0xb0
[   38.238530]  create_empty_buffers+0x14/0x100
[   38.238533]  create_page_buffers+0x47/0x50
[   38.238535]  __block_write_begin_int+0x89/0x590
[   38.238538]  ? ext4_inode_attach_jinode.part.18+0xa0/0xa0
[   38.238542]  ? ext4_inode_attach_jinode.part.18+0xa0/0xa0
[   38.238545]  block_page_mkwrite+0xab/0xf0
[   38.238548]  ext4_page_mkwrite+0x3d4/0x4f0
[   38.238554]  do_page_mkwrite+0x2c/0xa0
[   38.238557]  do_wp_page+0x1fc/0x4b0
[   38.238561]  __handle_mm_fault+0x7c4/0xe20
[   38.238569]  handle_mm_fault+0x196/0x3a0
[   38.238575]  __do_page_fault+0x295/0x590
[   38.238581]  ? page_fault+0x8/0x30
[   38.238599]  page_fault+0x1e/0x30
[   38.238601] RIP: 0033:0x7f09cbd2db00
[   38.238604] RSP: 002b:00007ffcbe7a26f0 EFLAGS: 00010246
[   38.238608] RAX: 00007f09c25bea00 RBX: 00005642d181da80 RCX: 00007f09cbdf7694
[   38.238610] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ffcbe7a2698
[   38.238612] RBP: 0000000000000024 R08: 00000000001e1ee0 R09: 0000000001afda00
[   38.238615] R10: 0000000000000000 R11: 0000000000000000 R12: 00005642d1817220
[   38.238617] R13: 00007ffcbe7a2728 R14: 00000000045e76f8 R15: 00007ffcbe7a2898
[   38.238631] BUG: unable to handle kernel paging request at 0000000000050a0d
[   38.238647] Oops: 0000 [#1] PREEMPT SMP PTI
[   38.238667] Dumping ftrace buffer:
[   38.238673] ---------------------------------
[   38.238745] modprobe-340     7.... 3354554us : intel_gpu_reset: engine_mask=ffffffff
[   38.238788] modprobe-340     4.... 3395312us : __i915_request_add: rcs0 fence 8:1
[   38.238830] modprobe-340     4d..1 3395364us : __i915_request_submit: rcs0 fence 8:1 -> global=1, current 0
[   38.238874] modprobe-340     4.... 3395472us : __i915_request_add: bcs0 fence 9:1
[   38.238915] modprobe-340     4d..1 3395477us : __i915_request_submit: bcs0 fence 9:1 -> global=1, current 0
[   38.238959] modprobe-340     4.... 3395494us : __i915_request_add: vcs0 fence a:1
[   38.239011] modprobe-340     4d..1 3395497us : __i915_request_submit: vcs0 fence a:1 -> global=1, current 0
[   38.239077] modprobe-340     4.... 3395514us : __i915_request_add: vecs0 fence e:1
[   38.239106] modprobe-340     4d..1 3395517us : __i915_request_submit: vecs0 fence e:1 -> global=1, current 0
[   38.239138] modprobe-340     4.... 3395524us : i915_request_retire: rcs0 fence 8:1, global=1, current 1
[   38.239196] modprobe-340     4.... 3395612us : i915_request_retire: bcs0 fence 9:1, global=1, current 1
[   38.239225] modprobe-340     4.... 3395620us : i915_request_retire: vcs0 fence a:1, global=1, current 1
[   38.239255] modprobe-340     4.... 3395628us : i915_request_retire: vecs0 fence e:1, global=1, current 1
[   38.239285] modprobe-340     4.... 3395652us : __i915_request_add: rcs0 fence 8:2
[   38.239315] modprobe-340     4d..1 3395656us : __i915_request_submit: rcs0 fence 8:2 -> global=2, current 1
[   38.239346] modprobe-340     4.... 3395673us : __i915_request_add: bcs0 fence 9:2
[   38.239375] modprobe-340     4d..1 3395677us : __i915_request_submit: bcs0 fence 9:2 -> global=2, current 1
[   38.239406] modprobe-340     4.... 3395694us : __i915_request_add: vcs0 fence a:2
[   38.239435] modprobe-340     4d..1 3395697us : __i915_request_submit: vcs0 fence a:2 -> global=2, current 1
[   38.239466] modprobe-340     4.... 3395713us : __i915_request_add: vecs0 fence e:2
[   38.239509] modprobe-340     4d..1 3395717us : __i915_request_submit: vecs0 fence e:2 -> global=2, current 1
[   38.239541] modprobe-340     4.... 3395732us : i915_request_retire: rcs0 fence 8:2, global=2, current 2
[   38.239571] modprobe-340     4.... 3395740us : i915_request_retire: bcs0 fence 9:2, global=2, current 2
[   38.239601] modprobe-340     4.... 3395748us : i915_request_retire: vcs0 fence a:2, global=2, current 2
[   38.239632] modprobe-340     4.... 3395756us : i915_request_retire: vecs0 fence e:2, global=2, current 2
[   38.239663] kms_vbla-1243    2.... 27926323us : __i915_request_add: rcs0 fence 8:3
[   38.239693] kms_vbla-1243    2d..1 27926350us : __i915_request_submit: rcs0 fence 8:3 -> global=3, current 2
[   38.239718] kworker/-70      2.... 37837207us : i915_reset: flags=2380000000000003
[   38.239750] kworker/-70      2.... 37837609us : intel_gpu_reset: engine_mask=ffffffff
[   38.239780] kworker/-70      2.... 37837764us : __i915_request_add: bcs0 fence 9:3
[   38.239811] kworker/-70      2d..1 37837787us : __i915_request_submit: bcs0 fence 9:3 -> global=3, current 2
[   38.239842] kworker/-70      2.... 37837849us : __i915_request_add: vcs0 fence a:3
[   38.239872] kworker/-70      2d..1 37837852us : __i915_request_submit: vcs0 fence a:3 -> global=3, current 2
[   38.239903] kworker/-70      2.... 37837866us : __i915_request_add: vecs0 fence e:3
[   38.239933] kworker/-70      2d..1 37837869us : __i915_request_submit: vecs0 fence e:3 -> global=3, current 2
[   38.239964] kms_vbla-1243    2.... 37849731us : i915_request_retire: rcs0 fence 8:3, global=3, current 3
[   38.239995] kms_vbla-1243    2.... 37849866us : i915_request_retire: bcs0 fence 9:3, global=3, current 3
[   38.240041] kms_vbla-1243    2.... 37849878us : i915_request_retire: vcs0 fence a:3, global=3, current 3
[   38.240087] kms_vbla-1243    2.... 37849883us : i915_request_retire: vecs0 fence e:3, global=3, current 3
[   38.240095] ---------------------------------
[   38.240114] Modules linked in: snd_hda_codec_hdmi i915 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel x86_pkg_temp_thermal intel_powerclamp snd_hda_codec snd_hwdep coretemp snd_hda_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_pcm mei_me e1000e prime_numbers lpc_ich mei
[   38.240156] CPU: 5 PID: 209 Comm: systemd-journal Tainted: G        W         4.17.0-rc3-g1d2a421b1f9b-drmtip_28+ #1
[   38.240168] Hardware name: LENOVO 10AGS00601/SHARKBAY, BIOS FBKT34AUS 04/24/2013
[   38.240178] RIP: 0010:deactivate_slab.isra.26+0x1bb/0x8d0
[   38.240185] RSP: 0000:ffffb6ad803b7990 EFLAGS: 00010086
[   38.240193] RAX: 0000000080000000 RBX: 000000000005099d RCX: 0000000000000001
[   38.240214] RDX: 0000000080000001 RSI: 0000000000000070 RDI: 00000000ffffffff
[   38.240223] RBP: ffffb6ad803b7a90 R08: ffff96a3deb66c00 R09: ffff96a3924203e0
[   38.240232] R10: ffffb6ad803b7ab0 R11: 0000000000000000 R12: fffff7ef8f490800
[   38.240240] R13: ffff96a392421138 R14: ffff96a3ccd89a40 R15: 000000018025001d
[   38.240250] FS:  00007f09cc6c1940(0000) GS:ffff96a3deb40000(0000) knlGS:0000000000000000
[   38.240260] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   38.240267] CR2: 0000000000050a0d CR3: 0000000408e48004 CR4: 00000000001606e0
[   38.240275] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   38.240284] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   38.240292] Call Trace:
[   38.240298]  ? __kernel_text_address+0x9/0x30
[   38.240306]  ? __save_stack_trace+0x8d/0xf0
[   38.240328]  ? alloc_buffer_head+0x18/0x80
[   38.240334]  ? set_track+0x90/0x140
[   38.240340]  ? init_object+0x66/0x80
[   38.240348]  ? ___slab_alloc.constprop.34+0x232/0x3e0
[   38.240370]  ___slab_alloc.constprop.34+0x232/0x3e0
[   38.240377]  ? alloc_buffer_head+0x18/0x80
[   38.240385]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
[   38.240407]  ? trace_hardirqs_on_caller+0xe0/0x1b0
[   38.240416]  ? alloc_buffer_head+0x18/0x80
[   38.240440]  ? __slab_alloc.isra.27.constprop.33+0x3d/0x70
[   38.240448]  __slab_alloc.isra.27.constprop.33+0x3d/0x70
[   38.240456]  ? alloc_buffer_head+0x18/0x80
[   38.240463]  kmem_cache_alloc+0x234/0x2c0
[   38.240470]  alloc_buffer_head+0x18/0x80
[   38.240489]  alloc_page_buffers+0x6b/0xb0
[   38.240496]  create_empty_buffers+0x14/0x100
[   38.240503]  create_page_buffers+0x47/0x50
[   38.240526]  __block_write_begin_int+0x89/0x590
[   38.240533]  ? ext4_inode_attach_jinode.part.18+0xa0/0xa0
[   38.240542]  ? ext4_inode_attach_jinode.part.18+0xa0/0xa0
[   38.240551]  block_page_mkwrite+0xab/0xf0
[   38.240559]  ext4_page_mkwrite+0x3d4/0x4f0
[   38.240569]  do_page_mkwrite+0x2c/0xa0
[   38.240576]  do_wp_page+0x1fc/0x4b0
[   38.240583]  __handle_mm_fault+0x7c4/0xe20
[   38.240593]  handle_mm_fault+0x196/0x3a0
[   38.240602]  __do_page_fault+0x295/0x590
[   38.240610]  ? page_fault+0x8/0x30
[   38.240617]  page_fault+0x1e/0x30
[   38.240623] RIP: 0033:0x7f09cbd2db00
[   38.240629] RSP: 002b:00007ffcbe7a26f0 EFLAGS: 00010246
[   38.240638] RAX: 00007f09c25bea00 RBX: 00005642d181da80 RCX: 00007f09cbdf7694
[   38.240647] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ffcbe7a2698
[   38.240656] RBP: 0000000000000024 R08: 00000000001e1ee0 R09: 0000000001afda00
[   38.240664] R10: 0000000000000000 R11: 0000000000000000 R12: 00005642d1817220
[   38.240672] R13: 00007ffcbe7a2728 R14: 00000000045e76f8 R15: 00007ffcbe7a2898
[   38.240683] Code: 24 00 bf 01 00 00 00 e8 34 d6 eb ff 65 8b 05 0d c9 e1 48 85 c0 0f 84 cc 06 00 00 41 8b 76 20 48 8b 9d 38 ff ff ff 4d 8b 6c 24 10 <48> 8b 04 33 48 85 c0 74 44 48 89 85 38 ff ff ff e9 d7 fe ff ff 
[   38.240756] RIP: deactivate_slab.isra.26+0x1bb/0x8d0 RSP: ffffb6ad803b7990
[   38.240764] CR2: 0000000000050a0d
[   38.240770] ---[ end trace d79f283c7f9f3dc4 ]---
[   38.264900] systemd-journald[1265]: File /var/log/journal/9f5a9ec0d2de4f609f24d0845af7c92f/system.journal corrupted or uncleanly shut down, renaming and replacing.
[   38.282867] [drm] 256567 spurious OA report notices suppressed due to ratelimiting
[   38.319349] [drm] Skipping spurious, invalid OA report
[   38.319360] [drm] Skipping spurious, invalid OA report
[   38.319366] [drm] Skipping spurious, invalid OA report
[   38.319372] [drm] Skipping spurious, invalid OA report
[   38.319376] [drm] Skipping spurious, invalid OA report
[   38.319380] [drm] Skipping spurious, invalid OA report
[   38.319384] [drm] Skipping spurious, invalid OA report
[   38.319388] [drm] Skipping spurious, invalid OA report
[   38.319395] [drm] Skipping spurious, invalid OA report
[   38.319399] [drm] Skipping spurious, invalid OA report
[   38.399382] [drm] 815891 spurious OA report notices suppressed due to ratelimiting
Comment 1 Chris Wilson 2018-05-03 14:17:53 UTC
I get the horrible feeling we may have clobbered memory...

An old page now contains a freelist pointer of RBX: 000000000005099d, which I suspect is a report ID.
Comment 2 Chris Wilson 2018-05-11 14:00:24 UTC
A KASAN run has a good trace indicating this is caused by the GPU: https://intel-gfx-ci.01.org/tree/drm-tip/kasan_34/fi-hsw-4770/dmesg11.log
Comment 3 Chris Wilson 2018-05-11 16:19:44 UTC
commit e896d29a548d04371ce746f7d02a8488ff93d812
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri May 11 14:52:07 2018 +0100

    drm/i915/oa: Check that OA is disabled before unpinning
    
    Before we unpin the buffer used for OA reports and return it to the
    system, we need to be sure that the HW has finished writing into it.
    For lack of a better idea, poll OACONTROL to check it is switched off.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106379
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
    Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180511135207.12880-1-chris@chris-wilson.co.uk
Comment 4 Martin Peres 2018-05-22 20:32:21 UTC
(In reply to Chris Wilson from comment #3)
> commit e896d29a548d04371ce746f7d02a8488ff93d812
>     drm/i915/oa: Check that OA is disabled before unpinning

Seems like it did the trick, as it has not been seen in quite some time.
Comment 5 Jani Saarinen 2018-06-04 14:31:37 UTC
Closing then.