Bug 109823 - [CI][DRMTIP] igt@gem_ppgtt@blt-vs-render-ctx0 - dmesg-fail - BUG: unable to handle kernel paging request at ffffa08d71663ee0
Summary: [CI][DRMTIP] igt@gem_ppgtt@blt-vs-render-ctx0 - dmesg-fail - BUG: unable to h...
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: Other All
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-03-04 08:06 UTC by Lakshmi
Modified: 2019-04-17 14:31 UTC (History)
1 user (show)

See Also:
i915 platform: ICL
i915 features: GEM/Other


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Lakshmi 2019-03-04 08:06:47 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_234/fi-icl-y/igt@gem_ppgtt@blt-vs-render-ctx0.html

<6> [473.720154] Console: switching to colour dummy device 80x25
<6> [473.720221] [IGT] gem_ppgtt: executing
<6> [473.735629] [IGT] gem_ppgtt: starting subtest blt-vs-render-ctx0
<1> [477.207645] BUG: unable to handle kernel paging request at ffffa08d71663ee0
<1> [477.207660] #PF error: [normal kernel read fault]
<6> [477.207667] PGD 0 P4D 0 
<4> [477.207674] Oops: 0000 [#1] PREEMPT SMP PTI
<4> [477.207684] CPU: 1 PID: 1584 Comm: gem_ppgtt Tainted: G     U            5.0.0-rc8-g348fd7b7fa65-drmtip_234+ #1
<4> [477.207703] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019
<4> [477.207721] RIP: 0010:xas_start+0x71/0x1d0
<4> [477.207731] Code: ff ff e8 62 47 79 ff 85 c0 0f 84 43 01 00 00 5b 48 89 e8 5d 41 5c 41 5d c3 48 83 fa 02 75 08 48 3d 05 c0 ff ff 77 6e 4c 8b 2b <4d> 8b 65 40 e8 d6 35 7b ff 85 c0 4c 89 e5 74 0d 80 3d 31 0e 99 00
<4> [477.207763] RSP: 0000:ffffa60a8055fc90 EFLAGS: 00010202
<4> [477.207770] RAX: 0000000000000003 RBX: ffffa60a8055fcd8 RCX: 1137aa9900000000
<4> [477.207779] RDX: 0000000000000003 RSI: ffffffff9d084383 RDI: ffffa60a8055fcd8
<4> [477.207787] RBP: ffffa60a8055fcd8 R08: 0000000025ff6d19 R09: 0000000000000002
<4> [477.207798] R10: ffffa60a8055fc60 R11: ffffffff9d24c100 R12: 00000000ffffffff
<4> [477.207807] R13: ffffa08d71663ea0 R14: 0003feffffffffff R15: ffffffff9d2603c0
<4> [477.207815] FS:  00007fad786ed980(0000) GS:ffff8a131be80000(0000) knlGS:0000000000000000
<4> [477.207824] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [477.207831] CR2: ffffa08d71663ee0 CR3: 0000000210b6c005 CR4: 0000000000760ee0
<4> [477.207842] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4> [477.207852] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4> [477.207866] PKRU: 55555554
<4> [477.207870] Call Trace:
<4> [477.207881]  xas_load+0xa/0x50
<4> [477.207890]  find_get_entry+0x9e/0x2f0
<4> [477.207899]  pagecache_get_page+0x2b/0x210
<4> [477.207909]  ? __lock_acquire+0x3c7/0x1b00
<4> [477.207917]  lookup_swap_cache+0x3b/0x100
<4> [477.207925]  do_swap_page+0x86/0x950
<4> [477.207933]  ? file_update_time+0x5b/0x120
<4> [477.207943]  __handle_mm_fault+0x66a/0xfa0
<4> [477.207954]  handle_mm_fault+0x196/0x3a0
<4> [477.207969]  __do_page_fault+0x246/0x500
<4> [477.207978]  ? page_fault+0x8/0x30
<4> [477.207985]  page_fault+0x1e/0x30
<4> [477.207993] RIP: 0033:0x5609b992d5b4
<4> [477.208000] Code: 40 18 48 89 45 f8 c7 45 e4 00 00 00 00 e9 f8 00 00 00 8b 45 e4 48 98 48 8d 14 c5 00 00 00 00 48 8b 45 f8 48 01 c2 48 8b 45 e8 <48> 89 02 8b 45 e4 48 98 48 83 c0 01 48 8d 14 c5 00 00 00 00 48 8b
<4> [477.208002] ------------[ cut here ]------------
<4> [477.208022] RSP: 002b:00007ffe67e3dcc0 EFLAGS: 00010206
<4> [477.208030] downgrading a read lock
<4> [477.208038] WARNING: CPU: 0 PID: 1585 at kernel/locking/lockdep.c:3553 lock_downgrade+0x158/0x1e0
<4> [477.208051] RAX: 0006013e0006013e RBX: 00007ffe67e3de98 RCX: 0000000000000000
<4> [477.208054] Modules linked in: vgem snd_hda_codec_hdmi mei_hdcp snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm ax88179_178a usbnet e1000e mii ptp pps_core i2c_i801 mei_me mei prime_numbers
<4> [477.208076] RDX: 00007fad6b8e1000 RSI: 0000000000000000 RDI: 00005609b9b610b8
<4> [477.208096] CPU: 0 PID: 1585 Comm: gem_ppgtt Tainted: G     U            5.0.0-rc8-g348fd7b7fa65-drmtip_234+ #1
<4> [477.208103] RBP: 00007ffe67e3dcf0 R08: 0000000000000000 R09: 00007fad6b8d4000
<4> [477.208110] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019
<4> [477.208121] R10: 00005609b9b31010 R11: 0000000000000246 R12: 00005609b992d390
<4> [477.208130] RIP: 0010:lock_downgrade+0x158/0x1e0
<4> [477.208133] Code: ff e9 23 ff ff ff 4c 89 ea 4c 89 f6 48 89 df e8 2e bd ff ff 85 c0 74 aa eb 9a 48 c7 c7 e2 d0 06 9d 48 89 04 24 e8 b8 bf f9 ff <0f> 0b 8b 54 24 0c 48 8b 04 24 e9 46 ff ff ff e8 04 5d 39 00 85 c0
<4> [477.208143] R13: 00007ffe67e3dfd0 R14: 0000000000000000 R15: 0000000000000000
<4> [477.208148] RSP: 0018:ffffa60a805f7e38 EFLAGS: 00010082
<4> [477.208170] Modules linked in: vgem snd_hda_codec_hdmi mei_hdcp snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul
<4> [477.208178] RAX: 0000000000000000 RBX: ffff8a1294314040 RCX: 0000000000000001
<4> [477.208179]  ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm ax88179_178a usbnet e1000e mii ptp pps_core i2c_i801 mei_me mei prime_numbers
<4> [477.208195] RDX: ffffffff9c127c8b RSI: 0000000000000001 RDI: ffffffff9c127ca0
<4> [477.208197] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
<0> [477.208217] Dumping ftrace buffer:
<0> [477.208219] ---------------------------------
<4> [477.208223] R10: 0000000000000000 R11: ffffa60a805f7c70 R12: 0000000000000246
<4> [477.208225] R13: ffffffff9c205e93 R14: ffff8a1308071708 R15: ffff8a12a7a172d0
<4> [477.208244] FS:  00007fad786ed980(0000) GS:ffff8a131be00000(0000) knlGS:0000000000000000
<4> [477.208250] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [477.208255] CR2: 00007fad6b9d3000 CR3: 0000000213c46003 CR4: 0000000000760ef0
<4> [477.208261] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4> [477.208267] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4> [477.208272] PKRU: 55555554
<4> [477.208275] Call Trace:
<4> [477.208282]  downgrade_write+0x12/0x80
<4> [477.208288]  __do_munmap+0x393/0x400
<4> [477.208294]  __vm_munmap+0x6e/0xc0
<4> [477.208300]  __x64_sys_munmap+0x12/0x20
<4> [477.208306]  do_syscall_64+0x55/0x190
<0> [477.208307] CPU:3 [LOST 90622 EVENTS]
gem_ppgt-1583    3d..1 474010378us : process_csb: rcs0 cs-irq head=5, tail=5
<4> [477.208321]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [477.208326] RIP: 0033:0x7fad77b6dab7
<4> [477.208331] Code: 10 e9 67 ff ff ff 0f 1f 44 00 00 48 8b 15 c9 f3 2c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff e9 6b ff ff ff b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a1 f3 2c 00 f7 d8 64 89 01 48
<4> [477.208343] RSP: 002b:00007ffe67e3dc38 EFLAGS: 00000206 ORIG_RAX: 000000000000000b
<4> [477.208349] RAX: ffffffffffffffda RBX: 00005609b9b641d0 RCX: 00007fad77b6dab7
<4> [477.208355] RDX: 00005609b9b62220 RSI: 0000000000100000 RDI: 00007fad6b8d4000
<0> [477.208360] gem_ppgt-1583    3d..1 474010379us : __i915_request_submit: rcs0 fence 182:8 -> current 6
<4> [477.208367] RBP: 00005609b9b61b60 R08: 00005609b9b61c68 R09: 00007ffe67e69080
<4> [477.208373] R10: 00000000000e6fce R11: 0000000000000206 R12: 00000000000001da
<4> [477.208378] R13: 00005609b9b61b60 R14: 0000000000000000 R15: 0000000000000000
<4> [477.208386] irq event stamp: 1314095
<4> [477.208392] hardirqs last  enabled at (1314095): [<ffffffff9c202148>] handle_mm_fault+0x258/0x3a0
<4> [477.208400] hardirqs last disabled at (1314094): [<ffffffff9c201fa1>] handle_mm_fault+0xb1/0x3a0
<4> [477.208407] softirqs last  enabled at (1311040): [<ffffffff9cc0033a>] __do_softirq+0x33a/0x4b9
<4> [477.208415] softirqs last disabled at (1311033): [<ffffffff9c0b8f21>] irq_exit+0xd1/0xe0
<0> [477.208419] gem_ppgt-1583    3d..1 474010383us : __execlists_submission_tasklet: rcs0 in[0]:  ctx=320.1, fence 182:8 (current 6), prio=6
<4> [477.208423] WARNING: CPU: 0 PID: 1585 at kernel/locking/lockdep.c:3553 lock_downgrade+0x158/0x1e0
<4> [477.208425] ---[ end trace 27c5c6e50523a9d4 ]---

4> [477.234946] ---[ end trace 27c5c6e50523a9d6 ]---
<4> [477.684141] RIP: 0010:xas_start+0x71/0x1d0
<4> [477.684154] Code: ff ff e8 62 47 79 ff 85 c0 0f 84 43 01 00 00 5b 48 89 e8 5d 41 5c 41 5d c3 48 83 fa 02 75 08 48 3d 05 c0 ff ff 77 6e 4c 8b 2b <4d> 8b 65 40 e8 d6 35 7b ff 85 c0 4c 89 e5 74 0d 80 3d 31 0e 99 00
<4> [477.684176] RSP: 0000:ffffa60a8055fc90 EFLAGS: 00010202
<4> [477.684184] RAX: 0000000000000003 RBX: ffffa60a8055fcd8 RCX: 1137aa9900000000
<4> [477.684193] RDX: 0000000000000003 RSI: ffffffff9d084383 RDI: ffffa60a8055fcd8
<4> [477.684201] RBP: ffffa60a8055fcd8 R08: 0000000025ff6d19 R09: 0000000000000002
<4> [477.684210] R10: ffffa60a8055fc60 R11: ffffffff9d24c100 R12: 00000000ffffffff
<4> [477.684220] R13: ffffa08d71663ea0 R14: 0003feffffffffff R15: ffffffff9d2603c0
<4> [477.684230] FS:  00007fad786ed980(0000) GS:ffff8a131be80000(0000) knlGS:0000000000000000
<4> [477.684239] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [477.684246] CR2: ffffa08d71663ee0 CR3: 0000000210b6c005 CR4: 0000000000760ee0
<4> [477.684254] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4> [477.684264] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4> [477.684274] PKRU: 55555554
<3> [477.684285] BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:34
<3> [477.684296] in_atomic(): 0, irqs_disabled(): 1, pid: 1584, name: gem_ppgtt
<4> [477.684303] INFO: lockdep is turned off.
<4> [477.684308] irq event stamp: 1496392
<4> [477.684317] hardirqs last  enabled at (1496391): [<ffffffff9c202148>] handle_mm_fault+0x258/0x3a0
<4> [477.684330] hardirqs last disabled at (1496392): [<ffffffff9c0019b0>] trace_hardirqs_off_thunk+0x1a/0x1c
<4> [477.684342] softirqs last  enabled at (1495202): [<ffffffff9cc0033a>] __do_softirq+0x33a/0x4b9
<4> [477.684353] softirqs last disabled at (1495195): [<ffffffff9c0b8f21>] irq_exit+0xd1/0xe0
<3> [477.684364] Preemption disabled at:
<4> [477.684367] [<ffffffff9c127b8b>] wake_up_klogd+0xb/0x60
<4> [477.684383] CPU: 1 PID: 1584 Comm: gem_ppgtt Tainted: G     UD W         5.0.0-rc8-g348fd7b7fa65-drmtip_234+ #1
<4> [477.684394] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019
<4> [477.684408] Call Trace:
<4> [477.684417]  dump_stack+0x67/0x9b
<4> [477.684425]  ___might_sleep+0x167/0x250
<4> [477.684439]  exit_signals+0x2b/0x2d0
<4> [477.684446]  do_exit+0xa3/0xd90
<4> [477.684454]  ? __do_page_fault+0x246/0x500
<4> [477.684462]  rewind_stack_do_exit+0x17/0x20
<3> [477.762914] swap_info_get: Bad swap offset entry 3feffffffffff
<1> [477.762929] BUG: Bad page map in process gem_ppgtt  pte:2000000000000 pmd:207cdb067
<1> [477.763036] addr:000000008e397884 vm_flags:000000fb anon_vma:          (null) mapping:00000000abc6c2c4 index:d
<1> [477.763055] file:i915 fault:shmem_fault mmap:shmem_mmap readpage:          (null)
<4> [477.763069] CPU: 0 PID: 1584 Comm: gem_ppgtt Tainted: G     UD W         5.0.0-rc8-g348fd7b7fa65-drmtip_234+ #1
<4> [477.763084] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019
<4> [477.763183] Call Trace:
<4> [477.763193]  dump_stack+0x67/0x9b
<4> [477.763202]  print_bad_pte+0x1a0/0x270
<4> [477.763213]  unmap_page_range+0x7da/0xa00
<4> [477.763227]  unmap_vmas+0x67/0xd0
<4> [477.763236]  exit_mmap+0x9a/0x180
<4> [477.763329]  mmput+0x5c/0x120
<4> [477.763338]  do_exit+0x5ba/0xd90
<4> [477.763347]  ? __do_page_fault+0x246/0x500
<4> [477.763358]  rewind_stack_do_exit+0x17/0x20
<3> [477.763371] swap_info_get: Bad swap offset entry 3feffffffffff
<1> [477.763381] BUG: Bad page map in process gem_ppgtt  pte:2000000000000 pmd:207cdb067
<1> [477.763393] addr:00000000db46049a vm_flags:000000fb anon_vma:          (null) mapping:00000000abc6c2c4 index:e
<1> [477.763410] file:i915 fault:shmem_fault mmap:shmem_mmap readpage:          (null)
<4> [477.763516] CPU: 0 PID: 1584 Comm: gem_ppgtt Tainted: G    BUD W         5.0.0-rc8-g348fd7b7fa65-drmtip_234+ #1
Comment 1 CI Bug Log 2019-03-04 08:17:54 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@gem_ppgtt@blt-vs-render-ctx0 - dmesg-fail - BUG: unable to handle kernel paging request at ffffa08d71663ee0
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_234/fi-icl-y/igt@gem_ppgtt@blt-vs-render-ctx0.html
Comment 2 CI Bug Log 2019-03-04 08:34:25 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@runner@aborted - fail - Previous test: gem_ppgtt (blt-vs-render-ctx0)
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_234/fi-icl-y/igt@runner@aborted.html
Comment 3 Chris Wilson 2019-03-04 09:25:42 UTC
That doesn't (immediately) look like our fault.
Comment 4 Chris Wilson 2019-03-07 12:52:30 UTC
If it's stray memcorruption (honestly more likely than a bug in the swap code), then 

commit e781a7a3235e9ff68095d2cd4d9c1e039a0516d7
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Mar 4 11:41:13 2019 +0000

    drm/i915: Acquire breadcrumb ref before cancelling
    
    We may race the interrupt signaling with retirement, in which case the
    order in which we acquire the reference inside the interrupt is vital to
    provide the correct barrier against the request being freed in
    retirement, i.e. we need to acquire our reference before marking the
    breadcrumb as cancelled (as soon as the breadcrumb is cancelled
    retirement may drop its reference to the request without serialisation
    with the interrupt handler).

is definitely a possible fix. But without definite proof, the only thing we can do is read the code while waiting for a pattern.
Comment 5 Chris Wilson 2019-03-15 23:12:30 UTC
Let's claim it was indeed fixed to provoke it to occur again.
Comment 6 Martin Peres 2019-04-17 14:31:49 UTC
(In reply to Chris Wilson from comment #4)
> If it's stray memcorruption (honestly more likely than a bug in the swap
> code), then 
> 
> commit e781a7a3235e9ff68095d2cd4d9c1e039a0516d7
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Mar 4 11:41:13 2019 +0000
> 
>     drm/i915: Acquire breadcrumb ref before cancelling
>     
>     We may race the interrupt signaling with retirement, in which case the
>     order in which we acquire the reference inside the interrupt is vital to
>     provide the correct barrier against the request being freed in
>     retirement, i.e. we need to acquire our reference before marking the
>     breadcrumb as cancelled (as soon as the breadcrumb is cancelled
>     retirement may drop its reference to the request without serialisation
>     with the interrupt handler).
> 
> is definitely a possible fix. But without definite proof, the only thing we
> can do is read the code while waiting for a pattern.

Seen only once, 22 drmtip runs ago. I guess we'll have to wait quite a long time before closing this one...


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.