Bug 110778 - [CI][SHARDS] igt@* - dmesg-warn - BUG: unable to handle page fault for address
Summary: [CI][SHARDS] igt@* - dmesg-warn - BUG: unable to handle page fault for address
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-28 08:06 UTC by Martin Peres
Modified: 2019-05-29 21:31 UTC

See Also:
i915 platform: ICL, KBL
i915 features: GEM/Other


Description Martin Peres 2019-05-28 08:06:26 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6147/shard-kbl3/igt@gem_tiled_swapping@non-threaded.html

<6> [1504.674826] Console: switching to colour dummy device 80x25
<6> [1504.674876] [IGT] gem_tiled_swapping: executing
<1> [1505.337813] BUG: unable to handle page fault for address: ffffea0003ff8030
<1> [1505.337825] #PF: supervisor read access in kernel mode
<1> [1505.337831] #PF: error_code(0x0000) - not-present page
<6> [1505.337837] PGD 276ef7067 P4D 276ef7067 PUD 276ef6067 PMD 0 
<4> [1505.337845] Oops: 0000 [#1] PREEMPT SMP PTI
<4> [1505.337852] CPU: 3 PID: 38 Comm: khugepaged Tainted: G     U            5.2.0-rc2-CI-CI_DRM_6147+ #1
<4> [1505.337862] Hardware name:  /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
<4> [1505.337875] RIP: 0010:compaction_alloc+0x5d3/0x960
<4> [1505.337882] Code: 39 cf 0f 83 e8 00 00 00 e9 85 01 00 00 48 8b 04 24 4d 89 e6 80 b8 7d 04 00 00 00 0f 84 03 01 00 00 4d 85 f6 0f 84 a2 00 00 00 <41> 8b 46 30 25 80 00 00 f0 3d 00 00 00 f0 0f 84 fd 00 00 00 80 7b
<4> [1505.337900] RSP: 0018:ffffc9000019b928 EFLAGS: 00010286
<4> [1505.337906] RAX: ffffffff8230bac0 RBX: ffffc9000019bb30 RCX: 00000000000001e0
<4> [1505.337914] RDX: 80000000000ffe00 RSI: 1000000000000000 RDI: 2333333333333533
<4> [1505.337922] RBP: 8000000000100000 R08: efffffffffffffff R09: ffffffff8230bac0
<4> [1505.337929] R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0003ff8000
<4> [1505.337937] R13: 0000000000000020 R14: ffffea0003ff8000 R15: 80000000000ffe00
<4> [1505.337945] FS:  0000000000000000(0000) GS:ffff888276b80000(0000) knlGS:0000000000000000
<4> [1505.337954] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [1505.337961] CR2: ffffea0003ff8030 CR3: 0000000005210004 CR4: 00000000003606e0
<4> [1505.337969] Call Trace:
<4> [1505.337979]  migrate_pages+0x122/0xb40
<4> [1505.337986]  ? isolate_freepages_block+0x460/0x460
<4> [1505.337993]  ? __reset_isolation_suitable+0x110/0x110
<4> [1505.338001]  compact_zone+0x604/0xf60
<4> [1505.338010]  compact_zone_order+0xda/0x120
<4> [1505.338020]  ? try_to_wake_up+0x257/0x820
<4> [1505.338026]  ? try_to_compact_pages+0xb2/0x2b0
<4> [1505.338033]  try_to_compact_pages+0xb2/0x2b0
<4> [1505.338041]  __alloc_pages_direct_compact+0x62/0x140
<4> [1505.338049]  __alloc_pages_nodemask+0x72d/0x1130
<4> [1505.338058]  ? lock_acquire+0xa6/0x1c0
<4> [1505.338066]  ? khugepaged+0x233/0x2560
<4> [1505.338074]  khugepaged+0x2d4/0x2560
<4> [1505.338086]  ? wait_woken+0xa0/0xa0
<4> [1505.338093]  ? collapse_shmem.isra.8+0xe50/0xe50
<4> [1505.338100]  kthread+0x119/0x130
<4> [1505.338106]  ? kthread_park+0x80/0x80
<4> [1505.338113]  ret_from_fork+0x3a/0x50

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_295/fi-icl-dsi/igt@gem_ppgtt@blt-vs-render-ctxn.html

<6> [54.317209] Console: switching to colour dummy device 80x25
<6> [54.317258] [IGT] gem_ppgtt: executing
<6> [54.323140] [IGT] gem_ppgtt: starting subtest blt-vs-render-ctxN
<6> [54.323352] gem_ppgtt (1210): drop_caches: 4
<1> [85.048012] BUG: unable to handle page fault for address: ffff98cb44b36c48
<1> [85.048028] #PF: supervisor write access in kernel mode
<1> [85.048036] #PF: error_code(0x0002) - not-present page
<6> [85.048044] PGD 23a001067 P4D 23a001067 PUD 0 
<4> [85.048055] Oops: 0002 [#1] PREEMPT SMP PTI
<4> [85.048064] CPU: 7 PID: 223 Comm: kworker/7:2 Tainted: G     U            5.2.0-rc2-ga8a2a6870850-drmtip_295+ #1
<4> [85.048077] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3102.A00.1903052247 03/05/2019
<4> [85.048099] Workqueue: events delayed_fput
<4> [85.048111] RIP: 0010:__lock_acquire+0xf6/0x24c0
<4> [85.048119] Code: ff 4c 8b 95 40 ff ff ff 0f 84 e8 04 00 00 49 81 ec 80 55 c4 a4 48 b8 a3 8b 2e ba e8 a2 8b 2e 49 c1 fc 04 4c 0f af e0 49 63 c4 <65> 48 ff 04 c5 50 66 01 00 8b 05 43 3c 43 02 45 8b ba 70 08 00 00
<4> [85.048141] RSP: 0018:ffffa4474089bc60 EFLAGS: 00010803
<4> [85.048150] RAX: 000000005d1740bf RBX: 0000000000000000 RCX: 0000000000000000
<4> [85.048160] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff98c8524dd640
<4> [85.048170] RBP: ffffa4474089bd20 R08: 0000000000000001 R09: 0000000000000000
<4> [85.048179] R10: ffff98c8561a8040 R11: 0000000000000000 R12: 1745d1745d1740bf
<4> [85.048189] R13: ffff98c8524dd640 R14: 0000000000000001 R15: 0000000000000246
<4> [85.048199] FS:  0000000000000000(0000) GS:ffff98c85bf80000(0000) knlGS:0000000000000000
<4> [85.048211] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [85.048220] CR2: ffff98cb44b36c48 CR3: 0000000281c7a002 CR4: 0000000000760ee0
<4> [85.048230] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4> [85.048240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4> [85.048249] PKRU: 55555554
<4> [85.048254] Call Trace:
<4> [85.048264]  ? dentry_kill+0x22/0x1b0
<4> [85.048273]  ? mark_held_locks+0x49/0x70
<4> [85.048284]  ? lock_acquire+0xa6/0x1c0
<4> [85.048292]  lock_acquire+0xa6/0x1c0
<4> [85.048300]  ? dentry_kill+0x22/0x1b0
<4> [85.048309]  ? dput+0x20/0x2c0
<4> [85.048319]  _raw_spin_trylock+0x60/0x80
<4> [85.048328]  ? dentry_kill+0x22/0x1b0
<4> [85.048336]  dentry_kill+0x22/0x1b0
<4> [85.048344]  ? dput+0x20/0x2c0
<4> [85.048351]  dput+0x262/0x2c0
<4> [85.048359]  __fput+0x102/0x220
<4> [85.048368]  delayed_fput+0x17/0x30
<4> [85.048377]  process_one_work+0x245/0x610
<4> [85.048387]  worker_thread+0x37/0x380
<4> [85.048396]  ? process_one_work+0x610/0x610
<4> [85.048404]  kthread+0x119/0x130
<4> [85.048412]  ? kthread_park+0x80/0x80
<4> [85.048421]  ret_from_fork+0x3a/0x50
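For readers decoding the two "#PF:" lines in each trace above: the value in error_code(...) is the x86 page-fault error code, and its low bits select the wording. The C sketch below is assumed to mirror the logic of the kernel's x86 fault-oops reporting around v5.2 (believed to be show_fault_oops() in arch/x86/mm/fault.c); format_pf() is a hypothetical helper, not a kernel API, and the bit names copy the kernel's X86_PF_* constants. Under that reading, 0x0000 is a supervisor read of a not-present page (the KBL khugepaged crash) and 0x0002 is a supervisor write of a not-present page (the ICL delayed_fput crash).

```c
#include <stdio.h>

/* x86 page-fault error-code bits; names follow the kernel's X86_PF_* constants. */
#define X86_PF_PROT  (1UL << 0) /* 0: page not present, 1: protection violation */
#define X86_PF_WRITE (1UL << 1) /* 0: read access, 1: write access */
#define X86_PF_USER  (1UL << 2) /* 0: supervisor access, 1: user access */
#define X86_PF_RSVD  (1UL << 3) /* reserved bit set in a paging-structure entry */
#define X86_PF_INSTR (1UL << 4) /* fault was an instruction fetch */
#define X86_PF_PK    (1UL << 5) /* protection-key violation */

/*
 * Hypothetical helper: build the two "#PF:" lines for a given error code.
 * user_mode is not part of the error code; the kernel derives it from the
 * saved registers, so it is passed in separately here.
 */
static void format_pf(unsigned long error_code, int user_mode,
                      char *line1, size_t n1, char *line2, size_t n2)
{
    snprintf(line1, n1, "#PF: %s %s in %s mode",
             (error_code & X86_PF_USER)  ? "user" : "supervisor",
             (error_code & X86_PF_INSTR) ? "instruction fetch" :
             (error_code & X86_PF_WRITE) ? "write access" : "read access",
             user_mode ? "user" : "kernel");

    /* PROT clear means the page was simply not present. */
    snprintf(line2, n2, "#PF: error_code(0x%04lx) - %s", error_code,
             !(error_code & X86_PF_PROT) ? "not-present page" :
             (error_code & X86_PF_RSVD)  ? "reserved bit violation" :
             (error_code & X86_PF_PK)    ? "protection keys violation" :
                                           "permissions violation");
}
```

Feeding it 0x0000 and 0x0002 with user_mode set to 0 should reproduce the "#PF:" lines of the KBL and ICL traces respectively.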
Comment 1 CI Bug Log 2019-05-28 08:07:48 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* KBL ICL: all tests - dmesg-warn - BUG: unable to handle page fault for address
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6147/shard-kbl3/igt@gem_tiled_swapping@non-threaded.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_295/fi-icl-dsi/igt@gem_ppgtt@blt-vs-render-ctxn.html
Comment 2 Chris Wilson 2019-05-28 08:10:08 UTC
Superficially not us.
Comment 3 Chris Wilson 2019-05-29 21:31:32 UTC
I suspect this is fixed by

commit f27a5d91201639161d6f6e25af1c89c9cbb3cac7 (drm-intel/topic/core-for-CI, topic/core-for-CI)
Author: Hugh Dickins <hughd@google.com>
Date:   Wed May 29 09:25:40 2019 +0200

    x86/fpu: Use fault_in_pages_writeable() for pre-faulting
    
    Since commit
    
       d9c9ce34ed5c8 ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")
    
    we use get_user_pages_unlocked() to pre-faulting user's memory if a
    write generates a pagefault while the handler is disabled.
    This works in general and uncovered a bug as reported by Mike Rapoport.
    It has been pointed out that this function may be fragile and a
    simple pre-fault as in fault_in_pages_writeable() would be a better
    solution. Better as in taste and simplicity: That write (as performed by
    the alternative function) performs exactly the same faulting of memory
    that we had before. This was suggested by Hugh Dickins and Andrew
    Morton.
    
    Use fault_in_pages_writeable() for pre-faulting of user's stack.
    
    Fixes: d9c9ce34ed5c8 ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")
    Suggested-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    [bigeasy: patch description]
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

but this might be a different issue - time will tell.
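The commit quoted above replaces get_user_pages_unlocked() with a plain write-based pre-fault in the style of fault_in_pages_writeable(). As a user-space sketch of that idea only (assuming a Linux/POSIX system; prefault_writeable() is a hypothetical name, not the kernel helper itself), one zero byte is written into every page the buffer spans, so each page is faulted in writable before the real copy. Like the kernel helper, it is destructive: it is only safe on a buffer that is about to be overwritten anyway.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical user-space analogue of fault_in_pages_writeable():
 * write a zero byte into each page spanned by [buf, buf + len).
 * Destructive by design; only call it on memory you are about to overwrite.
 */
static void prefault_writeable(char *buf, size_t len)
{
    const uintptr_t page  = (uintptr_t)sysconf(_SC_PAGESIZE);
    const uintptr_t mask  = ~(page - 1);
    const uintptr_t start = (uintptr_t)buf;
    volatile char *vbuf = buf; /* volatile so the stores are not optimized away */
    size_t off = 0;

    if (len == 0)
        return;

    /* Touch the first byte, then one byte every page-size stride after it. */
    do {
        vbuf[off] = 0;
        off += page;
    } while (off < len);

    /*
     * The last stride may have stepped over the page holding the final byte:
     * if start + off falls in that same page, it was never touched, so touch it.
     */
    if (((start + off) & mask) == ((start + len - 1) & mask))
        vbuf[len - 1] = 0;
}
```

Stepping by the page size from an unaligned start is what makes the final check necessary: the stride always lands in consecutive pages, but the last one can be skipped.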

