| Summary: | [CI][SHARDS] igt@* - dmesg-warn - i915_drop_caches_set+ | | |
|---|---|---|---|
| Product: | DRI | Reporter: | Martin Peres <martin.peres> |
| Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Status: | RESOLVED NOTOURBUG | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| Severity: | normal | | |
| Priority: | high | CC: | intel-gfx-bugs, lakshminarayana.vudum, mike |
| Version: | XOrg git | | |
| Hardware: | Other | | |
| OS: | All | | |
| Whiteboard: | ReadyForDev | | |
| i915 platform: | HSW, SNB | i915 features: | GEM/Other |
Description
Martin Peres
2019-05-29 07:41:00 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* SNB HSW: all tests - dmesg-warn - i915_drop_caches_set+
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13107/shard-snb2/igt@gem_pwrite@huge-cpu-forwards.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_295/fi-hsw-4770/igt@gem_pwrite@small-gtt-backwards.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4342/fi-hsw-4770/igt@gem_exec_gttfill@basic.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4342/fi-hsw-4770r/igt@gem_exec_gttfill@basic.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4342/fi-snb-2600/igt@gem_exec_gttfill@basic.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6157/shard-snb2/igt@gem_tiled_swapping@non-threaded.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13063/shard-hsw4/igt@gem_persistent_relocs@forked-interruptible-faulting-reloc-thrashing.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4331/fi-hsw-4770r/igt@gem_exec_gttfill@basic.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4331/fi-snb-2600/igt@gem_exec_gttfill@basic.html

Not us; we're just a victim here. RCU looks like it has acquired a new race. Does rcutorture show any failures?

Failure reported in Bug 110776 looks similar to this bug report. Is it a duplicate?

The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* HSW: igt@runner@aborted - fail - Previous test: gem_pwrite (small-gtt-backwards)
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_295/fi-hsw-4770/igt@runner@aborted.html

I suspect fixed by

commit f27a5d91201639161d6f6e25af1c89c9cbb3cac7 (drm-intel/topic/core-for-CI, topic/core-for-CI)
Author: Hugh Dickins <hughd@google.com>
Date: Wed May 29 09:25:40 2019 +0200

    x86/fpu: Use fault_in_pages_writeable() for pre-faulting

    Since commit d9c9ce34ed5c8 ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails") we use get_user_pages_unlocked() to pre-faulting user's memory if a write generates a pagefault while the handler is disabled. This works in general and uncovered a bug as reported by Mike Rapoport.

    It has been pointed out that this function may be fragile and a simple pre-fault as in fault_in_pages_writeable() would be a better solution. Better as in taste and simplicity: That write (as performed by the alternative function) performs exactly the same faulting of memory that we had before. This was suggested by Hugh Dickins and Andrew Morton.

    Use fault_in_pages_writeable() for pre-faulting of user's stack.

    Fixes: d9c9ce34ed5c8 ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")
    Suggested-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    [bigeasy: patch description]
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

*** Bug 110776 has been marked as a duplicate of this bug. ***

Still able to reproduce rcu stalls under duress.

Fwiw, the same rcu stalls are in v5.2-rc1. I just hope it's reproducible enough for bisection.

(In reply to Chris Wilson from comment #8)
> Fwiw, the same rcu stalls are in v5.2-rc1. I just hope it's reproducible
> enough for bisection.

Nearly. Failure is easy to spot, but my test is not 100% reliable in triggering them. So far I think I have it narrowed down to the mm/ pull, but I need to rerun all the good commits for a longer soak to confirm that are indeed good. What was 30min, let's make it 3hours.

5fd4ca2d84b249f0858ce28cf637cf25b61a398f is the first bad commit
commit 5fd4ca2d84b249f0858ce28cf637cf25b61a398f
Author: Matthew Wilcox <willy@infradead.org>
Date: Mon May 13 17:16:44 2019 -0700

    mm: page cache: store only head pages in i_pages

    Transparent Huge Pages are currently stored in i_pages as pointers to consecutive subpages. This patch changes that to storing consecutive pointers to the head page in preparation for storing huge pages more efficiently in i_pages.

    Large parts of this are "inspired" by Kirill's patch
    https://lore.kernel.org/lkml/20170126115819.58875-2-kirill.shutemov@linux.intel.com/

    [willy@infradead.org: fix swapcache pages]
      Link: http://lkml.kernel.org/r/20190324155441.GF10344@bombadil.infradead.org
    [kirill@shutemov.name: hugetlb stores pages in page cache differently]
      Link: http://lkml.kernel.org/r/20190404134553.vuvhgmghlkiw2hgl@kshutemo-mobl1
    Link: http://lkml.kernel.org/r/20190307153051.18815-1-willy@infradead.org
    Signed-off-by: Matthew Wilcox <willy@infradead.org>
    Acked-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Kirill Shutemov <kirill@shutemov.name>
    Reviewed-and-tested-by: Song Liu <songliubraving@fb.com>
    Tested-by: William Kucharski <william.kucharski@oracle.com>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>
    Tested-by: Qian Cai <cai@lca.pw>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Song Liu <liu.song.a23@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[ 76.175502] page:ffffea00098e0000 count:0 mapcount:0 mapping:0000000000000000 index:0x1
[ 76.175525] flags: 0x8000000000000000()
[ 76.175533] raw: 8000000000000000 ffffea0004a7e988 ffffea000445c3c8 0000000000000000
[ 76.175538] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
[ 76.175543] page dumped because: VM_BUG_ON_PAGE(entry != page)
[ 76.175560] ------------[ cut here ]------------
[ 76.175564] kernel BUG at mm/swap_state.c:170!
[ 76.175574] invalid opcode: 0000 [#1] PREEMPT SMP
[ 76.175581] CPU: 0 PID: 131 Comm: kswapd0 Tainted: G U 5.1.0+ #247
[ 76.175586] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0029.2016.1124.1625 11/24/2016
[ 76.175598] RIP: 0010:__delete_from_swap_cache+0x22e/0x340
[ 76.175604] Code: e8 b7 3e fd ff 48 01 1d a8 7e 04 01 48 83 c4 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c6 03 7e bf 81 48 89 c7 e8 92 f8 fd ff <0f> 0b 48 c7 c6 c8 7c bf 81 48 89 df e8 81 f8 fd ff 0f 0b 48 c7 c6
[ 76.175613] RSP: 0000:ffffc900008dba88 EFLAGS: 00010046
[ 76.175619] RAX: 0000000000000032 RBX: ffffea00098e0040 RCX: 0000000000000006
[ 76.175624] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff81bf6d4c
[ 76.175629] RBP: ffff888265ed8640 R08: 00000000000002c2 R09: 0000000000000000
[ 76.175634] R10: 0000000273a4626d R11: 0000000000000000 R12: 0000000000000001
[ 76.175639] R13: 0000000000000040 R14: 0000000000000000 R15: ffffea00098e0000
[ 76.175645] FS: 0000000000000000(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
[ 76.175651] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 76.175656] CR2: 00007f24e4399000 CR3: 0000000002c09000 CR4: 00000000001406f0
[ 76.175661] Call Trace:
[ 76.175671] __remove_mapping+0x1c2/0x380
[ 76.175678] shrink_page_list+0x11db/0x1d10
[ 76.175684] shrink_inactive_list+0x14b/0x420
[ 76.175690] shrink_node_memcg+0x20e/0x740
[ 76.175696] shrink_node+0xba/0x420
[ 76.175702] balance_pgdat+0x27d/0x4d0
[ 76.175709] kswapd+0x216/0x300
[ 76.175715] ? wait_woken+0x80/0x80
[ 76.175721] ? balance_pgdat+0x4d0/0x4d0
[ 76.175726] kthread+0x106/0x120
[ 76.175732] ? kthread_create_on_node+0x40/0x40
[ 76.175739] ret_from_fork+0x1f/0x30
[ 76.175745] Modules linked in: i915 intel_gtt drm_kms_helper
[ 76.175754] ---[ end trace 8faf2ec849d50724 ]---
[ 76.206689] RIP: 0010:__delete_from_swap_cache+0x22e/0x340
[ 76.206708] Code: e8 b7 3e fd ff 48 01 1d a8 7e 04 01 48 83 c4 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c6 03 7e bf 81 48 89 c7 e8 92 f8 fd ff <0f> 0b 48 c7 c6 c8 7c bf 81 48 89 df e8 81 f8 fd ff 0f 0b 48 c7 c6
[ 76.206718] RSP: 0000:ffffc900008dba88 EFLAGS: 00010046
[ 76.206723] RAX: 0000000000000032 RBX: ffffea00098e0040 RCX: 0000000000000006
[ 76.206729] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff81bf6d4c
[ 76.206734] RBP: ffff888265ed8640 R08: 00000000000002c2 R09: 0000000000000000
[ 76.206740] R10: 0000000273a4626d R11: 0000000000000000 R12: 0000000000000001
[ 76.206745] R13: 0000000000000040 R14: 0000000000000000 R15: ffffea00098e0000
[ 76.206750] FS: 0000000000000000(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
[ 76.206757] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9f8bce9a6b32..e15b67d4373a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2501,6 +2501,11 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		} else if (!PageAnon(page)) {
 			__xa_store(&head->mapping->i_pages, head[i].index,
 					head + i, 0);
+		} else if (PageSwapCache(page)) {
+			swp_entry_t entry = { .val = page_private(head + i) };
+			__xa_store(&swap_address_space(entry)->i_pages,
+				   swp_offset(entry),
+				   head + i, 0);
 		}
 	}

Reported upstream and applied band aid to core-for-CI.

*** Bug 110827 has been marked as a duplicate of this bug. ***

Hi Chris

The fix:

commit d7fbcebcb044e9e602a730138621471c619b87db
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Jun 3 08:08:43 2019 +0100

    mm: Band aid for 5fd4ca2d84b249f0858ce28cf637cf25b61a398f

Breaks the build of the kernel for me:

axion /usr/src/intel # make
  CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  DESCEND  objtool
  CHK     include/generated/compile.h
  CHK     kernel/kheaders_data.tar.xz
  CC      mm/huge_memory.o
mm/huge_memory.c: In function ‘__split_huge_page’:
mm/huge_memory.c:2504:41: warning: dereferencing ‘void *’ pointer
 2504 | __xa_store(&swap_address_space(entry)->i_pages,
      |                                      ^~
mm/huge_memory.c:2504:41: error: request for member ‘i_pages’ in something not a structure or union
make[1]: *** [scripts/Makefile.build:279: mm/huge_memory.o] Error 1
make: *** [Makefile:1071: mm] Error 2

Reverting your change gets it building again

It's a trivial CONFIG_SWAP dependency. Still waiting for Matthew Wilcox to fix this regression for realz.

Ah, I don't have swap enabled here
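For anyone hitting the same build failure: the error above is what you would expect if `swap_address_space(entry)` expands to a bare `NULL` when swap is compiled out, which is my recollection of `include/linux/swap.h` but should be treated as an assumption here. Below is a minimal userspace sketch of that situation and of guarding the dereference behind the config option. Everything in it (`MOCK_CONFIG_SWAP`, `mock_address_space`, `mock_swap_address_space`, `store_tail_page`) is a hypothetical stand-in, not kernel API, and it is not the fix that was applied to core-for-CI or upstream.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_SLOTS 4

/* Hypothetical stand-in for struct address_space; not kernel API. */
struct mock_address_space {
	unsigned long i_pages[MOCK_SLOTS];	/* stand-in for the XArray */
};

#ifdef MOCK_CONFIG_SWAP
static struct mock_address_space mock_swapper_space;
/* Swap enabled: the macro yields a typed pointer that can be dereferenced. */
#define mock_swap_address_space(entry)	(&mock_swapper_space)
#else
/*
 * Swap disabled: the macro yields an untyped NULL, i.e. (void *)0, so
 * mock_swap_address_space(e)->i_pages would fail to compile with
 * "request for member in something not a structure or union".
 */
#define mock_swap_address_space(entry)	(NULL)
#endif

/* Returns 1 if the page was stored in the mock swap cache, 0 otherwise. */
static int store_tail_page(unsigned long entry, unsigned long offset,
			   unsigned long page)
{
	assert(offset < MOCK_SLOTS);
#ifdef MOCK_CONFIG_SWAP
	struct mock_address_space *space = mock_swap_address_space(entry);

	(void)entry;	/* the mock macro ignores its argument */
	space->i_pages[offset] = page;
	return 1;
#else
	/* Nothing to store: the dereference is compiled out entirely. */
	(void)entry;
	(void)page;
	return 0;
#endif
}
```

Built without `-DMOCK_CONFIG_SWAP`, `store_tail_page()` compiles and does nothing; built with it, the store goes through the typed pointer. Removing the `#ifdef` around the dereference in the swap-disabled build should give the same pair of diagnostics as in the log above.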