Bug 110789

Summary:	[CI][SHARDS] igt@* - dmesg-warn - i915_drop_caches_set+
Product:	DRI	Reporter:	Martin Peres <martin.peres>
Component:	DRM/Intel	Assignee:	Intel GFX Bugs mailing list <intel-gfx-bugs>
Status:	RESOLVED NOTOURBUG	QA Contact:	Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity:	normal
Priority:	high	CC:	intel-gfx-bugs, lakshminarayana.vudum, mike
Version:	XOrg git
Hardware:	Other
OS:	All
Whiteboard:	ReadyForDev
i915 platform:	HSW, SNB	i915 features:	GEM/Other

Description Martin Peres 2019-05-29 07:41:00 UTC

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6157/shard-snb2/igt@gem_tiled_swapping@non-threaded.html

<6> [612.466174] Console: switching to colour dummy device 80x25
<6> [612.466227] [IGT] gem_tiled_swapping: executing
<6> [613.438871] [IGT] gem_tiled_swapping: starting subtest non-threaded
<6> [657.145239] [IGT] gem_tiled_swapping: exiting, ret=0
<5> [657.146397] Setting dangerous option reset - tainting kernel
<3> [718.271943] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [718.272027] rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-7): P1012
<4> [718.272145] 	(detected by 7, t=65002 jiffies, g=145317, q=10455)
<6> [718.272148] java            R  running task    13352  1012    953 0x00004000
<4> [718.272154] Call Trace:
<4> [718.272165]  ? retint_kernel+0x2b/0x2b
<4> [718.272176]  ? lock_is_held_type+0xe0/0x120
<4> [718.272180]  ? lock_is_held_type+0xfb/0x120
<4> [718.272186]  ? xas_descend+0xe0/0x120
<4> [718.272191]  ? xas_load+0xa/0x50
<4> [718.272195]  ? find_get_entry+0xa2/0x300
<4> [718.272204]  ? pagecache_get_page+0x2b/0x220
<4> [718.272210]  ? lookup_swap_cache+0x3b/0x100
<4> [718.272215]  ? do_swap_page+0x86/0x920
<4> [718.272223]  ? __handle_mm_fault+0x676/0xfc0
<4> [718.272235]  ? handle_mm_fault+0x155/0x350
<4> [718.272242]  ? __do_page_fault+0x248/0x4f0
<4> [718.272247]  ? page_fault+0x8/0x30
<4> [718.272252]  ? page_fault+0x1e/0x30
<3> [738.718980] INFO: task java:1010 blocked for more than 61 seconds.
<3> [738.719033]       Tainted: G     U  W         5.2.0-rc2-CI-CI_DRM_6157+ #1
<3> [738.719049] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6> [738.719067] java            D12968  1010    953 0x00000000
<4> [738.719071] Call Trace:
<4> [738.719077]  ? __schedule+0x311/0x840
<4> [738.719081]  ? rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719084]  schedule+0x2e/0xb0
<4> [738.719086]  rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719093]  ? down_read+0x65/0xd0
<4> [738.719094]  down_read+0x65/0xd0
<4> [738.719097]  __do_page_fault+0x3d9/0x4f0
<4> [738.719101]  ? page_fault+0x8/0x30
<4> [738.719103]  page_fault+0x1e/0x30
<4> [738.719105] RIP: 0033:0x7f12e7fabede
<4> [738.719109] Code: Bad RIP value.
<4> [738.719110] RSP: 002b:00007f12c25d9d60 EFLAGS: 00010206
<4> [738.719112] RAX: 000000000172ca00 RBX: 00007f12e9458420 RCX: 0000000000000002
<4> [738.719114] RDX: 00007f12e792a040 RSI: 00007f12e85db5ec RDI: 00007f12e0010320
<4> [738.719115] RBP: 00007f12c25d9d70 R08: 00007ffcfc8c30b0 R09: 0000000000000032
<4> [738.719117] R10: 00000000000b2ac8 R11: 000000000000029c R12: 0000000000000008
<4> [738.719118] R13: 00007f12e021d470 R14: 0000000000000002 R15: 0000000000000001
<3> [738.719125] INFO: task java:1011 blocked for more than 61 seconds.
<3> [738.719140]       Tainted: G     U  W         5.2.0-rc2-CI-CI_DRM_6157+ #1
<3> [738.719155] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6> [738.719172] java            D13216  1011    953 0x00000000
<4> [738.719176] Call Trace:
<4> [738.719180]  ? __schedule+0x311/0x840
<4> [738.719183]  ? rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719186]  schedule+0x2e/0xb0
<4> [738.719188]  rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719195]  ? down_read+0x65/0xd0
<4> [738.719196]  down_read+0x65/0xd0
<4> [738.719198]  __do_page_fault+0x3d9/0x4f0
<4> [738.719202]  ? page_fault+0x8/0x30
<4> [738.719205]  page_fault+0x1e/0x30
<4> [738.719206] RIP: 0033:0x7f12e7cec6e0
<4> [738.719208] Code: Bad RIP value.
<4> [738.719210] RSP: 002b:00007f12c20ad728 EFLAGS: 00010246
<4> [738.719211] RAX: 00007f12c20ad738 RBX: 00007f12d0ed8b48 RCX: 0000000000000000
<4> [738.719213] RDX: 00007f12d0ed9c28 RSI: 0000000000000000 RDI: 00007f12e03f4800
<4> [738.719214] RBP: 00007f12c20ad778 R08: 00007ffcfc8c30b0 R09: 00007ffcfc8c3080
<4> [738.719216] R10: 00000099d161f9a9 R11: 0000000000000294 R12: 0000000000000000
<4> [738.719217] R13: 00007f12d0ed89d0 R14: 00007f12c20ad7c8 R15: 00007f12e03f4800
<3> [738.719224] INFO: task java:1021 blocked for more than 61 seconds.
<3> [738.719238]       Tainted: G     U  W         5.2.0-rc2-CI-CI_DRM_6157+ #1
<3> [738.719253] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6> [738.719271] java            D13816  1021    953 0x00000000
<4> [738.719274] Call Trace:
<4> [738.719278]  ? __schedule+0x311/0x840
<4> [738.719281]  ? rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719284]  schedule+0x2e/0xb0
<4> [738.719286]  rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719292]  ? down_read+0x65/0xd0
<4> [738.719294]  down_read+0x65/0xd0
<4> [738.719296]  __do_page_fault+0x3d9/0x4f0
<4> [738.719299]  ? page_fault+0x8/0x30
<4> [738.719302]  page_fault+0x1e/0x30
<4> [738.719303] RIP: 0033:0x7f12e7f4cd96
<4> [738.719306] Code: Bad RIP value.
<4> [738.719307] RSP: 002b:00007f12d00f75d0 EFLAGS: 00010202
<4> [738.719308] RAX: 00007f12e9457100 RBX: 00007f1260038000 RCX: 0000000000000000
<4> [738.719310] RDX: 00007f1260038828 RSI: 0000000000000000 RDI: 00007f1260038828
<4> [738.719311] RBP: 00007f12d00f7650 R08: 0000000000000000 R09: 0000000000000080
<4> [738.719313] R10: 00007f12d00f74b0 R11: 0000000000000206 R12: 00000000000003e8
<4> [738.719314] R13: 00007f12e85db5ec R14: 0000000000000000 R15: 00007f125c004798
<3> [738.719321] INFO: task java:1026 blocked for more than 61 seconds.
<3> [738.719335]       Tainted: G     U  W         5.2.0-rc2-CI-CI_DRM_6157+ #1
<3> [738.719350] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6> [738.719367] java            D13064  1026    953 0x00000000
<4> [738.719371] Call Trace:
<4> [738.719374]  ? __schedule+0x311/0x840
<4> [738.719378]  ? rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719381]  schedule+0x2e/0xb0
<4> [738.719383]  rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719389]  ? down_read+0x65/0xd0
<4> [738.719390]  down_read+0x65/0xd0
<4> [738.719392]  __do_page_fault+0x3d9/0x4f0
<4> [738.719396]  ? page_fault+0x8/0x30
<4> [738.719398]  page_fault+0x1e/0x30
<4> [738.719400] RIP: 0033:0x7f12e860a1c3
<4> [738.719402] Code: Bad RIP value.
<4> [738.719403] RSP: 002b:00007f12c0c54530 EFLAGS: 00010202
<4> [738.719405] RAX: ffffffffffffff92 RBX: 00007f126c006a80 RCX: 00007f12e8609ed9
<4> [738.719406] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000000000000000
<4> [738.719408] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffcfc8c3080
<4> [738.719409] R10: 00007f12c0c54590 R11: 0000000000000246 R12: 00007f126c006a58
<4> [738.719411] R13: 00007f126c006aac R14: 00007f12c0c54650 R15: 00007f126c006aa4
<3> [738.719418] INFO: task java:1160 blocked for more than 61 seconds.
<3> [738.719432]       Tainted: G     U  W         5.2.0-rc2-CI-CI_DRM_6157+ #1
<3> [738.719447] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6> [738.719465] java            D13760  1160    953 0x00000000
<4> [738.719468] Call Trace:
<4> [738.719471]  ? __schedule+0x311/0x840
<4> [738.719475]  ? rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719478]  schedule+0x2e/0xb0
<4> [738.719479]  rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719486]  ? down_read+0x65/0xd0
<4> [738.719487]  down_read+0x65/0xd0
<4> [738.719489]  __do_page_fault+0x3d9/0x4f0
<4> [738.719493]  ? page_fault+0x8/0x30
<4> [738.719495]  page_fault+0x1e/0x30
<4> [738.719497] RIP: 0033:0x7f12e860a1c3
<4> [738.719499] Code: Bad RIP value.
<4> [738.719500] RSP: 002b:00007f12c037a530 EFLAGS: 00010202
<4> [738.719502] RAX: ffffffffffffff92 RBX: 00007f123c002900 RCX: 00007f12e8609ed9
<4> [738.719503] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000000000000000
<4> [738.719505] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffcfc8c3080
<4> [738.719506] R10: 00007f12c037a590 R11: 0000000000000246 R12: 00007f123c0028d8
<4> [738.719507] R13: 00007f123c002928 R14: 00007f12c037a650 R15: 00007f123c002924
<3> [738.719514] INFO: task java:1516 blocked for more than 61 seconds.
<3> [738.719528]       Tainted: G     U  W         5.2.0-rc2-CI-CI_DRM_6157+ #1
<3> [738.719544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6> [738.719561] java            D14008  1516    953 0x00000000
<4> [738.719565] Call Trace:
<4> [738.719568]  ? __schedule+0x311/0x840
<4> [738.719571]  ? rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719574]  schedule+0x2e/0xb0
<4> [738.719576]  rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719583]  ? down_read+0x65/0xd0
<4> [738.719584]  down_read+0x65/0xd0
<4> [738.719586]  __do_page_fault+0x3d9/0x4f0
<4> [738.719590]  ? page_fault+0x8/0x30
<4> [738.719592]  page_fault+0x1e/0x30
<4> [738.719594] RIP: 0033:0x7f12e7f4cd96
<4> [738.719596] Code: Bad RIP value.
<4> [738.719597] RSP: 002b:00007f12c01785d0 EFLAGS: 00010202
<4> [738.719599] RAX: 00007f12e9457100 RBX: 00007f122c002000 RCX: 0000000000000000
<4> [738.719600] RDX: 00007f1270007328 RSI: 0000000000000000 RDI: 00007f1270007328
<4> [738.719602] RBP: 00007f12c0178650 R08: 0000000000000000 R09: 0000000000000080
<4> [738.719603] R10: 00007f12c01784b0 R11: 0000000000000206 R12: 00000000000003e8
<4> [738.719604] R13: 00007f12e85db5ec R14: 0000000000000000 R15: 00007f125c07f328
<3> [738.719611] INFO: task java:1845 blocked for more than 61 seconds.
<3> [738.719625]       Tainted: G     U  W         5.2.0-rc2-CI-CI_DRM_6157+ #1
<3> [738.719641] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6> [738.719658] java            D13944  1845    953 0x00000000
<4> [738.719661] Call Trace:
<4> [738.719665]  ? __schedule+0x311/0x840
<4> [738.719668]  ? rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719671]  schedule+0x2e/0xb0
<4> [738.719673]  rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719680]  ? down_read+0x65/0xd0
<4> [738.719681]  down_read+0x65/0xd0
<4> [738.719683]  __do_page_fault+0x3d9/0x4f0
<4> [738.719687]  ? page_fault+0x8/0x30
<4> [738.719689]  page_fault+0x1e/0x30
<4> [738.719691] RIP: 0033:0x7f12e860a1c3
<4> [738.719693] Code: Bad RIP value.
<4> [738.719694] RSP: 002b:00007f12c0e56530 EFLAGS: 00010202
<4> [738.719696] RAX: ffffffffffffff92 RBX: 00007f126800ad60 RCX: 00007f12e8609ed9
<4> [738.719697] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000000000000000
<4> [738.719698] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffcfc8c3080
<4> [738.719700] R10: 00007f12c0e56590 R11: 0000000000000246 R12: 00007f126800ad38
<4> [738.719701] R13: 00007f126800ad88 R14: 00007f12c0e56650 R15: 00007f126800ad84
<3> [738.719708] INFO: task java:1848 blocked for more than 61 seconds.
<3> [738.719722]       Tainted: G     U  W         5.2.0-rc2-CI-CI_DRM_6157+ #1
<3> [738.719737] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6> [738.719755] java            D13160  1848    953 0x00000000
<4> [738.719758] Call Trace:
<4> [738.719762]  ? __schedule+0x311/0x840
<4> [738.719765]  ? rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719768]  schedule+0x2e/0xb0
<4> [738.719770]  rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719776]  ? down_read+0x65/0xd0
<4> [738.719777]  down_read+0x65/0xd0
<4> [738.719780]  __do_page_fault+0x3d9/0x4f0
<4> [738.719783]  ? page_fault+0x8/0x30
<4> [738.719786]  page_fault+0x1e/0x30
<4> [738.719787] RIP: 0033:0x7f12e860a1c3
<4> [738.719789] Code: Bad RIP value.
<4> [738.719790] RSP: 002b:00007f12c1aa1530 EFLAGS: 00010202
<4> [738.719792] RAX: ffffffffffffff92 RBX: 00007f126400f350 RCX: 00007f12e8609ed9
<4> [738.719793] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000000000000000
<4> [738.719795] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffcfc8c3080
<4> [738.719796] R10: 00007f12c1aa1590 R11: 0000000000000246 R12: 00007f126400f328
<4> [738.719797] R13: 00007f126400f37c R14: 00007f12c1aa1650 R15: 00007f126400f374
<3> [738.719804] INFO: task java:1849 blocked for more than 61 seconds.
<3> [738.719818]       Tainted: G     U  W         5.2.0-rc2-CI-CI_DRM_6157+ #1
<3> [738.719834] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6> [738.719851] java            D14008  1849    953 0x00000000
<4> [738.719855] Call Trace:
<4> [738.719858]  ? __schedule+0x311/0x840
<4> [738.719862]  ? rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719865]  schedule+0x2e/0xb0
<4> [738.719867]  rwsem_down_read_failed+0xf7/0x1b0
<4> [738.719873]  ? down_read+0x65/0xd0
<4> [738.719874]  down_read+0x65/0xd0
<4> [738.719876]  __do_page_fault+0x3d9/0x4f0
<4> [738.719880]  ? page_fault+0x8/0x30
<4> [738.719882]  page_fault+0x1e/0x30
<4> [738.719884] RIP: 0033:0x7f12e860a1c3
<4> [738.719886] Code: Bad RIP value.
<4> [738.719887] RSP: 002b:00007f12c19a0530 EFLAGS: 00010202
<4> [738.719889] RAX: ffffffffffffff92 RBX: 00007f12640221f0 RCX: 00007f12e8609ed9
<4> [738.719890] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000000000000000
<4> [738.719892] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffcfc8c3080
<4> [738.719893] R10: 00007f12c19a0590 R11: 0000000000000246 R12: 00007f12640221c8
<4> [738.719894] R13: 00007f126402221c R14: 00007f12c19a0650 R15: 00007f1264022214
<3> [738.719902] INFO: task gem_tiled_swapp:2533 blocked for more than 61 seconds.
<3> [738.719918]       Tainted: G     U  W         5.2.0-rc2-CI-CI_DRM_6157+ #1
<3> [738.719969] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
<6> [738.720011] gem_tiled_swapp D10928  2533   2357 0x00000000
<4> [738.720017] Call Trace:
<4> [738.720024]  ? __schedule+0x311/0x840
<4> [738.720033]  ? wait_for_common+0x116/0x1f0
<4> [738.720038]  schedule+0x2e/0xb0
<4> [738.720042]  schedule_timeout+0x21b/0x3d0
<4> [738.720047]  ? lock_acquire+0xa6/0x1c0
<4> [738.720051]  ? generic_exec_single+0xad/0xe0
<4> [738.720056]  ? wait_for_common+0x116/0x1f0
<4> [738.720061]  ? mark_held_locks+0x49/0x70
<4> [738.720065]  ? _raw_spin_unlock_irq+0x24/0x50
<4> [738.720071]  ? wait_for_common+0x116/0x1f0
<4> [738.720076]  wait_for_common+0x13a/0x1f0
<4> [738.720082]  ? wake_up_q+0x80/0x80
<4> [738.720091]  rcu_barrier+0x11c/0x190
<4> [738.720148]  i915_drop_caches_set+0xb1/0x270 [i915]
<4> [738.720157]  simple_attr_write+0xb0/0xd0
<4> [738.720164]  full_proxy_write+0x51/0x80
<4> [738.720171]  vfs_write+0xbd/0x1b0
<4> [738.720177]  ksys_write+0x8f/0xe0
<4> [738.720184]  do_syscall_64+0x55/0x1c0
<4> [738.720189]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [738.720193] RIP: 0033:0x7f0b8b848281
<4> [738.720197] Code: Bad RIP value.
<4> [738.720200] RSP: 002b:00007ffcda03e318 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4> [738.720204] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0b8b848281
<4> [738.720208] RDX: 0000000000000005 RSI: 00007ffcda03e370 RDI: 0000000000000007
<4> [738.720210] RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000000
<4> [738.720213] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffcda03e370
<4> [738.720216] R13: 0000000000000007 R14: 00007ffcda03e370 R15: 00007f0b8b832d80
<4> [738.720230] 
Showing all locks held in the system:
<4> [738.720235] 1 lock held by khungtaskd/52:
<4> [738.720237]  #0: 000000005953c7d5 (rcu_read_lock){....}, at: debug_show_all_locks+0xe/0x1a0
<4> [738.720248] 1 lock held by java/1010:
<4> [738.720250]  #0: 000000000d37b753 (&mm->mmap_sem#2){++++}, at: __do_page_fault+0x3d9/0x4f0
<4> [738.720258] 1 lock held by java/1011:
<4> [738.720259]  #0: 000000000d37b753 (&mm->mmap_sem#2){++++}, at: __do_page_fault+0x3d9/0x4f0
<4> [738.720275] 2 locks held by java/1012:
<4> [738.720277] 1 lock held by java/1021:
<4> [738.720279]  #0: 000000000d37b753 (&mm->mmap_sem#2){++++}, at: __do_page_fault+0x3d9/0x4f0
<4> [738.720287] 1 lock held by java/1026:
<4> [738.720289]  #0: 000000000d37b753 (&mm->mmap_sem#2){++++}, at: __do_page_fault+0x3d9/0x4f0
<4> [738.720296] 1 lock held by java/1160:
<4> [738.720298]  #0: 000000000d37b753 (&mm->mmap_sem#2){++++}, at: __do_page_fault+0x3d9/0x4f0
<4> [738.720305] 1 lock held by java/1516:
<4> [738.720306]  #0: 000000000d37b753 (&mm->mmap_sem#2){++++}, at: __do_page_fault+0x3d9/0x4f0
<4> [738.720313] 1 lock held by java/1845:
<4> [738.720316]  #0: 000000000d37b753 (&mm->mmap_sem#2){++++}, at: __do_page_fault+0x3d9/0x4f0
<4> [738.720323] 1 lock held by java/1848:
<4> [738.720325]  #0: 000000000d37b753 (&mm->mmap_sem#2){++++}, at: __do_page_fault+0x3d9/0x4f0
<4> [738.720332] 1 lock held by java/1849:
<4> [738.720334]  #0: 000000000d37b753 (&mm->mmap_sem#2){++++}, at: __do_page_fault+0x3d9/0x4f0
<4> [738.720341] 1 lock held by java/2070:
<4> [738.720343]  #0: 000000000d37b753 (&mm->mmap_sem#2){++++}, at: vm_mmap_pgoff+0x6a/0xf0
<4> [738.720353] 4 locks held by dmesg/2340:
<4> [738.720356] 3 locks held by gem_tiled_swapp/2533:
<4> [738.720358]  #0: 000000005ad30be7 (sb_writers#14){.+.+}, at: vfs_write+0x18e/0x1b0
<4> [738.720364]  #1: 0000000029f660d1 (&attr->mutex){+.+.}, at: simple_attr_write+0x36/0xd0
<4> [738.720370]  #2: 0000000021bb9f66 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190
<4> [738.720377] 
<4> [738.720379] =============================================

<3> [913.276947] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [913.276989] rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-7): P1012
<4> [913.277016] 	(detected by 7, t=260007 jiffies, g=145317, q=11306)
<6> [913.277019] java            R  running task    13352  1012    953 0x00004000
<4> [913.277025] Call Trace:
<4> [913.277034]  ? retint_kernel+0x2b/0x2b
<4> [913.277045]  ? lock_is_held_type+0xe0/0x120
<4> [913.277049]  ? lock_is_held_type+0xfb/0x120
<4> [913.277055]  ? xas_descend+0xe0/0x120
<4> [913.277060]  ? xas_load+0x39/0x50
<4> [913.277064]  ? find_get_entry+0xa2/0x300
<4> [913.277073]  ? pagecache_get_page+0x2b/0x220
<4> [913.277079]  ? lookup_swap_cache+0x3b/0x100
<4> [913.277083]  ? do_swap_page+0x86/0x920
<4> [913.277092]  ? __handle_mm_fault+0x676/0xfc0
<4> [913.277103]  ? handle_mm_fault+0x155/0x350
<4> [913.277110]  ? __do_page_fault+0x248/0x4f0
<4> [913.277116]  ? page_fault+0x8/0x30
<4> [913.277120]  ? page_fault+0x1e/0x30
<3> [1108.281950] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [1108.281986] rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-7): P1012
<4> [1108.282013] 	(detected by 7, t=455012 jiffies, g=145317, q=13146)
<6> [1108.282016] java            R  running task    13352  1012    953 0x00004000
<4> [1108.282022] Call Trace:
<4> [1108.282029]  ? trace_hardirqs_on_thunk+0x1a/0x1c
<4> [1108.282032]  ? lockdep_hardirqs_on+0xe3/0x1b0
<4> [1108.282036]  ? trace_hardirqs_on_thunk+0x1a/0x1c
<4> [1108.282044]  ? retint_kernel+0x2b/0x2b
<4> [1108.282054]  ? lock_is_held_type+0xe0/0x120
<4> [1108.282059]  ? lock_is_held_type+0xfb/0x120
<4> [1108.282065]  ? rcu_lockdep_current_cpu_online+0x38/0x80
<4> [1108.282069]  ? xas_descend+0xe0/0x120
<4> [1108.282074]  ? xas_load+0xa/0x50
<4> [1108.282078]  ? find_get_entry+0xa2/0x300
<4> [1108.282087]  ? pagecache_get_page+0x2b/0x220
<4> [1108.282093]  ? lookup_swap_cache+0x3b/0x100
<4> [1108.282098]  ? do_swap_page+0x86/0x920
<4> [1108.282106]  ? __handle_mm_fault+0x676/0xfc0
<4> [1108.282118]  ? handle_mm_fault+0x155/0x350
<4> [1108.282124]  ? __do_page_fault+0x248/0x4f0
<4> [1108.282130]  ? page_fault+0x8/0x30
<4> [1108.282134]  ? page_fault+0x1e/0x30
<6> [1157.407192] Console: switching to colour frame buffer device 128x48
<6> [1158.248102] gem_tiled_swapp (2533) used greatest stack depth: 10928 bytes left

Comment 1 CI Bug Log 2019-05-29 07:42:07 UTC

The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* SNB HSW: all tests - dmesg-warn - i915_drop_caches_set+
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13107/shard-snb2/igt@gem_pwrite@huge-cpu-forwards.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_295/fi-hsw-4770/igt@gem_pwrite@small-gtt-backwards.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4342/fi-hsw-4770/igt@gem_exec_gttfill@basic.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4342/fi-hsw-4770r/igt@gem_exec_gttfill@basic.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4342/fi-snb-2600/igt@gem_exec_gttfill@basic.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6157/shard-snb2/igt@gem_tiled_swapping@non-threaded.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13063/shard-hsw4/igt@gem_persistent_relocs@forked-interruptible-faulting-reloc-thrashing.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4331/fi-hsw-4770r/igt@gem_exec_gttfill@basic.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_4331/fi-snb-2600/igt@gem_exec_gttfill@basic.html

Comment 2 Chris Wilson 2019-05-29 07:46:48 UTC

Not us; we're just a victim here. RCU looks like it has acquired a new race. Does rcutorture show any failures?

Comment 3 Lakshmi 2019-05-29 07:50:10 UTC

Failure reported in Bug 110776 looks similar to this bug report. Is it a duplicate?

Comment 4 CI Bug Log 2019-05-29 08:08:50 UTC

The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* HSW: igt@runner@aborted - fail -  Previous test: gem_pwrite (small-gtt-backwards)
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_295/fi-hsw-4770/igt@runner@aborted.html

Comment 5 Chris Wilson 2019-05-29 21:30:52 UTC

I suspect fixed by

commit f27a5d91201639161d6f6e25af1c89c9cbb3cac7 (drm-intel/topic/core-for-CI, topic/core-for-CI)
Author: Hugh Dickins <hughd@google.com>
Date:   Wed May 29 09:25:40 2019 +0200

    x86/fpu: Use fault_in_pages_writeable() for pre-faulting
    
    Since commit
    
       d9c9ce34ed5c8 ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")
    
    we use get_user_pages_unlocked() to pre-faulting user's memory if a
    write generates a pagefault while the handler is disabled.
    This works in general and uncovered a bug as reported by Mike Rapoport.
    It has been pointed out that this function may be fragile and a
    simple pre-fault as in fault_in_pages_writeable() would be a better
    solution. Better as in taste and simplicity: That write (as performed by
    the alternative function) performs exactly the same faulting of memory
    that we had before. This was suggested by Hugh Dickins and Andrew
    Morton.
    
    Use fault_in_pages_writeable() for pre-faulting of user's stack.
    
    Fixes: d9c9ce34ed5c8 ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails")
    Suggested-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    [bigeasy: patch description]
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Comment 6 Chris Wilson 2019-05-29 21:32:04 UTC

*** Bug 110776 has been marked as a duplicate of this bug. ***

Comment 7 Chris Wilson 2019-05-29 21:45:46 UTC

Still able to reproduce rcu stalls under duress.

Comment 8 Chris Wilson 2019-05-29 22:11:58 UTC

Fwiw, the same rcu stalls are in v5.2-rc1. I just hope it's reproducible enough for bisection.

Comment 9 Chris Wilson 2019-05-31 07:40:43 UTC

(In reply to Chris Wilson from comment #8)
> Fwiw, the same rcu stalls are in v5.2-rc1. I just hope it's reproducible
> enough for bisection.

Nearly. Failure is easy to spot, but my test is not 100% reliable in triggering them. So far I think I have it narrowed down to the mm/ pull, but I need to rerun all the good commits for a longer soak to confirm that are indeed good. What was 30min, let's make it 3hours.

Comment 10 Chris Wilson 2019-06-01 09:11:06 UTC

5fd4ca2d84b249f0858ce28cf637cf25b61a398f is the first bad commit
commit 5fd4ca2d84b249f0858ce28cf637cf25b61a398f
Author: Matthew Wilcox <willy@infradead.org>
Date:   Mon May 13 17:16:44 2019 -0700

    mm: page cache: store only head pages in i_pages
    
    Transparent Huge Pages are currently stored in i_pages as pointers to
    consecutive subpages.  This patch changes that to storing consecutive
    pointers to the head page in preparation for storing huge pages more
    efficiently in i_pages.
    
    Large parts of this are "inspired" by Kirill's patch
    https://lore.kernel.org/lkml/20170126115819.58875-2-kirill.shutemov@linux.intel.com/
    
    [willy@infradead.org: fix swapcache pages]
      Link: http://lkml.kernel.org/r/20190324155441.GF10344@bombadil.infradead.org
    [kirill@shutemov.name: hugetlb stores pages in page cache differently]
      Link: http://lkml.kernel.org/r/20190404134553.vuvhgmghlkiw2hgl@kshutemo-mobl1
    Link: http://lkml.kernel.org/r/20190307153051.18815-1-willy@infradead.org
    Signed-off-by: Matthew Wilcox <willy@infradead.org>
    Acked-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Kirill Shutemov <kirill@shutemov.name>
    Reviewed-and-tested-by: Song Liu <songliubraving@fb.com>
    Tested-by: William Kucharski <william.kucharski@oracle.com>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>
    Tested-by: Qian Cai <cai@lca.pw>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Song Liu <liu.song.a23@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Comment 11 Chris Wilson 2019-06-01 11:44:44 UTC

[   76.175502] page:ffffea00098e0000 count:0 mapcount:0 mapping:0000000000000000 index:0x1
[   76.175525] flags: 0x8000000000000000()
[   76.175533] raw: 8000000000000000 ffffea0004a7e988 ffffea000445c3c8 0000000000000000
[   76.175538] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
[   76.175543] page dumped because: VM_BUG_ON_PAGE(entry != page)
[   76.175560] ------------[ cut here ]------------
[   76.175564] kernel BUG at mm/swap_state.c:170!
[   76.175574] invalid opcode: 0000 [#1] PREEMPT SMP
[   76.175581] CPU: 0 PID: 131 Comm: kswapd0 Tainted: G     U            5.1.0+ #247
[   76.175586] Hardware name:  /NUC6CAYB, BIOS AYAPLCEL.86A.0029.2016.1124.1625 11/24/2016
[   76.175598] RIP: 0010:__delete_from_swap_cache+0x22e/0x340
[   76.175604] Code: e8 b7 3e fd ff 48 01 1d a8 7e 04 01 48 83 c4 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c6 03 7e bf 81 48 89 c7 e8 92 f8 fd ff <0f> 0b 48 c7 c6 c8 7c bf 81 48 89 df e8 81 f8 fd ff 0f 0b 48 c7 c6
[   76.175613] RSP: 0000:ffffc900008dba88 EFLAGS: 00010046
[   76.175619] RAX: 0000000000000032 RBX: ffffea00098e0040 RCX: 0000000000000006
[   76.175624] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff81bf6d4c
[   76.175629] RBP: ffff888265ed8640 R08: 00000000000002c2 R09: 0000000000000000
[   76.175634] R10: 0000000273a4626d R11: 0000000000000000 R12: 0000000000000001
[   76.175639] R13: 0000000000000040 R14: 0000000000000000 R15: ffffea00098e0000
[   76.175645] FS:  0000000000000000(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
[   76.175651] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   76.175656] CR2: 00007f24e4399000 CR3: 0000000002c09000 CR4: 00000000001406f0
[   76.175661] Call Trace:
[   76.175671]  __remove_mapping+0x1c2/0x380
[   76.175678]  shrink_page_list+0x11db/0x1d10
[   76.175684]  shrink_inactive_list+0x14b/0x420
[   76.175690]  shrink_node_memcg+0x20e/0x740
[   76.175696]  shrink_node+0xba/0x420
[   76.175702]  balance_pgdat+0x27d/0x4d0
[   76.175709]  kswapd+0x216/0x300
[   76.175715]  ? wait_woken+0x80/0x80
[   76.175721]  ? balance_pgdat+0x4d0/0x4d0
[   76.175726]  kthread+0x106/0x120
[   76.175732]  ? kthread_create_on_node+0x40/0x40
[   76.175739]  ret_from_fork+0x1f/0x30
[   76.175745] Modules linked in: i915 intel_gtt drm_kms_helper
[   76.175754] ---[ end trace 8faf2ec849d50724 ]---
[   76.206689] RIP: 0010:__delete_from_swap_cache+0x22e/0x340
[   76.206708] Code: e8 b7 3e fd ff 48 01 1d a8 7e 04 01 48 83 c4 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c6 03 7e bf 81 48 89 c7 e8 92 f8 fd ff <0f> 0b 48 c7 c6 c8 7c bf 81 48 89 df e8 81 f8 fd ff 0f 0b 48 c7 c6
[   76.206718] RSP: 0000:ffffc900008dba88 EFLAGS: 00010046
[   76.206723] RAX: 0000000000000032 RBX: ffffea00098e0040 RCX: 0000000000000006
[   76.206729] RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffffffff81bf6d4c
[   76.206734] RBP: ffff888265ed8640 R08: 00000000000002c2 R09: 0000000000000000
[   76.206740] R10: 0000000273a4626d R11: 0000000000000000 R12: 0000000000000001
[   76.206745] R13: 0000000000000040 R14: 0000000000000000 R15: ffffea00098e0000
[   76.206750] FS:  0000000000000000(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
[   76.206757] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

Comment 12 Chris Wilson 2019-06-02 22:15:52 UTC

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9f8bce9a6b32..e15b67d4373a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2501,6 +2501,11 @@ static void __split_huge_page(struct page *page, struct list_head *list,
                } else if (!PageAnon(page)) {
                        __xa_store(&head->mapping->i_pages, head[i].index,
                                        head + i, 0);
+               } else if (PageSwapCache(page)) {
+                       swp_entry_t entry = { .val = page_private(head + i) };
+                       __xa_store(&swap_address_space(entry)->i_pages,
+                                       swp_offset(entry),
+                                       head + i, 0);
                }
        }

Comment 13 Chris Wilson 2019-06-03 07:13:51 UTC

Reported upstream and applied band aid to core-for-CI.

Comment 14 Chris Wilson 2019-06-03 12:13:58 UTC

*** Bug 110827 has been marked as a duplicate of this bug. ***

Comment 15 Mike Lothian 2019-06-25 12:05:59 UTC

Hi Chris

The fix:

commit d7fbcebcb044e9e602a730138621471c619b87db
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Jun 3 08:08:43 2019 +0100

    mm: Band aid for 5fd4ca2d84b249f0858ce28cf637cf25b61a398f

Breaks the build of the kernel for me:

axion /usr/src/intel # make
  CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  DESCEND  objtool
  CHK     include/generated/compile.h
  CHK     kernel/kheaders_data.tar.xz
  CC      mm/huge_memory.o
mm/huge_memory.c: In function ‘__split_huge_page’:
mm/huge_memory.c:2504:41: warning: dereferencing ‘void *’ pointer
 2504 |    __xa_store(&swap_address_space(entry)->i_pages,
      |                                         ^~
mm/huge_memory.c:2504:41: error: request for member ‘i_pages’ in something not a structure or union
make[1]: *** [scripts/Makefile.build:279: mm/huge_memory.o] Error 1
make: *** [Makefile:1071: mm] Error 2



Reverting your change gets it building again

Comment 16 Chris Wilson 2019-06-25 13:17:35 UTC

It's a trivial CONFIG_SWAP dependency. Still waiting for Matthew Wilcox to fix this regression for realz.

Comment 17 Mike Lothian 2019-06-25 14:35:49 UTC

Ah, I don't have swap enabled here

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.