https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5824/shard-iclb2/igt@kms_flip@flip-vs-fences-interruptible.html

<1> [1060.203377] BUG: unable to handle kernel paging request at ffffea0003ff8030
<1> [1060.203381] #PF error: [normal kernel read fault]
<6> [1060.203383] PGD 4a02f7067 P4D 4a02f7067 PUD 4a02f6067 PMD 0
<4> [1060.203388] Oops: 0000 [#1] PREEMPT SMP NOPTI
<4> [1060.203391] CPU: 6 PID: 57 Comm: khugepaged Tainted: G U 5.1.0-rc2-CI-CI_DRM_5824+ #1
<4> [1060.203393] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019
<4> [1060.203398] RIP: 0010:compaction_alloc+0x623/0x940
<4> [1060.203401] Code: ff 48 c1 e5 06 48 01 c5 e9 e8 00 00 00 48 8b 04 24 49 89 ed 80 b8 7d 04 00 00 00 0f 84 08 01 00 00 4d 85 ed 0f 84 a7 00 00 00 <41> 8b 45 30 25 80 00 00 f0 3d 00 00 00 f0 0f 84 02 01 00 00 41 80
<4> [1060.203403] RSP: 0018:ffffc900002ab938 EFLAGS: 00010286
<4> [1060.203405] RAX: ffffffff8230b1c0 RBX: 8000000000100000 RCX: 000000000000003d
<4> [1060.203407] RDX: 80000000000ffe00 RSI: 0000000000000000 RDI: ffff8884b02f8120
<4> [1060.203409] RBP: ffffea0003ff8000 R08: 0000000000000000 R09: 0000000000000001
<4> [1060.203411] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
<4> [1060.203413] R13: ffffea0003ff8000 R14: ffffc900002abb30 R15: 80000000000ffe00
<4> [1060.203415] FS: 0000000000000000(0000) GS:ffff88849ff80000(0000) knlGS:0000000000000000
<4> [1060.203417] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [1060.203419] CR2: ffffea0003ff8030 CR3: 000000048f5b2001 CR4: 0000000000760ee0
<4> [1060.203421] PKRU: 55555554
<4> [1060.203422] Call Trace:
<4> [1060.203429] migrate_pages+0x122/0xb60
<4> [1060.203432] ? isolate_freepages_block+0x460/0x460
<4> [1060.203435] ? __reset_isolation_suitable+0x110/0x110
<4> [1060.203439] compact_zone+0x604/0xf50
<4> [1060.203444] compact_zone_order+0xda/0x120
<4> [1060.203451] ? try_to_compact_pages+0xb2/0x2b0
<4> [1060.203453] try_to_compact_pages+0xb2/0x2b0
<4> [1060.203457] __alloc_pages_direct_compact+0x62/0x150
<4> [1060.203461] __alloc_pages_nodemask+0x71a/0x1120
<4> [1060.203467] ? khugepaged+0x23b/0x25f0
<4> [1060.203471] khugepaged+0x2dc/0x25f0
<4> [1060.203479] ? wait_woken+0xa0/0xa0
<4> [1060.203483] ? collapse_shmem.isra.8+0xeb0/0xeb0
<4> [1060.203486] kthread+0x119/0x130
<4> [1060.203489] ? kthread_park+0x80/0x80
<4> [1060.203493] ret_from_fork+0x24/0x50
<4> [1060.203498] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp cdc_ether usbnet mii snd_hda_intel snd_hda_codec crct10dif_pclmul crc32_pclmul snd_hwdep snd_hda_core snd_pcm ghash_clmulni_intel e1000e ptp pps_core mei_me mei i915 prime_numbers
<0> [1060.203512] Dumping ftrace buffer:
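For anyone reading the oops, the register dump can be cross-checked with a little arithmetic. The sketch below is only an illustration: the register values are copied from the log above, while the vmemmap base and the size of struct page are assumptions (the usual x86-64 defaults without KASLR shifting the region), not something the report states. The marked bytes `<41> 8b 45 30` decode as a 32-bit load from `0x30(%r13)`, and the earlier `48 c1 e5 06` (a left shift of RBP by 6) is consistent with a 64-byte struct page.

```python
# Values copied from the oops above.
CR2 = 0xFFFFEA0003FF8030   # faulting address
R13 = 0xFFFFEA0003FF8000   # base register of the faulting load
RDX = 0x80000000000FFE00   # as dumped; R15 holds the same value
DISP = 0x30                # displacement in "<41> 8b 45 30" (mov 0x30(%r13),%eax)

# Assumptions, not taken from the report:
VMEMMAP_BASE = 0xFFFFEA0000000000   # default x86-64 vmemmap start (4-level paging)
STRUCT_PAGE_SIZE = 64               # typical sizeof(struct page) on x86-64


def fault_address(base: int, disp: int) -> int:
    """Address touched by a load of the form disp(base)."""
    return base + disp


def pfn_from_page_pointer(page_ptr: int) -> int:
    """Page frame number for a struct page pointer inside vmemmap."""
    return (page_ptr - VMEMMAP_BASE) // STRUCT_PAGE_SIZE


pfn = pfn_from_page_pointer(R13)
print(hex(fault_address(R13, DISP)))   # should equal CR2
print(hex(pfn))                        # pfn implied by R13 under the assumptions
print(hex(RDX & 0xFFFFF))              # low 20 bits of RDX, for comparison
```

Under these assumptions R13 looks like a struct page pointer for pfn 0xffe00, the same value found in the low bits of RDX and R15, and `PMD 0` in the dump says that part of the mapping is simply not present. This does not identify the root cause; it only narrows down what the faulting load was touching.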
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@kms_flip@flip-vs-fences-interruptible - dmesg-warn- BUG: unable to handle kernel paging request
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5824/shard-iclb2/igt@kms_flip@flip-vs-fences-interruptible.html
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@runner@aborted - fail - Previous test: kms_flip (flip-vs-fences-interruptible)
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5824/shard-iclb2/igt@runner@aborted.html
A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@kms_flip@flip-vs-fences-interruptible - dmesg-warn- BUG: unable to handle kernel paging request -}
{+ ICL: igt@kms_flip@flip-vs-fences-interruptible - dmesg-warn- (BUG: unable to handle kernel paging request|general protection fault: 0000) +}

New failures caught by the filter:

* https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5831/shard-iclb6/igt@kms_flip@flip-vs-fences-interruptible.html
A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@kms_flip@flip-vs-fences-interruptible - dmesg-warn- (BUG: unable to handle kernel paging request|general protection fault: 0000) -}
{+ ICL: igt@kms_flip@flip-vs-fences-interruptible / igt@gem_create@create-clear - dmesg-warn- (BUG: unable to handle kernel paging request|general protection fault: 0000) +}

No new failures caught with the new filter
A CI Bug Log filter associated to this bug has been updated:

{- ICL: igt@runner@aborted - fail - Previous test: kms_flip (flip-vs-fences-interruptible) -}
{+ ICL: igt@runner@aborted - fail - Previous test: (kms_flip|gem_create) +}

No new failures caught with the new filter
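To make the filter changes above easier to follow, here is a minimal sketch of how the two alternation patterns behave when treated as ordinary regular expressions. How cibuglog applies its filters internally is not described in this thread, so the use of Python's `re.search` and the helper names are assumptions; the patterns themselves are copied from the filter updates, and the general-protection-fault sample line is illustrative rather than taken from a real log.

```python
import re

# Patterns copied from the filter updates above.
DMESG_PATTERN = re.compile(
    r"(BUG: unable to handle kernel paging request|general protection fault: 0000)"
)
PREVIOUS_TEST_PATTERN = re.compile(r"Previous test: (kms_flip|gem_create)")


def dmesg_matches(line: str) -> bool:
    """Hypothetical helper: does a dmesg line hit the widened filter?"""
    return DMESG_PATTERN.search(line) is not None


def aborted_matches(text: str) -> bool:
    """Hypothetical helper: does a runner@aborted summary hit the filter?"""
    return PREVIOUS_TEST_PATTERN.search(text) is not None


samples = [
    # First line is from the oops in this bug; the second is an illustrative GPF line.
    "<1> [1060.203377] BUG: unable to handle kernel paging request at ffffea0003ff8030",
    "<4> [ 123.000000] general protection fault: 0000 [#1] PREEMPT SMP NOPTI",
    "<4> [1060.203422] Call Trace:",
]
for line in samples:
    print(dmesg_matches(line), line)
```

The widened pattern catches both failure signatures with a single filter, which is why the CI_DRM_5831 result was picked up after the first update.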
(In reply to CI Bug Log from comment #4)
> A CI Bug Log filter associated to this bug has been updated:
>
> {- ICL: igt@kms_flip@flip-vs-fences-interruptible - dmesg-warn- (BUG: unable
> to handle kernel paging request|general protection fault: 0000) -}
> {+ ICL: igt@kms_flip@flip-vs-fences-interruptible /
> igt@gem_create@create-clear - dmesg-warn- (BUG: unable to handle kernel
> paging request|general protection fault: 0000) +}
>
> No new failures caught with the new filter

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5816/shard-iclb8/igt@gem_create@create-clear.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5808/shard-iclb6/igt@gem_create@create-clear.html

Since the test igt@kms_flip@flip-vs-fences-interruptible uses execbuf and GTT fences, and we also fail at igt@gem_create@create-clear, there is a chance that this is not a Linux issue, but instead a GEM one.

Given that this issue happened 4 times in a week on 4 different machines, and that the outcome is an oops, which breaks the users' machines until they reboot, it is fair to increase the priority to highest.
Next step would be to see if this is reproducible consistently and bisect to the culprit commit (wherever that may be). gem_create@create-clear might be the easier way to trigger this.
We also see page corruption in gem_create/create-clear on shard-snb. Tomi did some digging and found that it first occurred circa CI_DRM_5800 (which is about a hundred runs after gem_create/create-clear was introduced, so it is reasonable to conclude that the sporadic failures were introduced by a later kernel update). The implication is that this is an -rc1 failure.
*** Bug 110340 has been marked as a duplicate of this bug. ***
Andi, can you see if you can pinpoint the kernel commit that introduced this?
The BIOS was updated on the shards on the 10th of April.
Lowering the priority because the issue was seen twice in 13 runs, but then not at all for 105 runs. We'll close it next week, when the issue pops up at the top of the open bugs view of cibuglog.

Andi, it would be great if you could try to reproduce on CI_DRM_5824 and, if you succeed, try to reproduce on drm-tip. This would give us confidence that this was indeed a SW issue :)
(In reply to Martin Peres from comment #12)
> Lowering the priority because the issue got seen twice in 13 runs, but then
> nothing for 105 runs. We'll close it next week, when the issue pops up at
> the top of the open bugs view of cibuglog.

It popped up again at the top, so now is the time to close it since it did not happen again! The last failure happened on CI_DRM_5831, and the issue has now not been seen for 164 runs, which is above the 10x rule. Closing!
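The thread does not spell out what the "10x rule" is. One plausible reading, and it is only an assumption, is that a bug may be closed once the number of runs since the last failure exceeds ten times the mean number of runs between failures while it was active. The sketch below plugs in the numbers quoted above (2 failures in 13 runs, then 105 and later 164 quiet runs); the function name and the exact formula are hypothetical.

```python
def quiet_long_enough(failures: int, runs_observed: int,
                      runs_since_last_failure: int, factor: int = 10) -> bool:
    """Assumed reading of the '10x rule': the quiet period must be at least
    `factor` times the mean interval between failures seen earlier."""
    if failures <= 0:
        raise ValueError("need at least one observed failure")
    mean_interval = runs_observed / failures   # 13 / 2 = 6.5 runs per failure
    return runs_since_last_failure >= factor * mean_interval


# Numbers quoted in the comments above.
print(quiet_long_enough(2, 13, 105))
print(quiet_long_enough(2, 13, 164))
```

Under this reading the threshold would be 65 quiet runs, so both the 105-run and the 164-run observations clear it; the extra wait in the thread was for the bug to resurface in the cibuglog open-bugs view.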
If it happens again, go and check whether [1] fixes it.

[1] https://lore.kernel.org/lkml/1558711908-15688-1-git-send-email-suzuki.poulose@arm.com/