Bug 110401 - [CI][DRMTIP] igt@gem_create@create-clear - incomplete - system hang/timeout
Status: RESOLVED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Whiteboard: ReadyForDev
Reported: 2019-04-11 05:55 UTC by Lakshmi
Modified: 2019-09-08 14:44 UTC

i915 platform: CFL, HSW, KBL, SKL
i915 features: GEM/Other


Description Lakshmi 2019-04-11 05:55:41 UTC
All the failures are incompletes. 

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_249/fi-cfl-guc/igt@gem_create@create-clear.html

<6> [23.212172] Console: switching to colour dummy device 80x25
<6> [23.212212] [IGT] gem_create: executing
<6> [23.217046] [IGT] gem_create: starting subtest create-clear
<6> [23.217130] gem_create (1074): drop_caches: 4
<1> [29.434977] BUG: Bad page state in process kworker/9:1  pfn:25e8c0
<4> [29.435036] page:ffffd768097a3000 count:0 mapcount:-128 mapping:0000000000000000 index:0x1
<4> [29.435039] flags: 0x8000000000000000()
<4> [29.435042] raw: 8000000000000000 dead000000000100 dead000000000200 0000000000000000
<4> [29.435044] raw: 0000000000000001 0000000000000000 00000000ffffff7f 0000000000000000
<4> [29.435045] page dumped because: nonzero mapcount
<4> [29.435046] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 mei_hdcp x86_pkg_temp_thermal coretemp snd_hda_intel crct10dif_pclmul snd_hda_codec crc32_pclmul ghash_clmulni_intel snd_hwdep snd_hda_core e1000e snd_pcm mei_me mei ptp pps_core prime_numbers
<4> [29.435059] CPU: 9 PID: 259 Comm: kworker/9:1 Tainted: G     U            5.1.0-rc3-g03f3a57c6df4-drmtip_249+ #1
<4> [29.435060] Hardware name: Micro-Star International Co., Ltd. MS-7B54/Z370M MORTAR (MS-7B54), BIOS 1.10 12/28/2017
<4> [29.435064] Workqueue: events delayed_fput
<4> [29.435066] Call Trace:
<4> [29.435070]  dump_stack+0x67/0x9b
<4> [29.435074]  bad_page+0xbf/0x120
<4> [29.435077]  free_pcppages_bulk+0x470/0x6a0
<4> [29.435085]  free_unref_page_list+0x111/0x250

https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_249/fi-kbl-x1275/igt@gem_create@create-clear.html

 ------------[ cut here ]------------
<4> [26.557602] list_del corruption. prev->next should be ffffc89689039f88, but was ffffc896898f3b08
<4> [26.557612] WARNING: CPU: 4 PID: 994 at lib/list_debug.c:53 __list_del_entry_valid+0x79/0x90
<4> [26.557614] Modules linked in: i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul igb crc32_pclmul ghash_clmulni_intel ptp pps_core mei_me prime_numbers mei acpi_power_meter
<4> [26.557622] CPU: 4 PID: 994 Comm: dmesg Tainted: G     U            5.1.0-rc3-g03f3a57c6df4-drmtip_249+ #1
<4> [26.557623] Hardware name: Intel Corporation S1200SP/S1200SP, BIOS S1200SP.86B.03.01.0026.092720170729 09/27/2017
<4> [26.557625] RIP: 0010:__list_del_entry_valid+0x79/0x90
<4> [26.557627] Code: 0b 31 c0 c3 48 89 fe 48 c7 c7 08 7f 09 ac e8 2e 33 be ff 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 7f 09 ac e8 17 33 be ff <0f> 0b 31 c0 c3 48 c7 c7 80 7f 09 ac e8 06 33 be ff 0f 0b 31 c0 c3
<4> [26.557628] RSP: 0000:ffffa50d80c3bbe0 EFLAGS: 00010086
<4> [26.557630] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
<4> [26.557631] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000ffffffff
<4> [26.557632] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000001
<4> [26.627213] CPU: 3 PID: 1 Comm: systemd Tainted: G    BU  W         5.1.0-rc3-g03f3a57c6df4-drmtip_249+ #1
<4> [26.641210] CPU: 3 PID: 1 Comm: systemd Tainted: G    BU  W         5.1.0-rc3-g03f3a57c6df4-drmtip_249+ #1
<4> [26.655292] CPU: 3 PID: 1 Comm: systemd Tainted: G    BU  W         5.1.0-rc3-g03f3a57c6df4-drmtip_249+ #1
<4> [26.676457]  ? page_fault+0x8/0x30
<4> [26.690920] RSP: 0000:ffffa50d8006fa88 EFLAGS: 00010086
<4> [27.675094] hardirqs last disabled at (83564): [<ffffffffab9ad9ea>] __schedule+0xaa/0xb40
<4> [28.337724] softirqs last disabled at (83260): [<ffffffffab9278ba>] unix_sock_destructor+0x4a/0xb0
<4> [29.211553] R10: 00000000ffffffff R11: 0000000000000246 R12: ffffffffffffffff
<4> [30.462609] RSP: 0000:ffffa50d8006fee0 EFLAGS: 00010086
<4> [31.145107] RAX: 0000000000001000 RBX: 0000000000000000 RCX: 0000000000001000
<4> [31.848697] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4> [33.867078] Modules linked in: i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul igb crc32_pclmul ghash_clmulni_intel ptp pps_core mei_me prime_numbers mei acpi_power_meter
<4> [35.439127] RDX: 0000000000000009 RSI: 00007ffc67ba7198 RDI: 0000000000000000
<4> [36.977790] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
<4> [38.746738] FS:  0000000000000000(0000) GS:ffff928d2b6c0000(0000) knlGS:0000000000000000
<4> [40.302408] RIP: 0010:ext4_mpage_readpages+0x80e/0x870
<4> [41.216185] R13: ffffc896890306c8 R14: fffffffffffffff0 R15: 000000000000003f
<4> [42.148272] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4> [42.382942] FS:  00007f9d4af4c8c0(0000) GS:ffff928d2b680000(0000) knlGS:0000000000000000
<4> [42.391425] R10: 0000000000000000 R11: ffffa50d8006fd90 R12: dead000000000200


The rest of the failures don't have much in the logs. Let me know if any failure needs a separate bug.
Comment 2 Lakshmi 2019-04-11 05:57:41 UTC
Is it a firmware bug?
Comment 3 Chris Wilson 2019-04-11 06:43:04 UTC
(In reply to Lakshmi from comment #2)
> Is it a firmware bug?

No, a bug in mm/ is my bet.
Comment 4 CI Bug Log 2019-08-21 13:33:45 UTC
A CI Bug Log filter associated to this bug has been updated:

{- GUC: igt@gem_create@create-clear - incomplete - BUG: Bad page state in process kworker -}
{+ CFL: igt@gem_create@create-clear - incomplete - BUG: Bad page state in process kworker +}


  No new failures caught with the new filter
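
For readers unfamiliar with the filter notation above, the change swaps the machine tag (GUC to CFL) while keeping the test, status, and dmesg signature. The sketch below is a hypothetical model of such a filter and is not the CI Bug Log implementation: the class names, the substring match on the machine name, and the regular expression are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Result:
    """One CI result (hypothetical shape)."""
    machine: str
    test: str
    status: str
    dmesg: str


@dataclass
class Filter:
    """Hypothetical filter: machine tag + test + status + dmesg regex."""
    machine_tag: str
    test: str
    status: str
    dmesg_pattern: str

    def matches(self, result):
        # Assumption: a tag matches if it appears in the machine name.
        return (
            self.machine_tag.lower() in result.machine.lower()
            and result.test == self.test
            and result.status == self.status
            and re.search(self.dmesg_pattern, result.dmesg) is not None
        )


SIGNATURE = r"BUG: Bad page state in process kworker"
TEST = "igt@gem_create@create-clear"

old_filter = Filter("GUC", TEST, "incomplete", SIGNATURE)
new_filter = Filter("CFL", TEST, "incomplete", SIGNATURE)

cfl = Result(
    "fi-cfl-guc", TEST, "incomplete",
    "<1> [29.434977] BUG: Bad page state in process kworker/9:1  pfn:25e8c0",
)
kbl = Result(
    "fi-kbl-x1275", TEST, "incomplete",
    "<4> [26.557602] list_del corruption. prev->next should be ...",
)

print(old_filter.matches(cfl), new_filter.matches(cfl), new_filter.matches(kbl))
```

In this model the fi-cfl-guc result matches both the old and the new filter, while the fi-kbl-x1275 list_del corruption matches neither, since its dmesg carries a different signature.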
Comment 5 Chris Wilson 2019-09-08 14:44:32 UTC
This particular bad page has not reoccurred. I believe it was not our bug...

