Bug 110461 - [CI][BAT] igt@runner@aborted - fail - Previous test: i915_selftest (live_contexts), GEM_BUG_ON(idx.pml4e >= 512)
Status: RESOLVED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2019-04-17 11:25 UTC by Martin Peres
Modified: 2019-06-03 05:32 UTC

See Also:
i915 platform: ICL
i915 features: GEM/Other


Description Martin Peres 2019-04-17 11:25:20 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5945/fi-icl-y/igt@runner@aborted.html

<3>[  497.023099] gen8_ppgtt_insert_4lvl:1183 GEM_BUG_ON(idx.pml4e >= 512)
<4>[  497.023253] ------------[ cut here ]------------
<2>[  497.023255] kernel BUG at drivers/gpu/drm/i915/i915_gem_gtt.c:1183!
<4>[  497.023264] invalid opcode: 0000 [#1] PREEMPT SMP PTI
<4>[  497.023267] CPU: 5 PID: 4985 Comm: i915_selftest Tainted: G     U            5.1.0-rc5-CI-CI_DRM_5945+ #1
<4>[  497.023269] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3071.A00.1902120336 02/12/2019
<4>[  497.023334] RIP: 0010:gen8_ppgtt_insert_4lvl+0x398/0x980 [i915]
<4>[  497.023336] Code: 67 ed 8a e0 48 8b 35 47 e1 20 00 49 c7 c0 c2 56 9f a0 b9 9f 04 00 00 48 c7 c2 20 43 9a a0 48 c7 c7 93 80 87 a0 e8 b8 93 91 e0 <0f> 0b 83 f9 01 48 19 c9 83 e1 02 48 83 c1 01 48 89 ce 48 83 ce 18
<4>[  497.023338] RSP: 0018:ffffc90000d73850 EFLAGS: 00010282
<4>[  497.023341] RAX: 000000000000000c RBX: 0000000000000001 RCX: 0000000000000000
<4>[  497.023342] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88829adba538
<4>[  497.023344] RBP: 00000000000001ff R08: 000000000021cc74 R09: ffff88829a134000
<4>[  497.023346] R10: 0000000000000000 R11: ffff88829adba538 R12: 00000001a821c000
<4>[  497.023348] R13: ffff8881a81834e0 R14: 0000000000000000 R15: 00000001a821d000
<4>[  497.023349] FS:  00007f0dff054980(0000) GS:ffff88829bf40000(0000) knlGS:0000000000000000
<4>[  497.023351] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  497.023353] CR2: 000055590cf7d180 CR3: 00000002699b4005 CR4: 0000000000760ee0
<4>[  497.023354] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[  497.023356] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[  497.023358] PKRU: 55555554
<4>[  497.023359] Call Trace:
<4>[  497.023410]  ppgtt_bind_vma+0x5b/0x60 [i915]
<4>[  497.023461]  i915_vma_bind+0xe8/0x2c0 [i915]
<4>[  497.023507]  __i915_vma_do_pin+0x99/0xdc0 [i915]
<4>[  497.023553]  gpu_fill+0x653/0xa10 [i915]
<4>[  497.023559]  ? _raw_spin_unlock_irqrestore+0x39/0x60
<4>[  497.023600]  igt_ctx_exec+0x2d9/0x440 [i915]
<4>[  497.023659]  __i915_subtests+0x1a4/0x1e0 [i915]
<4>[  497.023711]  __run_selftests+0x112/0x170 [i915]
<4>[  497.023751]  i915_live_selftests+0x2c/0x60 [i915]
<4>[  497.023793]  i915_pci_probe+0x50/0xa0 [i915]
<4>[  497.023799]  pci_device_probe+0xa1/0x120
<4>[  497.023802]  really_probe+0xf3/0x3e0
<4>[  497.023805]  driver_probe_device+0x10a/0x120
<4>[  497.023808]  device_driver_attach+0x4b/0x50
<4>[  497.023810]  __driver_attach+0x97/0x130
<4>[  497.023813]  ? device_driver_attach+0x50/0x50
<4>[  497.023816]  bus_for_each_dev+0x74/0xc0
<4>[  497.023819]  bus_add_driver+0x13f/0x210
<4>[  497.023823]  ? 0xffffffffa0036000
<4>[  497.023825]  driver_register+0x56/0xe0
<4>[  497.023827]  ? 0xffffffffa0036000
<4>[  497.023830]  do_one_initcall+0x58/0x2e0
<4>[  497.023833]  ? do_init_module+0x1d/0x1ea
<4>[  497.023836]  ? rcu_read_lock_sched_held+0x6f/0x80
<4>[  497.023839]  ? kmem_cache_alloc_trace+0x261/0x290
<4>[  497.023843]  do_init_module+0x56/0x1ea
<4>[  497.023845]  load_module+0x2701/0x29e0
<4>[  497.023853]  ? __se_sys_finit_module+0xd3/0xf0
<4>[  497.023855]  __se_sys_finit_module+0xd3/0xf0
<4>[  497.023861]  do_syscall_64+0x55/0x190
<4>[  497.023864]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  497.023866] RIP: 0033:0x7f0dfe911839
<4>[  497.023869] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
<4>[  497.023871] RSP: 002b:00007ffd73de6108 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4>[  497.023873] RAX: ffffffffffffffda RBX: 0000555c5d8283f0 RCX: 00007f0dfe911839
<4>[  497.023875] RDX: 0000000000000000 RSI: 0000555c5d821970 RDI: 0000000000000006
<4>[  497.023876] RBP: 0000555c5d821970 R08: 0000000000000004 R09: 0000000000000000
<4>[  497.023878] R10: 00007ffd73de6280 R11: 0000000000000246 R12: 0000000000000000
<4>[  497.023880] R13: 0000555c5d822070 R14: 0000000000000020 R15: 0000000000000047
<4>[  497.023884] Modules linked in: i915(+) amdgpu gpu_sched ttm vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ax88179_178a usbnet mii x86_pkg_temp_thermal mei_hdcp coretemp crct10dif_pclmul crc32_pclmul snd_hda_codec snd_hwdep snd_hda_core ghash_clmulni_intel e1000e snd_pcm mei_me ptp pps_core mei prime_numbers [last unloaded: i915]
Comment 1 CI Bug Log 2019-04-17 11:25:53 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* ICL: igt@runner@aborted - fail - Previous test: i915_selftest (live_contexts)
  - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5945/fi-icl-y/igt@runner@aborted.html
  - https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12781/fi-icl-y/igt@runner@aborted.html
Comment 2 Chris Wilson 2019-04-17 12:19:18 UTC
Why igt@runner@aborted and not the test that failed? For which we will go lalalala and pretend it didn't happen, because it is quite clearly impossible...
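
For context on why the assertion looks impossible to hit: assuming the usual 4-level page-table layout (4 KiB pages, 9 index bits per level), a top-level index obtained by shift-and-mask can never reach 512. The following is a minimal Python sketch under those assumptions; the names are hypothetical and this is not the i915 code.

```python
# Hypothetical names; assumes 4 KiB pages and 9 index bits per level.
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9
ENTRIES_PER_LEVEL = 1 << BITS_PER_LEVEL  # 512 entries per table


def split_4lvl(addr):
    """Split an address into (pml4e, pdpe, pde, pte) table indices."""
    mask = ENTRIES_PER_LEVEL - 1  # 0x1ff keeps only the low 9 bits
    pte = (addr >> PAGE_SHIFT) & mask
    pde = (addr >> (PAGE_SHIFT + BITS_PER_LEVEL)) & mask
    pdpe = (addr >> (PAGE_SHIFT + 2 * BITS_PER_LEVEL)) & mask
    pml4e = (addr >> (PAGE_SHIFT + 3 * BITS_PER_LEVEL)) & mask
    return pml4e, pdpe, pde, pte
```

Because of the mask, every index stays below 512 even for an address with bits set above bit 47, so a value that fails the check would have to come from somewhere other than this arithmetic.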
Comment 3 Francesco Balestrieri 2019-04-18 08:30:54 UTC
The code path that triggered this failure is not a corner case but a normal user path, meaning that this issue could occur in normal usage and have high user impact. However, the conditions that led to it are very unlikely (and so far unexplained), so we expect the occurrence rate of this bug to be very low, if it happens again at all.

The best course of action is to wait for this to occur again and, in the meantime, add more meaningful assertions. There is little we can do based on the information in this report.

I'm moving this to high priority for the time being, but it should be raised again if the reproduction rate turns out to be more than negligible.
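
One possible reading of "more meaningful assertions" is a check that reports the offending inputs rather than only the failed condition. The following is a hypothetical Python illustration of that idea (the function name and message format are assumptions, not the i915 code):

```python
# Hypothetical illustration only; not the i915 code.
ENTRIES_PER_LEVEL = 512


def check_pml4e(addr, pml4e):
    """Return pml4e if it is in range, else fail with address and index."""
    if not 0 <= pml4e < ENTRIES_PER_LEVEL:
        raise AssertionError(
            f"pml4e index {pml4e} out of range [0, {ENTRIES_PER_LEVEL}) "
            f"for address {addr:#x}"
        )
    return pml4e
```

With a message of this kind, a single occurrence in CI would already record which address produced the bad index.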
Comment 4 Jani Saarinen 2019-04-22 15:45:41 UTC
Seen only once now.
Comment 5 Francesco Balestrieri 2019-06-03 05:32:09 UTC
Still only once. In the spirit of risk taking, I'm closing it.

