CI_DRM_3196 fi-glk-1 incomplete

Something weird is going on for this run:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3196/fi-glk-1/dmesg0.log

dmesg stops at:
<7>[ 388.921308] [IGT] kms_cursor_legacy: executing

but there are 2 oops logs where the test appears to keep running:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3196/fi-glk-1/dmesg-1507288911_Oops_1.log
<14>[ 388.993250] [IGT] kms_cursor_legacy: starting subtest basic-flip-after-cursor-legacy
<2>[ 390.244184] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:880!

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3196/fi-glk-1/dmesg-1507288911_Oops_1.log
<14>[ 569.068084] [IGT] drv_module_reload: starting subtest basic-reload
...
<1>[ 569.561140] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028

The panic log:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3196/fi-glk-1/dmesg-1507563092_Panic_2.log
refers to the:
<2>[ 390.244184] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:880!

So, what is the oops in the second oops file?

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3196/fi-glk-1/igt@kms_cursor_legacy@basic-flip-after-cursor-legacy.html
For the

<7>[ 569.520217] [drm:intel_atomic_commit_tail [i915]] [CRTC:42:pipe A]
<1>[ 569.561140] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
<1>[ 569.561218] IP: intel_gpu_reset+0xef/0x1b0 [i915]

dev_priv->engines[RCS] is 0, which looks like memory corruption. One possibility is that dev_priv is being freed too early. Adding Chris for more insight.
<4>[ 569.561246] Modules linked in: vgem snd_hda_codec_realtek snd_hda_codec_generic i915(-) x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mii prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel [last unloaded: snd_hda_intel]
<4>[ 569.561324] CPU: 3 PID: 4239 Comm: drv_module_relo Tainted: G U 4.14.0-rc3-CI-Patchwork_5921+ #1

So the
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3196/fi-glk-1/dmesg-1507288911_Oops_1.log
is from an "old" patchwork run.
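The check above (reading the kernel release out of the "CPU: ..." line to see which build produced an oops log) could be automated. Below is a minimal sketch, not anything the CI actually runs. The helper names kernel_release() and belongs_to_run() are hypothetical, and it assumes that a CI kernel's release string embeds the run name, as "Patchwork_5921" does in the line above. Whether CI_DRM builds are named the same way is not confirmed in this thread.

```python
import re

# Matches a dmesg line such as:
#   CPU: 3 PID: 4239 Comm: drv_module_relo Tainted: G U 4.14.0-rc3-CI-Patchwork_5921+ #1
# and captures the token right before the trailing "#<build number>",
# which is the kernel release string.
CPU_LINE_RE = re.compile(r"CPU: \d+ PID: \d+ Comm: .* (\S+) #\d+")


def kernel_release(dmesg_text):
    """Return the kernel release from the first 'CPU: ...' line, or None."""
    for line in dmesg_text.splitlines():
        match = CPU_LINE_RE.search(line)
        if match:
            return match.group(1)
    return None


def belongs_to_run(dmesg_text, run_id):
    """Hypothetical check: True if the release string mentions run_id.

    Assumption: the CI build embeds the run name in the kernel release.
    """
    release = kernel_release(dmesg_text)
    return release is not None and run_id in release


sample = (
    "<4>[ 569.561324] CPU: 3 PID: 4239 Comm: drv_module_relo "
    "Tainted: G U 4.14.0-rc3-CI-Patchwork_5921+ #1"
)
print(kernel_release(sample))
print(belongs_to_run(sample, "CI_DRM_3196"))
```

With the line quoted above, the release string names Patchwork_5921 rather than CI_DRM_3196, which is the mismatch that marks the log as stale.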
How often does it happen? Does KASAN find anything?
Note, data from:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3196/fi-glk-1/dmesg-1507288911_Oops_1.log
<14>[ 569.068084] [IGT] drv_module_reload: starting subtest basic-reload
...
<1>[ 569.561140] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
should be ignored.

So, this is a "plain"
kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:880!
hence it should be a duplicate of bug 103190, which is now a duplicate of bug 102035.

Also, quote from IRC 2017-10-11:

<marta_> dolphin, ickle : this on APL-shards there has now been 3 occurences of kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:880! in 7 CI runs. see bug: https://bugs.freedesktop.org/show_bug.cgi?id=103190
<dolphin> marta_: does it happen to have IOMMU on, by any chance?
<dolphin> aka. Intel VT-d turned on from BIOS
* itoral (~itoral@fanzine.igalia.com) has joined
* danvet (~Daniel@2a02:168:5635:0:39d2:f87e:2033:9f6) has joined
<dolphin> If you can try running "intel_iommu=off" instead of igfx_off, that'd give a datapoint
* cristi (~majeru@131.228.216.128) has joined
<dolphin> marta_: it's the same issue as with https://bugs.freedesktop.org/show_bug.cgi?id=102035
<marta_> thanks dolphin I will dup
<marta_> tsa is dolphin suggestion ^^ something to test?
* xerpi (~xerpi@59.red-88-23-235.staticip.rima-tde.net) has joined
<tsa> need to pick up one APL shard and check bios option / test kernel param
<tsa> looking at the 3206 missing htmls now
* lemonzest (~lemonzest@unaffiliated/lemonzest) has joined
* [Enrico] (~chiccoroc@gentoo/contributor/Enrico) has joined
<dolphin> marta_: yep, double-checked it, there just have been additions to the file
<dolphin> it may be more convenient for you guys to look at the actual line of code in the file, when filing bug
<dolphin> this this case: GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED);
<dolphin> so the bug could be best identified as intel_lrc_irq_handler()/GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED)
<dolphin> If we had not been having the same error before we added any pre-emption code, one would think there's an error in the pre-emption code, but it failed before
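As a rough illustration of why the file:line pair alone is a fragile bug signature, the "kernel BUG at" lines can be pulled out of a dmesg log as below. This is only a sketch: extract_bug_sites() is a hypothetical helper, not part of any CI tooling. The line number it returns has to be looked up in the matching source tree to find the actual assertion, since additions to the file shift it between kernels.

```python
import re

# Matches e.g. "kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:880!"
BUG_AT_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")


def extract_bug_sites(dmesg_text):
    """Return (source_path, line_number) for every 'kernel BUG at' line."""
    return [(path, int(line)) for path, line in BUG_AT_RE.findall(dmesg_text)]


log = (
    "<14>[ 388.993250] [IGT] kms_cursor_legacy: starting subtest "
    "basic-flip-after-cursor-legacy\n"
    "<2>[ 390.244184] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:880!\n"
)
print(extract_bug_sites(log))
```

Two logs that report different line numbers in intel_lrc.c can still be the same GEM_BUG_ON, so the (path, line) pair is best treated as a pointer into a specific tree rather than as the bug's identity.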
*** This bug has been marked as a duplicate of bug 102035 ***