Summary: | [CI][SHARDS] igt@runner@aborted - fail - general protection fault: 0000 [#1] PREEMPT SMP NOPTI, Workqueue: events azx_probe_work | ||||||
---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | Martin Peres <martin.peres> | ||||
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> | ||||
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> | ||||
Severity: | normal | ||||||
Priority: | high | CC: | intel-gfx-bugs | ||||
Version: | XOrg git | ||||||
Hardware: | Other | ||||||
OS: | All | ||||||
Whiteboard: | ReadyForDev | ||||||
i915 platform: | ICL | i915 features: | display/audio | ||||
Attachments: |
|
Description
Martin Peres
2019-04-10 14:18:42 UTC
The CI Bug Log issue associated to this bug has been updated. ### New filters associated * ICL: igt@runner@aborted - fail - Previous test: nothing, Kernel badly tainted (0x2a0) - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5896/shard-iclb2/igt@runner@aborted.html On shards there was BIOS update to BIOS ICLSFWR1.R00.3121.A00.1903190527 03/19/2019. If we get rid of the noise from the dmesg, we get: <4>[ 11.837706] general protection fault: 0000 [#1] PREEMPT SMP NOPTI <4>[ 11.837755] CPU: 4 PID: 190 Comm: kworker/4:3 Not tainted 5.1.0-rc4-CI-CI_DRM_5896+ #1 <4>[ 11.837788] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP, BIOS ICLSFWR1.R00.3087.A00.1902250334 02/25/2019 <4>[ 11.837843] Workqueue: events azx_probe_work [snd_hda_intel] <4>[ 11.837869] RIP: 0010:kobject_uevent_env+0x38/0x810 <4>[ 11.837890] Code: 53 49 89 d5 48 89 c3 48 83 ec 38 4c 8b 24 c5 c0 88 ef 81 0f 1f 44 00 00 4c 89 f0 eb 0d 48 8b 40 18 48 85 c0 0f 84 27 01 00 00 <4c> 8b 78 20 4d 85 ff 74 ea 41 f6 46 3c 10 0f 85 f9 00 00 00 49 8b <4>[ 11.837957] RSP: 0018:ffffc9000144bd18 EFLAGS: 00010296 <4>[ 11.837980] RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000002 RCX: 0000000000000000 <4>[ 11.838008] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 6b6b6b6b6b6b6b6b <4>[ 11.838036] RBP: ffff88848250b3f8 R08: 0000000085e480ea R09: 0000000000000001 <4>[ 11.838063] R10: 0000000000000000 R11: ffff888499ee48b8 R12: ffffffff820df667 <4>[ 11.839071] R13: 0000000000000000 R14: 6b6b6b6b6b6b6b6b R15: ffff88849507df00 <4>[ 11.840027] FS: 0000000000000000(0000) GS:ffff88849ff00000(0000) knlGS:0000000000000000 <4>[ 11.840992] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 11.841951] CR2: 0000556b60348c78 CR3: 0000000005214004 CR4: 0000000000760ee0 <4>[ 11.842915] PKRU: 55555554 <4>[ 11.843882] Call Trace: <4>[ 11.844846] ? add_widget_node+0x68/0xa0 [snd_hda_core] <4>[ 11.845813] widget_tree_create+0xf6/0x110 [snd_hda_core] <4>[ 11.847717] hda_widget_sysfs_init+0x1a/0x40 [snd_hda_core] <4>[ 11.847720] snd_hdac_device_register+0x19/0x40 [snd_hda_core] <4>[ 11.849720] snd_hda_codec_configure+0x39/0x160 [snd_hda_codec] <4>[ 11.849724] azx_codec_configure+0x2a/0x60 [snd_hda_codec] <4>[ 11.851936] azx_probe_work+0x42e/0x7e0 [snd_hda_intel] <4>[ 11.853928] process_one_work+0x245/0x610 <4>[ 11.856202] worker_thread+0x37/0x380 <4>[ 11.856204] ? process_one_work+0x610/0x610 <4>[ 11.858091] kthread+0x119/0x130 <4>[ 11.858092] ? kthread_park+0x80/0x80 <4>[ 11.858095] ret_from_fork+0x24/0x50 <4>[ 11.859975] Modules linked in: snd_hda_codec_hdmi(+) snd_hda_codec_realtek(+) snd_hda_codec_generic x86_pkg_temp_thermal coretemp mei_hdcp crct10dif_pclmul crc32_pclmul snd_hda_intel snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core cdc_ether usbnet e1000e snd_pcm mii ptp pps_core mei_me mei i915 prime_numbers <4>[ 11.862139] ---[ end trace bf38e6abac39589f ]--- moved lines: <7>[ 11.845330] [drm:i915_audio_component_get_eld [i915]] Not valid for port B <7>[ 11.846800] [drm:i915_audio_component_get_eld [i915]] Not valid for port B <7>[ 11.848782] [drm:i915_audio_component_get_eld [i915]] Not valid for port B <7>[ 11.850981] [drm:i915_audio_component_get_eld [i915]] Not valid for port C <7>[ 11.852973] [drm:i915_audio_component_get_eld [i915]] Not valid for port C <7>[ 11.855224] [drm:i915_audio_component_get_eld [i915]] Not valid for port C <7>[ 11.857129] [drm:i915_audio_component_get_eld [i915]] Not valid for port D <7>[ 11.860861] [drm:i915_audio_component_get_eld [i915]] Not valid for port D <7>[ 11.858999] [drm:i915_audio_component_get_eld [i915]] Not valid for port D Additional info: <7>[ 11.112111] [drm:drm_add_edid_modes] ELD: no CEA Extension found <7>[ 11.529366] [drm:drm_add_edid_modes] ELD: no CEA Extension found <7>[ 11.583388] [drm:drm_add_edid_modes] ELD: no CEA Extension found <7>[ 11.601007] [drm:drm_add_edid_modes] ELD: no CEA Extension found ------------------------------------------------------------------- So, what appears to be happening is that there are no devices connected that advertise an ELD that is audio capable, so none of the DDI ports ever gets has_audio set, which means that: /* referred in audio callbacks */ dev_priv->av_enc_map[pipe] = encoder; from intel_audio_codec_enable (i915/intel_audio.c) never gets called, which means that the call to get_saved_enc always returns NULL, causing the <7>[ 11.845330] [drm:i915_audio_component_get_eld [i915]] Not valid for port B lines to show up, which is i915_audio_component_get_eld returning -EINVAL, which appears to be making the audio driver blow up, as it's trying to grab ELDs for ports that don't have an audio-capable device connected. i915 doesn't appear to be doing anything wrong here, I believe this is a bug for the audio driver (In reply to James Ausmus from comment #3) > If we get rid of the noise from the dmesg, we get: > > <4>[ 11.837706] general protection fault: 0000 [#1] PREEMPT SMP NOPTI > <4>[ 11.837755] CPU: 4 PID: 190 Comm: kworker/4:3 Not tainted > 5.1.0-rc4-CI-CI_DRM_5896+ #1 > <4>[ 11.837788] Hardware name: Intel Corporation Ice Lake Client > Platform/IceLake U DDR4 SODIMM PD RVP, BIOS ICLSFWR1.R00.3087.A00.1902250334 > 02/25/2019 > <4>[ 11.837843] Workqueue: events azx_probe_work [snd_hda_intel] > <4>[ 11.837869] RIP: 0010:kobject_uevent_env+0x38/0x810 > <4>[ 11.837890] Code: 53 49 89 d5 48 89 c3 48 83 ec 38 4c 8b 24 c5 c0 88 > ef 81 0f 1f 44 00 00 4c 89 f0 eb 0d 48 8b 40 18 48 85 c0 0f 84 27 01 00 00 > <4c> 8b 78 20 4d 85 ff 74 ea 41 f6 46 3c 10 0f 85 f9 00 00 00 49 8b > <4>[ 11.837957] RSP: 0018:ffffc9000144bd18 EFLAGS: 00010296 > <4>[ 11.837980] RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000002 RCX: 0x6b6b6b6b6b6b6b6b == POISON_FREE == use-after-free in snd_hda. The CI Bug Log issue associated to this bug has been updated. ### New filters associated * ICL - igt@runner@aborted - fail - Previous test: nothing\n.*\n.*\nKernel badly tainted \(0x280\) - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_264/fi-icl-y/igt@runner@aborted.html A CI Bug Log filter associated to this bug has been updated: {- ICL - igt@runner@aborted - fail - Previous test: nothing\n.*\n.*\nKernel badly tainted \(0x280\) -} {+ ICL - igt@runner@aborted - fail - Previous test: nothing, Kernel badly tainted (0x280) +} No new failures caught with the new filter Created attachment 144186 [details]
boot2.log
From boot2.log, seems like problem with sysfs
<3>[ 11.881334] BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
<3>[ 11.882350] =============================================================================
<3>[ 11.882566] in_atomic(): 0, irqs_disabled(): 0, pid: 190, name: kworker/4:3
<3>[ 11.883476] BUG kmalloc-32 (Tainted: G D ): Poison overwritten
<3>[ 11.883477] -----------------------------------------------------------------------------
<3>[ 11.883478] INFO: 0x00000000c581b909-0x00000000cf17bda8. First byte 0xe8 instead of 0x6b
<3>[ 11.883484] INFO: Allocated in widget_tree_create+0x21/0x110 [snd_hda_core] age=55 cpu=4 pid=190
<3>[ 11.883486] kmem_cache_alloc_trace+0x225/0x290
<3>[ 11.883489] widget_tree_create+0x21/0x110 [snd_hda_core]
<3>[ 11.883491] hda_widget_sysfs_init+0x1a/0x40 [snd_hda_core]
<3>[ 11.883493] snd_hdac_device_register+0x19/0x40 [snd_hda_core]
<3>[ 11.883496] snd_hda_codec_configure+0x39/0x160 [snd_hda_codec]
<3>[ 11.883499] azx_codec_configure+0x2a/0x60 [snd_hda_codec]
<3>[ 11.883501] azx_probe_work+0x42e/0x7e0 [snd_hda_intel]
<3>[ 11.883503] process_one_work+0x245/0x610
<3>[ 11.883504] worker_thread+0x37/0x380
<3>[ 11.883505] kthread+0x119/0x130
<3>[ 11.883508] ret_from_fork+0x24/0x50
<3>[ 11.883510] INFO: Freed in hda_widget_sysfs_reinit+0x18c/0x210 [snd_hda_core] age=48 cpu=2 pid=348
<3>[ 11.883513] snd_hdac_refresh_widgets+0x85/0xc0 [snd_hda_core]
<3>[ 11.883516] snd_hda_codec_update_widgets+0xe/0x50 [snd_hda_codec]
<3>[ 11.883518] intel_hsw_common_init+0x59/0xc0 [snd_hda_codec_hdmi]
<3>[ 11.883520] hda_codec_driver_probe+0x88/0x150 [snd_hda_codec]
<3>[ 11.883522] really_probe+0xf3/0x3e0
<3>[ 11.883523] driver_probe_device+0x10a/0x120
<3>[ 11.883524] device_driver_attach+0x4b/0x50
<3>[ 11.883524] __driver_attach+0x97/0x130
<3>[ 11.883527] bus_for_each_dev+0x74/0xc0
<3>[ 11.883528] bus_add_driver+0x13f/0x210
<3>[ 11.883529] driver_register+0x56/0xe0
<3>[ 11.883530] do_one_initcall+0x58/0x2e0
<3>[ 11.883532] do_init_module+0x56/0x1ea
<3>[ 11.883533] load_module+0x2701/0x29e0
<3>[ 11.883534] __se_sys_finit_module+0xd3/0xf0
<3>[ 11.883534] do_syscall_64+0x55/0x190
<3>[ 11.883536] INFO: Slab 0x00000000043b2366 objects=22 used=22 fp=0x (null) flags=0x8000000000010200
<3>[ 11.883536] INFO: Object 0x000000000c2cf882 @offset=1112 fp=0x00000000b006c361
<3>[ 11.883538] Redzone 00000000f96d092c: bb bb bb bb bb bb bb bb ........
<3>[ 11.883539] Object 000000000c2cf882: 6b 6b 6b 6b 6b 6b 6b 6b e8 d5 db 99 84 88 ff ff kkkkkkkk........
<3>[ 11.883539] Object 00000000d4d94bbe: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
<3>[ 11.883540] Redzone 00000000be7a46be: bb bb bb bb bb bb bb bb ........
<3>[ 11.883541] Padding 00000000a84a9f5a: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
This looks like race between widget_tree_create() and hda_widget_sysfs_reinit():
looks like hda_widget_sysfs_reinit() freed codec->widgets structure while widget_tree_create() still operated on it.
Patch https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ed180abba7f1fc3cf04ffa27767b1bcc8e8c842a is part of 5.2-rc1 now. This issue happened twice on drmtuip runs 262 and 264 but not seen later. Current drmtip run is 380 Closing and archiving this issue. The CI Bug Log issue associated to this bug has been archived. New failures matching the above filters will not be associated to this bug anymore. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.