Summary: | [CI][BAT]: igt@i915_selftest@live_hangcheck - timeout - list_add corruption. next->prev should be prev (ffff888405f2c430), but was ffff88848d9492c0. (next=ffff88848d94b480). | ||
---|---|---|---|
Product: | DRI | Reporter: | Lakshmi <lakshminarayana.vudum> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | highest | CC: | intel-gfx-bugs |
Version: | DRI git | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | ICL | i915 features: |
Description
Lakshmi
2019-08-14 06:03:33 UTC
The CI Bug Log issue associated to this bug has been updated. ### New filters associated * ICL: igt@i915_selftest@live_hangcheck - timeout - list_add corruption. next->prev should be prev (ffff888405f2c430), but was ffff88848d9492c0. (next=ffff88848d94b480). - https://intel-gfx-ci.01.org/tree/drm-tip/IGT_5130/fi-icl-u2/igt@i915_selftest@live_hangcheck.html The corruption started a little earlier. It should be fixed by https://patchwork.freedesktop.org/series/65152/ commit f130b712ff259f03feaf7a245e3464e3faad89ae (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued) Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue Aug 13 21:09:05 2019 +0100 drm/i915: Serialise read/write of the barrier's engine We use the request pointer inside the i915_active_node as the indicator of the barrier's status; we mark it as used during i915_request_add_active_barriers(), and search for an available barrier in reuse_idle_barrier(). That check must be carefully serialised to ensure we do use an engine for the barrier and not just a random pointer. (Along the other reuse path, we are fully serialised by the timeline->mutex.) The acquisition of the barrier itself is ordered through the strong memory barrier in llist_del_all(). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111397 Fixes: d8af05ff38ae ("drm/i915: Allow sharing the idle-barrier from other kernel requests") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190813200905.11369-1-chris@chris-wilson.co.uk Still visible: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6712/shard-apl1/igt@gem_exec_whisper@normal.html <6> [686.556750] [IGT] gem_exec_whisper: starting subtest normal <4> [816.468720] ------------[ cut here ]------------ <4> [816.468795] list_add corruption. next->prev should be prev (ffff88826140c430), but was ffff88825dec1110. (next=ffff888273e21980). <4> [816.468824] WARNING: CPU: 2 PID: 1255 at lib/list_debug.c:25 __list_add_valid+0x36/0x70 <4> [816.468828] Modules linked in: vgem mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915 lpc_ich r8169 realtek snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hwdep snd_hda_core snd_pcm mei_me pinctrl_broxton pinctrl_intel mei prime_numbers <4> [816.468858] CPU: 2 PID: 1255 Comm: kworker/u8:23 Tainted: G U 5.3.0-rc4-CI-CI_DRM_6712+ #1 <4> [816.468862] Hardware name: /NUC6CAYB, BIOS AYAPLCEL.86A.0049.2018.0508.1356 05/08/2018 <4> [816.468964] Workqueue: i915 retire_work_handler [i915] <4> [816.468971] RIP: 0010:__list_add_valid+0x36/0x70 <4> [816.468976] Code: 48 8b 10 4c 39 c2 75 27 48 39 f8 74 39 48 39 fa 74 34 b8 01 00 00 00 c3 48 89 d1 48 c7 c7 b8 26 0e 82 48 89 c2 e8 aa 58 bc ff <0f> 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 08 27 0e 82 e8 93 58 bc <4> [816.468980] RSP: 0018:ffffc9000089fd40 EFLAGS: 00010286 <4> [816.468986] RAX: 0000000000000000 RBX: ffff888262f59240 RCX: 0000000000000001 <4> [816.468990] RDX: 0000000080000001 RSI: ffff88826afd0940 RDI: 00000000ffffffff <4> [816.468993] RBP: ffffc9000089fd80 R08: ffff88826afd0940 R09: 0000000000000000 <4> [816.468998] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88825dec1cb8 <4> [816.469001] R13: 0000000000000000 R14: ffff88825dec1ce0 R15: ffff888273e21980 <4> [816.469006] FS: 0000000000000000(0000) GS:ffff888277b00000(0000) knlGS:0000000000000000 <4> [816.469010] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [816.469014] CR2: 00007f3b888c9c38 CR3: 000000026a452000 CR4: 00000000003406e0 <4> [816.469018] Call Trace: <4> [816.469100] __i915_request_commit+0x245/0x510 [i915] <4> [816.469179] __engine_park+0x70/0x270 [i915] <4> [816.469251] ____intel_wakeref_put_last+0x1c/0x70 [i915] <4> [816.469335] i915_request_retire+0x418/0x840 [i915] <4> [816.469420] ring_retire_requests+0x47/0x50 [i915] <4> [816.469503] i915_retire_requests+0x57/0xc0 [i915] <4> [816.469583] retire_work_handler+0x27/0x60 [i915] <4> [816.469593] process_one_work+0x245/0x610 <4> [816.469608] worker_thread+0x37/0x380 <4> [816.469618] ? process_one_work+0x610/0x610 <4> [816.469623] kthread+0x119/0x130 <4> [816.469629] ? kthread_park+0xa0/0xa0 <4> [816.469640] ret_from_fork+0x3a/0x50 <4> [816.469661] irq event stamp: 852414 <4> [816.469666] hardirqs last enabled at (852413): [<ffffffff8112b537>] console_unlock+0x3f7/0x5a0 <4> [816.469673] hardirqs last disabled at (852414): [<ffffffff810019ea>] trace_hardirqs_off_thunk+0x1a/0x20 <4> [816.469679] softirqs last enabled at (852148): [<ffffffff8129a057>] wb_workfn+0x1a7/0x5f0 <4> [816.469683] softirqs last disabled at (852146): [<ffffffff81299f65>] wb_workfn+0xb5/0x5f0 <4> [816.469689] WARNING: CPU: 2 PID: 1255 at lib/list_debug.c:25 __list_add_valid+0x36/0x70 <4> [816.469693] ---[ end trace c7c4c36d1f93c572 ]--- A CI Bug Log filter associated to this bug has been updated: {- ICL: igt@i915_selftest@live_hangcheck - timeout - list_add corruption. next->prev should be prev (ffff888405f2c430), but was ffff88848d9492c0. (next=ffff88848d94b480). -} {+ all machines: igt@i915_selftest@live_hangcheck / igt@gem_exec_whisper@normal - timeout - list_add corruption. next->prev should be prev +} New failures caught by the filter: * https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6712/shard-apl1/igt@gem_exec_whisper@normal.html The CI Bug Log issue associated to this bug has been updated. ### New filters associated * APL: igt@runner@aborted - fail - Previous test: gem_exec_whisper (normal) - https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6712/shard-apl1/igt@runner@aborted.html (In reply to Martin Peres from comment #4) > Still visible: > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_6712/shard-apl1/ > igt@gem_exec_whisper@normal.html > > <6> [686.556750] [IGT] gem_exec_whisper: starting subtest normal > <4> [816.468720] ------------[ cut here ]------------ > <4> [816.468795] list_add corruption. next->prev should be prev > (ffff88826140c430), but was ffff88825dec1110. (next=ffff888273e21980) That's not the same bug! That's the unserialised access to the active_rings which is elsewhere in bugzilla. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.