Bug 111183

Summary: [CI][DRMTIP] igt@gem_exec_await@wide-contexts - dmesg-warn - BUG active_node (Tainted:.*): Object padding overwritten
Product: DRI
Reporter: Lakshmi <lakshminarayana.vudum>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: RESOLVED NOTOURBUG
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: DRI git
Hardware: Other
OS: All
Whiteboard: ReadyForDev
i915 platform: ICL
i915 features: GEM/Other

Description Lakshmi 2019-07-22 10:08:20 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_327/fi-icl-dsi/igt@gem_exec_await@wide-contexts.html

<6> [55.564103] Console: switching to colour dummy device 80x25
<6> [55.564154] [IGT] gem_exec_await: executing
<6> [55.596692] [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 1
<5> [55.729357] Setting dangerous option reset - tainting kernel
<6> [55.737303] [IGT] gem_exec_await: starting subtest wide-contexts
<6> [55.738845] gem_exec_await (1112): drop_caches: 4
<3> [66.140734] =============================================================================
<3> [66.141073] BUG active_node (Tainted: G     U           ): Object padding overwritten
<3> [66.141112] -----------------------------------------------------------------------------

<4> [66.141159] Disabling lock debugging due to kernel taint
<3> [66.141163] INFO: 0x00000000d4b35f55-0x00000000abf6671d. First byte 0x58 instead of 0x5a
<3> [66.141256] INFO: Allocated in i915_active_ref+0x59/0x200 [i915] age=3493 cpu=5 pid=1112
<3> [66.141275] 	__slab_alloc.isra.28.constprop.34+0x3d/0x70
<3> [66.141286] 	kmem_cache_alloc+0x21c/0x280
<3> [66.141358] 	i915_active_ref+0x59/0x200 [i915]
<3> [66.141433] 	i915_vma_move_to_active+0x4c/0x350 [i915]
<3> [66.141502] 	i915_gem_do_execbuffer+0x12cc/0x20f0 [i915]
<3> [66.141567] 	i915_gem_execbuffer2_ioctl+0x11b/0x430 [i915]
<3> [66.141580] 	drm_ioctl_kernel+0x83/0xf0
<3> [66.141589] 	drm_ioctl+0x2f3/0x3b0
<3> [66.141598] 	do_vfs_ioctl+0xa0/0x6e0
<3> [66.141606] 	ksys_ioctl+0x35/0x60
<3> [66.141614] 	__x64_sys_ioctl+0x11/0x20
<3> [66.141624] 	do_syscall_64+0x55/0x1c0
<3> [66.141636] 	entry_SYSCALL_64_after_hwframe+0x49/0xbe
<3> [66.141646] INFO: Slab 0x00000000f7fe5a99 objects=32 used=21 fp=0x0000000004782f4f flags=0x8000000000010201
<3> [66.141663] INFO: Object 0x000000001f6cc2f9 @offset=13376 fp=0x000000002383d773

<3> [66.141680] Redzone 00000000: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
<3> [66.141695] Redzone 00000010: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
<3> [66.141710] Redzone 00000020: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
<3> [66.141725] Redzone 00000030: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc  ................
<3> [66.141740] Object 00000000: 00 00 00 00 00 00 00 00 48 74 e9 08 f3 95 ff ff  ........Ht......
<3> [66.141755] Object 00000010: 48 74 e9 08 f3 95 ff ff b0 f8 58 c0 ff ff ff ff  Ht........X.....
<3> [66.141770] Object 00000020: d0 de e8 08 f3 95 ff ff 01 00 00 00 00 00 00 00  ................
<3> [66.141785] Object 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
<3> [66.141800] Object 00000040: a3 02 00 00 00 00 00 00                          ........
<3> [66.141814] Redzone 00000000: cc cc cc cc cc cc cc cc                          ........
<3> [66.141828] Padding 00000000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
<3> [66.141843] Padding 00000010: 5a 5a 58 5a 5a 5a 5a 5a 5a 5a 58 5a 5a 5a 5a 5a  ZZXZZZZZZZXZZZZZ
<3> [66.141858] Padding 00000020: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
<3> [66.141873] Padding 00000030: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
<4> [66.141889] CPU: 5 PID: 191 Comm: kworker/u16:2 Tainted: G    BU            5.2.0-ga9fbb0055257-drmtip_327+ #1
<4> [66.141891] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3102.A00.1903052247 03/05/2019
<4> [66.141951] Workqueue: i915 retire_work_handler [i915]
<4> [66.141953] Call Trace:
<4> [66.141959]  dump_stack+0x67/0x9b
<4> [66.141964]  check_bytes_and_report+0xbd/0x100
<4> [66.141968]  check_object+0x14e/0x270
<4> [66.141973]  free_debug_processing+0x137/0x370
<4> [66.142031]  ? __active_retire+0x10a/0x190 [i915]
<4> [66.142036]  __slab_free+0x337/0x4f0
<4> [66.142041]  ? mark_held_locks+0x49/0x70
<4> [66.142044]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
<4> [66.142047]  ? _raw_spin_unlock_irqrestore+0x4c/0x60
<4> [66.142050]  ? lockdep_hardirqs_on+0xe3/0x1b0
<4> [66.142053]  ? _raw_spin_unlock_irqrestore+0x39/0x60
<4> [66.142057]  ? debug_check_no_obj_freed+0x132/0x210
<4> [66.142114]  ? __active_retire+0x10a/0x190 [i915]
<4> [66.142118]  ? kmem_cache_free+0x275/0x2e0
<4> [66.142121]  kmem_cache_free+0x275/0x2e0
<4> [66.142175]  __active_retire+0x10a/0x190 [i915]
<4> [66.142237]  i915_request_retire+0x22b/0x840 [i915]
<4> [66.142297]  ring_retire_requests+0x47/0x50 [i915]
<4> [66.142357]  i915_retire_requests+0x57/0xc0 [i915]
<4> [66.142414]  retire_work_handler+0x27/0x60 [i915]
<4> [66.142418]  process_one_work+0x245/0x610
<4> [66.142424]  worker_thread+0x37/0x380
<4> [66.142428]  ? process_one_work+0x610/0x610
<4> [66.142432]  kthread+0x119/0x130
<4> [66.142435]  ? kthread_park+0x80/0x80
<4> [66.142440]  ret_from_fork+0x3a/0x50
<3> [66.142447] FIX active_node: Restoring 0x00000000d4b35f55-0x00000000abf6671d=0x5a

<6> [79.191111] [IGT] gem_exec_await: exiting, ret=0
<5> [79.191471] Setting dangerous option reset - tainting kernel
<6> [79.256757] Console: switching to colour frame buffer device 180x160
Comment 1 CI Bug Log 2019-07-22 10:09:02 UTC
The CI Bug Log issue associated to this bug has been updated.

### New filters associated

* fi-icl-dsi: igt@gem_exec_await@wide-contexts - dmesg-warn - BUG active_node (Tainted:.*): Object padding overwritten
  - https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_327/fi-icl-dsi/igt@gem_exec_await@wide-contexts.html
Comment 2 Chris Wilson 2019-07-23 11:38:07 UTC
Remember the icl bug where somebody [else] was writing BIT(2) into an array of pointers?... Now we have the opposite.

It's likely not our bug; it's just concerning that these keep happening on icl specifically.
Comment 3 Chris Wilson 2019-07-27 12:58:53 UTC
We can ignore icl-dsi memory corruption that comes without any smoking gun, as we know that machine exhibits HW memory faults.
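
For anyone triaging a similar report, here is a minimal sketch of how the padding dump in the description can be checked byte by byte. It is not part of the original report or of any CI tooling. The function name `find_corrupted` and the `PADDING_DUMP` constant are made up for illustration. The dump text is copied from the dmesg output above with the log-level and timestamp prefixes stripped, and the expected fill byte 0x5a is taken from the "First byte 0x58 instead of 0x5a" line.

```python
# Expected padding fill byte, as reported by the log line
# "First byte 0x58 instead of 0x5a".
POISON = 0x5A

# Padding lines from the description, dmesg prefix removed (illustrative copy).
PADDING_DUMP = """\
Padding 00000000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
Padding 00000010: 5a 5a 58 5a 5a 5a 5a 5a 5a 5a 58 5a 5a 5a 5a 5a  ZZXZZZZZZZXZZZZZ
Padding 00000020: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
Padding 00000030: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
"""


def find_corrupted(dump, poison=POISON):
    """Return (offset, value, value XOR poison) for every byte that is not poison."""
    hits = []
    for line in dump.splitlines():
        if not line.startswith("Padding"):
            continue
        label, _, rest = line.partition(": ")
        base = int(label.split()[1], 16)
        # The hex bytes are single-space separated; the ASCII column
        # follows after a run of two or more spaces.
        hex_part = rest.split("  ")[0]
        for i, token in enumerate(hex_part.split()):
            value = int(token, 16)
            if value != poison:
                hits.append((base + i, value, value ^ poison))
    return hits


if __name__ == "__main__":
    for offset, value, mask in find_corrupted(PADDING_DUMP):
        print(f"offset 0x{offset:02x}: 0x{value:02x} (differs by mask 0x{mask:02x})")
```

On this dump the sketch reports two bytes, at padding offsets 0x12 and 0x1a, both 0x58 with a difference mask of 0x02. In other words, the same single bit is cleared in two bytes that sit 8 bytes apart. This only describes what changed in the padding. It does not identify who wrote it.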
