Bug 93248 - [BAT BDW SKL] slab poisoning over module reload since build CI_DRM_862
Summary: [BAT BDW SKL] slab poisoning over module reload since build CI_DRM_862
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: highest normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2015-12-04 09:21 UTC by Daniel Vetter
Modified: 2016-12-13 09:12 UTC
CC List: 1 user

See Also:
i915 platform:
i915 features:


Description Daniel Vetter 2015-12-04 09:21:37 UTC
Our dear CI started to catch a slab poisoning regression on bdw-nuci7 and skl-i5k-2. Unfortunately, by the time slab notices the problem, i915.ko has already been unloaded, so the crucial i915 functions in the allocation and free traces aren't decoded and show up only as raw addresses. Someone with local access to the machines needs to first grab a copy of /proc/kallsyms, then reproduce using the module reload testcase, and then manually decode where exactly the culprit is.

Anyway, example dmesg splat below:

[  170.349516] =============================================================================
[  170.349523] BUG kmalloc-256 (Tainted: G    BU  W      ): Poison overwritten
[  170.349526] -----------------------------------------------------------------------------

[  170.349533] INFO: 0xffff880212ff16d0-0xffff880212ff16d0. First byte 0x66 instead of 0x6b
[  170.349539] INFO: Allocated in 0xffffffffa01eb19b age=161765 cpu=2 pid=299
[  170.349544] 	___slab_alloc.constprop.59+0x35e/0x390
[  170.349548] 	__slab_alloc.isra.56.constprop.58+0x43/0x80
[  170.349552] 	kmem_cache_alloc_trace+0x25e/0x2e0
[  170.349555] 	0xffffffffa01eb19b
[  170.349557] 	0xffffffffa01eb5f5
[  170.349560] 	0xffffffffa02001e2
[  170.349562] 	0xffffffffa027edff
[  170.349566] 	drm_dev_register+0xa4/0xb0
[  170.349570] 	drm_get_pci_dev+0xce/0x1e0
[  170.349572] 	0xffffffffa01c22cf
[  170.349577] 	pci_device_probe+0x87/0xf0
[  170.349581] 	driver_probe_device+0x221/0x4a0
[  170.349584] 	__driver_attach+0x83/0x90
[  170.349587] 	bus_for_each_dev+0x61/0xa0
[  170.349590] 	driver_attach+0x19/0x20
[  170.349593] 	bus_add_driver+0x1ef/0x290
[  170.349596] INFO: Freed in 0xffffffffa01eaf59 age=142 cpu=2 pid=6059
[  170.349600] 	__slab_free+0x356/0x4a0
[  170.349603] 	kfree+0x283/0x290
[  170.349606] 	0xffffffffa01eaf59
[  170.349608] 	0xffffffffa01eb843
[  170.349610] 	0xffffffffa027f6b0
[  170.349614] 	drm_dev_unregister+0x24/0xa0
[  170.349617] 	drm_put_dev+0x1e/0x60
[  170.349620] 	0xffffffffa01c2290
[  170.349623] 	pci_device_remove+0x34/0xb0
[  170.349626] 	__device_release_driver+0x91/0x130
[  170.349630] 	driver_detach+0xb3/0xc0
[  170.349633] 	bus_remove_driver+0x53/0xd0
[  170.349636] 	driver_unregister+0x27/0x50
[  170.349640] 	pci_unregister_driver+0x25/0x70
[  170.349643] 	drm_pci_exit+0x74/0x90
[  170.349645] 	0xffffffffa02809be
[  170.349648] INFO: Slab 0xffffea00084bfc00 objects=28 used=28 fp=0x          (null) flags=0x8000000000004080
[  170.349654] INFO: Object 0xffff880212ff16d0 @offset=5840 fp=0xffff880212ff0db0

[  170.349660] Bytes b4 ffff880212ff16c0: 82 8d fb ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  ........ZZZZZZZZ
[  170.349665] Object ffff880212ff16d0: 66 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  fkkkkkkkkkkkkkkk
[  170.349671] Object ffff880212ff16e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  170.349676] Object ffff880212ff16f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  170.349681] Object ffff880212ff1700: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  170.349686] Object ffff880212ff1710: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  170.349691] Object ffff880212ff1720: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  170.349696] Object ffff880212ff1730: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  170.349701] Object ffff880212ff1740: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  170.349706] Object ffff880212ff1750: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  170.349711] Object ffff880212ff1760: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  170.349716] Object ffff880212ff1770: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  170.349721] Object ffff880212ff1780: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  170.349726] Object ffff880212ff1790: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  170.349731] Object ffff880212ff17a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  170.349736] Object ffff880212ff17b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
[  170.349741] Object ffff880212ff17c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
[  170.349746] Redzone ffff880212ff17d0: bb bb bb bb bb bb bb bb                          ........
[  170.349751] Padding ffff880212ff1910: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
[  170.349756] CPU: 3 PID: 6065 Comm: modprobe Tainted: G    BU  W       4.4.0-rc3-gfxbench+ #1
[  170.349757] Hardware name:                  /NUC5i7RYB, BIOS RYBDWi35.86A.0249.2015.0529.1640 05/29/2015
[  170.349758]  ffff880212ff16d0 ffff8800d8c5f580 ffffffff813df42c ffff8802158073c0
[  170.349761]  ffff8800d8c5f5c0 ffffffff811a6093 0000000000000008 ffff880200000001
[  170.349763]  ffff880212ff16d1 ffff8802158073c0 000000000000006b ffff880212ff16d0
[  170.349765] Call Trace:
[  170.349767]  [<ffffffff813df42c>] dump_stack+0x4e/0x82
[  170.349768]  [<ffffffff811a6093>] print_trailer+0x143/0x1f0
[  170.349770]  [<ffffffff811a6208>] check_bytes_and_report+0xc8/0x110
[  170.349771]  [<ffffffff811a6411>] check_object+0x1c1/0x240
[  170.349773]  [<ffffffff812300c2>] ? __proc_create+0xb2/0x280
[  170.349774]  [<ffffffff811a9b0b>] alloc_debug_processing+0x9b/0x190
[  170.349775]  [<ffffffff811a9f5e>] ___slab_alloc.constprop.59+0x35e/0x390
[  170.349776]  [<ffffffff812300c2>] ? __proc_create+0xb2/0x280
[  170.349778]  [<ffffffff813fc18c>] ? debug_check_no_obj_freed+0x10c/0x1f0
[  170.349780]  [<ffffffff813e0da7>] ? ida_get_new_above+0x1d7/0x210
[  170.349781]  [<ffffffff811a91d4>] ? kmem_cache_free+0x134/0x350
[  170.349782]  [<ffffffff812300c2>] ? __proc_create+0xb2/0x280
[  170.349784]  [<ffffffff811a9fd3>] __slab_alloc.isra.56.constprop.58+0x43/0x80
[  170.349785]  [<ffffffff811aadbb>] __kmalloc+0x2bb/0x330
[  170.349786]  [<ffffffff812300c2>] __proc_create+0xb2/0x280
[  170.349788]  [<ffffffff8123065d>] proc_create_data+0x4d/0xc0
[  170.349790]  [<ffffffff810da1d8>] register_irq_proc+0x138/0x140
[  170.349791]  [<ffffffff810d598e>] __setup_irq+0x27e/0x600
[  170.349807]  [<ffffffffa02ef900>] ? gen8_gt_irq_handler+0x250/0x250 [i915]
[  170.349809]  [<ffffffff810d5e90>] request_threaded_irq+0xf0/0x190
[  170.349810]  [<ffffffff814f3fa0>] drm_irq_install+0x90/0x170
[  170.349822]  [<ffffffffa02f2940>] intel_irq_install+0x20/0x30 [i915]
[  170.349843]  [<ffffffffa03a4ddc>] i915_driver_load+0xeec/0x1670 [i915]
[  170.349845]  [<ffffffff813fc18c>] ? debug_check_no_obj_freed+0x10c/0x1f0
[  170.349847]  [<ffffffff8178f4e0>] ? klist_add_tail+0x20/0x40
[  170.349849]  [<ffffffff814f745c>] ? drm_minor_register+0x7c/0x110
[  170.349851]  [<ffffffff814f7481>] ? drm_minor_register+0xa1/0x110
[  170.349852]  [<ffffffff814f7594>] drm_dev_register+0xa4/0xb0
[  170.349854]  [<ffffffff814f93fe>] drm_get_pci_dev+0xce/0x1e0
[  170.349856]  [<ffffffff81797b6d>] ? _raw_spin_unlock_irqrestore+0x3d/0x60
[  170.349867]  [<ffffffffa02e82cf>] i915_pci_probe+0x2f/0x50 [i915]
[  170.349869]  [<ffffffff81425ea7>] pci_device_probe+0x87/0xf0
[  170.349870]  [<ffffffff8151a551>] driver_probe_device+0x221/0x4a0
[  170.349872]  [<ffffffff8151a853>] __driver_attach+0x83/0x90
[  170.349873]  [<ffffffff8151a7d0>] ? driver_probe_device+0x4a0/0x4a0
[  170.349875]  [<ffffffff81518371>] bus_for_each_dev+0x61/0xa0
[  170.349876]  [<ffffffff81519ee9>] driver_attach+0x19/0x20
[  170.349878]  [<ffffffff81519a6f>] bus_add_driver+0x1ef/0x290
[  170.349879]  [<ffffffff8151b4ab>] driver_register+0x5b/0xe0
[  170.349881]  [<ffffffff81424e3b>] __pci_register_driver+0x5b/0x60
[  170.349883]  [<ffffffff814f95e6>] drm_pci_init+0xd6/0x100
[  170.349884]  [<ffffffffa0171000>] ? 0xffffffffa0171000
[  170.349895]  [<ffffffffa0171094>] i915_init+0x94/0x9b [i915]
[  170.349897]  [<ffffffff810003de>] do_one_initcall+0xae/0x1d0
[  170.349899]  [<ffffffff81153bb2>] ? do_init_module+0x22/0x1e0
[  170.349900]  [<ffffffff811aa3c3>] ? kmem_cache_alloc_trace+0xe3/0x2e0
[  170.349902]  [<ffffffff81153beb>] do_init_module+0x5b/0x1e0
[  170.349904]  [<ffffffff81101877>] load_module+0x1b57/0x2440
[  170.349905]  [<ffffffff810fefc0>] ? symbol_put_addr+0x60/0x60
[  170.349907]  [<ffffffff810ff2b6>] ? copy_module_from_fd.isra.58+0xe6/0x140
[  170.349909]  [<ffffffff8110233b>] SyS_finit_module+0x7b/0xa0
[  170.349911]  [<ffffffff8179849b>] entry_SYSCALL_64_fastpath+0x16/0x73
[  170.349912] FIX kmalloc-256: Restoring 0xffff880212ff16d0-0xffff880212ff16d0=0x6b

[  170.349918] FIX kmalloc-256: Marking all objects used
Comment 1 Chris Wilson 2015-12-04 09:26:18 UTC
Or use kmemcheck.
Comment 2 Daniel Vetter 2015-12-04 16:38:17 UTC
commit af3302b90775ca3389c93ab31458d696e8a8fa60
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Fri Dec 4 17:27:15 2015 +0100

    Revert "drm/i915: Extend LRC pinning to cover GPU context writeback"
    
    This reverts commit 6d65ba943a2d1e4292a07ca7ddb6c5138b9efa5d.
    
    Mika Kuoppala traced down a use-after-free crash in module unload to
    this commit, because ring->last_context is leaked beyond when the
    context gets destroyed. Mika submitted a quick fix to patch that up in
    the context destruction code, but that's too much of a hack.
    
    The right fix is instead for the ring to hold a full reference onto
    its last context, like we do for legacy contexts.
    
    Since this is causing a regression in BAT it gets reverted before we
    can close this.
    
    Cc: Nick Hoath <nicholas.hoath@intel.com>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Cc: David Gordon <david.s.gordon@intel.com>
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Alex Dai <yu.dai@intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93248
    Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Comment 3 Jari Tahvanainen 2016-12-13 09:12:14 UTC
Closing resolved+fixed after one year without comments. Faulty commit reverted by commit af3302b9.

