Bug 108690 - [CI][SHARDS] igt@gem_exec_reuse@baggage - dmesg-warn - ODEBUG: Out of memory. ODEBUG disabled
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Whiteboard: ReadyForDev
Reported: 2018-11-07 15:13 UTC by Martin Peres
Modified: 2018-11-13 15:40 UTC
i915 platform: BXT
i915 features: GEM/Other


Description Martin Peres 2018-11-07 15:13:09 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5096/shard-apl8/igt@gem_exec_reuse@baggage.html

<6> [630.362731] [IGT] gem_exec_reuse: starting subtest baggage
<6> [682.800570] [IGT] gem_exec_reuse: exiting, ret=0
<6> [685.973846] Console: switching to colour frame buffer device 240x67
<4> [697.383196] ODEBUG: Out of memory. ODEBUG disabled
<4> [698.591841] stack segment: 0000 [#1] PREEMPT SMP NOPTI
<4> [698.591905] CPU: 0 PID: 12 Comm: kworker/0:1 Tainted: G     U            4.20.0-rc1-CI-CI_DRM_5096+ #1
<4> [698.591974] Hardware name:  /NUC6CAYB, BIOS AYAPLCEL.86A.0049.2018.0508.1356 05/08/2018
<4> [698.592041] Workqueue: events free_obj_work
<4> [698.592078] RIP: 0010:free_obj_work+0x143/0x210
<4> [698.592116] Code: dd 00 00 00 49 bd 00 01 00 00 00 00 ad de 49 bc 00 02 00 00 00 00 ad de 48 89 43 08 4c 89 2e 4c 89 66 08 e8 3f 0e d6 ff eb 16 <48> 89 45 08 48 89 de 4c 89 2b 4c 89 63 08 48 89 eb e8 27 0e d6 ff
<4> [698.592247] RSP: 0018:ffffc9000007fe30 EFLAGS: 00010202
<4> [698.592288] RAX: ffffc9000007fe30 RBX: ffff8801529c5d68 RCX: 0000000000000001
<4> [698.592342] RDX: 0000000080000001 RSI: 00000000ffffffff RDI: ffff880276816a80
<4> [698.592395] RBP: 6b6b6b6b6b6b6b6b R08: 0000000000000000 R09: 0000000000000001
<4> [698.592448] R10: 0000000000000000 R11: 0000000000000000 R12: dead000000000200
<4> [698.592500] R13: dead000000000100 R14: 0000000000000000 R15: 0000000000000000
<4> [698.592554] FS:  0000000000000000(0000) GS:ffff880277a00000(0000) knlGS:0000000000000000
<4> [698.592614] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [698.592658] CR2: 0000564ee3e10f28 CR3: 000000026e5c4000 CR4: 00000000003406f0
<4> [698.592711] Call Trace:
<4> [698.592740]  process_one_work+0x262/0x630
<4> [698.592779]  worker_thread+0x37/0x380
<4> [698.592812]  ? process_one_work+0x630/0x630
<4> [698.592847]  kthread+0x119/0x130
<4> [698.592876]  ? kthread_park+0x80/0x80
<4> [698.592911]  ret_from_fork+0x3a/0x50
<4> [698.592947] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core lpc_ich snd_pcm r8169 mei_me prime_numbers mei pinctrl_broxton pinctrl_intel
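
A note on reading the register dump above: several of the values match well-known kernel poison patterns. The sketch below is only an illustration, assuming the x86_64 constants of the 4.20 era (POISON_FREE = 0x6b, LIST_POISON1/2 = 0xdead000000000100/200); the `classify` helper and the `registers` table are hypothetical names introduced here. RBP holding 0x6b bytes suggests a pointer loaded from already freed, slab-poisoned memory, and the marked instruction bytes (`<48> 89 45 08`) appear to be a store through RBP+8, which would explain a stack segment fault on a non-canonical address.

```python
# Assumed constants: x86_64 values as of the 4.20 era, not read from this kernel build.
POISON_FREE = 0x6B                          # byte written over freed slab memory
POISON_POINTER_DELTA = 0xDEAD000000000000   # offset applied to list poison values
LIST_POISON1 = POISON_POINTER_DELTA + 0x100
LIST_POISON2 = POISON_POINTER_DELTA + 0x200


def classify(value):
    """Return a short description if value matches a known poison pattern."""
    if value == int.from_bytes(bytes([POISON_FREE] * 8), "little"):
        return "POISON_FREE pattern (pointer read from freed slab memory)"
    if value == LIST_POISON1:
        return "LIST_POISON1 (next pointer of a deleted list entry)"
    if value == LIST_POISON2:
        return "LIST_POISON2 (prev/pprev pointer of a deleted list entry)"
    return "no known poison pattern"


# Register values copied from the oops in the description.
registers = {
    "RAX": 0xFFFFC9000007FE30,
    "RBP": 0x6B6B6B6B6B6B6B6B,
    "R12": 0xDEAD000000000200,
    "R13": 0xDEAD000000000100,
}

for name, value in registers.items():
    print(f"{name}={value:#018x}: {classify(value)}")
```

R12 and R13 are only constants that the list-deletion code is about to write; the suspicious value is RBP.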
Comment 1 Chris Wilson 2018-11-07 15:27:25 UTC
I get the feeling that rc1 brought in a slew of new debugobject users, too many for the debugobject preallocation to handle. The current preallocation code dates from Feb 2018, so it is unlikely to be behind the recent discovery.
Comment 2 Chris Wilson 2018-11-08 09:25:18 UTC
I wonder if we've enabled rcu debug objects recently. That would tie in with the recent reports of untracked rcu objects, on what seems to be an old problem.
Comment 3 Chris Wilson 2018-11-08 09:27:53 UTC
Something else to note is that the debugobject preallocation is done in debug_object_init, so if we have not been calling it for our obj->rcu, that's a lot of debugobjects being allocated without any preallocation!
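
To make the hypothesis in this comment concrete, here is a toy model in Python. It is not the kernel's lib/debugobjects.c logic: the class `ToyDebugObjects`, its pool sizes, and its method names are all invented for illustration. It only assumes what the comment states, namely that the pool is topped up on the init path, so objects that are first seen on some other path drain the pool without refilling it.

```python
class ToyDebugObjects:
    """Toy tracker backed by a preallocated pool (hypothetical, not kernel code)."""

    def __init__(self, pool_size, refill_level):
        self.pool = pool_size            # free tracking slots available
        self.refill_level = refill_level
        self.tracked = set()
        self.enabled = True

    def _take_slot(self, addr):
        if not self.enabled:
            return
        if self.pool == 0:
            # Models "ODEBUG: Out of memory. ODEBUG disabled".
            self.enabled = False
            return
        self.pool -= 1
        self.tracked.add(addr)

    def init(self, addr):
        # Assumption from the comment: only the init path tops up the pool.
        if self.enabled and self.pool < self.refill_level:
            self.pool = self.refill_level
        self._take_slot(addr)

    def activate(self, addr):
        # An object that never went through init() is tracked lazily,
        # consuming a slot without any refill.
        if addr not in self.tracked:
            self._take_slot(addr)


def run(objects, call_init):
    tracker = ToyDebugObjects(pool_size=64, refill_level=64)
    for addr in range(objects):
        if call_init:
            tracker.init(addr)
        tracker.activate(addr)
    return tracker.enabled


print("with init:   ", run(1000, call_init=True))
print("without init:", run(1000, call_init=False))
```

In this model the tracker survives 1000 objects when every object is initialised first, and disables itself once the 64 preallocated slots are used up when init is skipped.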
Comment 4 Chris Wilson 2018-11-09 10:47:57 UTC
commit 8811d616dfaa8c6e1905a20ce0543ec401275997 (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Nov 9 09:03:11 2018 +0000

    drm/i915: Initialise the obj->rcu head
    
    Make the rcu_head known to the system, in particular for debugobjects.
    And having declared it for debugobjects, we need to tidy up afterwards.
    
    v2: mark the obj->rcu as being destroyed when we reuse its location for
    the freed list.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108691
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20181109090311.15321-1-chris@chris-wilson.co.uk
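
The lifecycle that the commit message describes can be sketched with a toy state tracker. This is an illustration only, written in Python rather than kernel C; `Tracker`, `object_lifecycle`, and the warning strings are hypothetical names, and the real patch presumably relies on kernel helpers such as init_rcu_head() and destroy_rcu_head() rather than anything shown here. The model covers the two points in the message: declare the rcu head before it is used, and mark it destroyed before its storage is reused for the freed list.

```python
class Tracker:
    """Toy object-state tracker (hypothetical, not the kernel debugobjects API)."""

    def __init__(self):
        self.state = {}
        self.warnings = []

    def init(self, addr):
        self.state[addr] = "init"

    def activate(self, addr):
        if addr not in self.state:
            self.warnings.append(f"activating untracked object {addr:#x}")
        self.state[addr] = "active"

    def deactivate(self, addr):
        if self.state.get(addr) == "active":
            self.state[addr] = "inactive"

    def destroy(self, addr):
        self.state.pop(addr, None)

    def reuse_memory(self, addr):
        if addr in self.state:
            self.warnings.append(f"memory at {addr:#x} reused while still tracked")


def object_lifecycle(tracker, addr, initialise, tidy_up):
    if initialise:
        tracker.init(addr)        # object creation: make the head known
    tracker.activate(addr)        # rcu callback queued
    tracker.deactivate(addr)      # rcu callback runs
    if tidy_up:
        tracker.destroy(addr)     # declare the head dead first...
    tracker.reuse_memory(addr)    # ...then reuse its slot as a freed-list link
    return tracker.warnings


print(object_lifecycle(Tracker(), 0x1000, initialise=True, tidy_up=True))
print(object_lifecycle(Tracker(), 0x1000, initialise=False, tidy_up=False))
```

With both steps in place the model reports nothing; skipping them produces one warning for the untracked activation and one for the reuse of still-tracked memory.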
Comment 5 Martin Peres 2018-11-13 15:40:13 UTC
This indeed fixed it, thanks!
