Bug 103707 - [CI] igt@drv_selftest@ - dmesg-fail/dmesg-warn - BUG: MAX_LOCKDEP_CHAINS too low!
Summary: [CI] igt@drv_selftest@ - dmesg-fail/dmesg-warn - BUG: MAX_LOCKDEP_CHAINS too ...
Status: CLOSED WONTFIX
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: Other All
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
: 103716 (view as bug list)
Depends on:
Blocks:
 
Reported: 2017-11-13 11:12 UTC by Marta Löfstedt
Modified: 2017-11-30 10:57 UTC (History)
1 user (show)

See Also:
i915 platform: BXT, GLK
i915 features:


Attachments

Description Marta Löfstedt 2017-11-13 11:12:07 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3337/shard-glkb4/igt@drv_selftest@live_coherency.html

[ 4062.736714] CPU: 2 PID: 12971 Comm: kworker/u8:1 Tainted: G     U          4.14.0-rc8-CI-CI_DRM_3337+ #1
[ 4062.736715] Hardware name: Intel Corp. Geminilake/GLK RVP1 DDR4 (05), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[ 4062.736759] Workqueue: i915 contexts_free_worker [i915]
[ 4062.736763] Call Trace:
[ 4062.736767]  dump_stack+0x68/0x9f
[ 4062.736773]  __lock_acquire+0x19d5/0x1b00
[ 4062.736775]  ? trace_hardirqs_on_caller+0xe3/0x1b0
[ 4062.736777]  ? trace_hardirqs_on_caller+0xe3/0x1b0
[ 4062.736779]  lock_acquire+0xb0/0x200
[ 4062.736781]  ? lock_acquire+0xb0/0x200
[ 4062.736783]  ? ida_simple_remove+0x1e/0x50
[ 4062.736785]  _raw_spin_lock_irqsave+0x3d/0x60
[ 4062.736787]  ? ida_simple_remove+0x1e/0x50
[ 4062.736788]  ida_simple_remove+0x1e/0x50
[ 4062.736828]  i915_gem_context_free+0xfd/0x180 [i915]
[ 4062.736869]  contexts_free_worker+0x50/0xa0 [i915]
[ 4062.736873]  process_one_work+0x221/0x650
[ 4062.736875]  worker_thread+0x4e/0x3c0
[ 4062.736878]  kthread+0x114/0x150
[ 4062.736879]  ? process_one_work+0x650/0x650
[ 4062.736881]  ? kthread_create_on_node+0x40/0x40
[ 4062.736882]  ret_from_fork+0x27/0x40
[ 4067.466756] snd_hda_intel 0000:00:0e.0: azx_get_response timeout, switching to polling mode: last cmd=0x20bf8100
[ 4068.470762] snd_hda_intel 0000:00:0e.0: No response from codec, disabling MSI: last cmd=0x20bf8100
[ 4069.471748] snd_hda_intel 0000:00:0e.0: azx_get_response timeout, switching to single_cmd mode: last cmd=0x20bf8100
[ 4069.491930] snd_hda_codec_hdmi hdaudioC0D2: Unable to sync register 0x2f0d00. -5
[ 4069.925897] Setting dangerous option live_selftests - tainting kernel
[ 4072.668046] i915: probe of 0000:00:02.0 failed with error -25
Comment 1 Chris Wilson 2017-11-13 11:23:46 UTC
That's actually:

<7>[ 4062.736709] BUG: MAX_LOCKDEP_CHAINS too low!
<7>[ 4062.736710] turning off the locking correctness validator.

It's telling us that we've overflowed the static array lockdep uses for tracking unique lockclasses -- we have too many locks! lockclasses are not recycled when the module is unload, so every time we run a drv_selftest subtest we load the module twice (the intention was to run drv_selftest in one batch so the extra reload only happened once for the entire set). We can drop the reload, so long as we teach igt to load i915.ko when required: https://patchwork.freedesktop.org/series/33703/ but this still means that there is an upper limit to the number of kselftests we run in this manner. Annoying.

I think the ultimate fix here is recycling the lockclasses on module unload.
Comment 2 Chris Wilson 2017-11-13 11:53:22 UTC
*** Bug 103716 has been marked as a duplicate of this bug. ***
Comment 3 Marta Löfstedt 2017-11-14 07:19:31 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3340/shard-glkb3/igt@drv_selftest@live_evict.html

dmesg:
<7>[  890.372801] BUG: MAX_LOCKDEP_CHAINS too low!
<7>[  890.372808] turning off the locking correctness validator.
<4>[  890.372815] CPU: 2 PID: 11348 Comm: drv_selftest Tainted: G     U          4.14.0-rc8-CI-CI_DRM_3340+ #1
<4>[  890.372821] Hardware name: Intel Corp. Geminilake/GLK RVP1 DDR4 (05), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[  890.372829] Call Trace:
<4>[  890.372839]  dump_stack+0x68/0x9f
<4>[  890.372846]  __lock_acquire+0x19d5/0x1b00
<4>[  890.372853]  lock_acquire+0xb0/0x200
<4>[  890.372857]  ? lock_acquire+0xb0/0x200
<4>[  890.372861]  ? load_balance+0x307/0xb50
<4>[  890.372866]  _raw_spin_lock_irqsave+0x3d/0x60
<4>[  890.372869]  ? load_balance+0x307/0xb50
<4>[  890.372874]  load_balance+0x307/0xb50
<4>[  890.372880]  ? trace_hardirqs_off_caller+0x96/0xe0
<4>[  890.372886]  pick_next_task_fair+0x648/0x840
<4>[  890.372892]  __schedule+0x123/0xaf0
<4>[  890.372898]  schedule+0x40/0x90
<4>[  890.372902]  schedule_timeout+0x1f2/0x550
<4>[  890.372909]  ? run_timer_softirq+0x160/0x160
<4>[  890.372914]  schedule_timeout_uninterruptible+0x2a/0x30
<4>[  890.372919]  ? schedule_timeout_uninterruptible+0x2a/0x30
<4>[  890.372971]  wait_panel_power_cycle+0xb9/0xc0 [i915]
<4>[  890.373020]  edp_panel_vdd_on+0x1c6/0x1d0 [i915]
<4>[  890.373027]  ? mutex_lock_nested+0x1b/0x20
<4>[  890.373071]  intel_dp_aux_ch+0x48/0x5f0 [i915]
<4>[  890.373119]  intel_dp_aux_transfer+0xa5/0x200 [i915]
<4>[  890.373127]  drm_dp_dpcd_access+0x65/0xf0
<4>[  890.373131]  drm_dp_dpcd_read+0x31/0x70
<4>[  890.373175]  intel_dp_read_dpcd+0x29/0x60 [i915]
<4>[  890.373221]  intel_dp_init_connector+0x9c3/0x1360 [i915]
<4>[  890.373229]  ? rcu_read_lock_sched_held+0x7a/0x90
<4>[  890.373233]  ? kmem_cache_alloc_trace+0x270/0x2d0
<4>[  890.373278]  intel_ddi_init+0x393/0x490 [i915]
<4>[  890.373326]  intel_modeset_init+0x12d2/0x1910 [i915]
<4>[  890.373374]  ? intel_setup_gmbus+0x244/0x260 [i915]
<4>[  890.373418]  i915_driver_load+0x9c4/0x1600 [i915]
<4>[  890.373462]  i915_pci_probe+0x37/0x90 [i915]
<4>[  890.373470]  pci_device_probe+0xa8/0x130
<4>[  890.373475]  driver_probe_device+0x29c/0x450
<4>[  890.373481]  __driver_attach+0xe3/0xf0
<4>[  890.373485]  ? driver_probe_device+0x450/0x450
<4>[  890.373489]  bus_for_each_dev+0x62/0xa0
<4>[  890.373493]  driver_attach+0x1e/0x20
<4>[  890.373498]  bus_add_driver+0x173/0x270
<4>[  890.373502]  driver_register+0x60/0xe0
<4>[  890.373507]  __pci_register_driver+0x6b/0x70
<4>[  890.373548]  i915_init+0x6f/0x78 [i915]
<4>[  890.373554]  ? 0xffffffffa04c7000
<4>[  890.373559]  do_one_initcall+0x43/0x170
<4>[  890.373563]  ? rcu_read_lock_sched_held+0x7a/0x90
<4>[  890.373567]  ? kmem_cache_alloc_trace+0x270/0x2d0
<4>[  890.373573]  do_init_module+0x5f/0x206
<4>[  890.373577]  load_module+0x2581/0x2dd0
<4>[  890.373582]  ? show_coresize+0x30/0x30
<4>[  890.373586]  ? kernel_read+0x31/0x50
<4>[  890.373591]  SyS_finit_module+0xc1/0x100
<4>[  890.373596]  ? SyS_finit_module+0xc1/0x100
<4>[  890.373601]  entry_SYSCALL_64_fastpath+0x1c/0xb1
<4>[  890.373605] RIP: 0033:0x7f191e6749f9
<4>[  890.373608] RSP: 002b:00007ffc5a86a9e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4>[  890.373615] RAX: ffffffffffffffda RBX: ffffffff81492083 RCX: 00007f191e6749f9
<4>[  890.373619] RDX: 0000000000000000 RSI: 00007f191f5a7e23 RDI: 0000000000000003
<4>[  890.373623] RBP: ffffc9000054ff88 R08: 0000000000000000 R09: 00005640b845fdb0
<4>[  890.373627] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
<4>[  890.373631] R13: 00005640b8460dd0 R14: 0000000000000000 R15: 00005640b84589a0
<4>[  890.373637]  ? __this_cpu_preempt_check+0x13/0x20
<7>[  891.029129] [drm:wait_panel_status [i915]] mask b800000f value 00000000 status 00000000 contro


Note, the piglit filter dmesg doesn't have the "BUG: MAX_LOCKDEP_CHAINS too low!". Which make this annoying since I now no longer can trust the piglit geneerated dmesg warning and always have to search through the real dmesg.

piglit dmesg:
[  888.351041] Setting dangerous option live_selftests - tainting kernel
[  890.240848] i915: probe of 0000:00:02.0 failed with error -25
[  890.372815] CPU: 2 PID: 11348 Comm: drv_selftest Tainted: G     U          4.14.0-rc8-CI-CI_DRM_3340+ #1
[  890.372821] Hardware name: Intel Corp. Geminilake/GLK RVP1 DDR4 (05), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
[  890.372829] Call Trace:
[  890.372839]  dump_stack+0x68/0x9f
[  890.372846]  __lock_acquire+0x19d5/0x1b00
[  890.372853]  lock_acquire+0xb0/0x200
[  890.372857]  ? lock_acquire+0xb0/0x200
[  890.372861]  ? load_balance+0x307/0xb50
[  890.372866]  _raw_spin_lock_irqsave+0x3d/0x60
[  890.372869]  ? load_balance+0x307/0xb50
[  890.372874]  load_balance+0x307/0xb50
[  890.372880]  ? trace_hardirqs_off_caller+0x96/0xe0
[  890.372886]  pick_next_task_fair+0x648/0x840
[  890.372892]  __schedule+0x123/0xaf0
[  890.372898]  schedule+0x40/0x90
[  890.372902]  schedule_timeout+0x1f2/0x550
[  890.372909]  ? run_timer_softirq+0x160/0x160
[  890.372914]  schedule_timeout_uninterruptible+0x2a/0x30
[  890.372919]  ? schedule_timeout_uninterruptible+0x2a/0x30
[  890.372971]  wait_panel_power_cycle+0xb9/0xc0 [i915]
[  890.373020]  edp_panel_vdd_on+0x1c6/0x1d0 [i915]
[  890.373027]  ? mutex_lock_nested+0x1b/0x20
[  890.373071]  intel_dp_aux_ch+0x48/0x5f0 [i915]
[  890.373119]  intel_dp_aux_transfer+0xa5/0x200 [i915]
[  890.373127]  drm_dp_dpcd_access+0x65/0xf0
[  890.373131]  drm_dp_dpcd_read+0x31/0x70
[  890.373175]  intel_dp_read_dpcd+0x29/0x60 [i915]
[  890.373221]  intel_dp_init_connector+0x9c3/0x1360 [i915]
[  890.373229]  ? rcu_read_lock_sched_held+0x7a/0x90
[  890.373233]  ? kmem_cache_alloc_trace+0x270/0x2d0
[  890.373278]  intel_ddi_init+0x393/0x490 [i915]
[  890.373326]  intel_modeset_init+0x12d2/0x1910 [i915]
[  890.373374]  ? intel_setup_gmbus+0x244/0x260 [i915]
[  890.373418]  i915_driver_load+0x9c4/0x1600 [i915]
[  890.373462]  i915_pci_probe+0x37/0x90 [i915]
[  890.373470]  pci_device_probe+0xa8/0x130
[  890.373475]  driver_probe_device+0x29c/0x450
[  890.373481]  __driver_attach+0xe3/0xf0
[  890.373485]  ? driver_probe_device+0x450/0x450
[  890.373489]  bus_for_each_dev+0x62/0xa0
[  890.373493]  driver_attach+0x1e/0x20
[  890.373498]  bus_add_driver+0x173/0x270
[  890.373502]  driver_register+0x60/0xe0
[  890.373507]  __pci_register_driver+0x6b/0x70
[  890.373548]  i915_init+0x6f/0x78 [i915]
[  890.373554]  ? 0xffffffffa04c7000
[  890.373559]  do_one_initcall+0x43/0x170
[  890.373563]  ? rcu_read_lock_sched_held+0x7a/0x90
[  890.373567]  ? kmem_cache_alloc_trace+0x270/0x2d0
[  890.373573]  do_init_module+0x5f/0x206
[  890.373577]  load_module+0x2581/0x2dd0
[  890.373582]  ? show_coresize+0x30/0x30
[  890.373586]  ? kernel_read+0x31/0x50
[  890.373591]  SyS_finit_module+0xc1/0x100
[  890.373596]  ? SyS_finit_module+0xc1/0x100
[  890.373601]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[  890.373605] RIP: 0033:0x7f191e6749f9
[  890.373608] RSP: 002b:00007ffc5a86a9e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  890.373615] RAX: ffffffffffffffda RBX: ffffffff81492083 RCX: 00007f191e6749f9
[  890.373619] RDX: 0000000000000000 RSI: 00007f191f5a7e23 RDI: 0000000000000003
[  890.373623] RBP: ffffc9000054ff88 R08: 0000000000000000 R09: 00005640b845fdb0
[  890.373627] R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
[  890.373631] R13: 00005640b8460dd0 R14: 0000000000000000 R15: 00005640b84589a0
[  890.373637]  ? __this_cpu_preempt_check+0x13/0x20
[  891.528248] to colour frame buffer device 400x112
Comment 4 Marta Löfstedt 2017-11-14 07:25:47 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3342/shard-glkb4/igt@drv_selftest@live_gtt.html

<7>[ 2850.207919] BUG: MAX_LOCKDEP_CHAINS too low!
<7>[ 2850.207937] turning off the locking correctness validator.
Comment 5 Marta Löfstedt 2017-11-14 11:43:46 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3343/shard-apl7/igt@drv_selftest@live_evict.html

<7>[ 1875.409195] [drm:__bxt_hpd_detection_setup [i915]] Invert bit setting: hp_ctl:10001818 hp_port:30
<7>[ 1875.409885] BUG: MAX_LOCKDEP_CHAINS too low!
<7>[ 1875.409886] turning off the locking correctness validator.
<4>[ 1875.409887] CPU: 3 PID: 16400 Comm: drv_selftest Tainted: G     U          4.14.0-rc8-CI-CI_DRM_3343+ #1
<4>[ 1875.409888] Hardware name:                  /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017
<4>[ 1875.409888] Call Trace:
<4>[ 1875.409889]  dump_stack+0x68/0x9f
<4>[ 1875.409889]  __lock_acquire+0x19d5/0x1b00
<4>[ 1875.409890]  lock_acquire+0xb0/0x200
<4>[ 1875.409891]  ? lock_acquire+0xb0/0x200
<4>[ 1875.409891]  ? console_unlock+0x97/0x570
<4>[ 1875.409892]  _raw_spin_lock+0x32/0x50
<4>[ 1875.409892]  ? console_unlock+0x97/0x570
<4>[ 1875.409893]  console_unlock+0x97/0x570
<4>[ 1875.409893]  vprintk_emit+0x254/0x3b0
<4>[ 1875.409894]  ? try_to_wake_up+0x2b4/0x620
<4>[ 1875.409894]  vprintk_default+0x1f/0x30
<4>[ 1875.409895]  vprintk_func+0x2c/0x90
<4>[ 1875.409896]  printk+0x43/0x4b
<4>[ 1875.409896]  ? __bxt_hpd_detection_setup+0x56/0xe0 [i915]
<4>[ 1875.409897]  drm_printk+0x8e/0x90
<4>[ 1875.409897]  ? fwtable_read32+0x251/0x2c0 [i915]
<4>[ 1875.409898]  ? fwtable_read32+0x8d/0x2c0 [i915]
<4>[ 1875.409899]  __bxt_hpd_detection_setup+0x56/0xe0 [i915]
<4>[ 1875.409899]  bxt_hpd_irq_setup+0x72/0x80 [i915]
<4>[ 1875.409900]  intel_hpd_init+0x93/0xa0 [i915]
<4>[ 1875.409900]  i915_driver_load+0x14b8/0x1600 [i915]
<4>[ 1875.409901]  i915_pci_probe+0x37/0x90 [i915]
<4>[ 1875.409902]  pci_device_probe+0xa8/0x130
<4>[ 1875.409902]  driver_probe_device+0x29c/0x450
<4>[ 1875.409903]  __driver_attach+0xe3/0xf0
<4>[ 1875.409903]  ? driver_probe_device+0x450/0x450
<4>[ 1875.409904]  bus_for_each_dev+0x62/0xa0
<4>[ 1875.409904]  driver_attach+0x1e/0x20
<4>[ 1875.409905]  bus_add_driver+0x173/0x270
<4>[ 1875.409906]  driver_register+0x60/0xe0
<4>[ 1875.409906]  __pci_register_driver+0x6b/0x70
<4>[ 1875.409907]  i915_init+0x6f/0x78 [i915]
<4>[ 1875.409908]  ? 0xffffffffa04c8000
<4>[ 1875.409908]  do_one_initcall+0x43/0x170
<4>[ 1875.409909]  ? rcu_read_lock_sched_held+0x7a/0x90
<4>[ 1875.409909]  ? kmem_cache_alloc_trace+0x270/0x2d0
<4>[ 1875.409910]  do_init_module+0x5f/0x206
<4>[ 1875.409910]  load_module+0x2581/0x2dd0
<4>[ 1875.409911]  ? show_coresize+0x30/0x30
<4>[ 1875.409912]  ? kernel_read+0x31/0x50
<4>[ 1875.409912]  SyS_finit_module+0xc1/0x100
<4>[ 1875.409913]  ? SyS_finit_module+0xc1/0x100
<4>[ 1875.409913]  entry_SYSCALL_64_fastpath+0x1c/0xb1
<4>[ 1875.409914] RIP: 0033:0x7fe4d9d839f9
<4>[ 1875.409915] RSP: 002b:00007ffe7bb1cbf8 EFLAGS: 00000202 ORIG_RAX: 0000000000000139
<4>[ 1875.409916] RAX: ffffffffffffffda RBX: ffffffff81492083 RCX: 00007fe4d9d839f9
<4>[ 1875.409917] RDX: 0000000000000000 RSI: 00007fe4dacb6e23 RDI: 0000000000000003
<4>[ 1875.409918] RBP: ffffc90000517f88 R08: 0000000000000000 R09: 0000000000000000
<4>[ 1875.409919] R10: 0000000000000003 R11: 0000000000000202 R12: 0000000000000000
<4>[ 1875.409919] R13: 000055e26b46ced0 R14: 0000000000000000 R15: 000055e26b464990
<4>[ 1875.409920]  ? __this_cpu_preempt_check+0x13/0x20
Comment 6 Marta Löfstedt 2017-11-16 13:20:55 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_7135/shard-apl8/igt@drv_selftest@live_contexts.html

<7>[ 1472.525693] BUG: MAX_LOCKDEP_CHAINS too low!
<7>[ 1472.525694] turning off the locking correctness validator.
<4>[ 1472.525698] CPU: 0 PID: 2592 Comm: kworker/0:3 Tainted: G     U          4.14.0-CI-Patchwork_7135+ #1
<4>[ 1472.525700] Hardware name:                  /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017
<4>[ 1472.525706] Workqueue: events output_poll_execute
<4>[ 1472.525708] Call Trace:
<4>[ 1472.525713]  dump_stack+0x68/0x9f
Comment 7 Marta Löfstedt 2017-11-30 09:24:21 UTC
live_coherency looks good to close.
live_evict as well
live_gtt incompletes on APL is covered by bug 103718
live_contexts has an issue https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3413/shard-apl8/igt@drv_selftest@live_contexts.html, but I will close this and file a new bug for that one instead.
Comment 8 Daniel Vetter 2017-11-30 10:57:42 UTC
Upstream said they won't fix the underlying issue, which means we simply have to carry the simple hack to up the MAX_LOCKDEPS_CHAINS values in our core-for-CI branch forever. Current version:

commit 281f8d96fa83efe67198608fe0d1792518c91925
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Nov 29 10:46:36 2017 +0100

    lockdep: Up MAX_LOCKDEP_CHAINS


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.