Bug 103454 - [IGT] drv_selftest/live_contexts WARN_ON(i915_gem_object_has_pinned_pages(obj))
Summary: [IGT] drv_selftest/live_contexts WARN_ON(i915_gem_object_has_pinned_pages(obj))
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: unspecified
Hardware: Other All
: high critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-25 15:14 UTC by Elizabeth
Modified: 2018-02-13 16:41 UTC (History)
1 user (show)

See Also:
i915 platform: SKL
i915 features:


Attachments
dmesg_live_contexts (117.63 KB, text/plain)
2017-10-25 15:14 UTC, Elizabeth
no flags Details

Description Elizabeth 2017-10-25 15:14:12 UTC
Created attachment 135040 [details]
dmesg_live_contexts

drv_selftests@live_contexts gets stuck forever, dmesg shows warn:

WARN_ON(i915_gem_object_has_pinned_pages(obj))
[ 1170.532089] ------------[ cut here ]------------
[ 1170.532119] WARNING: CPU: 6 PID: 72 at drivers/gpu/drm/i915/i915_gem.c:4553 __i915_gem_free_objects+0x2ad/0x2c0 [i915]
[ 1170.532119] Modules linked in: i915(+) drm_kms_helper drm ip6table_filter ip6_tables cmac bnep iptable_filter arc4 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec snd_hda_core snd_hwdep intel_rapl x86_pkg_temp_thermal snd_pcm intel_powerclamp coretemp snd_seq_midi snd_seq_midi_event kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iwlmvm pcbc snd_rawmidi mac80211 snd_seq snd_seq_device snd_timer aesni_intel binfmt_misc aes_x86_64 crypto_simd glue_helper cryptd intel_cstate nls_iso8859_1 intel_rapl_perf input_leds ir_rc6_decoder snd iwlwifi cfg80211 btusb btrtl btbcm soundcore mei_me mei shpchp intel_pch_thermal hci_uart btqca rc_rc6_mce btintel bluetooth ir_lirc_codec lirc_dev nuvoton_cir rc_core ecdh_generic intel_lpss_acpi intel_lpss acpi_als
[ 1170.532141]  acpi_pad mac_hid kfifo_buf industrialio parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid i2c_algo_bit prime_numbers syscopyarea sysfillrect sysimgblt e1000e fb_sys_fops nvme sdhci_pci ptp pps_core nvme_core sdhci wmi pinctrl_sunrisepoint i2c_hid video pinctrl_intel hid [last unloaded: drm]
[ 1170.532152] CPU: 6 PID: 72 Comm: kworker/6:1 Tainted: G     U          4.14.0-rc6-drm-intel-qa-ww43-commit-5c82a37+ #1
[ 1170.532153] Hardware name:                  /NUC6i7KYB, BIOS KYSKLi70.86A.0050.2017.0831.1924 08/31/2017
[ 1170.532178] Workqueue: events __i915_gem_free_work [i915]
[ 1170.532178] task: ffff8fb999d74d80 task.stack: ffffa951833fc000
[ 1170.532200] RIP: 0010:__i915_gem_free_objects+0x2ad/0x2c0 [i915]
[ 1170.532200] RSP: 0018:ffffa951833ffdf0 EFLAGS: 00010286
[ 1170.532201] RAX: 000000000000002e RBX: ffff8fb2d4ad4300 RCX: 0000000000000000
[ 1170.532202] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000246
[ 1170.532202] RBP: ffffa951833ffe30 R08: 000000000000002e R09: ffffffff8e352300
[ 1170.532203] R10: ffffa951831efe38 R11: 000000000000002e R12: ffff8fb2d4ad43f8
[ 1170.532203] R13: ffff8fb99850ad00 R14: ffff8fb2c3df8000 R15: ffff8fb2c3dfbf10
[ 1170.532204] FS:  0000000000000000(0000) GS:ffff8fb9be980000(0000) knlGS:0000000000000000
[ 1170.532204] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1170.532205] CR2: 0000561c30bb7e80 CR3: 000000064c609003 CR4: 00000000003606e0
[ 1170.532206] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1170.532206] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1170.532207] Call Trace:
[ 1170.532228]  __i915_gem_free_work+0x38/0x50 [i915]
[ 1170.532231]  process_one_work+0x156/0x410
[ 1170.532232]  worker_thread+0x4b/0x410
[ 1170.532234]  kthread+0x109/0x140
[ 1170.532235]  ? process_one_work+0x410/0x410
[ 1170.532236]  ? kthread_create_on_node+0x70/0x70
[ 1170.532238]  ret_from_fork+0x25/0x30
[ 1170.532239] Code: 49 8b 04 24 48 85 c0 75 e9 e9 d6 fd ff ff e8 fb 23 9f cc e9 d5 fe ff ff 48 c7 c6 30 59 f3 c0 48 c7 c7 b9 4b f2 c0 e8 04 ad 49 cc <0f> ff c7 83 d8 01 00 00 00 00 00 00 e9 7e fe ff ff 66 90 0f 1f 
[ 1170.532254] ---[ end trace 5d85e68ee1f1aef7 ]---
Comment 1 Chris Wilson 2017-10-25 15:18:11 UTC
kasan would be useful here; looks like a double unref (or missing ref).
Comment 2 Elizabeth 2017-10-26 17:16:17 UTC
Rising priority since it got our test cycles stuck for hours until we manually power platforms off.
Comment 3 Chris Wilson 2017-10-26 20:06:17 UTC
Now would seem to be the perfect time to work on automated oops handling (or just turn on panic_on_oops)... I assure you this won't be the last.
Comment 4 Chris Wilson 2017-11-15 21:30:01 UTC
commit f991c492aa55fb1c6834882c5d786d5bb3b25f07
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Nov 6 11:15:08 2017 +0000

    drm/i915: Lock llist_del_first() vs llist_del_all()
    
    An oversight in commit 87701b4b5593 ("drm/i915: Only free the oldest
    stale object before a fresh allocation") was that not only do we have to
    serialise concurrent users of llist_del_first(), but we also have to
    lock llist_del_first() vs llist_del_all().
    
    From llist.h,
    
     * This can be summarized as follows:
     *
     *           |   add    | del_first |  del_all
     * add       |    -     |     -     |     -
     * del_first |          |     L     |     L
     * del_all   |          |           |     -
     *
     * Where, a particular row's operation can happen concurrently with a column's
     * operation, with "-" being no lock needed, while "L" being lock is needed.
    
    This should hopefully explain:
    
    <4>[   89.287106] general protection fault: 0000 [#1] PREEMPT SMP
    <4>[   89.287126] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 mii mei_me mei snd_pcm prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel
    <4>[   89.287226] CPU: 2 PID: 23 Comm: ksoftirqd/2 Tainted: G     U          4.14.0-rc8-CI-CI_DRM_3315+ #1
    <4>[   89.287247] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
    <4>[   89.287270] task: ffff88017ab34ec0 task.stack: ffffc90000128000
    <4>[   89.287290] RIP: 0010:llist_add_batch+0x4/0x20
    <4>[   89.287301] RSP: 0018:ffffc9000012bdb8 EFLAGS: 00010296
    <4>[   89.287314] RAX: ffffffff811017ad RBX: 6e468801a1560000 RCX: ef3e53fceecdeb81
    <4>[   89.287330] RDX: 6e468801a1566130 RSI: ffff880103d73d98 RDI: ffff880103d73d98
    <4>[   89.287346] RBP: ffffc9000012bdb8 R08: ffff88017ab35780 R09: 0000000000000000
    <4>[   89.287361] R10: ffffc9000012bd68 R11: 00000000abb18c3d R12: ffffffffa01369e0
    <4>[   89.287377] R13: ffff88017fd1b8f8 R14: ffff88017ab34ec0 R15: 000000000000000a
    <4>[   89.287393] FS:  0000000000000000(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000
    <4>[   89.287411] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    <4>[   89.287424] CR2: 00007ff0c0755018 CR3: 000000016df9b000 CR4: 00000000003406e0
    <4>[   89.287440] Call Trace:
    <4>[   89.287511]  __i915_gem_free_object_rcu+0x20/0x40 [i915]
    <4>[   89.287527]  rcu_process_callbacks+0x27a/0x730
    <4>[   89.287544]  __do_softirq+0xc0/0x4ae
    <4>[   89.287559]  ? smpboot_thread_fn+0x2d/0x280
    <4>[   89.287571]  run_ksoftirqd+0x1f/0x70
    <4>[   89.287582]  smpboot_thread_fn+0x18a/0x280
    <4>[   89.287595]  kthread+0x114/0x150
    <4>[   89.287605]  ? sort_range+0x30/0x30
    <4>[   89.287615]  ? kthread_create_on_node+0x40/0x40
    <4>[   89.287628]  ret_from_fork+0x27/0x40
    <4>[   89.287641] Code: 0d 48 83 ea 01 4c 89 c1 48 83 fa ff 74 12 48 23 0c d7 74 ed 48 c1 e2 06 48 0f bd c9 48 8d 04 0a 5d c3 90 90 90 90 90 55 48 89 e5 <48> 8b 0a 48 89 0e 48 89 c8 f0 48 0f b1 3a 48 39 c1 75 ed 48 85
    <1>[   89.287774] RIP: llist_add_batch+0x4/0x20 RSP: ffffc9000012bdb8
    <4>[   89.287826] ---[ end trace e775d15174d8ae02 ]---
    
    (Lockless lists are only easy (and lockless) when only using
    llist_add/llist_del_all!)
Comment 5 Elizabeth 2017-11-15 22:25:45 UTC
$ : sudo -E ./drv_selftest --r live_contexts --d
IGT-Version: 1.20-g88d6550 (x86_64) (Linux: 4.14.0-drm-intel-qa-ww46-commit-06718a2+ x86_64)
(drv_selftest:12645) igt-kmod-DEBUG: Test requirement passed: err == 0 || err == -ENOENT
(drv_selftest:12645) igt-kmod-DEBUG: Test requirement passed: igt_kselftest_begin(&tst) == 0
(drv_selftest:12645) igt-kmod-DEBUG: Test requirement passed: !igt_list_empty(&tests)
(drv_selftest:12645) igt-kmod-DEBUG: Test requirement passed: err == 0 || err == -ENOENT
(drv_selftest:12645) igt-kmod-DEBUG: Test requirement passed: igt_kselftest_begin(&tst) == 0
(drv_selftest:12645) igt-core-DEBUG: Starting subtest: live_contexts
Subtest live_contexts: SUCCESS (1.988s)
(drv_selftest:12645) igt-kmod-DEBUG: Test requirement passed: !igt_list_empty(&tests)
(drv_selftest:12645) igt-core-DEBUG: Exiting with status code 0

Verified.
Comment 6 Elizabeth 2018-02-13 16:41:48 UTC
Closing old verified.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.