Bug 99948 - [BDW] igt_ppgtt_shrink, OOPS gen8_ppgtt_insert_4lvl
Summary: [BDW] igt_ppgtt_shrink, OOPS gen8_ppgtt_insert_4lvl
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-02-24 18:16 UTC by mwa
Modified: 2017-02-27 07:51 UTC
CC List: 1 user

See Also:
i915 platform: BDW
i915 features: GEM/PPGTT


Attachments
dmesg (74.04 KB, text/plain)
2017-02-24 18:16 UTC, mwa

Description mwa 2017-02-24 18:16:24 UTC
Created attachment 129903
dmesg

[   82.606518] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[   82.606612] IP: gen8_ppgtt_insert_4lvl+0xc0/0x2e0 [i915]
[   82.606639] PGD 22f436067 
[   82.606640] PUD 22b54f067 
[   82.606655] PMD 0 

[   82.606691] Oops: 0000 [#1] SMP
[   82.606709] Modules linked in: i915(+) drm_kms_helper drm rfcomm fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw ip6table_mangle ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat libcrc32c nf_conntrack iptable_raw iptable_mangle iptable_security ebtable_filter ebtables ip6table_filter ip6_tables cmac bnep arc4 iwlmvm mac80211 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_generic iwlwifi snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support mei_wdt snd_hda_codec irqbypass
[   82.607030]  intel_cstate snd_hwdep intel_uncore snd_hda_core intel_rapl_perf cfg80211 btusb snd_seq btrtl intel_pch_thermal btbcm rtsx_pci_ms btintel i2c_i801 bluetooth memstick lpc_ich snd_seq_device mei_me snd_pcm joydev mei shpchp snd_timer thinkpad_acpi snd nfsd wmi soundcore tpm_tis rfkill tpm_tis_core tpm intel_rst auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc dm_crypt hid_logitech_hidpp hid_logitech_dj hid_microsoft prime_numbers i2c_algo_bit rtsx_pci_sdmmc mmc_core e1000e crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ptp rtsx_pci serio_raw pps_core fjes video [last unloaded: drm]
[   82.607282] CPU: 0 PID: 2983 Comm: drv_selftest Tainted: G     U          4.10.0-debug+ #251
[   82.607321] Hardware name: LENOVO 20BW000FUK/20BW000FUK, BIOS JBET54WW (1.19 ) 11/06/2015
[   82.607358] task: ffff8d44f3fd8000 task.stack: ffff9e2ec1ba8000
[   82.607422] RIP: 0010:gen8_ppgtt_insert_4lvl+0xc0/0x2e0 [i915]
[   82.607451] RSP: 0018:ffff9e2ec1bab7d0 EFLAGS: 00010246
[   82.607476] RAX: 0000000000000000 RBX: 0000000001200000 RCX: ffff8d4477fda5a0
[   82.607509] RDX: 00000000000001ff RSI: 00000000011ff01b RDI: ffff8d449d83b000
[   82.607541] RBP: ffff9e2ec1bab860 R08: ffff8d4477fdc000 R09: ffff8d44f3fd8000
[   82.607574] R10: 0000000000000000 R11: 0000000000000200 R12: 0000000081000000
[   82.607607] R13: ffff8d44832f5000 R14: 000000000000001b R15: 0000000087654321
[   82.607640] FS:  00007f59bd78cdc0(0000) GS:ffff8d44fd800000(0000) knlGS:0000000000000000
[   82.607677] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   82.607704] CR2: 0000000000000010 CR3: 00000001ddb88000 CR4: 00000000003406f0
[   82.607738] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   82.607770] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   82.607803] Call Trace:
[   82.607854]  ppgtt_bind_vma+0x71/0x80 [i915]
[   82.607914]  i915_vma_bind+0xb4/0x150 [i915]
[   82.607973]  __i915_vma_do_pin+0x2cc/0x500 [i915]
[   82.608031]  shrink_hole+0x28a/0x350 [i915]
[   82.610086]  exercise_ppgtt+0xac/0x110 [i915]
[   82.612125]  ? drunk_hole+0x400/0x400 [i915]
[   82.614147]  igt_ppgtt_shrink+0x15/0x20 [i915]
[   82.616162]  __i915_subtests+0x3c/0xc0 [i915]
[   82.618150]  i915_gem_gtt_live_selftests+0x2f/0x40 [i915]
[   82.620139]  __run_selftests+0x113/0x1c0 [i915]
[   82.622113]  i915_live_selftests+0x35/0x60 [i915]
[   82.624069]  i915_pci_probe+0x67/0xb0 [i915]
[   82.625992]  local_pci_probe+0x45/0xa0
[   82.627894]  pci_device_probe+0x103/0x150
[   82.629783]  driver_probe_device+0x2bb/0x460
[   82.631664]  __driver_attach+0xdf/0xf0
[   82.633544]  ? driver_probe_device+0x460/0x460
[   82.635416]  bus_for_each_dev+0x6c/0xc0
[   82.637276]  driver_attach+0x1e/0x20
[   82.639120]  bus_add_driver+0x170/0x270
[   82.640954]  driver_register+0x60/0xe0
[   82.642775]  __pci_register_driver+0x4c/0x50
[   82.644609]  i915_init+0x6f/0x78 [i915]
[   82.646402]  ? 0xffffffffc0324000
[   82.648200]  do_one_initcall+0x52/0x1a0
[   82.649980]  ? __vunmap+0x81/0xd0
[   82.651748]  ? kmem_cache_alloc_trace+0x167/0x1c0
[   82.653498]  ? do_init_module+0x27/0x1f8
[   82.654940]  do_init_module+0x5f/0x1f8
[   82.656325]  load_module+0x25d7/0x29b0
[   82.657662]  ? __symbol_put+0x70/0x70
[   82.658966]  ? vfs_read+0x11b/0x130
[   82.660257]  SYSC_finit_module+0xdf/0x110
[   82.661541]  SyS_finit_module+0xe/0x10
[   82.662820]  entry_SYSCALL_64_fastpath+0x1a/0xa9
[   82.664100] RIP: 0033:0x7f59bbfc1bf9
[   82.665371] RSP: 002b:00007ffcb9aeb148 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   82.666650] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f59bbfc1bf9
[   82.667926] RDX: 0000000000000000 RSI: 0000000001bcf200 RDI: 0000000000000008
[   82.669203] RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000000
[   82.670484] R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffcb9aea140
[   82.671769] R13: 00007ffcb9aea120 R14: 0000000000000005 R15: 0000000001bcab20
[   82.673066] Code: 01 41 be 1b 00 00 00 89 4d 9c 48 8b 4d c8 4c 8b 14 c1 8b 45 b0 85 c0 74 12 83 f8 03 41 be 13 00 00 00 b8 83 00 00 00 4c 0f 45 f0 <49> 8b 42 10 48 8b 4d c0 4c 8b 04 08 48 8b 45 a8 49 8b 44 c0 10 
[   82.674511] RIP: gen8_ppgtt_insert_4lvl+0xc0/0x2e0 [i915] RSP: ffff9e2ec1bab7d0
[   82.676197] CR2: 0000000000000010
Comment 1 Chris Wilson 2017-02-24 18:21:59 UTC
Is this reproducible? The fault is near the start of gen8_ppgtt_insert_4lvl (offset +0xc0 of 0x2e0); could you translate it to a source line?
Comment 2 mwa 2017-02-24 19:39:17 UTC
It seems reproducible.

The line should be:

834		pd = pdp->page_directory[pdpe];

So the faulting address 0x10 is the offset of page_directory within pdp, which means pdp itself was NULL.
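
The arithmetic behind that reading can be sketched in C. The structs below are hypothetical stand-ins, not the real i915 definitions; their names, fields, and sizes are assumptions chosen so that page_directory lands 0x10 bytes into the struct. For reference, the faulting bytes marked `<49> 8b 42 10` in the Code: line decode to `mov 0x10(%r10),%rax`, and the register dump shows R10 as 0.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical stand-in for the DMA bookkeeping at the head of a table. */
struct fake_page_dma {
	uint64_t page;   /* placeholder for a struct page pointer */
	uint64_t daddr;  /* placeholder for a DMA address */
};

struct fake_page_directory;  /* opaque; never dereferenced here */

/* Hypothetical page-directory-pointer layout (not copied from i915). */
struct fake_pdp {
	struct fake_page_dma base;                    /* bytes 0x00..0x0f */
	struct fake_page_directory **page_directory;  /* byte  0x10 */
};

static_assert(offsetof(struct fake_pdp, page_directory) == 0x10,
	      "page_directory is assumed to sit 16 bytes into the struct");

/* Address the CPU reads when evaluating pdp->page_directory. */
static uintptr_t page_directory_load_address(uintptr_t pdp)
{
	return pdp + offsetof(struct fake_pdp, page_directory);
}
```

Under these assumptions, page_directory_load_address(0) is 0x10, which matches the CR2 value in the oops when pdp is NULL.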
Comment 3 Chris Wilson 2017-02-24 19:49:40 UTC
Unlikely:

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d4ec05f58ce4..4320c9a764ed 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -910,7 +910,7 @@ static void gen8_ppgtt_insert_4lvl(struct i915_address_space *vm,
 
        while (gen8_ppgtt_insert_pte_entries(ppgtt, pdps[pml4e++], &iter,
                                             start, cache_level))
-               ;
+               GEM_BUG_ON(pml4e >= GEN8_PML4ES_PER_PML4);
 }
 
 static void gen8_free_page_tables(struct i915_address_space *vm,
Comment 4 Chris Wilson 2017-02-24 19:58:17 UTC
and

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 8e33a8bde78a..187444c37bbb 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -62,12 +62,14 @@ fake_get_pages(struct drm_i915_gem_object *obj)
        for (sg = pages->sgl; sg; sg = sg_next(sg)) {
                unsigned long len = min_t(typeof(rem), rem, BIT(31));
 
+               GEM_BUG_ON(!len);
                sg_set_page(sg, pfn_to_page(PFN_BIAS), len, 0);
                sg_dma_address(sg) = page_to_phys(sg_page(sg));
                sg_dma_len(sg) = len;
 
                rem -= len;
        }
+       GEM_BUG_ON(rem);
 
        obj->mm.madv = I915_MADV_DONTNEED;
        return pages;
Comment 5 mwa 2017-02-24 20:10:06 UTC
Hit the GEM_BUG_ON(pml4e >= GEN8_PML4ES_PER_PML4), so pml4e ran past the last PML4 entry.
Comment 6 Chris Wilson 2017-02-25 19:04:08 UTC
commit 9e89f9ee3b16cca56bed5fa45e63f422d3ac2c3a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Feb 25 18:11:22 2017 +0000

    drm/i915: Advance start address on crossing PML (48b ppgtt) boundary
    
    When advancing onto the next 4th level page table entry, we need to
    reset our indices to 0. Currently we restart from the original address
    which means we start with an offset into the next PML table.
    
    Fixes: 894ccebee2b0 ("drm/i915: Micro-optimise gen8_ppgtt_insert_entries()")
    Reported-by: Matthew Auld <matthew.william.auld@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99948
    Testcase: igt/drv_selftest/live_gtt
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Matthew Auld <matthew.william.auld@gmail.com>
    Tested-by: Matthew Auld <matthew.william.auld@gmail.com>
    Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
    Link: http://patchwork.freedesktop.org/patch/msgid/20170225181122.4788-4-chris@chris-wilson.co.uk
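
The effect described in the commit message can be illustrated with a simplified model. This is a sketch only, not the driver code: it flattens the pdpe/pde/pte indices into a single page offset inside a PML4 entry, and every name in it (last_pml4e_used, reset_on_cross, EXAMPLE_START, and the macros) is hypothetical. It assumes 4 KiB pages and 512 entries per level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT       12
#define PML4E_SHIFT      39                 /* 12 + 3 * 9 */
#define PML4ES_PER_PML4  512u
#define PAGES_PER_PML4E  (1ull << 27)       /* 512 * 512 * 512 PTEs */

/* Example start: the last 4 KiB page covered by PML4 entry 510. */
#define EXAMPLE_START    ((511ull << PML4E_SHIFT) - (1ull << PAGE_SHIFT))

static unsigned int pml4e_index(uint64_t addr)
{
	return (addr >> PML4E_SHIFT) & (PML4ES_PER_PML4 - 1);
}

/* Flattened pdpe/pde/pte position: which page inside its PML4 entry. */
static uint64_t page_in_pml4e(uint64_t addr)
{
	return (addr >> PAGE_SHIFT) & (PAGES_PER_PML4E - 1);
}

/*
 * Index of the last PML4 entry touched when inserting npages pages from
 * start. reset_on_cross = true models the fix (indices restart at 0 in
 * the next entry); false models restarting from the original address.
 */
static unsigned int last_pml4e_used(uint64_t start, uint64_t npages,
				    bool reset_on_cross)
{
	unsigned int pml4e = pml4e_index(start);
	uint64_t offset = page_in_pml4e(start);

	assert(npages > 0);
	for (;;) {
		uint64_t room = PAGES_PER_PML4E - offset;

		if (npages <= room)
			return pml4e;       /* everything left fits here */

		npages -= room;
		pml4e++;                    /* cross into the next PML4 entry */
		offset = reset_on_cross ? 0 : page_in_pml4e(start);
	}
}
```

With EXAMPLE_START and three pages, the resetting variant finishes in entry 511. The non-resetting variant finds room for only one page per entry and ends at index 512, one past the end of a 512-entry table, which is the kind of overrun the GEM_BUG_ON from Comment 3 catches.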

