Bug 89350

Summary: [BSW bisected] kernel fails to start up
Product: DRI
Reporter: Ding Heng <hengx.ding>
Component: DRM/Intel
Assignee: Mika Kuoppala <mika.kuoppala>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: blocker
Priority: highest
CC: eero.t.tamminen, intel-gfx-bugs, valtteri.rantala
Version: DRI git
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments:
  Description                                            Flags
  boot log                                               none
  drm/i915: Always setup all page directories for gen8   none
  drm/i915: Setup all page directories for gen8          none

Description Ding Heng 2015-02-27 03:22:04 UTC
==System Environment==
--------------------------
Non-working platforms: BSW

Regression: No, this is a new issue.

==kernel==
--------------------------
origin/drm-intel-nightly: 376ebc108c774b357e211b5fc0aab558fc89ede9(2015-02-27)  


==Bug detailed description==
-----------------------------
Booting BSW with the latest kernel fails: the monitor shows nothing after the kernel is loaded, and we have no idea where it stops. Boot works normally after switching to an earlier kernel (the 2015-02-13 nightly branch works).



==Reproduce steps==
---------------------------- 
 Boot up with latest kernel
Comment 1 Ding Heng 2015-02-27 05:26:29 UTC
With "modprobe.blacklist=i915", the 2015-02-27 nightly branch boots successfully.
Comment 2 lu hua 2015-02-27 05:46:00 UTC
It's a regression on drm-intel-next-queued kernel.
good commit: cda54fe1188d4900843c0616acab7fb9c2989eef
bad commit: 626ad6f37de1620e9ccd6b28f1743ee959b582c6
Comment 3 Chris Wilson 2015-02-27 09:54:42 UTC
Bisect pending?
Comment 4 Ding Heng 2015-02-28 06:06:25 UTC
06fda602dbca9c59d87db7da71192e4b54c9f5ff is the first bad commit
commit 06fda602dbca9c59d87db7da71192e4b54c9f5ff
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Tue Feb 24 16:22:36 2015 +0000

    drm/i915: Create page table allocators

    As we move toward dynamic page table allocation, it becomes much easier
    to manage our data structures if break do things less coarsely by
    breaking up all of our actions into individual tasks.  This makes the
    code easier to write, read, and verify.

    Aside from the dissection of the allocation functions, the patch
    statically allocates the page table structures without a page directory.
    This remains the same for all platforms,

    The patch itself should not have much functional difference. The primary
    noticeable difference is the fact that page tables are no longer
    allocated, but rather statically declared as part of the page directory.
    This has non-zero overhead, but things gain additional complexity as a
    result.

    This patch exists for a few reasons:
    1. Splitting out the functions allows easily combining GEN6 and GEN8
    code. Page tables have no difference based on GEN8. As we'll see in a
    future patch when we add the DMA mappings to the allocations, it
    requires only one small change to make work, and error handling should
    just fall into place.

    2. Unless we always want to allocate all page tables under a given PDE,
    we'll have to eventually break this up into an array of pointers (or
    pointer to pointer).

    3. Having the discrete functions is easier to review, and understand.
    All allocations and frees now take place in just a couple of locations.
    Reviewing, and catching leaks should be easy.

    4. Less important: the GFP flags are confined to one location, which
    makes playing around with such things trivial.

    v2: Updated commit message to explain why this patch exists

    v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/

    v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)

    v5: Added additional safety checks in gen8 clear/free/unmap.

    v6: Use WARN_ON and return -EINVAL in alloc_pt_range (Mika).

    v7: Make err_out loop symmetrical to the way we allocate in
    alloc_pt_range. Also s/page_tables/page_table and correct commit
    message (Mika)

    Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
    Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
    Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
    Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 c47b91928e0ccde606c69fe35de783ac94c8d583 53a381d20f2e733f82b9a55c0bfe3c07c2ba2808 M      drivers
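
The v6 and v7 notes in the commit message above describe an allocator that rejects a bad range with -EINVAL and, on failure, frees what it allocated in a loop that mirrors the allocation loop. A minimal userspace sketch of that pattern follows. The names `ex_directory` and `ex_alloc_range`, the slot count, and the 4096-byte table size are hypothetical stand-ins for illustration; this is not the driver's alloc_pt_range.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define EX_SLOTS 512 /* hypothetical number of table slots per directory */

/* Hypothetical stand-in for a page directory: a fixed array of table pointers. */
struct ex_directory {
    void *table[EX_SLOTS];
};

/*
 * Allocate tables for slots [first, first + count).
 * Returns 0 on success, -EINVAL for an out-of-bounds range,
 * or -ENOMEM after undoing any partial allocation.
 */
static int ex_alloc_range(struct ex_directory *pd, size_t first, size_t count)
{
    size_t i;

    if (first >= EX_SLOTS || count > EX_SLOTS - first)
        return -EINVAL;

    for (i = first; i < first + count; i++) {
        pd->table[i] = calloc(1, 4096);
        if (!pd->table[i])
            goto err_out;
    }
    return 0;

err_out:
    /* Walk back over the same range, freeing only what this call allocated. */
    while (i-- > first) {
        free(pd->table[i]);
        pd->table[i] = NULL;
    }
    return -ENOMEM;
}
```

Keeping the unwind loop bounded by the same `first` index as the allocation loop means a failed call leaves slots outside the requested range untouched.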
Comment 5 Ben Widawsky 2015-03-03 01:57:00 UTC
This patch has been reworked a lot since I originally authored it. I'd suggest one of Daniel, Mika, or Michel own this instead. I can help if they get stuck.

Reassigning...
Comment 6 ye.tian 2015-03-03 02:11:20 UTC
Created attachment 113931 [details]
boot log

[   29.354302] Call Trace:
[   29.383655]  [<ffffffffa00b0af6>] ? intel_logical_rings_init+0xf3/0x42e [i915]
[   29.470313]  [<ffffffffa009ad9e>] ? i915_gem_init+0x17c/0x1cb [i915]
[   29.546559]  [<ffffffffa00fd843>] ? i915_driver_load+0xe3e/0x1047 [i915]
[   29.626898]  [<ffffffff81064b54>] ? __wake_up+0x33/0x44
[   29.689524]  [<ffffffff8104a1fb>] ? call_usermodehelper_exec+0xee/0xfd
[   29.767773]  [<ffffffff81330000>] ? bsg_open+0xd7/0x28d
[   29.830395]  [<ffffffff81338929>] ? kobject_uevent_env+0x47f/0x4b7
[   29.904483]  [<ffffffff813e65ff>] ? get_device+0xf/0x18
[   29.967111]  [<ffffffff8178ca92>] ? klist_add_tail+0x1c/0x3e
[   30.034941]  [<ffffffff813e7a35>] ? device_add+0x4d6/0x4e7
[   30.100688]  [<ffffffff813366d2>] ? idr_replace+0x2d/0x93
[   30.165408]  [<ffffffffa0007721>] ? drm_dev_register+0x73/0xe5 [drm]
[   30.241585]  [<ffffffffa00098e0>] ? drm_get_pci_dev+0xf7/0x1b3 [drm]
[   30.317750]  [<ffffffff8135f631>] ? local_pci_probe+0x35/0x79
[   30.386623]  [<ffffffff8135f740>] ? pci_device_probe+0xcb/0xef
[   30.456538]  [<ffffffff813e9af2>] ? driver_probe_device+0x9c/0x1d1
[   30.530625]  [<ffffffff813e9caf>] ? __driver_attach+0x53/0x73
[   30.599497]  [<ffffffff813e9c5c>] ? __device_attach+0x35/0x35
[   30.668369]  [<ffffffff813e83c5>] ? bus_for_each_dev+0x6e/0x78
[   30.738284]  [<ffffffff813e93c0>] ? bus_add_driver+0x101/0x1cb
[   30.808198]  [<ffffffff813ea2a5>] ? driver_register+0x83/0xbb
[   30.877070]  [<ffffffffa0145000>] ? 0xffffffffa0145000
[   30.938650]  [<ffffffff810002fd>] ? do_one_initcall+0xe2/0x161
[   31.008570]  [<ffffffff8110b04f>] ? kmem_cache_alloc_trace+0x2a/0xfb
[   31.084736]  [<ffffffff8179032b>] ? do_init_module+0x55/0x1b5
[   31.153613]  [<ffffffff8108f1b5>] ? load_module+0x1479/0x1951
[   31.222488]  [<ffffffff8108cd08>] ? store_uevent+0x36/0x36
[   31.288236]  [<ffffffff8179a922>] ? page_fault+0x22/0x30
[   31.351900]  [<ffffffff8108f71b>] ? SyS_init_module+0x8e/0x99
[   31.420773]  [<ffffffff81798e92>] ? system_call_fastpath+0x12/0x17
[   31.494849] Code: b8 00 00 00 41 8b 4e 0c 8d 91 74 02 00 00 89 90 c0 00 00 00 41 8b 76 0c 8d 96 70 02 00 00 89 90 c8 00 00 00 49 8b 97 90 01 00 00 <8b> 52 0c 89 90 94 00 00 00 49 8b 97 90 01 00 00 48 8b 52 08 89
[   31.727814] RIP  [<ffffffffa00b06aa>] intel_lr_context_deferred_create+0x541/0x7f5 [i915]
[   31.826034]  RSP <ffff880175447958>
[   31.867819] CR2: 000000000000000c
[   31.907521] ---[ end trace 9b8866b9721808b2 ]---
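
The log above reports CR2 = 0xc, and the marked bytes `<8b> 52 0c` in the Code line decode to `mov 0xc(%rdx),%edx`, a 32-bit load at offset 0xc from a base register that held NULL. The sketch below shows how such an address arises when code reads the upper half of a 64-bit field at offset 8 through a NULL struct pointer. The struct `example_page_directory` and its layout are hypothetical, chosen only to reproduce the offset on a little-endian machine; they are not taken from the i915 source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration: an 8-byte handle, then a 64-bit DMA address. */
struct example_page_directory {
    uint64_t page;  /* offset 0 */
    uint64_t daddr; /* offset 8 */
};

/*
 * Address touched by a 32-bit load of the upper half of daddr
 * (little-endian), given the numeric value of the struct's base pointer.
 */
static uint64_t upper_half_fault_address(uint64_t base)
{
    return base + offsetof(struct example_page_directory, daddr) + sizeof(uint32_t);
}
```

With a base of 0 the computed address is 0xc, matching the CR2 value in the log.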
Comment 7 Mika Kuoppala 2015-03-03 11:28:37 UTC
Created attachment 113947 [details] [review]
drm/i915: Always setup all page directories for gen8
Comment 8 Ding Heng 2015-03-04 06:46:32 UTC
(In reply to Mika Kuoppala from comment #7)
> Created attachment 113947 [details] [review]
> drm/i915: Always setup all page directories for gen8

I applied this patch on top of the latest nightly branch commit a5217f77503a1089aeb8a9f4e3731e29c1ac2d41.

The BSW machine now boots.
Comment 9 Mika Kuoppala 2015-03-04 13:04:58 UTC
Created attachment 113995 [details] [review]
drm/i915: Setup all page directories for gen8
Comment 10 Mika Kuoppala 2015-03-04 13:06:41 UTC
This is a more proper fix. The previous one bloated the ppgtt size past the ggtt and broke aliasing. Please test.
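
As a rough illustration of the sizing issue behind both patches: assuming the gen8 32-bit PPGTT layout of 4 KiB pages with 512 entries per page table and per page directory, each page directory spans 1 GiB and four of them span 4 GiB. The sketch below counts how many page directories an address space of a given size needs under that assumption. The macro and function names are hypothetical and not taken from the driver.

```c
#include <stdint.h>

#define EX_PAGE_SIZE 4096ULL /* assumed page size */
#define EX_ENTRIES   512ULL  /* assumed entries per table and per directory */
#define EX_PD_SPAN   (EX_PAGE_SIZE * EX_ENTRIES * EX_ENTRIES) /* 1 GiB */
#define EX_MAX_PDPS  4U      /* assumed number of page directory pointers */

/* Page directories needed to cover `size` bytes, capped at EX_MAX_PDPS. */
static unsigned int ex_page_directories_needed(uint64_t size)
{
    uint64_t n = size / EX_PD_SPAN + (size % EX_PD_SPAN != 0);

    return n > EX_MAX_PDPS ? EX_MAX_PDPS : (unsigned int)n;
}
```

Under these assumptions, an address space smaller than 4 GiB needs fewer than four page directories, so any code that reads all four slots would see unpopulated ones unless they are all set up regardless of size. That is consistent with the patch titles and with the NULL dereference in the boot log, though the thread itself does not spell out the mechanism.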
Comment 11 Jeff Zheng 2015-03-05 00:55:32 UTC
(In reply to Mika Kuoppala from comment #9)
> Created attachment 113995 [details] [review]
> drm/i915: Setup all page directories for gen8

I applied this patch on top of drm-intel-testing-2015-02-27 and BSW boots.
Comment 12 ye.tian 2015-03-06 03:09:35 UTC
This bug does not exist on the latest kernel (nightly-63cd2a).
Verified.
Comment 13 Elizabeth 2017-10-06 14:31:21 UTC
Closing old verified.
