Bug 75785 - [BDW Regression]gnome-session causes system hang
Summary: [BDW Regression]gnome-session causes system hang
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: All Linux (All)
Importance: highest major
Assignee: Ben Widawsky
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-05 05:47 UTC by lu hua
Modified: 2016-10-12 08:58 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
dmesg (22.68 KB, text/plain)
2014-03-05 05:47 UTC, lu hua

Description lu hua 2014-03-05 05:47:04 UTC
Created attachment 95132
dmesg

System Environment:
--------------------------
Platform: Broadwell
kernel:   drm-intel-next-queued/03494932b88b57cb4a177f754bb0c87bf342d1b8

Bug detailed description:
---------------------------
After a clean boot, running xinit and then gnome-session hangs the system.
It happens on Broadwell with both the -nightly and -queued kernels.

The latest good commit: b5ea642a76ee0884a4d378b4d5fe290ddb461524
The latest bad commit: 03494932b88b57cb4a177f754bb0c87bf342d1b8

Reproduce steps:
---------------------------- 
1. xinit
2. gnome-session
Comment 1 Chris Wilson 2014-03-05 07:12:41 UTC
Please bisect. Note that there is a GPU hang involved as well.
Comment 2 Ben Widawsky 2014-03-05 22:28:35 UTC
Is this the same bug which is board rework dependent?
Comment 3 lu hua 2014-03-06 01:23:26 UTC
(In reply to comment #2)
> Is this the same bug which is board rework dependent?

It's an independent bug.
Comment 4 lu hua 2014-03-06 08:28:12 UTC
Bisect shows: 307dc4f99f6d3a74a78b0e776838f35b2004f14d is the first bad commit
commit 307dc4f99f6d3a74a78b0e776838f35b2004f14d
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date:   Thu Feb 20 11:51:21 2014 -0800

    drm/i915/bdw: Reorganize PT allocations

    The previous allocation mechanism would get 2 contiguous allocations,
    one for the page directories, and one for the page tables. As each page
    table is 1 page, and there are 512 of these per page directory, this
    goes to 2MB. An unfriendly request at best. Worse still, our HW now
    supports 4 page directories, and a 2MB allocation is not allowed.

    In order to fix this, this patch attempts to split up each page table
    allocation into a single, discrete allocation. There is nothing really
    fancy about the patch itself, it just has to manage an extra pointer
    indirection, and have a fancier bit of logic to free up the pages.

    To accommodate some of the added complexity, two new helpers are
    introduced to allocate, and free the page table pages.

    NOTE: I really wanted to split the way we do allocations, and the way in
    which we identify the page table/page directory being used. I found
    splitting this functionality up to be too unwieldy. I apologize in
    advance to the reviewer. I'd recommend looking at the result, rather
    than the diff.

    v2/NOTE2: This patch predated commit:
    6f1cc993518462ccf039e195fabd47e7aa5bfd13
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Tue Dec 31 15:50:31 2013 +0000

        drm/i915: Avoid dereference past end of page arr

    It fixed the same issue as that patch, but because of the limbo state of
    PPGTT, Chris patch was merged instead. The excess churn is a result of
    my using my original patch, which has my preferred naming. Primarily
    act_* is changed to which_*, but it's mostly the same otherwise. I've
    kept the convention Chris used for the pte wrap (I had something
    slightly different, and broken - but fixable)

    v3: Rename which_p[..]e to drop which_ (Chris)
    Remove BUG_ON in inner loop (Chris)
    Redo the pde/pdpe wrap logic (Chris)

    v4: s/1MB/2MB in commit message (Imre)
    Plug leaking gen8_pt_pages in both the error path, as well as general
    free case (Imre)

    v5: Rename leftover "which_" variables (Imre)
    Add the pde = 0 wrap that was missed from v3 (Imre)

    Reviewed-by: Imre Deak <imre.deak@intel.com>
    Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Comment 5 Imre Deak 2014-03-06 09:21:08 UTC
Could be the bug fixed by

http://lists.freedesktop.org/archives/intel-gfx/2014-March/040996.html

It's already in -nightly, could you retest?
Comment 6 lu hua 2014-03-07 03:11:26 UTC
Fixed on the latest -nightly kernel. Closing it.
Comment 7 lu hua 2014-03-07 03:12:04 UTC
Verified. Fixed.
Comment 8 Jari Tahvanainen 2016-10-12 08:58:03 UTC
Closing verified+fixed.

