Bug 88688

Summary: [BDW ppgtt Bisected]igt/gem_reset_stats/ban-bsd causes system hang
Product: DRI
Reporter: lu hua <huax.lu>
Component: DRM/Intel
Assignee: Nick Hoath <nicholas.hoath>
Status: CLOSED DUPLICATE
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical
Priority: highest
CC: intel-gfx-bugs
Version: unspecified
Hardware: All
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments:
  Description  Flags
  dmesg        none

Description lu hua 2015-01-22 03:23:41 UTC
Created attachment 112638
dmesg

==System Environment==
--------------------------
Regression: Yes 
good commit:  6f34cc393f6407fbec91ff6d4fd1e29fe86b59d5
bad commit: 93180785d44e3d417099e293b9ff6eeb4fd20aa2

Non-working platforms: BDW

==kernel==
--------------------------
drm-intel-nightly/d6bc7a6a0a7573350e8be8ec54002c20d1dbe1e0
commit d6bc7a6a0a7573350e8be8ec54002c20d1dbe1e0
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Tue Jan 20 15:10:59 2015 +0100

    drm-intel-nightly: 2015y-01m-20d-14h-10m-40s UTC integration manifest

==Bug detailed description==
-----------------------------
Running igt/gem_reset_stats/ban-bsd causes a system hang on both the drm-intel-nightly and drm-intel-next-queued kernels.

output:
IGT-Version: 1.9-g3d65ff7 (x86_64) (Linux: 3.19.0-rc4_drm-intel-nightly_d6bc7a_20150121+ x86_64)
Subtest ban-bsd: SUCCESS (14.499s)
Killed

dmesg:
[  388.528466] BUG: unable to handle kernel paging request at 00000000ffffffff
[  388.530132] IP: [<ffffffff8110dc5e>] kmem_cache_alloc_trace+0xce/0x104
[  388.531802] PGD a1588067 PUD 0
[  388.533461] Oops: 0000 [#1] SMP
[  388.535115] Modules linked in: netconsole configfs ipv6 battery parport_pc parport dm_mod ac acpi_cpufreq i915 button video drm_kms_helper drm cfbfillrect cfbimgblt cfbcopyarea [last unloaded: netconsole]
[  388.536955] CPU: 1 PID: 4755 Comm: systemd-udevd Not tainted 3.19.0-rc4_drm-intel-nightly_d6bc7a_20150121+ #718
[  388.538755] task: ffff8800a75d9800 ti: ffff8800a1568000 task.ti: ffff8800a1568000
[  388.540552] RIP: 0010:[<ffffffff8110dc5e>]  [<ffffffff8110dc5e>] kmem_cache_alloc_trace+0xce/0x104
[  388.542392] RSP: 0018:ffff8800a156bbf8  EFLAGS: 00010282
[  388.544231] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000002257
[  388.546156] RDX: 0000000000002256 RSI: 00000000000000d0 RDI: ffff88014a003900
[  388.548021] RBP: 00000000ffffffff R08: 0000000000015520 R09: 0000000000000000
[  388.549886] R10: ffff880149c52780 R11: ffff8b919a899a8a R12: 00000000000000d0
[  388.551753] R13: 0000000000000080 R14: ffffffff8112e8e7 R15: ffff88014a003900
[  388.553622] FS:  00007f79f0568880(0000) GS:ffff88014ec40000(0000) knlGS:0000000000000000
[  388.555510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  388.557397] CR2: 00000000ffffffff CR3: 00000000a1587000 CR4: 00000000003407e0
[  388.559305] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  388.561198] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  388.563080] Stack:
[  388.564938]  ffff880002da8000 0000000000000000 ffff880002da8000 ffffffff817d22f0
[  388.566836]  ffff880002edca50 ffff880002da8010 ffffffff817d2660 ffffffff8112e8e7
[  388.568731]  0000000000000000 ffff880002e9d580 ffff880002da8000 ffffffff8116d5d5
[  388.570625] Call Trace:
[  388.572500]  [<ffffffff8112e8e7>] ? seq_open+0x2c/0x7f
[  388.574379]  [<ffffffff8116d5d5>] ? kernfs_fop_open+0x167/0x2be
[  388.576260]  [<ffffffff8116d46e>] ? kernfs_fop_release+0x4c/0x4c
[  388.578133]  [<ffffffff81111fe1>] ? do_dentry_open+0x184/0x2a6
[  388.579993]  [<ffffffff8111decc>] ? do_last+0x942/0xb75
[  388.581844]  [<ffffffff8111c2bd>] ? __inode_permission+0x2f/0x6e
[  388.583692]  [<ffffffff8111e231>] ? link_path_walk+0x51/0x74a
[  388.585523]  [<ffffffff8111f69c>] ? path_openat+0x20f/0x560
[  388.587352]  [<ffffffff81144eea>] ? ep_read_events_proc+0x92/0x92
[  388.589174]  [<ffffffff81120577>] ? do_filp_open+0x2b/0x6f
[  388.590980]  [<ffffffff81129b81>] ? __alloc_fd+0x58/0xe3
[  388.592780]  [<ffffffff811131b6>] ? do_sys_open+0x14b/0x1cf
[  388.594565]  [<ffffffff8179efd2>] ? system_call_fastpath+0x12/0x17
[  388.596351] Code: 7e 08 45 89 e1 49 89 d8 4c 89 e9 48 89 ea 4c 89 fe 41 ff 16
[  388.596467] [drm] Simulated gpu hang, resetting stop_rings
[  388.596468] drm/i915: Resetting chip after gpu hang
[  388.596491] [drm:gen8_init_common_ring] Execlists enabled for render ring
[  388.596494] [drm:gen8_init_common_ring] Execlists enabled for bsd ring
[  388.596505] [drm:gen8_init_common_ring] Execlists enabled for blitter ring
[  388.596506] ------------[ cut here ]------------
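
When triaging logs like the one above, it can help to pull the faulting address and the faulting symbol out of the oops header automatically. The following is a minimal Python sketch, assuming the 3.19-era x86_64 oops format shown in this report; the function name `parse_oops` and both regular expressions are illustrative and are not part of any kernel or IGT tooling.

```python
import re

# Patterns matched against the oops header lines as they appear in this
# report. Other kernel versions may format these lines differently.
FAULT_RE = re.compile(
    r"BUG: unable to handle kernel paging request at ([0-9a-f]+)"
)
IP_RE = re.compile(
    r"IP: \[<([0-9a-f]+)>\] (\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)"
)


def parse_oops(log_text):
    """Return the fault address and faulting symbol, or None if not found."""
    fault = FAULT_RE.search(log_text)
    ip = IP_RE.search(log_text)
    if not fault or not ip:
        return None
    return {
        "fault_address": int(fault.group(1), 16),  # address that faulted (CR2)
        "ip": int(ip.group(1), 16),                # instruction pointer
        "symbol": ip.group(2),                     # function containing the IP
        "offset": int(ip.group(3), 16),            # offset into that function
        "size": int(ip.group(4), 16),              # size of that function
    }


# The two header lines copied from the dmesg output in this report.
SAMPLE = """\
[  388.528466] BUG: unable to handle kernel paging request at 00000000ffffffff
[  388.530132] IP: [<ffffffff8110dc5e>] kmem_cache_alloc_trace+0xce/0x104
"""

info = parse_oops(SAMPLE)
```

For the sample lines, `info["symbol"]` is `kmem_cache_alloc_trace` and `info["fault_address"]` is `0xffffffff`, the same value that appears in RBP and CR2 in the register dump.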


==Reproduce steps==
---------------------------- 
1. ./gem_reset_stats --run-subtest ban-bsd
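
For unattended runs, the reproduce step can be wrapped with a timeout so that a test that never exits does not block the rest of a test list. This is a rough sketch only: the helper names `build_command` and `run_with_timeout` are hypothetical, and the 600-second limit in the usage comment is an assumed value. A timeout of this kind cannot help when the whole machine hangs, as in the original report.

```python
import subprocess


def build_command(binary="./gem_reset_stats", subtest="ban-bsd"):
    """Build the command line from the reproduce step in this report."""
    return [binary, "--run-subtest", subtest]


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (exit_code, timed_out).

    exit_code is None when the timeout expires. subprocess.run kills the
    child process before raising TimeoutExpired.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None, True
    return result.returncode, False


# Example usage (assumed 600 s limit; must be run from the IGT tests
# directory, otherwise subprocess.run raises FileNotFoundError):
#   exit_code, timed_out = run_with_timeout(build_command(), 600)
```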

Comment 1 Jani Nikula 2015-01-22 07:12:26 UTC
Seems like there's a problem outside of i915.

Comment 2 Chris Wilson 2015-01-22 07:46:04 UTC
No, this is memory corruption caused by i915.

Comment 3 Chris Wilson 2015-01-22 07:47:16 UTC
*** Bug 88685 has been marked as a duplicate of this bug. ***

Comment 4 lu hua 2015-02-03 05:46:13 UTC
I tested on the drm-intel-nightly kernel (98592c_20150122) with i915.enable_execlists=0, and it works well.
On the latest drm-intel-nightly kernel (8b4216_20150203), the test runs for more than 10 minutes without exiting. I suspect this is a new regression; we will file a new bug to track it on the latest drm-intel-nightly kernel.

Comment 5 lu hua 2015-02-03 06:39:57 UTC
(In reply to lu hua from comment #4)
> On the latest drm-intel-nightly kernel (8b4216_20150203), the test runs for
> more than 10 minutes without exiting. I suspect this is a new regression;
> we will file a new bug to track it on the latest drm-intel-nightly kernel.

On the latest drm-intel-nightly kernel the test does not exit; reported as bug 88933.

Comment 6 lu hua 2015-02-09 06:39:19 UTC
With i915.enable_ppgtt=0 added to the kernel command line, it works well.
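
Since the results in this thread depend on which i915 options were set at boot (i915.enable_execlists=0 in comment 4, i915.enable_ppgtt=0 here), it is worth recording them alongside each test run. Below is a minimal sketch, assuming the options are passed on the kernel command line and are visible in /proc/cmdline; the helper names are made up for illustration.

```python
def i915_options(cmdline):
    """Collect i915.<name>=<value> options from a kernel command line string."""
    options = {}
    for token in cmdline.split():
        if token.startswith("i915.") and "=" in token:
            name, value = token[len("i915."):].split("=", 1)
            options[name] = value
    return options


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return f.read()


# Hypothetical command line, for illustration only.
EXAMPLE = "root=/dev/sda1 ro i915.enable_ppgtt=0 i915.enable_execlists=0"
example_options = i915_options(EXAMPLE)
```

On a test machine, `i915_options(read_cmdline())` would return the options actually in effect at boot. Options set through modprobe configuration instead of the command line would not show up here.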

Comment 7 lu hua 2015-02-10 05:37:47 UTC
Bisect shows: 6d3d8274bc45de4babb62d64562d92af984dd238 is the first bad commit.
commit 6d3d8274bc45de4babb62d64562d92af984dd238
Author:     Nick Hoath <nicholas.hoath@intel.com>
AuthorDate: Thu Jan 15 13:10:39 2015 +0000
Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
CommitDate: Tue Jan 27 09:50:53 2015 +0100

    drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request

    Move all remaining elements that were unique to execlists queue items
    in to the associated request.

    Issue: VIZ-4274

    v2: Rebase. Fixed issue of overzealous freeing of request.
    v3: Removed re-addition of cleanup work queue (found by Daniel Vetter)
    v4: Rebase.
    v5: Actual removal of intel_ctx_submit_request. Update both tail and postfix
    pointer in __i915_add_request (found by Thomas Daniel)
    v6: Removed unrelated changes

    Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
    Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
    [danvet: Reformat comment with strange linebreaks.]
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Comment 8 Jani Nikula 2015-02-10 08:18:14 UTC
*** This bug has been marked as a duplicate of bug 88652 ***

Comment 9 lu hua 2015-03-09 08:29:26 UTC
Fixed.

Comment 10 Elizabeth 2017-10-06 14:31:59 UTC
Closing old verified.
