Bug 80984 - [GM45] Eaglelake gen4.5 - GPU HANG: ecode -1:0x00000000 when using DRI_PRIME
Status: CLOSED WONTFIX
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
Reported: 2014-07-06 20:44 UTC by Shawn Starr
Modified: 2017-07-24 22:53 UTC

i915 platform: GM45
i915 features: GPU hang


Attachments
Crash dump from /sys/class/drm/card0/error (193.50 KB, text/plain)
2014-07-06 20:44 UTC, Shawn Starr
GPU crash dump from sysfs (1.40 MB, text/plain)
2014-07-06 20:48 UTC, Shawn Starr
Avoid struct mutex recursion. (6.21 KB, patch)
2014-07-07 08:03 UTC, Chris Wilson

Description Shawn Starr 2014-07-06 20:44:52 UTC
Created attachment 102332
Crash dump from /sys/class/drm/card0/error

Kernel: kernel-3.16.0-0.rc3.git3.1.fc21.x86_64
Mesa: mesa-dri-drivers-10.2.2-3.20140625.fc21.x86_64
Xorg: xorg-x11-server-Xorg-1.15.99.903-100.fc21.x86_64 (patched with [PATCH] dri2: Use the PrimeScreen when creating/reusing buffers) 


Playing Second Life with radeon GPU offload to the Intel GPU (DRI_PRIME), running with export LIBGL_DRI3_DISABLE=1 because DRI3 in this Mesa build has no DRI_PRIME support yet.
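
For anyone trying to reproduce this setup, the environment can be sketched roughly as follows. This is only an illustration: LIBGL_DRI3_DISABLE=1 comes from the report, but the value DRI_PRIME=1 and the helper name prime_offload_env are assumptions, since the report does not say how DRI_PRIME was set.

```python
import os


def prime_offload_env(base=None):
    """Return a copy of the environment with the offload variables set.

    prime_offload_env is a hypothetical helper, not part of any library.
    """
    env = dict(os.environ if base is None else base)
    env["DRI_PRIME"] = "1"           # assumption: the report does not give the value
    env["LIBGL_DRI3_DISABLE"] = "1"  # from the report: stay on the DRI2 path
    return env


if __name__ == "__main__":
    env = prime_offload_env()
    print({key: env[key] for key in ("DRI_PRIME", "LIBGL_DRI3_DISABLE")})
```

The returned dictionary can be passed as the env argument of subprocess.run when launching the GL client.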

The kernel logged this error:

[  293.644011] [drm] GPU HANG: ecode -1:0x00000000, reason: Command parser error, iir 0x00008010, action: continue
[  293.649949] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  293.649949] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  293.649949] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  293.649949] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  293.649949] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  293.649949] i915: render error detected, EIR: 0x00000010
[  293.649949] i915:   IPEIR: 0x00000000
[  293.649949] i915:   IPEHR: 0x01000000
[  293.649949] i915:   INSTDONE_0: 0xfffffffe
[  293.649949] i915:   INSTDONE_1: 0xffffffff
[  293.649949] i915:   INSTDONE_2: 0x00000000
[  293.649949] i915:   INSTDONE_3: 0x00000000
[  293.649949] i915:   INSTPS: 0x0001e000
[  293.649949] i915:   ACTHD: 0x0181c148
[  293.649949] i915: page table error
[  293.649949] i915:   PGTBL_ER: 0x00000001
[  293.649949] [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x00000010, masking

[  331.792605] =============================================
[  331.792782] [ INFO: possible recursive locking detected ]
[  331.792971] 3.16.0-0.rc3.git3.1.fc21.x86_64 #1 Tainted: G        W    
[  331.793012] ---------------------------------------------
[  331.793012] Xorg.bin/1015 is trying to acquire lock:
[  331.793012]  (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa010a919>] i915_gem_unmap_dma_buf+0x39/0x110 [i915]
[  331.793012] 
but task is already holding lock:
[  331.793012]  (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa0084902>] drm_gem_object_handle_unreference_unlocked+0x102/0x130 [drm]
[  331.793012] 
other info that might help us debug this:
[  331.793012]  Possible unsafe locking scenario:

[  331.793012]        CPU0
[  331.793012]        ----
[  331.793012]   lock(&dev->struct_mutex);
[  331.793012]   lock(&dev->struct_mutex);
[  331.793012] 
 *** DEADLOCK ***

[  331.793012]  May be due to missing lock nesting notation

[  331.793012] 1 lock held by Xorg.bin/1015:
[  331.793012]  #0:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa0084902>] drm_gem_object_handle_unreference_unlocked+0x102/0x130 [drm]
[  331.793012] 
stack backtrace:
[  331.793012] CPU: 0 PID: 1015 Comm: Xorg.bin Tainted: G        W     3.16.0-0.rc3.git3.1.fc21.x86_64 #1
[  331.793012] Hardware name: LENOVO 4058CTO/4058CTO, BIOS 6FET93WW (3.23 ) 10/12/2012
[  331.793012]  0000000000000000 00000000ad742009 ffff88024ff4ba80 ffffffff81807cec
[  331.793012]  ffffffff82bc8240 ffff88024ff4bb60 ffffffff81100fd0 ffffffff81024369
[  331.793012]  ffff88024ff4bac0 ffffffff810e1b3d ffff880000000000 00000000007583ac
[  331.793012] Call Trace:
[  331.793012]  [<ffffffff81807cec>] dump_stack+0x4d/0x66
[  331.793012]  [<ffffffff81100fd0>] __lock_acquire+0x1450/0x1ca0
[  331.793012]  [<ffffffff81024369>] ? sched_clock+0x9/0x10
[  331.793012]  [<ffffffff810e1b3d>] ? sched_clock_local+0x1d/0x90
[  331.793012]  [<ffffffff81024369>] ? sched_clock+0x9/0x10
[  331.793012]  [<ffffffff810e1b3d>] ? sched_clock_local+0x1d/0x90
[  331.793012]  [<ffffffff81102104>] lock_acquire+0xa4/0x1d0
[  331.793012]  [<ffffffffa010a919>] ? i915_gem_unmap_dma_buf+0x39/0x110 [i915]
[  331.793012]  [<ffffffff8180ccd5>] mutex_lock_nested+0x85/0x440
[  331.793012]  [<ffffffffa010a919>] ? i915_gem_unmap_dma_buf+0x39/0x110 [i915]
[  331.793012]  [<ffffffffa010a919>] ? i915_gem_unmap_dma_buf+0x39/0x110 [i915]
[  331.793012]  [<ffffffffa010a919>] i915_gem_unmap_dma_buf+0x39/0x110 [i915]
[  331.793012]  [<ffffffff81532591>] dma_buf_unmap_attachment+0x51/0x80
[  331.793012]  [<ffffffffa009c6c2>] drm_prime_gem_destroy+0x22/0x40 [drm]
[  331.793012]  [<ffffffffa0468112>] radeon_gem_object_free+0x42/0x70 [radeon]
[  331.793012]  [<ffffffffa0084387>] drm_gem_object_free+0x27/0x40 [drm]
[  331.793012]  [<ffffffffa0084920>] drm_gem_object_handle_unreference_unlocked+0x120/0x130 [drm]
[  331.793012]  [<ffffffffa00849ff>] drm_gem_handle_delete+0xcf/0x1a0 [drm]
[  331.793012]  [<ffffffffa0085205>] drm_gem_close_ioctl+0x25/0x30 [drm]
[  331.793012]  [<ffffffffa0082cdf>] drm_ioctl+0x1df/0x6a0 [drm]
[  331.793012]  [<ffffffff81810af6>] ? _raw_spin_unlock_irqrestore+0x36/0x70
[  331.793012]  [<ffffffff810ff72d>] ? trace_hardirqs_on_caller+0x15d/0x200
[  331.793012]  [<ffffffff810ff7dd>] ? trace_hardirqs_on+0xd/0x10
[  331.793012]  [<ffffffffa043604c>] radeon_drm_ioctl+0x4c/0x80 [radeon]
[  331.793012]  [<ffffffff812628d0>] do_vfs_ioctl+0x2f0/0x520
[  331.793012]  [<ffffffff8126ef6a>] ? __fget+0x12a/0x2f0
[  331.793012]  [<ffffffff8126ee45>] ? __fget+0x5/0x2f0
[  331.793012]  [<ffffffff8126f1a0>] ? __fget_light+0x30/0x160
[  331.793012]  [<ffffffff81262b81>] SyS_ioctl+0x81/0xa0
[  331.793012]  [<ffffffff818118e9>] system_call_fastpath+0x16/0x1b
[  418.754354] DMA-API: debugging out of memory - disabling
[  858.336541] NMI: PCI system error (SERR) for reason a1 on CPU 0.
[  858.336762] Dazed and confused, but trying to continue
[  858.336959] dmar: DRHD: handling fault status reg 3
[  858.337115] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e5001000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.337515] dmar: DRHD: handling fault status reg 3
[  858.337660] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e50b9000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.338093] dmar: DRHD: handling fault status reg 3
[  858.338257] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e512b000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.338093] dmar: DRHD: handling fault status reg 3
[  858.338093] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e518e000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.339171] dmar: DRHD: handling fault status reg 3
[  858.339325] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e51ee000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.339718] dmar: DRHD: handling fault status reg 3
[  858.339863] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e5254000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.340235] dmar: DRHD: handling fault status reg 3
[  858.340387] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e52ba000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.340802] dmar: DRHD: handling fault status reg 3
[  858.340947] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e5322000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.341332] dmar: DRHD: handling fault status reg 3
[  858.341483] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e538f000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.341868] dmar: DRHD: handling fault status reg 3
[  858.342006] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e53e8000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.342390] dmar: DRHD: handling fault status reg 3
[  858.342534] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e543f000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.342907] dmar: DRHD: handling fault status reg 3
[  858.343054] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e54af000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.343443] dmar: DRHD: handling fault status reg 3
[  858.343587] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e550a000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.343970] dmar: DRHD: handling fault status reg 3
[  858.344120] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e557a000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.344510] dmar: DRHD: handling fault status reg 3
[  858.344655] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e55ce000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.345028] dmar: DRHD: handling fault status reg 3
[  858.345178] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e563b000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.345571] dmar: DRHD: handling fault status reg 3
[  858.345719] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e569d000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.346108] dmar: DRHD: handling fault status reg 3
[  858.346250] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e5706000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.346650] dmar: DRHD: handling fault status reg 3
[  858.346797] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e5766000 
DMAR:[fault reason 05] PTE Write access is not set
[  858.347168] dmar: DRHD: handling fault status reg 3
[  858.347305] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e57c3000 
DMAR:[fault reason 05] PTE Write access is not set

Attached is the crash dump from sysfs.
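
The DMAR faults in the log above all name device [01:00.0] and are all DMA writes. A small script along these lines could summarize such faults from a saved dmesg. It is a sketch that assumes the exact line format shown in this report; the names FAULT_RE and summarize_dmar_faults are made up for the example.

```python
import re

# Matches lines such as:
# [  858.337115] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e5001000
FAULT_RE = re.compile(
    r"DMAR:\[DMA (?P<op>Read|Write)\] Request device \[(?P<dev>[0-9a-f:.]+)\] "
    r"fault addr (?P<addr>[0-9a-f]+)"
)


def summarize_dmar_faults(log_text):
    """Group DMAR fault addresses by (device, operation)."""
    summary = {}
    for match in FAULT_RE.finditer(log_text):
        key = (match.group("dev"), match.group("op"))
        summary.setdefault(key, []).append(int(match.group("addr"), 16))
    return summary


if __name__ == "__main__":
    sample = (
        "[  858.337115] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e5001000 \n"
        "DMAR:[fault reason 05] PTE Write access is not set\n"
        "[  858.337660] dmar: DMAR:[DMA Write] Request device [01:00.0] fault addr e50b9000 \n"
        "DMAR:[fault reason 05] PTE Write access is not set\n"
    )
    for (dev, op), addrs in summarize_dmar_faults(sample).items():
        print(f"{dev} {op}: {len(addrs)} faults, {min(addrs):#x}..{max(addrs):#x}")
```

The "fault reason" continuation lines are deliberately not matched, so each fault is counted once.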
Comment 1 Shawn Starr 2014-07-06 20:48:56 UTC
Created attachment 102333
GPU crash dump from sysfs
Comment 2 Chris Wilson 2014-07-07 08:03:04 UTC
The GPU hang is immaterial; it is just one of those freak faults gen4 throws out for host access. The DMAR errors look more substantial and are also unrelated (though I am impressed that you have a working DMAR on a gen4 platform); they need to be directed towards -radeon, I guess.
Comment 3 Chris Wilson 2014-07-07 08:03:35 UTC
Created attachment 102352
Avoid struct mutex recursion.
Comment 4 Chris Wilson 2014-07-07 08:42:42 UTC
(In reply to comment #3)
> Created attachment 102352
> Avoid struct mutex recursion.

Dave Airlie pointed out that they are two different struct mutexes.
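
Lockdep tracks lock classes rather than individual lock instances, which would explain how two different mutexes can still produce a "possible recursive locking" report. The toy model below illustrates that idea only; it is not how the kernel implements lockdep. ToyLock and ToyLockdep are invented names, and the assignment of one mutex to radeon and the other to i915 is a reading of the backtrace, not something stated in this thread.

```python
class ToyLock:
    """A lock instance tagged with a lock class name (toy model only)."""

    def __init__(self, name, lock_class):
        self.name = name
        self.lock_class = lock_class


class ToyLockdep:
    """Tracks held locks and flags nesting of two locks of the same class."""

    def __init__(self):
        self.held = []
        self.warnings = []

    def acquire(self, lock):
        for held in self.held:
            if held.lock_class == lock.lock_class:
                kind = "same instance" if held is lock else "different instances"
                self.warnings.append(
                    f"possible recursive locking: {lock.name} "
                    f"while holding {held.name} ({kind})"
                )
        self.held.append(lock)

    def release(self, lock):
        self.held.remove(lock)


if __name__ == "__main__":
    # Assumption: one struct_mutex per DRM device, both in the same class.
    radeon_mutex = ToyLock("radeon struct_mutex", "&dev->struct_mutex")
    i915_mutex = ToyLock("i915 struct_mutex", "&dev->struct_mutex")
    checker = ToyLockdep()
    checker.acquire(radeon_mutex)  # taken on the GEM handle release path
    checker.acquire(i915_mutex)    # taken while unmapping the dma-buf
    print(checker.warnings)
```

In this model the warning fires even though the two instances are distinct, which matches the "May be due to missing lock nesting notation" hint in the log.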
Comment 5 Jairo Miramontes 2015-08-11 14:14:46 UTC
Closed after more than one year of inactivity. Feel free to reopen if needed. Thanks.

