Bug 111970 - [kbl] ELSP[0] transition GPU hang
Summary: [kbl] ELSP[0] transition GPU hang
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Duplicates: 111981 111982 111990 112173 112200 112241 112289 112299 112300 112301 112316 112317 112322 112338 112339 112369 112374 112375 112378 112384 112388 112389 112399
Depends on:
Blocks:
 
Reported: 2019-10-11 01:30 UTC by Marat Bakeev
Modified: 2019-11-29 17:31 UTC (History)
CC List: 25 users

See Also:
i915 platform: KBL
i915 features: GPU hang


Attachments
gpu crash dump (17.17 KB, text/plain)
2019-10-11 01:30 UTC, Marat Bakeev

Description Marat Bakeev 2019-10-11 01:30:11 UTC
Created attachment 145699
gpu crash dump

Linux 5.3.4-arch1-1-ARCH
Lenovo P52s
Arch Linux
3 displays: built-in laptop screen,
HDMI port: Dell Inc. DELL P2418D,
DisplayPort: DP-to-HDMI adapter to another Dell Inc. DELL P2418D.
Comment 1 Chris Wilson 2019-10-11 07:44:47 UTC
rcs0 command stream:
  IDLE?: no
  START: 0x000d8000
  HEAD:  0x00000a38 [0x00000000]
  TAIL:  0x00000a38 [0x00000000, 0x00000000]
  CTL:   0x00003001
  MODE:  0x00000000
  HWS:   0xfedfe000
  ACTHD: 0x00000000 00000a38
  IPEIR: 0x00000000
  IPEHR: 0x7a000004
  INSTDONE: 0xffdfffff
  SC_INSTDONE: 0xffffffff
  SAMPLER_INSTDONE[0][0]: 0xffffffff
  SAMPLER_INSTDONE[0][1]: 0xffffffff
  SAMPLER_INSTDONE[0][2]: 0xffffffff
  ROW_INSTDONE[0][0]: 0xffffffff
  ROW_INSTDONE[0][1]: 0xffffffff
  ROW_INSTDONE[0][2]: 0xffffffff
  BBADDR: 0x0000fffe_ec064454
  BB_STATE: 0x00000020
  INSTPS: 0x00008840
  INSTPM: 0x00000000
  FADDR: 0x00000000 000d8a38
  RC PSMI: 0x00000010
  FAULT_REG: 0x00000000
  GFX_MODE: 0x00008000
  PDP0: 0x000000082920f000
  PDP1: 0x0000000000000000
  PDP2: 0x0000000000000000
  PDP3: 0x0000000000000000
  ring->head: 0x00000000
  ring->tail: 0x00000000
  hangcheck timestamp: 0ms (4298511936; epoch)
  engine reset count: 0
  ELSP[0]:  pid 2433, seqno       1b:001b0ac2!, prio 2, emitted 1431653766ms, start 000d8000, head 00000998, tail 00000a38
  ELSP[1]:  pid 0, seqno        5:0005ec0c, prio -4093, emitted 1431653766ms, start 00301000, head 00000850, tail 000008b8

The GPU didn't switch to ELSP[1] at the end of the first context.
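
For readers triaging similar dumps, the observation above can be checked mechanically. The following is a minimal sketch, not part of any i915 tooling: the function names are hypothetical, and it assumes the error-state text keeps the line layout shown in this comment. It reads the ring HEAD and TAIL registers and the ELSP entries, then reports whether the ring has reached the tail of the ELSP[0] request while a second context is still queued in ELSP[1].

```python
import re

# Excerpt copied from the rcs0 section of the error state in this comment.
DUMP = """\
  HEAD:  0x00000a38 [0x00000000]
  TAIL:  0x00000a38 [0x00000000, 0x00000000]
  ELSP[0]:  pid 2433, seqno       1b:001b0ac2!, prio 2, emitted 1431653766ms, start 000d8000, head 00000998, tail 00000a38
  ELSP[1]:  pid 0, seqno        5:0005ec0c, prio -4093, emitted 1431653766ms, start 00301000, head 00000850, tail 000008b8
"""


def parse_ring_registers(text):
    """Return the ring HEAD and TAIL register values found in the dump."""
    regs = {}
    for name in ("HEAD", "TAIL"):
        match = re.search(rf"^\s*{name}:\s+0x([0-9a-fA-F]+)", text, re.MULTILINE)
        if match:
            regs[name] = int(match.group(1), 16)
    return regs


def parse_elsp(text):
    """Return {port: {"head": int, "tail": int}} for each ELSP[n] line."""
    ports = {}
    pattern = r"ELSP\[(\d+)\]:.*?head ([0-9a-fA-F]+), tail ([0-9a-fA-F]+)"
    for match in re.finditer(pattern, text):
        ports[int(match.group(1))] = {
            "head": int(match.group(2), 16),
            "tail": int(match.group(3), 16),
        }
    return ports


def stalled_before_second_port(regs, ports):
    """Hypothetical heuristic: HEAD == TAIL == ELSP[0] tail while ELSP[1] is queued."""
    if 0 not in ports or 1 not in ports:
        return False
    return regs.get("HEAD") == regs.get("TAIL") == ports[0]["tail"]


regs = parse_ring_registers(DUMP)
ports = parse_elsp(DUMP)
print(stalled_before_second_port(regs, ports))
```

For this dump, HEAD, TAIL and the ELSP[0] tail are all 0xa38, so the heuristic flags it. This only pattern-matches the symptom; it says nothing about the root cause.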
Comment 2 Chris Wilson 2019-10-11 19:46:00 UTC
*** Bug 111981 has been marked as a duplicate of this bug. ***
Comment 3 Chris Wilson 2019-10-11 19:46:39 UTC
*** Bug 111982 has been marked as a duplicate of this bug. ***
Comment 4 Chris Wilson 2019-10-14 08:20:58 UTC
*** Bug 111990 has been marked as a duplicate of this bug. ***
Comment 5 Chris Wilson 2019-10-15 16:33:20 UTC
One thing that would be very useful to confirm is whether you are able to reproduce this on drm-tip. At present I am assuming it is fixed, and I am looking for candidate fixes since v5.3.
Comment 6 vakevk+freedesktopbugzilla 2019-10-17 13:19:52 UTC
(In reply to Chris Wilson from comment #5)
> One thing that would be very useful to confirm is whether you are able to
> reproduce this on drm-tip. At present I am assuming it is fixed, and am
> looking for candidate fixes since v5.3

Thank you for looking into this Chris.

How can I reproduce this on drm-tip? This might be obvious to developers, but I am just a bug reporter. Does this refer to the "master" version of the drm kernel module? I guess it's not possible to temporarily use a newer version of just one kernel module, so I would have to recompile the whole kernel?

If that is the case, then it is probably too much work for me at the moment. At what kernel version should I expect this bug to be fixed, so that I can report it again if it still happens once I get that version through normal updates? Currently I am on `Linux 5.3.6`.
Comment 7 Lakshmi 2019-10-21 05:23:33 UTC
(In reply to vakevk+freedesktopbugzilla from comment #6)
> (In reply to Chris Wilson from comment #5)
> > One thing that would be very useful to confirm is whether you are able to
> > reproduce this on drm-tip. At present I am assuming it is fixed, and am
> > looking for candidate fixes since v5.3
> 
> Thank you for looking into this Chris.
> 
> How can I reproduce this on drm-tip? 
Can you verify with the drm-tip kernel (https://cgit.freedesktop.org/drm-tip)?
Feedback from this kernel is needed. If the issue still persists with the drm-tip kernel, please attach the error log (crash dump).
Comment 8 Chris Wilson 2019-10-30 08:42:27 UTC
*** Bug 112173 has been marked as a duplicate of this bug. ***
Comment 9 Chris Wilson 2019-10-30 17:56:07 UTC
Put the likely suspect into a branch at
https://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug111970
Comment 10 Chris Wilson 2019-11-02 20:40:45 UTC
*** Bug 112200 has been marked as a duplicate of this bug. ***
Comment 11 Chris Wilson 2019-11-10 10:36:51 UTC
*** Bug 112241 has been marked as a duplicate of this bug. ***
Comment 12 Lakshmi 2019-11-12 12:55:29 UTC
@Marat, any updates with the latest drm-tip?
Comment 13 Chris Wilson 2019-11-15 10:44:51 UTC
*** Bug 112289 has been marked as a duplicate of this bug. ***
Comment 14 Chris Wilson 2019-11-15 15:54:31 UTC
*** Bug 112299 has been marked as a duplicate of this bug. ***
Comment 15 Chris Wilson 2019-11-15 17:56:17 UTC
*** Bug 112300 has been marked as a duplicate of this bug. ***
Comment 16 Chris Wilson 2019-11-15 20:35:35 UTC
*** Bug 112301 has been marked as a duplicate of this bug. ***
Comment 17 Chris Wilson 2019-11-18 10:12:29 UTC
*** Bug 112316 has been marked as a duplicate of this bug. ***
Comment 18 Chris Wilson 2019-11-18 10:49:45 UTC
*** Bug 112317 has been marked as a duplicate of this bug. ***
Comment 19 Lakshmi 2019-11-18 12:50:49 UTC
Reporter, please update the kernel to the latest drm-tip (https://cgit.freedesktop.org/drm-tip) and check whether this issue is still reproducible.
Comment 20 Chris Wilson 2019-11-18 20:34:09 UTC
*** Bug 112322 has been marked as a duplicate of this bug. ***
Comment 21 Chris Wilson 2019-11-19 10:05:22 UTC
*** Bug 112338 has been marked as a duplicate of this bug. ***
Comment 22 Chris Wilson 2019-11-19 23:14:25 UTC
*** Bug 112339 has been marked as a duplicate of this bug. ***
Comment 23 Marat Bakeev 2019-11-19 23:23:39 UTC
(In reply to Lakshmi from comment #19)
> Reporter, Update the kernel to latest drmtip
> (https://cgit.freedesktop.org/drm-tip) and see if this issue is reproducible.

Sorry for the delay, I've installed drm-tip and I'll check how it goes. 
Thanks
Comment 24 Nick Hogg 2019-11-19 23:49:41 UTC
I was also experiencing the issue, and updated the kernel to drm-tip on Arch Linux using the AUR package linux-drm-tip-git.

I am now no longer experiencing the GPU crash.
Comment 25 Lakshmi 2019-11-20 10:32:11 UTC
(In reply to Nick Hogg from comment #24)
> I also was experiencing the issue, and updated the kernel to drm-tip on Arch
> linux using AUR package linux-drm-tip-git.  
> 
> Now no longer experiencing the GPU crash.

Thanks for feedback.

@Marat, based on your feedback we can close this issue.
Comment 26 Marat Bakeev 2019-11-21 03:19:46 UTC
(In reply to Lakshmi from comment #25)
> (In reply to Nick Hogg from comment #24)
> > I also was experiencing the issue, and updated the kernel to drm-tip on Arch
> > linux using AUR package linux-drm-tip-git.  
> > 
> > Now no longer experiencing the GPU crash.
> 
> Thanks for feedback.
> 
> @Marat, based on your feedback we can close this issue.

I still see errors '[drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=1419640 end=1419641) time 121 us, min 1431, max 1439, scanline start 1430, end 1441' even on drm-tip kernel. 
Are those relevant?
Comment 27 Lakshmi 2019-11-21 11:46:43 UTC
(In reply to Marat Bakeev from comment #26)
> (In reply to Lakshmi from comment #25)
> > (In reply to Nick Hogg from comment #24)
> > > I also was experiencing the issue, and updated the kernel to drm-tip on Arch
> > > linux using AUR package linux-drm-tip-git.  
> > > 
> > > Now no longer experiencing the GPU crash.
> > 
> > Thanks for feedback.
> > 
> > @Marat, based on your feedback we can close this issue.
> 
> I still see errors '[drm:intel_pipe_update_end [i915]] *ERROR* Atomic update
> failure on pipe B (start=1419640 end=1419641) time 121 us, min 1431, max
> 1439, scanline start 1430, end 1441' even on drm-tip kernel. 
> Are those relevant?

They are not relevant to the GPU hangs that this bug is about. Please check whether they are similar to bug 106107.

I will mark this issue as resolved, as the GPU hang didn't happen on drm-tip. Thanks!
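
For anyone comparing their own logs against the "Atomic update failure" message quoted in comment 26, the fields can be pulled apart as below. This is a minimal sketch with a hypothetical helper name. It assumes, from the wording of the message rather than anything stated in this thread, that start/end are frame counters sampled before and after the update and that min/max bound the scanline window the update is supposed to stay clear of.

```python
import re

# Message text as quoted in comment 26.
LINE = (
    "[drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B "
    "(start=1419640 end=1419641) time 121 us, min 1431, max 1439, "
    "scanline start 1430, end 1441"
)

PATTERN = re.compile(
    r"Atomic update failure on pipe (?P<pipe>\w) "
    r"\(start=(?P<start>\d+) end=(?P<end>\d+)\) time (?P<time_us>\d+) us, "
    r"min (?P<min>\d+), max (?P<max>\d+), "
    r"scanline start (?P<scan_start>\d+), end (?P<scan_end>\d+)"
)


def parse_atomic_update_failure(line):
    """Return the message fields as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    pipe = fields.pop("pipe")
    parsed = {key: int(value) for key, value in fields.items()}
    parsed["pipe"] = pipe
    return parsed


info = parse_atomic_update_failure(LINE)
# Assumed interpretation: a nonzero counter difference means the update
# straddled a frame boundary; scan_end past max means it ran through the window.
print(info["pipe"], info["end"] - info["start"], info["scan_end"] > info["max"])
```

In the quoted message the counter advances by one and the update ends at scanline 1441, past the reported maximum of 1439. As noted above, this is a display-update issue and is tracked separately from the GPU hang.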
Comment 28 Chris Wilson 2019-11-25 15:28:44 UTC
*** Bug 112388 has been marked as a duplicate of this bug. ***
Comment 29 Chris Wilson 2019-11-25 15:28:58 UTC
*** Bug 112384 has been marked as a duplicate of this bug. ***
Comment 30 Chris Wilson 2019-11-25 15:29:03 UTC
*** Bug 112378 has been marked as a duplicate of this bug. ***
Comment 31 Chris Wilson 2019-11-25 15:29:09 UTC
*** Bug 112375 has been marked as a duplicate of this bug. ***
Comment 32 Chris Wilson 2019-11-25 15:29:29 UTC
*** Bug 112374 has been marked as a duplicate of this bug. ***
Comment 33 Chris Wilson 2019-11-25 15:29:42 UTC
*** Bug 112369 has been marked as a duplicate of this bug. ***
Comment 34 Chris Wilson 2019-11-25 17:03:12 UTC
*** Bug 112389 has been marked as a duplicate of this bug. ***
Comment 35 RF King 2019-11-25 19:34:13 UTC
(In reply to Lakshmi from comment #7)
> Can you verify with drmtip kernel (https://cgit.freedesktop.org/drm-tip).
> Feedback from this kernel is needed. If the issue still persists with drmtip
> kernel please attach the error log(crash dump).

drm-tip resolves the problem. Note that a hang remains if the boot option i915.enable_rc6=0 is present; that option was a workaround proposed elsewhere for this problem (bug 105962), but it did not work.
Comment 36 Chris Wilson 2019-11-26 14:57:11 UTC
*** Bug 112399 has been marked as a duplicate of this bug. ***
Comment 37 Chris Wilson 2019-11-29 17:31:12 UTC
*** Bug 112431 has been marked as a duplicate of this bug. ***

