Bug 111970 - [kbl] ELSP[0] transition GPU hang
Summary: [kbl] ELSP[0] transition GPU hang
Status: NEEDINFO
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged
Keywords:
Duplicates: 111981 111982 111990 112173 112200 112241
Depends on:
Blocks:
 
Reported: 2019-10-11 01:30 UTC by Marat Bakeev
Modified: 2019-11-12 12:55 UTC

See Also:
i915 platform: KBL
i915 features: GPU hang


Attachments
gpu crash dump (17.17 KB, text/plain)
2019-10-11 01:30 UTC, Marat Bakeev

Description Marat Bakeev 2019-10-11 01:30:11 UTC
Created attachment 145699
gpu crash dump

Kernel: Linux 5.3.4-arch1-1-ARCH
Machine: Lenovo P52s
Distribution: Arch Linux
Three displays:
- built-in laptop screen
- HDMI port: Dell Inc. DELL P2418D
- DisplayPort: DP-to-HDMI adapter to another Dell Inc. DELL P2418D
Comment 1 Chris Wilson 2019-10-11 07:44:47 UTC
rcs0 command stream:
  IDLE?: no
  START: 0x000d8000
  HEAD:  0x00000a38 [0x00000000]
  TAIL:  0x00000a38 [0x00000000, 0x00000000]
  CTL:   0x00003001
  MODE:  0x00000000
  HWS:   0xfedfe000
  ACTHD: 0x00000000 00000a38
  IPEIR: 0x00000000
  IPEHR: 0x7a000004
  INSTDONE: 0xffdfffff
  SC_INSTDONE: 0xffffffff
  SAMPLER_INSTDONE[0][0]: 0xffffffff
  SAMPLER_INSTDONE[0][1]: 0xffffffff
  SAMPLER_INSTDONE[0][2]: 0xffffffff
  ROW_INSTDONE[0][0]: 0xffffffff
  ROW_INSTDONE[0][1]: 0xffffffff
  ROW_INSTDONE[0][2]: 0xffffffff
  BBADDR: 0x0000fffe_ec064454
  BB_STATE: 0x00000020
  INSTPS: 0x00008840
  INSTPM: 0x00000000
  FADDR: 0x00000000 000d8a38
  RC PSMI: 0x00000010
  FAULT_REG: 0x00000000
  GFX_MODE: 0x00008000
  PDP0: 0x000000082920f000
  PDP1: 0x0000000000000000
  PDP2: 0x0000000000000000
  PDP3: 0x0000000000000000
  ring->head: 0x00000000
  ring->tail: 0x00000000
  hangcheck timestamp: 0ms (4298511936; epoch)
  engine reset count: 0
  ELSP[0]:  pid 2433, seqno       1b:001b0ac2!, prio 2, emitted 1431653766ms, start 000d8000, head 00000998, tail 00000a38
  ELSP[1]:  pid 0, seqno        5:0005ec0c, prio -4093, emitted 1431653766ms, start 00301000, head 00000850, tail 000008b8

The GPU didn't switch to ELSP[1] at the end of the first context.
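
As an illustration only, the observation above can be checked mechanically against the dump. The sketch below is not part of any i915 tooling; the helper names (parse_ring_head, parse_elsp, stalled_at_end_of_port0) are hypothetical, and the regular expressions assume the line layout shown in this particular dump. It rests on one reading of the registers: the ring HEAD equals the tail recorded for ELSP[0] while ELSP[1] is still queued, which suggests the first context ran to completion without the hardware moving on to the second.

```python
import re

# Excerpt of the rcs0 state quoted in this comment (values copied verbatim).
DUMP = """\
  HEAD:  0x00000a38 [0x00000000]
  TAIL:  0x00000a38 [0x00000000, 0x00000000]
  ELSP[0]:  pid 2433, seqno       1b:001b0ac2!, prio 2, emitted 1431653766ms, start 000d8000, head 00000998, tail 00000a38
  ELSP[1]:  pid 0, seqno        5:0005ec0c, prio -4093, emitted 1431653766ms, start 00301000, head 00000850, tail 000008b8
"""


def parse_ring_head(dump):
    """Return the ring HEAD register offset as an integer."""
    match = re.search(r"^\s*HEAD:\s+0x([0-9a-fA-F]+)", dump, re.MULTILINE)
    if match is None:
        raise ValueError("no HEAD line found in dump")
    return int(match.group(1), 16)


def parse_elsp(dump):
    """Return {port: {'start', 'head', 'tail'}} for each ELSP[n] line."""
    pattern = re.compile(
        r"ELSP\[(\d+)\]:.*?start ([0-9a-fA-F]+), "
        r"head ([0-9a-fA-F]+), tail ([0-9a-fA-F]+)"
    )
    ports = {}
    for match in pattern.finditer(dump):
        port = int(match.group(1))
        ports[port] = {
            "start": int(match.group(2), 16),
            "head": int(match.group(3), 16),
            "tail": int(match.group(4), 16),
        }
    return ports


def stalled_at_end_of_port0(dump):
    """True if HEAD sits at ELSP[0]'s tail while ELSP[1] is still queued."""
    head = parse_ring_head(dump)
    ports = parse_elsp(dump)
    return 0 in ports and 1 in ports and head == ports[0]["tail"]
```

For the excerpt above, stalled_at_end_of_port0(DUMP) returns True: HEAD is 0xa38, the same as the ELSP[0] tail, and ELSP[1] is still present.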
Comment 2 Chris Wilson 2019-10-11 19:46:00 UTC
*** Bug 111981 has been marked as a duplicate of this bug. ***
Comment 3 Chris Wilson 2019-10-11 19:46:39 UTC
*** Bug 111982 has been marked as a duplicate of this bug. ***
Comment 4 Chris Wilson 2019-10-14 08:20:58 UTC
*** Bug 111990 has been marked as a duplicate of this bug. ***
Comment 5 Chris Wilson 2019-10-15 16:33:20 UTC
One thing that would be very useful to confirm is whether you are able to reproduce this on drm-tip. At present I am assuming it is fixed, and am looking for candidate fixes since v5.3.
Comment 6 vakevk+freedesktopbugzilla 2019-10-17 13:19:52 UTC
(In reply to Chris Wilson from comment #5)
> One thing that would be very useful to confirm is whether you are able to
> reproduce this on drm-tip. At present I am assuming it is fixed, and am
> looking for candidate fixes since v5.3

Thank you for looking into this Chris.

How can I reproduce this on drm-tip? This might be obvious to developers, but I am just a bug reporter. Is this referring to the "master" version of the drm kernel module? I guess it's not possible to temporarily use a newer version of just one kernel module, so I would have to recompile the whole kernel?

If that is the case, then it is probably too much work for me at the moment. In which kernel version should I expect this bug to be fixed, so that I can report it again if it still happens once I have that version through normal updates? Currently I am on `Linux 5.3.6`.
Comment 7 Lakshmi 2019-10-21 05:23:33 UTC
(In reply to vakevk+freedesktopbugzilla from comment #6)
> (In reply to Chris Wilson from comment #5)
> > One thing that would be very useful to confirm is whether you are able to
> > reproduce this on drm-tip. At present I am assuming it is fixed, and am
> > looking for candidate fixes since v5.3
> 
> Thank you for looking into this Chris.
> 
> How can I reproduce this on drm-tip? 
Can you verify with the drm-tip kernel (https://cgit.freedesktop.org/drm-tip)?
Feedback from this kernel is needed. If the issue still persists with the drm-tip kernel, please attach the error log (crash dump).
Comment 8 Chris Wilson 2019-10-30 08:42:27 UTC
*** Bug 112173 has been marked as a duplicate of this bug. ***
Comment 9 Chris Wilson 2019-10-30 17:56:07 UTC
Put the likely suspect into a branch at
https://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug111970
Comment 10 Chris Wilson 2019-11-02 20:40:45 UTC
*** Bug 112200 has been marked as a duplicate of this bug. ***
Comment 11 Chris Wilson 2019-11-10 10:36:51 UTC
*** Bug 112241 has been marked as a duplicate of this bug. ***
Comment 12 Lakshmi 2019-11-12 12:55:29 UTC
@Marat, any updates with the latest drm-tip?

