Bug 102393 - [APL] Random GPU Random Hang Apollo Lake (ecode 9:1:0xeeffefa1) reason: Hang on bcs
Status: CLOSED DUPLICATE of bug 102035
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 102821
Depends on:
Blocks:
 
Reported: 2017-08-24 15:32 UTC by utaminna
Modified: 2018-02-13 16:23 UTC
CC List: 4 users

See Also:
i915 platform: BXT
i915 features: GPU hang


Attachments
journalctl error text (5.91 KB, text/plain)
2017-08-24 15:34 UTC, utaminna

log file with drm.debug=0x1e log_buf_len=2M options (13.58 KB, text/x-log)
2017-09-20 15:59 UTC, jug

i915 debug log (1.51 MB, text/x-log)
2017-09-27 06:16 UTC, jug

/sys/class/drm/card0/error (15.49 KB, text/plain)
2018-02-07 11:26 UTC, samo

dmesg output at hang (1.09 KB, text/plain)
2018-02-07 11:27 UTC, samo

Description utaminna 2017-08-24 15:32:24 UTC
Linux Distro: Antergos
Linux Kernel: Linux Standard 4.12.8-2-ARCH (x86_64)
Specs: Apollo Lake N3450 with 6GB RAM (Jumper EZBook 3 Pro)

Random hangs while opening Chromium (YouTube player), Movie Player (Parole), or the Android Emulator.

Attaching the journalctl error messages as the next attachment.
Comment 1 utaminna 2017-08-24 15:34:10 UTC
Created attachment 133755
journalctl error text
Comment 2 utaminna 2017-08-24 15:36:14 UTC
cat /sys/class/drm/card0/error

No error state collected
Comment 3 Elizabeth 2017-08-25 18:42:03 UTC
Hello, could you please try to reproduce with drm.debug=0x1e log_buf_len=2M and attach the full dmesg log? Thank you.
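For reference: drm.debug is a bitmask selecting which DRM debug categories are logged, and log_buf_len=2M enlarges the kernel log ring buffer so the extra output is not overwritten before it can be saved. The minimal sketch below decodes such a mask. The bit values are the DRM_UT_* category constants from the kernel's DRM headers (only those up to 0x40 are listed); the helper name decode_drm_debug is hypothetical.

```python
# DRM_UT_* debug-category bits as defined in the kernel's DRM headers.
# Only the categories up to 0x40 are listed; newer kernels define more.
DRM_UT_CATEGORIES = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
}


def decode_drm_debug(mask: int) -> list[str]:
    """Return the names of the DRM debug categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_UT_CATEGORIES.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0x1e))  # ['DRIVER', 'KMS', 'PRIME', 'ATOMIC']
```

So 0x1e enables the DRIVER, KMS, PRIME and ATOMIC categories but leaves out the very noisy CORE category.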
Comment 4 jug 2017-09-20 15:58:31 UTC
Hi,
I've encountered the same problem on the same hardware, although I'm running Devuan with a 4.13 kernel.
I've attached a log captured with the drm.debug=0x1e log_buf_len=2M settings.
Comment 5 jug 2017-09-20 15:59:37 UTC
Created attachment 134373
log file with drm.debug=0x1e log_buf_len=2M options
Comment 6 Elizabeth 2017-09-21 15:55:05 UTC
Hello Jug, 
The attachment only includes warning messages. When a hang occurs, the file /sys/class/drm/card0/error should contain all the information related to the hang; if it is empty, the full dmesg log (from boot until the problem) will provide that information instead. Could you share either the error state or the full dmesg?
Thank you.
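A minimal sketch of the triage described above, assuming the error state is read from /sys/class/drm/card0/error (typically readable only by root): if the file holds a captured state, attach it; if it is empty or only says "No error state collected" (the output quoted in Comment 2), fall back to the full dmesg log. The helper name has_error_state is hypothetical.

```python
from pathlib import Path

# Message reported when no GPU error state has been captured
# (the output quoted in Comment 2).
NO_ERROR_STATE = "No error state collected"


def has_error_state(path: str = "/sys/class/drm/card0/error") -> bool:
    """Return True if the i915 error-state file holds a captured hang.

    Hypothetical helper. A missing file, an empty file, or the
    'No error state collected' message all mean nothing was captured,
    so the full dmesg log is the fallback. Permission errors are
    deliberately not caught, so an unreadable file is not mistaken
    for an empty one.
    """
    try:
        text = Path(path).read_text(errors="replace").strip()
    except FileNotFoundError:
        return False
    return bool(text) and not text.startswith(NO_ERROR_STATE)


if __name__ == "__main__":
    try:
        captured = has_error_state()
    except PermissionError:
        print("run as root to read the error state")
    else:
        print("attach /sys/class/drm/card0/error" if captured
              else "attach the full dmesg log (from boot until the hang)")
```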
Comment 7 Elizabeth 2017-09-21 16:13:06 UTC
*** Bug 102821 has been marked as a duplicate of this bug. ***
Comment 8 jug 2017-09-25 19:01:08 UTC
Hello Elizabeth,
Somehow, without updating or changing anything, I haven't encountered the bug in the past several days. If it appears again, I'll post the complete log.
Comment 9 jug 2017-09-27 06:16:14 UTC
Created attachment 134500
i915 debug log

I've found additional messages about the hang in a different file. I kept only the period around the hang, since the rest of it is pretty much the same.
Comment 10 samo 2018-02-07 11:23:26 UTC
I have the same hardware and see the same fault, mainly while playing video, at least two or three times an hour. The screen freezes for 4-10 seconds, and it can fail twice in a row.
I'm running Linux Mint with Cinnamon and Xfce.
I can post logs too if required.
Comment 11 samo 2018-02-07 11:26:52 UTC
Created attachment 137212
/sys/class/drm/card0/error
Comment 12 samo 2018-02-07 11:27:58 UTC
Created attachment 137213
dmesg output at hang
Comment 13 Chris Wilson 2018-02-07 12:31:33 UTC
commit ba74cb10c775c839f6e1d0fabd1e772eabd9c43f
Author: Michel Thierry <michel.thierry@intel.com>
Date:   Mon Nov 20 12:34:58 2017 +0000

    drm/i915/execlists: Delay writing to ELSP until HW has processed the previous write
    
    The hardware needs some time to process the information received in the
    ExecList Submission Port, and expects us to not write anything more until
    it has 'acknowledged' this new submission by sending an IDLE_ACTIVE or
    PREEMPTED CSB event.
    
    If we do not follow this, the driver could write new data into the ELSP
    before HW had finished fetching the previous one, putting us in
    'undefined behaviour' space.
    
    This seems to be the problem causing the spurious PREEMPTED & COMPLETE
    events after a COMPLETE like the one below:
    
    [] vcs0: sw rd pointer = 2, hw wr pointer = 0, current 'head' = 3.
    [] vcs0:  Execlist CSB[0]: 0x00000018 _ 0x00000007
    [] vcs0:  Execlist CSB[1]: 0x00000001 _ 0x00000000
    [] vcs0:  Execlist CSB[2]: 0x00000018 _ 0x00000007  <<< COMPLETE
    [] vcs0:  Execlist CSB[3]: 0x00000012 _ 0x00000007  <<< PREEMPTED & COMPLETE
    [] vcs0:  Execlist CSB[4]: 0x00008002 _ 0x00000006
    [] vcs0:  Execlist CSB[5]: 0x00000014 _ 0x00000006
    
    The ELSP writes that led to this CSB sequence show that the HW hadn't
    started executing the previous execlist (the one with only ctx 0x6) by the
    time the new one was submitted; this is a bit more clear in the data
    shown in the EXECLIST_STATUS register at the time of the ELSP write.
    
    [] vcs0: ELSP[0] = 0x0_0        [execlist1] - status_reg = 0x0_302
    [] vcs0: ELSP[1] = 0x6_fedb2119 [execlist0] - status_reg = 0x0_8302
    
    [] vcs0: ELSP[2] = 0x7_fedaf119 [execlist1] - status_reg = 0x0_8308
    [] vcs0: ELSP[3] = 0x6_fedb2119 [execlist0] - status_reg = 0x7_8308
    
    Note that having to wait for this ack does not disable lite-restores,
    although it may reduce their numbers.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102035
    Signed-off-by: Michel Thierry <michel.thierry@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/<20171118003038.7935-1-michel.thierry@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171120123458.23242-4-chris@chris-wilson.co.uk
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
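The rule introduced by the commit above (no new ELSP write until the HW has acknowledged the previous one with an IDLE_ACTIVE or PREEMPTED CSB event) can be modelled as a tiny state machine. This is a hypothetical toy model, not i915 code: the class and method names are invented, and only the two status-bit values are taken from i915's GEN8_CTX_STATUS_* definitions.

```python
# Status bits taken from i915's GEN8_CTX_STATUS_* definitions.
CTX_STATUS_IDLE_ACTIVE = 1 << 0
CTX_STATUS_PREEMPTED = 1 << 1
# Either event counts as the HW acknowledging the last ELSP write.
ACK_MASK = CTX_STATUS_IDLE_ACTIVE | CTX_STATUS_PREEMPTED


class SubmissionPort:
    """Hypothetical toy model of an ExecList Submission Port that refuses a
    new write until the previous one has been acknowledged."""

    def __init__(self) -> None:
        self.awaiting_ack = False
        self.writes: list[tuple[int, int]] = []  # (element1 ctx, element0 ctx)

    def try_submit(self, element1: int, element0: int) -> bool:
        """Record an ELSP write, or return False if it must be deferred."""
        if self.awaiting_ack:
            # Writing now would put us in 'undefined behaviour' space.
            return False
        self.writes.append((element1, element0))
        self.awaiting_ack = True
        return True

    def on_csb_event(self, status: int) -> None:
        """Consume one CSB status dword; clear the wait on an ack event."""
        if status & ACK_MASK:
            self.awaiting_ack = False


if __name__ == "__main__":
    port = SubmissionPort()
    print(port.try_submit(0x0, 0x6))  # True: first write goes straight out
    print(port.try_submit(0x7, 0x6))  # False: no IDLE_ACTIVE/PREEMPTED seen yet
    port.on_csb_event(0x01)           # IDLE_ACTIVE acknowledges the first write
    print(port.try_submit(0x7, 0x6))  # True
```

A COMPLETE-only event (for example 0x18) does not clear the wait in this model; in a real driver the deferred submission would be retried once the ack arrives, which this sketch does not show.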

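To read the CSB dump quoted in the commit message, each entry can be split into a status dword and a context-ID dword, with the status decoded bit by bit. The bit names and values below follow i915's GEN8_CTX_STATUS_* definitions for Gen8/Gen9 execlists; the helper decode_csb is hypothetical.

```python
# GEN8_CTX_STATUS_* bit values from i915's execlists code (Gen8/Gen9).
CSB_BITS = [
    (1 << 0, "IDLE_ACTIVE"),
    (1 << 1, "PREEMPTED"),
    (1 << 2, "ELEMENT_SWITCH"),
    (1 << 3, "ACTIVE_IDLE"),
    (1 << 4, "COMPLETE"),
    (1 << 15, "LITE_RESTORE"),
]

# The six (status, context ID) entries from the commit's vcs0 CSB dump.
DUMP = [
    (0x00000018, 0x00000007),
    (0x00000001, 0x00000000),
    (0x00000018, 0x00000007),
    (0x00000012, 0x00000007),
    (0x00008002, 0x00000006),
    (0x00000014, 0x00000006),
]


def decode_csb(status: int, ctx_id: int) -> str:
    """Render one CSB entry as 'ctx <id>: <event bits>'."""
    names = [name for bit, name in CSB_BITS if status & bit]
    return f"ctx {ctx_id:#x}: {' | '.join(names) or 'none'}"


if __name__ == "__main__":
    for i, (status, ctx_id) in enumerate(DUMP):
        print(f"CSB[{i}] {decode_csb(status, ctx_id)}")
```

Decoded this way, CSB[3] reads "ctx 0x7: PREEMPTED | COMPLETE", the spurious event that follows the COMPLETE in CSB[2].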
*** This bug has been marked as a duplicate of bug 102035 ***
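The ELSP write log quoted in the commit message can be grouped into submissions in a similar way. This sketch assumes, as the [execlist1]/[execlist0] labels suggest, that each submission writes element 1 first and element 0 last, and that the part of each value before the underscore is the context ID (0 meaning an empty slot). The regex and the helper parse_submissions are hypothetical and only match the log format shown there.

```python
import re

# Matches e.g. "ELSP[1] = 0x6_fedb2119 [execlist0]" from the commit's log.
ELSP_RE = re.compile(
    r"ELSP\[\d+\]\s*=\s*0x([0-9a-f]+)_([0-9a-f]+)\s+\[execlist([01])\]"
)


def parse_submissions(log: str) -> list[dict[str, int]]:
    """Group ELSP writes into submissions, keyed by element.

    Assumes element 1 is written first and the element-0 write completes
    the submission; the upper dword is taken as the context ID.
    """
    submissions: list[dict[str, int]] = []
    pending: dict[str, int] = {}
    for match in ELSP_RE.finditer(log):
        ctx_id = int(match.group(1), 16)
        element = match.group(3)
        pending[f"element{element}"] = ctx_id
        if element == "0":
            submissions.append(pending)
            pending = {}
    return submissions


LOG = """\
vcs0: ELSP[0] = 0x0_0        [execlist1] - status_reg = 0x0_302
vcs0: ELSP[1] = 0x6_fedb2119 [execlist0] - status_reg = 0x0_8302
vcs0: ELSP[2] = 0x7_fedaf119 [execlist1] - status_reg = 0x0_8308
vcs0: ELSP[3] = 0x6_fedb2119 [execlist0] - status_reg = 0x7_8308
"""

if __name__ == "__main__":
    for n, sub in enumerate(parse_submissions(LOG)):
        print(n, sub)
```

Under these assumptions the four writes form two submissions: first ctx 0x6 alone, then ctx 0x7 together with ctx 0x6, which the commit says was written before the HW had started executing the first one.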

