Bug 102433 - GPU hang resulting in Freeze(?) then unclean logout (possibly connected to LibreOffice)
Status: CLOSED DUPLICATE of bug 101780
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2017-08-27 17:33 UTC by wettererscheinung
Modified: 2018-01-04 20:02 UTC

i915 platform: SKL
i915 features: GPU hang


Attachments
Output xrandr --verbose (10.41 KB, text/plain)
2017-08-27 17:33 UTC, wettererscheinung
syslog from 31.08.2017 (8.31 KB, text/plain)
2017-08-31 18:00 UTC, wettererscheinung
Error state of the GPU Hang from 31.08.2017 (24.67 KB, text/plain)
2017-08-31 18:04 UTC, wettererscheinung
sys_class_drm_card0_error_17-11-08.txt (46.56 KB, text/plain)
2017-11-08 11:43 UTC, wettererscheinung
sys_class_drm_card0_error_17-11-11.txt (46.74 KB, text/plain)
2017-11-11 14:45 UTC, wettererscheinung
sys_class_drm_card0_error_17-11-11_B.txt (40.75 KB, text/plain)
2017-11-11 17:02 UTC, wettererscheinung

Description wettererscheinung 2017-08-27 17:33:41 UTC
Created attachment 133815 [details]
Output xrandr --verbose

Dear Developers,

Steps to reproduce:
* Work for about one week without shutting down the notebook or logging out, using at least two different KDE activities at the same time (while three activities are running). Software usually running permanently (among others): Thunderbird, several instances of Okular, LibreOffice, Dolphin, Firefox
* Work in LibreOffice with a long document
* Overnight, close the notebook without shutting down or logging out (sleep mode)

Result:
* Suddenly (not after a fixed amount of time, but always after about one week) there is no reaction to keyboard input of any kind
** Once, after a delay of several seconds, the last words entered were auto-corrected very slowly, but no new keyboard input was possible.
* The mouse pointer moves, but no window or button reacts
* Then X shuts down and logs me out.
* Previously running software is not shut down cleanly.
* I have to log in again and hope that the last (auto)save was not too long ago.

How often does this occur?
* I can't tell for sure, as I do not always run my notebook for more than one week without shutting off or logging out.
* It seems to happen pretty regularly, though I can't name a fixed amount of time after (re)starting the notebook. It's usually about one week.

What have I tried?
* I tried to purge xserver-xorg-video-intel, but it occurred again.

I hope I didn't forget anything. I'll try to get more info, if necessary.

Sincerely Yours
Maria



System Environment
Thinkpad T560

System Architecture (# uname -m)
> x86_64
Kernel Version (# uname -r)
> 4.12.0-1-amd64
Linux Distribution
> Debian - testing (500) unstable (100)
Processor
> 4 x Intel(R) Core(TM) i5-6200U CPU 2.30 GHz
> HD Graphics 520
RAM
> 15.6 GiB
KDE-Plasma 5.8.7
KDE-Frameworks 5.28.0
Qt 5.7.1
XServer 11.0



file:///sys/class/drm/card0/error (after restart, I didn't know this trick then)
> No error state collected



syslog:
Aug 27 15:23:17 COMPUTER kernel: [166611.867112] [drm] GPU HANG: ecode 9:0:0x85dffffb, in Xorg [1042], reason: Hang on rcs, action: reset
Aug 27 15:23:17 COMPUTER kernel: [166611.867113] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Aug 27 15:23:17 COMPUTER kernel: [166611.867114] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Aug 27 15:23:17 COMPUTER kernel: [166611.867114] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Aug 27 15:23:17 COMPUTER kernel: [166611.867115] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Aug 27 15:23:17 COMPUTER kernel: [166611.867115] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Aug 27 15:23:17 COMPUTER kernel: [166611.867152] drm/i915: Resetting chip after gpu hang
Aug 27 15:23:17 COMPUTER kernel: [166611.867466] [drm] RC6 on
Aug 27 15:23:18 COMPUTER kernel: [166612.978002] asynchronous wait on fence i915:kwin_x11[1694]/1:eccb2 timed out
Aug 27 15:23:25 COMPUTER kernel: [166619.826158] drm/i915: Resetting chip after gpu hang
Aug 27 15:23:25 COMPUTER kernel: [166619.826376] [drm] RC6 on
Aug 27 15:23:27 COMPUTER wpa_supplicant[959]: dbus: wpa_dbus_get_object_properties: failed to get object properties: (org.freedesktop.DBus.Error.Failed) failed to parse RSN IE
Aug 27 15:23:27 COMPUTER wpa_supplicant[959]: dbus: Failed to construct signal
Aug 27 15:23:33 COMPUTER kernel: [166627.858190] drm/i915: Resetting chip after gpu hang
Aug 27 15:23:33 COMPUTER kernel: [166627.858485] [drm] RC6 on
Aug 27 15:23:41 COMPUTER kernel: [166635.826182] drm/i915: Resetting chip after gpu hang
Aug 27 15:23:41 COMPUTER kernel: [166635.826388] [drm] RC6 on
Aug 27 15:23:49 COMPUTER kernel: [166643.858173] drm/i915: Resetting chip after gpu hang
Aug 27 15:23:49 COMPUTER kernel: [166643.858656] [drm] RC6 on
Aug 27 15:23:49 COMPUTER org.kde.kuiserver[1585]: kuiserver: Fatal IO error: client killed
Aug 27 15:23:49 COMPUTER org.kde.kaccessibleapp[1585]: kaccessibleapp: Fatal IO error: client killed
Aug 27 15:23:49 COMPUTER org.a11y.atspi.Registry[1772]: XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
Aug 27 15:23:49 COMPUTER org.a11y.atspi.Registry[1772]:       after 95300 requests (95300 known processed) with 0 events remaining.
Aug 27 15:23:49 COMPUTER sddm[819]: kwalletd5: Checking for pam module
Aug 27 15:23:49 COMPUTER sddm[819]: kwalletd5: Got pam-login param
Aug 27 15:23:49 COMPUTER sddm[819]: kwalletd5: Waiting for hash on 15-
Aug 27 15:23:49 COMPUTER sddm[819]: kwalletd5: waitingForEnvironment on: 18
Aug 27 15:23:49 COMPUTER sddm[819]: kwalletd5: client connected
Aug 27 15:23:49 COMPUTER sddm[819]: kwalletd5: client disconnected
Aug 27 15:23:49 COMPUTER org.kde.KScreen[1585]: The X11 connection broke (error 1). Did the X11 server die?
Aug 27 15:23:49 COMPUTER org.kde.kglobalaccel[1585]: The X11 connection broke (error 1). Did the X11 server die?
Aug 27 15:23:51 COMPUTER sddm-helper[1538]: [PAM] Closing session
Aug 27 15:23:51 COMPUTER sddm-helper[1538]: [PAM] Ended.
Aug 27 15:23:51 COMPUTER sddm[819]: Auth: sddm-helper exited successfully
Aug 27 15:23:51 COMPUTER sddm[819]: Socket server stopping...
Aug 27 15:23:51 COMPUTER sddm[819]: Socket server stopped.
Aug 27 15:23:51 COMPUTER sddm[819]: Display server stopping...
Aug 27 15:23:51 COMPUTER NetworkManager[670]: <info>  [1503840231.3350] device (wlan0): state change: activated -> deactivating (reason 'connection-removed', internal state 'managed')
Aug 27 15:23:51 COMPUTER NetworkManager[670]: <info>  [1503840231.3352] manager: NetworkManager state is now DISCONNECTING
Aug 27 15:23:51 COMPUTER systemd[1]: Stopping User Manager for UID 1000...
Aug 27 15:23:51 COMPUTER systemd[1539]: Stopped target Default.
Aug 27 15:23:51 COMPUTER systemd[1539]: Stopped target Basic System.
Aug 27 15:23:51 COMPUTER systemd[1539]: Stopped target Paths.
Aug 27 15:23:51 COMPUTER systemd[1539]: Stopped target Sockets.
Aug 27 15:23:51 COMPUTER systemd[1539]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Aug 27 15:23:51 COMPUTER systemd[1539]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Aug 27 15:23:51 COMPUTER systemd[1539]: Closed GnuPG network certificate management daemon.
Aug 27 15:23:51 COMPUTER systemd[1539]: Closed GnuPG cryptographic agent (access for web browsers).
Aug 27 15:23:51 COMPUTER systemd[1539]: Closed GnuPG cryptographic agent and passphrase cache.
Aug 27 15:23:51 COMPUTER systemd[1539]: Reached target Shutdown.
Aug 27 15:23:51 COMPUTER systemd[1539]: Starting Exit the Session...
Aug 27 15:23:51 COMPUTER systemd[1539]: Stopped target Timers.
Aug 27 15:23:51 COMPUTER sddm[819]: Display server stopped.
Aug 27 15:23:51 COMPUTER sddm[819]: Running display stop script  "/usr/share/sddm/scripts/Xstop"
Aug 27 15:23:51 COMPUTER systemd[1539]: Received SIGRTMIN+24 from PID 21522 (kill).
Aug 27 15:23:51 COMPUTER sddm[819]: Removing display ":0" ...
Aug 27 15:23:51 COMPUTER sddm[819]: Adding new display on vt 7 ...
Aug 27 15:23:51 COMPUTER sddm[819]: Display server starting...
Aug 27 15:23:51 COMPUTER sddm[819]: Running: /usr/bin/X -nolisten tcp -auth /var/run/sddm/{6f4ee929-bef0-4caa-96f4-820b5828525a} -background none -noreset -displayfd 19 vt7
Aug 27 15:23:51 COMPUTER systemd[1]: Stopped User Manager for UID 1000.
Aug 27 15:23:51 COMPUTER dbus[625]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
Aug 27 15:23:51 COMPUTER NetworkManager[670]: <info>  [1503840231.3547] device (wlan0): state change: deactivating -> disconnected (reason 'connection-removed', internal state 'managed')
Aug 27 15:23:51 COMPUTER avahi-daemon[4769]: Withdrawing address record for 2003:cf:3ee:9300:2a14:5de6:f87b:ef27 on wlan0.
Aug 27 15:23:51 COMPUTER avahi-daemon[4769]: Leaving mDNS multicast group on interface wlan0.IPv6 with address 2003:cf:3ee:9300:2a14:5de6:f87b:ef27.
Aug 27 15:23:51 COMPUTER avahi-daemon[4769]: Joining mDNS multicast group on interface wlan0.IPv6 with address fe80::42e7:c091:fe31:12dc.
Aug 27 15:23:51 COMPUTER systemd[1]: Starting Network Manager Script Dispatcher Service...
Aug 27 15:23:51 COMPUTER avahi-daemon[4769]: Registering new address record for fe80::42e7:c091:fe31:12dc on wlan0.*.
Aug 27 15:23:51 COMPUTER systemd[1]: Removed slice User Slice of USER.
Aug 27 15:23:51 COMPUTER avahi-daemon[4769]: Withdrawing address record for fe80::42e7:c091:fe31:12dc on wlan0.
Aug 27 15:23:51 COMPUTER avahi-daemon[4769]: Leaving mDNS multicast group on interface wlan0.IPv6 with address fe80::42e7:c091:fe31:12dc.
Aug 27 15:23:51 COMPUTER avahi-daemon[4769]: Interface wlan0.IPv6 no longer relevant for mDNS.
Aug 27 15:23:51 COMPUTER dbus[625]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Aug 27 15:23:51 COMPUTER nm-dispatcher: req:1 'connectivity-change': new request (2 scripts)
Aug 27 15:23:51 COMPUTER nm-dispatcher: req:1 'connectivity-change': start running ordered scripts...
Aug 27 15:23:51 COMPUTER systemd[1]: Started Network Manager Script Dispatcher Service.
Aug 27 15:23:51 COMPUTER NetworkManager[670]: <info>  [1503840231.3714] dhcp4 (wlan0): canceled DHCP transaction, DHCP client pid 11574
Aug 27 15:23:51 COMPUTER NetworkManager[670]: <info>  [1503840231.3715] dhcp4 (wlan0): state changed bound -> done
Aug 27 15:23:51 COMPUTER NetworkManager[670]: <info>  [1503840231.3717] dhcp6 (wlan0): canceled DHCP transaction
Aug 27 15:23:51 COMPUTER avahi-daemon[4769]: Withdrawing address record for 192.168.2.22 on wlan0.
Aug 27 15:23:51 COMPUTER avahi-daemon[4769]: Leaving mDNS multicast group on interface wlan0.IPv4 with address 192.168.2.22.
Aug 27 15:23:51 COMPUTER avahi-daemon[4769]: Interface wlan0.IPv4 no longer relevant for mDNS.
Aug 27 15:23:51 COMPUTER kernel: [166645.822946] wlan0: deauthenticating from bc:05:43:10:67:75 by local choice (Reason: 3=DEAUTH_LEAVING)
Aug 27 15:23:51 COMPUTER wpa_supplicant[959]: wlan0: CTRL-EVENT-DISCONNECTED bssid=bc:05:43:10:67:75 reason=3 locally_generated=1
Aug 27 15:23:51 COMPUTER NetworkManager[670]: <info>  [1503840231.3898] device (wlan0): set-hw-addr: set MAC address to 62:AE:DB:CF:F3:C3 (scanning)
Aug 27 15:23:51 COMPUTER NetworkManager[670]: <info>  [1503840231.3948] manager: NetworkManager state is now DISCONNECTED
Aug 27 15:23:51 COMPUTER kernel: [166645.844030] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
Aug 27 15:23:51 COMPUTER NetworkManager[670]: <warn>  [1503840231.3962] sup-iface[0x55c057eee9f0,wlan0]: connection disconnected (reason -3)
Aug 27 15:23:51 COMPUTER nm-dispatcher: req:2 'down' [wlan0]: new request (2 scripts)
Aug 27 15:23:51 COMPUTER NetworkManager[670]: <info>  [1503840231.3963] device (wlan0): supplicant interface state: completed -> disconnected
Aug 27 15:23:51 COMPUTER nm-dispatcher: req:2 'down' [wlan0]: start running ordered scripts...
Aug 27 15:23:51 COMPUTER sddm[819]: Running display setup script  "/usr/share/sddm/scripts/Xsetup"
Aug 27 15:23:51 COMPUTER sddm[819]: Display server started.
Aug 27 15:23:51 COMPUTER sddm[819]: Socket server starting...
Aug 27 15:23:51 COMPUTER sddm[819]: Socket server started.
Aug 27 15:23:51 COMPUTER sddm[819]: Greeter starting...
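
For triage, the key line in the syslog above is the "GPU HANG" message. The following is a minimal sketch of how its fields could be extracted; the regular expression and the helper name parse_gpu_hang are assumptions derived only from the single line quoted in this report, and other kernel versions may format the message differently.

```python
import re

# Pattern inferred from the one "GPU HANG" line quoted in this report
# (an assumption, not a documented kernel log format).
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a GPU HANG syslog line as a dict, or None."""
    match = HANG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    return info


line = (
    "Aug 27 15:23:17 COMPUTER kernel: [166611.867112] [drm] GPU HANG: "
    "ecode 9:0:0x85dffffb, in Xorg [1042], reason: Hang on rcs, action: reset"
)
hang = parse_gpu_hang(line)
print(hang)
```

Here the hang is attributed to the Xorg process (PID 1042) on the render command streamer (rcs).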
Comment 1 Chris Wilson 2017-08-27 18:16:06 UTC
(In reply to wettererscheinung from comment #0)
> What have I tried?
> * I tried to purge xserver-xorg-video-intel, but it occurred again.

It's a gpu hang from using -modesetting.

Please do attach the error state so that we can triage it.
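
As the description shows, /sys/class/drm/card0/error reads "No error state collected" after a restart, so the dump has to be copied before rebooting. A minimal sketch of doing that follows; the helper name save_error_state and the destination file name are hypothetical, and reading the sysfs file is assumed to require root privileges.

```python
from pathlib import Path

# Placeholder text the file contains when no hang has been recorded
# (taken from the description in this report).
NO_STATE = "No error state collected"


def save_error_state(src, dst):
    """Copy the GPU error state from src to dst.

    Returns True if a real dump was saved, False if src was empty or only
    held the "No error state collected" placeholder.
    """
    text = Path(src).read_text(errors="replace")
    if not text.strip() or text.startswith(NO_STATE):
        return False
    Path(dst).write_text(text)
    return True


# Typical call right after a hang, before rebooting (run as root):
# save_error_state("/sys/class/drm/card0/error", "card0_error.txt")
```

The saved file can then be attached to the bug report.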
Comment 2 wettererscheinung 2017-08-27 19:51:48 UTC
Dear Chris,

thanks for your answer, is it possible to reconstruct/retrieve the error
state, after I restarted? Because now it shows "No error state collected".

Otherwise it will take one or two weeks until it occurs again.

Sincerely yours
Maria
Comment 3 wettererscheinung 2017-08-31 18:00:35 UTC
Created attachment 133911 [details]
syslog from 31.08.2017

This is the syslog for today's GPU hang.
Comment 4 wettererscheinung 2017-08-31 18:04:12 UTC
Created attachment 133912 [details]
Error state of the GPU Hang from 31.08.2017

This time the GPU hang occurred earlier than the last times. My notebook had been suspended for only one night since the last reboot.
Comment 5 wettererscheinung 2017-08-31 18:06:42 UTC
Added the requested info.
Thanks a lot for your time and help!
Maria
Comment 6 Elizabeth 2017-10-26 21:05:36 UTC
Hello Maria, could you try to reproduce with intel_iommu=igfx_off set on the kernel command line in GRUB? If the hang no longer occurs, this may be a duplicate of bug 89360 or bug 103076.
Comment 7 wettererscheinung 2017-11-08 11:43:56 UTC
Created attachment 135297 [details]
sys_class_drm_card0_error_17-11-08.txt

Dear Elizabeth,

sorry for not answering for so long, and thanks for the hint. Since I hadn't
experienced the bug for some time, I thought/hoped it had vanished.
Unfortunately it happened again today (I attached the GPU hang error output to
this mail).

* Nonetheless, how do I apply this on grub?

* I read that virtualization won't work anymore - is that true?
(This would be a problem as I do use virtualbox regularly)

* If I understood the info about this feature correctly, you mean that
the GPU possibly doesn't work correctly with DMA remapping?
Is that a hardware/warranty issue?


By the way, I experience two kinds of occurrences: 1 - freeze, then logout;
2 - total freeze, no change, only powering off helps, therefore no error
report possible

Yours
Maria



Comment 8 Elizabeth 2017-11-08 16:23:50 UTC
(In reply to wettererscheinung from comment #7)
>...
> * Nonetheless, how do I apply this on grub?
Hello Maria, to apply this, execute:
$ sudo nano /etc/default/grub
  Add intel_iommu=igfx_off inside the quotes of the GRUB_CMDLINE_LINUX_DEFAULT line, e.g.:
  GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=igfx_off"
Save and close. Then apply:
$ sudo update-grub
And then reboot.
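
After rebooting, it can be worth confirming that the parameter actually reached the kernel, since a hang that reproduces without it proves nothing about the IOMMU. A minimal sketch, assuming the running kernel's command line is read from /proc/cmdline; the helper name kernel_param is hypothetical.

```python
def kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line.

    Returns the text after "name=", an empty string for a bare flag such
    as "quiet", or None if the parameter is absent.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else ""
    return None


# Example command line (made up for illustration). On the real system, use
# open("/proc/cmdline").read() instead.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet intel_iommu=igfx_off"
print(kernel_param(example, "intel_iommu"))
```

If the call returns None on the real command line, the GRUB change was not applied (for example because update-grub was not run).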

> * I read that virtualization won't work anymore - is that true?
> (This would be a problem as I do use virtualbox regularly)
You can find more information online: https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit#Virtualization . Virtualization should keep working.

> * If I understood the info about this feature correctly, you mean that
> the GPU possibly doesn't work correctly with DMA remapping?
> Is that a hardware/warranty issue?
You could run memtest86 to be sure your memory is working correctly:
On Debian, do 'apt install memtest86'. You should then see it in the GRUB menu as a boot target that you can choose.
There is no log. If memtest reports an error, you have to replace your memory.
If it was a DMAR error, that should be followed up in bug 89360.

> By the way, I experience two kinds of occurrences: 1 - freeze, then logout;
> 2 - total freeze, no change, only powering off helps, therefore no error
> report possible
Those could be different issues, though you would need to identify a pattern to determine whether they should be tracked separately.

> Yours
> Maria

From error state:

ERROR: 0x00000000
FAULT_TLB_DATA: 0x0000001b 0xaacb0b2b
    Address 0x0000baacb0b2b000 GGTT
DONE_REG: 0x07ffffff
render command stream:
  START: 0x00011000
  HEAD:  0xf9001d80 [0x00001d28]
    head = 0x00001d80, wraps = 1992
  TAIL:  0x00001da8 [0x00001d80, 0x00001da8]
  CTL:   0x00003001
    len=16384, enabled
  MODE:  0x00000000
  HWS:   0xfffe8000
  ACTHD: 0x00000000 f9001d80
    at ring: 0x00000000
  IPEIR: 0x00000000
  IPEHR: 0x7a000004
  INSTDONE: 0xffdfffff
    busy: CS
  SC_INSTDONE: 0xfffffbff
  SAMPLER_INSTDONE[0][0]: 0xffffffff
  SAMPLER_INSTDONE[0][1]: 0xffffffff
  SAMPLER_INSTDONE[0][2]: 0xffffffff
  ROW_INSTDONE[0][0]: 0xfffffffd
  ROW_INSTDONE[0][1]: 0xfffffffd
  ROW_INSTDONE[0][2]: 0xfffffffd
  batch: [0x00000000_044a6000, 0x00000000_044ae000]
  BBADDR: 0x00000000_044a631c
  BB_STATE: 0x00000020
  INSTPS: 0x00008980
  INSTPM: 0x00000000
  FADDR: 0x00000000 00012da8
  RC PSMI: 0x00000010
  FAULT_REG: 0x00000000
  SYNC_0: 0x00000000
  SYNC_1: 0x00000000
  SYNC_2: 0x00000000
  GFX_MODE: 0x00008000
  PDP0: 0x000000041915e000
  PDP1: 0x0000000000000000
  PDP2: 0x0000000000000000
  PDP3: 0x0000000000000000
  seqno: 0x002e45d4
  last_seqno: 0x002e45d6
  waiting: yes
  ring->head: 0x00001d00
  ring->tail: 0x00001da8
  hangcheck stall: yes
  hangcheck action: dead
  hangcheck action timestamp: 4331761496, 122744 ms ago
  ELSP[0]:  pid 1042, ban score 0, seqno        2:002e45d5, emitted 123896ms ago, head 00001d28, tail 00001da8
  ELSP[1]:  pid 1904, ban score 0, seqno        a:002e45d6, emitted 123896ms ago, head 00001c10, tail 00001c88
  Active context: Xorg[1042] user_handle 1 hw_id 2, ban score 0 guilty 0 active 0
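
The decoded values in the dump above can be cross-checked by hand. The sketch below splits the raw HEAD register into the ring offset and wrap count and counts the outstanding requests. The mask and shift are assumptions chosen because they reproduce the "head = 0x00001d80, wraps = 1992" line printed in the dump, not values taken from hardware documentation. It also assumes that seqno is the last completed request and last_seqno the last submitted one.

```python
# Assumed bit layout of the ring HEAD register (see the note above).
HEAD_OFFSET_MASK = 0x001FFFFC
HEAD_WRAP_SHIFT = 21


def decode_ring_head(head_reg):
    """Split a raw HEAD register value into (ring offset, wrap count)."""
    offset = head_reg & HEAD_OFFSET_MASK
    wraps = head_reg >> HEAD_WRAP_SHIFT
    return offset, wraps


# Raw values copied from the error state above.
offset, wraps = decode_ring_head(0xF9001D80)
pending = 0x002E45D6 - 0x002E45D4  # last_seqno - seqno

print(hex(offset), wraps, pending)
```

Under these assumptions two requests were still outstanding when the hang was detected, which matches the two ELSP entries (seqno 002e45d5 from pid 1042 and 002e45d6 from pid 1904).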
Comment 9 wettererscheinung 2017-11-11 14:45:21 UTC
Created attachment 135397 [details]
sys_class_drm_card0_error_17-11-11.txt

Dear Elizabeth,

just now it happened again (a freeze for about 10 seconds, then a sudden
logout), although "intel_iommu=igfx_off" was active. It seems to happen
only when I close LibreOffice in the evening, leave my computer running
and logged in overnight, and then work with LibreOffice for some time
the next day.

I attached the new report to this mail.

If you need any other logs or reports please tell me.

Thanks for your help!
Maria
Comment 10 wettererscheinung 2017-11-11 17:02:54 UTC
Created attachment 135399 [details]
sys_class_drm_card0_error_17-11-11_B.txt

I have to correct myself: now it froze and kicked me out shortly after a
fresh reboot. This never happened before.

It makes working pretty hard *sigh* I am typing the same text the third
time ...

Yours Maria



Comment 11 Elizabeth 2017-11-13 18:20:03 UTC
(In reply to wettererscheinung from comment #10)
> 
Hello Maria, you can remove the iommu parameter by following the same procedure; clearly it isn't related. I'm marking this bug as a duplicate of bug 101780, which is the same issue but was reported earlier. Please keep track of the issue in that bug.

*** This bug has been marked as a duplicate of bug 101780 ***

