Created attachment 141772 [details]
GPU crash dump
X is unusable on this machine. Most of the time the machine freezes after the 2nd start of X (have to power cycle).
no Xorg.conf, but intel-specific settings in xorg.conf.d:
Option "AccelMethod" "sna"
Option "ExaNoComposite" "false"
Option "CacheLines" "1024"
Option "XvMC" "true"
Option "PreferredMode" "1280x1024"
output of lspci:
00:02.0 VGA compatible controller: Intel Corporation Atom/Celeron/Pentium Processor x5-E8000/J3xxx/N3xxx Integrated Graphics Controller (rev 21)
00:02.0 0300: 8086:22b1 (rev 21) (numeric)
It walked off the end of the batch:
bcs0 command stream:
HEAD: 0x00000170 [0x00000150]
TAIL: 0x000001a0 [0x00000188, 0x000001a8]
ACTHD: 0x00000001 00000048
batch: [0x00000000_00942000, 0x00000000_00946000]
The batch is just a single blit with a valid MI_BBE; suggesting a TLB error or some other incoherency.
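The "walked off the end of the batch" reading can be checked against the numbers quoted above. This is only a rough sketch: it assumes the two words printed after ACTHD are the upper and lower 32 bits of one address, and that the batch range is half-open. The helper names are made up for illustration.

```python
# Values copied from the bcs0 excerpt above.
ACTHD_UPPER = 0x00000001
ACTHD_LOWER = 0x00000048
BATCH_START = 0x00000000_00942000
BATCH_END = 0x00000000_00946000


def combine(upper: int, lower: int) -> int:
    """Join two 32-bit halves into one 64-bit address (assumed layout)."""
    return (upper << 32) | lower


def inside(addr: int, start: int, end: int) -> bool:
    """True when addr lies in the half-open range [start, end)."""
    return start <= addr < end


acthd = combine(ACTHD_UPPER, ACTHD_LOWER)
print(f"ACTHD  = {acthd:#018x}")
print(f"batch  = [{BATCH_START:#018x}, {BATCH_END:#018x})")
print(f"inside = {inside(acthd, BATCH_START, BATCH_END)}")
```

Under those assumptions ACTHD lies well outside the batch buffer, which matches the observation above.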
Just checking my various installations:
The same hardware used to work with xorg-server-1.16.4.-r5, xf86-video-intel-2.99.917_p20160203 and kernel 4.3.3-gentoo (all gentoo portage version numbering). I was just upgrading my standard installation to the latest versions when I ran into this issue.
Compiled xf86-video-intel from git master; now getting GPU HANG: ecode 8:0:0x00072727
Created attachment 141773 [details]
GPU crash dump with latest master git sources
It's not likely a userspace issue, so preferably check with drm-tip [https://github.com/freedesktop/drm-tip]
(In reply to Chris Wilson from comment #6)
> It's not likely a userspace issue, so preferably check with drm-tip
Ok, I'll give it a try next week; will have to check how to accomplish that within gentoo...
> Ok, I'll give it a try next week, will have to check how to accomplish that
> within gentoo...
Thanks for giving it a try. If issue persists, attach dmesg from boot with kernel parameters drm.debug=0x1e log_buf_len=4M.
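For reference, drm.debug is a bitmask of debug message categories. The sketch below decodes 0x1e using the low category bits as I recall them from the kernel's DRM print header; treat the table as an assumption and check it against the kernel you actually boot, since newer kernels define additional bits.

```python
# Low drm.debug category bits (recalled from the kernel's drm_print.h;
# verify against the running kernel, later versions add more).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask: int) -> list:
    """Return the names of the categories enabled in mask."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


print(decode_drm_debug(0x1E))
```

With this table, 0x1e enables DRIVER, KMS, PRIME and ATOMIC messages but not the CORE ones.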
(In reply to Lakshmi from comment #8)
> > Ok, I'll give it a try next week, will have to check how to accomplish that
> > within gentoo...
> Thanks for giving it a try. If issue persists, attach dmesg from boot with
> kernel parameters drm.debug=0x1e log_buf_len=4M.
Now GPU HANG: ecode 8:0:0x85dffffb. Will reboot with suggested kernel opts and upload dmesg output.
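To compare the hang signatures across the different kernels, the ecode strings can be split into their three fields. This is a small sketch: the field names gen, engine and code are my own labels, based on my reading of how i915 formats this message, and are not confirmed anywhere in this report.

```python
import re

# Matches e.g. "GPU HANG: ecode 8:0:0x00072727".
ECODE_RE = re.compile(r"GPU HANG: ecode (\d+):(-?\d+):0x([0-9a-fA-F]+)")


def parse_ecode(line: str):
    """Return (gen, engine, code) from a GPU HANG line, or None if absent.

    The meaning of the three fields is an assumption, see the note above.
    """
    match = ECODE_RE.search(line)
    if match is None:
        return None
    gen, engine, code = match.groups()
    return int(gen), int(engine), int(code, 16)


for line in (
    "GPU HANG: ecode 8:0:0x00072727",
    "GPU HANG: ecode 8:0:0x85dffffb",
):
    print(parse_ecode(line))
```

The first two fields are identical for both hangs reported here; only the last value differs.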
Created attachment 141820 [details]
GPU crash dump with kernel 4.19.0-rc5
Created attachment 141821 [details]
dmesg output with extended drm debugging
xdm is not autostarted; it was started manually at ~44 secs uptime.
Btw. I tried my old kernel 4.3.3-gentoo together with all the recent userspace apps (xorg-server, xf86-video-intel ... without even touching these). So basically I just switched back to the old kernel: no GPU hang occurs, X responds within normal time (whereas it is quite slow with recent kernel versions) and no artefacts are displayed on screen (which sometimes happened together with the GPU hangs).
Tried kernel 4.14.65 (the last one officially marked stable on gentoo): it crashes the machine at the first start of X. No entries in the log. Complete hangup, have to reset the machine :-(
Kernel 4.9.122-gentoo (the next older stable kernel in gentoo below 4.14.65) works. So some change after this kernel must break DRI on this type of intel graphics controller. I think I'll stick to this kernel; unfortunately no spectre/meltdown protection yet, but better than X completely broken ...
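The kernel results reported so far narrow down where the regression was introduced. A rough sketch of that reasoning, assuming the breakage is monotonic (every kernel after the first bad one is also bad) and reading the rc kernel as 4.19.0-rc5:

```python
import re

# Results reported in this thread (True = X works, False = hang or freeze).
REPORTED = {
    "4.3.3-gentoo": True,
    "4.9.122-gentoo": True,
    "4.14.65": False,
    "4.19.0-rc5": False,
}


def version_key(release: str):
    """Sort key built from the leading major.minor.patch numbers."""
    major, minor, patch = re.match(r"(\d+)\.(\d+)\.(\d+)", release).groups()
    return int(major), int(minor), int(patch)


def regression_window(results: dict):
    """Return (last known good, first known bad) in version order."""
    last_good = None
    for release in sorted(results, key=version_key):
        if results[release]:
            last_good = release
        else:
            return last_good, release
    return last_good, None


print(regression_window(REPORTED))
```

That leaves the range between 4.9.122 and 4.14.65 as the place to look, for example with a kernel bisect.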
Gerhard, can you attach the output of cat /sys/class/drm/card$N/error from the latest kernel after a GPU hang? This will help in investigating the issue.
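If it is not clear which card number to use, the error file of every card can be collected in one go. A minimal sketch, assuming it is run as root right after the hang and before rebooting; the function name and the output file names are made up for illustration.

```python
from pathlib import Path


def collect_error_states(drm_root: str = "/sys/class/drm") -> dict:
    """Map card name (e.g. "card0") to the contents of its error file."""
    states = {}
    for error_file in sorted(Path(drm_root).glob("card[0-9]*/error")):
        try:
            states[error_file.parent.name] = error_file.read_text(errors="replace")
        except OSError:
            # Unreadable (e.g. not running as root): skip this card.
            continue
    return states


if __name__ == "__main__":
    for card, text in collect_error_states().items():
        out = Path(f"{card}-error.txt")  # hypothetical output name
        out.write_text(text)
        print(f"saved {len(text)} bytes to {out}")
```

The resulting text files can then be attached to this bug.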
That last error is most bizarre. It is complaining it hit an absent PTE for the logical context image.
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7f308e713fae..2df5b8a1c988 100644
@@ -111,10 +111,7 @@ i915_get_ggtt_vma_pages(struct i915_vma *vma);
static void gen6_ggtt_invalidate(struct drm_i915_private *dev_priv)
- * Note that as an uncached mmio write, this will flush the
- * WCB of the writes into the GGTT before it triggers the invalidate.
(In reply to Lakshmi from comment #15)
> Gerhard, can attach cat /sys/class/drm/card$N/error from latest kernel after
> gpu hang? This will help in investigating the issue.
Sorry, this is still on my list. Was busy with other tasks recently, but if the machine hangs completely then there is no chance :-( ... sometimes only X crashes, sometimes the complete machine is frozen.
Gerhard, could you try to reproduce the error using drm-tip (https://cgit.freedesktop.org/drm-tip) and kernel parameters drm.debug=0x1e log_buf_len=4M, and if the problem persists attach the full dmesg from boot. Thanks!
(In reply to Francesco Balestrieri from comment #19)
> Gerard, could you try to reproduce the error using drm-tip
> (https://cgit.freedesktop.org/drm-tip) and kernel parameters drm.debug=0x1e
> log_buf_len=4M, and if the problem persists attach the full dmesg from boot.
Is that repo identical to https://github.com/freedesktop/drm-tip? I'm sitting behind a firewall without git/ssh pass-through.
I'm not completely sure, but judging from the commit logs they are the same at the moment at least, so it should be fine to try with https://github.com/freedesktop/drm-tip
Apparently you should also be able to clone from https://anongit.freedesktop.org/git/drm-tip.git but I haven't tried it myself.
Reporter, have you tried to verify with drm-tip?
(In reply to Lakshmi from comment #23)
> Reporter, have you tried to verify with drmtip?
Sorry, not yet, I was quite busy the last few weeks ....
Gerhard, sorry for the bother but did you have a chance to try?
(In reply to Francesco Balestrieri from comment #25)
> Gerhard, sorry for the bother but did you have a chance to try?
Oops, sorry, not yet.
Gerhard, any updates?
Sorry, still no updates. Meanwhile I put the machine into factory, so I'll have to clone a new one for testing.
Gerhard, any update with the latest drm-tip? If this issue has not been seen lately, I can close this bug.
If you are not able to reproduce the issue, I can close this bug so that you can reopen it when the issue appears again. What do you think?