Bug 101980 - [g33 v4.9] gpu hung
Summary: [g33 v4.9] gpu hung
Status: CLOSED NOTOURBUG
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2017-07-31 11:39 UTC by miguelramos
Modified: 2018-03-02 15:18 UTC

i915 platform: G33
i915 features: GPU hang


Attachments
Crash dump error file (700.63 KB, text/plain)
2017-07-31 11:39 UTC, miguelramos
Crash dump error file (19.31 KB, text/plain)
2017-09-13 10:57 UTC, miguelramos

Description miguelramos 2017-07-31 11:39:01 UTC
Created attachment 133145
Crash dump error file

I get the following log after my desktop freezes, the screen goes black, and then everything comes back working again:


[drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
jul 31 13:06:31 xxxx kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
jul 31 13:06:31 xxxx kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
jul 31 13:06:31 xxxx kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
jul 31 13:06:31 xxxx kernel: drm/i915: Resetting chip after gpu hang

uname -a gives:

 Linux xxxx 4.9.39-gentoo #2 SMP PREEMPT Wed Jul 26 22:58:26 CEST 2017 x86_64 Intel(R) Core(TM)2 Quad CPU Q8300 @ 2.50GHz GenuineIntel GNU/Linux
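
Since the kernel message asks for the crash dump to be attached, it can help to copy /sys/class/drm/card0/error to a regular file soon after the hang, because sysfs contents do not survive a reboot. A minimal sketch in C, assuming the path from the log above; `copy_file` and the destination file name are hypothetical and not part of any i915 tooling:

```c
#include <stdio.h>

/* Hypothetical helper: copy src to dst in chunks until EOF.
 * Returns the number of bytes copied, or -1 on error. */
long copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (!in)
        return -1;

    FILE *out = fopen(dst, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }

    char buf[4096];
    long total = 0;
    size_t n;

    /* sysfs files may not report a useful size, so read until fread returns 0 */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            total = -1;
            break;
        }
        total += (long)n;
    }
    if (ferror(in))
        total = -1;

    fclose(in);
    if (fclose(out) != 0)
        total = -1;
    return total;
}
```

A call such as `copy_file("/sys/class/drm/card0/error", "i915-error.txt")` would usually need root privileges to read the sysfs file.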
Comment 1 Chris Wilson 2017-07-31 12:34:12 UTC
Hmm, not much going on. The first thing that stood out as needing to be checked is
    
     00000000_1e7ff000  8388608 36 00 [ 12c57 00 00 00 00 ] 12c57 Y dirty render uncached

that is, whether the alignment is correct for the unfenced Y buffer.

u32 i915_gem_fence_alignment(struct drm_i915_private *i915, u32 size,
                             unsigned int tiling, unsigned int stride)
{
        GEM_BUG_ON(!size);

        /*
         * Minimum alignment is 4k (GTT page size), but might be greater
         * if a fence register is needed for the object.
         */
        if (tiling == I915_TILING_NONE)
                return I915_GTT_MIN_ALIGNMENT;

        if (INTEL_GEN(i915) >= 4)
                return I965_FENCE_PAGE;

        /*
         * Previous chips need to be aligned to the size of the smallest
         * fence register that can contain the object.
         */
        return i915_gem_fence_size(i915, size, tiling, stride);
}

so that's clearly wrong. 

Can you please try a later kernel, say v4.12?
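
The alignment check described above can be worked through numerically. This is only a sketch under stated assumptions: the dump line is read as GTT offset 0x1e7ff000 and size 8388608 bytes, G33 is taken to be a pre-gen4 (gen3) part, and `i915_gem_fence_size` (not quoted here) is assumed to return the smallest power of two at or above the object size, starting from 1 MiB on gen3. The helper names `pow2_fence_size` and `is_aligned` are hypothetical and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed pre-gen4 rule: the fence region is the smallest power of two
 * that is >= the object size, starting from a platform minimum. */
uint32_t pow2_fence_size(uint32_t size, uint32_t min_size)
{
    uint32_t fence = min_size;

    /* Guard against an endless loop and against shifting past 32 bits. */
    assert(min_size != 0 && size != 0 && size <= 0x80000000u);
    while (fence < size)
        fence <<= 1;
    return fence;
}

/* Nonzero when offset is a multiple of align (align must be a power of two). */
int is_aligned(uint32_t offset, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset & (align - 1)) == 0;
}
```

With these assumptions, `pow2_fence_size(8388608, 1u << 20)` is 8388608 (8 MiB is already a power of two), and `0x1e7ff000 & (8388608 - 1)` is 0x7ff000, so the buffer is 4 KiB aligned but not aligned to an 8 MiB fence region.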
Comment 2 miguelramos 2017-07-31 12:47:18 UTC
OK. I will try kernel v4.12.4 as soon as possible.
Comment 3 Elizabeth 2017-08-31 21:26:35 UTC
(In reply to miguelramos from comment #2)
> OK. I will try kernel v4.12.4 as soon as possible.
Hello Miguel, any update on testing with a newer kernel? Thank you.
Comment 4 miguelramos 2017-08-31 22:40:47 UTC
I was on vacation with no access to the machine for testing. I expect to report back in a few days.
Comment 5 miguelramos 2017-09-07 09:56:31 UTC
After a few days of testing with the Gentoo Linux kernel sources 4.12.10, the problem seems to have disappeared.

uname -a gives:

Linux xxx 4.12.10-gentoo #1 SMP PREEMPT Mon Sep 4 10:00:43 CEST 2017 x86_64 Intel(R) Core(TM)2 Quad CPU Q8300 @ 2.50GHz GenuineIntel GNU/Linux
Comment 6 miguelramos 2017-09-13 10:57:13 UTC
Created attachment 134197
Crash dump error file
Comment 7 miguelramos 2017-09-13 11:04:34 UTC
The problems are back, as can be read in the attachment, which contains /sys/class/drm/card0/error.

As before, my desktop freezes, the screen turns black, and then everything comes back working again.


uname -a gives:

Linux xxx 4.12.11-gentoo #2 SMP PREEMPT Tue Sep 12 09:22:55 CEST 2017 x86_64 Intel(R) Core(TM)2 Quad CPU Q8300 @ 2.50GHz GenuineIntel GNU/Linux

dmesg logged:


[10384.533276] [drm] GPU HANG: ecode 3:0:0x028df8c7, in gnome-shell [20821], reason: Hang on rcs, action: reset
[10384.533278] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[10384.533279] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[10384.533280] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[10384.533280] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[10384.533281] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[10384.549124] drm/i915: Resetting chip after gpu hang
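
When collecting several of these reports, the fields of the "GPU HANG" line can be extracted programmatically. The following is a minimal sketch in C; `struct gpu_hang` and `parse_gpu_hang` are hypothetical names, and reading the three ecode fields as hardware generation, engine id and error code is an assumption (the leading 3 is at least consistent with a gen3 G33).

```c
#include <stdio.h>
#include <string.h>

struct gpu_hang {
    int gen;            /* assumed: hardware generation */
    int engine;         /* assumed: engine id */
    unsigned int ecode; /* error code as printed after "0x" */
    char comm[64];      /* process name (must not contain spaces) */
    int pid;            /* process id in square brackets */
};

/* Returns 0 on success, -1 if the line does not match the expected shape. */
int parse_gpu_hang(const char *line, struct gpu_hang *out)
{
    /* Skip the timestamp and "[drm]" prefix, whatever form they take. */
    const char *p = strstr(line, "GPU HANG: ecode ");
    if (!p)
        return -1;

    int n = sscanf(p, "GPU HANG: ecode %d:%d:0x%x, in %63s [%d]",
                   &out->gen, &out->engine, &out->ecode,
                   out->comm, &out->pid);
    return n == 5 ? 0 : -1;
}
```

For the line logged above, this would fill in gen 3, engine 0, ecode 0x028df8c7, process name gnome-shell and pid 20821.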
Comment 8 Elizabeth 2017-09-13 16:49:04 UTC
Active process (on ring render): gnome-shell [20821], score 0
  INSTDONE: 0x7d0ff8c1
    busy: Framebuffer Compression
    busy: Strips and fans
    busy: Setup engine
    busy: Windowizer
    busy: Intermediate Z
    busy: Bypass FIFO
    busy: Pixel shader
    busy: Color calculator
Last action executed before hang:
  IPEHR: 0x7f820006
0x004321ac:      0x7f820006: 3DPRIMITIVE random indirect TRILIST (6)
0x004321b0:      0x00040003:        indices: 0x0003, 0x0004
0x004321b4:      0x00040006:        indices: 0x0006, 0x0004
0x004321b8:      0x00060005:        indices: 0x0005, 0x0006
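
The decoded dump above shows each dword after the 3DPRIMITIVE header carrying two 16-bit vertex indices, low half first. A small sketch of that unpacking in C; `unpack_indices` is a hypothetical helper written only to mirror the decoder output shown here, not code from the i915 tools:

```c
#include <stddef.h>
#include <stdint.h>

/* Split each 32-bit dword into two 16-bit indices, low half first,
 * matching the order shown in the decoded dump.
 * out must have room for 2 * ndwords entries. Returns the index count. */
size_t unpack_indices(const uint32_t *dwords, size_t ndwords, uint16_t *out)
{
    size_t k = 0;

    for (size_t i = 0; i < ndwords; i++) {
        out[k++] = (uint16_t)(dwords[i] & 0xffff);
        out[k++] = (uint16_t)(dwords[i] >> 16);
    }
    return k;
}
```

For the three dwords 0x00040003, 0x00040006 and 0x00060005 this yields the indices 3, 4, 6, 4, 5, 6, which would be the two triangles (3, 4, 6) and (4, 5, 6) if grouped in threes as a triangle list.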
Comment 9 Elizabeth 2017-09-13 16:52:27 UTC
(In reply to miguelramos from comment #7)
> Problems are back as can be read in the attachment which contains
> /sys/class/drm/card0/error .
>...
Is this result with an unpatched kernel? Thanks.
Comment 10 miguelramos 2017-09-15 07:38:12 UTC
This result is from the Gentoo-patched kernel sources.

I am now testing with vanilla (unpatched) 4.13.2 sources.

Let's see what happens.
Comment 11 miguelramos 2017-10-03 08:13:12 UTC
After detailed testing, I can say with confidence that the problem only appears with the Gentoo-patched kernel when the experimental CPU GCC optimisations are enabled.

Those optimisations can be kept without triggering the GPU hang report, provided linux-headers are properly upgraded. That upgrade might cause other problems on a Gentoo system (for instance, a broken Chromium build, as often happens), which fortunately is not my case.

I think the bug can be closed, at least as far as this thread is concerned.
Comment 12 Elizabeth 2018-01-25 23:47:41 UTC
Closing then. Thanks for your time.

