Bug 89334

Summary: [945 regression] 4.0-rc1 kernel GPU hang: ecode 3:0:0x02faffcf (i915) stuck on render ring ecode 3:0:0x02faffcf
Product: DRI
Reporter: Jim <jimmcdevitt60>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED DUPLICATE
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: medium
CC: intel-gfx-bugs, jimmcdevitt60, kai.huuhko, mathenge
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform: I945G
i915 features: GPU hang
Attachments (description: flags):
Requested logs: none
Ensure that no fences are active for the vertex buffer: none
w/o gallium: none
Result after patch 113914 applied: none
Force set-domain on batch: none
more logs without gallium: none
with gallium & both patches: none
New error reports: none
log from rc5: none
/sys/class/drm/card0/error: none

Description Jim 2015-02-26 08:05:08 UTC
After eliminating the other members of the stack, it appears that 4.0~rc1 has a regression. One of two things happens:

1)
Feb 25 18:24:42 Aesop kernel: [ 1318.339897] [drm] stuck on render ring
Feb 25 18:24:42 Aesop kernel: [ 1318.341695] [drm] GPU HANG: ecode 3:0:0x02e6ffc1, in Xorg [2282], reason: Ring hung, action: reset
Feb 25 18:24:42 Aesop kernel: [ 1318.341698] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
Feb 25 18:24:42 Aesop kernel: [ 1318.341700] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
Feb 25 18:24:42 Aesop kernel: [ 1318.341702] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
Feb 25 18:24:42 Aesop kernel: [ 1318.341703] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
Feb 25 18:24:42 Aesop kernel: [ 1318.341705] [drm] GPU crash dump saved to /sys/class/drm/card0/error
Feb 25 18:24:42 Aesop kernel: [ 1318.371624] drm/i915: Resetting chip after gpu hang
Feb 25 18:48:56 Aesop kernel: [ 2772.339749] [drm] stuck on render ring
Feb 25 18:48:56 Aesop kernel: [ 2772.341710] [drm] GPU HANG: ecode 3:0:0x02faffcf, in Xorg [2282], reason: Ring hung, action: reset
Feb 25 18:48:56 Aesop kernel: [ 2772.371456] drm/i915: Resetting chip after gpu hang

after which you can restart X (provided you know your system blind, as nothing intelligible is on the screen). This occurs frequently and seemingly at random.

2)
The system just locks - screen perfectly frozen. Cycle the power. No trace of any crash or oops.

As usual, for item one, you cannot get at the crash dump when you ssh into the box to look at it; its ALWAYS empty (/sys/class/drm/card0/error). There is also no pattern as to when it happens.
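
For anyone collecting these from syslog, the hang lines in item 1 can be pulled out with a small script. This is only a sketch: the regular expression is written against the exact "[drm] GPU HANG" wording shown above, and the function name parse_gpu_hangs is made up for illustration.

```python
import re

# Matches the "[drm] GPU HANG" line as it is worded in the logs above.
HANG_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+\[drm\] GPU HANG: "
    r"ecode (?P<ecode>[0-9a-fx:]+), in (?P<proc>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def parse_gpu_hangs(log_text):
    """Return one dict per GPU HANG line found in log_text."""
    hangs = []
    for match in HANG_RE.finditer(log_text):
        info = match.groupdict()
        info["uptime"] = float(info["uptime"])  # seconds since boot
        info["pid"] = int(info["pid"])
        hangs.append(info)
    return hangs


# Two lines copied from the report above.
sample = (
    "Feb 25 18:24:42 Aesop kernel: [ 1318.341695] [drm] GPU HANG: "
    "ecode 3:0:0x02e6ffc1, in Xorg [2282], reason: Ring hung, action: reset\n"
    "Feb 25 18:48:56 Aesop kernel: [ 2772.341710] [drm] GPU HANG: "
    "ecode 3:0:0x02faffcf, in Xorg [2282], reason: Ring hung, action: reset\n"
)

for hang in parse_gpu_hangs(sample):
    print(hang["uptime"], hang["ecode"], hang["proc"], hang["pid"])
```

Listing the ecode next to the uptime makes it easy to see whether successive hangs share a signature; the two above do not.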

Additional info:

OpenGL renderer string: Gallium 0.4 on i915 (chipset: 945G)
OpenGL version string: 2.1 Mesa 10.4.5

Compiled with: gcc version 4.9.2 (Both OS & Mesa)

xorg-server: 1.15.1
Module intel: vendor="X.Org Foundation"
compiled for 1.15.1, module version = 2.99.917
Module class: X.Org Video Driver
ABI class: X.Org Video Driver, version 15.0

Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.0.0-0-reaper root=UUID=8536097e-c02a-4a8c-9cdf-b40ba4e3b74d ro crashkernel=384M-2G:64M,2G-:128M drm_kms_helper.edid_firmware=edid/1280x1024_75.bin thermal.crt=75 quiet splash vt.handoff=7

BIOS DMI: ECS 945GCT-M2/945GCT-M2, BIOS 080012  07/18/2008

Thanks in advance again for your help.
Comment 1 Chris Wilson 2015-02-26 08:20:27 UTC
/sys/class/drm/card0/error is never empty; ls lies.
Comment 2 Jim 2015-02-26 20:59:05 UTC
cat too? must be a conspiracy.
Comment 3 Chris Wilson 2015-02-26 21:40:28 UTC
It's a conspiracy:

=0 ickle:/usr/src/linux$ sudo cat /sys/class/drm/card0/error 
no error state collected
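
A script that wants to tell "no dump" apart from a real dump therefore has to look at the contents, not the file size. A minimal sketch, assuming the placeholder text is exactly the message shown above (compared case-insensitively); the helper names are invented for illustration, and reading the file normally needs root.

```python
NO_ERROR_MARKER = "no error state collected"


def has_error_state(contents):
    """True if text read from the error file looks like a real crash dump."""
    text = contents.strip()
    return bool(text) and text.lower() != NO_ERROR_MARKER


def read_error_state(path="/sys/class/drm/card0/error"):
    """Return the dump text, or None if unreadable or nothing was captured."""
    try:
        with open(path, "r", errors="replace") as handle:
            contents = handle.read()
    except OSError:
        return None
    return contents if has_error_state(contents) else None


if __name__ == "__main__":
    dump = read_error_state()
    print("dump captured" if dump else "no error state available")
```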
Comment 4 Jim 2015-02-28 12:05:50 UTC
Oops

"its ALWAYS empty" -pedantic == "devoid of useful information" meā culpā.
Comment 5 Chris Wilson 2015-02-28 13:20:00 UTC
Please attach whatever you can find in /sys/class/drm/card0/error and dmesg after hang type 1. Also try i915c rather than i915g (i.e. the intel driver rather than the gallium version). Attaching Xorg.0.log would explain a bit more.
Comment 6 Jim 2015-03-01 02:00:01 UTC
OK, will have them for you tomorrow AM (GMT+8).
Comment 7 Jim 2015-03-02 07:35:17 UTC
It took around TWENTY crashes before I finally got an error log in sys...
Nothing really useful in the other files, but here they are.

This is with the gallium driver. I'll do the regular driver next.
Comment 8 Jim 2015-03-02 07:37:41 UTC
Created attachment 113908
Requested logs

gallium driver
Comment 9 Chris Wilson 2015-03-02 08:55:33 UTC
Oh, that is interesting. The kernel did not release the fence for the vertex buffer, which is naughty.

0x018000d8:      0x7d040031: 3DSTATE_LOAD_STATE_IMMEDIATE_1
0x018000dc:      0x01500000:    S0: vbo offset: 0x01500000
0x018000e0:      0x04040000:    S1: vertex width: 4, vertex pitch: 4

but

    01500000   262144 76 00 19dc 0 dirty render uncached (fence: 11)
  fence[11] = 01500031
    valid, x-tiled, pitch: 4096, start: 0x01500000, size: 1048576
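
The mismatch can be checked by decoding the fence value by hand. The sketch below assumes the gen3 fence register layout (bit 0 valid, bits 4-6 log2 of pitch in 512-byte X-tile widths, bits 8-11 log2 of size in MiB, bit 12 Y-tiling, bits 20-27 start address); that layout is an assumption here, checked only against the decoded line in the dump above, and the function names are illustrative.

```python
def decode_gen3_fence(value):
    """Decode a gen3 (915/945) fence register value.

    Assumed layout: bit 0 valid, bits 4-6 log2(pitch / 512),
    bits 8-11 log2(size / 1 MiB), bit 12 Y-tiling, bits 20-27 start.
    The 512-byte pitch unit is only correct for X-tiled fences.
    """
    pitch_log2 = (value >> 4) & 0x7
    size_log2 = (value >> 8) & 0xF
    return {
        "valid": bool(value & 0x1),
        "tiling": "y" if value & (1 << 12) else "x",
        "pitch": 512 << pitch_log2,
        "start": value & 0x0FF00000,
        "size": (1 << 20) << size_log2,
    }


def fence_covers(fence, offset, length):
    """True if [offset, offset + length) intersects a valid fence's range."""
    if not fence["valid"]:
        return False
    fence_end = fence["start"] + fence["size"]
    return offset < fence_end and fence["start"] < offset + length


# fence[11] and the vertex buffer (offset 0x01500000, 262144 bytes) from the dump.
fence = decode_gen3_fence(0x01500031)
print(fence)
print(fence_covers(fence, 0x01500000, 262144))
```

With the values from the dump, the decoded fields agree with the "valid, x-tiled, pitch: 4096, start: 0x01500000, size: 1048576" line, and the vertex buffer lies inside the fenced range, which is the situation described above.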
Comment 10 Chris Wilson 2015-03-02 08:56:34 UTC
(I am not sure if that actually would cause a GPU hang, but if you get a chance a kernel bisect would be useful, I have a few suspect commits in mind...)
Comment 11 Chris Wilson 2015-03-02 09:02:38 UTC
Created attachment 113914
Ensure that no fences are active for the vertex buffer

This tests the same idea from xf86-video-intel by flagging that the kernel needs to revoke any fences on the vertex buffers.
Comment 12 Chris Wilson 2015-03-02 09:10:56 UTC
Actually thinking about what changed: it is that we no longer use the GTT to write into the vertex buffers so the fence is no longer being flushed before the batch. Interesting...
Comment 13 Jim 2015-03-02 10:50:52 UTC
Created attachment 113915
w/o gallium
Comment 14 Jim 2015-03-02 20:37:17 UTC
Created attachment 113925
Result after patch 113914 applied
Comment 15 Jim 2015-03-02 20:41:33 UTC
(In reply to Chris Wilson from comment #10)
> (I am not sure if that actually would cause a GPU hang, but if you get a
> chance a kernel bisect would be useful, I have a few suspect commits in
> mind...)

Care to share so I can try before I have to do the bisect?
Comment 16 Jim 2015-03-02 20:44:00 UTC
My explanation got cut about your patch:

After I applied your patch I thought that you had nailed it, but the problem persists; however, syslog doesn't complain but the error log AND xorg do. I enclose both for your perusal. What was also different was that this time I didn't have to keep trying until an error log was produced - it was produced when the issue happened.
Comment 17 Chris Wilson 2015-03-02 20:52:12 UTC
(In reply to Jim from comment #16)
> My explanation got cut about your patch:
> 
> After I applied your patch I thought that you nailed it but the problem
> persists; however, syslog doesn't complain but the error log AND xorg do. I
> enclose both for your perusal. What was different also was that this time I
> didn't have to keep trying until an error log was produced - it was produced
> when the issue happened.

(In reply to Jim from comment #14)
> Created attachment 113925
> Result after patch 113914 applied

Right, I think perhaps we have a second issue here. Sadly the error capture failed - it dumped the wrong batch - could you grab a few more and let's see if one pinpoints the issue. So I think the fenced vertex buffer is definitely one issue, and that is probably responsible for the system hangs. Now on to the next.
Comment 18 Chris Wilson 2015-03-02 21:12:24 UTC
Ok, the no-gallium logs point towards an incoherent batch buffer. It's likely to bisect to mmap(wc) again, but the symptoms are a little more puzzling.
Comment 19 Chris Wilson 2015-03-02 22:11:48 UTC
Created attachment 113929
Force set-domain on batch

I hope this does not make a difference...
Comment 20 Jim 2015-03-03 07:02:58 UTC
Created attachment 113934
more logs without gallium

The second patch seems to make things worse for the no gallium driver as well as the gallium driver.
Comment 21 Jim 2015-03-03 07:04:53 UTC
Created attachment 113935
with gallium & both patches

With the second patch to xf86 driver the gallium driver acts like it always did except it now logs the error in the error file. With only the first patch, the problem does not happen nearly as much - So for the gallium driver, just the first patch works better than both patches together.
Comment 22 Jim 2015-03-03 07:06:28 UTC
I will send more logs when I have more time later. I thought you might like to have a look at these while you wait.
Comment 23 Chris Wilson 2015-03-03 08:09:20 UTC
One final ddx change just to check a potential root cause of the problem:

diff --git a/src/sna/kgem.c b/src/sna/kgem.c
index a5571aa..adf52d6 100644
--- a/src/sna/kgem.c
+++ b/src/sna/kgem.c
@@ -83,7 +83,7 @@ search_snoop_cache(struct kgem *kgem, unsigned int num_pages, unsigned flags);
 #define DBG_NO_FAST_RELOC 0
 #define DBG_NO_HANDLE_LUT 0
 #define DBG_NO_WT 0
-#define DBG_NO_WC_MMAP 0
+#define DBG_NO_WC_MMAP 1
 #define DBG_DUMP 0
 #define DBG_NO_MALLOC_CACHE 0
Comment 24 Jim 2015-03-04 13:59:12 UTC
(In reply to Chris Wilson from comment #23)
> One final ddx change just to check a potential root cause of the problem:
> 
> diff --git a/src/sna/kgem.c b/src/sna/kgem.c
> index a5571aa..adf52d6 100644
> --- a/src/sna/kgem.c
> +++ b/src/sna/kgem.c
> @@ -83,7 +83,7 @@ search_snoop_cache(struct kgem *kgem, unsigned int
> num_pages, unsigned flags);
>  #define DBG_NO_FAST_RELOC 0
>  #define DBG_NO_HANDLE_LUT 0
>  #define DBG_NO_WT 0
> -#define DBG_NO_WC_MMAP 0
> +#define DBG_NO_WC_MMAP 1
>  #define DBG_DUMP 0
>  #define DBG_NO_MALLOC_CACHE 0

OK have more for you in the AM. Many thanks for your time.
Comment 25 Jim 2015-03-05 11:49:00 UTC
I had just 1 (one) system lock-up. No hang yet. The screen now goes black for about 3 sec, then comes back to normal. This happens about once every 60-90 min.

I will run piglit all night and post back the results.

You were right - goes to mmap.
Comment 26 Jim 2015-03-11 12:17:19 UTC
No more hangs; it is a bit slow, though.

Sorry I took so long - had to go on a trip.
Comment 27 Jim 2015-03-12 23:36:54 UTC
I now get these diatribes:

After these happen you have to power cycle as system is frozen.

Mar 13 07:12:54 Aesop kernel: [ 9132.585443] ------------[ cut here ]------------
Mar 13 07:12:54 Aesop kernel: [ 9132.585489] kernel BUG at /home/jim/software/ubuntu/4.0-rc3/drivers/gpu/drm/drm_mm.c:305!
Mar 13 07:12:54 Aesop kernel: [ 9132.585533] invalid opcode: 0000 [#1] SMP 
Mar 13 07:12:54 Aesop kernel: [ 9132.585562] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_DSCP ctr ccm nf_log_ipv4 nf_log_common xt_tcpudp ip6table_mangle nf_conntrack_irc xt_TCPMSS xt_LOG ipt_REJECT xt_multiport xt_state xt_limit xt_conntrack nf_conntrack_ftp ip6table_filter ip6_tables lp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_filter ip_tables x_tables arc4 rt2800usb rt2800lib crc_ccitt rt2x00usb rt2x00lib mac80211 gspca_zc3xx gspca_main cfg80211 videodev cdc_ether usbnet uas snd_hda_codec_idt usb_storage snd_hda_codec_generic snd_hda_intel snd_hda_controller ipv6 snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore gpio_ich lpc_ich mfd_core ppdev serio_raw parport_pc parport it87 hwmon_vid pata_acpi 8139too 8139cp mii i915 drm_kms_helper
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] CPU: 1 PID: 4416 Comm: compiz Not tainted 4.0.0-0-reaper #3~rc3
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] Hardware name: ECS 945GCT-M2/945GCT-M2, BIOS 080012  07/18/2008
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] task: ffff88005c53dfc0 ti: ffff88007b2ac000 task.ti: ffff88007b2ac000
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] RIP: 0010:[<ffffffff814a00c2>]  [<ffffffff814a00c2>] drm_mm_insert_node_in_range_generic+0x3a7/0x3b0
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] RSP: 0018:ffff88007b2afa28  EFLAGS: 00010206
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] RAX: ffff88003047b600 RBX: 0000000000000000 RCX: ffff88003047b610
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] RDX: 0000000000118000 RSI: ffff88003047b500 RDI: ffff88003047b600
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] RBP: ffff88007b2afab8 R08: 0000000000018000 R09: 0000000000000000
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] R10: 0000000000100000 R11: 0000000000000000 R12: 0000000000000000
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] R13: 0000000000000000 R14: ffff880078037e08 R15: ffff88003047b500
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] FS:  00007fc366636780(0000) GS:ffff88007f480000(0000) knlGS:0000000000000000
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] CR2: 00007fc343976000 CR3: 0000000057cc6000 CR4: 00000000000007e0
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] Stack:
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  ffff88003047b600 0000000000100000 ffff88003047b500 0000000000100000
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  0010000000000000 0000000000000000 0000000000018000 0000000000018000
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  ffff88007b2afa88 0000000000100000 0000000000100000 000000005fe9b2fa
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] Call Trace:
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  [<ffffffffa00578cb>] i915_gem_object_pin_view+0x627/0x89d [i915]
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  [<ffffffffa0049d65>] i915_gem_execbuffer_reserve_vma.isra.16+0x6f/0xf5 [i915]
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  [<ffffffffa004a0ee>] i915_gem_execbuffer_reserve+0x303/0x364 [i915]
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  [<ffffffffa004ab37>] i915_gem_do_execbuffer.isra.22+0x5a0/0x101d [i915]
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  [<ffffffff81172b94>] ? __slab_free+0x66/0x249
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  [<ffffffffa00563a2>] ? i915_gem_object_put_fence+0x20/0xe4 [i915]
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  [<ffffffff81172ac1>] ? __kmalloc+0x10e/0x17b
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  [<ffffffffa004c7cf>] i915_gem_execbuffer2+0xb2/0x283 [i915]
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  [<ffffffff81497c8b>] drm_ioctl+0x1d2/0x5cb
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  [<ffffffff811991a5>] ? __dentry_kill+0x145/0x1be
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  [<ffffffff811993ea>] ? dput+0x1cc/0x1fd
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  [<ffffffff81195cde>] do_vfs_ioctl+0x355/0x4ae
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  [<ffffffff81183c88>] ? ____fput+0xe/0x10
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  [<ffffffff81195eb0>] SyS_ioctl+0x79/0x89
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  [<ffffffff816dce76>] system_call_fastpath+0x16/0x1b
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] Code: ff 48 89 f0 e9 ae fe ff ff b8 e4 ff ff ff e9 4e ff ff ff 0f 0b 8b 4d 94 29 d1 48 8d 04 31 48 89 45 b8 e9 90 fe ff ff 0f 0b 0f 0b <0f> 0b 0f 0b e8 d4 f7 ba ff 0f 1f 44 00 00 55 48 89 e5 41 57 41 
Mar 13 07:12:54 Aesop kernel: [ 9132.585763] RIP  [<ffffffff814a00c2>] drm_mm_insert_node_in_range_generic+0x3a7/0x3b0
Mar 13 07:12:54 Aesop kernel: [ 9132.585763]  RSP <ffff88007b2afa28>
Mar 13 07:12:54 Aesop kernel: [ 9132.673716] ---[ end trace b5709a427aa7df79 ]---

another:

Mar 13 07:23:42 Aesop kernel: [  516.020001] ------------[ cut here ]------------
Mar 13 07:23:42 Aesop kernel: [  516.020046] kernel BUG at /home/jim/software/ubuntu/4.0-rc3/drivers/gpu/drm/drm_mm.c:305!
Mar 13 07:23:42 Aesop kernel: [  516.020090] invalid opcode: 0000 [#1] SMP 
Mar 13 07:23:42 Aesop kernel: [  516.020119] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_DSCP cdc_ether usbnet uas usb_storage ctr ccm nf_log_ipv4 nf_log_common xt_tcpudp ip6table_mangle nf_conntrack_irc xt_TCPMSS xt_LOG ipt_REJECT xt_multiport xt_state xt_limit xt_conntrack nf_conntrack_ftp ip6table_filter ip6_tables lp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_filter ip_tables x_tables arc4 rt2800usb rt2800lib crc_ccitt rt2x00usb rt2x00lib mac80211 cfg80211 gspca_zc3xx gspca_main videodev snd_hda_codec_idt snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec ipv6 snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore gpio_ich lpc_ich mfd_core serio_raw ppdev parport_pc parport it87 hwmon_vid pata_acpi 8139too 8139cp mii i915 drm_kms_helper
Mar 13 07:23:42 Aesop kernel: [  516.020709] CPU: 0 PID: 4429 Comm: compiz Not tainted 4.0.0-0-reaper #3~rc3
Mar 13 07:23:42 Aesop kernel: [  516.020746] Hardware name: ECS 945GCT-M2/945GCT-M2, BIOS 080012  07/18/2008
Mar 13 07:23:42 Aesop kernel: [  516.020782] task: ffff8800774a17f0 ti: ffff880053bd8000 task.ti: ffff880053bd8000
Mar 13 07:23:42 Aesop kernel: [  516.020822] RIP: 0010:[<ffffffff814a00c2>]  [<ffffffff814a00c2>] drm_mm_insert_node_in_range_generic+0x3a7/0x3b0
Mar 13 07:23:42 Aesop kernel: [  516.020884] RSP: 0018:ffff880053bdba28  EFLAGS: 00010206
Mar 13 07:23:42 Aesop kernel: [  516.020914] RAX: ffff88002a60c800 RBX: 0000000000000000 RCX: ffff88002a60c810
Mar 13 07:23:42 Aesop kernel: [  516.022960] RDX: 0000000001c00000 RSI: ffff88007829fa00 RDI: ffff88002a60c800
Mar 13 07:23:42 Aesop kernel: [  516.023289] RBP: ffff880053bdbab8 R08: 0000000000700000 R09: 0000000000000000
Mar 13 07:23:42 Aesop kernel: [  516.023289] R10: 0000000001800000 R11: 0000000000000000 R12: 0000000000000000
Mar 13 07:23:42 Aesop kernel: [  516.023289] R13: 0000000000000000 R14: ffff880078007e08 R15: ffff88007829fa00
Mar 13 07:23:42 Aesop kernel: [  516.023289] FS:  00007ffa49876780(0000) GS:ffff88007f400000(0000) knlGS:0000000000000000
Mar 13 07:23:42 Aesop kernel: [  516.023289] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 13 07:23:42 Aesop kernel: [  516.023289] CR2: 00007ffa26f16000 CR3: 000000005ef39000 CR4: 00000000000007f0
Mar 13 07:23:42 Aesop kernel: [  516.023289] Stack:
Mar 13 07:23:42 Aesop kernel: [  516.023289]  ffff88002a60c800 0000000001000000 ffff88007829fa00 0000000001800000
Mar 13 07:23:42 Aesop kernel: [  516.023289]  0100000000000000 0000000000000000 0000000000700000 0000000000c00000
Mar 13 07:23:42 Aesop kernel: [  516.023289]  dead000000100100 0000000001000000 0000000001800000 0000000030408e9c
Mar 13 07:23:42 Aesop kernel: [  516.043640] Call Trace:
Mar 13 07:23:42 Aesop kernel: [  516.043640]  [<ffffffffa00578cb>] i915_gem_object_pin_view+0x627/0x89d [i915]
Mar 13 07:23:42 Aesop kernel: [  516.043640]  [<ffffffffa005020b>] ? i915_gem_obj_lookup_or_create_vma_view+0x46/0x1bf [i915]
Mar 13 07:23:42 Aesop kernel: [  516.043640]  [<ffffffffa0049d65>] i915_gem_execbuffer_reserve_vma.isra.16+0x6f/0xf5 [i915]
Mar 13 07:23:42 Aesop kernel: [  516.043640]  [<ffffffffa004a0ee>] i915_gem_execbuffer_reserve+0x303/0x364 [i915]
Mar 13 07:23:42 Aesop kernel: [  516.043640]  [<ffffffffa004ab37>] i915_gem_do_execbuffer.isra.22+0x5a0/0x101d [i915]
Mar 13 07:23:42 Aesop kernel: [  516.043640]  [<ffffffff81172b94>] ? __slab_free+0x66/0x249
Mar 13 07:23:42 Aesop kernel: [  516.043640]  [<ffffffffa00563a2>] ? i915_gem_object_put_fence+0x20/0xe4 [i915]
Mar 13 07:23:42 Aesop kernel: [  516.043640]  [<ffffffff81172ac1>] ? __kmalloc+0x10e/0x17b
Mar 13 07:23:42 Aesop kernel: [  516.043640]  [<ffffffffa004c7cf>] i915_gem_execbuffer2+0xb2/0x283 [i915]
Mar 13 07:23:42 Aesop kernel: [  516.043640]  [<ffffffff81497c8b>] drm_ioctl+0x1d2/0x5cb
Mar 13 07:23:42 Aesop kernel: [  516.043640]  [<ffffffff811991a5>] ? __dentry_kill+0x145/0x1be
Mar 13 07:23:42 Aesop kernel: [  516.043640]  [<ffffffff811993ea>] ? dput+0x1cc/0x1fd
Mar 13 07:23:42 Aesop kernel: [  516.043640]  [<ffffffff81195cde>] do_vfs_ioctl+0x355/0x4ae
Mar 13 07:23:42 Aesop kernel: [  516.043640]  [<ffffffff81183c88>] ? ____fput+0xe/0x10
Mar 13 07:23:42 Aesop kernel: [  516.043640]  [<ffffffff81195eb0>] SyS_ioctl+0x79/0x89
Mar 13 07:23:42 Aesop kernel: [  516.043640]  [<ffffffff816dce76>] system_call_fastpath+0x16/0x1b
Mar 13 07:23:42 Aesop kernel: [  516.043640] Code: ff 48 89 f0 e9 ae fe ff ff b8 e4 ff ff ff e9 4e ff ff ff 0f 0b 8b 4d 94 29 d1 48 8d 04 31 48 89 45 b8 e9 90 fe ff ff 0f 0b 0f 0b <0f> 0b 0f 0b e8 d4 f7 ba ff 0f 1f 44 00 00 55 48 89 e5 41 57 41 
Mar 13 07:23:42 Aesop kernel: [  516.043640] RIP  [<ffffffff814a00c2>] drm_mm_insert_node_in_range_generic+0x3a7/0x3b0
Mar 13 07:23:42 Aesop kernel: [  516.043640]  RSP <ffff880053bdba28>
Mar 13 07:23:42 Aesop kernel: [  516.094834] ---[ end trace 165f5c9e30f1777e ]---
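
To compare traces like the two above without reading every line, the BUG location and the non-speculative call-trace frames (those not prefixed with "?") can be extracted. This is a rough sketch that assumes the oops format shown above; summarize_oops is an illustrative name, and the sample is an abbreviated copy of the first trace.

```python
import re

BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (?P<maybe>\? )?(?P<sym>[\w.]+)\+0x")


def summarize_oops(text):
    """Return ((file, line) or None, [symbols of reliable call-trace frames])."""
    bug = BUG_RE.search(text)
    frames = []
    in_trace = False
    for line in text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        frame = FRAME_RE.search(line)
        if frame is None:
            break  # the first non-frame line ends the call trace
        if not frame.group("maybe"):  # skip "?" (unreliable) frames
            frames.append(frame.group("sym"))
    location = (bug.group("file"), int(bug.group("line"))) if bug else None
    return location, frames


# Abbreviated from the first trace above.
sample = (
    "kernel: [ 9132.585489] kernel BUG at "
    "/home/jim/software/ubuntu/4.0-rc3/drivers/gpu/drm/drm_mm.c:305!\n"
    "kernel: [ 9132.585763] Call Trace:\n"
    "kernel: [ 9132.585763]  [<ffffffffa00578cb>] i915_gem_object_pin_view+0x627/0x89d [i915]\n"
    "kernel: [ 9132.585763]  [<ffffffff81172b94>] ? __slab_free+0x66/0x249\n"
    "kernel: [ 9132.585763]  [<ffffffffa004c7cf>] i915_gem_execbuffer2+0xb2/0x283 [i915]\n"
    "kernel: [ 9132.585763] Code: ff 48 89 f0\n"
)

location, frames = summarize_oops(sample)
print(location, frames)
```

Both traces above hit the same BUG at drm_mm.c:305 and enter through the i915_gem_execbuffer2 path; they differ in the speculative "?" frames.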
Comment 28 Jim 2015-03-17 01:19:01 UTC
Update. Using xf86-video-intel commit 254bbd67 AND updating the kernel to 4.0~rc4 seems to be working - I still infrequently get machine hangs where the colors bleed all over the screen and the machine hangs. I'm not able to capture any details about that though.

Any suggestions?

Thanks

As far as I'm concerned, you may close this if you want.
Comment 29 Jim 2015-03-17 03:33:59 UTC
Created attachment 114362
New error reports

Every time, just when I think it is fixed. Please see the log in case it offers anything new.
Comment 30 Chris Wilson 2015-03-17 11:44:50 UTC
Yeah, the CS is still exhibiting incoherence. :(
Comment 31 Jim 2015-03-29 10:30:04 UTC
Created attachment 114700
log from rc5

Here is one from rc5.

Cheers
Comment 32 Jim 2015-04-08 04:27:48 UTC
I have not been able to test rc 6 & 7 because both versions cause X to seg fault. I need to use UXA instead of SNA now. This was the last thing I checked because I had ruled out SNA/UXA as a factor earlier - but since rc6 my hand is forced to use UXA. It is a performance hit in certain areas - but still acceptable considering...

If I go back to rc5, using either SNA or UXA, I still see the same failure. It is also the same whether I use the Gallium driver or the intel c driver.
Comment 33 Jim 2015-04-13 07:36:38 UTC
rc6, rc7 and 4.0 final exhibit no error if I disable SNA.

This is a bit confusing to me but it is what it is.
Comment 34 Ben Hjelt 2015-08-25 17:22:17 UTC
Slackware -current (xf86-video-intel-git_20150824_3e07681) kernel 4.1.6 on an IBM T60:

[ 1741.004033] [drm] stuck on render ring
[ 1741.005028] [drm] GPU HANG: ecode 3:0:0x70efffc1, in Xorg [890], reason: Ring hung, action: reset
[ 1741.005031] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 1741.005033] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 1741.005035] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 1741.005036] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 1741.005038] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 1741.315597] drm/i915: Resetting chip after gpu hang
Comment 35 Ben Hjelt 2015-08-25 17:28:04 UTC
Created attachment 117912
/sys/class/drm/card0/error
Comment 36 cprigent 2015-10-01 15:53:17 UTC
Bug scrub:
Priority updated to medium.
Comment 37 Jim 2016-05-24 03:44:44 UTC
Update and an observation:

Using kernel 4.6 (from rc1 to release - no other kernel works) and Mesa 11.2.2 (no other version works), with SNA rather than UXA, the problem has disappeared. I have found no other Linux/Mesa combination that works. I have 9 machines that have been running flawlessly since my birthday (er, May 13, 2016).

Driver is at 2.99.917. Driver can be release version or with any updates since release. In fact ANY recent versions work as long as kernel is 4.6 AND Mesa is at 11.2.2.

Cheers
Comment 38 Chris Wilson 2016-08-19 09:24:11 UTC

*** This bug has been marked as a duplicate of bug 90841 ***
