Bug 89403 - crash with xorg-server-1.16.4 - xf86-video-intel-2.99.917 and kernel 3.10
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: high critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2015-03-03 08:29 UTC by Agostino Sarubbo
Modified: 2017-07-24 22:48 UTC

Attachments
Xorg.0.log (18.31 KB, text/plain)
2015-03-14 19:30 UTC, Agostino Sarubbo
drm.debug=7 dmesg (181.14 KB, text/plain)
2015-03-14 19:36 UTC, Agostino Sarubbo
i915_error_state (695.96 KB, text/plain)
2015-03-17 08:40 UTC, Agostino Sarubbo

Description Agostino Sarubbo 2015-03-03 08:29:46 UTC
After the update to xorg-server-1.16.4 I get this crash and xorg hangs:

mar 02 14:58:01 arcadia kernel: ------------[ cut here ]------------
mar 02 14:58:01 arcadia kernel: WARNING: at drivers/gpu/drm/i915/i915_gem.c:1405 i915_gem_fault+0x1a9/0x210()
mar 02 14:58:01 arcadia kernel: unhandled error in i915_gem_fault: -22
mar 02 14:58:01 arcadia kernel: CPU: 0 PID: 1710 Comm: X Not tainted 3.10.61-gentoo #1
mar 02 14:58:01 arcadia kernel: Hardware name: Packard Bell DOTS E2/SJE02_PT, BIOS V3.12(DDR3) 12/16/2010
mar 02 14:58:01 arcadia kernel:  ffffffff8167eb2c ffffffff81036401 00000000ffffffea ffff880079ee5ce8
mar 02 14:58:01 arcadia kernel:  ffff88007c004000 ffff880079ee5d68 ffff880060096600 ffffffff81036487
mar 02 14:58:01 arcadia kernel:  ffffffff8184cbe0 0000000000000020 ffff880079ee5cf8 ffff880079ee5cb8
mar 02 14:58:01 arcadia kernel: Call Trace:
mar 02 14:58:01 arcadia kernel:  [<ffffffff8167eb2c>] ? dump_stack+0xc/0x15
mar 02 14:58:01 arcadia kernel:  [<ffffffff81036401>] ? warn_slowpath_common+0x51/0x70
mar 02 14:58:01 arcadia kernel:  [<ffffffff81036487>] ? warn_slowpath_fmt+0x47/0x50
mar 02 14:58:01 arcadia kernel:  [<ffffffff8127158c>] ? idr_get_empty_slot+0x16c/0x3d0
mar 02 14:58:01 arcadia kernel:  [<ffffffff81350f29>] ? i915_gem_fault+0x1a9/0x210
mar 02 14:58:01 arcadia kernel:  [<ffffffff810c7bda>] ? __do_fault+0x6a/0x4a0
mar 02 14:58:01 arcadia kernel:  [<ffffffff8132d82c>] ? drm_mm_get_block_generic+0x3c/0x50
mar 02 14:58:01 arcadia kernel:  [<ffffffff810ca4ff>] ? handle_pte_fault+0x8f/0x730
mar 02 14:58:01 arcadia kernel:  [<ffffffff810cecc3>] ? vma_link+0x73/0xc0
mar 02 14:58:01 arcadia kernel:  [<ffffffff810cb64a>] ? handle_mm_fault+0x10a/0x1b0
mar 02 14:58:01 arcadia kernel:  [<ffffffff8102c343>] ? __do_page_fault+0x153/0x440
mar 02 14:58:01 arcadia kernel:  [<ffffffff810d1538>] ? do_mmap_pgoff+0x2f8/0x3b0
mar 02 14:58:01 arcadia kernel:  [<ffffffff810c18cb>] ? vm_mmap_pgoff+0x9b/0xc0
mar 02 14:58:01 arcadia kernel:  [<ffffffff816876c2>] ? page_fault+0x22/0x30
mar 02 14:58:01 arcadia kernel: ---[ end trace db6862cb75be8e0f ]---
mar 02 14:58:03 arcadia org.kde.kuiserver[1820]: kuiserver: Fatal IO error: client killed
mar 02 14:58:03 arcadia kdm[1708]: X server for display :0 terminated unexpectedly
mar 02 14:58:03 arcadia kdm[1727]: :0[1727]: pam_unix(kde-np:session): session closed for user ago


Feel free to ask any other detail you need.
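
The -22 in "unhandled error in i915_gem_fault: -22" is a negated errno value, which is how the kernel conventionally reports errors. A minimal Python sketch for decoding such a line follows; the helper name `decode_kernel_error` is made up for illustration, and it assumes the standard errno numbering of the platform it runs on.

```python
import errno
import os
import re

def decode_kernel_error(line):
    """Return (symbolic name, description) for the errno in a log line, or None."""
    match = re.search(r"unhandled error in \w+: (-\d+)", line)
    if match is None:
        return None
    code = -int(match.group(1))  # the kernel logs the errno as a negative number
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)

line = "mar 02 14:58:01 arcadia kernel: unhandled error in i915_gem_fault: -22"
print(decode_kernel_error(line))
```

Errno 22 is EINVAL (invalid argument), so the fault handler was passed or produced something the driver considered invalid; the number alone does not say what.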
Comment 1 Chris Wilson 2015-03-03 09:12:16 UTC
Xorg.0.log?
Comment 2 Chris Wilson 2015-03-03 09:59:41 UTC
If possible, an Xorg.0.log with --enable-debug=full would be very useful (USE=full-debug)
Comment 3 Chris Wilson 2015-03-03 11:04:33 UTC
And on the other side, a drm.debug=7 dmesg (capturing the error message) would also be useful.
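
A drm.debug=7 dmesg can be large, so it may help to pull out just the warning block. The sketch below is one way to do that in Python, assuming the block is delimited by the "cut here" and "end trace" markers seen in the description; the function name `extract_warning_blocks` and the abbreviated sample are illustrative only.

```python
import re

CUT_HERE = "[ cut here ]"
END_TRACE = re.compile(r"---\[ end trace [0-9a-f]+ \]---")

def extract_warning_blocks(log_text):
    """Return each kernel warning block as a list of lines, markers included."""
    blocks, current = [], None
    for line in log_text.splitlines():
        if CUT_HERE in line:
            current = [line]  # a new block starts at the "cut here" marker
        elif current is not None:
            current.append(line)
            if END_TRACE.search(line):
                blocks.append(current)  # block is complete
                current = None
    return blocks

# Abbreviated lines taken from the description of this report.
sample = """\
mar 02 14:58:01 arcadia kernel: ------------[ cut here ]------------
mar 02 14:58:01 arcadia kernel: WARNING: at drivers/gpu/drm/i915/i915_gem.c:1405 i915_gem_fault+0x1a9/0x210()
mar 02 14:58:01 arcadia kernel: unhandled error in i915_gem_fault: -22
mar 02 14:58:01 arcadia kernel: ---[ end trace db6862cb75be8e0f ]---
mar 02 14:58:03 arcadia kdm[1708]: X server for display :0 terminated unexpectedly
"""

for block in extract_warning_blocks(sample):
    print("\n".join(block))
```

A block that is still open when the log ends is dropped, which seems the safer choice for a truncated capture.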
Comment 4 Agostino Sarubbo 2015-03-14 19:30:40 UTC
Created attachment 114313
Xorg.0.log

(In reply to Chris Wilson from comment #1)
> Xorg.0.log?

That's it.


(In reply to Chris Wilson from comment #2)
> If possible, an Xorg.0.log with --enable-debug=full would be very useful
> (USE=full-debug)

Sorry, but if I compile with debug=full, xorg does not start and the machine hangs. I have no shell, and I can only power off thanks to acpid.
Comment 5 Agostino Sarubbo 2015-03-14 19:36:06 UTC
Created attachment 114314
drm.debug=7 dmesg

(In reply to Chris Wilson from comment #3)
> And on the other side, a drm.debug=7 dmesg (capturing the error message)
> would also be useful.

That's it.
Comment 6 Chris Wilson 2015-03-14 21:12:14 UTC
Now we have a slightly different issue. The GPU hangs very early. Can you please attach /sys/class/drm/card0/error?
Comment 7 Agostino Sarubbo 2015-03-14 21:47:28 UTC
(In reply to Chris Wilson from comment #6)
> Now we have a slightly different issue. The GPU hangs very early. Can you
> please attach /sys/class/drm/card0/error?

There is no file called error after the hang.


BTW, do you think updating to a newer kernel would solve the problem?
Comment 8 Chris Wilson 2015-03-14 22:06:30 UTC
After the hang, before the next reboot. In the latest Xorg/dmesg attached, there should be an error state. If it is not there, you have multiple bugs and we may need to diagnose each separately (i.e. attach Xorg/dmesg for each different failure mode).
Comment 9 Agostino Sarubbo 2015-03-14 22:18:02 UTC
(In reply to Chris Wilson from comment #8)
> After the hang, before the next reboot. 
I know, but there isn't one.

From dmesg I see that it points to /sys/kernel/debug/dri/0/i915_error_state instead.
Comment 10 Chris Wilson 2015-03-15 12:39:47 UTC
Ok, your kernel is that old! Please attach /sys/kernel/debug/dri/0/i915_error_state then!
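
Since the error state location differs between kernels, a small check of both paths named in this thread can save a round trip. This is a minimal Python sketch: the helper name `find_error_state` is hypothetical, and only the two paths mentioned above are assumed.

```python
from pathlib import Path

# Both paths come from this report; which one exists depends on the kernel version.
CANDIDATES = (
    "/sys/class/drm/card0/error",
    "/sys/kernel/debug/dri/0/i915_error_state",
)

def find_error_state(candidates=CANDIDATES):
    """Return the first candidate that exists as a file, or None."""
    for candidate in candidates:
        path = Path(candidate)
        try:
            if path.is_file():
                return path
        except PermissionError:
            continue  # e.g. debugfs is not readable by this user
    return None

print(find_error_state())
```

The debugfs path is only visible when debugfs is mounted and is normally readable by root only, so a result of None from an unprivileged run does not prove the file is absent.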
Comment 11 Agostino Sarubbo 2015-03-17 08:40:58 UTC
Created attachment 114379
i915_error_state
Comment 12 Agostino Sarubbo 2015-04-07 15:29:14 UTC
I confirm that the problem is specific to kernel 3.10. I updated to 3.18 and everything works as expected.

Do you think this bug should remain as NEEDINFO?
Comment 13 Chris Wilson 2015-04-07 15:35:14 UTC
The relocation routine in the kernel failed, so the first BLT overwrote the CS data structures and, not surprisingly, the GPU died.

If you really, really want to stick with 3.10, you need to work with the gentoo kernel team to see which patch they are missing. But as for upstream, we can mark this as resolved.
Comment 14 Agostino Sarubbo 2015-04-07 15:54:06 UTC
(In reply to Chris Wilson from comment #13)
> If you really, really want to stick with 3.10, you need to work with the
> gentoo kernel team to see which patch they are missing. But as for upstream,
> we can mark this as resolved.

The problem is not gentoo here. We are using the vanilla sources plus some patches. So the fact is that xorg-server-1.16.4 and xf86-video-intel-2.99.917 do not work with the upstream 3.10 kernel, which is an LTS release.

