Created attachment 50444 [details]
dmesg output

When I build the xf86-video-intel-2.16.0 driver with the new SandyBridge New Acceleration (SNA) code, my xorg-server-1.10.4 (also 1.10.3) frequently crashes. The only application that seems able to provoke the crash is Eclipse (3.7), which uses SWT. With Eclipse, I only need to open a Java file and scroll around a bit or switch to another tab. The crash seems to occur sooner if I scroll faster.

Often after the crash I can't even get back to a console using Ctrl-Alt-F1; I only get a blank screen, so I have to reboot the system using the SysRq keys.

After a crash, my Xorg.0.log doesn't contain anything related to the crash (no backtrace or error messages, only the output written when X starts up).

Compiling the driver without SNA support seems to fix the issue; at least I haven't had a crash since then.

Attached is the output of dmesg. I'm happy to provide any additional information; just tell me how to gather it. I can also test-drive the git version if required.

System environment:
-- chipset: Mobile GM965/GL960 Integrated Graphics Controller
-- system architecture: 64-bit
-- xf86-video-intel: 2.16.0
-- xserver: xorg-server-1.10.4
-- mesa: 7.11
-- libdrm: 2.4.26
-- kernel: 3.0.3
-- Linux distribution: Gentoo GNU/Linux
-- Machine or mobo model: Lenovo ThinkPad T61
Is that a dmesg after the crash? Can you grab the stderr from Xorg (usually captured in something like gdm.log)? My suspicion is that you're having fun with a buffer leak related to rendering trapezoids...
(In reply to comment #1)
> Is that a dmesg after the crash?

No, that output was from running dmesg on my current system, where the driver is compiled without --enable-sna. If you need the dmesg output with --enable-sna after the crash occurred, I can provide one this evening.

> Can you grab the stderr from Xorg (usually
> captured in something like gdm.log)? My suspicion is that you're having fun
> with a buffer leak related to rendering trapezoids...

Here's the output, and indeed it contains a backtrace:

X.Org X Server 1.10.4
Release Date: 2011-08-19
X Protocol Version 11, Revision 0
Build Operating System: Linux 3.0.3-gentoo x86_64 Gentoo
Current Operating System: Linux tsdh 3.0.3-gentoo #3 SMP PREEMPT Mon Aug 22 11:29:51 CEST 2011 x86_64
Kernel command line: root=/dev/sda3 init=/bin/systemd quiet
Build Date: 21 August 2011 10:50:00AM
Current version of pixman: 0.22.2
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Mon Aug 22 12:20:57 2011
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
(EE) keyboard: No device specified.
(EE) PreInit returned 2 for "keyboard"
(EE) ioctl EVIOCGNAME failed: Inappropriate ioctl for device
(EE) PreInit returned 8 for "mouse"
(EE) SynPS/2 Synaptics TouchPad Unable to query/initialize Synaptics hardware.
(EE) PreInit returned 11 for "SynPS/2 Synaptics TouchPad"
(EE) Query no Synaptics: 6003C8
(EE) SynPS/2 Synaptics TouchPad Unable to query/initialize Synaptics hardware.
(EE) PreInit returned 11 for "SynPS/2 Synaptics TouchPad"
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
Backtrace:
0: /usr/bin/X (xorg_backtrace+0x28) [0x45fcb8]
1: /usr/bin/X (mieqEnqueue+0x1f3) [0x45a223]
2: /usr/bin/X (xf86PostMotionEventM+0x92) [0x47fc42]
3: /usr/bin/X (xf86PostMotionEventP+0x3c) [0x47fd3c]
4: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7fd32d581000+0x48fa) [0x7fd32d5858fa]
5: /usr/bin/X (0x400000+0x6d567) [0x46d567]
6: /usr/bin/X (0x400000+0x11c029) [0x51c029]
7: /lib64/libpthread.so.0 (0x7fd3308ab000+0xf430) [0x7fd3308ba430]
8: /lib64/libc.so.6 (ioctl+0x7) [0x7fd32f8af0c7]
9: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x7fd32de52d88]
10: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fd32d998000+0x3eee9) [0x7fd32d9d6ee9]
11: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fd32d998000+0x3fc7b) [0x7fd32d9d7c7b]
12: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fd32d998000+0x54ee7) [0x7fd32d9ecee7]
13: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fd32d998000+0x40688) [0x7fd32d9d8688]
14: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fd32d998000+0x710e5) [0x7fd32da090e5]
15: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fd32d998000+0x71423) [0x7fd32da09423]
16: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fd32d998000+0x72533) [0x7fd32da0a533]
17: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fd32d998000+0x53c05) [0x7fd32d9ebc05]
18: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fd32d998000+0x547fa) [0x7fd32d9ec7fa]
19: /usr/bin/X (0x400000+0xdaa05) [0x4daa05]
20: /usr/bin/X (0x400000+0xd3e9c) [0x4d3e9c]
21: /usr/bin/X (0x400000+0x30b41) [0x430b41]
22: /usr/bin/X (0x400000+0x24aed) [0x424aed]
23: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7fd32f7fdf0d]
24: /usr/bin/X (0x400000+0x24699) [0x424699]
xinit: connection to X server lost
waiting for X server to shut down
Can you please run

$ addr2line -e /usr/lib64/xorg/modules/drivers/intel_drv.so 0x3eee9 0x3fc7b 0x54ee7 0x40688 0x710e5 0x71423 0x72533 0x53c05 0x547fa

You just need to make sure that the intel_drv.so matches the one in use at the time of the crash (and that the debug symbols are available!).
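For anyone repeating this step, the offsets can be pulled out of the backtrace mechanically rather than copied by hand. The following is only a sketch: the helper names `module_offsets` and `addr2line_command` are made up for illustration, and the regular expression assumes frames formatted exactly like the ones in the backtrace above (`N: path (base+offset) [address]`). Frames that already show a symbol name, such as `(ioctl+0x7)`, are skipped on purpose.

```python
import re
import shlex

# Matches unsymbolised frames such as:
#   10: /usr/lib64/.../intel_drv.so (0x7fd32d998000+0x3eee9) [0x7fd32d9d6ee9]
# Group 1 is the module path, group 2 the offset inside that module.
FRAME_RE = re.compile(
    r"\d+: (\S+) \(0x[0-9a-f]+\+(0x[0-9a-f]+)\) \[0x[0-9a-f]+\]"
)


def module_offsets(backtrace, module_suffix):
    """Return the per-module offsets of all frames whose path ends with module_suffix."""
    return [
        offset
        for path, offset in FRAME_RE.findall(backtrace)
        if path.endswith(module_suffix)
    ]


def addr2line_command(module_path, offsets):
    """Build the addr2line invocation used in this thread (only the -e option)."""
    return "addr2line -e " + shlex.quote(module_path) + " " + " ".join(offsets)


if __name__ == "__main__":
    sample = (
        "9: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x7fd32de52d88]\n"
        "10: /usr/lib64/xorg/modules/drivers/intel_drv.so "
        "(0x7fd32d998000+0x3eee9) [0x7fd32d9d6ee9]\n"
    )
    offsets = module_offsets(sample, "intel_drv.so")
    print(addr2line_command("/usr/lib64/xorg/modules/drivers/intel_drv.so", offsets))
```

The offsets are only meaningful against the same intel_drv.so binary that produced the backtrace, which is the caveat already given above.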
(In reply to comment #3)
> Can you please do a
> $ addr2line -e /usr/lib64/xorg/modules/drivers/intel_drv.so 0x3eee9 0x3fc7b
> 0x54ee7 0x40688 0x710e5 0x71423 0x72533 0x53c05 0x547fa
>
> You just need to be careful that the intel_drv.so matches the one at the time
> of the crash (and the debug symbols are available!).

Oh, it wasn't compiled with debugging symbols. I've now recompiled it with --enable-sna, no stripping, and these flags:

CFLAGS="-mtune=native -O1 -pipe -g -ggdb"
CXXFLAGS="${CFLAGS}"

The new backtrace I got after crashing X by scrolling in Eclipse was this one:

[mi] EQ overflowing. The server is probably stuck in an infinite loop.
Backtrace:
0: /usr/bin/X (xorg_backtrace+0x28) [0x45fcb8]
1: /usr/bin/X (mieqEnqueue+0x1f3) [0x45a223]
2: /usr/bin/X (xf86PostMotionEventM+0x92) [0x47fc42]
3: /usr/bin/X (xf86PostMotionEventP+0x3c) [0x47fd3c]
4: /usr/lib64/xorg/modules/input/evdev_drv.so (0x7f561634b000+0x48fa) [0x7f561634f8fa]
5: /usr/bin/X (0x400000+0x6d567) [0x46d567]
6: /usr/bin/X (0x400000+0x11c029) [0x51c029]
7: /lib64/libpthread.so.0 (0x7f561966a000+0xf430) [0x7f5619679430]
8: /lib64/libc.so.6 (ioctl+0x7) [0x7f561866e0c7]
9: /usr/lib64/libdrm.so.2 (drmIoctl+0x28) [0x7f5616c11d88]
10: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f5616762000+0x3c1a0) [0x7f561679e1a0]
11: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f5616762000+0x3caab) [0x7f561679eaab]
12: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f5616762000+0x457f9) [0x7f56167a77f9]
13: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f5616762000+0x4d5b4) [0x7f56167af5b4]
14: /usr/bin/X (WakeupHandler+0x7b) [0x434afb]
15: /usr/bin/X (WaitForSomething+0x1bc) [0x45d73c]
16: /usr/bin/X (0x400000+0x308d2) [0x4308d2]
17: /usr/bin/X (0x400000+0x24aed) [0x424aed]
18: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7f56185bcf0d]
19: /usr/bin/X (0x400000+0x24699) [0x424699]
(EE) intel(0): Detected a hung GPU, disabling acceleration.
(EE) intel(0): When reporting this, please include i915_error_state from debugfs and the full dmesg.

Every time X crashed, my whole display became unusable (not even a console), and the file I piped stderr to didn't contain a backtrace. The backtrace above is from my 5th try, but since I had no console available, I could no longer run dmesg or check the i915_error_state in debugfs. :-(

Anyway, after a reboot, I ran addr2line with the addresses adapted to the current backtrace:

addr2line -e /usr/lib64/xorg/modules/drivers/intel_drv.so 0x3c1a0 0x3caab 0x457f9 0x4d5b4
/var/tmp/portage/x11-drivers/xf86-video-intel-2.16.0/work/xf86-video-intel-2.16.0/src/sna/kgem.c:216
/var/tmp/portage/x11-drivers/xf86-video-intel-2.16.0/work/xf86-video-intel-2.16.0/src/sna/kgem.c:586
/var/tmp/portage/x11-drivers/xf86-video-intel-2.16.0/work/xf86-video-intel-2.16.0/src/sna/sna_accel.c:3136
/var/tmp/portage/x11-drivers/xf86-video-intel-2.16.0/work/xf86-video-intel-2.16.0/src/sna/sna_driver.c:585
Ok, that's starting to make sense. We hung the GPU, waited a long time for the hangcheck, and to cap it all the fallback didn't work. The stack trace is also from the point at which we check on the GPU status -- it would be good if I could make that non-blocking!

As hinted, there should be a /sys/kernel/debug/dri/0/i915_error_state after the hang (but it will be lost on rebooting). You will need both CONFIG_DEBUG_FS compiled into the kernel and

mount -t debugfs debug /sys/kernel/debug

to make that error state accessible.

Thanks.
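Since the error state disappears on reboot, it is worth copying it somewhere persistent as soon as a console is reachable. A minimal sketch of that step, assuming debugfs is already mounted as described above and the script runs with enough privileges to read it; the function name `save_error_state` and the timestamped output file name are illustrative choices, not anything the driver or kernel provides.

```python
import time
from pathlib import Path

# Path quoted in this thread; it only exists once debugfs is mounted.
ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")


def save_error_state(src=ERROR_STATE, dest_dir=None):
    """Copy the error state to dest_dir and return the new path, or None if src is missing."""
    src = Path(src)
    if not src.is_file():
        return None
    dest_dir = Path(dest_dir) if dest_dir is not None else Path.cwd()
    dest = dest_dir / ("i915_error_state-" + time.strftime("%Y%m%d-%H%M%S"))
    # Read the content explicitly: virtual files commonly report a size of 0,
    # so the sketch avoids any copy strategy that depends on the reported size.
    dest.write_bytes(src.read_bytes())
    return dest


if __name__ == "__main__":
    saved = save_error_state()
    print(saved if saved else "no i915_error_state found (is debugfs mounted?)")
```

The saved file can then be attached to the bug report after rebooting.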
Created attachment 50456 [details] /sys/kernel/debug/dri/0/i915_error_state
Ok, this time I was able to get to a console after the crash. It seems that having no external monitor attached helps. The backtrace was exactly the same as last time (same addresses), so the file/line numbers are the same.

The i915_error_state is attached.

After the crash, dmesg reports these additional lines:

[ 418.972052] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[ 418.972058] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[ 418.972899] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 27009 at 27004, next 27010)
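The three dmesg lines above can be picked out of a full dmesg dump automatically. The sketch below is illustrative only: `summarize_hang` is a made-up helper, the patterns are written against the exact message text quoted above, and reading "awaiting 27009 at 27004" as "waiting for request 27009 while the GPU had only reached 27004" is an assumption about the message, not something the log states.

```python
import re

# Timestamp of each "Hangcheck timer elapsed" message.
HANG_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] \[drm:i915_hangcheck_elapsed\] \*ERROR\* Hangcheck timer elapsed"
)
# Return code and the three counters printed by i915_wait_request.
WAIT_RE = re.compile(
    r"i915_wait_request returns (-?\d+) \(awaiting (\d+) at (\d+), next (\d+)\)"
)


def summarize_hang(dmesg_text):
    """Collect hangcheck timestamps and failed waits from dmesg output."""
    hang_times = [float(t) for t in HANG_RE.findall(dmesg_text)]
    waits = []
    for ret, awaiting, current, nxt in WAIT_RE.findall(dmesg_text):
        waits.append({
            "ret": int(ret),            # -11 is -EAGAIN on Linux
            "awaiting": int(awaiting),
            "current": int(current),
            "next": int(nxt),
            # Assumed meaning: how far the GPU was behind the awaited request.
            "behind": int(awaiting) - int(current),
        })
    return {"hang_times": hang_times, "waits": waits}


if __name__ == "__main__":
    import sys
    print(summarize_hang(sys.stdin.read()))
```

Run as `dmesg | python3 script.py` (the script name is arbitrary) to get a short summary instead of scanning the log by eye.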
This is where it dies:

0x04c722fc: 0x79000002: 3DSTATE_DRAWING_RECTANGLE
0x04c72300: 0x00000000: top left: 0,0
0x04c72304: 0x00000000: bottom right: 0,0
0x04c72308: 0x00000000: origin: 0,0
0x04c7230c: 0x7b003c04: 3DPRIMITIVE: rect list sequential
0x04c72310: 0x00000003: vertex count
0x04c72314: 0x0000001e: start vertex
0x04c72318: 0x00000001: instance count
0x04c7231c: 0x00000000: start instance
0x04c72320: 0x00000000: index bias
0x04c72324: 0x02000004: MI_FLUSH

The 0,0 bottom-right looks slightly odd (a 1x1 surface) but should not be fatal. However, that batch exhibits large amounts of corruption that looks reminiscent of tiling, i.e. the GPU has overwritten that batch. That is a bad, bad sign.

./configure --enable-sna --enable-debug=full

may help spot the error, but it will slow your system down immensely with the extremely verbose logging.
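The "1x1 surface" remark follows from simple arithmetic if the bottom-right corner is inclusive. The sketch below works under two labelled assumptions: that each corner is packed into one dword with x in the low 16 bits and y in the high 16 bits, and that the bottom-right corner is inclusive. The packing is at least consistent with the blitter dump elsewhere in this thread, where 0x00c80140 is decoded as dst (320,200), but it is not confirmed here for 3DSTATE_DRAWING_RECTANGLE. The helper names are illustrative.

```python
def unpack_xy(dword):
    """Split a packed coordinate dword into (x, y).

    Assumption: x lives in bits 15:0 and y in bits 31:16.
    """
    return dword & 0xFFFF, (dword >> 16) & 0xFFFF


def drawing_rect_size(top_left, bottom_right):
    """Return (width, height), assuming the bottom-right corner is inclusive."""
    x0, y0 = unpack_xy(top_left)
    x1, y1 = unpack_xy(bottom_right)
    return x1 - x0 + 1, y1 - y0 + 1


if __name__ == "__main__":
    # Both corners are 0x00000000 in the dump above.
    print(drawing_rect_size(0x00000000, 0x00000000))
    # Packing check against the blitter dump in this thread.
    print(unpack_xy(0x00c80140))
```

Under those assumptions, a rectangle from 0,0 to 0,0 covers exactly one pixel, which is odd for a render target but not invalid.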
(In reply to comment #8)
> ./configure --enable-sna --enable-debug=full may help and spot the error, but
> it will slow your system down immensely with the extremely verbose logging.

With either --enable-debug=full or --enable-debug, X crashes directly at startup.

---------------------------------------------------------------------------------------
X.Org X Server 1.10.4
Release Date: 2011-08-19
X Protocol Version 11, Revision 0
Build Operating System: Linux 3.0.3-gentoo x86_64 Gentoo
Current Operating System: Linux thinkpad 3.0.3-gentoo #5 SMP PREEMPT Mon Aug 22 15:14:58 CEST 2011 x86_64
Kernel command line: root=/dev/sda3 init=/bin/systemd quiet
Build Date: 21 August 2011 10:50:00AM
Current version of pixman: 0.22.2
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Mon Aug 22 20:59:48 2011
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
sna_init_scrn
sna_pre_init
sna_open_drm_master
kgem_init: using vmap=0
kgem_init: has relaxed fencing=1
kgem_init: aperture low=134216704 [127], high=402650112 [383]
kgem_init: aperture mappable=268435456 [256]
kgem_init: max object size 134216704
kgem_init: max fences=14
sna_crtc_init: attached crtc[0] id=3, pipe=0
sna_crtc_init: attached crtc[1] id=4, pipe=1
sna_screen_init
sna_dri_open()
sna_dri_open: loading dri driver 'i965' [gen=40] for device '/dev/dri/card0'
uploaded 80128 bytes of static state
kgem_create_linear(80128)
kgem_create_linear: new handle=3
gem_write(handle=3, offset=0, len=80128)
sna_accel_init(backend=Broadwater, have_render=1)
sna_video_overlay_setup()
sna_video_overlay_update_attrs()
sna_uevent_init
sna_create_screen_resources(1920x1200@24)
sna_create_pixmap(1920, 1200, 24, usage=10)
kgem_can_create_2d(1920x1200, bpp=32, tiling=-1) = 1
sna_pixmap_force_to_gpu(pixmap=0x7fade5464010)
kgem_create_2d(1920x1200, bpp=32, tiling=1, exact=1, inactive=0)
searched 0 active, no match
new pitch=7680, tiling=1, handle=4, id=1
sna_pixmap_force_to_gpu: created gpu bo
sna_pixmap_move_to_gpu()
sna_pixmap_move_to_gpu: CPU damage? 0
realize_glyph_caches
sna_create_pixmap(1024, 1024, 8, usage=1)
sna_pixmap_create_scratch(1024, 1024, 8, tiling=2)
kgem_choose_tiling: 1024x1024 -> 2
kgem_can_create_2d(1024x1024, bpp=8, tiling=2) = 1
kgem_create_2d(1024x1024, bpp=8, tiling=2, exact=0, inactive=0)
searched 0 active, no match
new pitch=1024, tiling=2, handle=5, id=2
sna_create_pixmap(1024, 1024, 32, usage=1)
sna_pixmap_create_scratch(1024, 1024, 32, tiling=2)
kgem_choose_tiling: 1024x1024 -> 2
kgem_can_create_2d(1024x1024, bpp=32, tiling=2) = 1
kgem_create_2d(1024x1024, bpp=32, tiling=2, exact=0, inactive=0)
searched 0 active, no match
new pitch=4096, tiling=2, handle=6, id=3
kgem_create_linear(4096)
kgem_create_linear: new handle=7
sna_create_pixmap(0, 0, 24, usage=0)
kgem_create_for_name(name=1)
kgem_create_for_name: new handle=9
sna_blt_copy_boxes src=(0, 0) -> (320, 200) x 1, tiling=(0, 1), pitch=(5120, 7680)
sna_blt_copy_boxes: box=(0, 0)x(1280, 800)
_sna_damage_add_box(None + [(0, 0), (1280, 800)]) = [[(0, 0), (1280, 800)]: [(0, 0), (1280, 800)] + [0 : ...]]
sna_enter_vt
sna_crtc_set_mode_major(rotation=1, x=0, y=0, mode=1280x800@71000)
sna_crtc_set_mode_major: current fb pixmap = 0, front is 1
sna_mode_remove_fb: deleting fb id 0 for pixmap serial 0
sna_pixmap_force_to_gpu(pixmap=0x7fade5464010)
sna_pixmap_move_to_gpu()
sna_pixmap_move_to_gpu: CPU damage? 0
sna_crtc_set_mode_major: create fb 1920x1200@24/32
sna_crtc_set_mode_major: handle 4 attached to fb 18
batch[3/0]: 10 10 4096, nreloc=2, nexec=2, nfence=0, aperture=9216000
0x00000000: 0x54f00806: XY_SRC_COPY_BLT (rgb enabled, alpha enabled, src tile 0, dst tile 1)
0x00000004: 0x03cc0780: format 8888, dst pitch 1920, clipping disabled
0x00000008: 0x00c80140: dst (320,200)
0x0000000c: 0x03e80640: dst (1600,1000)
0x00000010: 0x00000000: dst offset 0x00000000 [handle=4, delta=0, read=2, write=2, (fenced? 0, tiling? 1)]
0x00000014: 0x00000000: src (0,0)
0x00000018: 0x00001400: src pitch 5120
0x0000001c: 0x00000000: src offset 0x00000000 [handle=9, delta=0, read=2, write=0 (fenced? 0, tiling? 0)]
0x00000020: 0x05000000: MI_BATCH_BUFFER_END
0x00000024: 0x00000000: MI_NOOP
kgem_create_linear(40)
kgem_create_linear: new handle=10
gem_write(handle=10, offset=0, len=40)
sna_crtc_apply: applying crtc [4] mode=1280x800@71000, fb=18 update to 1 outputs
sna_crtc_dpms(pipe 0, dpms mode -> 3):= active=0
sna_crtc_resize (1920, 1200) -> (1280, 800)

Backtrace:
0: /usr/bin/X (xorg_backtrace+0x28) [0x45fcb8]
1: /usr/bin/X (0x400000+0x64299) [0x464299]
2: /lib64/libpthread.so.0 (0x7fade9fa9000+0xf430) [0x7fade9fb8430]
3: /usr/lib64/xorg/modules/libfb.so (_fbGetWindowPixmap+0x2f) [0x7fade683945f]
4: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7fade7068000+0x59030) [0x7fade70c1030]
5: /usr/bin/X (0x400000+0x92c60) [0x492c60]
6: /usr/bin/X (xf86RandR12CreateScreenResources+0x1f7) [0x493987]
7: /usr/bin/X (0x400000+0x88db0) [0x488db0]
8: /usr/bin/X (0x400000+0x24994) [0x424994]
9: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7fade8efbf0d]
10: /usr/bin/X (0x400000+0x24699) [0x424699]
Segmentation fault at address 0x20

Fatal server error:
Caught signal 11 (Segmentation fault). Server aborting

Please consult the The X.Org Foundation support
at http://wiki.x.org
for help.
Please also check the log file at "/var/log/Xorg.0.log" for additional information.

sna_leave_vt
xinit: giving up
xinit: unable to connect to X server: Connection refused
xinit: unexpected signal 2
---------------------------------------------------------------------------------------

addr2line -e /usr/lib64/xorg/modules/drivers/intel_drv.so 0x59030
/var/tmp/portage/x11-drivers/xf86-video-intel-2.16.0/work/xf86-video-intel-2.16.0/src/sna/sna_display.c:1606
I forget which one exactly causes this X crash, but all of these are required (in some form or other):

http://cgit.freedesktop.org/~ickle/xserver/commit/?id=52e121737dc848337c0a6b360599356baffc3dd0
http://cgit.freedesktop.org/~ickle/xserver/commit/?id=8b36e3c9c8d00b5e3c292276171a509e6d1bdc1a
http://cgit.freedesktop.org/~ickle/xserver/commit/?id=787835b6a99f816288721bebdcad9f2ce7f3079c
http://cgit.freedesktop.org/~ickle/xserver/commit/?id=1c9d9b70c91480a4a9e5e4099d7d40a584ea1277
Created attachment 53178 [details]
i915_error_state output

I've been unable to use SNA on the same hardware for some months now. Since it's been in heavy development, I've been waiting to see if it stabilises, but given that this bug exists I've attached the error_state for my GPU lockup; hopefully it will help track down the problem.
More garbage in the batch buffer.
I believe these are all related to the underlying bug:

commit c501ae7f332cdaf42e31af30b72b4b66cbbb1604
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Dec 14 13:57:23 2011 +0100

    drm/i915: Only clear the GPU domains upon a successful finish

    By clearing the GPU read domains before waiting upon the buffer, we run
    the risk of the wait being interrupted and the domains prematurely
    cleared. The next time we attempt to wait upon the buffer (after
    userspace handles the signal), we believe that the buffer is idle and so
    skip the wait.

    There are a number of bugs across all generations which show signs of an
    overly haste reuse of active buffers.

    Such as:

    https://bugs.freedesktop.org/show_bug.cgi?id=29046
    https://bugs.freedesktop.org/show_bug.cgi?id=35863
    https://bugs.freedesktop.org/show_bug.cgi?id=38952
    https://bugs.freedesktop.org/show_bug.cgi?id=40282
    https://bugs.freedesktop.org/show_bug.cgi?id=41098
    https://bugs.freedesktop.org/show_bug.cgi?id=41102
    https://bugs.freedesktop.org/show_bug.cgi?id=41284
    https://bugs.freedesktop.org/show_bug.cgi?id=42141

    A couple of those pre-date i915_gem_object_finish_gpu(), so may be
    unrelated (such as a wild write from a userspace command buffer), but
    this does look like a convincing cause for most of those bugs.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: stable@kernel.org
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
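The ordering problem described in the commit message can be illustrated with a toy model. This is not kernel code: every name below (`Buffer`, `finish_gpu_buggy`, `finish_gpu_fixed`, `retry_after_signal`) is hypothetical, and the model reduces the wait to a flag. It only shows why clearing the read domains before an interruptible wait lets the retry skip the wait while the GPU is still busy.

```python
class Interrupted(Exception):
    """Stands in for a wait that is interrupted by a signal."""


class Buffer:
    def __init__(self):
        self.gpu_read_domains = 1   # non-zero: the GPU may still be using the buffer
        self.gpu_busy = True        # ground truth about the hardware in this toy model


def wait_rendering(buf, interrupted):
    """Wait for the GPU; either it is interrupted or the buffer becomes idle."""
    if interrupted:
        raise Interrupted()
    buf.gpu_busy = False


def finish_gpu_buggy(buf, interrupted=False):
    if buf.gpu_read_domains == 0:
        return                      # believed idle, so the wait is skipped
    buf.gpu_read_domains = 0        # cleared BEFORE the wait has succeeded
    wait_rendering(buf, interrupted)


def finish_gpu_fixed(buf, interrupted=False):
    if buf.gpu_read_domains == 0:
        return
    wait_rendering(buf, interrupted)
    buf.gpu_read_domains = 0        # cleared only after a successful wait


def retry_after_signal(finish):
    """Interrupt the first attempt, retry, and report whether the GPU is still busy."""
    buf = Buffer()
    try:
        finish(buf, interrupted=True)
    except Interrupted:
        pass                        # userspace handles the signal and retries
    finish(buf)
    return buf.gpu_busy


if __name__ == "__main__":
    print("buggy ordering, still busy after retry:", retry_after_signal(finish_gpu_buggy))
    print("fixed ordering, still busy after retry:", retry_after_signal(finish_gpu_fixed))
```

In the buggy ordering the retry returns immediately while the buffer is still in use, which in this toy model corresponds to the over-hasty reuse of active buffers that the commit message describes.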
to mark dup
*** This bug has been marked as a duplicate of bug 29046 ***
Closing resolved+duplicate as duplicate of closed+fixed.