Summary: | [965GM] hang during firefox-talos-svg | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | Sitsofe Wheeler <sitsofe> | ||||||||||||
Component: | DRM/Intel | Assignee: | Chris Wilson <chris> | ||||||||||||
Status: | CLOSED WORKSFORME | QA Contact: | |||||||||||||
Severity: | normal | ||||||||||||||
Priority: | medium | CC: | eugeni | ||||||||||||
Version: | XOrg git | ||||||||||||||
Hardware: | x86-64 (AMD64) | ||||||||||||||
OS: | Linux (All) | ||||||||||||||
Whiteboard: | |||||||||||||||
i915 platform: | i915 features: | ||||||||||||||
Attachments: |
|
Created attachment 38542 [details]
Output of /sys/kernel/debug/dri/0/i915_error_state
Version information: mesa-dri-drivers-7.8.1-8.fc13.x86_64 mesa-libGL-7.8.1-8.fc13.x86_64 Finally my i965 has a working mesa, previously it just segfaulted when trying to run cairo-gl. Don't see any hangs so far... With 0e49892ed0bd2c34c1c3461d38a409ae18bfdc64 from drm-intel-staging I still get a hang and the following is printed: [ 374.318024] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung [ 374.318793] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 8052 at 8029) [ 374.415003] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung [...] [ 449.829780] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 10689 at 8029) [ 449.829796] [drm:i915_gem_do_execbuffer] *ERROR* Failed to pin buffer 239 of 251, total 105504768 bytes, 0 fences: -5 [ 449.829799] [drm:i915_gem_do_execbuffer] *ERROR* 5520 objects [241 pinned], 264699904 object bytes [106901504 pinned], 179945472/260308992 gtt bytes [ 449.844200] ------------[ cut here ]------------ [ 449.845005] kernel BUG at drivers/gpu/drm/i915/i915_gem.c:2575! [ 449.845005] invalid opcode: 0000 [#1] [ 449.845005] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sr0/uevent [ 449.845005] CPU 0 [ 449.845005] Pid: 2755, comm: lt-cairo-perf-t Not tainted 2.6.36-rc5+ #4 DG965WH/ [ 449.845005] RIP: 0010:[<ffffffff811db1ba>] [<ffffffff811db1ba>] i915_gem_object_bind_to_gtt+0x298/0x2f0 [ 449.845005] RSP: 0018:ffff8801247e1bd8 EFLAGS: 00010202 [ 449.845005] RAX: ffff8800c7e30e90 RBX: ffff8800ca16e200 RCX: ffff880000010000 [ 449.845005] RDX: ffff8800ca16e290 RSI: ffff880128c77eb8 RDI: ffff880128c77e00 [ 449.845005] RBP: ffff8801247e1c18 R08: 000000000000077f R09: 0000000000000246 [ 449.845005] R10: 0000000000000280 R11: 00000000078b8000 R12: 0000000000000000 [ 449.845005] R13: ffff88012be8e800 R14: ffff88012be23350 R15: 0000000000001000 [ 449.845005] FS: 00007fb15d0cf840(0000) GS:ffffffff8161b000(0000) knlGS:0000000000000000 [ 449.845005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 449.845005] CR2: 00007fb159e05000 CR3: 00000001247f8000 CR4: 00000000000006b0 [ 449.845005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 449.845005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 449.845005] Process lt-cairo-perf-t (pid: 2755, threadinfo ffff8801247e0000, task ffff88012b3405b0) [ 449.845005] Stack: [ 449.845005] ffff88012be232c0 00001200247e1de8 ffff880037816c00 ffff8800ca16e200 [ 449.845005] <0> ffff88012be8e800 0000000000000000 ffff8800ca16e200 000000000000003e [ 449.845005] <0> ffff8801247e1c48 ffffffff811dc2dd ffffc90019587460 0000000000000000 [ 449.845005] Call Trace: [ 449.845005] [<ffffffff811dc2dd>] i915_gem_object_pin+0xe5/0x170 [ 449.845005] [<ffffffff811dec4c>] i915_gem_do_execbuffer+0x518/0xe4f [ 449.845005] [<ffffffff81090f5d>] ? __vmalloc_node+0x82/0x92 [ 449.845005] [<ffffffff811de732>] ? drm_malloc_ab+0x4e/0x50 [ 449.845005] [<ffffffff811df651>] i915_gem_execbuffer2+0xce/0x12c [ 449.845005] [<ffffffff811bf736>] drm_ioctl+0x273/0x339 [ 449.845005] [<ffffffff811df583>] ? i915_gem_execbuffer2+0x0/0x12c [ 449.845005] [<ffffffff8111f801>] ? file_has_perm+0x95/0xa3 [ 449.845005] [<ffffffff810a79ef>] do_vfs_ioctl+0x4a1/0x4f0 [ 449.845005] [<ffffffff812dcae7>] ? schedule+0x193/0x37e [ 449.845005] [<ffffffff810a7a94>] sys_ioctl+0x56/0x7a [ 449.845005] [<ffffffff81001e98>] system_call_fastpath+0x16/0x1b [ 449.845005] Code: 00 00 00 49 89 96 58 13 00 00 49 81 c6 50 13 00 00 48 89 83 98 00 00 00 4c 89 b3 90 00 00 00 48 89 10 f7 43 6c be ff ff ff 74 04 <0f> 0b eb fe f7 43 70 be ff ff ff 74 04 0f 0b eb fe 44 8b b3 d8 [ 449.845005] RIP [<ffffffff811db1ba>] i915_gem_object_bind_to_gtt+0x298/0x2f0 [ 449.845005] RSP <ffff8801247e1bd8> [ 449.926679] ---[ end trace 75509941ef16036b ]--- I think X hangs in the D state when this happens... (The assertion being hit on i915_gem.c:2575 is BUG_ON(obj->read_domains & I915_GEM_GPU_DOMAINS); ) Sigh - tested the wrong kernel. I should have used -next rather than -staging and the symptoms should really have been reported back in bug #30083... Sitsofe, thanks for testing again... I don't think I'm going to apply the domain fixup to -fixes since we so late in the cycle and this is an error path that we should not hit in the first place. Any chance you can try a more recent version of mesa? I've just tried a 32 bit Ubuntu 10.10 live cd which had the following packages: libgl1-mesa-dri 7.9~git20100924-0ubuntu2 libgl1-mesa-glx 7.9~git20100924-0ubuntu2 xserver-xorg 1:7.5+6ubuntu3 xserver-xorg-video-intel 2:2.12.0-1ubuntu5 linux-image-2.6.35-22-generic 2.6.35-22.33 and it had the same issue as described in comment #1. Still here in the latest Fedora 15 pre-pre-release. At first everything is frozen and then after a few seconds the screen goes black and stays black no matter what (the system still appears to be alive though). 00:02.0 VGA compatible controller [0300]: Intel Corporation 82G965 Integrated Graphics Controller [8086:29a2] (rev 02) Kernel: 2.6.38-0.rc6.git6.1.fc15.x86_64 xorg-x11-drv-intel-2.14.0-2.fc15.x86_64 xorg-x11-server-Xorg-1.9.99.1-6.20101201.fc15.x86_64 dmesg snippet to be attached. Created attachment 44370 [details]
dmesg output
Still here with rawhide on this t61 (965gm)... Rawhide with drm-intel-next, mesa.git and cairo.git (+ deps). WORKSFORME. :| Just recently I fixed mesa to handle the URB_FENCE erratum on 965GM. If we cross our fingers, that might just be the issue. And we also found a bug where HWS could only handle being allocated in the low 4GiB, and there is are further unfixed errata regarding handling of pages >4GiB by the i965. Do you have more than 4GiB and, if so, does mem=3g help? Can you try rawhide (or at least mesa.git and drm-intel-next) and grab a few i915_error_states if the hang persists? Thanks. > Do you have more than 4GiB No - this machine has 4GiB exactly. The previous results were from a LiveCD on http://download.fedora.redhat.com/pub/alt/nightly-composes/desktop/ which was probably rawhide (see http://www.mail-archive.com/fedora-devel-list@redhat.com/msg10108.html ). Unfortunately, due to the nature of the system in question I can compile and test later kernels but I can't switch anything in userland unless I boot temporarily to a LiveCD. I have tried a later kernel from drm-intel-next with the old Fedora 13 userland: Kernel: 2.6.38-rc7+ (47ae63e0c2e5fdb582d471dc906eb29be94c732f drm-intel-next) xorg-x11-drv-intel-2.11.0-5.fc13.x86_64 mesa-dri-drivers-7.8.2-1.fc13.x86_64 Output to be attached. Created attachment 44483 [details]
i915_error_state 2.6.38-rc7+
Created attachment 44485 [details]
dmesg 2.6.38-rc7+
Hmm, still nothing obvious. We need to find a way of getting mesa.git onto your test machine so we can experiment. A very quick heads up: Testing with an Ubuntu Natty LiveCD with all the pieces updated to xorg-edgers and after pulling in goodness knows how many packages AND updating my cairo directory to the latest git version it ran the test fine. Unfortunately I don't know which piece fixed things. Versions: xserver-xorg-core 2:1.10.0+git20110307+server-1.10-branch.35503964-0ubuntu0sarvatt libgl1-mesa-dri 7.11.0+git20110316.b47d5084-0ubuntu0sarvatt libgl1-mesa-glx 7.10.1-0ubuntu2 linux-image-2.6.38-6-generic 2.6.38-6.34 libdrm-intel1 2.4.24+git20110307.3b04c736-0ubuntu0sarvatt xserver-xorg-video-intel 2:2.14.901+git20110316.ee740778-0ubuntu0sarvatt It looks like a newer versions of cairo can't trigger this problem on setups that have problems with the older cairo. I tried cairo.git on the previously mentioned Fedora 15 LiveCD and it caused a segfault in one of the i915 libraries (but X was seemingly unharmed). However, an older cairo caused the kernel errors and a lock-up. Newer cairo has also become much more difficult to test on older systems due to its dependency on a new pixman. I should also note that stock Ubuntu Natty alphas do not show the problem with a cairo.git but gcc refuses to compile my older cairo on Ubuntu Natty. Hello I've just got the same bug: Jul 11 01:07:36 valkyrie kernel: [ 5646.216051] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung Jul 11 01:07:36 valkyrie kernel: [ 5646.216104] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 227670 at 227669, next 227671) Jul 11 01:07:36 valkyrie kernel: [ 5646.216205] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000 Jul 11 01:07:36 valkyrie kernel: [ 5646.216912] [drm] Changing LVDS panel from (+hsync, +vsync) to (+hsync, -vsync) Jul 11 01:07:36 valkyrie kernel: [ 5646.217072] ------------[ cut here ]------------ Jul 11 01:07:36 valkyrie kernel: [ 5646.217089] WARNING: at drivers/gpu/drm/i915/intel_display.c:1079 intel_enable_pipe+0x111/0x160() Jul 11 01:07:36 valkyrie kernel: [ 5646.217095] Hardware name: Vostro1310 Jul 11 01:07:36 valkyrie kernel: [ 5646.217101] PLL state assertion failure (expected on, current off) Jul 11 01:07:36 valkyrie kernel: [ 5646.217105] Modules linked in: hidp bnep rfcomm sdhci_pci iwl3945 sdhci btusb bluetooth firewire_ohci iwl_legacy dell_wmi wmi Jul 11 01:07:36 valkyrie kernel: [ 5646.217136] Pid: 735, comm: kworker/u:2 Not tainted 2.6.39-gentoo-r2 #1 Jul 11 01:07:36 valkyrie kernel: [ 5646.217142] Call Trace: Jul 11 01:07:36 valkyrie kernel: [ 5646.217155] [<c1035ab8>] ? warn_slowpath_common+0x78/0xb0 Jul 11 01:07:36 valkyrie kernel: [ 5646.217165] [<c12c18e1>] ? intel_enable_pipe+0x111/0x160 Jul 11 01:07:36 valkyrie kernel: [ 5646.217174] [<c12c18e1>] ? intel_enable_pipe+0x111/0x160 Jul 11 01:07:36 valkyrie kernel: [ 5646.217182] [<c1035b83>] ? warn_slowpath_fmt+0x33/0x40 Jul 11 01:07:36 valkyrie kernel: [ 5646.217191] [<c12c18e1>] ? intel_enable_pipe+0x111/0x160 Jul 11 01:07:36 valkyrie kernel: [ 5646.217200] [<c12c6ad0>] ? intel_crtc_mode_set+0xab0/0x1b60 Jul 11 01:07:36 valkyrie kernel: [ 5646.217211] [<c129d097>] ? drm_vblank_off+0x57/0x1b0 Jul 11 01:07:36 valkyrie kernel: [ 5646.217219] [<c12bfe77>] ? i8xx_enable_fbc+0x167/0x1c0 Jul 11 01:07:36 valkyrie kernel: [ 5646.217227] [<c12bea4a>] ? i965_update_wm+0x19a/0x220 Jul 11 01:07:36 valkyrie kernel: [ 5646.217236] [<c12c2045>] ? i9xx_crtc_disable+0xc5/0x130 Jul 11 01:07:36 valkyrie kernel: [ 5646.217245] [<c129565b>] ? drm_crtc_helper_set_mode+0x26b/0x3a0 Jul 11 01:07:36 valkyrie kernel: [ 5646.217258] [<c1295814>] ? drm_helper_resume_force_mode+0x84/0x270 Jul 11 01:07:36 valkyrie kernel: [ 5646.217268] [<c12ac335>] ? i915_reset+0x245/0x420 Jul 11 01:07:36 valkyrie kernel: [ 5646.217276] [<c12b0447>] ? i915_error_work_func+0xc7/0x120 Jul 11 01:07:36 valkyrie kernel: [ 5646.217286] [<c104a2b5>] ? process_one_work+0x105/0x350 Jul 11 01:07:36 valkyrie kernel: [ 5646.217293] [<c12b0380>] ? i915_get_vblank_counter+0xb0/0xb0 Jul 11 01:07:36 valkyrie kernel: [ 5646.217302] [<c104a8b1>] ? worker_thread+0x121/0x3c0 Jul 11 01:07:36 valkyrie kernel: [ 5646.217310] [<c104a790>] ? rescuer_thread+0x290/0x290 Jul 11 01:07:36 valkyrie kernel: [ 5646.217317] [<c104de74>] ? kthread+0x74/0x80 Jul 11 01:07:36 valkyrie kernel: [ 5646.217325] [<c104de00>] ? kthread_worker_fn+0x170/0x170 Jul 11 01:07:36 valkyrie kernel: [ 5646.217335] [<c15a8436>] ? kernel_thread_helper+0x6/0xd Jul 11 01:07:36 valkyrie kernel: [ 5646.217341] ---[ end trace b94f1897cc05f11d ]--- Jul 11 01:07:36 valkyrie kernel: [ 5646.321139] ------------[ cut here ]------------ Before this bug occured I've only surfed web with one of webkit based browsers. No suspend-resume, no 3D graphics and desktop effects are off. I'm using Gentoo with: Vostro 1310 laptop 00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c) 00:02.1 Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c) linux-2.6.39.2 xorg-server-1.10.2 mesa-7.10.2, classic mode xf86-video-intel-2.15.0-r1 (yes, gentoo patched) libdrm-2.4.26 dri2proto-2.6 Reproduceablity: Yes, this bug happens *sometimes*. It may pass a week before appear or appear in several hours. Sorry, I'll attach i915_error_state later. Anything else to add? I can build debug kernel and do some debugging over ssh, what should I be looking for? Thanks, Andrey Andrey, it is highly unlikely you have the same bug, so please when you come to attach your error-state can you do so in new bug report. And can you include the full dmesg along with Xorg.log? Thanks. Sitosfe, I hope your still having fun. :) As we made no progress on identifying the crash scenario and it appears to have resolved itself by cairo doing things differently, I can only hope that the bug has been magically fixed with all the mesa/i965 work... |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.
Created attachment 38541 [details] dmesg output Description of the problem: When running cairo-perf-trace on the firefox-talos-svg benchmark X starts becoming slow before locking up completely on a desktop with a 965G. Lots of [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung messages appear in dmesg. The machine has to be rebooted to get a working X back - restarting X just results in a black screen with a cursor on top. Steps to reproduce: 1. Run CAIRO_TEST_TARGET=gl ./cairo-perf-trace -r -v -i3 benchmark/firefox-talos-svg Expected result: Benchmark to run to completion, X to be responsive throughout. Actual result: X becomes increasingly slower before becoming frozen with an unchanging picture on the screen. Messages like: [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 1820 at 1797) appear frequently in dmesg after the test starts. How reproducible is the problem: It is reproducible every time. Version information: Fedora 13 x86_64 00:02.0 VGA compatible controller: Intel Corporation 82G965 Integrated Graphics Controller (rev 02) xorg-x11-server-Xorg-1.8.2-3.fc13.x86_64 xorg-x11-drv-intel-2.11.0-5.fc13.x86_64 Kernel: 2.6.36-rc3+8554048070906579ec9fa19ac381deddd2d7b155 from drm-intel-fixes (Hangchecks are also triggered with Fedora 13's 2.6.34.6-47.fc13.x86_64)