Bug 30083

Summary: GL cairo-perf-trace oops
Product: DRI
Reporter: Sitsofe Wheeler <sitsofe>
Component: DRM/Intel
Assignee: Chris Wilson <chris>
Status: CLOSED FIXED
QA Contact:
Severity: normal
Priority: medium
Version: XOrg git
Hardware: Other
OS: All
Whiteboard:
i915 platform: i915 features:

Description Sitsofe Wheeler 2010-09-08 02:39:29 UTC
Description of the problem:
When running cairo-perf-trace on the firefox-talos-svg benchmark, X becomes
slow and then locks up completely on a desktop with a 965G. Lots of
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
messages appear in dmesg, and eventually the kernel BUG at
drivers/gpu/drm/i915/i915_gem.c:2566 (which in this tree is
"BUG_ON(obj->read_domains & I915_GEM_GPU_DOMAINS)") is hit.
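
For readers unfamiliar with the assertion: each GEM object tracks which caching
domains may currently hold its data, and the BUG_ON fires when an object still
claims a GPU read domain at a point where those domains should already have
been flushed. Below is a minimal standalone sketch of the flag arithmetic; the
domain bit values are copied from the stable i915_drm.h uapi header, but the
exact I915_GEM_GPU_DOMAINS definition shown is an assumption modelled on
kernels of this era, so check your tree's i915_drv.h.

/* Standalone illustration of the domain check behind the BUG_ON.
 * Domain bit values are from the i915_drm.h uapi header; the
 * GPU_DOMAINS mask below (everything that is not CPU or GTT) is an
 * assumption modelled on i915_drv.h of this era.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define I915_GEM_DOMAIN_CPU         0x00000001
#define I915_GEM_DOMAIN_RENDER      0x00000002
#define I915_GEM_DOMAIN_SAMPLER     0x00000004
#define I915_GEM_DOMAIN_COMMAND     0x00000008
#define I915_GEM_DOMAIN_INSTRUCTION 0x00000010
#define I915_GEM_DOMAIN_VERTEX      0x00000020
#define I915_GEM_DOMAIN_GTT         0x00000040

/* Everything that is not CPU or GTT counts as a GPU domain. */
#define I915_GEM_GPU_DOMAINS \
        (~(I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT))

struct fake_gem_object {
        uint32_t read_domains;
        uint32_t write_domain;
};

int main(void)
{
        /* After a clean flush the object is only in CPU/GTT domains. */
        struct fake_gem_object ok = { .read_domains = I915_GEM_DOMAIN_GTT };
        assert(!(ok.read_domains & I915_GEM_GPU_DOMAINS));

        /* A hang can leave a stale GPU read domain behind; this is the
         * state that trips the BUG_ON in i915_gem_object_bind_to_gtt. */
        struct fake_gem_object stale = {
                .read_domains = I915_GEM_DOMAIN_RENDER,
        };
        printf("stale object would trip BUG_ON: %s\n",
               (stale.read_domains & I915_GEM_GPU_DOMAINS) ? "yes" : "no");
        return 0;
}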

Steps to reproduce:
1. Run
CAIRO_TEST_TARGET=gl ./cairo-perf-trace -r -v -i3 benchmark/firefox-talos-svg

Expected result:
Benchmark to run to completion, X to be responsive throughout.

Actual result:
X becomes increasingly slow before freezing with an unchanging picture on the
screen. Eventually the following appears in dmesg:

[  322.322899] ------------[ cut here ]------------
[  322.322929] kernel BUG at drivers/gpu/drm/i915/i915_gem.c:2566!
[  322.322957] invalid opcode: 0000 [#1] SMP 
[  322.322982] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.0/usb5/5-2/devnum
[  322.323002] CPU 1 
[  322.323002] Pid: 3141, comm: lt-cairo-perf-t Not tainted 2.6.36-rc3+ #116 DG965WH/        
[  322.323002] RIP: 0010:[<ffffffff812a990a>]  [<ffffffff812a990a>] i915_gem_object_bind_to_gtt+0x295/0x2ed
[  322.323002] RSP: 0018:ffff8801292e3bc8  EFLAGS: 00010206
[  322.323002] RAX: ffff8800bcb3e290 RBX: ffff880037ab4200 RCX: ffff880000010000
[  322.323002] RDX: ffff880037ab4290 RSI: ffff8801286eccb8 RDI: ffff8801286eccc8
[  322.323002] RBP: ffff8801292e3c08 R08: 000000000000077f R09: 0000000000000246
[  322.323002] R10: 0000000000080000 R11: 000000000a3a7000 R12: 0000000000000000
[  322.323002] R13: ffff88012b086800 R14: ffff88012aad13a0 R15: 0000000000001000
[  322.323002] FS:  00007fbecbbe1840(0000) GS:ffff880001c80000(0000) knlGS:0000000000000000
[  322.323002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  322.323002] CR2: 00007fbec81d5000 CR3: 0000000037912000 CR4: 00000000000006e0
[  322.323002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  322.323002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  322.323002] Process lt-cairo-perf-t (pid: 3141, threadinfo ffff8801292e2000, task ffff88011a6dca60)
[  322.323002] Stack:
[  322.323002]  ffff88012aad1300 00001200292e3dd8 ffff8801292e3c18 ffff880037ab4200
[  322.323002] <0> ffff88012b086800 0000000000000000 ffff880037ab4200 0000000000000087
[  322.323002] <0> ffff8801292e3c38 ffffffff812aa9e9 ffffc9001c303a80 0000000000000000
[  322.323002] Call Trace:
[  322.323002]  [<ffffffff812aa9e9>] i915_gem_object_pin+0xe0/0x171
[  322.323002]  [<ffffffff812ad3d7>] i915_gem_do_execbuffer+0x514/0xe4b
[  322.323002]  [<ffffffff810c164f>] ? __vmalloc_node+0x7d/0x8c
[  322.323002]  [<ffffffff812acec1>] ? drm_malloc_ab+0x49/0x4b
[  322.323002]  [<ffffffff812addd7>] i915_gem_execbuffer2+0xc9/0x128
[  322.323002]  [<ffffffff8128cae5>] drm_ioctl+0x280/0x35f
[  322.323002]  [<ffffffff811c2a75>] ? inode_has_perm+0x75/0x8b
[  322.323002]  [<ffffffff812add0e>] ? i915_gem_execbuffer2+0x0/0x128
[  322.323002]  [<ffffffff811c2b2a>] ? file_has_perm+0x9f/0xc1
[  322.323002]  [<ffffffff810e2fa2>] do_vfs_ioctl+0x4aa/0x4f9
[  322.323002]  [<ffffffff810e3042>] sys_ioctl+0x51/0x77
[  322.323002]  [<ffffffff81001feb>] system_call_fastpath+0x16/0x1b
[  322.323002] Code: 00 00 00 49 89 96 a8 13 00 00 49 81 c6 a0 13 00 00 48 89 83 98 00 00 00 4c 89 b3 90 00 00 00 48 89 10 f7 43 6c be ff ff ff 74 04 <0f> 0b eb fe f7 43 70 be ff ff ff 74 04 0f 0b eb fe 44 8b b3 d8 
[  322.323002] RIP  [<ffffffff812a990a>] i915_gem_object_bind_to_gtt+0x295/0x2ed
[  322.323002]  RSP <ffff8801292e3bc8>
[  322.395033] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung

How reproducible is the problem:
It is reproducible every time.

Version information:
Fedora 13 x86_64
00:02.0 VGA compatible controller: Intel Corporation 82G965 Integrated Graphics
Controller (rev 02)
xorg-x11-server-Xorg-1.8.2-3.fc13.x86_64
xorg-x11-drv-intel-2.11.0-5.fc13.x86_64
mesa-dri-drivers-7.8.1-8.fc13.x86_64
mesa-libGL-7.8.1-8.fc13.x86_64
Kernel: 2.6.36-rc3+8554048070906579ec9fa19ac381deddd2d7b155 from
drm-intel-fixes

Additional information:
The dmesg output of Bug #30082 ( https://bugs.freedesktop.org/attachment.cgi?id=38541 ) shows the messages leading up to this error.
Comment 1 Chris Wilson 2010-09-30 10:15:57 UTC
Fixed in drm-intel-next. After a hang, we need to fixup the read/write domain.

commit 535d8a3c89e4d7bcc1a32625b96ebd179dcf5ba8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Sep 30 15:08:57 2010 +0100

    drm/i915: Force the domain to CPU on unbinding whilst wedged.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30083
    Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
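
The commit message above carries no diff, so the following is only a rough,
standalone model of the shape such a fix takes: when the GPU is wedged the
normal set-to-CPU-domain path cannot wait for the hung GPU, so unbind abandons
the stale caches and forces the domains to CPU anyway, preventing the later
BUG_ON. Function and field names mirror the i915 code of that era but this is
a paraphrase, not the actual commit.

/* Standalone model of "force the domain to CPU on unbinding whilst
 * wedged".  This is a paraphrase of the fix's shape, not the diff.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define I915_GEM_DOMAIN_CPU    0x00000001
#define I915_GEM_DOMAIN_RENDER 0x00000002
#define I915_GEM_DOMAIN_GTT    0x00000040
#define I915_GEM_GPU_DOMAINS   (~(I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT))

struct obj {
        uint32_t read_domains;
        uint32_t write_domain;
};

/* Stand-in for i915_gem_object_set_to_cpu_domain(): fails while wedged
 * because it cannot wait for the hung GPU to flush its caches. */
static int set_to_cpu_domain(struct obj *o, bool wedged)
{
        if (wedged)
                return -EIO;
        o->read_domains = I915_GEM_DOMAIN_CPU;
        o->write_domain = I915_GEM_DOMAIN_CPU;
        return 0;
}

static int unbind(struct obj *o, bool wedged)
{
        int ret = set_to_cpu_domain(o, wedged);
        if (ret) {
                /* In the event of a disaster, abandon all caches and
                 * hope for the best: force the domain to CPU so that
                 * later rebinding cannot see stale GPU domains. */
                o->read_domains = I915_GEM_DOMAIN_CPU;
                o->write_domain = I915_GEM_DOMAIN_CPU;
        }
        return 0;
}

int main(void)
{
        struct obj o = { .read_domains = I915_GEM_DOMAIN_RENDER };
        unbind(&o, true);
        printf("read_domains after wedged unbind: %#x (GPU bits: %s)\n",
               (unsigned)o.read_domains,
               (o.read_domains & I915_GEM_GPU_DOMAINS) ? "set" : "clear");
        return 0;
}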
Comment 2 Sitsofe Wheeler 2010-10-02 10:56:59 UTC
OK, something different happens with drm-intel-next (ae681d969ac0946e09636f2bef7a126d73e1ad6b):

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 708 at 695, next 709)
[drm:i915_do_wait_request] *ERROR* something (likely vbetool) disabled interrupts, re-enabling
[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 715 at 695, next 716)
[drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[drm:i915_reset] *ERROR* Failed to reset chip.
lt-cairo-perf-t[2690]: segfault at 0 ip 00007ff30caa3128 sp 00007fff821cbb00 error 6 in i965_dri.so[7ff30ca6f000+237000]

Should the GPU be getting wedged here?
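
For context on the wedging: the reset path treats back-to-back hangs as
unrecoverable, which is where the "GPU hanging too fast, declaring wedged!"
message comes from. Below is a standalone sketch of that heuristic; the
5-second window, the field names, and the overall structure are assumptions
modelled on the i915_reset() code of that era, not a verbatim copy.

/* Standalone sketch of the "hanging too fast" heuristic.  The window
 * and names are assumptions modelled on i915_reset() of this era.
 */
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

struct drm_i915_private {
        time_t last_gpu_reset;
        bool wedged;
};

static void i915_reset_model(struct drm_i915_private *dev_priv, time_t now)
{
        if (now - dev_priv->last_gpu_reset < 5) {
                /* Two hangs within a few seconds: give up on the GPU
                 * rather than looping through resets forever. */
                fprintf(stderr, "GPU hanging too fast, declaring wedged!\n");
                dev_priv->wedged = true;
                return;
        }
        dev_priv->last_gpu_reset = now;
        /* ...chipset-specific reset would happen here... */
}

int main(void)
{
        struct drm_i915_private dp = { .last_gpu_reset = 100 };
        i915_reset_model(&dp, 103);   /* second hang 3s later -> wedged */
        printf("wedged: %s\n", dp.wedged ? "yes" : "no");
        return 0;
}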
