I'm running Fedora Rawhide, and I'm also using git builds of drm and the xf86-intel driver together with a vanilla kernel. With every combination I always get a GPU deadlock while doing a pretty simple operation such as resizing the glxgears window.
I start glxgears, grab the bottom right corner and start resizing the window. I'm using the Metacity window manager with opaque resize. After a few seconds of resizing it deadlocks my T61 (4GB, GMA965), and the GPU has to be reset via the setpci trick.
Sometimes, shortly after Xorg starts, I see these messages in the log:
[drm:i915_gem_object_bind_to_gtt] *ERROR* GTT full, but LRU list empty
[drm:i915_gem_object_pin] *ERROR* Failure to bind: -12
[drm:i915_gem_evict_something] *ERROR* inactive empty 1 request empty 1 flushing empty 1
Looks like some counters are broken?
When I run the vanilla kernel, I can sometimes find this message in the log:
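For anyone triaging similar logs, here is a minimal Python sketch that pulls the `[drm:...] *ERROR*` lines out of a log and decodes a trailing negative error code. It assumes the messages have exactly the shape quoted above; the function name `parse_drm_errors` is made up for illustration. On Linux, errno 12 is ENOMEM, so "Failure to bind: -12" would indicate an out-of-memory style failure while binding the object.

```python
import errno
import re

# Matches lines such as "[drm:i915_gem_object_pin] *ERROR* Failure to bind: -12"
DRM_ERROR = re.compile(r"\[drm:(?P<func>\w+)\] \*ERROR\* (?P<msg>.*)")
# A message ending in ": -<number>" is treated as a negative errno (assumption).
TRAILING_ERRNO = re.compile(r":\s*-(\d+)\s*$")


def parse_drm_errors(log_text):
    """Return (function, message, errno_name_or_None) for each DRM error line."""
    results = []
    for line in log_text.splitlines():
        match = DRM_ERROR.search(line)
        if not match:
            continue
        msg = match.group("msg").strip()
        code = TRAILING_ERRNO.search(msg)
        name = errno.errorcode.get(int(code.group(1))) if code else None
        results.append((match.group("func"), msg, name))
    return results
```

Feeding it the three lines quoted above should yield one entry per line, with only the `i915_gem_object_pin` entry carrying an errno name.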
kernel BUG at drivers/gpu/drm/i915/i915_gem.c:1655!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
Modules linked in: usbhid hid nls_iso8859_1 nls_cp1250 vfat fat mmc_block fuse ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc rfcomm sco l2cap autofs4 sunrpc ipv6 binfmt_misc dm_mirror dm_region_hash dm_log dm_mod kvm_intel kvm i915 drm i2c_algo_bit uinput arc4 ecb cryptomgr aead snd_hda_codec_analog crypto_blkcipher crypto_hash snd_hda_intel snd_hda_codec snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss crypto_algapi sdhci_pci snd_mixer_oss i2c_i801 sr_mod rtc_cmos btusb bluetooth sdhci iwl3945 snd_pcm mac80211 lib80211 cdrom rtc_core i2c_core mmc_core rtc_lib cfg80211 thinkpad_acpi rfkill backlight evdev led_class snd_timer psmouse serio_raw iTCO_wdt iTCO_vendor_support nvram e1000e intel_agp snd soundcore snd_page_alloc battery ac button uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: microcode]
Pid: 11755, comm: glxgears Not tainted 2.6.29 #49 6464CTO
RIP: 0010:[<ffffffffa02e5b49>] [<ffffffffa02e5b49>] i915_gem_object_get_fence_reg+0x769/0x800 [i915]
RSP: 0018:ffff880130a67c98 EFLAGS: 00010202
RAX: ffff8801310ae840 RBX: ffff8801310ae9c0 RCX: 000000000000001e
RDX: ffff8800a78c1000 RSI: 0000000000000086 RDI: ffffffff8079d310
RBP: ffff880130a67cd8 R08: 000000000001564f R09: 0000000000000001
R10: 000000007fffffff R11: 0000000000000000 R12: ffff88013ac97000
R13: 0000000000000001 R14: ffff88013ac971b0 R15: ffff880139f57000
FS: 00007f15d8d346f0(0000) GS:ffff88013b803f80(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000001e15008 CR3: 00000001308f5000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process glxgears (pid: 11755, threadinfo ffff880130a66000, task ffff8801314f0000)
ffff88012fea8480 ffff88012fea8300 ffffffff80703868 0000000000000000
ffff88012fea8480 ffff88012fea8300 ffff880139f57000 ffff8801312f5860
ffff880130a67d08 ffffffffa02e6c6a ffff8801310aea80 ffff88012fea8300
[<ffffffffa02e6c6a>] i915_gem_object_pin+0xda/0x180 [i915]
[<ffffffffa02e7069>] i915_gem_execbuffer+0x209/0xe60 [i915]
[<ffffffffa02ae661>] drm_ioctl+0x101/0x330 [drm]
[<ffffffffa02e6e60>] ? i915_gem_execbuffer+0x0/0xe60 [i915]
[<ffffffff80529019>] ? trace_hardirqs_off_thunk+0x3a/0x6c
Code: 00 00 74 7c 3d d2 29 00 00 90 0f 84 98 00 00 00 8b 42 5c 83 ca ff c1 e8 09 0f bc c0 0f 44 c2 41 81 c8 00 10 00 00 e9 56 fe ff ff <0f> 0b eb fe 48 c7 c7 dd 97 2f a0 31 c0 e8 79 0c 24 e0 49 8b 06
RIP [<ffffffffa02e5b49>] i915_gem_object_get_fence_reg+0x769/0x800 [i915]
I assume this problem is possibly fixed in linux-next, which is used for Rawhide kernels? (I.e. I do not get this error with the Rawhide kernel.)
Another weird thing which could possibly be related:
/proc/dri/0/gem_objects reports a different number of objects than the number of drm objects visible through the filecache module (an external kernel patch that allows seeing all objects located inside tmpfs). The difference is usually only a few objects, e.g. 4780 != 4788, but it is a difference. It looks like the drm kernel module doesn't properly account for its objects?
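A small Python sketch of that comparison, in case someone wants to repeat it. It assumes that /proc/dri/0/gem_objects contains a line of the form "N objects" (I have not verified the exact format across kernel versions), and that the count from the filecache module is obtained separately and passed in by hand. Both function names are hypothetical.

```python
import re


def gem_object_count(proc_text):
    """Extract N from a line of the form 'N objects'; return None if absent."""
    match = re.search(r"^\s*(\d+)\s+objects\s*$", proc_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def count_mismatch(proc_text, other_count):
    """Return the GEM object count minus an independently obtained count."""
    count = gem_object_count(proc_text)
    if count is None:
        raise ValueError("no 'N objects' line found")
    return count - other_count
```

Typical use would be `count_mismatch(open("/proc/dri/0/gem_objects").read(), 4788)`. A small nonzero result may simply mean the two readings were not taken at the same instant, so it is worth repeating the comparison on an idle system before concluding the accounting is wrong.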
This (similar) thing happened to me too.
I start glxgears (I use the awesome WM, so it starts fullscreen) and resize it with the mouse. X freezes almost immediately. The mouse still moves, but the keyboard is dead and the screen doesn't refresh either.
dmesg and Xorg.0.log don't report anything. I can't switch to a console (ctrl-alt-f1), but the network is working.
intel driver 22.214.171.1242 (using UXA)
kernel 126.96.36.199 with KMS enabled
I use ArchLinux and awesome.
Forgot to say, Thinkpad X60s 945GM
Yes, it's the same issue. Sometimes the mouse stays movable, but the GPU is definitely dead. SysRq+K can be used, and setpci + vbetool can be used to reinitialize it. I think it's the result of accessing the wrong pixmap during the resize. It could be an issue with Mesa incorrectly handling resized windows, but for now my bet is on incorrect handling directly inside the GPU kernel driver.
I didn't mention it in my initial comment: I'm using UXA acceleration, but that should be obvious from the kernel backtrace.
Adding a new comment: it looks like the latest Fedora Rawhide kernel 188.8.131.52-54.fc11 (which uses the drm-next intel driver) no longer locks up in this test case, so most probably the drm-next patches fix this GPU deadlock. I hope they get into the vanilla kernel soon.
I assume this bug can be closed. I have not seen the problem for quite some time.
Looks like a fence starvation issue that has been fixed for a long time.