Bug 23116 - [i915] Occasional X freezes / GPU lockups
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: x86 (IA32) Linux (All)
Importance: medium critical
Assignee: Chris Wilson
QA Contact: Xorg Project Team
Reported: 2009-08-03 10:57 UTC by Robert Huitl
Modified: 2009-10-27 04:56 UTC

Attachments
Output of intel_gpu_dump (149.55 KB, application/x-gzip)
2009-08-03 10:57 UTC, Robert Huitl
an archive of logs and GPU dumps collected from a few lockups (964.63 KB, application/octet-stream)
2009-10-01 08:04 UTC, Michal Suchanek

Description Robert Huitl 2009-08-03 10:57:50 UTC
Created attachment 28313
Output of intel_gpu_dump

I get X/GPU lockups about once every couple of days. Most of the time I can still move the mouse, but sometimes it freezes too. I haven't found a way to trigger the problem, and it doesn't seem to be related to a particular application. When X locks up, I can still SSH into the machine, but only a reboot fixes the graphics.

My configuration:
- GFX hardware: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 03) (ThinkPad X41)
- Gentoo, 32 bit kernel + userland
- xorg-server-1.6.2-r1
- mesa-7.5-r1
- xf86-video-intel-2.7.99.902-r1
- Kernel 2.6.30.2, KMS enabled, additional patches applied:
   i915: Save/restore cursor state on suspend/resume.
   i915: add ignore lvds quirk info for AOpen Mini PC
   i915: apply G45 vblank count code to all G4x chips and fix max_frame_count
   i915: avoid non-atomic sysrq execution
   i915: Skip lvds with Aopen i945GTt-VFA
   i915: Hook connector to encoder during load detection (fixes tv/vga detect)
   i915: initialize fence registers to zero when loading GEM
   i915: Set SSC frequency for 8xx chips correctly

There are no suspicious messages in dmesg, syslog, Xorg.0.log or ~/.xsession-errors when the freeze occurs. The Xorg backtrace looks like the one in bug 21249:

#0  0xffffe424 in __kernel_vsyscall ()
#1  0xb7ac0719 in ioctl () from /lib/libc.so.6
#2  0xb797fb68 in drm_intel_gem_bo_map_gtt () from /usr/lib/libdrm_intel.so.1
#3  0xb7910f31 in ?? () from /usr/lib/xorg/modules/drivers//intel_drv.so
#4  0x083c5858 in ?? ()
#5  0x00000000 in ?? ()

I attached the output of intel_gpu_dump.
Comment 1 Robert Huitl 2009-09-09 06:29:57 UTC
I updated my system a couple of days ago, but the problem persists. Current configuration:

- xorg-server-1.6.3.901-r1
- mesa-7.5.1
- xf86-video-intel-2.8.1
- libdrm-2.4.11
- Kernel 2.6.31-rc9, KMS enabled, additional patches applied:
   drm/i915: increase default latency constant (v2 w/comment)
   drm/i915: Unref old_obj on get_fence_reg() error path

Backtrace as above:
#0  0xffffe424 in __kernel_vsyscall ()
#1  0xb7d1a719 in ioctl () from /lib/libc.so.6
#2  0xb7b14b68 in drm_intel_gem_bo_map_gtt () from /usr/lib/libdrm_intel.so.1
#3  0xb7aa4f91 in ?? () from /usr/lib/xorg/modules/drivers//intel_drv.so
#4  0x0a4bb630 in ?? ()
#5  0x00000000 in ?? ()

I couldn't get the intel_gpu_dump output this time: the machine locked up when I tried to take the dump.

Please tell me what I can do to track down this problem and if you need more information.
Comment 2 Michal Suchanek 2009-10-01 08:04:27 UTC
Created attachment 29979
an archive of logs and GPU dumps collected from a few lockups

I see occasional lockups on a Mac mini [945GME] with Intel driver 2.8 and X server 1.6.

I am not sure whether KMS is enabled; I use a 2.6.30 kernel but did not do anything special to enable or disable it.
Comment 3 Robert Huitl 2009-10-12 06:32:36 UTC
Some more information about this bug. Today I restarted the X server, as it always fills up swap after a couple of days (its VIRT size easily grows above 1 GB, and most of that memory shows up as swapped out in 'top'). This makes the system very unresponsive, and I thought restarting X might avoid the GPU lockup, since I had the impression that memory pressure is a key factor in triggering it.

But after only two hours of light usage the GPU locked up on me again, even though the X server was "fresh" and was not consuming excessive amounts of memory.

I'd also like to mention that I regularly use suspend to RAM and xrandr with the VGA output. I can't say for sure whether I had lockups without having used xrandr; all I can say is that all the recent lockups occurred with two outputs (the built-in display and a VGA screen attached).

System configuration:
- xorg-server-1.6.3.901-r1
- mesa-7.5.1
- xf86-video-intel-2.8.1
- libdrm-2.4.11
- Kernel 2.6.31, KMS enabled, no additional patches applied
Comment 4 Michal Suchanek 2009-10-13 09:08:02 UTC
For me, lockups happen with:

X server 1.6.4
Mesa 7.0.3
libdrm 2.4.14
video-intel 2.9.0
linux 2.6.30, no KMS (broken on this kernel and hardware)

plus some earlier versions.
Comment 5 Daniel Kahn Gillmor 2009-10-23 13:22:16 UTC
I'm also seeing this behavior on an Eee PC 900 running Debian testing with the following versions:

0 dkg@pip:~$ uname -a
Linux pip 2.6.30-2-686 #1 SMP Sat Sep 26 01:16:22 UTC 2009 i686 GNU/Linux
0 dkg@pip:~$ dpkg -l xserver-xorg-video-intel xserver-xorg-core libgl1-mesa-dri libgl1-mesa-glx libglu1-mesa libdrm2 libdrm-intel1
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name           Version        Description
+++-==============-==============-============================================
ii  libdrm-intel1  2.4.14-1+b1    Userspace interface to intel-specific kernel
ii  libdrm2        2.4.14-1+b1    Userspace interface to kernel DRM services -
ii  libgl1-mesa-dr 7.6-1          A free implementation of the OpenGL API -- D
ii  libgl1-mesa-gl 7.6-1          A free implementation of the OpenGL API -- G
ii  libglu1-mesa   7.6-1          The OpenGL utility library (GLU)
ii  xserver-xorg-c 2:1.6.4-2      Xorg X server - core server
ii  xserver-xorg-v 2:2.9.0-1      X.Org X server -- Intel i8xx, i9xx display d
0 dkg@pip:~$ lspci -s 00:02
00:02.0 VGA compatible controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 04)
00:02.1 Display controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 04)
0 dkg@pip:~$ 


Here are the kern.log messages related to i915 since the reboot before last (that run hung):

Oct 22 17:19:37 pip kernel: [   90.809022] [drm:i915_gem_detect_bit_6_swizzle] *ERROR* Couldn't read from MCHBAR.  Disabling tiling.
Oct 22 17:19:37 pip kernel: [   90.809058] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
Oct 22 17:19:47 pip kernel: [  100.243856] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
Oct 22 17:37:13 pip kernel: [ 1146.669106] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
Oct 22 19:44:00 pip kernel: [ 8753.300564] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
Oct 22 21:46:10 pip kernel: [14378.165746] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
Oct 23 02:44:54 pip kernel: [30531.722601] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
Oct 23 11:06:04 pip kernel: [31446.830591] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
Oct 23 15:36:34 pip kernel: [47676.624077] [drm:i915_gem_idle] *ERROR* hardware wedged
Oct 23 15:38:40 pip kernel: [   84.678737] [drm:i915_gem_detect_bit_6_swizzle] *ERROR* Couldn't read from MCHBAR.  Disabling tiling.
Oct 23 15:38:40 pip kernel: [   84.678772] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
Oct 23 15:38:50 pip kernel: [   94.428043] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0


Like Robert Huitl (comment 3), I also regularly suspend to RAM, and I attach and detach several different external monitors via the VGA output with xrandr.

If I can help debug this somehow, please let me know. It's pretty frustrating.
Comment 6 Chris Wilson 2009-10-26 14:29:13 UTC
0x7d000006: 3DSTATE_MAP_STATE
0x022a7ecc:      0x00000003:    mask
0x022a7ed0:      0x05380000:    map 0 MS2
0x022a7ed4:      0x00000194:    map 0 MS3
0x022a7ed8:      0x0fe00000:    map 0 MS4
0x022a7edc:      0x057d4000:    map 1 MS2
0x022a7ee0:      0x000de584:    map 1 MS3
0x022a7ee4:      0x6fe00000:    map 1 MS4
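
To make the MS3 words in the dump above easier to read, here is a minimal C sketch that unpacks width and height from one of them. The bit positions are an assumption recalled from the driver's i915_reg.h (MS3_HEIGHT_SHIFT 21, MS3_WIDTH_SHIFT 10, MS3_TILED_SURFACE as bit 1, both sizes stored minus one); the mask name MS3_DIM_MASK, the struct and decode_ms3() are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Assumed MS3 layout; check against i915_reg.h before relying on it. */
#define MS3_HEIGHT_SHIFT  21
#define MS3_WIDTH_SHIFT   10
#define MS3_DIM_MASK      0x7ffu      /* hypothetical name: 11-bit size field */
#define MS3_TILED_SURFACE (1u << 1)

struct ms3_info {
    uint32_t width;   /* in texels */
    uint32_t height;  /* in rows */
    int tiled;        /* non-zero if the tiled-surface bit is set */
};

/* Hypothetical helper: split an MS3 dword into its size fields. */
static struct ms3_info decode_ms3(uint32_t ms3)
{
    struct ms3_info info;

    /* Both dimensions are assumed to be encoded as (value - 1). */
    info.height = ((ms3 >> MS3_HEIGHT_SHIFT) & MS3_DIM_MASK) + 1;
    info.width  = ((ms3 >> MS3_WIDTH_SHIFT) & MS3_DIM_MASK) + 1;
    info.tiled  = (ms3 & MS3_TILED_SURFACE) != 0;
    return info;
}
```

If that layout is right, decode_ms3(0x000de584) describes map 1 as an untiled surface 890 texels wide and 1 row high, that is, a surface with an odd height.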

This is suspicious as it implies that the texture does not comply with the size restrictions imposed by the hardware. I think this bug should be fixed by:

commit 465a4ab416b2e5ad53b96702720331a44fffa2fe
Author: Eric Anholt <eric@anholt.net>
Date:   Wed Aug 12 19:29:31 2009 -0700

    Align the height of untiled pixmaps to 2 lines as well.
    
    The 965 docs note, and it's probably the case on 915 as well, that the
    2x2 subspans are read as a unit, even if the bottom row isn't used.  If
    the address in that bottom row extended beyond the end of the GTT, a
    fault could occur.
    
    Thanks to Chris Wilson for pointing out the problem.
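
The idea in the commit message above can be sketched in a few lines of C. This is not the driver's code: align_up() and untiled_pixmap_size() are hypothetical helpers, and the only thing taken from the commit is the rule that an untiled pixmap's height is rounded up to a multiple of 2 rows, so that the unused bottom row of a 2x2 subspan still falls inside the allocation.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next multiple of a; a must be a power of two. */
static uint32_t align_up(uint32_t v, uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper (not taken from the driver): number of bytes to
 * allocate for an untiled pixmap when its height is padded to an even
 * number of rows.
 */
static uint32_t untiled_pixmap_size(uint32_t pitch_bytes, uint32_t height)
{
    return pitch_bytes * align_up(height, 2);
}
```

For example, a 1-row pixmap with a 3584-byte pitch would get 7168 bytes instead of 3584, so a read of the second row of the subspan stays within the buffer.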

Comment 7 Daniel Kahn Gillmor 2009-10-26 14:53:57 UTC
My lockups occur on a system (as documented) running xserver-xorg-video-intel 2:2.9.0-1 (Debian package version number), which appears to contain the changeset you referenced (465a4ab416b2e5ad53b96702720331a44fffa2fe), according to /usr/share/doc/xserver-xorg-video-intel/changelog.gz

So if that's the fix, then maybe I'm seeing a different bug? Should I open a new report? I'm installing intel-gpu-tools now so that I might be able to get a GPU dump at the next lockup.
Comment 8 Chris Wilson 2009-10-26 14:59:56 UTC
(In reply to comment #7)
> So if that's the fix, then maybe i'm seeing a different bug?  Should i open a
> new report?  I'm installing intel-gpu-tools now so that i might be able to get
> a gpu dump next lockup.

I'm pretty confident that the original bug is due to the invalid texture size. So please file a new bug as soon as you grab a gpu dump, thanks.

Comment 9 Michal Suchanek 2009-10-27 04:56:44 UTC
Dumps from the 2.9.0 driver are attached to bug 24753.

