Summary: | [i915] Occasional X freezes / GPU lockups | |
---|---|---|---
Product: | xorg | Reporter: | Robert Huitl <freedesktop>
Component: | Driver/intel | Assignee: | Chris Wilson <chris>
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team>
Severity: | critical | |
Priority: | medium | CC: | dkg, eric, hramrach, peter.hutterer
Version: | git | |
Hardware: | x86 (IA32) | |
OS: | Linux (All) | |
Whiteboard: | | |
i915 platform: | | i915 features: |
Attachments: | | |

Description
Robert Huitl
2009-08-03 10:57:50 UTC
I updated my system a couple of days ago; the problem persists. Current configuration:

- xorg-server-1.6.3.901-r1
- mesa-7.5.1
- xf86-video-intel-2.8.1
- libdrm-2.4.11
- Kernel 2.6.31-rc9, KMS enabled, additional patches applied:
  - drm/i915: increase default latency constant (v2 w/comment)
  - drm/i915: Unref old_obj on get_fence_reg() error path

Backtrace as above:

    #0 0xffffe424 in __kernel_vsyscall ()
    #1 0xb7d1a719 in ioctl () from /lib/libc.so.6
    #2 0xb7b14b68 in drm_intel_gem_bo_map_gtt () from /usr/lib/libdrm_intel.so.1
    #3 0xb7aa4f91 in ?? () from /usr/lib/xorg/modules/drivers//intel_drv.so
    #4 0x0a4bb630 in ?? ()
    #5 0x00000000 in ?? ()

I couldn't get the intel_gpu_dump output this time; the machine locked up when I tried to get the dump. Please tell me what I can do to track down this problem, and whether you need more information.

Created attachment 29979
an archive of logs and GPU dumps collected from a few lockups
I see occasional lockups on a Mac mini [945GME] with intel driver 2.8 and X server 1.6.
I am not sure whether KMS is enabled; I use a 2.6.30 kernel but did not do anything special to enable or disable it.
Some more information about that bug. Today I restarted the X server as it always fills up swap memory after a couple of days (its VIRT size easily grows above 1 GB, most of that memory shows up as being swapped out in 'top'). This makes the system very unresponsive and I thought I might be able to avoid the GPU lockup by restarting X, as I got the impression that memory pressure is a key component to trigger it. But after only two hours of light usage the GPU locked up on me again, even though the X server was "fresh" and did not consume excess amounts of memory.

I'd also like to mention that I'm regularly using suspend to RAM and xrandr with the VGA output. I can't say for sure if I had lockups without having used xrandr, all I can say is that all the recent lockups occurred with two outputs (the built-in display and a VGA screen attached).

System configuration:

- xorg-server-1.6.3.901-r1
- mesa-7.5.1
- xf86-video-intel-2.8.1
- libdrm-2.4.11
- Kernel 2.6.31, KMS enabled, no additional patches applied

For me lockups happen with

- X server 1.6.4
- Mesa 7.0.3
- libdrm 2.4.14
- video-intel 2.9.0
- linux 2.6.30, no KMS (broken on this kernel and hardware)

plus some earlier versions.
I'm also seeing this behavior on an eeePC 900 running debian testing with the following versions:

    0 dkg@pip:~$ uname -a
    Linux pip 2.6.30-2-686 #1 SMP Sat Sep 26 01:16:22 UTC 2009 i686 GNU/Linux
    0 dkg@pip:~$ dpkg -l xserver-xorg-video-intel xserver-xorg-core libgl1-mesa-dri libgl1-mesa-glx libglu1-mesa libdrm2 libdrm-intel1
    Desired=Unknown/Install/Remove/Purge/Hold
    | Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
    |/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
    ||/ Name           Version        Description
    +++-==============-==============-============================================
    ii  libdrm-intel1  2.4.14-1+b1    Userspace interface to intel-specific kernel
    ii  libdrm2        2.4.14-1+b1    Userspace interface to kernel DRM services -
    ii  libgl1-mesa-dr 7.6-1          A free implementation of the OpenGL API -- D
    ii  libgl1-mesa-gl 7.6-1          A free implementation of the OpenGL API -- G
    ii  libglu1-mesa   7.6-1          The OpenGL utility library (GLU)
    ii  xserver-xorg-c 2:1.6.4-2      Xorg X server - core server
    ii  xserver-xorg-v 2:2.9.0-1      X.Org X server -- Intel i8xx, i9xx display d
    0 dkg@pip:~$ lspci -s 00:02
    00:02.0 VGA compatible controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 04)
    00:02.1 Display controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 04)
    0 dkg@pip:~$

Here are the kern.log messages related to i915 since the reboot before last (that run hung):

    Oct 22 17:19:37 pip kernel: [ 90.809022] [drm:i915_gem_detect_bit_6_swizzle] *ERROR* Couldn't read from MCHBAR. Disabling tiling.
    Oct 22 17:19:37 pip kernel: [ 90.809058] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
    Oct 22 17:19:47 pip kernel: [ 100.243856] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
    Oct 22 17:37:13 pip kernel: [ 1146.669106] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
    Oct 22 19:44:00 pip kernel: [ 8753.300564] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
    Oct 22 21:46:10 pip kernel: [14378.165746] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
    Oct 23 02:44:54 pip kernel: [30531.722601] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
    Oct 23 11:06:04 pip kernel: [31446.830591] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
    Oct 23 15:36:34 pip kernel: [47676.624077] [drm:i915_gem_idle] *ERROR* hardware wedged
    Oct 23 15:38:40 pip kernel: [ 84.678737] [drm:i915_gem_detect_bit_6_swizzle] *ERROR* Couldn't read from MCHBAR. Disabling tiling.
    Oct 23 15:38:40 pip kernel: [ 84.678772] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
    Oct 23 15:38:50 pip kernel: [ 94.428043] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0

Similarly to Robert Huitl (comment 3), i also regularly suspend-to-RAM, and attach/detach several different external monitors regularly via the VGA output with xrandr.

If i can help debug this somehow, please let me know. It's pretty frustrating.

    0x7d000006: 3DSTATE_MAP_STATE
    0x022a7ecc: 0x00000003: mask
    0x022a7ed0: 0x05380000: map 0 MS2
    0x022a7ed4: 0x00000194: map 0 MS3
    0x022a7ed8: 0x0fe00000: map 0 MS4
    0x022a7edc: 0x057d4000: map 1 MS2
    0x022a7ee0: 0x000de584: map 1 MS3
    0x022a7ee4: 0x6fe00000: map 1 MS4

This is suspicious as it implies that the texture does not comply with the size restrictions imposed by the hardware.
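To make the map state dump above easier to read, here is a minimal Python sketch that unpacks the MS3 and MS4 words of each map. The bit layout is an assumption, not something stated in this report: height minus one in MS3 bits 31:21, width minus one in MS3 bits 20:10, and pitch in dwords minus one in MS4 bits 31:21, which is how the i915 register headers are commonly read. The function name `decode_map` is hypothetical.

```python
# Assumed i915 map-state field layout (not confirmed by this bug report):
#   MS3 bits 31:21 = height - 1, MS3 bits 20:10 = width - 1
#   MS4 bits 31:21 = (pitch in dwords) - 1
MS3_HEIGHT_SHIFT = 21
MS3_WIDTH_SHIFT = 10
MS4_PITCH_SHIFT = 21
FIELD_MASK = 0x7FF  # each of the three fields is assumed to be 11 bits wide


def decode_map(ms3, ms4):
    """Return (width, height, pitch_bytes) for one texture map entry."""
    height = ((ms3 >> MS3_HEIGHT_SHIFT) & FIELD_MASK) + 1
    width = ((ms3 >> MS3_WIDTH_SHIFT) & FIELD_MASK) + 1
    pitch_bytes = (((ms4 >> MS4_PITCH_SHIFT) & FIELD_MASK) + 1) * 4
    return width, height, pitch_bytes


# MS3/MS4 values copied from the dump in this report.
maps = {
    "map 0": (0x00000194, 0x0FE00000),
    "map 1": (0x000DE584, 0x6FE00000),
}

for name, (ms3, ms4) in maps.items():
    width, height, pitch = decode_map(ms3, ms4)
    print(f"{name}: {width}x{height} pixels, pitch {pitch} bytes")
```

Under this assumed layout, both maps decode to a height of a single line (1x1 and 890x1 pixels). That is only one reading of the dump and should be checked against the hardware documentation before drawing conclusions.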
I think this bug should be fixed with

    commit 465a4ab416b2e5ad53b96702720331a44fffa2fe
    Author: Eric Anholt <eric@anholt.net>
    Date: Wed Aug 12 19:29:31 2009 -0700

        Align the height of untiled pixmaps to 2 lines as well.

        The 965 docs note, and it's probably the case on 915 as well, that the
        2x2 subspans are read as a unit, even if the bottom row isn't used. If
        the address in that bottom row extended beyond the end of the GTT, a
        fault could occur.

        Thanks to Chris Wilson for pointing out the problem.

My lockups are coming on a system (as documented) running xserver-xorg-video-intel 2:2.9.0-1 (debian package version number), which appears to contain the changeset you referenced (465a4ab416b2e5ad53b96702720331a44fffa2fe), according to /usr/share/doc/xserver-xorg-video-intel/changelog.gz

So if that's the fix, then maybe i'm seeing a different bug? Should i open a new report? I'm installing intel-gpu-tools now so that i might be able to get a gpu dump next lockup.

(In reply to comment #7)
> So if that's the fix, then maybe i'm seeing a different bug? Should i open a
> new report? I'm installing intel-gpu-tools now so that i might be able to get
> a gpu dump next lockup.

I'm pretty confident that the original bug is due to the invalid texture size. So please file a new bug as soon as you grab a gpu dump, thanks.
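The idea in the commit message quoted in this thread (round the height of an untiled pixmap up to 2 lines so that a 2x2 subspan read never reaches past the allocation) can be sketched as follows. This is an illustration only, not the driver's code; `align_up` and `padded_pixmap_bytes` are hypothetical helper names, and the pitch and height used in the example are arbitrary.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def padded_pixmap_bytes(pitch_bytes, height):
    """Bytes to allocate when the height is padded to a multiple of 2 lines.

    Padding means a read of the row below the last visible row still
    lands inside the buffer instead of past its end.
    """
    return pitch_bytes * align_up(height, 2)


# Example: a one-line pixmap with a 3584-byte pitch.
unpadded = 3584 * 1
padded = padded_pixmap_bytes(3584, 1)
print(f"unpadded: {unpadded} bytes, padded to 2 lines: {padded} bytes")
```

An even height is left unchanged, so the padding costs at most one extra row per pixmap.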