Bug 20560

Summary: [945GM] intel driver intermittent hangs on 2D desktop, with XAA/EXA/UXA
Product: xorg
Reporter: Thomas Orgis <sobukus>
Component: Driver/intel
Assignee: Chris Wilson <chris>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: high
CC: airlied, bugzilla, cworth, dang, eric, freedesktop.org, Justus-bulk, kmaraas, lenar, lubos.kolouch, mozilla_bugs, sveinung84
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Bug Depends on:
Bug Blocks: 20893
Attachments (description; flags: none for every entry):
- Xorg log up to freeze
- Xorg log for the different UXA freeze
- dmesg up to freeze directly when starting with XAA
- Dump of the hanged GPU
- dumps of (sort of) frozen GPU with XAA and UXA
- recent dump from UXA freeze (just by opening web-browser and scrolling a bit around, no 3D effects
- contents of debug/dri directory for the recent UXA freeze
- new dump and debugfs contents
- intel_gpu_dump from my hung box
- dmesg after running intel_gpu_dump
- Xorg log
- kernel-2.6.30-rc8-git6 dump
- gentoo-sources-2.6.30, KMS enabled
- crash with 2.7.99.901.tar.bz2
- output of intel_gpu_dump
- pre-lockup intel_gpu_dump (with compositing enabled)
- post-lockup intel_gpu_dump (with compositing enabled)
- GPU dump after recent freeze with driver 2.9.0
- Fresh GPU dump after a freeze with unchanged setup today.
- Xorg log with driver 2.12.0, detected hung GPU
- Defect in font rendering, damaged "a".

Description Thomas Orgis 2009-03-09 04:48:08 UTC
Created attachment 23687 [details]
Xorg log up to freeze

System environment: 
 -- chipset: 945GMA
 -- system architecture: 64bit (x86_64, core2duo)
 -- xf86-video-intel: 2.6.2
 -- xserver: 1.6.0 
 -- mesa: 7.3
 -- libdrm: 2.4.5 
 -- kernel: 2.6.28.5 (vanilla + patch for /dev/toshiba)
 -- Linux distribution: Source Mage
 -- Machine or mobo model: Toshiba Portege R500
 -- Display connector: Laptop LCD
 
 Reproducing steps:

Use the system for a while, plain window manager without compositing.
No need for 3D apps to be active... no significant relation to high system load but also not ruled out (last occurrence was while booting up a KVM virtual machine).
After a day or so (system is not shut down, only put to S3 in between), X freezes.
Mouse cursor moves but no reaction to input events. CPU is idling.

Additional info:

I am able to log into the box via ssh (console switching does work...).
GDB shows that for the Xorg process:
0x00002afcfa73bc17 in ioctl () from /lib/libc.so.6
(gdb) bt
#0  0x00002afcfa73bc17 in ioctl () from /lib/libc.so.6
#1  0x00002afcfb8b2163 in drmIoctl () from /usr/lib/libdrm.so.2
#2  0x00002afcfb8b2466 in drmCommandNone () from /usr/lib/libdrm.so.2
#3  0x00002afcfbf4bcd8 in I830BlockHandler ()
   from /usr/lib/xorg/modules/drivers//intel_drv.so
#4  0x00000000005306c8 in AnimCurScreenBlockHandler ()
#5  0x00000000004faece in compBlockHandler ()
#6  0x000000000044bfab in BlockHandler ()
#7  0x00000000004e9171 in WaitForSomething ()
#8  0x00000000004483f0 in Dispatch ()
#9  0x000000000042e6bd in main ()
(gdb) q
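
The backtrace above was taken by attaching gdb to the hung Xorg process from an ssh session. A minimal sketch of scripting that step, assuming gdb is installed on the affected box and the caller is allowed to attach to the process. The helper names gdb_backtrace_command and capture_backtrace are hypothetical and not part of any tool mentioned in this report:

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to pid, prints a backtrace, and exits."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    # -batch makes gdb exit once the commands given with -ex have run.
    return ["gdb", "-batch", "-ex", "bt", "-p", str(pid)]


def capture_backtrace(pid, timeout=60):
    """Run gdb against a live process and return its combined text output."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout,
    )
    return result.stdout
```

Saving the returned text to a file right away keeps the trace even if the machine has to be rebooted afterwards.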
Comment 1 Thomas Orgis 2009-03-09 05:02:55 UTC
I can get to a similar freeze/crash right away when setting AccelMethod to XAA.
Xorg fails to get me to a desktop, letting me see that in dmesg:

[drm] Initialized i915 1.6.0 20080730 on minor 0
[drm:i915_setparam] *ERROR* unknown parameter 4
[drm:i915_getparam] *ERROR* Unknown parameter 6
[drm:i915_getparam] *ERROR* Unknown parameter 6

And a very similar backtrace in gdb:

0x00007fbb30515c17 in ioctl () from /lib/libc.so.6
(gdb) bt
#0  0x00007fbb30515c17 in ioctl () from /lib/libc.so.6
#1  0x00007fbb2f37b163 in drmIoctl () from /usr/lib/libdrm.so.2
#2  0x00007fbb2f37b466 in drmCommandNone () from /usr/lib/libdrm.so.2
#3  0x00007fbb2ec90cd8 in I830BlockHandler ()
   from /usr/lib/xorg/modules/drivers//intel_drv.so
#4  0x00000000005306c8 in AnimCurScreenBlockHandler ()
#5  0x00000000004faece in compBlockHandler ()
#6  0x000000000044bfab in BlockHandler ()
#7  0x00000000004e9171 in WaitForSomething ()
#8  0x00000000004483f0 in Dispatch ()
#9  0x000000000042e6bd in main ()

Perhaps that's a viable shortcut to the same issue?
Comment 2 Thomas Orgis 2009-03-09 05:06:24 UTC
Oh, and after one freeze, killing X and then starting another one (without XAA) freezes at the same point, stumbling over the GPU in the bad state, I guess.
Comment 3 Thomas Orgis 2009-03-09 07:05:22 UTC
OK, next iteration. I upgraded to linux-2.6.29-rc7, because people noted that the drm version of 2.6.28 may be too old (as the intel driver complained).
Now I don't have notes about missing drm functionality any more, but I managed to freeze rather quickly by running glxgears.
That is using UXA now... one CPU core is busy with glxgears still running, but X being stuck there:

0x00007f6ae6419c17 in ioctl () from /lib/libc.so.6
(gdb) bt
#0  0x00007f6ae6419c17 in ioctl () from /lib/libc.so.6
#1  0x00007f6ae4966fed in drm_intel_gem_bo_exec ()
   from /usr/lib/libdrm_intel.so.1
#2  0x00007f6ae4b89f5f in intel_batch_flush ()
   from /usr/lib/xorg/modules/drivers//intel_drv.so
#3  0x00007f6ae4bc9f93 in I830DRI2CopyRegion ()
   from /usr/lib/xorg/modules/drivers//intel_drv.so
#4  0x00007f6ae4e1419c in DRI2CopyRegion ()
   from /usr/lib/xorg/modules/extensions//libdri2.so
#5  0x00007f6ae4e149fb in ProcDRI2Dispatch ()
   from /usr/lib/xorg/modules/extensions//libdri2.so
#6  0x00000000004486c4 in Dispatch ()
#7  0x000000000042e6bd in main ()


So far I have the same low-level freeze with XAA, EXA and UXA ... sounds at least consistent. And now that I can trigger it rather quickly, one can perhaps fix it.
Comment 4 Thomas Orgis 2009-03-09 07:07:35 UTC
Ok, last comment for now: I was able to kill glxgears from ssh and the desktop resumed to be usable. So not _that_ fatal this time. 
Comment 5 Ethan "eekee" Grammatikidis 2009-03-09 07:30:47 UTC
Very similar symptoms here. The mouse pointer moves but most windows don't receive key or mouse events. One window continues to receive key events and depending on window manager design it may be possible to switch desktops which does change the focused window. Quitting the focused program also causes focus to switch to another window. Any form of mouse-based focus control ceases to work. I'm sorry but I don't remember if I thought of trying alt-tab.

I did not test many server-side settings, mainly concentrating on input drivers. It made little difference whether I allowed evdev configured by hal, evdev with manual configuration, or the old keyboard and mouse drivers.

Often the console was left in an unusable state after I killed X. This may be a different bug, I'm not sure.

Unpredictable:
Once the X server lasted > 1.5 days before exhibiting this problem. Once it occurred within 3 minutes of starting: the only client was a single xterm, no window manager had yet been started and the xterm unluckily did not get focus. The MTBF for me seems to be somewhat less than a day.

Failure can occur with anything from a single xterm to heavy use (multiple heavyweight OpenGL apps + Firefox with dozens of tabs + many terminals).

Failure does NOT affect operation of any app except in that it can't get input. The one app stuck with keyboard focus behaves entirely as normal, whether it's an xterm or a powerful OpenGL app.

At no point have I used a compositing WM or other compositing manager when testing for this problem. Window managers tested are WindowMaker (most extensively), FVWM, and Sawfish.

The one thing that seems to make the bug less likely to manifest is fewer focus changes. I tried running the majority of my applications in a standalone VNC server (Xvnc from the TightVNC package), with a VNC viewer and 1 or 2 OpenGL apps connected to the main X server.


System:

xorg-server 1.5.3

2 graphics cards:
    nVidia Geforce 8600GT
    nVidia GeForce 7600 GS
        (Note: the 7600 card is detected but unusable for display. O.o)

Intel Core 2 Quad CPU

Pure x86_64 install, with a pure i686 chroot.

Kernel is Linux 2.6.26.2, 64-bit with 32-bit emulation.

X Server is 64-bit.

Note: it makes no difference whether running applications are 32-bit or 64-bit.

That's all I can think of to report. I will try xorg-server 1.6 and (attempt to) run gdb on it if I can do so without breaking my brain. I have too many other projects on the go, and I rely on leaving windows open so I don't forget where I've got to in each. (A fragile work system, I know, but I can't find anything else that works for me.)
Comment 6 Ethan "eekee" Grammatikidis 2009-03-09 07:45:11 UTC
DRM:
My kernel has all DRM options enabled as modules. I didn't think to check if any are loaded. I don't think it's relevant when using nvidia driver anyway. I don't recall whether xorg-server was built with options which require mesa with DRM or not. I do recall testing every possible combination of xorg-server with mesa; that was hard.
Comment 7 Lubos Kolouch 2009-03-09 14:07:25 UTC
I think it is similar to #20520, I have this issue as well.
Comment 8 Michael Fu 2009-03-09 18:31:37 UTC
Does

Option "EXANoComposite" "True"

help?
Comment 9 Eric Anholt 2009-03-09 21:54:44 UTC
Ethan: You have a GPU hang.  Please bring it up with nvidia, not a bug report on Intel hardware.

Thomas: Could you attach dmesg as well?
Comment 10 Thomas Orgis 2009-03-10 02:14:24 UTC
OK... I shall test with EXANoComposite Option, using UXA?
I guess this would not change the situation for XAA, where it froze right away.
Also I'm confused about who does what compositing stuff (as I thought you need a compositing window manager for that).
Anyhow, I can try.

As for the dmesg: Is this requesting a general dmesg of a system startup to determine the system config or about specific messages regarding the GPU freeze, as there are none of the latter...?

PS: At least with the VESA driver, I didn't have a freeze yet... just headache from very fuzzy, low and horizontally stretched resolution (which reminds me of a wish to have letterboxed video modes, like 1024x768 or 640x480 scaled to the max height, but leaving black bars left and right).
Comment 11 Gordon Jin 2009-03-10 02:25:28 UTC
(In reply to comment #10)
> OK... I shall test with EXANoComposite Option, using UXA?

using EXA.

> I guess this would not change the situation for XAA, where it froze right away.
> Also I'm confused about who does what compositing stuff (as I thought you need
> a compositing window manager for that).
> Anyhow, I can try.

No, the "composite" in EXANoComposite doesn't mean compositing window manager.
 
> As for the dmesg: Is this requesting a general dmesg of a system startup to
> determine the system config or about specific messages regarding the GPU
> freeze, as there are none of the latter...?

for the GPU freeze.

Comment 12 Thomas Orgis 2009-03-10 03:07:58 UTC
Hm, before reading your last comment, I tried UXA again, with that EXANoComposite Option ... which may be a no-op there and indeed showed no difference.
What I have, though, is more insight in the freeze I get when running glxgears with UXA for 15 seconds or so.
It's not glxgears! Regardless of the type of apps I start (or none at all? didn't try that), X11 freezes after a certain time with UXA.

It is something different... I am able to switch to another VT, but switching back shows only black.
In fact, I don't have to run any apps and wait, I can just switch to a VT and try to switch back -- I have the same state.
This is in dmesg when switching VTs:

[drm] Initialized drm 1.1.0 20060810
pci 0000:00:02.0: power state changed by ACPI to D0
pci 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
pci 0000:00:02.0: setting latency timer to 64
[drm] Initialized i915 1.6.0 20080730 on minor 0
[drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
[drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
CE: hpet increasing min_delta_ns to 15000 nsec
[drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
[drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
[drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
mtrr: no MTRR for e0000000,10000000 found


The last line happens when killing X via Ctrl+C from the console where I started it. There is also a corresponding message on the console about failed MTRR setting with invalid argument.
So, I cannot yet confirm if the original bug is there with UXA -- another one is quicker. I cannot tell if it's related... seems like Xorg is confused with the setup of the output ports (pipes) of the video chip...

I am attaching an Xorg log of a VT switching run.

I will try EXA now for a bit with the compositing disabled.
If I can see something in dmesg, I will show...
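
The dmesg excerpt in this comment repeats the same DRM error several times. A minimal sketch for summarising such output, assuming the dmesg text has been saved to a string; the function name count_drm_errors and the regular expression are illustrative and only match the "[drm:function] *ERROR* message" shape seen in this report:

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
DRM_ERROR = re.compile(r"\[drm:(?P<func>\w+)\] \*ERROR\* (?P<msg>.*)")


def count_drm_errors(dmesg_text):
    """Count DRM *ERROR* messages, keyed by (kernel function, message)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = DRM_ERROR.search(line)
        if match:
            counts[(match.group("func"), match.group("msg").strip())] += 1
    return counts
```

Because the pattern is searched anywhere in the line, lines that carry a leading kernel timestamp are counted as well.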
Comment 13 Thomas Orgis 2009-03-10 03:12:40 UTC
Created attachment 23718 [details]
Xorg log for the different UXA freeze

A log for the differing freeze that happens quickly with UXA.
This is the corresponding gdb trace:

#0  0x00002b5c302f5633 in __select_nocancel () from /lib/libc.so.6
#1  0x00000000004e91eb in WaitForSomething ()
#2  0x00000000004483f0 in Dispatch ()
#3  0x000000000042e6bd in main ()
Comment 14 Eric Anholt 2009-03-10 12:16:45 UTC
the whole dmesg, at the time of the problem.
Comment 15 Thomas Orgis 2009-03-10 16:06:51 UTC
Created attachment 23741 [details]
dmesg up to freeze directly when starting with XAA

Attaching a dmesg from triggering the freeze via trying to use XAA and then simply startx ... gdb backtrace shows the ioctl() in DRM code as seen before.
Comment 16 Thomas Orgis 2009-03-10 16:10:09 UTC
A note: Running EXA without composite now, not a freeze for the past day yet.
Also, running glxgears with DRI doesn't seem to be a problem (didn't run that through the day, though). 
Comment 17 Lubos Kolouch 2009-03-10 23:33:47 UTC
For me turning off composite + EXA fixed the freezes as well. 

But I wonder if that is the permanent solution ;)
Comment 18 Thomas Orgis 2009-03-13 04:10:36 UTC
I guess that is not surprising, but indeed I am more and more assured that the NoComposite option qualifies as workaround.
No freeze this week (with that option).

But I strongly suppose that this is not the developer's intention (especially the sluggish 2D performance of my intel chip and bad responsiveness of the desktop).
So, I'm tuned for any news / requests to test things.
Comment 19 Thomas Orgis 2009-03-18 10:00:52 UTC
Ping.

No freeze with the workaround, with daily usage and no reboots (just ACPI S3).
Anything to test?
Comment 20 Lubos Kolouch 2009-03-20 00:00:39 UTC
Same thing here... is there anything to test so that this issue is resolved?
Comment 21 Thomas Orgis 2009-04-01 03:52:49 UTC
Another data point: I had a freeze even with the workaround!

Yesterday, X stopped reacting, mouse still moved.
Console switching was not possible, another machine for ssh debugging not available. Sysrq-REISUB worked for getting a clean reboot, though.
Also, I remember a similar freeze (but console switching was possible) with a 32bit Debian/Sidux install on another partition on the laptop... Xorg hanging in a kernel call. This was with xorg-server 1.4.2, intel driver 2.3.2.

Isn't there an angle to attack this bug? It is really frustrating to have no stable graphics system for Linux on this machine. The VESA driver would be a working fallback, if it supported the widescreen resolution...
Comment 22 Thomas Orgis 2009-04-01 04:11:01 UTC
Correction: Seems like VESA driver _does_ support my 1280x800 mode.
So I am using the generic VESA driver now, lacking any 2D or 3D acceleration, hoping that it is at least stable:-/
Comment 23 Lubos Kolouch 2009-04-01 04:24:51 UTC
I think this might be duplicate of #20520
Comment 24 Carl Worth 2009-04-16 14:35:41 UTC
Hi Thomas and Lubos,

I know it's not much fun for you, but I'm delighted that you have such a
repeatable way to cause your GPU to hang.

For this case, we've developed a new tool that will provide us very useful
information for debugging the hang. The tool is called intel_gpu_dump and is
contained within the intel-gpu-tools repository which you can obtain as
follows:

git clone git://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools

(Eric has been threatening to make a tar-file release of that, so if getting it
via git is a problem, let me know and I'll pester him to do that.)

The one trick to getting intel_gpu_dump to work is that it requires some code
only very recently added to the i915 kernel driver. The easiest way to get this
is with the recently released 2.6.30-rc2 version of Linux. If you can run that,
then running intel_gpu_dump should give a nice dump of the commands
most-recently submitted to the GPU. (If the GPU is not hung, the output is
often almost empty, but when it's hung, then you should definitely get some
output.)

If you could run that tool and send us the output, then that will be very
helpful for us to identify and fix the bug.

Thanks,

-Carl
Comment 25 Lubos Kolouch 2009-04-17 11:39:42 UTC
Carl,

Happy to help but no luck with this.

I got git-sources-2.6.30-rc2-git3, built it, but it crashes
at the beginning of booting when trying to mount the XFS / partition.

How to do?
Comment 26 Lubos Kolouch 2009-04-18 06:48:44 UTC
Update: I could finally boot into the kernel, but I cannot so far reproduce the problem with the 2.6.30 kernel... I tried to hibernate/resume about 50 times over 8 hours, but it did not freeze.

I will keep on trying...
Comment 27 Lubos Kolouch 2009-04-20 22:30:31 UTC
OK, I got it... please find attached the dump taken while the GPU was hung... what else can I try?
Comment 28 Lubos Kolouch 2009-04-20 22:32:18 UTC
Created attachment 24985 [details]
Dump of the hanged GPU
Comment 29 Thomas Orgis 2009-04-23 02:56:12 UTC
Created attachment 25056 [details]
dumps of (sort of) frozen GPU with XAA and UXA

Here are my dumps for the issues I can trigger reliably (without waiting for days), using kernel 2.6.30-rc3 and updated libdrm 2.4.9:

- Startup with XAA selected, no X session coming up, nothing to see.
- Run with UXA, starting some app (glxgears, but as it seems it does not matter what app), freezing screen after some time (under a minute).

In the second case, it is possible to switch back to console and actually get the text display. The X11 VT remains black on switching back.

Anyhow, I triggered the freezes (or whatever one calls the XAA thing) and collected dumps via SSH. One for XAA, 3 consecutive ones, taken from the same situation, just in a series to show that there is still something happening, for UXA.
Comment 30 Thomas Orgis 2009-04-29 03:19:27 UTC
Ping. Is the debug info useful?

(Man, I'm lucky that I can switch between internal/external video via ACPI, lacking such facility in the VESA Xorg driver...)
Comment 31 Carl Worth 2009-04-29 09:24:12 UTC
(In reply to comment #30)
> Ping. Is the debug info useful?

Hi Thomas,

Thanks so much for collecting and posting the dumps. And yes, they should be very useful. I just haven't had the chance to look at them closely yet, but I hope to be able to do so today.

Stay tuned,

-Carl
Comment 32 Thomas Orgis 2009-04-30 01:26:07 UTC
I have to add something disturbing: Yesterday, I used the laptop for a presentation, with VESA driver and the beamer connected to the VGA output, running at XGA resolution (1024x768). The LCD was turned off (laptop was switched to external output only).
The presentation consisted of a PDF file shown via kpdf... after the first part finished (about 1h45min), I wanted to load another file and observed that the screen was frozen! But this time including the mouse cursor (I guess it's drawn per software with VESA driver). Had to reboot using SysRq, no other machine for SSH debugging there.

The next cycle froze again, this time after 15 minutes or so, not more.
In both cases the machine has been left alone for some time and we then wanted to change to the next slide / scroll down in the PDF, observing that the display is frozen.

I worked without issues for many days now using the VESA driver and the internal display... and suddenly, I manage to get a freeze.
Is it likely that this is a bug in the VESA driver? Or is it likely that there is just something wrong with how Toshiba soldered this machine together?
Can it be that a certain device on the VGA output freezes the GPU? That seems gross.

Well, this is the VESA driver after all, but the same chip... so perhaps this is relevant here.
Comment 33 Eric Anholt 2009-05-12 16:42:56 UTC
OK, of those dumps with UXA, the GPU always looks like it's caught up.  If those dumps were with X hung looking like a GPU hang (mouse moves but nothing responds), could you tar up copies of the rest of /debug/dri/0/* along with the dump next time it happens so I can take a look at them together?
Comment 34 Thomas Orgis 2009-05-14 14:55:10 UTC
Created attachment 25871 [details]
recent dump from UXA freeze (just by opening web-browser and scrolling a bit around, no 3D effects
Comment 35 Thomas Orgis 2009-05-14 14:55:55 UTC
Created attachment 25872 [details]
contents of debug/dri directory for the recent UXA freeze
Comment 36 Thomas Orgis 2009-05-14 15:00:52 UTC
As indicated in the description of the two new attachments:
I finally had some time (during the night) to provoke the freeze again and document it.
With UXA, it's very reliable. Starting glxgears and waiting for some seconds is fine, but, just to show that it's just about any acceleration: I created this freeze by opening a web page in the browser and scrolling a bit up and down.

Such a stable freezer must be discoverable... but can one also say something about the disturbing freeze in VESA mode? The recollection is mysterious and I didn't reproduce it yet... but I am really confident that I did not use the intel driver at that time.
Am I facing two bugs, one in the intel driver, one in Xorg in general? One in the hardware?
Experience taught me that a symptom is usually caused by several bugs, fixing one seldom was enough:-/

Comment 37 Thomas Orgis 2009-05-19 03:09:17 UTC
Created attachment 25993 [details]
new dump and debugfs contents

I tried the new driver 2.7.1 with kernel 2.6.30-rc2 (plus that recent patch about buffer prefetch in GEM code, but which may only be relevant for KMS?).

The immediate freeze after starting X11 with UXA seems to be gone now, but I managed to hose the setup quickly by playing with xrandr for switching on/off external VGA, with resolution changes.

Also, I think I got it freezing by rotating the screen back to normal from left, but my memory is clearer about certain applications (gtk2 and Qt, fltk2, motif apps not being hit) having massive issues with pixmaps and fonts after rotating back to normal.
Pixmaps were just black&white distorted lines/structures, text only hinted at by some stray line segments.

Well, the display bugs may relate to brokenness of the toolkits or the Xserver with the Xrandr fun instead of the intel driver... but freezing on output changes is a different matter.
And most importantly for this bug: I just got hit again by the freeze-after-some-time that I initially reported.

Anyhow, I attached now debugging material from output switching... the recent freeze hitting me unprepared without a second box to do debugging from (ssh).
Comment 38 Thomas Orgis 2009-05-19 04:07:17 UTC
I also find lots of these in my log (dmesg), from the time of experimenting with the intel driver:

May 19 11:24:12 [kernel] X:5679 freeing invalid memtype e0d92000-e0d93000
May 19 11:24:12 [kernel] X:5679 freeing invalid memtype e0d94000-e0d95000
May 19 11:24:12 [kernel] X:5679 freeing invalid memtype e0d96000-e0d97000
May 19 11:24:12 [kernel] X:5679 freeing invalid memtype e0d98000-e0d99000
May 19 11:24:12 [kernel] X:5679 freeing invalid memtype e0d9a000-e0d9b000
May 19 11:24:12 [kernel] X:5679 freeing invalid memtype e0d9c000-e0d9d000
May 19 11:24:12 [kernel] X:5679 freeing invalid memtype e0d9e000-e0d9f000
May 19 11:24:12 [kernel] X:5679 freeing invalid memtype e0da0000-e0da1000
May 19 11:24:12 [kernel] X:5679 freeing invalid memtype e0da2000-e0da3000
May 19 11:24:12 [kernel] X:5679 freeing invalid memtype e0dac000-e0dad000

Does not sound that nice...
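
For a flood of "freeing invalid memtype" lines like the one above, it can help to pull out the address ranges and check how large they are. A minimal sketch, assuming the log has been saved as text; the function name invalid_memtype_ranges is hypothetical and the pattern only covers the line format quoted in this comment:

```python
import re

# Matches the tail of lines such as:
#   May 19 11:24:12 [kernel] X:5679 freeing invalid memtype e0d92000-e0d93000
MEMTYPE = re.compile(r"freeing invalid memtype ([0-9a-f]+)-([0-9a-f]+)")


def invalid_memtype_ranges(log_text):
    """Return a list of (start, end) integer address pairs found in the log."""
    ranges = []
    for match in MEMTYPE.finditer(log_text):
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        ranges.append((start, end))
    return ranges
```

Each range in the excerpt above spans 0x1000 bytes, which is one 4 KiB page.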
Comment 39 Lubos Kolouch 2009-05-19 04:58:21 UTC
With 2.6.30-rc6-git3 it behaves differently for me... after resume from hibernation the desktop freezes as usual (including keyboard), but the mouse is still working and the cursor moving.
Comment 40 Thomas Orgis 2009-05-31 02:42:43 UTC
Ping.

Sorry for being noisy, but just want to indicate that I'm still interested in getting this resolved... the original freeze still being there, apparently.
Comment 41 Daniel Gryniewicz 2009-06-06 07:40:04 UTC
Created attachment 26491 [details]
intel_gpu_dump from my hung box

I got this hang on my system.  Here's a dump, and a dmesg (including an oops from intel_gpu_dump, which may invalidate the dump) will be attached next.

Vitals:
0:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)

kernel: 2.6.30-rc7-git6
xorg-server: 1.6.1.901-r3 (I can get you the list of patches that constitute -r3)
xf86-video-intel: 2.7.1
mesa: 7.4.2
libdrm: 2.4.11
intel-gpu-tools: 1.0.1
evdev: 2.2.1
xf86-input-synaptics: 1.1.2

This seems to happen to me every day or 2, so I can easily check things and install things.  I'll also attach my xorg.conf, and I have Xorg.0.log, but there's nothing interesting in there that I can see.
Comment 42 Daniel Gryniewicz 2009-06-06 07:40:32 UTC
Created attachment 26492 [details]
dmesg after running intel_gpu_dump
Comment 43 Daniel Gryniewicz 2009-06-06 07:40:50 UTC
Created attachment 26493 [details]
Xorg log
Comment 44 Thomas Orgis 2009-06-07 14:08:50 UTC
FYI... I have been running a bastard of current Xorg stuff with old Mesalib (7.1) and the intel driver version 2.4.1 for a week now.
That combination has been stable for me so far.
It's excruciatingly slow (could be that this increases with time... it didn't occur to me to be that bad a few days + suspend cycles ago), but it works, also with external VGA (not perfectly... ).

Needless to say, I am not really happy with this "solution", but I need to get work done on that machine... still hoping that we can get this freeze resolved.
Comment 45 Lubos Kolouch 2009-06-09 13:50:42 UTC
It happens to me even with git-sources-2.6.30-rc8-git6... annoying as hell...
Comment 46 Lubos Kolouch 2009-06-10 10:24:10 UTC
Created attachment 26642 [details]
kernel-2.6.30-rc8-git6 dump
Comment 47 Lubos Kolouch 2009-06-13 02:53:58 UTC
Created attachment 26748 [details]
gentoo-sources-2.6.30, KMS enabled

ping?
Comment 48 Thomas Orgis 2009-06-13 04:05:36 UTC
Should the new test release 2.7.99-something that has been spotted by people on the dev mailing list (I'm not on that list, so I didn't get it first-hand) apply here?
After reading the post, I guess so.
Lubos: Do you have time to test that dev release of the intel driver?
Comment 49 Lubos Kolouch 2009-06-13 06:12:29 UTC
Thomas, do you have the exact link? I don't see it in any gentoo overlay :(
Comment 50 Thomas Orgis 2009-06-13 06:32:19 UTC
I got the hint from a related bug discussion:

http://bugs.archlinux.org/task/14594

It points to that one:

http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/snapshot/xf86-video-intel-2.7.99.901.tar.bz2
Comment 51 Lubos Kolouch 2009-06-13 07:26:48 UTC
OK, got it, will test and report back...
Comment 52 Lubos Kolouch 2009-06-15 03:36:47 UTC
Created attachment 26807 [details]
crash with 2.7.99.901.tar.bz2
Comment 53 Lubos Kolouch 2009-06-15 03:37:15 UTC
(In reply to comment #50)
> I got the hint from a related bug discussion:
> 
> http://bugs.archlinux.org/task/14594
> 
> It points to that one:
> 
> http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/snapshot/xf86-video-intel-2.7.99.901.tar.bz2
> 

Lasted a bit longer, but still NG :(
Comment 54 Eric Anholt 2009-06-30 18:27:08 UTC
Comment on attachment 26807 [details]
crash with 2.7.99.901.tar.bz2

Marking Lubos's dump as obsolete since it's not about this bug (doesn't match Thomas's at all).
Comment 55 Lenar Lõhmus 2009-08-06 01:03:53 UTC
Please see https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/385232, which could be the same problem. There I've attached a couple of dumps taken during and after the freeze and also written down some thoughts.
Comment 56 Vladimir Volovich 2009-08-16 05:15:17 UTC
Created attachment 28663 [details]
output of intel_gpu_dump

I also see such hangs from time to time, when mouse could be moved, but otherwise it looks frozen (keyboard's CAPS led cannot be switched too); but i can ssh to the machine.

When such hang occurs, i always see at the end of the "dmesg" output the message such as this:

[310494.410338] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 1

following recommendation at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=525231
i've made a dump using intel_gpu_dump which i'm attaching here.

my system environment:
-- chipset: G33
-- system architecture: x86_64
-- xf86-video-intel/xserver/mesa/libdrm version:
   xserver-xorg-video-intel: 2.8.0
   xorg-server 2:1.6.3-1
   mesa: 7.5-3 (OpenGL version string: 1.4 Mesa 7.5.1)
   libdrm: 2.4.12-1
-- kernel version: 2.6.30-1-amd64
-- Linux distribution: debian sid
-- Machine or mobo model: DG33TL
-- Display connector: DVI
Comment 57 Thomas Orgis 2009-08-26 04:52:42 UTC
Ping?

Is there some indication of this specific (!) hang being prevented by a new driver version?
I cannot just test each new revision to discover during a talk that my presentation freezes... I am still running xorg 1.6 with intel driver 2.4 here, which is stable, but I suspect I may have less trouble fighting with xrandr over my modes by using a driver from more recent times... or have decent 2D performance, for a start.

But if it's unstable, it is worse than anything else:-(
Comment 58 Eric Anholt 2009-09-02 15:28:49 UTC
Comment on attachment 25872 [details]
contents of debug/dri directory for the recent UXA freeze

this tar file didn't work because sysfs files have 0 length so tar didn't read anything -- need to copy to a temp directory first.

Thomas, could you try again with a current driver stack and post a new dump?  The output of intel_gpu_dump these days should be enough, without making a tarball.
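
The empty tarball described in this comment comes from archiving files that report a length of 0. A minimal sketch of the suggested workaround (copy to a temporary directory first, then archive), assuming Python is available on the affected box. The function name snapshot_to_tarball is hypothetical, and the source path in the usage note is the one named in comment 33, which may differ depending on where debugfs is mounted:

```python
import os
import shutil
import tarfile
import tempfile


def snapshot_to_tarball(src_dir, tar_path):
    """Copy every readable file under src_dir by reading it, then tar the copies.

    Files in kernel virtual filesystems often report a size of 0, so an
    archiver that trusts the reported size stores nothing. Reading each file
    to end-of-file and writing a regular copy sidesteps that.
    """
    staging = tempfile.mkdtemp(prefix="dri-snapshot-")
    try:
        for root, _dirs, files in os.walk(src_dir):
            rel = os.path.relpath(root, src_dir)
            dest_root = os.path.normpath(os.path.join(staging, rel))
            os.makedirs(dest_root, exist_ok=True)
            for name in files:
                try:
                    with open(os.path.join(root, name), "rb") as src:
                        data = src.read()
                except OSError:
                    continue  # skip entries that cannot be read
                with open(os.path.join(dest_root, name), "wb") as dst:
                    dst.write(data)
        with tarfile.open(tar_path, "w:gz") as tar:
            # Store the copies under the name of the original directory.
            tar.add(staging, arcname=os.path.basename(os.path.normpath(src_dir)))
    finally:
        shutil.rmtree(staging, ignore_errors=True)
    return tar_path
```

A call such as snapshot_to_tarball("/debug/dri/0", "dri-0.tar.gz") would then produce an archive with real file contents; reading those files usually requires root.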
Comment 59 Gordon Jin 2009-09-14 20:36:29 UTC
unblocking Q3 release, but still a high priority bug.
Comment 60 Thomas Orgis 2009-09-15 04:42:18 UTC
Sorry, I really need to dig up some time when I can allow myself to break my setup on the laptop and make it freeze again for debugging... I need to change versions of intel driver and mesalib ... and change back, without having my Xorg setup broken for future work.
Comment 61 Kjartan Maraas 2009-09-17 04:31:43 UTC
Seeing a similar hang here, but I get a hard lockup when I try to run intel_gpu_dump :-/

xorg-x11-server-Xorg-1.6.99.901-2.fc12.i686
xorg-x11-drv-intel-2.8.0-13.20090909.fc12.i686
Kernel 2.6.31-rc9 based with drm from -next

Graphics chip
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)
Comment 62 Eric Webb 2009-09-22 20:31:07 UTC
I just wanted to chime in on this.  I am experiencing the same.  now i know to try the intel_gpu_dump program when this happens next time.

is there anything else we can do to help, or any updates?  very frustrating problem as you might imagine.  cheers.
Comment 63 Justus-bulk 2009-10-09 23:52:31 UTC
I have this issue too, and it persists after upgrading to the following configuration:

915GM, i686, video-intel driver 2.9.0, xserver 1.6.4, kernel 2.6.30.8 (from Debian unstable; most of the rest of the system is Debian testing, KDE 4.3.1)

Activating compositing appears to make things worse, leading to lockups sooner than with compositing turned off.

Thanks to a cron job, I attach two intel_gpu_dumps, one from about
five minutes before my latest lock-up (with compositing turned on),
the other from about five minutes after.  I can give you lots more
pre-lockup dumps, or provoke lockups and send post-lockup dumps, if it helps.

I hope this will help resolve this issue! Let me know if I can be of
further help.
Comment 64 Justus-bulk 2009-10-09 23:55:20 UTC
Created attachment 30243 [details]
pre-lockup intel_gpu_dump (with compositing enabled)
Comment 65 Justus-bulk 2009-10-09 23:56:17 UTC
Created attachment 30244 [details]
post-lockup intel_gpu_dump (with compositing enabled)
Comment 66 Eric Anholt 2009-10-19 13:08:49 UTC
(this bug is still blocked on response from Thomas.  If you aren't Thomas, please open your own bug)
Comment 67 Thomas Orgis 2009-10-20 06:44:54 UTC
I am taking chances now with this setup:

xf86-video-intel 2.9.0
xorg-server 1.6.3
mesalib 7.5.1
linux 2.6.30-rc2 with that GEM patch (hm, yeah, I might want to install some more current non-rc release)

I am not forcing any options on the driver, and a look at Xorg.log suggests that it is using UXA.

I have been running this since, hm, Sunday. One session that is put into ACPI S3 between usage intervals. I set up a script to take a GPU dump each minute... so when something happens, we should get a picture.

I might try to fiddle with VGA output switching later today, to force a crash. But, well, only time can tell if the initial bug of freezing "some time" strikes again.
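
A periodic dump script of the kind described here might look roughly like the sketch below. It is not the script actually used in this report: take_dump, the file naming, and the keep/timeout defaults are all hypothetical, and the timeout is there only because intel_gpu_dump itself has been reported in this thread to lock up on a hung box.

```python
import subprocess
import time
from pathlib import Path


def take_dump(command, out_dir, keep=60, timeout=30):
    """Run the dump command once, store its output, and keep only the newest files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / f"gpu-dump-{time.strftime('%Y%m%d-%H%M%S')}.txt"
    with open(target, "wb") as fh:
        try:
            subprocess.run(command, stdout=fh, stderr=subprocess.STDOUT,
                           timeout=timeout, check=False)
        except subprocess.TimeoutExpired:
            pass  # keep whatever was written before the tool stalled
    # Timestamped names sort chronologically, so the oldest files come first.
    dumps = sorted(out.glob("gpu-dump-*.txt"))
    for old in dumps[:-keep]:
        old.unlink()
    return target
```

Calling take_dump(["intel_gpu_dump"], "/var/tmp/gpu-dumps") once a minute, for example from cron, leaves roughly the last hour of pre-freeze state on disk; the output directory is an assumption.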
Comment 68 Justus-bulk 2009-10-29 00:41:22 UTC
This bug looks very much like Bug #24753. I will continue my interaction over there, since that bug is about my exact GPU (i915).
Comment 69 Thomas Orgis 2009-11-02 01:23:28 UTC
No new freeze so far. Though I had two reboots or so due to running out of battery (yeah, a warning light for low battery just under our palm on the laptop front is really useful...). Is that good? I mean, I would like to _know_ that it is fixed, not just guess.

I also tinkered with VGA output on/off and resolution changes, played some video... what still keeps me suspicious is ongoing reports of people on http://bugs.archlinux.org/task/14594 (I'm not on Arch, but they got nice bug tracker activity about this issue).

I will report when I find something... but when can you tell that a nondeterministic bug is gone?
Comment 70 Thomas Orgis 2009-12-06 10:47:12 UTC
OK, long time no see, but now it just happened to me again. Freeze.

This is my rather special version setup (it came from supporting the stable intel driver version 2.4, only necessary update for testing new version):

xorg-server 1.6.3
mesalib 7.5.1
intel driver 2.9.0
linux 2.6.30rc2 with patches for /dev/toshiba and intel GEM fix

I am attaching a gpu dump taken after the freeze via ssh. I am not sure if it will contain the proper information, as I mounted debugfs after the freeze -- but hopefully it does.
Comment 71 Thomas Orgis 2009-12-06 10:50:13 UTC
Created attachment 31783 [details]
GPU dump after recent freeze with driver 2.9.0
Comment 72 Thomas Orgis 2009-12-09 11:55:37 UTC
Created attachment 31892 [details]
Fresh GPU dump after a freeze with unchanged setup today.

And yet again, I got stuck by the freeze. This time the classic scheme of hitting me while trying to scroll down in the web browser (konqueror 3.5.10).
Incidentally, this was on a website relating to a search for a way to get a Thinkpad with a screen like my Toshiba... man, the Thinkpad just worked... and had a trackpoint... I wonder if we get that intel driver / GPU bug resolved before I get totally sick of this beautiful and slick machine because of its nauseating shortcomings.

Well, I'm off to downgrade to intel driver 2.4 again...
Comment 73 Eric Anholt 2009-12-10 14:05:38 UTC
Those two new dumps show different behavior to your problem back in May, and definitely strange and interesting behavior.  An excerpt:

ACTHD: 0xff8d8000
EIR: 0x00000000
EMR: 0xffffffff
ESR: 0x00000001
PGTBL_ER: 0x00000000
IPEHR: 0x0a8b4000
IPEIR: 0x00000000
INSTDONE: 0x7fffffc1
batchbuffer at 0x0aa05000:
0x0aa05000:      0x54300004: XY_COLOR_BLT (rgb enabled, alpha enabled, dst tile 0)
0x0aa05004:      0x03f01400:    format 8888, pitch 5120, clipping disabled
0x0aa05008:      0x002502e0:    (736,37)
0x0aa0500c:      0x002602e1:    (737,38)
0x0aa05010:      0x0a8b4000:    offset 0x0a8b4000
0x0aa05014:      0xffc0c0c0:    color
0x0aa05018:      0x54300004: XY_COLOR_BLT (rgb enabled, alpha enabled, dst tile 0)
....
0x0aa0528c:      0x03f01400:    format 8888, pitch 5120, clipping disabled
0x0aa05290:      0x002402e1:    (737,36)
0x0aa05294:      0x00250353:    (851,37)
0x0aa05298:      0x0a8b4000:    offset 0x0a8b4000
0x0aa0529c:      0xff8d8d8d:    color
0x0aa052a0:      0x54300004: XY_COLOR_BLT (rgb enabled, alpha enabled, dst tile 0)

Note how IPEHR is suspiciously the offset of a buffer that's been blitted to, and ACTHD looks like a page-aligned version of the color used in some of the blits.  The leftovers in other batches in there show that the batches generally were full of these blits.  So it looks like the GPU got to execute a bit of leftover junk in a batch that should have been other commands.  Do we have some lengths of packets wrong?  Or do we have some caching going wrong and we're executing a bit of a previous batch in a new one?

Not sure what's going on exactly, but this is definitely an interesting bug.  Since it appears to be all in 2D and he's been working in the area, passing it off to ickle.
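
The two coincidences noted above can be checked directly from the numbers in the excerpt. The sketch below is only a worked illustration: the helper names are hypothetical, and the field layout (x in the low 16 bits and y in the high 16 bits of a coordinate dword, pitch in the low 16 bits of the second dword) is inferred from how the dump tool decoded these values, with coordinates treated as unsigned.

```python
PAGE_MASK = 0xFFFFF000  # clears the low 12 bits, i.e. rounds down to a 4 KiB page


def unpack_xy(dword):
    """Split a coordinate dword into (x, y): x in the low 16 bits, y in the high 16."""
    return dword & 0xFFFF, dword >> 16


def decode_color_blt(dwords):
    """Decode the six dwords of one XY_COLOR_BLT packet as listed in the dump."""
    header, br13, top_left, bottom_right, offset, color = dwords
    x1, y1 = unpack_xy(top_left)
    x2, y2 = unpack_xy(bottom_right)
    return {
        "pitch": br13 & 0xFFFF,
        "rect": (x1, y1, x2, y2),
        "size": (x2 - x1, y2 - y1),
        "offset": offset,
        "color": color,
    }


# Register values and the first packet, copied from the excerpt.
ACTHD = 0xFF8D8000
IPEHR = 0x0A8B4000
first = decode_color_blt(
    [0x54300004, 0x03F01400, 0x002502E0, 0x002602E1, 0x0A8B4000, 0xFFC0C0C0]
)
last_color = 0xFF8D8D8D  # color dword of the packet ending at 0x0aa0529c

print(first["rect"], first["size"], first["pitch"])
print("IPEHR equals blit offset:", IPEHR == first["offset"])
print("ACTHD equals page-aligned color:", ACTHD == (last_color & PAGE_MASK))
```

Both comparisons hold for the values shown, and the first packet decodes to a single-pixel fill.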

Comment 74 Chris Wilson 2010-01-05 03:15:03 UTC
(In reply to comment #73)
> Those two new dumps show different behavior to your problem back in may, and
> definitely strange and interesting behavior.  An excerpt:

[snip]

> Note how IPEHR is suspiciously the offset of a buffer that's been blitted to,
> and ACTHD looks like a page-aligned version of the color used in some of the
> blits.  The leftovers in other batches in there show that the batches generally
> were full of these blits.  So it looks like the GPU got to execute a bit of
> leftover junk in a batch that should have been other commands.  Do we have some
> lengths of packets wrong?  Or do we have some caching going wrong and we're
> executing a bit of a previous batch in a new one?

Wow, that is an interesting dump. Oh, if only the GPU had a stack of ACTHD/BBADDR... I have had suspicions in the past that we may have executed beyond BATCH_BUFFER_END. Hmm.

Just out of curiosity, what is generating all that flood filling of a large area using 1x1 fills?
Comment 75 Thomas Orgis 2010-01-05 06:08:09 UTC
(In reply to comment #74)

> Just out of curiosity, what is generating all that flood filling of a large
> area using 1x1 fills?

Scrolling in a web page with konqueror 3.5.10, I presume... well, that is the only hint I have. Not sure if scrolling via the touchpad's border gives one-pixel increments.

PS: I indeed am working on a Thinkpad X200 now... getting too annoyed with the lack of a TrackPoint and proper Keyboard in the Toshiba (Please, somebody produce a ThinkPad with a transflective LED screen like the Toshiba...).
I still have the bugged laptop and will continue to use it (outside, when the sun shines...), but you won't get new info from me without prompting for test runs.
The good side is that I don't rely solely on that machine anymore and thus could devote it totally to some Xorg driver test runs for some time.
Comment 76 Thomas Orgis 2010-06-25 23:02:13 UTC
Hello? Any hint that this issue might be resolved in a current driver version?
I am running this bastard setup of xorg 1.6 and intel driver 2.4.1 that's stable but still not really satisfactory. It's summer again and I want to use the Toshiba machine for its sunlight-friendliness.

An Ubuntu 10.04 install on a pen drive didn't show the freeze yet, but I did not run it for that long... and mainly displaying DVB video, perhaps that's a safe workload.
I really do not feel good just upgrading the driver and hoping that the next freeze will hit me soon or never. My computer needs to be reliable -- I don't want to have my slides freezing on me during a talk/seminar again!

So... considering all the debugging info, is there something in driver 2.11 that is supposed to fix this? I don't have the faith anymore that such bugs just resolve themselves through waiting:-/
Sorry for the bitterness, it's real.
Comment 77 Chris Wilson 2010-06-26 01:00:06 UTC
Sorry Thomas, I've read and reviewed this bug several times over the last couple of months hoping for a definitive answer. My gut feeling is that I have altered the drawing code sufficiently [to reduce the impact of relocations] that the hangs that are plaguing you should no longer be reproducible with 2.12.0. But I haven't knowingly fixed it.
Comment 78 Thomas Orgis 2010-07-13 16:09:13 UTC
Well... I upgraded my Xorg install, KMS and all... and it seemed to work rather well for a while. But then this: Suddenly my terminals (rxvt-unicode) were missing the filled cursor (it just blackened the characters), command line editing became fuzzy.

I wouldn't immediately correlate that with Xorg breakage, were it not for this message in Xorg.0.log at around the same time:

(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(EE) intel(0): Detected a hung GPU, disabling acceleration.

I guess there's indeed some causality... the sudden removal of acceleration breaking the rendering for my terminals. Well, I suppose this can be considered progress -- at least it's not frozen. I am attaching the log file ... interested in any comments on your side...
Comment 79 Thomas Orgis 2010-07-13 16:10:34 UTC
Created attachment 37004 [details]
Xorg log with driver 2.12.0, detected hung GPU
Comment 80 Chris Wilson 2010-07-13 16:21:17 UTC
So still broken. If you get a chance to update to a post 2.6.34 kernel (and if you hibernate at all, a post-2.6.35-rc4 kernel is highly recommended) can you please upload the /sys/kernel/debug/dri/0/i915_error_state after the GPU hangs. Hopefully, the automatic error capture will give a better insight into the bug.
Comment 81 Chris Wilson 2010-07-16 03:54:06 UTC
Dave Airlie and Adam Jackson have been hitting an issue with XY_SRC_COPY_BLT on 945GM and reduced to a simple x11perf -copypixwin500. Thomas, can you confirm that this is a reproducible trigger?
Comment 82 Thomas Orgis 2010-07-18 06:53:19 UTC
Created attachment 37173 [details]
Defect in font rendering, damaged "a".

I cannot test a new kernel just yet, but I want to share another observation: Font display is buggy with the current setup. I observed distortions during scrolling, which didn't persist... so hard to describe. But what I can easily describe and also show is some issue with font rendering.

At random, some letter seems to be rendered badly and will be damaged for an application during its lifetime. I am attaching an example of "a" being damaged in Claws-Mail. Saw the same effect in Firefox with "e".

The hung GPU hasn't occurred again yet.
Comment 83 Thomas Orgis 2010-07-18 07:00:15 UTC
(In reply to comment #81)
> Dave Airlie and Adam Jackson have been hitting an issue with XY_SRC_COPY_BLT on
> 945GM and reduced to a simple x11perf -copypixwin500. Thomas, can you confirm
> that this is a reproducible trigger?

At first I thought not... but then... letting the test run longer ... so that it gives me the line

   8000 reps @   3.7410 msec (   267.0/sec): Copy 500x500 from pixmap to window

and then still waiting a bit. Then hitting Ctrl+C a few times on the console to get it done... and bang, I got my weird console again, without visible cursor and marking of text resulting in straight black instead of black text surrounded by green (inverted colors).

(EE) intel(0): Detected a hung GPU, disabling acceleration.
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error
(WW) intel(0): i830_uxa_prepare_access: gtt bo map failed: Input/output error

[lots of the last line repeated]
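
Checking a log for these two messages after a test run can be automated along the following lines. This is a rough sketch rather than an existing tool: summarize_log and the regular expressions are hypothetical and match only the message formats quoted above.

```python
import re

# Patterns modelled on the two messages quoted in this comment.
HANG_RE = re.compile(r"\(EE\) intel\(\d+\): Detected a hung GPU")
MAP_FAIL_RE = re.compile(r"\(WW\) intel\(\d+\): \S+: gtt bo map failed")


def summarize_log(lines):
    """Report whether a hang was detected, where it first appears, and how many map failures there are."""
    hung_at = None
    map_failures = 0
    for number, line in enumerate(lines, start=1):
        if hung_at is None and HANG_RE.search(line):
            hung_at = number  # 1-based line of the first hang report
        if MAP_FAIL_RE.search(line):
            map_failures += 1
    return {
        "hung": hung_at is not None,
        "hung_line": hung_at,
        "map_failures": map_failures,
    }
```

It can be fed an open file object, for example the server's Xorg.0.log (its location varies by distribution, so the path is left to the caller).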
Comment 84 Dave Airlie 2010-07-18 16:48:01 UTC
 x11perf -copywinpix500

also might make it happen even faster, at least it seems to here.
Comment 85 Dave Airlie 2010-07-20 17:49:08 UTC
Patches sent to Linus today in drm-fixes branch

should resolve this issue.

944001201ca0196bcdb088129e5866a9f379d08c drm/i915: enable low power render writes on GEN3 hardware.
45503ded966c98e604c9667c0b458d40666b9ef3 drm/i915: Define MI_ARB_STATE bits

are the commit ids.
Comment 86 Thomas Orgis 2010-07-31 05:57:15 UTC
I installed linux-2.6.35-rc6 now. Before, on the 2.6.33.3 kernel, I verified that

x11perf -repeat 100 -copywinpix500

eventually kicks out my video chip (not during the first 5 runs). On the new kernel I had it running for perhaps 30 repetitions and nothing happened. So I am hesitatingly confirming that this specific bug has been fixed. Didn't observe the garbled fonts so far, either.

Time needs to show if the initial issue, the freeze after a few hours or days of usage, is really fixed. Well, I might try if running the Homeworld demo in wine does freeze my X11 with the new kernel, too... just noticed that today.

Anyhow, thanks for working on this! I'll try to report again after some weeks or so (the bug is old enough anyway) ... and hopefully I'll report that I got no freezes. Off now to figuring out why Xorg refuses to pick up the synaptics driver from the HAL config... I hate touchpads, even more so if not properly configured:-/
Comment 87 Chris Wilson 2010-07-31 15:06:50 UTC
On Sat, 31 Jul 2010 05:57:17 -0700 (PDT), bugzilla-daemon@freedesktop.org wrote:
> https://bugs.freedesktop.org/show_bug.cgi?id=20560
> 
> --- Comment #86 from Thomas Orgis <sobukus@sourcemage.org> 2010-07-31 05:57:15 PDT ---
> I installed linux-2.6.35-rc6 now. Before, on the 2.6.33.3 kernel, I verified
> that
> 
> x11perf -repeat 100 -copywinpix500
> 
> eventually kicks out my video chip (not during the first 5 runs). On the new
> kernel I had it running for perhaps 30 repetitions and nothing happened. So I am
> hesitatingly confirming that this specific bug has been fixed. Didn't observe
> the garbled fonts so far, either.

Thanks for taking the time to do the testing. Time will tell if we have
indeed fixed the other mysterious glitches.
 
> Time needs to show if the initial issue, the freeze after a few hours or days
> of usage, is really fixed. Well, I might try if running the Homeworld demo in
> wine does freeze my X11 with the new kernel, too... just noticed that today.

Hmm, be careful here as that mixes in mesa and a whole different can of
worms and set of unknowns...
 
> Anyhow, thanks for working on this! I'll try to report again after some weeks
> or so (the bug is old enough anyway) ... and hopefully I'll report that I got
> no freezes. Off now to figuring out why Xorg refuses to pick up the synaptics
> driver from the HAL config... I hate touchpads, even more so if not properly
> configured:-/

My first guess is that you accidentally changed the input API version on
the server, but not updated the drivers, so X is refusing to load the old
synaptics driver. Check Xorg.log for an obvious error.
Comment 88 Thomas Orgis 2010-08-01 03:53:02 UTC
(In reply to comment #87)
> Thanks for taking the time to do the testing. Time will tell if we have
> indeed fixed the other mysterious glitches.

Here's hoping;-)

Short on the off-topic things:
> > if running the Homeworld demo in
> > wine does freeze my X11 with the new kernel, too
> Hmm, be careful here as that mixes in mesa and a whole different can of
> worms and set of unknowns...

Well, the new kernel fixed that particular one... performance is awful, but that's probably not the fault of the intel driver as such. A different topic.

> > Xorg refuses to pick up the synaptics driver
> My first guess is that you accidentally changed the input API version on
> the server

It was the kernel/HAL not giving the input.touchpad capability to the touchpad, but input.mouse instead. After rebooting with the old kernel and going back to the new one, the touchpad suddenly gets input.touchpad set and all is fine. That is scary, but also, out of scope of this bug report.
Comment 89 Chris Wilson 2010-08-06 11:51:26 UTC
Thomas, I'm going to mark this as being the fatal BLT. Please do open new bugs for any issue so that we can look at those afresh (without being distracted by the huge thread here).

Given the era of Homeworld, I would have thought that would have been playable on i945, so it might be worth profiling to see if there is any low hanging fruit. And I definitely can't help with input issues! Sorry.

Thanks for the bug report and for all the testing.
