Bug 22904 - 865G freeze with Intel driver 2.8.0
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: x86 (IA32) Linux (All)
Importance: high critical
Assigned To: Carl Worth
QA Contact: Xorg Project Team
Keywords: NEEDINFO
Duplicates: 23088 23365
Depends on:
Blocks:
Reported: 2009-07-23 00:27 UTC by René
Modified: 2009-10-20 07:31 UTC

See Also:
i915 platform:
i915 features:


Attachments
Configuration that works on my 865G (849 bytes, text/plain)
2009-07-23 00:27 UTC, René
Old X server log with the latest "freezing" start at the end (22.95 KB, text/plain)
2009-07-23 00:31 UTC, René
X server log for the start that works using configuration in attachment #27935 (13.21 KB, text/plain)
2009-07-23 00:34 UTC, René
Configuration that makes server freeze (2.42 KB, application/octet-stream)
2009-07-23 00:57 UTC, René
intel_gpu_dump.txt for the frozen server (135.32 KB, application/x-gzip)
2009-07-23 02:28 UTC, René
Diagnostic and configuration files for the UXA freeze with driver 2.7.1 (141.66 KB, application/x-compressed-tar)
2009-07-23 03:10 UTC, René
Diagnostic and configuration files for the EXA freeze with driver 2.7.1 (6.23 KB, application/x-compressed-tar)
2009-07-23 03:38 UTC, René
Latest configuration and log that works - 2.7.1 with XAA, kernel 2.6.30.2 (7.29 KB, application/x-compressed-tar)
2009-08-03 06:20 UTC, René
intel_gpu_dump.txt.bz2 (96.23 KB, application/octet-stream)
2009-09-02 12:55 UTC, Thomas Bettler
Xorg.0.log (20.47 KB, application/octet-stream)
2009-09-02 12:57 UTC, Thomas Bettler
intel_gpu_dump_200909022254.txt.bz2 (83.97 KB, application/octet-stream)
2009-09-02 13:57 UTC, Thomas Bettler
intel_gpu_dump_200909031844.txt.bz2 (101.61 KB, application/octet-stream)
2009-09-03 09:56 UTC, Thomas Bettler
and the according backtrace... (7.40 KB, text/plain)
2009-09-03 09:59 UTC, Thomas Bettler

Description René 2009-07-23 00:27:57 UTC
Created attachment 27935 [details]
Configuration that works on my 865G

Using OpenSUSE 11.2 Factory M3+, kernel 2.6.30.2 and latest updates from the X11:Xorg repository for it, X server freezes after a few seconds using the configuration file produced by "X -configure" (loading dri2 and so on).
After the freeze, only the mouse works; the X server is unusable and doesn't respond to any action. This has also happened in KDM when I repeat some actions there. The underlying system is stable, and if I switch to console mode in time no crash or freeze can be seen. What I did last time was:
1. KDM logon starting ICEWM (not KDE) as a normal user
2. Launch XTerm
3. 'su -' in xterm
4. From that XTerm, start KDE systemsettings (for the root user)
The systemsettings tool opens, but after that there is no more response to input, although the windows and the look of the X server aren't broken.

If I use the attached xorg.conf, which is produced when booting the OpenSUSE installation CD, the X server is stable, but acceleration does not seem to work; moving windows in particular is slow.

I'll add some attachments for this.
Comment 1 René 2009-07-23 00:31:25 UTC
Created attachment 27936 [details]
Old X server log with the latest "freezing" start at the end

You see it's loading DRI2 and other modules which might cause that freeze.
Comment 2 René 2009-07-23 00:34:06 UTC
Created attachment 27938 [details]
X server log for the start that works using configuration in attachment #27935 [details]

Sorry, it loads dri2, too, so the freeze must come from another difference.
Comment 3 Gordon Jin 2009-07-23 00:46:10 UTC
Please provide intel_gpu_dump output referring to http://intellinuxgraphics.org/intel-gpu-dump.html.

What's the previous intel driver version ever working for you? (the first attachment from installation CD uses vesa driver instead of intel driver)
Comment 4 René 2009-07-23 00:57:16 UTC
Created attachment 27940 [details]
Configuration that makes server freeze
Comment 5 René 2009-07-23 01:02:48 UTC
(In reply to comment #3)

> What's the previous intel driver version ever working for you? (the first
> attachment from installation CD uses vesa driver instead of intel driver)
> 

2.7.1 using EXA acceleration worked for me.

Comment 6 Gordon Jin 2009-07-23 01:46:39 UTC
(In reply to comment #5)
> (In reply to comment #3)
> 
> > What's the previous intel driver version ever working for you? (the first
> > attachment from installation CD uses vesa driver instead of intel driver)
> > 
> 
> 2.7.1 using EXA acceleration worked for me.
> 

How about 2.7.1 with UXA?
Comment 7 René 2009-07-23 02:28:58 UTC
Created attachment 27943 [details]
intel_gpu_dump.txt for the frozen server
Comment 8 René 2009-07-23 03:03:31 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #3)
> > 
> > > What's the previous intel driver version ever working for you? (the first
> > > attachment from installation CD uses vesa driver instead of intel driver)
> > > 
> > 
> > 2.7.1 using EXA acceleration worked for me.
> > 
> 
> How about 2.7.1 with UXA?
> 

It freezes also; I'll attach the diagnostic and configuration files. Using 2.7.1 and EXA the server works, but it feels slow to me.
Comment 9 René 2009-07-23 03:10:37 UTC
Created attachment 27945 [details]
Diagnostic and configuration files for the UXA freeze with driver 2.7.1
Comment 10 René 2009-07-23 03:38:14 UTC
Created attachment 27946 [details]
Diagnostic and configuration files for the EXA freeze with driver 2.7.1

There is some interesting news:
Still using driver 2.7.1, I made some experiments with the server configuration and got a freeze of the same kind (from outside, after a few seconds and actions) also with the EXA acceleration.

I exchanged only the x11-xorg-driver-video package and left the other packages, such as libdrm, libpciaccess and so on, as they were before (OpenSUSE 11.2 factory versions). I'm not sure whether there is a compatibility problem. I just wanted to give a hint that the problem might even be somewhere outside the driver, but I will leave this decision to you :-)

See attachments for log and configuration.
Comment 11 René 2009-07-29 02:43:27 UTC
I continued some testing with this:

I'm now on self-compiled driver 2.7.1 and libdrm 2.4.12.

It seems definitive from my point of view - the freeze happens with both acceleration methods, UXA and EXA, in driver 2.7.1, and UXA in 2.8.0.
(I also wanted to try XAA, but XAA doesn't come up at all with 2.7.1 because of some buffer allocation problem.)

IMHO, the problem seems to come rather from the kernel part - drm-intel in 2.6.30.2, right?
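
For anyone repeating the comparison of acceleration methods described above, here is a minimal Python sketch that renders an xorg.conf "Device" section selecting one of them through the intel driver's "AccelMethod" option. The function name render_device_section and the identifier "Card0" are placeholders chosen for illustration, and whether a particular driver release accepts every listed value is an assumption that should be checked against that release's man page.

```python
# Sketch: build the text of an xorg.conf "Device" section for the intel
# driver with a chosen acceleration method. Names here are illustrative.

VALID_METHODS = ("UXA", "EXA", "XAA")


def render_device_section(accel_method, identifier="Card0"):
    """Return a Device section that selects accel_method for the intel driver."""
    method = accel_method.upper()
    if method not in VALID_METHODS:
        raise ValueError(f"unknown acceleration method: {accel_method!r}")
    lines = [
        'Section "Device"',
        f'    Identifier "{identifier}"',
        '    Driver     "intel"',
        f'    Option     "AccelMethod" "{method}"',
        'EndSection',
    ]
    return "\n".join(lines) + "\n"


# Print one variant so the result can be pasted into a test xorg.conf.
print(render_device_section("EXA"), end="")
```

Generating the section from one function keeps the test configurations identical apart from the method under test, which makes the X server logs easier to compare.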
Comment 12 René 2009-07-29 03:43:02 UTC
For completing this:
The problem couldn't even be solved with the latest kernel 2.6.31-rc4 (OpenSUSE Kernel:/Head repository for OpenSUSE factory), driver 2.8.0 and libdrm 2.4.12.

It seems I'm stuck with the vesa driver for now until this gets solved.
Comment 13 Carl Worth 2009-07-31 11:06:14 UTC
René,

Thank you very much for your bug report. I greatly appreciate you trying the various versions of the driver, the various acceleration architectures, and the various kernels.

The bug seems pervasive enough that I should hopefully not have any difficulty reproducing it with an 865 here. I'll try that as soon as I can, and let you know what I can come up with.

-Carl
Comment 14 René 2009-08-03 06:20:05 UTC
Created attachment 28301 [details]
Latest configuration and log that works - 2.7.1 with XAA, kernel 2.6.30.2

Thanks Carl.
I thought it could be helpful if you would also have a configuration that runs:
I gave XAA with driver 2.7.1, kernel 2.6.30.2 a second try. Now it works and this is the latest driver version and configuration I get working at the moment.
Comment 15 René 2009-08-05 03:48:30 UTC
I have to retract what I said about 2.7.1 and XAA using kernel 2.6.30.2. Even this combination is fairly unstable. I get a black screen instead of kdm, and only every second start (using the reset button) worked.
After the latest update to OpenSUSE 11.2 Milestone 5 there is kernel 2.6.31-rc4. Using XAA with this kernel, it is more unstable than before and also freezes as initially reported for UXA.
To me it seems that this problem is independent of the acceleration method used, and is most probably somewhere in a kernel part of the driver.

If you want to reproduce it, try KDM and KDE4 (it is not necessary to enable desktop effects à la Compiz). KDE4 does something which makes X freeze with the 2.8.0 or 2.7.1 driver after a few seconds or sometimes a few minutes.

There is just one driver working for me again now: vesa.
Comment 16 Wang Zhenyu 2009-08-17 18:23:49 UTC
*** Bug 23088 has been marked as a duplicate of this bug. ***
Comment 17 Thomas Bettler 2009-08-28 14:38:04 UTC
me too.
I got the same kind of random freezes with my 865G under 2.6.30 using intel 2.7.1 in exa mode.
same symptoms, like screen hangs but ssh still works.

seems very hard to locate the issue:
- kernel?
- X?
- intel driver?
- KDE3? / KDE4?
- anything else?

helping ideas would be really appreciated.

Thomas
Comment 18 Chris Wilson 2009-08-28 14:47:05 UTC
Hi Thomas,
  if you can also ssh in and grab a gpu dump when the server hangs, that would be useful. See http://intellinuxgraphics.org/intel-gpu-dump.html for instructions. The gpu dump René captured is most curious...
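
The capture over ssh requested above could be scripted roughly as follows. This is a sketch under assumptions: it presumes the intel_gpu_dump tool from the linked page is installed on the hung machine and that the login account may run it (it normally needs root); the host name and the helper names build_dump_command and capture_gpu_dump are hypothetical.

```python
import subprocess


def build_dump_command(host, user="root", tool="intel_gpu_dump"):
    """Return the argv list that runs the dump tool on host over ssh."""
    return ["ssh", f"{user}@{host}", tool]


def capture_gpu_dump(host, out_path, timeout=60):
    """Run the dump on the hung machine and save its stdout locally."""
    cmd = build_dump_command(host)
    with open(out_path, "wb") as out:
        # check=True raises CalledProcessError if ssh or the tool fails,
        # so an empty or partial dump is not mistaken for a good one.
        subprocess.run(cmd, stdout=out, check=True, timeout=timeout)
    return out_path
```

From a second machine, a call such as capture_gpu_dump("frozen-box", "intel_gpu_dump.txt") would then write the dump to a local file that can be compressed and attached to the bug.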
Comment 19 René 2009-08-31 00:35:02 UTC
(In reply to comment #18)
> Hi Thomas,
>   if you can also ssh in and grab a gpu dump when the server hangs, that would
> be useful. See http://intellinuxgraphics.org/intel-gpu-dump.html for
> instructions. The gpu dump René captured is most curious...
> 

... but it was done in the same way - logged on to ssh on the host with the hanging X from a different machine and made the dump.

BTW: Nothing new even with the combination of driver 2.8.1 and kernel 2.6.31-rc7; the hang still occurs "reliably" after a few actions and seconds playing with the KDE4 systemsettings tool from within the ICEWM window manager (which is my "reference test").
Comment 20 Thomas Bettler 2009-09-02 12:55:25 UTC
Created attachment 29116 [details]
intel_gpu_dump.txt.bz2

gpu dump by ssh using>>>
kernel: 2.6.30
KMS: off
xf86-video-intel: 2.7.1
xorg-server: 1.6.3.901
mesa: 7.5
libdrm: 2.4.12

Result: Screen hangs except mouse pointer. (Even Ctrl+Alt+Back has no effect)

Thanks for any helping hints.
Comment 21 Thomas Bettler 2009-09-02 12:57:05 UTC
Created attachment 29117 [details]
Xorg.0.log

and this was logged.
Comment 22 Thomas Bettler 2009-09-02 13:57:57 UTC
Created attachment 29119 [details]
intel_gpu_dump_200909022254.txt.bz2

and right here another dump.
Comment 23 Thomas Bettler 2009-09-03 09:56:25 UTC
Created attachment 29180 [details]
intel_gpu_dump_200909031844.txt.bz2
Comment 24 Thomas Bettler 2009-09-03 09:59:46 UTC
Created attachment 29181 [details]
and the according backtrace...

dmesg shows nothing exceptional.
Xorg.log same as above...
Comment 25 René 2009-09-08 01:33:46 UTC
Does anyone have a workaround or a good xorg.conf for i865G on a 2.6.31 kernel except using the VESA driver until the bug gets fixed? The VESA driver comes with many other rendering problems and side effects, for instance in Eclipse, and is, of course, very slow.
Comment 26 Carl Worth 2009-09-11 13:43:50 UTC
I'm very happy to report that thanks to Eric Anholt, we now have what we believe
is a fix for these annoying 865 hangs.

René, can you attempt the kernel patch posted by Eric here and report back
whether the bug is fixed?

http://lists.freedesktop.org/archives/intel-gfx/2009-September/004122.html

Thanks,

-Carl
Comment 27 Carl Worth 2009-09-11 13:48:07 UTC
*** Bug 23365 has been marked as a duplicate of this bug. ***
Comment 28 Carl Worth 2009-09-11 13:51:42 UTC
*** Bug 22947 has been marked as a duplicate of this bug. ***
Comment 29 Carl Worth 2009-09-17 11:42:49 UTC
We've gotten several out-of-band confirmations of Eric's patch fixing
865 GPU hangs. So I'm going to mark this bug fixed now.

René, if you test the newer kernel and find the bug still present, please
feel free to reopen the bug.

Thanks for all the help!

-Carl
Comment 30 Carl Michal 2009-09-17 16:56:27 UTC
This patch does fix the problem for me.  Thanks.

BUT:  If I compile the i915 driver as a module, clflush_cache_range is an unresolved symbol ??

I had to compile the driver (and agp driver) in to the kernel for this to resolve.

Should I open a new bug for this?
Comment 31 Carl Worth 2009-09-17 18:32:25 UTC
(In reply to comment #30)
> This patch does fix the problem for me.  Thanks.

Excellent. Thanks for the confirmation.

> BUT:  If I compile the i915 driver as a module, clflush_cache_range is an
> unresolved symbol ??
> 
> I had to compile the driver (and agp driver) in to the kernel for this to
> resolve.
> 
> Should I open a new bug for this?

No need to open a new bug. Brice noticed this as well, reported it, and proposed a fix here:

http://lists.freedesktop.org/archives/intel-gfx/2009-September/004128.html

I believe Eric has already implemented this so it will be going out this way with kernel releases (if it hasn't started appearing already).

-Carl
Comment 32 Stefan Dirsch 2009-09-26 02:30:30 UTC
Apparently it has been committed to Linus' git kernel tree

  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

by the following commit:

 commit e517a5e97080bbe52857bd0d7df9b66602d53c4d
 Author: Eric Anholt <eric@anholt.net>
 Date:   Thu Sep 10 17:48:48 2009 -0700

  agp/intel: Fix the pre-9xx chipset flush.
    
  Ever since we enabled GEM, the pre-9xx chipsets (particularly 865) have had
  serious stability issues.  Back in May a wbinvd was added to the DRM to
  work around much of the problem.  Some failure remained -- easily visible
  by dragging a window around on an X -retro desktop, or by looking at 
  bugzilla.
    
  The chipset flush was on the right track -- hitting the right amount of
  memory, and it appears to be the only way to flush on these chipsets, but the
  flush page was mapped uncached.  As a result, the writes trying to clear the
  writeback cache ended up bypassing the cache, and not flushing anything!  The
  wbinvd would flush out other writeback data and often cause the data we wanted
  to get flushed, but not always.  By removing the setting of the page to UC
  and instead just clflushing the data we write to try to flush it, we get the
  desired behavior with no wbinvd.
    
  This exports clflush_cache_range(), which was laying around and happened to
  basically match the code I was otherwise going to copy from the DRM.
  
  Signed-off-by: Eric Anholt <eric@anholt.net>
  Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
  Cc: stable@kernel.org
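
The reasoning in the commit message can be illustrated with a toy model. This is a simplified sketch and not a description of the real hardware: the FIFO write buffer, its size, and every name below (ToyChipset, flush_uncached, flush_through_buffer, demo) are assumptions invented for illustration. The commit message itself only states that the uncached flush writes bypassed the cache and therefore flushed nothing.

```python
from collections import deque


class ToyChipset:
    """Toy model: a small FIFO write buffer sitting in front of main memory."""

    def __init__(self, capacity=4):
        self.capacity = capacity  # number of buffer slots (made-up size)
        self.buffer = deque()     # writes not yet visible in memory
        self.memory = {}          # what a GPU reading memory would see

    def buffered_write(self, addr, value):
        """A write that passes through the buffer; oldest entries spill out."""
        self.buffer.append((addr, value))
        while len(self.buffer) > self.capacity:
            old_addr, old_value = self.buffer.popleft()
            self.memory[old_addr] = old_value

    def uncached_write(self, addr, value):
        """A write that goes straight to memory and never touches the buffer."""
        self.memory[addr] = value


def flush_uncached(chip):
    """Old behaviour in this model: flush-page writes bypass the buffer."""
    for i in range(chip.capacity):
        chip.uncached_write(("flush", i), 0)


def flush_through_buffer(chip):
    """Fixed behaviour in this model: flush-page writes fill the buffer."""
    for i in range(chip.capacity):
        chip.buffered_write(("flush", i), 0)


def demo():
    """Return what the GPU sees for 'batch' after each kind of flush."""
    chip = ToyChipset()
    chip.buffered_write("batch", 42)      # data the GPU needs to read
    flush_uncached(chip)
    before = chip.memory.get("batch")     # still stuck in the buffer
    flush_through_buffer(chip)
    after = chip.memory.get("batch")      # pushed out to memory
    return before, after
```

In this model the uncached flush leaves the pending write stuck in the buffer, while writes that actually travel through the buffer push it out to memory, which mirrors the difference the commit message describes between the old and the new flush.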
Comment 33 René 2009-10-20 07:31:15 UTC
Sorry for the late answer:
Eric's fix hit the bull's-eye; the following combination works for me in OpenSUSE 11.2 RC1 factory:
- kernel 2.6.31.3
- xorg-server 1.6.5
- intel video driver 2.8.99.902
Previous xorg server and video driver versions back to 2.7.1 might also work; the kernel patch made the difference.