Bug 14464 - [G33, 945GM]X locks up when using EXA, is fine for XAA
Summary: [G33, 945GM]X locks up when using EXA, is fine for XAA
Status: RESOLVED DUPLICATE of bug 17638
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: 7.3 (2007.09)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Wang Zhenyu
QA Contact: Xorg Project Team
Blocks: 13493 15000 18858
Reported: 2008-02-11 14:18 UTC by Alan W. Irwin
Modified: 2009-04-06 20:09 UTC (History)

Attachments
log file with exa with a backtrace that was caught (76.72 KB, text/plain)
2008-02-11 14:19 UTC, Alan W. Irwin
xorg.conf configuration file used for creating the lockup (2.53 KB, text/plain)
2008-02-11 14:22 UTC, Alan W. Irwin
Log and Backtrace from 945GM (51.33 KB, text/plain)
2008-02-19 10:05 UTC, Johannes Engel
dmesg file generated (just) after the crash for g33 chipset box (23.73 KB, text/plain)
2008-02-20 03:38 UTC, Alan W. Irwin
Current output from "ps auxww" for g33 chipset box giving typical idle desktop job mix (16.13 KB, text/plain)
2008-02-20 03:40 UTC, Alan W. Irwin
results of startx >& startx.out3 (5.48 KB, text/plain)
2008-04-02 14:53 UTC, Alan W. Irwin
X log file (271.38 KB, text/plain)
2008-04-02 14:54 UTC, Alan W. Irwin
compressed (gzip) .xsession-errors file for all of 6-day test (184.69 KB, text/plain)
2008-04-02 14:57 UTC, Alan W. Irwin
X log file that caught backtrace for lockup after 6 hours (79.13 KB, text/plain)
2008-08-18 12:22 UTC, Alan W. Irwin
compressed .xsession-errors starting with 17-day x session that worked and finished by 6-hour xsession that locked up (475.17 KB, text/plain)
2008-08-18 12:31 UTC, Alan W. Irwin
Xorg.0.log of my Debian_Lenny_i386@DELL_OptiPlex_330 (34.85 KB, text/plain)
2009-03-19 14:36 UTC, Roman Danilov

Description Alan W. Irwin 2008-02-11 14:18:50 UTC
With the latest Intel driver (packaged for Debian unstable as xserver-xorg-video-intel version 2:2.2.0.90-3) and XAA, startx seems to work well and gives me a reliable KDE 2D desktop, with 3D games such as foobillard also working well.  When I switched to the default EXA as an experiment, the KDE desktop worked well for about an hour with no bad symptoms, then froze with random colours on the screen.  The only way I could regain control of my mouse and monitor was to reboot.  BTW, a warm reboot (shutdown -r now) was sufficient for this purpose; I did not have to go to a power-off state (shutdown -h now).

The log file (to be attached along with xorg.conf) caught a Backtrace that might be useful for diagnosing the problem.

System environment:

Chipset: g33 (ASUS P5K-V MB)
System architecture: x86_64
Debian unstable package versions:
xf86-video-intel: packaged as xserver-xorg-video-intel version 2:2.2.0.90-3
xserver: packaged as xserver-xorg version 1:7.3+10
mesa: a number of different mesa-related packages with version 7.0.2-4
drm: packaged as libdrm2 version 2.3.0-4
kernel version: 2.6.23-1-amd64
Linux distribution: Combination of Debian unstable (for X and kernel) and
Debian testing
Machine or mobo model: ASUS P5K-V

Reproduce steps: light use (two quick 3D games played, then editing a single file, then nothing) of the KDE desktop for roughly an hour, then lockup.  The lockup actually occurred when I was away from my desk when KDE was doing virtually nothing.
Comment 1 Alan W. Irwin 2008-02-11 14:19:53 UTC
Created attachment 14275 [details]
log file with exa with a backtrace that was caught
Comment 2 Alan W. Irwin 2008-02-11 14:22:56 UTC
Created attachment 14276 [details]
xorg.conf configuration file used for creating the lockup

If you uncomment

#       Option          "AccelMethod" "XAA"

in this file, all is well.
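
For anyone who wants to flip this setting without hand-editing, here is a minimal sketch of the uncommenting step. It assumes the option sits in a commented-out Option line like the one quoted above; the helper name uncomment_option and the sample Device section (including its Identifier) are hypothetical, since the attached xorg.conf is not reproduced here.

```python
import re

def uncomment_option(conf_text: str, name: str) -> str:
    """Return conf_text with commented-out Option "<name>" lines enabled."""
    pattern = re.compile(
        r'^([ \t]*)#[ \t]*(Option[ \t]+"' + re.escape(name) + r'"[ \t]+"[^"]*")',
        re.MULTILINE,
    )
    # Keep any indentation that preceded the '#', then indent the option by one tab.
    return pattern.sub(lambda m: m.group(1) + "\t" + m.group(2), conf_text)

# Hypothetical Device section; only the Option line comes from the report.
SAMPLE = '''Section "Device"
\tIdentifier\t"Intel G33"
\tDriver\t\t"intel"
#       Option          "AccelMethod" "XAA"
EndSection
'''

if __name__ == "__main__":
    print(uncomment_option(SAMPLE, "AccelMethod"))
```

Running the function a second time leaves the text unchanged, because it only touches lines that still start with a comment marker.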
Comment 3 Jesse Barnes 2008-02-11 15:01:55 UTC
Maybe the lockup occurred when KDE blanked the screen?  Does it only occur after running 3D applications?
Comment 4 Alan W. Irwin 2008-02-11 17:27:24 UTC
> Maybe the lockup occurred when KDE blanked the screen?

To give more information, my KDE screensaver was/is disabled.  However, control centre ==> peripherals ==> display ==> power control has display power management enabled with standby after: disabled; suspend after: set to 1 minute; and power off (actually deepest suspend state) after: set to 10 minutes.
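
To reproduce these power-management timeouts from a terminal rather than through the KDE control centre, something like the following sketch may help. It assumes the KDE settings map onto the standard DPMS timeouts that xset accepts (seconds, with 0 disabling a stage); the helper name dpms_command is hypothetical.

```python
from typing import List

def dpms_command(standby_min: float, suspend_min: float, off_min: float) -> List[str]:
    """Build an `xset dpms` argv; xset takes seconds, and 0 disables a stage."""
    seconds = [str(int(round(m * 60))) for m in (standby_min, suspend_min, off_min)]
    return ["xset", "dpms"] + seconds

# Settings described above: standby disabled, suspend after 1 minute,
# power off after 10 minutes.
REPORTED = dpms_command(0, 1, 10)

if __name__ == "__main__":
    print(" ".join(REPORTED))
```

The suspend transition can also be triggered immediately with "xset dpms force suspend", which avoids waiting for the timeout when testing whether the transition itself causes trouble.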

Also, the first symptom I noticed after being away from my desk was that my monitor was not in its deepest suspend state, as it normally would have been after 10 minutes with no input, and when I clicked on a menu it went into berserk mode with random colours, X lockup, etc.

Thus, I thought the above supposition might be correct, but it is not that simple.  I just tried a 1 minute test now, and KDE put the monitor into suspend mode like it was supposed to with no problems.  So something I did during the hour leading up to the lockup (for example the two 3D games I played, which was your second supposition) may have contributed, but the actual problem occurred for an idle desktop.

Note, I am willing to run all the tests you would like to get to the bottom of this. However, I hope you can replicate this issue on your own g33 equipment (if you have access to that) or on different Intel chipsets so the tests I have to do are reduced as much as possible.  The reason I say that is I am doing my own software development and research with this desktop and someone else is also using the machine remotely, so the reboots that are necessary when the tests fail interfere with the normal fairly heavy use.  Fortunately, sticking with XAA and not trying to gain access to the console (see bug 14430) means I don't have to reboot and allows us to just carry on with our work.
Comment 5 Johannes Engel 2008-02-19 10:05:13 UTC
Created attachment 14418 [details]
Log and Backtrace from 945GM

I can confirm that behaviour from my 945GM.
Actually I am testing drm kernel modules from git (TTM) up to now without any result.
Comment 6 Johannes Engel 2008-02-19 10:07:00 UTC
Removing [g33] since it seems to concern also different chips.
Changing severity to major since EXA is default.
Comment 7 Alan W. Irwin 2008-02-19 11:13:28 UTC
Tagging subject with [G33, 945GM] to be specific about the chipsets reported so far. According to http://en.wikipedia.org/wiki/Intel_GMA, both chipsets have GMA's which are in the "i915 family".  I wonder if anybody with "i965 family" chipsets [G965, 965GM, 960GL, G35, G45] have experienced similar problems?
Comment 8 Gordon Jin 2008-02-19 18:42:56 UTC
(In reply to comment #5)
> Actually I am testing drm kernel modules from git (TTM) up to now without any
> result.

Do you mean it's ok if using TTM?
Comment 9 Gordon Jin 2008-02-19 18:45:14 UTC
Reporters, could you try providing more info according to http://intellinuxgraphics.org/how_to_report_bug.html? e.g. dmesg, the detailed way to reproduce (which specific apps)?
Comment 10 Johannes Engel 2008-02-19 23:27:56 UTC
As far as I can tell, the bug does not happen at regular intervals. The only way I know so far to reproduce it is to use EXA and use the PC, for example surfing the internet. It won't take longer than 15 minutes until it crashes.
I will try to get a dmesg output afterwards for which I will need a second PC since the first one is not usable after crashing. ;) Maybe ssh still works.
I installed drm kernel modules from git yesterday some minutes before leaving the bureau and did not experience a crash so far. But I will test further...
Comment 11 Gordon Jin 2008-02-20 00:15:17 UTC
(In reply to comment #10)
> I will try to get a dmesg output afterwards for which I will need a second PC
> since the first one is not usable after crashing. 

Or you can find the info in /var/log/messages. It is similar to dmesg, but it is kept on disk, so it survives a crash.


Comment 12 Johannes Engel 2008-02-20 01:16:55 UTC
I'm sorry, but that seems not to give anything usable:

Feb 19 18:32:09 wmaz5 gconfd (engel-3685): Signal 15 received, shutdown
Feb 19 18:32:09 wmaz5 kdm[13737]: X server for display :0 terminated unexpectedly
Feb 19 18:32:09 wmaz5 kernel: klauncher[2764]: segfault at b7f5fb04 eip b769e258 esp bf9f8d20 error 7
Feb 19 18:32:09 wmaz5 gconfd (engel-3685): terminate

So nothing about kernel messages.
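
As a rough way to pull kernel-tagged lines out of a larger /var/log/messages, here is a minimal sketch. It assumes the classic "Mon DD HH:MM:SS host tag: message" syslog layout seen in the excerpt above; the function name kernel_messages is hypothetical. On the four lines quoted here, the only kernel-tagged entry is the kernel's report of the klauncher segfault, which is a userspace crash rather than anything from the graphics driver.

```python
import re
from typing import Iterable, List

# Classic syslog layout: "Mon DD HH:MM:SS host tag: message"
SYSLOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s(?P<tag>[^:]+):\s(?P<msg>.*)$"
)

def kernel_messages(lines: Iterable[str]) -> List[str]:
    """Return the message part of every line whose syslog tag is 'kernel'."""
    found = []
    for line in lines:
        match = SYSLOG_RE.match(line.rstrip("\n"))
        if match and match.group("tag").strip() == "kernel":
            found.append(match.group("msg"))
    return found

# The excerpt quoted in this comment.
SAMPLE = [
    "Feb 19 18:32:09 wmaz5 gconfd (engel-3685): Signal 15 received, shutdown",
    "Feb 19 18:32:09 wmaz5 kdm[13737]: X server for display :0 terminated unexpectedly",
    "Feb 19 18:32:09 wmaz5 kernel: klauncher[2764]: segfault at b7f5fb04 eip b769e258 esp bf9f8d20 error 7",
    "Feb 19 18:32:09 wmaz5 gconfd (engel-3685): terminate",
]

if __name__ == "__main__":
    for message in kernel_messages(SAMPLE):
        print(message)
```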
Comment 13 Alan W. Irwin 2008-02-20 03:30:23 UTC
> Reporters, could you try providing more info according to
> http://intellinuxgraphics.org/how_to_report_bug.html? 

> e.g. dmesg

Sorry, I forgot to save dmesg at the time, but FWIW I have attached the version generated by the necessary reboot right after the problem occurred.

> the detailed way to reproduce (which specific apps)?

It was for an idle KDE desktop.  Of course for that case, there are typically a lot of standard KDE apps running.  What I have right now is pretty typical of what I was running during the crash so I will give you the current output of "ps auxww" to answer this question as best I can.  

I cannot imagine how it would make any difference, but in the interests of full disclosure, this is a two-user system.  "irwin" is me running locally with the local Debian unstable X server for which I reported the crash. "barbara" is my wife running KDE clients on my machine but with a different X server on another machine (i.e., she is running an X-terminal).  Her KDE desktop was not affected by the X-server crash on my machine (except that I had to reboot which disrupted her work).  Because two people's work gets disrupted by any reboots, this is an inconvenient test environment, but I am certainly happy to do what I can to test (once) further release candidates (or new kernel modules, see below) that purport to solve this issue.

> I installed drm kernel modules from git yesterday some minutes before leaving
> the bureau and did not experience a crash so far. But I will test further...

That sounds promising since it appears you have at least gone about a day without issues which is much better than the 15 minutes before a crash that you had before.  I will be watching closely here to see if you go several more days without problems since in that case I would be willing to try a new kernel drm module.  If/when that time comes, I will need detailed instructions on where to get it/how to build it since it has been at least 4 years since I built my own kernels or kernel modules, and a lot has changed since that Linux kernel era.
Comment 14 Alan W. Irwin 2008-02-20 03:38:47 UTC
Created attachment 14442 [details]
dmesg file generated (just) after the crash for g33 chipset box
Comment 15 Alan W. Irwin 2008-02-20 03:40:19 UTC
Created attachment 14443 [details]
Current output from "ps auxww" for g33 chipset box giving typical idle desktop job mix
Comment 16 Johannes Engel 2008-02-20 04:34:13 UTC
I'm sorry, but the problem also happens using the new drm modules from git. :(
Comment 17 Alan W. Irwin 2008-02-20 09:33:25 UTC
> I'm sorry, but the problem also happens using the new drm modules from git. :(

Sorry that didn't work out, but thanks, Johannes, for making this test.  That leaves the urgent and obvious question why the Intel guys have been unable to verify this problem yet. (Gordon remarked in another forum that he could not reproduce this bug with G33 equipment, and I presume he or someone else in the Intel driver group are trying hard to verify with 945GM, with no "success" yet.)  For example, you have tested the cutting-edge drm modules case, but I wonder if the Intel driver group are using some other cutting-edge component of the kernel or X (e.g., EXA) for their tests rather than more standard latest release versions of everything such as the mix of software versions I mentioned above that you get from Debian unstable.  I hope Gordon comments on this question.
Comment 18 Gordon Jin 2008-02-22 00:28:54 UTC
(In reply to comment #17)
> > I'm sorry, but the problem also happens using the new drm modules from git. :(
> 
> Sorry that didn't work out, but thanks, Johannes, for making this test.  That
> leaves the urgent and obvious question why the Intel guys have been unable to
> verify this problem yet. (Gordon remarked in another forum that he could not
> reproduce this bug with G33 equipment, and I presume he or someone else in the
> Intel driver group are trying hard to verify with 945GM, with no "success"
> yet.)  For example, you have tested the cutting-edge drm modules case, but I
> wonder if the Intel driver group are using some other cutting-edge component of
> the kernel or X (e.g., EXA) for their tests rather than more standard latest
> release versions of everything such as the mix of software versions I mentioned
> above that you get from Debian unstable.  I hope Gordon comments on this
> question.
> 

My team usually tests on master branch of git for xf86-video-intel/xserver/mesa/drm, usually with the default option (EXA is the default option now), on almost all platforms post i915.

When release (like 2.2.0.90) comes, we'll transition to the "more standard latest release versions of everything" for a relatively short period.

We can't reproduce this problem (or #14430) in either of the above configurations. This is not very surprising, because most people using 2.2.0.90 are not complaining about it either, even though the environment that reproduces it seems to be a common one.

We never test the driver versions shipped by OS distributions. If you suspect that the Debian driver and the upstream one behave differently with respect to this problem, maybe you can try the upstream one, or you can ask other Debian users if they also encounter it.
Comment 19 Wang Zhenyu 2008-03-25 20:53:04 UTC
Is this still valid? 

irwin, we have just released 2.2.99.901. I don't know when (or whether) it will hit Debian sid, but you can just download it from http://xorg.freedesktop.org/archive/individual/driver/xf86-video-intel-2.2.99.901.tar.gz,
then "./configure --prefix=/usr; make; make install" (make sure xorg-dev is installed.)
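
One caveat with chaining the three commands with semicolons is that the shell carries on even if configure or make fails. Below is a minimal sketch of the same steps that stops at the first failure; the helper names build_steps and run_steps are hypothetical, and the source directory is assumed to be wherever the tarball was unpacked.

```python
import subprocess
from typing import List

def build_steps(prefix: str = "/usr") -> List[List[str]]:
    """The three build commands quoted above, as argv lists."""
    return [
        ["./configure", f"--prefix={prefix}"],
        ["make"],
        ["make", "install"],
    ]

def run_steps(steps: List[List[str]], source_dir: str) -> None:
    """Run each step inside source_dir and abort on the first failure."""
    for argv in steps:
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(argv, cwd=source_dir, check=True)

if __name__ == "__main__":
    for step in build_steps():
        print(" ".join(step))
```

The install step normally needs root privileges when the prefix is /usr.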

I have run KDE on my G33 without problems. Did you run any special application, like a 3D game, movie playback, or something similar?
Comment 20 Alan W. Irwin 2008-03-25 21:34:25 UTC
Debian unstable (sid) has a track record of quickly following your driver releases so I am going to wait until your release hits sid before doing a further test of EXA.

To answer your other question my notes above said light KDE desktop use which for me is typically switching between xterms on different desktops and using the command line on those to develop programmes.  I probably was also running konqueror at the time of the crash as well.  I actually don't think this has much to do with any particular application.  Instead, my guess is there is some incompatibility between your driver and released versions of the X server that don't have all the capabilities that are in the git version of the X server that you tend to test with.  Thus, I suspect this bug will disappear once X becomes less of a moving target so that such incompatibilities are less likely, and I will keep testing your new driver versions (when they first hit sid) to discover whether that hypothesis is correct and reporting the results here.
Comment 21 Wang Zhenyu 2008-03-25 22:04:19 UTC
If there's any incompatibility between our driver and any xserver from at least 1.3 onwards, then we're failing somewhere.

Just building a driver shouldn't put much of a burden on you; if you could help to verify this one and bug #14430, that would be great.

Comment 22 Julien Cristau 2008-03-26 06:10:42 UTC
On Tue, Mar 25, 2008 at 21:34:26 -0700, bugzilla-daemon@freedesktop.org wrote:

> --- Comment #20 from irwin@beluga.phys.uvic.ca  2008-03-25 21:34:25 PST ---
> Debian unstable (sid) has a track record of quickly following your driver
> releases so I am going to wait until your release hits sid before doing a
> further test of EXA.
> 
2.2.99.901 is available in experimental
(http://packages.debian.org/experimental/xserver-xorg-video-intel).
Unfortunately it's not yet built on amd64, but that should be fixed soon
(hopefully by tomorrow).

Cheers,
Julien
Comment 23 Alan W. Irwin 2008-03-26 09:15:56 UTC
I just noticed that Debian experimental package myself.  I won't wait for the binary for AMD64 since I can build that myself from the source package using debuild.  However, the tests are going to take some time because both problems now take some considerable (several days) "light desktop use" before they show themselves.

My previous reluctance to build was partly caused by my dislike for putting anything other than debian packaged results into /usr, but debuild produces debian packages so that is not an issue with it.  The other problem is this system is a production system for two users as I have explained before so I am willing to do tests periodically (such as right before final releases as now), but not too much more often than that.
Comment 24 Alan W. Irwin 2008-03-27 18:07:31 UTC
I have now started the EXA test which has now run for ~1 half hour without problems. If I avoid switching to the Linux console (which has already produced a problem, see http://bugs.freedesktop.org/show_bug.cgi?id=14430) it may take a few more hours until it freezes if past experience is any guide.  OTOH, it might be fine indefinitely after that, but we will see.

Present system environment used for this EXA test (note most Debian X-related software has been considerably updated since the initial report):

Chipset: g33 (ASUS P5K-V MB)
System architecture: x86_64
Debian unstable package versions:
xf86-video-intel: packaged as xserver-xorg-video-intel version 2:2.2.99.901-1
(from Debian experimental)
xserver: packaged as xserver-xorg version 1:7.3+10 (X.Org X Server 1.4.0.90)
mesa: a number of different mesa-related packages with version 7.0.3~rc2-1
drm: packaged as libdrm2 version 2.3.0-4
kernel version: 2.6.24-1-amd64
Linux distribution: Combination of Debian experimental (for Intel driver),
Debian unstable (for X and kernel) and Debian testing
Machine or mobo model: ASUS P5K-V
Comment 25 Alan W. Irwin 2008-03-28 14:53:03 UTC
EXA is still working for me after ~20 hours which is much better than before.

Johannes, if you also get improved results with 2.2.99.901 perhaps it is time to close this bug (assuming either of us can open it again if the EXA problems reappear).

Comment 26 Johannes Engel 2008-03-29 12:12:12 UTC
It is indeed better, but for me the problem still exists, especially this jittering.
The screen also sometimes turns black and never wakes up (when the screensaver turns backlight off, there seems to be no possibility to awake it again except restarting).
Comment 27 Johannes Engel 2008-03-29 12:13:33 UTC
Oops, sorry, wrong bug. :(
This one is OK for me. :)
Comment 28 Alan W. Irwin 2008-03-29 13:17:07 UTC
Since neither Johannes nor I see this bug any more, I will close it.
Comment 29 Wang Zhenyu 2008-03-30 18:33:06 UTC
Thanks for testing, and how about bug #14430 status?
Comment 30 Alan W. Irwin 2008-03-30 19:24:46 UTC
> Thanks for testing

You are most welcome, and BTW, I am still using (2D desktop use + foobillard 3D game + tvtime for watching TV + wxvlc for watching DVD's) EXA without problems so long as I don't attempt to switch to console (see following response to your next question).

> how about bug #14430 status?

That is still an issue.  See my latest remarks at http://bugs.freedesktop.org/show_bug.cgi?id=14430
Comment 31 Alan W. Irwin 2008-04-02 14:50:58 UTC
I am reopening this bug because the problem (freeze with random colours on the screen and keyboard/mouse not working so the only recovery method was a remote login and warm [shutdown -r now] reboot) manifested again.  This time, however, it took 6 days of my standard light 2D desktop use (mostly command-line work using many xterms and konqueror web browsing with the occasional foobillard 3D game and TV watching with tvtime) before the problem occurred.  That is definitely an improvement, but I am wondering if the Intel developers are just not seeing this bug because they don't test their device for weeks at a time. Note in the past I have run XAA continuously for months with no such problems so it appears EXA has a way to go to achieve that sort of stability.

Attachments for the X log, startx output, and .xsession-errors to come.

This just-completed test was for 2:2.2.99.901-1, but I notice Debian experimental now has 2:2.2.99.902-1 so I will try using that for my next test.
Comment 32 Alan W. Irwin 2008-04-02 14:53:50 UTC
Created attachment 15632 [details]
results of startx >& startx.out3
Comment 33 Alan W. Irwin 2008-04-02 14:54:36 UTC
Created attachment 15633 [details]
X log file
Comment 34 Alan W. Irwin 2008-04-02 14:57:27 UTC
Created attachment 15634 [details]
compressed (gzip) .xsession-errors file for all of 6-day test
Comment 35 Wang Zhenyu 2008-04-02 18:57:20 UTC
What about your mesa version? Can you run foobillard heavily to see whether that could cause the crash?

And how about removing your sdvo card and use VGA only?
Comment 36 Alan W. Irwin 2008-04-02 21:07:43 UTC
Actually, I do enjoy playing foobillard against the AI so I probably run it 5-10 times per day.  I look forward to telling my boss I must run it more!  (Just kidding, I am my own boss.)  

Seriously, though, foobillard was not running when the crash occurred so I doubt it had much to do with it.  But to answer your specific question, the mesa version is still 7.0.3~rc2-1 as reported here earlier for the start of this test 6 days ago.  The crash occurred doing something extremely ordinary with the KDE desktop.  I think it was a switch from one xterm to another, but I cannot be sure since I do that hundreds of times a day in an instinctive way without thinking much about it.  It's also possible I was clicking on the KDE kicker applet that allows you to switch between the various KDE virtual desktops (another instinctive thing I do many times each day).

> And how about removing your sdvo card and use VGA only?

You must be confusing me with someone else whose bug reports you are monitoring.  That's completely understandable because that monitoring of bug reports must be a huge (but necessary) job.

Just to remind you, I do run "VGA only" because I don't have an SDVO card.  Also, I am using a CRT monitor (Sony Trinitron Multiscan G200 bought in 2001) so an SDVO card wouldn't make sense in any case.

BTW, so far so good today with 2:2.2.99.902-1, but it will take 6 days to confirm it is better than 2:2.2.99.901-1 (if that turns out to be the case).
Comment 37 Michael Fu 2008-07-03 20:16:53 UTC
irwin,

Do you still experience this from time to time? Almost 3 months have passed since your last comment, so I just want to double check the result... thanks.
Comment 38 Alan W. Irwin 2008-07-03 21:53:03 UTC
Last lockup was about a month or so ago, but I didn't quite know what to report.

The problem is there is a fair amount of churn in the driver version that the Intel team recommends for testing, and those recommendations are followed pretty closely by the Debian packagers.  Thus, I see pretty much the same churn in the Debian unstable version which is the one I use. If I stick with one driver version nobody is interested in the result when I get a lockup after a month of use because that version is no longer cutting edge.

So what I have decided to do is leave this bug open as long as I keep seeing occasional lockups as a signal that EXA is still not as stable as XAA (where I never saw any lockups).  Also, if I get a lockup for the current cutting-edge driver rather than some older version I will mention that specifically as well.
Comment 39 Alan W. Irwin 2008-07-12 10:12:18 UTC
Just got another EXA lockup.  This time after only 3 days from when I executed startx. This timing is quite unusual.  Usually it takes much longer than that to see the EXA instability with the Intel driver.  This is for Debian unstable xserver-xorg-video-intel, Version 2:2.3.2-2, and xserver-xorg-core, Version 2:1.4.2-1.  Right in the middle of googling for something with konqueror the screen froze with an arbitrary random colour pattern, and the only way to regain control was to ssh in from another computer and reboot.

This is a "for the record" report.  I doubt anybody can actually do much about the EXA instability with the intel driver until the X churn settles down a lot so that everybody has a better chance of being on the same page; the Intel driver developers have a chance to use their product with no changes or exiting from X for more than a week; etc.
Comment 40 Wang Zhenyu 2008-07-27 22:12:34 UTC
A better evaluation of this I think is to run with "DRI" off option for testing, which only enable EXA but disable all 3D dri rendering to show if 3D dri driver bug caused hw hang. irwin, you don't use 3D much, do you?

Current we don't have mechanism for debug which client corrupted the hw.
Comment 41 Alan W. Irwin 2008-07-28 08:55:34 UTC
> A better evaluation of this I think is to run with "DRI" off option for
> testing, which only enable EXA but disable all 3D dri rendering to show if 3D
> dri driver bug caused hw hang. irwin, you don't use 3D much, do you?

I play foobillard (3D billiards) perhaps 5 times per day (for a few minutes each time) as relaxation from my work.  That's the only 3D app I run so my percentage of time spent running 3D apps is extremely low.  Also, I never see these lockups when running foobillard.  It's always in the middle of some mundane desktop task such as switching from one xterm to another or switching from one desktop to another.  So my best guess is that if you turn off 3D, you will still see the bug.  N.B.  you will normally have to wait for a relatively long time to see the issue.  For example, I haven't had a lockup since the last report here (although I have had a number of reboots due to new kernel versions, power outages, and the like so it hasn't been continuous testing for all that time).
 
> Current we don't have mechanism for debug which client corrupted the hw.

Perhaps it is not any particular client, but instead some more fundamental problem like a memory leak?  I just had a look at memory consumption with top and VIRT and RES are respectively 437m and 49m for Xorg.  I rebooted only 4 days ago.  Those memory consumption numbers seem excessive to me since they start out much smaller just after such a reboot.  I have 2GB of memory on this machine so I am in no danger of running out of memory at the moment, but I will keep monitoring this situation to see whether the Xorg memory use continues to grow and ultimately an OOM condition might be the cause of the issues that I see.
Comment 42 Jesse Barnes 2008-07-31 13:16:02 UTC
Another thing you could try that might narrow things down for us is to disable render acceleration using the "ExaNoComposite" option in your xorg.conf file (Option "ExaNoComposite" "true" in your intel driver section).  That would tell us if our render code is broken or if it's the 3D driver causing trouble.
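
To keep the isolation runs straight, it may help to generate the Device section for each variant so that only one setting changes per run. The sketch below is a minimal illustration: the option names AccelMethod, ExaNoComposite and DRI are the ones mentioned in this thread, while the helper render_device_section, the experiment labels, and the Identifier string are hypothetical.

```python
from typing import Dict

def render_device_section(identifier: str, options: Dict[str, str]) -> str:
    """Render an xorg.conf Device section for the intel driver."""
    lines = [
        'Section "Device"',
        f'\tIdentifier\t"{identifier}"',
        '\tDriver\t\t"intel"',
    ]
    for name, value in options.items():
        lines.append(f'\tOption\t\t"{name}" "{value}"')
    lines.append("EndSection")
    return "\n".join(lines) + "\n"

# One variable changed per experiment, relative to plain EXA.
EXPERIMENTS = {
    "baseline-exa": {"AccelMethod": "EXA"},
    "no-render-accel": {"AccelMethod": "EXA", "ExaNoComposite": "true"},
    "no-dri": {"AccelMethod": "EXA", "DRI": "off"},
}

if __name__ == "__main__":
    for label, options in EXPERIMENTS.items():
        print(f"# {label}")
        print(render_device_section("Intel G33", options))
```

If the lockup persists with render acceleration off, the render code becomes a less likely culprit; if it persists with DRI off, the same goes for the 3D driver.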
Comment 43 Alan W. Irwin 2008-07-31 16:20:59 UTC
I now think an OOM condition is probably a long shot.  The key number is RES which starts out near 10MB and grows to ~50MB after a week or so of use.  If that is a memory leak it is a slow one which will take a while to bring on an OOM condition for my 2GB of memory.  However, I will attempt to record RES each day for Xorg to keep track of how much it is growing for that application.
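
For the daily RES bookkeeping, a minimal sketch follows. It reads the VmRSS field from /proc/<pid>/status, which on Linux corresponds to the RES column in top. The helper names are hypothetical, and the projection assumes linear growth with "a week or so" taken as 7 days, which is only a guess at how the growth behaves.

```python
import re
from typing import Optional

def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None

def read_rss_kb(pid: int) -> Optional[int]:
    """Read the resident set size of a running process, in kB."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vmrss_kb(handle.read())

def growth_mb_per_day(start_mb: float, end_mb: float, days: float) -> float:
    """Average growth rate, assuming linear growth."""
    return (end_mb - start_mb) / days

def days_until(limit_mb: float, current_mb: float, rate_mb_per_day: float) -> float:
    """Days until limit_mb is reached at a constant growth rate."""
    return (limit_mb - current_mb) / rate_mb_per_day

if __name__ == "__main__":
    # Figures from this comment: ~10 MB at start, ~50 MB after about a week.
    rate = growth_mb_per_day(10, 50, 7)
    print(f"{rate:.1f} MB/day, {days_until(2048, 50, rate):.0f} days to reach 2 GB")
```

At that rate the 2GB of RAM would take on the order of a year to fill, which fits the view that an OOM condition is a long shot.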
Comment 44 Alan W. Irwin 2008-07-31 16:36:07 UTC
> Another thing you could try that might narrow things down for us is to disable
> render acceleration using the "ExaNoComposite" option in your xorg.conf file
> (Option "ExaNoComposite" "true" in your intel driver section).  That would tell
> us if our render code is broken or if it's the 3D driver causing trouble.

OK, I have (re-) started X with that option set to true.  I don't notice any change in speed for my 2D desktop or for FooBillard so I will try to keep this test going as long as it takes (barring some power outage or some other emergency) for my normal desktop usage.  It should be a while before I report results since this bug takes a long time to manifest itself (if it is going to do so at all with that option set).
Comment 45 Alan W. Irwin 2008-08-18 12:17:14 UTC
Just got another EXA lockup.  This time after only 6 hours from a cold start (shutdown -h now). This timing is quite unusual.  Usually it takes much longer than that to see the EXA instability with the Intel driver.  For example, just prior to this event I had a stretch of 17 days of continuous use with no issues before I powered off because of threatening thunder storms.  Both the 17-day good result and the present bad result were with Option "ExaNoComposite" "true". The present problem is for Debian unstable xserver-xorg-video-intel Version 2:2.3.2-2+lenny2, and xserver-xorg-core Version 2:1.4.2-3.  The problem occurred right in the middle of viewing the userfriendly comic strip.  The screen froze with an arbitrary random colour pattern, and the only way to regain control was to ssh in from another computer and reboot.  I will add the relevant .xsession-errors and X log.  The latter caught a back trace.
Comment 46 Alan W. Irwin 2008-08-18 12:22:26 UTC
Created attachment 18357 [details]
X log file that caught backtrace for lockup after 6 hours
Comment 47 Alan W. Irwin 2008-08-18 12:31:07 UTC
Created attachment 18363 [details]
compressed .xsession-errors starting with 17-day x session that worked and finished by 6-hour xsession that locked up
Comment 48 Alan W. Irwin 2008-09-03 14:09:47 UTC
Just got another EXA lockup.  This time after 15 days of use from a warm start (shutdown -r now).  This observed longer stability from a warm start compared to the previous lockup after only 6 hours from a cold start might be significant, but it might also be by chance.

This test and IIRC some previous ones were with the Debian 2.6.25 kernel for amd64 hardware. Since I had to reboot anyway to get out of this lockup, I took the opportunity to update my kernel to 2.6.26-1-amd64 from Debian testing so that is the kernel that will be used for my further long-term EXA stability tests.

My production desktop use is pretty standard.  Therefore, I assume the reason the Intel team hasn't seen this long-term EXA instability bug yet is they normally don't have the opportunity to use X for many days in a row without exiting from it for X/kernel/GEM/DRM/Mesa, etc., updates.
Comment 49 Michael Fu 2008-09-25 20:21:03 UTC
irwin, how about turning off DRI as zhenyu suggested in comment# 40? Have you ever tried that? Just like turning off composite, it helps to verify whether the issue is in another stage of EXA. Thanks.
Comment 50 Alan W. Irwin 2008-09-26 01:41:50 UTC
> irwin, how about turning off DRI as zhenyu suggested in comment# 40?

I will miss foobillard, but in the interests of trying to narrow this down, I will try turning off DRI as soon as my current long-term test (which may go another week or two) is terminated by an X restart for another reason (power outage or Debian unstable driver or kernel update) or by an X lockup.

I strongly suspect DRI won't make any difference, though, because none of my X lockups have been during my games of foobillard (my only DRI use).  I test foobillard a lot by running through a 5-minute game, 5 or so times a day, but I never run into trouble during those DRI "tests".  Instead, the typical scenario is I get a lockup when I switch KDE desktops after a week or two of use.
Comment 51 Alan W. Irwin 2008-11-18 14:35:22 UTC
2008-11-18: X lockup after 37 days of continuous KDE 2D desktop use with DRI disabled.  The previous results (with foobillard played several times per day with DRI enabled) were as follows:

2008-08-18: lockup after ~6 hours
2008-09-02: lockup after 15 days
2008-09-27: lockup after 18 days
2008-09-30: lockup after 3 days

So the present result shows that disabling DRI appears to increase the reliability of X with EXA somewhat on Intel hardware, but the result is still nowhere near the rock solid X we had in the old days where we could boast of up-times of many months.

I will now start another long-term X test with DRI still disabled for the Debian unstable version of Kernel, drm, X, and Intel driver to help see whether the present 37 days is typical or a fluke. Is there any other test I can perform to help you track down this elusive bug?
Comment 52 Alan W. Irwin 2008-11-23 19:43:07 UTC
Another lockup without DRI after only 5 days this time.  So it appears I can demonstrate rather quick instability even with DRI turned off.  This lack of long-term stability is a big concern.  Are you guys running any tests like that yourselves?
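For reference, the run lengths reported so far can be tallied with a few lines of Python. This is only a rough sketch of the arithmetic: the numbers are the ones quoted in this thread, "~6 hours" is approximated as 0.25 days (my assumption), and with so few runs the comparison is anecdotal rather than statistically meaningful.

```python
from statistics import mean, median

# Days of continuous use before each lockup, as reported in this thread.
# "~6 hours" is approximated as 0.25 days (an assumption for this tally).
runs_dri_enabled = [0.25, 15, 18, 3]
runs_dri_disabled = [37, 5]


def summarize(days):
    """Return (mean, median, shortest) run length in days."""
    return mean(days), median(days), min(days)


for label, days in (("DRI enabled", runs_dri_enabled),
                    ("DRI disabled", runs_dri_disabled)):
    avg, mid, shortest = summarize(days)
    print(f"{label}: mean {avg:.1f} d, median {mid:.1f} d, shortest {shortest} d")
```

The spread within each group is as large as the difference between the groups, which is why a single long run cannot show that disabling DRI helps.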

I am starting the test again now (using the Debian unstable kernel 2.6.26 and X software packages as usual) under the same conditions, except this time I am trying it from a cold start (shutdown -h).  Is there any other test I can run to help you guys track this down?

Comment 53 Wang Zhenyu 2008-12-07 21:28:38 UTC
This needs to be retested against current git master or xf86-video-intel-2.6-branch. Thanks.
Comment 54 Alan W. Irwin 2008-12-08 10:18:34 UTC
> This needs to be retested against current git master or
> xf86-video-intel-2.6-branch. Thanks.

I have long since decided not to test the absolute cutting edge for two reasons.  (1) This is a production box so I cannot afford for it to go down very often.  It is used heavily both by me and my wife for software development.  Her use is via an X-terminal so she uses X-clients like KDE but not any of the X server components on this box that are locking up for me with my direct use.  (2) There is so much churn in the absolute cutting edge right now in all X server components (such as the kernel, drm, mesa, and X server) that long-term stability tests just don't make sense unless there is some way to speed them up (see below).

Under these circumstances it seems best for me to test slightly away from the absolute cutting edge.  Thus, I will continue to test the Debian unstable version of kernel, drm, mesa, X server, and Intel driver.  Those X components aren't changing very rapidly at the moment with the Debian release freeze, but as soon as Debian is released they should get much closer to the cutting edge again.

I think the proper way for the Intel driver team to resolve this issue is to implement a "long-term" test.  The idea would be to capture your own keystrokes for say ~50 days of development activity using some major desktop such as KDE or GNOME. Then, the test would be to run all those keystrokes for a single cutting edge version of all X components to see when you trigger an X lockup for a particular hardware combination.   Once you captured the keystrokes, the tough part would be to try and figure out a way to emulate interactions with external sites (such as uploading files, posting to websites such as this one, and posting to mailing lists) without actually doing such activity.  

Of course, once you set up such a test, it would not take that long to actually run it since it would be done in computer time rather than the human time required to make all those keystrokes.  Such a test would be a much better way to get at this bug (or set of bugs) than simply relying on my anecdotal evidence.  However, from the evidence that I have seen, I can virtually guarantee any such "long-term" test you tried would turn up X lockups for my Intel g33 chipset (and possibly for other Intel chipsets as well), and generating such lockups under controlled conditions would be a good first step toward dealing with this bug.
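The replay half of that idea could look roughly like the following Python sketch. Everything in it is hypothetical: the `InputEvent` record format, the `inject` callback (a placeholder for whatever would actually deliver an event to the X server under test, not a real API), and the `speedup` factor are assumptions made purely for illustration. Whether a time-compressed replay would still trigger the lockup is not established here.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class InputEvent:
    """One recorded input event (hypothetical record format)."""
    timestamp: float  # seconds since the start of the recording
    payload: str      # e.g. a key name; the encoding is an assumption


def replay(events: Iterable[InputEvent],
           inject: Callable[[InputEvent], None],
           speedup: float = 100.0,
           sleep: Callable[[float], None] = time.sleep) -> int:
    """Replay events in time order, shrinking each recorded gap by `speedup`.

    `inject` stands in for the code that would hand an event to the
    X server under test. Returns the number of events delivered.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    delivered = 0
    previous = None
    for event in sorted(events, key=lambda e: e.timestamp):
        if previous is not None:
            # Wait for the recorded gap, compressed by the speedup factor.
            sleep((event.timestamp - previous) / speedup)
        inject(event)
        previous = event.timestamp
        delivered += 1
    return delivered
```

Passing `sleep` as a parameter lets the harness be exercised without real delays. With `speedup=100.0`, a 50-day recording would replay in about half a day, assuming the injection itself is fast.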

Comment 55 Wang Zhenyu 2009-02-08 19:22:21 UTC
Sorry for the long delay. What is the current status? You may try the current releases of xf86-video-intel (2.6.1), kernel (2.6.28.x), drm (2.4.4), and mesa (7.3).
Comment 56 Alan W. Irwin 2009-02-08 23:14:22 UTC
Debian is in freeze so its testing/unstable version I use for testing has fallen well behind.  kernel-2.6.26, drm-2.3.1, mesa-7.0.3, X server 1.4.2, Intel driver-2.3.2. No variation I tried gave me long-term stability on EXA for those package versions for g33 so now I am trying XAA to make sure that is stable at least.  So far my first such test is up to 16 days without issues.

I will get back to testing EXA on versions which are much more interesting to you when they become available on Debian unstable after Lenny is released.
Comment 57 Alan W. Irwin 2009-02-24 16:37:00 UTC
XAA test now in its 33rd day without problems with my normal desktop use (including foobillard, the one 3D game I play a lot) with DRI turned on.  Therefore, it looks like this is the way to go for a stable Intel g33 experience on the recently released Debian Lenny, which is my current X stack.  Debian testing and unstable currently have an X stack similar/identical to Debian Lenny, but that will move much closer to the bleeding edge as soon as the experimental X packages are submitted to Debian unstable.  At that point, I will start testing EXA again, but meanwhile this current XAA test result is a good benchmark for the type of stability I hope to see with Intel g33 hardware and EXA in the future.
Comment 58 Roman Danilov 2009-03-19 14:36:47 UTC
Created attachment 24052 [details]
Xorg.0.log of my Debian_Lenny_i386@DELL_OptiPlex_330

I have this bug on my desktop system.

H/W: DELL OptiPlex 330, BIOS Rev A06

$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 82G33/G31 Express Integrated Graphics Controller (rev 0a)

S/W:

$ uname -a
Linux xxx.xxx.xxx.xxx 2.6.26-1-686 #1 SMP Sat Jan 10 18:29:31 UTC 2009 i686 GNU/Linux

$ cat /etc/debian_version
5.0

D/E is Gnome, no 3d games. It locks up when some windows are open and I try to minimize/restore a window.

Comment 59 Wang Zhenyu 2009-04-06 20:09:38 UTC
Carl is looking into this problem and has provided debugging instructions for it on bug #17638, so I am marking this as a dup.


*** This bug has been marked as a duplicate of bug 17638 ***

