Bug 24789 - [i855GM bisected] Freeze shortly after X startup
Summary: [i855GM bisected] Freeze shortly after X startup
Status: RESOLVED DUPLICATE of bug 27187
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: All Linux (All)
Importance: medium critical
Assignee: Chris Wilson
QA Contact: Xorg Project Team
URL: https://bugs.launchpad.net/ubuntu/+so...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-10-29 09:28 UTC by Geir Ove Myhr
Modified: 2010-05-30 08:47 UTC (History)
25 users (show)

See Also:
i915 platform:
i915 features:


Attachments
dri_debug-2009-10-10.tar.gz (97.82 KB, application/x-gzip)
2009-10-29 09:28 UTC, Geir Ove Myhr
no flags Details
Output of intel_gpu_dump (105.06 KB, application/x-gzip)
2009-10-29 09:29 UTC, Geir Ove Myhr
no flags Details
Output of dmesg (62.04 KB, text/plain)
2009-10-29 09:30 UTC, Geir Ove Myhr
no flags Details
Xorg.0.log (19.42 KB, text/x-log)
2009-10-29 09:30 UTC, Geir Ove Myhr
no flags Details
Current set of logs and buffer dumps (243.59 KB, application/octet-stream)
2009-10-31 16:46 UTC, Robert Fendt
no flags Details
Updated set of logs and buffer dumps (222.18 KB, application/x-gtar)
2009-11-10 13:00 UTC, Robert Fendt
no flags Details
Batchbuffer dumps after X froze while playing Mah Jong. (128.22 KB, application/x-compressed-tar)
2010-01-18 10:06 UTC, Juho-Mikko Pellinen
no flags Details
All the debug info I could gather. (251.44 KB, application/x-compressed-tar)
2010-01-20 09:12 UTC, Juho-Mikko Pellinen
no flags Details

Description Geir Ove Myhr 2009-10-29 09:28:08 UTC
Created attachment 30807 [details]
dri_debug-2009-10-10.tar.gz

Forwarding a bug report from ubuntu user kalessin:

[Problem]
X server freezes shortly after startup.

[Original bug report]
Summary: X server freezes shortly after startup. The problem is similar to several problems already reported with the intel driver, but opening a separate bug report (for the time being) since none of the existing entries quite matches my configuration. Feel free to mark it as duplicate if you think it is.

System: Kubuntu 'Karmic', updated yesterday (2009-10-09). Buffer dump is attached.

Description: The X server locks up shortly after it is started, no matter what I do. Usually this happens when the login procedure is just about finished, but I have also seen it after KDM had been waiting for the login credentials for some time. The mouse pointer remains functional; apart from that the lockup is complete (especially the keyboard). The Magic SysRq key still works, but when I try SysRq-(R,E,I), KDM seems to try to restart X, resulting in a *really* complete lockup (even my ssh diagnostics session dies).

Driver version: 2.9.0
X.Org X Server: 1.6.3
Comment 1 Geir Ove Myhr 2009-10-29 09:29:20 UTC
Created attachment 30808 [details]
Output of intel_gpu_dump
Comment 2 Geir Ove Myhr 2009-10-29 09:30:02 UTC
Created attachment 30809 [details]
Output of dmesg
Comment 3 Geir Ove Myhr 2009-10-29 09:30:31 UTC
Created attachment 30810 [details]
Xorg.0.log
Comment 4 Robert Fendt 2009-10-31 16:46:11 UTC
Created attachment 30873 [details]
Current set of logs and buffer dumps

I have re-tested today with current mainline kernel and newest available X.org drivers (from 'xorg-edgers' PPA in Ubuntu). No apparent change, other than now I cannot read from /sys/kernel/debug/dri/0/i915_regs without causing the system to lock immediately.

Summary of current test configuration:
-distribution: Kubuntu 'Karmic', updated 2009-10-31
-graphics chipset i855GM
-architecture: i386, Pentium M
-Centrino chipset
-driver package: xserver-xorg-video-intel 2:2.9.0-1ubuntu2
-kernel version: mainline, 2.6.32-020632rc5-generic
Comment 5 legolas558 2009-11-05 15:07:29 UTC
downstream issue at Arch Linux: http://bugs.archlinux.org/task/16974
Comment 6 Robert Fendt 2009-11-10 13:00:35 UTC
Created attachment 31101 [details]
Updated set of logs and buffer dumps

Slightly updated test case (hopefully not too many changed unknowns, had to do a system update):

-mainline kernel 2.6.32-020632rc6-generic, rc5 is still on the machine if wanted
-current X.org drivers from 'xorg edgers' PPA: 2:2.9.0+git20091106.dbb68168-0ubuntu0tormod

I have also enabled the Option DebugFlushCaches in xorg.conf. Updated batchbuffer dumps and logs are attached.

I do not know if this is relevant, but the X freeze produces lots of repeated lines in dmesg like this:

--------------------snip!--------------------
[  180.008194] [drm] DAC-6: set mode 640x480 0
[  180.498662] [drm] DAC-6: set mode 640x480 0
[  288.356030] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  288.356040] render error detected, EIR: 0x00000000
[  288.356044] i915: Waking up sleeping processes
[  288.356055] [drm:i915_wait_request] *ERROR* i915_wait_request returns -5 (awaiting 6680 at 6679)
[  288.356381] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[  288.356574] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[  288.356618] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[  288.356656] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
...
--------------------snip!--------------------
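
For anyone comparing their own dmesg output against the snippet above, here is a minimal sketch that pulls out the timestamp of the first hangcheck message and counts the "Execbuf while wedged" repeats. The function name `summarize_hang` and the `SAMPLE` text are only illustrative; the sample lines are copied from the snippet above.

```python
import re

# A dmesg line looks like "[  288.356030] message text".
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def summarize_hang(dmesg_text):
    """Return (timestamp of first hangcheck message, number of wedged execbufs).

    The timestamp is None if no hangcheck message is present.
    """
    first_hang = None
    wedged = 0
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines without a kernel timestamp, e.g. "..."
        stamp, msg = float(match.group(1)), match.group(2)
        if "Hangcheck timer elapsed" in msg:
            if first_hang is None:
                first_hang = stamp
        elif "Execbuf while wedged" in msg:
            wedged += 1
    return first_hang, wedged


# Hypothetical sample, shortened from the dmesg lines quoted in this comment.
SAMPLE = """\
[  180.008194] [drm] DAC-6: set mode 640x480 0
[  288.356030] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[  288.356381] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
[  288.356574] [drm:i915_gem_execbuffer] *ERROR* Execbuf while wedged
"""

print(summarize_hang(SAMPLE))
```

Feeding it the full output of `dmesg` instead of `SAMPLE` should work the same way, as long as kernel timestamps are enabled.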
Comment 7 nepo 2009-11-18 02:56:20 UTC
I am experiencing exactly the same problems: freezing shortly before or after login, while the mouse is still movable. My OpenSuse 11.2 runs on a Fujitsu-Siemens M7400 laptop with Intel 855GM graphics.
If info is needed from my side, please tell me which logs etc. would be helpful.

I'm interested in a solution, too :-)

Best,
Daniel
Comment 8 legolas558 2009-11-18 04:48:04 UTC
@mitropaman: we have the same hardware (mine is a clone of yours), but there is no solution right now. I guess that these bugs will be fixed by intel developers sooner or later but it might be a few months, a year, or never, since this hardware is going to be unsupported because of age...

Yes, it's not nice, since our laptop has always run smoothly, but that's reality :(

Also, I am not able to get a working Xorg development stack running because of other kernel KMS issues regarding the i915 implementation...so debugging is harder.

The only workaround for now is to use the old drivers and to boot a kernel without KMS.
Comment 9 Geir Ove Myhr 2009-11-19 05:57:14 UTC
Some updated information from downstream: 
- The kernel parameter acpi=off stops the freezes. I don't know exactly what is turned off with this. At least it turns off KMS, so kalessin had to revert to the karmic -intel driver which has UMS support to test this. 
- The kernel parameter nomodeset still produces freezes, so the fact that it stops with acpi=off is not due to using UMS instead of KMS.
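
When reporting test results it helps to state which of these parameters the running kernel was actually booted with. A minimal sketch, assuming a Linux system where the boot parameters can be read from /proc/cmdline; the helper names `workaround_flags` and `read_cmdline` are made up for illustration:

```python
def workaround_flags(cmdline):
    """Report which of the two parameters discussed above are on a command line."""
    params = cmdline.split()
    return {
        "acpi=off": "acpi=off" in params,
        "nomodeset": "nomodeset" in params,
    }


def read_cmdline(path="/proc/cmdline"):
    """Return the kernel command line of the running system (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()


# Hypothetical command line, used here so the example runs anywhere.
print(workaround_flags("BOOT_IMAGE=/vmlinuz ro quiet acpi=off"))
```

On the affected machine, `workaround_flags(read_cmdline())` would report the parameters of the current boot.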

We have several 855GM freeze bug reports in Launchpad (https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bugs?field.tag=855gm%20freeze&field.tags_combinator=ALL) and many of them have batchbuffer dumps attached. A few of them have also been reported here already.
Comment 10 racoonator 2009-11-25 12:24:21 UTC
Hello,

I've tried Ubuntu 9.10 on my Asus A3N laptop, which has an 855GM chipset inside.

Same problem for me: X freezes quickly and only the mouse still moves.

I can't find any messages in the log.

I've tried passing all the possible parameters to the kernel at boot time; only acpi=off stops the freezes, but in that case the Wifi doesn't work.

Regards,

Stéphane
Comment 11 Evgeny Khrustov 2009-11-26 00:17:35 UTC
Stéphane,
As a workaround you may use the vesa driver with the "nomodeset" kernel option. This workaround works for me on the same kind of notebook.
Comment 12 nepo 2009-11-26 22:53:46 UTC
Dear Intel-Team,

just want to kindly ask what your assessment of this bug is and, if possible, whether you can roughly estimate how long it might take to solve it. I know the hardware is old, but there seem to be some more people
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/447892
having the same probs.

Thanks!
D.
Comment 13 racoonator 2009-12-07 11:56:28 UTC
Dear Intel-Team,

Like mitropaman said, this bug seems to affect a lot of people even if the hardware is a little old. I've found many bug reports about the freezes with 855gm.

My laptop is old, but I hope one day I'll be able to run a recent browser like the firefox 3.5 included in the most recent linux distributions. 

I know it's the good time of year to have this dream, maybe an old man dressed in red will give us soon the solution. :-)  It'll be so great !

More seriously, I know this is a hard task to debug this kind of program, so I'm always ready to help if needed.

Regards,

Stéphane

Comment 14 Peter 2009-12-12 05:22:34 UTC
00:02.0 VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 01)
00:02.1 Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 01)

media-libs/mesa-7.5.2
x11-base/xorg-server-1.7.3
x11-drivers/xf86-video-intel-2.9.1
kernels from 2.6.29 to 2.6.32

The system freezes and nothing I've tried helps: even sysrq with a remote network console shows nothing. Some time ago it worked with acpi=off (or more exactly with pci=noacpi), but with updated software versions the intel driver just freezes the system completely (ssh hangs too).

Current workaround - use vesa driver.
Comment 15 Vladimir 2009-12-15 06:46:21 UTC
Hello.

openSuse 11.2 on ASUS A3L with  Intel 852GM

(--) PCI:*(0:0:2:0) 8086:3582:1043:1712 rev 2
(--) PCI: (0:0:2:1) 8086:3582:1043:1712 rev 2

The X server freezes 30 s to 1 min after startup.
Only acpi=off helps.

Driver version: 2.9.1	
X.Org X Server: 1.6.5
Comment 16 Vasyl Demin 2010-01-01 06:33:11 UTC
Same problem on an HP Compaq nx9020 laptop
Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)

OS: Arch Linux i686
kernel-2.6.32.2
xorg-server-1.7.3.902
xf86-video-intel-2.9.1
mesa-7.7
KMS is off

The system freezes immediately after X startup. I couldn't find Xorg.0.log; apparently X did not have time to create it.
Comment 17 Geir Ove Myhr 2010-01-03 08:15:27 UTC
I have built some test kernels in order to bisect this. I have had three people (not yet including the original reporter) verify that the bad commit is 

From: Eric Anholt <eric@anholt.net>
Date: Thu, 10 Sep 2009 17:48:48 -0700
Subject: [PATCH] agp/intel: Fix the pre-9xx chipset flush.

commit e517a5e97080bbe52857bd0d7df9b66602d53c4d upstream.

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.31.y.git;a=commit;h=7abf3aa8294d95e3f0be375f30e8d933f874ada0

The ubuntu kernel packages are available at http://www.kvante.info/855GMfreeze/. The two packages that are before and after the bad commit are
      linux-image-2.6.31.1-855gmtest599-git522bb74_gomyhr1_i386.deb
      linux-image-2.6.31.1-855gmtest600-git7abf3aa_gomyhr1_i386.deb
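
For those unfamiliar with bisecting, the search boils down to a binary search over an ordered list of builds. A minimal sketch, assuming the builds can be numbered in commit order, that the first one is known good and the last one known bad, and that every build after the first bad one is also bad. The function `first_bad` and the `is_bad` callback are hypothetical; in practice `is_bad` means "someone booted that kernel and X froze". Treating 599 and 600 from the package names as build indexes is likewise an assumption made only for this example.

```python
def first_bad(n_builds, is_bad):
    """Binary search for the lowest build index for which is_bad() is True.

    Assumes build 0 is good, build n_builds - 1 is bad, and all builds
    after the first bad one are bad as well.
    """
    lo, hi = 0, n_builds - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid  # the first bad build is at mid or earlier
        else:
            lo = mid  # the first bad build is after mid
    return hi


# Hypothetical run: 1000 builds, freezes start with build 600.
print(first_bad(1000, lambda index: index >= 600))
```

Since the freeze does not always happen immediately, a "good" verdict for a build is only as reliable as the amount of time it was tested.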
Comment 18 Peter 2010-01-04 00:26:41 UTC
Hm, but I don't see the author of the commit anywhere here... Eric, please take a look at comment #17.
Comment 19 Juho-Mikko Pellinen 2010-01-04 09:18:28 UTC
I'm having similar problems with an Asus M5N (Pentium 4M, 855GM) and tried to test the kernel packages from http://www.kvante.info/855GMfreeze/. But sadly those require a CPU which supports PAE, and the Pentium 4M does not support that.
If Geir Ove Myhr could produce the kernels 599 and 600 without PAE I could test those.

When starting in recovery-mode as "netroot" and then executing "startx", X will freeze after showing the desktop for ~5 seconds, but mouse keeps moving. I can then switch back to VT1 and kill the running X.
Booting normally causes a bad freeze before network interfaces come up.

I'm currently running up-to-date Lucid Lynx with the hope that this bug will be fixed soonish.
Comment 20 Geir Ove Myhr 2010-01-05 05:22:17 UTC
(In reply to comment #19)
> If Geir Ove Myhr could produce the kernels 599 and 600 without PAE I could test
> those.

I have packaged the kernels without PAE and put them at http://www.kvante.info/855GMfreeze/ too:
      linux-image-2.6.31.1-855gmtest599-git522bb74_gomyhr2nopae_i386.deb
      linux-image-2.6.31.1-855gmtest600-git7abf3aa_gomyhr2nopae_i386.deb
Comment 21 legolas558 2010-01-07 21:05:04 UTC
I have tried mainline 2.6.33 with the patch specified in comment 17 reverted ("patch -p1 -R") and it gets even worse: it crashes when loading modules, possibly the one for the framebuffer console, so I can't even start Xorg.
Comment 22 HP Charles 2010-01-08 07:01:59 UTC
(In reply to comment #9)
> Some updated information from downstream: 
> - The kernel parameter acpi=off stops the freezes. I don't know exactly what is
> turned off with this. At least it turns off KMS, so kalessin had to revert to
> the karmic -intel driver which has UMS support to test this. 
> - The kernel parameter nomodeset still produces freezes, so the fact that it
> stops with acpi=off is not due to using UMS instead of KMS.

I have a similar problem on FreeBSD 8 with xf86-video-intel-2.7.1 on an Atom-based machine with "<Intel 82945G (945G GMCH) SVGA controller>"

Machine description (in french) http://www.ldlc.com/fiche/PB00089880.html

Symptom : Freeze for 2-3 min at
  - xdm start
  - session start
  - graphic output using shm

Tried :
  - use vesa mode : it works but I can't use the 1910x1080 resolution : unusable
  - downgrade xf86-video-intel to 2.6 branch : same symptom
  - disable shm with 
 Section "Extensions"
      Option "MIT-SHM" "Disable"
 EndSection
 in /etc/X11/xorg.conf : same symptom
  - disable acpi (hint.acpi.0.disabled="1" in /boot/loader.conf) : it works but I can't use the multi thread capability of my Atom230 which is a pity on this machine.
Comment 23 Juho-Mikko Pellinen 2010-01-08 07:14:43 UTC
I just tried the new nopae kernels, but neither of them improved anything.
Before that I had updated my Lucid Lynx and got a new kernel (2.6.32-9), with
which I set my short-lived record of 9 minutes without crashing. (Recovery-mode,
netroot, startx as root).

After the crash I was able to switch back to VT1, where I saw the error message
"Input/Output error" printed by startx to the shell screen. I haven't been able
to reproduce the error yet.
Comment 24 Geir Ove Myhr 2010-01-10 14:03:41 UTC
From more testing downstream it seems that the story of this bug is already well documented at bug 21826. The freeze that most people experience today is triggered by the patch http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.31.y.git;a=commit;h=7abf3aa8294d95e3f0be375f30e8d933f874ada0 (already mentioned in comment #17). That this commit caused frequent freezes is clear from comments #26 and #27 in bug 21826. 

Bug 21826 was originally about infrequent freezes that happened before that commit. It has also been observed downstream that reverting to a kernel before that commit stops the immediate freezes, but it will still freeze. 

In ubuntu we currently have 8 open bug reports [1] for what seems to be this same problem on 855GM. In total they have 54 unique subscribers, which makes it one of the highest profile graphics issues in the ubuntu bug tracker. 

So far we have identified the commit that started the freezes, we have batchbuffer dumps attached here and many more in the downstream bug reports. Some people have tried to downgrade the intel driver to see if it would stop the freezes, but the results have been negative so far. Is there anything more we can do without knowing the details about how this hardware works?

[1]: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bugs?field.tag=855gm%20freeze&field.tags_combinator=ALL
Comment 25 legolas558 2010-01-11 10:10:40 UTC
I have tried reverting the patch of that commit but things do not improve. I currently cannot use any recent intel driver and I am stuck with version 2.3.2-3 (and the appropriate Xorg and other libraries). Also, framebuffer console + KMS creates a black screen (turned off); if inertia does its job, I'll probably end up using outdated intel drivers forever or change hardware.
Comment 26 Juho-Mikko Pellinen 2010-01-18 10:03:59 UTC
I followed the instructions at https://wiki.ubuntu.com/X/Troubleshooting/Freeze#How%20to%20Get%20a%20Batchbuffer%20Dump%20%28-intel%20only%29 and http://intellinuxgraphics.org/how_to_report_bug.html and I got the attached batchbuffer dumps.

In my case, the mouse cursor is also frozen when X freezes.

System environment:
-- chipset:  855GM
-- system architecture: 32-bit
-- xf86-video-intel: 2:2.9.1-1ubuntu1
-- xserver: 2:1.7.3.902-1ubuntu8
-- mesa: 7.7-0ubuntu6
-- libdrm: 2.4.17-0ubuntu1
-- kernel: 2.6.32-10-generic
-- Linux distribution: Ubuntu Lucid Lynx 10.04 dev branch
-- Machine or mobo model: Asus M5000 (laptop)
-- Display connector: LVDS (VGA available, but not used)
     LVDS has the native resolution of 1024x768

Reproducing steps:
Log in to X. Either wait for variable time, browse the system menus or start some application.

SSH works until I try to copy the files from /sys/kernel/debug/dri/0/ .
At that time the system crashes so badly that even sysrq stops responding.
The SSH-sessions die at the same time.

Additional info:

Without logging in to X, the computer is fully stable.
Comment 27 Juho-Mikko Pellinen 2010-01-18 10:06:36 UTC
Created attachment 32696 [details]
Batchbuffer dumps after X froze while playing Mah Jong.
Comment 28 Juho-Mikko Pellinen 2010-01-20 09:12:48 UTC
Created attachment 32745 [details]
All the debug info I could gather.

My system info once again.

One more crash-report with all the files from /sys/kernel/debug/dri/0/ I could gather (i915_regs crashes the computer).
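
Since reading i915_regs takes the machine down, a small script that copies everything else and leaves that one file alone may save others a few reboots. A minimal sketch; the function name `copy_debug_files` and its `skip` parameter are made up for illustration, and only the file name i915_regs and the directory /sys/kernel/debug/dri/0/ come from this report.

```python
import os
import shutil


def copy_debug_files(src, dst, skip=("i915_regs",)):
    """Copy regular files from src to dst, leaving out the names in skip.

    Returns the sorted list of file names that were copied.
    """
    os.makedirs(dst, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src)):
        path = os.path.join(src, name)
        if name in skip or not os.path.isfile(path):
            continue  # never open a skipped file, not even to check it
        # Copy by reading each file to the end rather than trusting its
        # reported size.
        with open(path, "rb") as fin, open(os.path.join(dst, name), "wb") as fout:
            shutil.copyfileobj(fin, fout)
        copied.append(name)
    return copied
```

It would be called as root along the lines of `copy_debug_files("/sys/kernel/debug/dri/0", "dri_debug")`, where the destination directory name is arbitrary.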

System environment:

-- chipset: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
-- system architecture: i686
-- xf86-video-intel: 2:2.9.1-1ubuntu1
-- xserver:2:1.7.3.902-1ubuntu9
-- mesa: 7.7-0ubuntu7
-- libdrm: 2.4.17-0ubuntu1
-- kernel: 2.6.32-10-generic
-- Linux distribution: Ubuntu 10.04 development branch
-- Machine or mobo model: Asus M5000 (P4M-based)
-- Display connector: LVDS (VGA available, not used)

Reproducing steps:

Just browse the system-menus of Gnome.
Comment 29 Michal Nowak 2010-02-12 13:28:31 UTC
Just to let everyone know that I was pointed to a workaround yesterday.

I filed bug http://bugzilla.kernel.org/show_bug.cgi?id=15248 about KMS breaking the boot shortly after start. It was suggested that I try the one-line patch in comment #5 (http://bugzilla.kernel.org/show_bug.cgi?id=15248#c5), which I believe works around the issue; tested with 2.6.33-rc6.

Boot runs OK and even X starts fine (xorg intel driver 2.9.1). With 2.10.1 I verified that Xv works (my former experience with this version + 2.6.31.x was that the combination is rather unstable, with X crashing every 5 minutes).

Considering that an update to kernel 2.6.{32,33} is doable sooner or later, is there anything to be fixed in the intel xorg driver regarding this issue?
Comment 30 legolas558 2010-02-12 17:50:33 UTC
sorry but the bug is not fixed for me, and we have the exact same hardware; I have added proper information on the kernel bug you specified ([1])

[1] http://bugzilla.kernel.org/show_bug.cgi?id=15248
Comment 31 Michal Nowak 2010-02-15 02:00:38 UTC
(In reply to comment #30)
> sorry but the bug is not fixed for me, and we have the exact same hardware; I
> have added proper information on the kernel bug you specified ([1])
> 
> [1] http://bugzilla.kernel.org/show_bug.cgi?id=15248
> 

Did you try it *without* 'nomodeset' and *with* the latest intel xorg driver?
Comment 32 legolas558 2010-02-15 03:58:08 UTC
(In reply to comment #31)
> (In reply to comment #30)
> > sorry but the bug is not fixed for me, and we have the exact same hardware; I
> > have added proper information on the kernel bug you specified ([1])
> > 
> > [1] http://bugzilla.kernel.org/show_bug.cgi?id=15248
> > 
> 
> Did you try it *without* 'nomodeset' and *with* the latest intel xorg driver?
> 

Sure, I am not *that* bad
Comment 33 Michal Nowak 2010-02-15 04:11:20 UTC
OK, just asking because you originally wrote otherwise: "System is booted with 'nomodeset nofb'...", if I understand it correctly. I am pretty surprised the patch does not work for your box since they are equal... Hmm.
Comment 34 legolas558 2010-02-15 05:16:29 UTC
(In reply to comment #33)
> OK, just asking because you originally wrote otherwise: "System is booted with
> 'nomodeset nofb'...", if I understand it correctly. I am pretty surprised the
> patch does not work for your box since they are equal... Hmm.
> 

Yes, it was my mistake; I meant to say *without* 'nomodeset nofb'.

I currently can't understand if this is a kernel or Xorg issue, looks more like a kernel issue. I have provided relevant information in:

http://bugzilla.kernel.org/show_bug.cgi?id=15248 Associated kernel bug
http://bugzilla.kernel.org/attachment.cgi?id=25021 My 'lspci -v'
http://bugzilla.kernel.org/attachment.cgi?id=25049 My kernel .config
http://bugzilla.kernel.org/attachment.cgi?id=25047 dmesg
http://bugzilla.kernel.org/attachment.cgi?id=25048 Xorg output

Please note in the last attachment, Xorg output, that Xorg.0.log is not produced because of a hard kernel crash, so on Xorg's side it would be interesting to know how Xorg is causing this hard kernel crash (perhaps by activating DRM code sections?). I have run Xorg with '-verbose -verbose -verbose -keeptty' but did not get more output on stdout/stderr (since Xorg.0.log generation is not working at all).

Any help is welcome - thanks
Comment 35 Michal Nowak 2010-02-15 07:50:04 UTC
The only way I can see to narrow down the issue is for you to install the same software as I did; perhaps Fedora includes an additional patch? I have Fedora 12 with the most recent xorg-x11-drv-intel [1], the Fedora 13 kernel [2] (patch it yourself) and libdrm-2.4.17-1.fc12 [3].

[1] http://koji.fedoraproject.org/koji/packageinfo?packageID=7794
[2] http://koji.fedoraproject.org/koji/packageinfo?packageID=8
[3] http://koji.fedoraproject.org/koji/buildinfo?buildID=149580
Comment 36 legolas558 2010-02-15 08:37:25 UTC
(In reply to comment #35)
> The only possibility how to strip down the issue looks to me if you install the
> same software as I did, perhaps Fedora includes additional patch? I have Fedora
> 12 with most recent xorg-x11-drv-intel [1], Fedora 13 kernel [2] (patch it
> yourself) and libdrm-2.4.17-1.fc12.
> 
> [1] http://koji.fedoraproject.org/koji/packageinfo?packageID=7794
> [2] http://koji.fedoraproject.org/koji/packageinfo?packageID=8
> [3] http://koji.fedoraproject.org/koji/buildinfo?buildID=149580
> 

I am gonna give FC12 a try, since it most probably isn't due simply to different kernel .config
Comment 37 legolas558 2010-02-16 01:05:28 UTC
1) the stock FC12/FC13 kernels already have some patch which fixes the kernel for this hardware, since I can boot successfully with 'nomodeset' and start Xorg, while on my vanilla 2.6.32/2.6.33 I can't boot at all without 'nomodeset'

2) the FC13 kernel (2.6.33-rc8) + 855nolid.patch (J.Barnes small patch about LID on 855GM) can boot but Xorg crashes before the loading screen finishes loading, so I never reach the login manager

So now I think it is necessary to identify which Red Hat patches are fixing kernel's DRM so that recent Xorg can work, perhaps I'll need another bug tracker

Anyone?
Comment 38 legolas558 2010-02-17 04:12:57 UTC
bug is fixed for me when using these patches with latest linus tree (2.6.33-rc8):

- http://bugzilla.kernel.org/attachment.cgi?id=25019 - 855nolid.patch by jbarnes
- http://bugzilla.kernel.org/attachment.cgi?id=25084 - drm-intel-big-hammer.patch from FC13 kernel patches

Now only a garbled fonts issue remains, but no more crashes up to now (had 2 hours of uptime already)
Comment 39 Michal Nowak 2010-02-17 04:37:07 UTC
I'd appreciate it if others with an 855GM would test the patches and kernel (with the AGP driver compiled in) that Daniele suggested, because we have slightly different results (Daniele has a problem with font rendering, while I don't).
Comment 40 legolas558 2010-02-17 06:05:56 UTC
(In reply to comment #39)
> I'd appreciate if others with 855GM tests patches and kernel (with AGP driver
> compiled-in) Daniele suggested because we have slightly different results
> (Daniele has problem with font rendering, while I don't have).
> 

This is not correct, because you are testing by using a modified Fedora Core, please test with the vanilla kernel soup I suggested in previous comment; by making the same tests we can say if there is a difference in our findings.

When I tested your FC12 mix I got an Xorg crash before desktop display, and this is currently the only discrepancy in our single common experience (same hardware + same software, but different results). I will try again with your suggested mix next weekend (I can't before because I don't have the hard disk where fedora is).

There have also been other people using the vanilla kernel and reporting the font rendering glitch after applying the 855nolid.patch; I think that by trying the vanilla kernel mix that I am using you will confirm these findings, i.e. a display with garbled fonts. Perhaps the FC13 kernel has another patch which also fixes this and which we have yet to identify, and that is why you are not getting this rendering issue.

Thanks
Comment 41 gmud 2010-02-26 09:41:50 UTC
legolas558, 

>bug is fixed for me when using these patches with latest linus tree (2.6.33-rc8):

I tested the patches, too. Now I don't have freezes right at the login screen (gdm), but after a while using my system. The freeze is somehow different, the GPU hangs but I can still switch to another console now (that wasn't possible before I tested 2.6.33-rc8 with patches).

Is your system still stable with the patches? Would you mind telling us which version of the mesa etc you are using?
Comment 42 legolas558 2010-02-28 05:41:08 UTC
(In reply to comment #41)
> legolas558, 
> 
> >bug is fixed for me when using these patches with latest linus tree (2.6.33-rc8):
> 
I later discovered that the system is stable until you put a CPU/GPU-intensive load on it; watching a video always causes a crash.

> I tested the patches, too. Now I don't have freezes right at the login screen
> (gdm), but after a while using my system. The freeze is somehow different, the
> GPU hangs but I can still switch to another console now (that wasn't possible
> before I tested 2.6.33-rc8 with patches).
> 
> Is your system still stable with the patches? Would you mind telling us which
> version of the mesa etc you are using?
> 

When it crashes I can switch to VTs, but it is no longer possible to start Xorg; the GPU seems to be left in an inconsistent state.

I am using mesa 7.7, I can switch to the git Xorg development stack if necessary.

I still believe that mnowak's test would make the difference, and possibly RedHat might know which patch does the fix
Comment 43 legolas558 2010-02-28 13:01:04 UTC
Now using:

linus git kernel + 855nolid.patch + drm-intel + drm-intel-big-hammer.patch
xorg-server 1.7.5
xf86-video-intel 2.10.0
intel-dri 7.7
libdrm-git 20100217

System seems much more stable, the font glitches are minimal and disappear after moving/changing windows (to cause a refresh)

I will report if any crash happens
Comment 44 Chris Wilson 2010-03-02 08:06:20 UTC
The drm-intel-big-hammer.patch from FC13 is insufficient to fix these CPU/GPU coherency issues. In short this is a dup of bug 26345.

*** This bug has been marked as a duplicate of bug 26345 ***
Comment 45 legolas558 2010-03-02 09:59:47 UTC
(In reply to comment #44)
> The drm-intel-big-hammer.patch from FC13 is insufficient to fix these CPU/GPU
> coherency issues. In short this is a dup of bug 26345.
> 
> *** This bug has been marked as a duplicate of bug 26345 ***
> 

False. Your patch does not fix the bug for me, instead I need to continue using drm-intel-big-hammer.patch coupled with 855nolid.patch; I have the 855GM rev 02 chipset.
Comment 46 Chris Wilson 2010-03-02 10:12:53 UTC
(In reply to comment #45)
> False. Your patch does not fix the bug for me, instead I need to continue using
> drm-intel-big-hammer.patch coupled with 855nolid.patch; I have the 855GM rev 02
> chipset.

The font glitches are evidence alone that the wbinvd() is insufficient. It is even more apparent when using a reliable test case for the incoherency issue.
Comment 47 legolas558 2010-03-02 10:18:31 UTC
(In reply to comment #46)
> (In reply to comment #45)
> > False. Your patch does not fix the bug for me, instead I need to continue using
> > drm-intel-big-hammer.patch coupled with 855nolid.patch; I have the 855GM rev 02
> > chipset.
> 
> The font glitches are evidence alone that the wbinvd() is insufficient. It is
> even more apparent when using a reliable test case for the incoherency issue.
> 

There is no reliable testcase, since it cannot even be run! With your patch alone Xorg will live 1-2 seconds at most, crashing without running anything but itself. This is the same result as without your patch. I can run with both patches if you want, but the "necessity" flag stays on drm-intel-big-hammer.patch and 855nolid.patch for now.
Comment 48 Chris Wilson 2010-03-02 10:29:28 UTC

*** This bug has been marked as a duplicate of bug 26345 ***
Comment 49 legolas558 2010-03-02 12:09:57 UTC
it is not a duplicate, I have tested it on i855GM hardware as per previous comments
Comment 50 Carl Worth 2010-03-02 12:51:00 UTC
Chris has been looking into similar bugs, so re-assigning to him.

-Carl
Comment 51 Scott Hansen 2010-03-04 21:14:52 UTC
I have:
00:02.0 VGA compatible controller: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 01)

I've also experienced the failure for X to start with the most recent intel driver and kernel 2.6.32 (Archlinux). 
i686, intel driver 2.10.0, xorg-server 1.7.5, libgl 7.7, mesa 7.7. KMS enabled.

I get the following errors in Xorg.log when it freezes:
(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
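
In case it helps others check for the same message, a minimal sketch that lists the (EE) lines of an Xorg log. The function name `error_lines` is made up, the optional timestamp prefix is an assumption about other server versions, and the (II) line in the sample is invented filler; only the (EE) line is taken from my log.

```python
import re

# Matches "(EE) ..." at the start of a line, optionally preceded by a
# "[   12.345]" style timestamp (assumed format, not present in my log).
EE_RE = re.compile(r"^\s*(?:\[\s*\d+\.\d+\]\s*)?\(EE\)")


def error_lines(log_text):
    """Return the lines of an Xorg log that carry the (EE) error marker."""
    return [line.strip() for line in log_text.splitlines() if EE_RE.match(line)]


# Hypothetical sample: the first line is made up, the second is from above.
SAMPLE_LOG = """\
(II) example informational line
(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.
"""

for line in error_lines(SAMPLE_LOG):
    print(line)
```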

I compiled the vanilla git kernel 2.6.33 with the two patches listed below (lid  patch and big-hammer) and while I at least got X to start successfully, it still freezes after a while or as soon as I start switching between virtual terminals or using graphics-intensive applications. DWM runs for much longer than XFCE!!

I can help troubleshoot on my hardware, if it's needed.

Let me know if I can provide any further info.

Scott
Comment 52 Geir Ove Myhr 2010-03-04 23:55:29 UTC
(In reply to comment #51)
> I compiled the vanilla git kernel 2.6.33 with the two patches listed below (lid
>  patch and big-hammer) 

Scott, there is a patch at bug 26345. It would probably be useful if you could test that on top of the drm-intel-next kernel and, if it is still a problem, get i915_error_state. See details at that bug report.
Comment 53 Daniel Vetter 2010-03-18 06:37:55 UTC
I've created a preliminary patch that fixes gtt related cache coherency problems at least for my i855GM. Look here for instructions:

http://bugs.freedesktop.org/show_bug.cgi?id=26345#c61
Comment 54 legolas558 2010-03-19 04:32:59 UTC
(In reply to comment #53)
> I've created a preliminary patch that fixes gtt related cache coherency
> problems at least for my i855GM. Look here for instructions:
> 
> http://bugs.freedesktop.org/show_bug.cgi?id=26345#c61
> 

With your patch I get the above mentioned crash:

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung

after playing a video.

If no video is played, Xorg seems quite stable (this is the most stable driver/Xorg combination since introduction of KMS in kernels).
Comment 55 Tony White 2010-03-24 21:40:03 UTC
Hi all,
Confirming the same 855GM revision 2 woes here too.

If comment #17 is true then commit e517a5e97080bbe52857bd0d7df9b66602d53c4d should be reverted upstream and marked REGRESSION.

Six months and no fix even after I said it didn't work before it was merged :
https://bugs.freedesktop.org/show_bug.cgi?id=23919#c9
Code that worked perfectly well has been dropped so that i915 works better.
855GM users don't care if your i915 card works much better. The old version worked just fine for us. Why break it and provide absolutely no working fix at our expense.

I tried the drm-intel-big-hammer.patch in this :

http://koji.fedoraproject.org/koji/buildinfo?buildID=163170

kernel
(* Mon Jan 11 2010 Kyle McMartin <kyle@redhat.com>
- drm-intel-big-hammer: fix IS_I855 macro.)

And while it prevents X from freezing immediately after starting, X only stays up for about an hour before crashing in exactly the same way as before.

Is the patch in comment #53 designed to further improve the chances of 855GM actually working again when coupled with the big-hammer patch or is that new patch to be used without the big-hammer patch?

Just so it's clear, Intel 855 owners have not been able to run current kernel versions with desktop Linux distributions for over six months now.
Comment 56 Daniel Vetter 2010-03-25 01:20:07 UTC
> --- Comment #55 from Tony White <tonywhite100@googlemail.com>  2010-03-24 21:40:03 PST ---
> Six months and no fix even after I said it didn't work before it was merged :
> https://bugs.freedesktop.org/show_bug.cgi?id=23919#c9
> Code that worked perfectly well has been dropped so that i915 works better.
> 855GM users don't care if your i915 card works much better. The old version
> worked just fine for us. Why break it and provide absolutely no working fix at
> our expense.

Please stop complaining. Chris Wilson and lately I have been beating our
heads against this problem for _weeks_. So yes, your problem _is_ being
looked at, it's just one of these _very_ hard things. And simply reverting
this commit is not an option, because it breaks stability for other
people.

I understand your frustration, after all I almost threw my i855 out the
window many times already ;) So if you want to help out (testing stuff is
always appreciated), I've opened a bz entry to track my latest trials at
fixing this:

http://bugs.freedesktop.org/show_bug.cgi?id=27187

Patch should apply against drm-intel-next (see the bug report for latest
details). You also need the very latest git version of xf86-video-intel,
Chris has fixed some unrelated hangs recently.
Comment 57 legolas558 2010-03-25 02:24:38 UTC
@Tony White: I am also an i855GM rev02 user and I have experienced that kind of frustration. The workaround for me was to use Xorg 1.6 and its compatible drivers, and if it was hard for me I can barely guess how hard it must have been for less experienced users than us.

Having tried most of the patches around, I can confirm that D.Vetter's patch is much better than the DRM intel big hammer patch; the "big hammer" actually produced a barely usable system (5 minutes at most), while I have now been using Xorg 1.7 and the new intel drivers for more than a week with D.Vetter's patch.

I haven't yet tried the latest version of his patch (will do ASAP), but even the first version never crashes unless you play a video, and that might be the separate bug 26723.

So thanks to Daniel, and let's help them sort this out! Yes, it's a shame that a patch which broke older hardware in favor of new hardware was accepted without a hitch, but I wouldn't blame the people who are now working to fix this :)

I have personally been admitting to newbie linux users that linux was broken for their hardware, and that there was no short-term solution for them (and neither for me) - so perhaps this is changing right now.
Comment 58 daniel 2010-03-25 06:50:28 UTC
Hi!
Is there any need for another "driver-tester"? I would like to help because I am also affected by this problem.
If needed, how could I help?
Thanks
Comment 59 Geir Ove Myhr 2010-03-25 06:56:59 UTC
(In reply to comment #58)
> is there any need for another "driver-tester"? I would like to help because I
> am also affected by this problem.
> If needed, how could I help?

Daniel, if you head over to bug 27187, there is one DDX driver patch and one kernel patch you can test and provide feedback on.
Comment 60 Tony White 2010-04-08 17:11:25 UTC
@All
My comment was aimed only at whoever merged the change after I had posted that the very first patch would make this issue worse, and at whoever decided to remove UMS support from the i915 GPU driver while 855 support was broken. I was angry at them, no one else. I'm sorry if my shouting was annoying, but it appears I am being ignored when I report that something is not working here.
If I stop reporting, I won't get ignored. I can just go ahead and ignore myself instead.

With all the 855 threads on this bugzilla, I had read through a few of them before making that comment and couldn't see anything reassuring me that the problem was even fixable. Because it is an old card, I guessed that the problem was being ignored, maybe because KMS wasn't possible at all on the 855GM or because no one able to write kernel C code had an 855GM machine. I guessed wrong, obviously, but it wasn't until after I posted here that I found the separate thread Chris, Daniel and legolas are working on.

That's cool, guys. You are actually trying to fix this problem, and I and every other desktop Linux user with an 855 are indebted to you for trying so hard.
Thank You!
I had been patient for six months, I didn't see any chance of anything happening and thought the worst.

So, sorry about the complaining. It wasn't obvious to me that you guys were working on it, and I won't post anything more until I can confirm that the latest patch works.
Thanks for all the effort that has gone into trying to fix this. I, like many others I'm sure, appreciate it greatly.
Comment 61 legolas558 2010-04-08 18:44:44 UTC
@Tony: I think the bug was crushed with the latest patch modification in bug 27187. The kernel patch is sufficient.

Also, I can no longer reproduce any of the other bugs, so it is highly probable that the patch fixed this one as well; I hope the original reporter and other people can confirm this.
Comment 62 Andrej Podzimek 2010-05-30 08:39:53 UTC
2.6.34 freezes exactly the same way as reported above. (CPU hung error in dmesg and a big bunch of I/O errors in Xorg.0.log.)

Moving large objects on the screen (scrolling a folder view, dragging a window) triggers the freeze almost immediately.

Is there a patch that can be applied to a 2.6.34 vanilla kernel and is supposed to fix this? AFAIK, the solution suggested here (https://bugs.freedesktop.org/show_bug.cgi?id=27187) still does not work properly.
Comment 63 legolas558 2010-05-30 08:44:21 UTC
(In reply to comment #62)
> 2.6.34 freezes exactly the same way as reported above. (CPU hung error in dmesg
> and a big bunch of I/O errors in Xorg.0.log.)
> 
> Moving large objects on the screen (scrolling a folder view, dragging a window)
> triggers the freeze almost immediately.
> 
> Is there a patch that can be applied to a 2.6.34 vanilla kernel and is supposed
> to fix this? AFAIK, the solution suggested here
> (https://bugs.freedesktop.org/show_bug.cgi?id=27187) still does not work
> properly.

This bug is perfectly fixed for me with the patch in bug 27187.
Comment 64 legolas558 2010-05-30 08:47:53 UTC

*** This bug has been marked as a duplicate of bug 27187 ***

