Bug 20453 - (xf86-video-intel) [865] GPU hang at server start, 2.7.1/2.6.30.1
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: 7.4 (2008.09)
Hardware: x86 (IA32) Linux (All)
Importance: medium critical
Assigned To: Eric Anholt
QA Contact: Xorg Project Team
Keywords: NEEDINFO
Depends on:
Blocks:
Reported: 2009-03-04 01:50 UTC by Georg Grabler
Modified: 2009-10-19 13:31 UTC (History)

Attachments
Xorg.0.log using xf86-video-intel 2.6.3 (29.59 KB, text/plain)
2009-03-10 01:51 UTC, Georg Grabler
New XOrg.0.log (29.57 KB, text/plain)
2009-07-09 06:43 UTC, Georg Grabler
dmesg of the crash (26.38 KB, text/plain)
2009-07-09 06:44 UTC, Georg Grabler
/var/log/messages of the crash (40.02 KB, application/octet-stream)
2009-07-09 06:44 UTC, Georg Grabler
intel_gpu_dump of the crash (134.61 KB, application/x-gzip)
2009-07-16 01:39 UTC, Georg Grabler
intel_gpu_dump of 2.7.1 (127.14 KB, application/x-gzip)
2009-07-16 01:50 UTC, Georg Grabler
crash with patch by Eric Anholt (133.83 KB, application/x-gzip)
2009-07-16 02:24 UTC, Georg Grabler
intel_gpu_dump (2.7.99.902 + patch by Eric) (132.64 KB, application/x-gzip)
2009-07-16 03:23 UTC, Georg Grabler

Description Georg Grabler 2009-03-04 01:50:55 UTC
Recently, I installed xorg-server 1.6.0 and xf86-video-intel 2.6.2 on my computer, which has the following graphics card:

00:02.0 VGA compatible controller: Intel Corporation 82865G Integrated Graphics Controller (rev 02)

I'm not using any xorg.conf, so I'm starting X with the default configuration file, not using composite.

Now, when I start up, X does start, but the X server freezes a few seconds later (and since Ctrl+Alt+F1 doesn't work, evdev has most likely frozen along with it).
There are no errors displayed in the Xorg.0.log, and there is no output about a crash.
The only thing you can do is to push the reset button of your computer.

Package versions are quite up-to-date:
xorg-server 1.6.0-1
xorg-server-utils 7.4-2
intel-dri 7.3-1
xf86-video-intel 2.4.3-1
xf86driproto 2.0.4-1
dri2proto 1.99.3-1
libdrm 2.4.5-1
mesa 7.3-1
glproto 1.4.9-1
libpciaccess 0.10.5-1
xf86-input-evdev 2.1.3-1
Comment 1 Georg Grabler 2009-03-04 01:52:53 UTC
Oh, the xf86-video-intel version listed above is the currently installed driver (I had to go back to 2.4.3, which works with xorg 1.6.0).

It's crashing with
xf86-video-intel 2.6.2-1
Comment 2 Gordon Jin 2009-03-04 17:37:10 UTC
Does master branch or 2.6.3 work?
Comment 3 Georg Grabler 2009-03-05 04:12:28 UTC
I just compiled the master branch of 1.6.3 (git) and had the same problem / "system freeze" using the new version.
Comment 4 Georg Grabler 2009-03-05 04:16:01 UTC
If it helps:
I'm using archlinux (http://www.archlinux.org), if you want to check the packages for patches. I'm using the dri / mesa drivers of the distribution, and the intel-driver compiled from git master.

As far as I know, with 2.6.2 / xorg 1.6 most of the patches that used to be applied have been removed by the maintainers.

Of course, using 2.6.3 didn't help either; I'm getting confused by all those similar version numbers.
Comment 5 Georg Grabler 2009-03-06 03:42:57 UTC
As a side-note:
The driver (2.6.2) works perfectly on my Vostro 1510 with 945G chipset. So it seems as if it's really an intel driver related issue with the 82865G chipset.
Comment 6 Georg Grabler 2009-03-09 02:24:03 UTC
May be related / duplicate of
http://bugs.freedesktop.org/show_bug.cgi?id=19727
Comment 7 Eric Anholt 2009-03-09 22:09:34 UTC
dmesg, Xorg.0.log
Comment 8 Georg Grabler 2009-03-10 01:51:55 UTC
Created attachment 23715 [details]
Xorg.0.log using xf86-video-intel 2.6.3

Xorg.0.log of the crash.

Don't know how to produce a dmesg on a complete system freeze (crash, hard reset necessary, can't even switch to console using ctrl+F1) - any advice?
Comment 9 Georg Grabler 2009-03-16 05:43:42 UTC
Testing with xf86-video-intel 2.6.99.902 gives a quite "new" result:

1.) Screen flickering
2.) Still crashing
3.) Some strange drawing problems (as if blurred / smeared .. I don't know the correct English term, sorry, not a native speaker).
Comment 10 Sven Hoexter 2009-04-12 14:42:34 UTC
Hi,
ok same problem with the same chipset here with Debian/unstable running Linux 2.6.29 and driver version 2.6.99.903.
When I try to start KDE it locks up completely. Using fluxbox I can at least open a terminal and blindly start glxgears, which to my surprise works. Everything else is unusable and disappears in the blur and other drawing madness.

One workaround is to disable acceleration with Option "NoAccel" "true". It makes everything slow as hell, but at least it's kind of usable again.
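
A minimal sketch of the workaround above, written as a shell function that prints the Device section so it can be reviewed before it goes into the X configuration. The `Identifier` value and the function name are placeholders made up for illustration; `intel` is the driver name that appears in the X log quoted in this report.

```shell
# Print a minimal xorg.conf Device section that turns off acceleration.
# "Intel865" is a placeholder Identifier (an assumption, pick any name).
noaccel_device_section() {
    cat <<'EOF'
Section "Device"
    Identifier "Intel865"
    Driver     "intel"
    Option     "NoAccel" "true"
EndSection
EOF
}

# Show the section on stdout; nothing is written to disk here.
noaccel_device_section
```

The output would normally be placed in the X configuration file (commonly /etc/X11/xorg.conf, which is an assumption about the local setup) before restarting X. This only sidesteps the hang by disabling acceleration; it does not fix it.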

My second system with a 945G chipset works fine so this seems to be a 8xx (maybe even 855/865?) specific problem.

So, is there any chance to help with further end-user debugging to get this fixed soon?
Should I attach my Xorg.0.log as well?
Comment 11 Mark Knecht 2009-04-18 21:06:59 UTC
I suspect that I am having more or less the same problem. I have no problem starting X but trying to use any application that does xv-type video crashes X completely and returns me to the login screen.

This did not happen until Gentoo pushed xorg-server-1.5 out as stable. I was running 1.3 prior to that without problems. I've tried updating to xorg-7.4 but that hasn't changed anything. I'm currently using xorg-server-1.5.3-r5 and xf86-video-intel-2.6.3-r1.

I have found that with mplayer, where I can specify the rendering type, that if I choose OpenGL it doesn't crash, but if I choose xv (which is pretty much default) it does crash. When it does I see some evidence of a segfault, but I'm not set up to back trace anything. (I'm a user-type, not a developer.)
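
A rough sketch of how that comparison could be repeated from a terminal, assuming mplayer's `-vo gl` and `-vo xv` video output drivers are the two rendering types meant here. The function name and the clip path are hypothetical.

```shell
# Play the same clip through two video output drivers to compare behaviour.
# -vo gl is the OpenGL path, -vo xv is the XVideo path reported to crash,
# so gl runs first in case the xv run takes the X server down.
try_video_outputs() {
    clip=$1
    for vo in gl xv; do
        echo "== mplayer -vo $vo =="
        mplayer -vo "$vo" "$clip"
        echo "exit status: $?"
    done
}

# Example (clip name is a placeholder):
#   try_video_outputs sample.avi
```

If the xv run crashes X, the terminal goes with it, so it is worth running this from a remote login or redirecting the output to a file.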

<SNIP>

(==) Log file: "/var/log/Xorg.0.log", Time: Sat Apr 18 20:34:02 2009
(EE) Unable to locate/open config file
New driver is "intel"
(==) Using default built-in configuration (30 lines)
(EE) Failed to load module "fbdev" (module does not exist, 0)
Failed to initialize GEM.  Falling back to classic.
exaCopyDirty: Pending damage region empty!
(EE) intel(0): Failed to pin xv buffer
intel_bufmgr_gem.c:839: Error setting memory domains -1214539056 (00000040 00000040): Inappropriate ioctl for device .

Backtrace:
0: /usr/bin/X(xorg_backtrace+0x37) [0x812735f]

Fatal server error:
Caught signal 11.  Server aborting
<SNIP>


Seems like the issue is this 'failed to pin xv buffer' message...
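
A small sketch for pulling just the error-level lines out of an X log like the one excerpted above, assuming the usual convention that the X server marks errors with `(EE)`. The function name is made up; the default path is the log file named in the excerpt.

```shell
# Print only the lines the X server tagged as errors.
# With no argument, read the log path shown in the excerpt above.
xorg_errors() {
    grep -F '(EE)' "${1:-/var/log/Xorg.0.log}"
}

# Example:
#   xorg_errors /var/log/Xorg.0.log
```

Note that in the excerpt the "Failed to initialize GEM" line and the intel_bufmgr_gem.c line carry no `(EE)` tag, so a filter like this would miss them; it is a first pass, not a replacement for reading the whole log.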

Hardware:

dragonfly ~ # lspci
00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82865G Integrated Graphics Controller (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
01:03.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
dragonfly ~ # 
dragonfly ~ # lsmod
Module                  Size  Used by
snd_pcm_oss            32160  0 
snd_mixer_oss          12288  1 snd_pcm_oss
snd_seq_oss            24732  0 
snd_seq_midi_event      5632  1 snd_seq_oss
snd_seq                40232  4 snd_seq_oss,snd_seq_midi_event
snd_usb_audio          70368  0 
snd_usb_lib            12928  1 snd_usb_audio
snd_rawmidi            16928  1 snd_usb_lib
snd_seq_device          5900  3 snd_seq_oss,snd_seq,snd_rawmidi
snd_hwdep               6276  1 snd_usb_audio
i915                   25856  2 
drm                    61736  3 i915
sbp2                   18572  0 
ieee1394               66064  1 sbp2
snd_intel8x0           25756  1 
snd_ac97_codec         87072  1 snd_intel8x0
ac97_bus                1664  1 snd_ac97_codec
snd_pcm                56452  4 snd_pcm_oss,snd_usb_audio,snd_intel8x0,snd_ac97_codec
snd_timer              16520  2 snd_seq,snd_pcm
snd                    40632  14 snd_pcm_oss,snd_mixer_oss,snd_seq_oss,snd_seq,snd_usb_audio,snd_rawmidi,snd_seq_device,snd_hwdep,snd_intel8x0,snd_ac97_codec,snd_pcm,snd_timer
intel_agp              22588  1 
snd_page_alloc          6792  2 snd_intel8x0,snd_pcm
agpgart                25524  3 drm,intel_agp
dragonfly ~ # 

dragonfly ~ # modinfo i915
filename:       /lib/modules/2.6.27-gentoo-r10/kernel/drivers/gpu/drm/i915/i915.ko
license:        GPL and additional rights
description:    Intel Graphics
author:         Tungsten Graphics, Inc.
depends:        drm
vermagic:       2.6.27-gentoo-r10 SMP preempt mod_unload PENTIUM4 
dragonfly ~ # 

I don't know what other info you want so just ask and I'll try to provide it. 
Comment 12 Mark Knecht 2009-04-18 21:08:33 UTC
Also meant to add that I see this in dmesg but have the impression it's not important:

[drm:i915_getparam] *ERROR* Unknown parameter 5
[drm:i915_getparam] *ERROR* Unknown parameter 5
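
For repeated messages like these, a short filter can summarise how often each distinct DRM error occurs. This is only a sketch: the function name is made up, and it assumes kernel log lines may or may not carry a leading `[  seconds.micros]` timestamp, which it strips so identical messages collapse together.

```shell
# Read kernel log text on stdin and print each distinct DRM error
# message preceded by the number of times it occurred.
drm_error_summary() {
    grep -F '[drm:' \
        | grep -F '*ERROR*' \
        | sed 's/^\[ *[0-9][0-9]*\.[0-9]*] //' \
        | sort \
        | uniq -c
}

# Example:
#   dmesg | drm_error_summary
```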
Comment 13 Georg Grabler 2009-04-20 23:29:42 UTC
Behaviour didn't change in 2.7.0 in Arch Linux Testing using Kernel 2.6.29.1.

I think I could get a dmesg out of X booting - if that would help. It's now surviving longer (sometimes) before crashing ... don't know why though.

Arch provides a -legacy driver (2.3.2), which I'm using right now. Works, but would be nice to get this one fixed though - even for the old cards (since who knows how long 2.3.2 will work with xorg).
Comment 14 Gordon Jin 2009-06-02 11:38:42 UTC
(In reply to comment #8)
> Created an attachment (id=23715) [details]
> Xorg.0.log using xf86-video-intel 2.6.3
> 
> Xorg.0.log of the crash.
> 
> Don't know how to produce a dmesg on a complete system freeze (crash, hard
> reset necessary, can't even switch to console using ctrl+F1) - any advice?
> 

either remote login, or get /var/log/messages after reboot.
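
A sketch of the remote-login route, assuming the hung machine still answers SSH while X is frozen. The function name, output file names and host argument are all hypothetical; the two log paths are the ones mentioned in this report.

```shell
# Copy the kernel log and X log off a machine whose X server is hung
# but which still accepts SSH logins.
#   $1 = host name of the hung machine
#   $2 = local directory for the copies (defaults to the current one)
collect_hang_logs() {
    host=$1
    outdir=${2:-.}
    ssh "$host" dmesg > "$outdir/dmesg.txt"
    ssh "$host" cat /var/log/Xorg.0.log > "$outdir/Xorg.0.log"
    ssh "$host" cat /var/log/messages > "$outdir/messages.txt"
}

# Example (host name is a placeholder):
#   collect_hang_logs hungbox ./logs
```

Reading /var/log/messages usually needs root on the remote side. If the machine does not answer on the network at all, the same file can be read after the reboot, as suggested above.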
Comment 15 Gordon Jin 2009-07-08 23:53:31 UTC
ping

Could you try kernel 2.6.30? Don't forget to attach dmesg or /var/log/messages.
Comment 16 Georg Grabler 2009-07-09 06:43:39 UTC
Created attachment 27517 [details]
New XOrg.0.log

Xorg.0.log of the crash when using xf86-video-intel 2.7.1 and kernel 2.6.30.1
Comment 17 Georg Grabler 2009-07-09 06:44:06 UTC
Created attachment 27518 [details]
dmesg of the crash

dmesg output of the crash when using xf86-video-intel 2.7.1 and kernel 2.6.30.1
Comment 18 Georg Grabler 2009-07-09 06:44:35 UTC
Created attachment 27519 [details]
/var/log/messages of the crash

/var/log/messages of the crash when using xf86-video-intel 2.7.1 and kernel 2.6.30.1
Comment 19 Georg Grabler 2009-07-09 06:54:17 UTC
Meanwhile, the kernel keeps running (no kernel crash at all; I can log in remotely via SSH, which is how I got the logs). X hangs completely.

If you want me to enable any options for more detailed debugging outputs let me know.

Kind regards,
Georg
Comment 20 Eric Anholt 2009-07-15 16:59:27 UTC
intel_gpu_dump output would really help debug.

Also, this may or may not help:

commit a1e6abb5ca89d699144d10fdc4309b3b78f2f7a9
Author: Eric Anholt <eric@anholt.net>
Date:   Wed Jul 15 14:15:10 2009 -0700

    Use batch_start_atomic to fix batchbuffer wrapping problems with 8xx render.
    
    Bug #22483.
Comment 21 Georg Grabler 2009-07-16 01:39:38 UTC
Created attachment 27752 [details]
intel_gpu_dump of the crash

Couldn't take one when the system is running normally.

I updated my xf86-video-intel right before taking the dump (hopefully that won't mess it up).
If you need a dump of 2.7.1, that wouldn't be a problem.

Versions used:
intel_gpu_dump version: 1.0.1
xf86-video-intel 2.7.99.901

Note: gzipped, since it is larger than the maximum attachment size allowed here.
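
A sketch of how such a compressed dump might be captured from another machine while X is hung. It assumes that `intel_gpu_dump` writes its dump to standard output and that the remote account is allowed to run it (it typically needs root); the function name, host name and output file name are hypothetical.

```shell
# Run intel_gpu_dump on the hung machine over SSH and store the result
# gzip-compressed on the local side.
#   $1 = host name of the hung machine
#   $2 = local output file (defaults to intel_gpu_dump.txt.gz)
capture_gpu_dump() {
    host=$1
    out=${2:-intel_gpu_dump.txt.gz}
    ssh "$host" intel_gpu_dump | gzip -c > "$out"
}

# Example (host name is a placeholder):
#   capture_gpu_dump hungbox crash-dump.txt.gz
```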
Comment 22 Georg Grabler 2009-07-16 01:50:24 UTC
Created attachment 27753 [details]
intel_gpu_dump of 2.7.1

Realized that the fix is about the version I used, so I took the old 2.7.1 version. The problem could be the same though.

I'll try the patch provided in the other bug report and report back here if it worked for me as well.
Comment 23 Georg Grabler 2009-07-16 02:24:22 UTC
Created attachment 27756 [details]
crash with patch by  Eric Anholt

I gave it a shot by including Eric's commit as a patch in the intel driver. I don't know if I missed some parent patch, since the hunks applied with offsets.
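
One way to judge how well a patch applied is to look at what `patch` printed. The sketch below assumes GNU patch's usual messages ("Hunk #N succeeded at L (offset K lines).", "with fuzz F", "Hunk #N FAILED at L."); the function name and the labels it prints are made up for illustration.

```shell
# Read the output of `patch` on stdin and classify how the hunks applied.
# Prints "failed" if any hunk was rejected, "inexact" if any hunk needed
# an offset or fuzz, and "clean" otherwise.
classify_patch_output() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'FAILED'; then
        echo failed
    elif printf '%s\n' "$log" | grep -Eq 'offset -?[0-9]+ lines?|with fuzz [0-9]+'; then
        echo inexact
    else
        echo clean
    fi
}

# Example (patch file name is a placeholder); --dry-run changes no files:
#   patch -p1 --dry-run < fix.patch | classify_patch_output
```

An offset on its own only means the surrounding lines have moved relative to the tree the patch was made against. It does not prove that a parent commit is missing, but it is a reason to compare against the exact revision the commit was based on.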

I can't test it against git master, since I can't check out using git (only HTTP is allowed here).

If you want, I can test it tomorrow against any branch / revision you want.

Attached the gpu dump again, 2.7.99.901 including the patch of commit a1e6abb5ca89d699144d10fdc4309b3b78f2f7a9 by Eric Anholt.
Comment 24 Georg Grabler 2009-07-16 03:23:30 UTC
Created attachment 27757 [details]
intel_gpu_dump (2.7.99.902 + patch by Eric)

I realized that you tagged 2.7.99.902 3 days ago, and that Eric's patch came right after the tag.
I built 2.7.99.902 from the downloaded package with the patch applied on top. The result: it hangs again.

Attached the intel_gpu_dump of 2.7.99.902 with Eric's patch when using kernel 2.6.30.1.
Comment 25 Eric Anholt 2009-09-25 11:16:22 UTC
There's been another important fix for 8xx hangs, this time in the kernel:

commit e517a5e97080bbe52857bd0d7df9b66602d53c4d
Author: Eric Anholt <eric@anholt.net>
Date:   Thu Sep 10 17:48:48 2009 -0700

    agp/intel: Fix the pre-9xx chipset flush.

And if you retest against it, please clear needinfo so we see your response in a timely manner :)
Comment 26 Patrick McCarty 2009-09-26 16:55:47 UTC
(In reply to comment #25)
> There's been another important fix for 8xx hangs, this time in the kernel:
> 
> commit e517a5e97080bbe52857bd0d7df9b66602d53c4d
> Author: Eric Anholt <eric@anholt.net>
> Date:   Thu Sep 10 17:48:48 2009 -0700
> 
>     agp/intel: Fix the pre-9xx chipset flush.

I've had the same symptoms as the OP, and this commit fixes the problem for me.

Applied against 2.6.30.6 kernel (Arch Linux) and using xf86-video-intel 2.8.1.
Comment 27 Eric Anholt 2009-10-19 13:31:39 UTC
OK, closing this out. Georg, please reopen if the problem continues.