Summary: | [845G] GPU hanging on X start (sometimes) | ||
---|---|---|---|
Product: | xorg | Reporter: | david manyé <dmanye> |
Component: | Driver/intel | Assignee: | Wang Zhenyu <zhenyu.z.wang> |
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team> |
Severity: | critical | ||
Priority: | medium | CC: | dave.plater, dmanye, eich, gordon.jin, joeyadams3.14159, kent.liu, ling.yue, mat, mrmazda, quanxian.wang |
Version: | 7.3 (2007.09) | Keywords: | NEEDINFO |
Hardware: | x86 (IA32) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Description
david manyé
2008-09-22 04:10:55 UTC
Created attachment 19093 [details]
log for version 1.4.2 from a machine that refuses to boot X
Created attachment 19094 [details]
log for version 1.4.2 from a machine that boots X correctly
Created attachment 19095 [details]
log for version 1.5.0 from the same machine that refuses to boot X with 1.4.2
I'd suggest focusing on the problem with the latest driver. Note: when we say driver version, we mean xorg-video-intel (2.4.2), not the xserver version (1.5.0).

*** Bug 17670 has been marked as a duplicate of this bug. ***

I have openSUSE Factory and kernel-2.6.27-rc6-7-pae on i845G on a Dell GX260 desktop. Xorg.0.log shows Xorg 1.5.0 with Intel module version 2.4.97. Attempting to use X, either directly or via Sax2, completely locks up the system. Excerpt from Xorg.0.log:

Fatal server error:
lockup
Error in I830WaitLpRing(), timeout for 2 seconds
pgetbl_ctl: 0x3ffe0001 getbl_err: 0x00000000
ipeir: 0x00000000 iphdr: 0x05000000
LP ring tail: 0x00000020 head: 0x0000000c len: 0x0001f001 start 0x00000000
eir: 0x0000 esr: 0x0000 emr: 0xff7b
instdone: 0xffc1 instpm: 0x0000
memmode: 0x00000000 instps: 0x00000040
hwstam: 0xffff ier: 0x0000 imr: 0xffff iir: 0x0000
Ring at virtual 0xaf897000 head 0xc tail 0x20 count 5
acthd 0x311a8

Comment 6 system booted to Mandriva Cooker 2.6.27-rc7.5.1 with Xorg 1.4.2/Intel 2.4.2-4 seems to work fine.

I still crash on 1.5.2 and 2.4.97 on openSUSE Factory's 2.6.27.1.

I have both Intel motherboard and Foxconn/Dell motherboard 845G systems. The problem is gone with 2.5.0 and 1.5.2 on the Intel, but not on the Dell, regardless of whether the BIOS video buffer is set to 1M or 8M.

How about if you add "AccelMethod NoAccel" in under device section in xorg.conf?

This bug seems a bit similar to bug #17291.

(In reply to comment #10)
> How about if you add "AccelMethod NoAccel" in under device section in
> xorg.conf?

... 1.5.2 ...
Parse error on line 164 of section Device in file /etc/x11/xorg.conf
"AccelMethod NoAccel" is not a valid keyword in this section."
...

Adding 'Option "NoAccel"' in Section "Device" in xorg.conf enables X startup success on the GX260.

Isn't this a duplicate of Bug #18270?

*** Bug 18270 has been marked as a duplicate of this bug. ***

Created attachment 21290 [details]
Xorg.0.logs, dmesgs, and lsmod in working versus nonworking times
Yay, glad I did a search. Indeed, this happens to me when I get a "(WW) intel(0): PRB0_HEAD (0x00000004) and PRB0_TAIL (0x00000000) indicate ring buffer not flushed". Additionally, I get "underrun on pipe A" and a couple of other potentially related errors. I wonder if they're related. Since I have free time, plenty of knowledge of C, and a very very tiny bit of knowledge about X drivers, I'll see if I can hack away at this. Don't count on me, though :)

Anyway, here's the bug report I was about to post:

Title: Intel driver locks up system at startup randomly; underrun on pipe A

I have an Intel 82845G/GL integrated chipset on an HP Pavilion 503n displaying on a 17-inch LCD monitor. The following bug happens on the latest Ubuntu Intrepid on Linux 2.6.27-9-generic as well as in Ubuntu Hardy on Linux 2.6.24-14-generic. All my testing is on Ubuntu Intrepid for this bug information.

My system locks up completely (can't be accessed even through VT switching or SSH) when Ubuntu Intrepid starts up, but this problem occurs randomly. I'm guessing the randomness is caused by Ubuntu racing to start X before starting a few other system services. Other times, the driver runs fine, 3D and all, except for these potentially related problems:

1. A blank screen after VT-switching a bit and switching back to F7 or wherever the X server is running.
2. Horizontal jumping effects (seemingly lasting only three or so screen refreshes; a tiny fraction of a second) after resuming from suspend. These effects appear in greater frequency at higher resolutions (over ten times more on 1280x1024 than on 1024x768), and they happen more often when a lot of 2D graphics action is going on (glxgears and video don't cause that much jumpiness, but GNOME's progress bar causes it like crazy).

The lockup as well as these two problems are always accompanied by one or more instances of this message in /var/log/Xorg.0.log:

(EE) intel(0): underrun on pipe A!

Before a lockup occurred, the (WW) lines below appeared in the log in a test:

(II) intel(0): Fixed memory allocation layout:
(II) intel(0): 0x00000000-0x0001ffff: ring buffer (128 kB)
-- snip --
(II) intel(0): 0x08000000: end of aperture
(WW) intel(0): PRB0_HEAD (0x00000004) and PRB0_TAIL (0x00000000) indicate ring buffer not flushed
(WW) intel(0): Existing errors found in hardware state.

When the blank screen problem occurred, these (WW) lines appeared instead in a separate test:

(WW) intel(0): ESR is 0x00000010, page table error
(WW) intel(0): PGTBL_ER is 0x00000011
(WW) intel(0): Existing errors found in hardware state.

When the resume-from-suspend jumping effects happened, no (WW) lines appeared after the "Fixed memory allocation layout:" in yet another test.

In my last occurrence of a lockup, the X cursor appeared and the mouse worked for a second or two before X crashed and brought Linux with it. Before X's demise, the following appears in the log:

(EE) intel(0): underrun on pipe A!
(EE) intel(0): underrun on pipe A!
-- snip --
Error in I830WaitLpRing(), timeout for 2 seconds
pgetbl_ctl: 0x3ff60001 getbl_err: 0x00000000
ipeir: 0x00000000 iphdr: 0x54300004
LP ring tail: 0x000002c0 head: 0x000001e4 len: 0x0001f001 start 0x00000000
eir: 0x0000 esr: 0x0000 emr: 0xff7b
instdone: 0xffc1 instpm: 0x0000
memmode: 0x00000000 instps: 0x00000024
hwstam: 0xfffe ier: 0x0002 imr: 0x053c iir: 0x0080
Ring at virtual 0xaf89b000 head 0x1e4 tail 0x2c0 count 55
Ring at virtual 0xaf89b000 head 0x1e4 tail 0x2c0 count 55
-- supersnip --
Ring at virtual 0xaf89b000 head 0x1e4 tail 0x2c0 count 55
Ring end space: 130844 wanted 131064
(II) intel(0): [drm] removed 1 reserved context for kernel
(II) intel(0): [drm] unmapping 8192 bytes of SAREA 0xf8aee000 at 0xb7b0c000
(II) intel(0): [drm] Closed DRM master.

Fatal server error:
lockup

(II) Macintosh mouse button emulation: Close
(II) UnloadModule: "evdev"
(II) AT Translated Set 2 keyboard: Close
(II) UnloadModule: "evdev"
(II) Logitech Trackball: Close
(II) UnloadModule: "evdev"
(II) AIGLX: Suspending AIGLX clients for VT switch

Backtrace:
0: /usr/X11R6/bin/X(xf86SigHandler+0x79) [0x80c3009]
1: [0xb803b400]
2: /usr/lib/xorg/modules/drivers//intel_drv.so [0xb7ab2b50]
3: /usr/X11R6/bin/X [0x80d6b0a]
4: /usr/lib/xorg/modules/extensions//libglx.so [0xb7b6cbe9]
5: /usr/X11R6/bin/X(AbortDDX+0x79) [0x80a8b09]
6: /usr/X11R6/bin/X(AbortServer+0x28) [0x813c498]
7: /usr/X11R6/bin/X(FatalError+0x63) [0x813caa3]
8: /usr/lib/xorg/modules/drivers//intel_drv.so(I830WaitLpRing+0x201) [0xb7aa71d1]
9: /usr/lib/xorg/modules/drivers//intel_drv.so(I830Sync+0x1c3) [0xb7aa75e3]
10: /usr/lib/xorg/modules/drivers//intel_drv.so [0xb7acf7ea]
11: /usr/lib/xorg/modules//libexa.so(exaWaitSync+0x65) [0xb79b0045]
12: /usr/lib/xorg/modules//libexa.so(ExaDoPrepareAccess+0x7e) [0xb79b123e]
13: /usr/lib/xorg/modules//libexa.so(ExaCheckPutImage+0x103) [0xb79b8e03]
14: /usr/lib/xorg/modules//libexa.so [0xb79b2585]
15: /usr/X11R6/bin/X [0x817948d]
16: /usr/X11R6/bin/X(ProcPutImage+0x15e) [0x808951e]
17: /usr/X11R6/bin/X(Dispatch+0x34f) [0x808c89f]
18: /usr/X11R6/bin/X(main+0x47d) [0x8071d1d]
19: /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7c44685]
20: /usr/X11R6/bin/X [0x8071101]
Saw signal 11. Server aborting.
(II) AIGLX: Suspending AIGLX clients for VT switch

Complete logs are attached (see "Xorg.0.logs, dmesgs, and lsmod in working versus nonworking times"), and the DESCRIPTIONS file in the tarball explains each of the 9 logs.

openSUSE 11.1/Novell SLE11 specific
-----------------------------------
* The issue is that sax2, which runs during installation (and also on LiveCD), does not use Xserver's "-br" option yet and the problem only occurs when this option is not being set.
This is the reason why xdm/gdm/kdm, which now use the "-br" option by default, work after installation (after being forced to reboot). Still it's a driver bug.

(In reply to comment #17)
> openSUSE 11.1/Novell SLE11 specific
> -----------------------------------
> * The issue is that sax2, which runs during installation (and also on
> LiveCD), does not use Xserver's "-br" option yet and the problem only
> occurs when this option is not being set. This is the reasons why
> xdm/gdm/kdm, which now use "-br" option by default, work after
> installation (after being forced to reboot). Still it's a driver bug.

hi david,
Could you try comment 7?

Thanks
Ma Ling

ping david ~

(In reply to comment #19)
> ping david ~
>

hello, sorry for the delay but there were health issues (which aren't solved yet).

the problem still isn't solved. today i've booted all 30 computers in the lab. about 1/6 failed. of those that failed, some froze (kernel hang) and some others had gdm failing/retrying to start X.

on one of the computers where gdm keeps retrying, i've installed the intel driver from debian experimental (v2.5.1) but gdm complains the same way. i've also installed the -dbg package to get more info...

i'll try to install mandriva to see what happens... also, i'll try with a 2.6.27 debian kernel compiled by me (i've tried with an unofficial version and it seems kernel and X don't like each other).

Created attachment 22006 [details]
xserver 1.5.3 + intel drv 2.5.1
these are the logs from X (1.5.3) and from gdm with intel driver 2.5.1
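
Editorial note on the register dumps quoted earlier in this thread: the "Ring end space: 130844 wanted 131064" line and the "count 5" / "count 55" values are consistent with plain ring-buffer arithmetic on the logged head and tail offsets. The sketch below is a reconstruction from those logged numbers only, not code taken from the driver. The function names and the 8-byte guard gap are assumptions, chosen because they reproduce the logged values for a 128 kB ring.

```python
RING_SIZE = 128 * 1024  # "ring buffer (128 kB)" in the memory layout from the log
GUARD = 8               # assumed gap between tail and head, inferred from "wanted 131064"


def ring_free_space(head: int, tail: int, size: int = RING_SIZE, guard: int = GUARD) -> int:
    """Bytes the CPU could still write before catching up with the GPU's head."""
    space = head - (tail + guard)
    if space < 0:
        space += size  # wrap around the end of the ring
    return space


def pending_dwords(head: int, tail: int, size: int = RING_SIZE) -> int:
    """32-bit commands written by the CPU (tail) that the GPU (head) has not consumed."""
    return ((tail - head) % size) // 4


# HP Pavilion dump: head 0x1e4, tail 0x2c0
print(ring_free_space(0x1E4, 0x2C0))  # 130844, matches "Ring end space: 130844"
print(RING_SIZE - GUARD)              # 131064, matches "wanted 131064"
print(pending_dwords(0x1E4, 0x2C0))   # 55, matches "count 55"

# Dell GX260 dump: head 0xc, tail 0x20
print(pending_dwords(0x00C, 0x020))   # 5, matches "count 5"
```

Read this way, the timeout means the wait wanted the ring fully drained, but the head stopped advancing with 55 (or 5) commands still queued, which fits the description of a GPU hang.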
Please use the 'intel_gtt' dumper under src/reg_dumper and provide us two logs: one when AGP is disabled (the agp=off kernel param should work), and another from a normal boot with AGP enabled.

(In reply to comment #22)
> Please use 'intel_gtt' dumper under src/reg_dumper, provide us two logs; one is
> when AGP is disabled (agp=off kernel param should work), and another one is
> normal boot with AGP enabled.
>

i've used intel driver v2.6.0 sources.

# ./intel_gtt
Unsupported chipset for gtt dumper

# dmesg | grep Chipset
[ 7.410788] agpgart: Detected an Intel 830M Chipset.

hi david
Could you help us to generate two log files with Modedebug option under comment #1 and #2 case respectively, then paste them again.

Thanks
Ma Ling

Created attachment 23320 [details]
X starts successfully, mode debug on, rev 3 machine
Created attachment 23321 [details]
X fails to, mode debug on, rev 3 machine
(In reply to comment #24)
> hi david
> Could you help us to generate two log files with Modedebug option under comment
> #1 and #2 case respectively, then paste them again.
>
> Thanks
> Ma Ling
>

sorry for the long delay. i hope it's not too late ;-)

i've sent the logs you asked for and i want to add something i've found. about 85% of the machines have this:

00:02.0 VGA compatible controller: Intel Corporation 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 01)

the rest of the machines are the same but "rev 3" instead of "rev 1".

what i can see is that "rev 1" machines usually work, and that "rev 3" machines usually fail. in other words, "rev 3" machines are more faulty than "rev 1" ones.

thanks.

Adjusting severity: crashes & hangs should be marked critical.

hello,

i tested again with:

kernel 2.6.30
xorg 1.6.2 rc1
intel driver 2.7.1

and all machines boot X correctly, though it is easy to crash the "rev 3" ones when going back from console to X or when gdm starts again after an X session log out.

for the next months we plan not to use the "rev 3" machines so this bug hopefully won't appear.

if you want to close this bug, feel free to do it. if you want me to test something more, feel free to ask.

thanks.

Does it also occur with KMS enabled? switching to/from text mode has always been quite unreliable, and KMS should fix that.

Also, intel_gpu_dump output with the hung X Server would help.

Hi, David
How about the test result?
thanks.

(In reply to comment #32)
> Hi, David
> How about the test result?
> thanks.
>

about kms: kms in the debian kernel is not active by default. sadly i don't have much time to recompile my own kernel and get it working ...

about intel_gpu_dump: sorry, i haven't found it in either the debian sid/unstable precompiled package or the debian sources. i've tried intel_reg_dumper (don't know if it would help) and basically i get just two different dumps: one when in console mode and another in graphic mode.
there's no difference whether the X server is working or hung. i'll attach both outputs.

i don't have much time and in a week i'll be on holiday for a month, and i expect to have a lot of urgent work when i come back, so i probably won't have time to follow this bug. also, these failing machines are scheduled to be replaced soon. i'll ask to retain one of them to play with this bug...

thanks.

------- Comment #31 From Eric Anholt 2009-07-15 14:18:40 PST [reply] -------
Also, intel_gpu_dump output with the hung X Server would help.

------- Comment #32 From ykzhao 2009-07-31 06:18:52 PST [reply] -------
Hi, David
How about the test result?
thanks.

Created attachment 28286 [details]
output log for intel_reg_dumper in console mode
Created attachment 28287 [details]
output log for intel_reg_dumper in graphic mode
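
Editorial note: comparing register dumps by eye is error-prone, so a small script can help. The sketch below is illustrative only. It assumes the `name: 0xvalue` layout seen in the I830WaitLpRing dumps quoted earlier in this thread (the intel_reg_dumper attachments may be laid out differently), and the helper names `parse_registers` and `diff_registers` are hypothetical. The two input strings are the register lines copied from the Dell GX260 and HP Pavilion lockup reports above.

```python
import re

# Register lines copied from the two I830WaitLpRing dumps quoted in this thread.
DUMP_GX260 = (
    "eir: 0x0000 esr: 0x0000 emr: 0xff7b instdone: 0xffc1 instpm: 0x0000 "
    "memmode: 0x00000000 instps: 0x00000040 hwstam: 0xffff ier: 0x0000 "
    "imr: 0xffff iir: 0x0000"
)
DUMP_PAVILION = (
    "eir: 0x0000 esr: 0x0000 emr: 0xff7b instdone: 0xffc1 instpm: 0x0000 "
    "memmode: 0x00000000 instps: 0x00000024 hwstam: 0xfffe ier: 0x0002 "
    "imr: 0x053c iir: 0x0080"
)

# Assumed layout: a register name, a colon, then a hexadecimal value.
PAIR = re.compile(r"(\w+):\s*(0x[0-9a-fA-F]+)")


def parse_registers(text: str) -> dict:
    """Map each register name found in the text to its integer value."""
    return {name: int(value, 16) for name, value in PAIR.findall(text)}


def diff_registers(a: dict, b: dict) -> dict:
    """Return {name: (value_in_a, value_in_b)} for registers whose values differ."""
    return {name: (a[name], b[name]) for name in a if name in b and a[name] != b[name]}


changed = diff_registers(parse_registers(DUMP_GX260), parse_registers(DUMP_PAVILION))
for name, (first, second) in changed.items():
    print(f"{name}: {first:#x} -> {second:#x}")
```

For these two dumps the differing registers are instps, hwstam, ier, imr and iir; the error registers (eir, esr) are zero in both.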
Please test my three patches on bug #23082, which aim to fix the 845G problem in KMS with UXA.

This should be fixed by Eric's

commit e517a5e97080bbe52857bd0d7df9b66602d53c4d
Author: Eric Anholt <eric@anholt.net>
Date: Thu Sep 10 17:48:48 2009 -0700

agp/intel: Fix the pre-9xx chipset flush.

Ever since we enabled GEM, the pre-9xx chipsets (particularly 865) have had serious stability issues. Back in May a wbinvd was added to the DRM to work around much of the problem. Some failure remained -- easily visible by dragging a window around on an X -retro desktop, or by looking at bugzilla.

The chipset flush was on the right track -- hitting the right amount of memory, and it appears to be the only way to flush on these chipsets, but the flush page was mapped uncached. As a result, the writes trying to clear the writeback cache ended up bypassing the cache, and not flushing anything! The wbinvd would flush out other writeback data and often cause the data we wanted to get flushed, but not always. By removing the setting of the page to UC and instead just clflushing the data we write to try to flush it, we get the desired behavior with no wbinvd.

This exports clflush_cache_range(), which was laying around and happened to basically match the code I was otherwise going to copy from the DRM.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
Cc: stable@kernel.org

Please test with an upstream kernel with KMS.

Could you verify with a new linus kernel? otherwise this is just a time out warning, and I'll mark it as fixed...

(In reply to comment #38)
> Could you verify with new linus kernel? otherwise this is just time out warning,
> and I'll mark it as fixed...
>

until today i've had no time to do any test. as i foretold you, the computers have already been replaced and the few that were left cannot be used to test anything.

so let this bug rest in peace.

ok, mark this as fixed by Eric's patch in kernel.
Feel free to reopen if you get a chance to retest and the issue is still there.