I've been having intermittent problems with xorg since the kms work was merged into the kernel: the xserver has been freezing, crashing and not responding to input. To add to that, there was nothing in the logs, a complete mystery. Then finally today I found something in Xorg.0.log.old, because I was randomly kicked out of kde straight into init 3. I've attached the log; maybe someone here will know if it's a bug, because there doesn't appear to be a way for me to get a backtrace. When it happens, kde freezes and nothing responds to input. Neither Ctrl + Alt + Backspace nor Ctrl + Alt + Delete does anything, and the xserver doesn't automatically restart after the crash. The desktop completely locks up and I can only hold down the power button to shut down and power up again.
Created attachment 29509 [details] Xorg.0.log The error is at the end of the log.
Created attachment 29510 [details] lspci -vv If any other data or info is required, please just ask and I'll try my best to add what I know.
> --- Comment #1 from Tony White <tonywhite100@googlemail.com> 2009-09-14 03:35:08 PST ---
> Created an attachment (id=29509)
> --> (http://bugs.freedesktop.org/attachment.cgi?id=29509)
> Xorg.0.log
>
> The error is at the end of the log.
>

That error looks like bug #20516.
Nope, doesn't look like it. I can log in and out just fine. Also, I'm using debian sid right now and this exact same issue has also occurred on two other linux installs from two other Linux vendors, but this is the first time I've been able to find a record of anything going wrong in the logs.
the log says:
(EE) intel(0): Failed to initialize kernel memory manager
(==) intel(0): VideoRam: 131072 KB
(II) intel(0): Attempting memory allocation with tiled buffers.
(WW) intel(0): xf86AllocateGARTMemory: allocation of 1536 pages failed (Cannot allocate memory)
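For anyone who wants to pull just the error and warning lines out of a long Xorg.0.log, something like the following might help. It is only a sketch: the function name `xorg_problems` and the optional timestamp handling are my own assumptions, and the sample text is the excerpt quoted above.

```python
import re

# Xorg log lines carry a severity marker such as (EE), (WW), (II) or (==).
# Some logs also prefix each line with a [timestamp]; that part is optional here.
MARKER = re.compile(r"^\s*(?:\[\s*[\d.]+\]\s*)?\((EE|WW)\)\s*(.*)$")

def xorg_problems(log_text):
    """Return (severity, message) pairs for every (EE) and (WW) line."""
    found = []
    for line in log_text.splitlines():
        match = MARKER.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found

sample = """\
(EE) intel(0): Failed to initialize kernel memory manager
(==) intel(0): VideoRam: 131072 KB
(II) intel(0): Attempting memory allocation with tiled buffers.
(WW) intel(0): xf86AllocateGARTMemory: allocation of 1536 pages failed (Cannot allocate memory)
"""

for severity, message in xorg_problems(sample):
    print(severity, message)
```

Informational lines, (II) and (==), are dropped on purpose so that only the lines worth reporting remain.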
Could you paste dmesg? Eric has done a fix for 8xx which is on his drm-intel-next branch, could you test it? (passing i915.powersave=0 if that's not relevant to you).

git clone git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel.git
I pulled the latest snapshot from here: http://git.kernel.org/?p=linux/kernel/git/anholt/drm-intel.git;a=summary
The last commit was: 4 days ago, Chris Wilson, "drm/i915: Only destroy a constructed mmap offset", drm-intel-next.
It's building now and I'll test it once it's built. I'll post back in about a week (hopefully) to say whether or not it's fixed here. I don't use i915.powersave=0 or anything like that on this machine when booting Linux. If this fix works, is there any way to know if/when it will get merged into mainline? For example, the next bug fix release for the 2.6.31 kernel, or 2.6.32? I'm guessing its intel-next tag means 2.6.32, possibly? Or is that a silly question?
Created attachment 29596 [details] dmesg
I don't know how, but that snapshot made things a lot worse. There were a few small horizontal black lines on a qt 4 application (Opera) that were not supposed to be there (I'd call them artifacts, about 2 cm long). It took this kernel a grand total of two minutes and thirty seconds before it crashed as described. Complete lock up and nothing in the logs. I've attached the Xorg.0.log but it has logged no error. Can't the intel xorg driver just be regressed right back to the 2.6.27.x version? This is a major problem here and has been since that kernel, getting worse with every new version released. I know it seems like I'm moaning, but I simply cannot use Linux like this. 10 year old Windows XP is actually more reliable right now. Please fix this bug. I'm willing to test any patches you guys can post, but I can only put up with this for another three months. If you can't fix it, I will just buy/build another (non-Intel) machine to solve it. Thanks for the patch suggestion, but it very much did not work.
Created attachment 29607 [details] xorg crashed but logged no errors
From your kernel dmesg (which does not seem to be from the drm-intel-next kernel), you don't load the i915 module at all. And you have vesafb loaded; don't do that. The X log also shows that you're using UMS instead of KMS. Please check your kernel config and make sure you have enabled the following options:
CONFIG_AGP=y
CONFIG_AGP_INTEL=y
CONFIG_DRM=y
CONFIG_DRM_I915=y
CONFIG_DRM_I915_KMS=y
CONFIG_FRAMEBUFFER_CONSOLE=y
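A quick way to check a kernel .config against that list could look like the sketch below. The function name `missing_options` and the sample config text are made up for illustration; on a real system you would read the text from wherever your distribution keeps the config (the location varies, so it is not hard-coded here).

```python
# Options named in the comment above; each one is expected to be built in (=y).
REQUIRED = [
    "CONFIG_AGP",
    "CONFIG_AGP_INTEL",
    "CONFIG_DRM",
    "CONFIG_DRM_I915",
    "CONFIG_DRM_I915_KMS",
    "CONFIG_FRAMEBUFFER_CONSOLE",
]

def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in config_text.

    Options built as modules (=m) or left unset are both reported,
    because the list above asks for =y.
    """
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skips comments such as "# CONFIG_FOO is not set"
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]

# Hypothetical config fragment, for illustration only.
sample_config = """\
CONFIG_AGP=y
CONFIG_AGP_INTEL=y
CONFIG_DRM=y
CONFIG_DRM_I915=m
# CONFIG_DRM_I915_KMS is not set
CONFIG_FRAMEBUFFER_CONSOLE=y
"""

print(missing_options(sample_config))
```

An empty list means every option from the list is built in.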
Firstly, are you saying that I should use intelfb on the kernel command line, and that the kernel (or udev) is in fact incorrectly loading the fallback vesa framebuffer device driver instead of intelfb? Secondly, debian isn't configured to use kms to my knowledge, and I don't expect it will be for at least a year. Doesn't there need to be something that sends a signal to the kernel to stop kms when xorg is run from userspace if CONFIG_DRM_I915_KMS=y is set, because that means kernel mode setting is turned on by default on the intelfb? kms has never worked on this machine when the intelfb driver is set to kms by default. It gets to x and then x won't load. Nevertheless, I'll try it again. Regarding "seems not drm-intel-next kernel": sorry, yes, that is the wrong dmesg. I will post the right one.
> --- Comment #12 from Tony White <tonywhite100@googlemail.com> 2009-09-17 01:31:46 PST ---
> Firstly, are you saying that I should use intelfb on the kernel command line
> and in fact the kernel (Or udev) Is incorrectly loading the fallback vesa
> framebuffer device driver instead of intelfb?

No, using intelfb leads to lots of pain.

> Secondly, debian isn't configured to my knowledge to use kms and I don't expect
> it will for at least a year. Doesn't there need to be something that sends a
> signal to the kernel to stop kms when xorg is run from userspace if
> CONFIG_DRM_I915_KMS=y is set because that means kernel mode settings turned on
> by default on the intelfb?

Forget about intelfb. For kms, adding 'options i915 modeset=1' to /etc/modprobe.d/kms.conf (and regenerating your initramfs) should do the trick. Or simply adding i915.modeset=1 to the kernel command line should work too.
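To confirm that the i915.modeset=1 option actually reached the kernel command line, a small check along these lines may be useful. This is a sketch under assumptions: the function name is invented, the sample command line is hypothetical, and keeping the last occurrence when the option is repeated is a choice made here rather than documented behaviour. On a running system the command line can be read from /proc/cmdline.

```python
def i915_modeset_from_cmdline(cmdline):
    """Return the value given for i915.modeset on a kernel command line.

    Returns None when the option is absent. If it appears more than once,
    this sketch keeps the last occurrence.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("i915.modeset="):
            value = token.split("=", 1)[1]
    return value

# Hypothetical command line, for illustration only.
example = "root=/dev/sda1 ro quiet i915.modeset=1"
print(i915_modeset_from_cmdline(example))

# On a live system one could instead do:
#   with open("/proc/cmdline") as handle:
#       print(i915_modeset_from_cmdline(handle.read()))
```

This only covers the command-line route; a setting made through /etc/modprobe.d/kms.conf will not show up here.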
OK. So I did as asked, pulled the snapshot again, used:
CONFIG_AGP=y
CONFIG_AGP_INTEL=y
CONFIG_DRM=y
CONFIG_DRM_I915=y
CONFIG_DRM_I915_KMS=y
CONFIG_FRAMEBUFFER_CONSOLE=y
booted, and the result is the same. The xserver won't stay up without crashing and freezing for any longer than two minutes. I've booted into it about eight times now. It freezes every go.

In the logs:

/var/log/messages:
Sep 17 13:35:05 pentium-m kernel: i915 0000:00:02.0: VGA-1: EDID invalid.
Sep 17 13:35:05 pentium-m kernel:
Sep 17 13:35:05 pentium-m kernel: i915 0000:00:02.0: VGA-1: EDID invalid.
Sep 17 13:35:05 pentium-m kernel: [drm] DAC-6: set mode 1024x768 1c
Sep 17 13:35:23 pentium-m kernel:
Sep 17 13:35:23 pentium-m kernel: i915 0000:00:02.0: VGA-1: EDID invalid.
Sep 17 13:35:23 pentium-m kernel:
Sep 17 13:35:23 pentium-m kernel: i915 0000:00:02.0: VGA-1: EDID invalid.

I guess that means wrong display resolution; however, that's wrong, and it says that the resolution is supported in .xsession-errors.

/var/log/syslog:
Sep 17 14:14:29 pentium-m kernel: [drm:edid_is_valid] *ERROR* Raw EDID:
Sep 17 14:14:29 pentium-m kernel: <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Sep 17 14:14:29 pentium-m kernel: <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Sep 17 14:14:29 pentium-m kernel: <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Sep 17 14:14:29 pentium-m kernel: <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Sep 17 14:14:29 pentium-m kernel: <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Sep 17 14:14:29 pentium-m kernel: <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Sep 17 14:14:29 pentium-m kernel: <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Sep 17 14:14:29 pentium-m kernel: <3>00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Sep 17 14:14:29 pentium-m kernel:
Sep 17 14:14:29 pentium-m kernel: i915 0000:00:02.0: VGA-1: EDID invalid.
Sep 17 14:14:29 pentium-m kernel: fb: conflicting fb hw usage inteldrmfb vs VESA VGA - removing generic driver
Sep 17 14:14:29 pentium-m kernel: Console: switching to colour dummy device 80x25
Sep 17 14:14:29 pentium-m kernel: fbcon: inteldrmfb (fb0) is primary device
Sep 17 14:14:29 pentium-m kernel: render error detected, EIR: 0x00000010
Sep 17 14:14:29 pentium-m kernel: [drm:i915_handle_error] *ERROR* EIR stuck: 0x00000010, masking
Sep 17 14:14:29 pentium-m kernel: render error detected, EIR: 0x00000010

~/.xsession-errors:
X Error: XSyncBadAlarm 152
  Extension: 143 (Uknown extension)
  Minor opcode: 11 (Unknown request)
  Resource id: 0x0
kdeinit4: preparing to launch /usr/lib/libkdeinit4_kcminit_startup.so
X Error: XSyncBadAlarm 152
  Extension: 143 (Uknown extension)
  Minor opcode: 11 (Unknown request)
  Resource id: 0x0

So further instructions please, it's definitely worse. If this gets merged as it is, it doesn't look like I'll be using Linux on this machine. I've attached the logs in case they are of any use.
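Since the same kernel messages repeat many times, it can be easier to report them as counts. The following is a rough sketch: the function name and the prefix pattern are assumptions based only on the syslog lines quoted above, and other syslog formats may need a different pattern.

```python
import re
from collections import Counter

# Matches a prefix such as "Sep 17 13:35:05 pentium-m kernel: ".
PREFIX = re.compile(r"^\w{3}\s+\d+\s+[\d:]{8}\s+\S+\s+kernel:\s*")

def count_kernel_messages(lines):
    """Count each distinct kernel message, ignoring timestamp and hostname."""
    counts = Counter()
    for line in lines:
        message = PREFIX.sub("", line).strip()
        if message:  # lines with nothing after "kernel:" are skipped
            counts[message] += 1
    return counts

sample_lines = [
    "Sep 17 13:35:05 pentium-m kernel: i915 0000:00:02.0: VGA-1: EDID invalid.",
    "Sep 17 13:35:05 pentium-m kernel:",
    "Sep 17 13:35:05 pentium-m kernel: i915 0000:00:02.0: VGA-1: EDID invalid.",
    "Sep 17 13:35:05 pentium-m kernel: [drm] DAC-6: set mode 1024x768 1c",
]

for message, count in count_kernel_messages(sample_lines).most_common():
    print(count, message)
```

The most frequent messages come first, which makes it easy to see whether the EDID or the EIR errors dominate a given boot.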
Created attachment 29642 [details] intel-drm kernel dmesg
Created attachment 29643 [details] intel-drm kernel messages
Created attachment 29644 [details] intel-drm syslog
Created attachment 29645 [details] xsession-errors
Created attachment 29646 [details] intel-drm Xorg.0.log
I also see this as the first message when booting the intel drm kernel:
[drm : edid_is_valid] *ERROR* Raw EDID
[drm : i915_handle_error] *ERROR* EIR stuck : 0x00000010, Masking
render error detected, EIR 0x00000010
I'm just posting again to say how angry I am about this bug and that I bought intel hardware because I require reliability. I've used Linux for ten years and this is the worst problem I have ever seen. It is getting worse and worse and worse with every update you guys push. I've just received an xserver pre release copy along with what I guess is the intel driver pre release, and it is even worse than when I first posted this report. The xserver used to crash and freeze randomly once or twice a week, but now it actually does it every two hours! I need to know if you guys know what the problem is and whether you guys are committed to solving it, because I cannot work like this. Three months of this random freezing here, with others also reporting it happening with this driver, and still no fix. What is going on???
(In reply to comment #21)
> I'm just posting again to say how angry I am about this bug and that I bought
> intel hardware because I require reliability.

Hi Tony,

I know it's frustrating to encounter bugs like this, and I'm sorry you haven't seen any improvements yet.

> I need to know if you guys know what the problem is and whether you guys are
> committed to solving it because I cannot work like this.
> Three months of this random freezing here and others also reporting it
> happening with this driver and still no fix.
> What is going on???

We've definitely seen that lots of people with 855 and 865 hardware were having lots of pain. And yes, we've been working hard to fix these issues.

We very recently made a couple of important breakthroughs that fix things for many users. The first is a commit by Eric Anholt to the kernel:

commit e517a5e97080bbe52857bd0d7df9b66602d53c4d
Author: Eric Anholt <eric@anholt.net>
Date: Thu Sep 10 17:48:48 2009 -0700

    agp/intel: Fix the pre-9xx chipset flush.

    Ever since we enabled GEM, the pre-9xx chipsets (particularly 865) have had
    serious stability issues. Back in May a wbinvd was added to the DRM to
    work around much of the problem. Some failure remained -- easily visible
    by dragging a window around on an X -retro desktop, or by looking at bugzilla.

    The chipset flush was on the right track -- hitting the right amount of
    memory, and it appears to be the only way to flush on these chipsets, but the
    flush page was mapped uncached. As a result, the writes trying to clear the
    writeback cache ended up bypassing the cache, and not flushing anything! The
    wbinvd would flush out other writeback data and often cause the data we want
    to get flushed, but not always. By removing the setting of the page to UC
    and instead just clflushing the data we write to try to flush it, we get the
    desired behavior with no wbinvd.

    This exports clflush_cache_range(), which was laying around and happened to
    basically match the code I was otherwise going to copy from the DRM.

    Signed-off-by: Eric Anholt <eric@anholt.net>
    Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
    Cc: stable@kernel.org

If you can verify that your kernel includes that (or update if it doesn't), and report back whether that helps, that would be very useful.

We also recently fixed some issues with the xf86-video-intel driver in the 2.8.99.902 release (which is the 2nd release candidate for 2.9.0). That release includes this fix:

commit 2cc1f3cb6034dddd65b3781b0cde7dff4ac1e803
Author: Keith Packard <keithp@keithp.com>
Date: Sat Sep 19 17:30:57 2009 -0700

    i8xx: Format projective texture coordinates correctly.

    Projective texture coordinates must be delivered as TEXCOORDFMT_3D using
    TEXCOORDTYPE_HOMOGENOUS. This meant selecting the correct type in
    i830_texture_setup, the correct format in i830_emit_composite_state and
    sending only 3 coordinates in i830_emit_composite_primitive.

    Signed-off-by: Keith Packard <keithp@keithp.com>
    [ickle: tweaked to fix up a couple of use-before-initialised]
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Please let us know if these commits don't fix your issues (and please remove the NEEDINFO keyword when you reply).

Thanks,

-Carl
Thanks Carl, I have realised what's given me a bee in my bonnet. I reported "[855GM KMS] allocate memory fail" as the problem, and with the latest stuff that allocation failure is fixed, but the problem I thought it was causing (the freezing) is worse with the new stuff, if that makes any sense. So it was just confusion on my part: I hadn't realised that the real problem is not "[855GM KMS] allocate memory fail", which is now fixed, looking at the logs.

As far as where I'm at with this: I'm using the VESA driver with xorg for the time being, until I can verify the problem is gone, so at least my worst fear (no Linux for me) is unfounded. The VESA driver actually works very well and I've not had one single crash or freeze whilst using it. However, I do want to use the intel driver instead, for dual display. So I need to debug the xserver over ssh to catch the freeze and get the crash data to the developers. I'm just waiting for a part for a second machine I have here and then I can debug the crash over ssh. If I succeed, I'll create a new report.
Please open a new report for your new problem.