Bug 23082

Summary: [845G] crash!
Product: xorg Reporter: Oswald Buddenhagen <ossi>
Component: Driver/intel Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium CC: bernardgagnon, brian, chris, moikkis
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
log file. i'm not sure whether it is actually from *this* crashing run, though - too many reboots.
none
(kernel patch 1) fix errata for sync flush enable
none
(kernel patch 2) fix batch buffer end address
none
(xorg driver patch) don't emit render state when enter VT
none

Description Oswald Buddenhagen 2009-08-01 13:57:07 UTC
Created attachment 28253 [details]
log file. i'm not sure whether it is actually from *this* crashing run, though - too many reboots.

i got this one while switching vts, but i also had spontaneous crashes and x servers stuck in D state after switching vts - no idea whether these are related.

#0  0xa77b641f in drm_intel_bo_alloc (bufmgr=0x0,
    name=0xa78094e4 "HW cursors", size=20480, alignment=524288)
    at /home/ossi/src/dl/xorg/mesa/drm/libdrm/intel/intel_bufmgr.c:51
#1  0xa77e0066 in i830_allocate_memory_bo (pScrn=0x8a7a4c8,
    name=0xa78094e4 "HW cursors", size=20480, pitch=0, align=524288,
    flags=<value optimized out>, tile_format=TILE_NONE)
    at /home/ossi/src/dl/xorg/driver/xf86-video-intel/src/i830_memory.c:730
#2  0xa77e06b9 in i830_allocate_cursor_buffers (pScrn=0x8a7a4c8)
    at /home/ossi/src/dl/xorg/driver/xf86-video-intel/src/i830_memory.c:1146
#3  0xa77e0c98 in i830_allocate_2d_memory (pScrn=0x8a7a4c8)
    at /home/ossi/src/dl/xorg/driver/xf86-video-intel/src/i830_memory.c:1276
#4  0xa77d8519 in i830_try_memory_allocation (pScrn=0x8a7a4c8)
    at /home/ossi/src/dl/xorg/driver/xf86-video-intel/src/i830_driver.c:2281
#5  0xa77d8658 in i830_memory_init (pScrn=0x8a7a4c8)
    at /home/ossi/src/dl/xorg/driver/xf86-video-intel/src/i830_driver.c:2328
#6  0xa77db87b in I830ScreenInit (scrnIndex=0, pScreen=0x9631068, argc=9,
    argv=0xafe5ee54)
    at /home/ossi/src/dl/xorg/driver/xf86-video-intel/src/i830_driver.c:2673
#7  0x0808a2fc in AddScreen (pfnInit=0xa77db4a1 <I830ScreenInit>, argc=9,
    argv=0xafe5ee54) at /home/ossi/src/dl/xorg/xserver/dix/dispatch.c:4048
#8  0x080aa981 in InitOutput (pScreenInfo=0x81a4c18, argc=9, argv=0xafe5ee54)
    at /home/ossi/src/dl/xorg/xserver/hw/xfree86/common/xf86Init.c:1027
#9  0x08066c7a in main (argc=9, argv=0xafe5ee54, envp=0x73726f)
    at /home/ossi/src/dl/xorg/xserver/dix/main.c:201
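
Frame #0 above is entered with `bufmgr=0x0`, which suggests that a NULL buffer manager reached `drm_intel_bo_alloc`. The following is a minimal sketch for spotting such arguments in a pasted gdb backtrace. The function name `null_pointer_args` and both regular expressions are illustrative assumptions, not part of gdb or the driver, and a value of `0x0` is not always a bug (some arguments are legitimately NULL).

```python
import re

# Matches the start of a gdb frame, e.g. "#0  0xa77b641f in drm_intel_bo_alloc (".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\w+) \(")
# Matches an argument whose printed value is exactly 0x0, e.g. "bufmgr=0x0".
NULL_ARG_RE = re.compile(r"(\w+)=0x0(?![0-9a-fA-F])")


def null_pointer_args(backtrace):
    """Return (frame_number, function, argument) for every argument printed as 0x0."""
    hits = []
    frame = None
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frame = (int(m.group(1)), m.group(2))
        if frame is None:
            continue
        # Source locations ("at /path/file.c:51") carry no arguments.
        if line.lstrip().startswith("at "):
            continue
        for arg in NULL_ARG_RE.findall(line):
            hits.append((frame[0], frame[1], arg))
    return hits
```

Fed the backtrace above as a string, the helper would report the `bufmgr` argument of frame 0 as the only NULL pointer.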
Comment 1 Oswald Buddenhagen 2009-08-02 05:54:10 UTC
here's a backtrace from a semi-spontaneous lockup (the server was hanging in S state indefinitely). this might well be a different problem, but it's also related to memory allocation, so i'm putting it here for now.

#0  0xa7f27424 in __kernel_vsyscall ()
#1  0xa7bbe589 in ioctl () from /lib/i686/cmov/libc.so.6
#2  0xa77e2fa4 in drm_intel_gem_bo_map_gtt (bo=0xa01a400)
    at /home/ossi/src/dl/xorg/mesa/drm/libdrm/intel/intel_bufmgr_gem.c:744
#3  0xa781945e in i830_uxa_prepare_access (pixmap=0xa039f40,
    access=UXA_ACCESS_RW)
    at /home/ossi/src/dl/xorg/driver/xf86-video-intel/src/i830_uxa.c:498
#4  0xa7826a3c in uxa_prepare_access (pDrawable=0xa368300,
    access=UXA_ACCESS_RW)
    at /home/ossi/src/dl/xorg/driver/xf86-video-intel/uxa/uxa.c:155
#5  0xa782bd93 in uxa_check_image_glyph_blt (pDrawable=0xa368300,
    pGC=0xa428e88, x=146, y=359, nglyph=1, ppci=0xafba218c, pglyphBase=0x0)
    at /home/ossi/src/dl/xorg/driver/xf86-video-intel/uxa/uxa-unaccel.c:273
#6  0x08156051 in miImageText8 (pDraw=0xa368300, pGC=0xa428e88, x=146, y=359,
    count=1, chars=0xa68b560 " esiL\n\a")
    at /home/ossi/src/dl/xorg/xserver/mi/mipolytext.c:114
#7  0x080eaff6 in damageImageText8 (pDrawable=0xa368300, pGC=0xa428e88, x=146,
    y=359, count=1, chars=0xa68b560 " esiL\n\a")
    at /home/ossi/src/dl/xorg/xserver/miext/damage/damage.c:1598
#8  0x08070a0f in doImageText (client=0xa428cc0, c=0xafba2640)
    at /home/ossi/src/dl/xorg/xserver/dix/dixfonts.c:1572
#9  0x08070b33 in ImageText (client=0xa428cc0, pDraw=0xa368300, pGC=0xa428e88,
    nChars=1, data=0xa68b560 " esiL\n\a", xorg=146, yorg=359,
    reqType=<value optimized out>, did=58720274)
    at /home/ossi/src/dl/xorg/xserver/dix/dixfonts.c:1623
#10 0x0808d693 in ProcImageText8 (client=0xa428cc0)
    at /home/ossi/src/dl/xorg/xserver/dix/dispatch.c:2358
#11 0x0808f913 in Dispatch ()
    at /home/ossi/src/dl/xorg/xserver/dix/dispatch.c:426
#12 0x08066e92 in main (argc=9, argv=0xafba27f4, envp=Cannot access memory at address 0x400c6467)
    at /home/ossi/src/dl/xorg/xserver/dix/main.c:282

Comment 2 Wang Zhenyu 2009-08-06 00:36:30 UTC
It looks like I have got KMS with UXA working fine on the 845G here. I'm attaching the patches. Oswald, please help to test and verify.
Comment 3 Wang Zhenyu 2009-08-06 00:38:43 UTC
Created attachment 28392 [details] [review]
(kernel patch 1) fix errata for sync flush enable

Kernel patches are against recent linux-2.6 git tip with anholt's drm-intel-next tree merged. They should apply just fine to 2.6.31-rc5 for testing.
Comment 4 Wang Zhenyu 2009-08-06 00:39:12 UTC
Created attachment 28393 [details] [review]
(kernel patch 2) fix batch buffer end address
Comment 5 Wang Zhenyu 2009-08-06 00:40:08 UTC
Created attachment 28394 [details] [review]
(xorg driver patch) don't emit render state when enter VT

This is against xf86-video-intel git tip.
Comment 6 Oswald Buddenhagen 2009-08-06 13:43:55 UTC
patches applied against linux 2.6.30.4 and intel master.
well ... it didn't get worse. :D
but after some random vt switching between two x servers and text consoles i got a lockup again.
i'll report missing long-term stability in case it shows. :)

i'm also getting those in the kernel log rather often:
[drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 1

Comment 7 Oswald Buddenhagen 2009-08-06 14:25:29 UTC
bleh - got the spontaneous lockup as well.

fwiw, an attempt to start the old x server after shutting down this one ended in consistent server crashes until i rebooted. i guess some state isn't restored on exit ...
Comment 8 Wang Zhenyu 2009-08-07 02:34:47 UTC
Did you test with KMS? I have only tried it with KMS, and the 845G has only one pipe, so that message should be harmless.
Comment 9 Oswald Buddenhagen 2009-08-08 14:47:46 UTC
i have no idea whether i used kms - doesn't the log tell? i used whatever the default is for this driver/kernel combo.
Comment 10 Wang Zhenyu 2009-08-09 18:20:39 UTC
If your kernel config has CONFIG_DRM_I915_KMS=y, then KMS will be on by default. Otherwise, try loading i915 with 'modeset=1'. dmesg will also tell whether KMS is in use.
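
The two checks described above can be sketched as small helpers. This is a minimal sketch under stated assumptions: the function names are hypothetical, the config text is whatever `.config` the running kernel was built from, and the `i915.modeset=` spelling is the usual kernel command-line form of a module parameter, which is an assumption about how the option might be passed rather than something stated in this report.

```python
def kms_enabled_by_config(config_text):
    """True if the kernel config text builds i915 with KMS on by default."""
    for line in config_text.splitlines():
        if line.strip() == "CONFIG_DRM_I915_KMS=y":
            return True
    return False


def kms_requested_on_cmdline(cmdline):
    """Return the i915.modeset value from a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        # The last occurrence wins if the option is given more than once.
        if token.startswith("i915.modeset="):
            value = token.split("=", 1)[1]
    return value
```

Neither helper proves that KMS is actually active at runtime; the kernel log remains the authoritative source for that.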
Comment 11 Oswald Buddenhagen 2009-08-09 23:30:03 UTC
oh, right, i read that when compiling a new kernel some weeks back ... it came with a big warning, so i thought "no" will do for now. :)
so no kms here.
Comment 12 Lenar Lõhmus 2009-08-10 05:06:33 UTC
Since I'm also seeing hangs and sometimes this line in the log:

[drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 1

and it also behaves as described in comment #7, I thought I'd point your attention to this report:

https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/385232

Maybe the dumps there can help solve the problem quicker. It's quite frustrating.

And as noted there, it happens with KMS too and is even more severe when modesetting is enabled.

Should I provide something more? Or maybe try some later git head?
Comment 13 Wang Zhenyu 2009-08-10 18:54:58 UTC
Please note that this bug is for the 845G only; problems on a different chipset should go into another bug.
And please help test a recent kernel with KMS and my patches attached here.
Comment 14 Oswald Buddenhagen 2009-08-11 00:59:27 UTC
isn't the non-kms-variant supposed to work as well? :}

anyway, so i tried modeset=1.
the good news is that a fresh boot works and hasn't crashed yet (after ~10 minutes ...).
the bad news:
- just unloading the never used i915 module at a vga console leaves me with a black screen
- starting the new server after an old one was running leaves me with a hung black screen
- attempting to switch vts from within the x server leaves, uhm, let's call it "something very arty" and a hung server
- attempting to shut down the server leaves a black screen and a hung server

at least in all cases a "killall -9 X; chvt 1; mode3" from an ssh login restored a workable console. follow-up attempts to fire up the new x server always hang and leave a black screen until i reboot.

the X log is in all cases particularly non-spectacular and doesn't tell anything beyond the non-kms logs.
the kernel log says this so far:

Aug 11 09:34:42 info [drm] Initialized drm 1.1.0 20060810
Aug 11 09:34:42 info i915 0000:00:02.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, low) -> IRQ 10
Aug 11 09:34:42 debug i915 0000:00:02.0: setting latency timer to 64
Aug 11 09:34:43 warning allocated 1280x1024 fb: 0x00fff000, bo e7541360
Aug 11 09:34:43 info fb0: inteldrmfb frame buffer device
Aug 11 09:34:43 info registered panic notifier
Aug 11 09:34:43 info [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
Aug 11 09:34:43 info [drm] DAC-5: set mode 1280x1024 d
Aug 11 09:34:43 info [drm] DAC-5: set mode 1280x1024 17
[and so on when trying to switch vts whichever way]

(the one odd thing seems to be the driver release date ;).
Comment 15 Oswald Buddenhagen 2009-08-11 01:09:31 UTC
oh, and it says this:

(WW) intel(0): Disabling Xv because no adaptors could be initialized.

well, duh - epic fail. of course i have no textured video (even less so given that i disabled DRI), but i kinda expect the hardware overlay to continue to be supported ...
Comment 16 Oswald Buddenhagen 2009-08-11 01:20:06 UTC
guess what ... when i started switching windows right after committing the last message, i got an X server lockup again (prolly hung in S state again, as it responded to kill -9). neither the X nor the kernel log contain any trace of that event.
Comment 17 Wang Zhenyu 2009-09-28 00:41:45 UTC
This should be fixed by Eric's
commit e517a5e97080bbe52857bd0d7df9b66602d53c4d
Author: Eric Anholt <eric@anholt.net>
Date:   Thu Sep 10 17:48:48 2009 -0700

    agp/intel: Fix the pre-9xx chipset flush.
    
    Ever since we enabled GEM, the pre-9xx chipsets (particularly 865) have had
    serious stability issues.  Back in May a wbinvd was added to the DRM to
    work around much of the problem.  Some failure remained -- easily visible
    by dragging a window around on an X -retro desktop, or by looking at bugzilla.
    
    The chipset flush was on the right track -- hitting the right amount of
    memory, and it appears to be the only way to flush on these chipsets, but the
    flush page was mapped uncached.  As a result, the writes trying to clear the
    writeback cache ended up bypassing the cache, and not flushing anything!  The
    wbinvd would flush out other writeback data and often cause the data we wanted
    to get flushed, but not always.  By removing the setting of the page to UC
    and instead just clflushing the data we write to try to flush it, we get the
    desired behavior with no wbinvd.
    
    This exports clflush_cache_range(), which was laying around and happened to
    basically match the code I was otherwise going to copy from the DRM.
    
    Signed-off-by: Eric Anholt <eric@anholt.net>
    Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
    Cc: stable@kernel.org

Please test with upstream kernel.
Comment 18 Oswald Buddenhagen 2009-10-03 10:36:09 UTC
i purged all previous patches from both the kernel and the driver, cherry-picked this kernel patch on top of 2.6.31.1 and tried (without kms).

as a "welcome message", i get that:

Oct  3 18:16:42 info [drm] Initialized drm 1.1.0 20060810
Oct  3 18:16:43 info i915 0000:00:02.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, low) -> IRQ 10
Oct  3 18:16:43 debug i915 0000:00:02.0: setting latency timer to 64
Oct  3 18:16:43 info [drm] fb0: inteldrmfb frame buffer device
Oct  3 18:16:43 info [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
Oct  3 18:16:43 err render error detected, EIR: 0x00000010
Oct  3 18:16:43 err [drm:i915_handle_error] *ERROR* EIR stuck: 0x00000010, masking
Oct  3 18:16:43 err render error detected, EIR: 0x00000010
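
The EIR value in these lines is a bit mask. A minimal sketch for pulling it out of a log line and listing which bits are set is shown here; the helper name `eir_bits` and the regular expression are illustrative assumptions, and the meaning of each bit is defined by the hardware register documentation, which is deliberately not decoded here.

```python
import re

# Matches both "EIR: 0x..." and "EIR stuck: 0x..." as seen in the log above.
EIR_RE = re.compile(r"EIR(?: stuck)?: (0x[0-9a-fA-F]+)")


def eir_bits(log_line):
    """Return the bit positions set in the EIR value found in log_line, or None."""
    m = EIR_RE.search(log_line)
    if m is None:
        return None
    value = int(m.group(1), 16)
    return [bit for bit in range(32) if value & (1 << bit)]
```

For the value logged above, 0x00000010, only bit 4 is set.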

still, it kinda works ...

some widgets in the kde session look "shallow". i suspect some breakage with pixmap rendering. dunno.

then it ran for a while. the old server with XAA still feels a lot snappier than that, though.

then it ground to a halt over half a second or so. reboot from ssh was possible. nothing in the logs.

switching vts yields the same graphics mess as before, but at least the server as such lives on when one switches back to its vt.

"of course", still no xv.

ah, and i get that when i'm going back to the old server+driver:
[drm:i915_initialize] *ERROR* Client tried to initialize ringbuffer in GEM mode
and dri refuses to work. i suppose that's expected.
Comment 19 Wang Zhenyu 2009-10-19 23:48:15 UTC
oh, could you test with KMS enabled?
Comment 20 Oswald Buddenhagen 2009-11-14 01:41:38 UTC
regarding that part from comment 14:
> just unloading the never used i915 module [KMS not enabled] at a vga console leaves me with a black screen
>
that's still true with the current kernel (2.6.31.6).
this effect is observed even when unloading while an xserver (an old one with disabled dri, obviously) is running. and it affects vt switching from the x server to a vga console, i.e., the UMS path. i'd call that "undue interference". :D
Comment 21 Oswald Buddenhagen 2009-11-15 02:41:46 UTC
ok, i'm stupid. i had a module config with modeset=1, so comment 20 is utter nonsense. though i must say the syslog wasn't really helpful in noticing it (it's not like i wouldn't have checked ...).

anyway, i now properly tested the driver (yesterday's master, vanilla 2.6.31.6 kernel) with kms (and fbcon). guess what? it locked up after some time. sysrq-k did *something*, but it was unable to restart a server or drop me to console. the system as such was still alive, though.
Comment 22 Oswald Buddenhagen 2010-10-02 03:21:04 UTC
status update. i'm using this bug as my general "845g doesn't work" dumping ground, so please clone out particular issues you identify here.
using xorg master on top of 2.6.35.7. dri is still disabled.

now that the gpu hangcheck plus fallback mode are in place, the server doesn't randomly lock up any more. also, it doesn't seem to actually crash.

however, in fallback mode, *something* gets mixed up - colors are messed up in qt applications: they do repaints with more or less random color sets. looks something between "interesting" and "unrecognizable".
also, xv doesn't work at all in fallback mode - i would have expected this to be fairly undemanding.

for some reason, recently the gpu started to hang a lot more often than before.

i'll try the new shadow mode and see how things work out ...
Comment 23 Oswald Buddenhagen 2010-10-02 07:54:56 UTC
the shadow mode turned out a complete failure: not only did it not prevent the gpu hang, but it also has no usable fallback, leaving me with a frozen screen. and it's really slow for some pretty basic things like dragging around a window over a kde plasma desktop with a background image - but i'm not complaining about that part. :)

i should probably note that with a hung gpu the mode switching doesn't work particularly reliably. while i was able to vt-switch once after the hang and got a useful console, both switching back to the x server vt and a direct x server shutdown wedge the console (sysrq-b still works though).

out of interest, what is the fundamental architectural difference to the userspace-only driver which makes the chipset bugs so problematic? i mean, the old driver was both reasonably fast (at least for my boring desktop usage) and rock-stable ...
Comment 24 Oswald Buddenhagen 2010-10-24 02:46:59 UTC
yay, shadow mode seems to be stable now.
so i went crazy and enabled DRI. :-D
i guess something goes wrong with tiling ... only the top-left-most ~296^2 pixel square gets rendered. the rest of the window is either black or (after moving the window) a screenshot of some part of the desktop.
Comment 25 Eugeni Dodonov 2011-09-08 15:56:05 UTC
This issue is affecting a hardware component which is not being actively worked on anymore.

Moving the assignee to the dri-devel list as contact, to give this issue a better coverage.
Comment 26 Oswald Buddenhagen 2011-09-08 23:57:03 UTC
actually, let's just close it. last time i tried, things worked fairly ok. too bad that i finally decommissioned the old board shortly afterwards ...
