Bug 16521 - [915GM/945GM] ring hang during fbo_firecube demo
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i915
Version: git
Hardware: x86 (IA32) Linux (All)
Importance: medium major
Assigned To: haihao
QA Contact: Xorg Project Team
Depends on:
Blocks: 18841
Reported: 2008-06-25 09:57 UTC by Tobias Jakobi
Modified: 2009-08-24 12:30 UTC (History)

Attachments
old xorg log saved after the x crash (and reboot of the system) (20.25 KB, text/plain)
2008-06-25 10:00 UTC, Tobias Jakobi
log after crash (37.31 KB, application/octet-stream)
2008-07-04 11:28 UTC, Dmitry Rudakov
new log from the crash (happened today) (20.24 KB, text/plain)
2008-07-12 02:23 UTC, Tobias Jakobi
font after setting EXANoComposite=true (121.20 KB, image/png)
2008-07-21 05:08 UTC, Tobias Jakobi
latest lockup log (xorg-server-1.5 + intel-2.4.2-r1) (20.35 KB, text/plain)
2008-09-15 13:31 UTC, Tobias Jakobi
Patch making glBitmap fall back to software on fbo (713 bytes, patch)
2008-12-05 12:43 UTC, Pierre Willenbrock

Description Tobias Jakobi 2008-06-25 09:57:41 UTC
X just went blank and seemed to try a restart of the server, failing (screen stays black).

After shutting down the system via ACPI keys (this still worked) I checked both the kernel log and Xorg log. 
Kernel log: nothing there

Xorg log: Error in I830WaitLpRing() with some additional lines (attaching log)

uname -a:
Linux leena 2.6.25-gentoo-r5 #1 PREEMPT Thu Jun 19 20:27:19 CEST 2008 i686 Intel(R) Celeron(R) M processor 1.50GHz GenuineIntel GNU/Linux

lspci (graphics card):
00:02.0 VGA compatible controller: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 03)

xorg-server, libdrm, kernel DRM module and mesa are GIT (snapshot from today).
intel driver version: xf86-video-intel-2.3.2

Thanks,
Tobias
Comment 1 Tobias Jakobi 2008-06-25 10:00:05 UTC
Created attachment 17377 [details]
old xorg log saved after the x crash (and reboot of the system)
Comment 2 Gordon Jin 2008-07-01 18:29:36 UTC
Can you find a way to steadily reproduce it?
Comment 3 Dmitry Rudakov 2008-07-04 11:26:25 UTC
I have the same problem; log file attached. I tried restarting the X server several times, but it exits with the same message. It only runs again after a 'reboot' command.
Comment 4 Dmitry Rudakov 2008-07-04 11:28:18 UTC
Created attachment 17534 [details]
log after crash
Comment 5 Tobias Jakobi 2008-07-07 01:50:49 UTC
(In reply to comment #2)
> Can you find a way to steadily reproduce it?
> 

Not really. Also, I haven't encountered this problem since the last time it appeared (that was when I reported it here).

I updated libdrm and DRM kernel module in the meantime though, so maybe this was fixed already.

@Dmitry: You're also using GIT master?
Comment 6 Dmitry Rudakov 2008-07-07 12:38:52 UTC
On Mon, 07/07/2008 at 01:50 -0700, bugzilla-daemon@freedesktop.org wrote:
> http://bugs.freedesktop.org/show_bug.cgi?id=16521
> 
> 
> 
> 
> 
> --- Comment #5 from Tobias Jakobi <liquid.acid@gmx.net>  2008-07-07 01:50:49 PST ---
> (In reply to comment #2)
> > Can you find a way to steadily reproduce it?
> > 
> 
> Not really. Also I didn't encounter this problem since the last time it
> appeared (that was when I reported it here).
> 
> I updated libdrm and DRM kernel module in the meantime though, so maybe this
> was fixed already.
> 
> @Dmitry: You're also using GIT master?
I'm using Debian testing, and the libdrm version on it is 2.3.0.
Comment 7 Zdenek Kabelac 2008-07-11 15:07:22 UTC
Hmm, and this looks like the same problem as in my post #16664. Is anyone aware of a recent change that could lead to this? It looks like it occurs across distributions and different recent Xorg versions.
Comment 8 Tobias Jakobi 2008-07-12 02:23:53 UTC
Created attachment 17644 [details]
new log from the crash (happened today)

Hi again,

the bug is not fixed for me. It happened again today without any kind of warning.

...attaching xorg log from after the "crash", nothing in the kernel log.

Greets,
Tobias
Comment 9 Joshua J. Berry 2008-07-15 01:36:14 UTC
I'm also seeing this bug.  X crashed with SIGSEGV and the following log message/backtrace.

I'm using Xorg 1.4.2 on Linux 2.6.25-gentoo-r6, on x86_64.
I have version 2.3.2 of the Intel video driver installed.
Graphics card lists itself as "Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)"

I was using KDE's trunk (post-4.1) at the time, and I had compositing enabled (via OpenGL).  The X server itself may have been running for a few days (although it was reset via logout/login not more than 24 hours ago), but the login session was not more than a few hours.

I wasn't doing anything fancy at the time -- I had KDE's System Settings open, and had just clicked on an icon to open one of the control panel modules.  Earlier I had been watching video in Xine (using the opengl VO plugin).

Here's the relevant snippet of log from the crash (from my kdm.log):

-------------------------------------
Error in I830WaitLpRing(), timeout for 2 seconds
pgetbl_ctl: 0x7ffc0001 getbl_err: 0x00000010
ipeir: 0x00000000 iphdr: 0x02000011
LP ring tail: 0x00001060 head: 0x0001fa14 len: 0x0001f001 start 0x00000000
eir: 0x0000 esr: 0x0010 emr: 0xffff
instdone: 0xfa41 instpm: 0x0000
memmode: 0x00000306 instps: 0x800f00c4
hwstam: 0xfffe ier: 0x0082 imr: 0x0000 iir: 0x0050
Ring at virtual 0x7fd0eff13000 head 0x1fa14 tail 0x1060 count 1427
        0001f994: 00000000
        0001f998: 02203c00
        0001f99c: 01230044
        0001f9a0: 44004444
        0001f9a4: 00000000
        0001f9a8: 00000000
        0001f9ac: 00000000
        0001f9b0: 00000000
        0001f9b4: 00000000
        0001f9b8: 00000000
        0001f9bc: 7f1c000b
        0001f9c0: 41ff0000
        0001f9c4: 40380000
        0001f9c8: 3f800000
        0001f9cc: 3f800000
        0001f9d0: 41ef0000
        0001f9d4: 40380000
        0001f9d8: 00000000
        0001f9dc: 3f800000
        0001f9e0: 41ef0000
        0001f9e4: be000000
        0001f9e8: 00000000
        0001f9ec: 00000000
        0001f9f0: 54f00006
        0001f9f4: 03cc0010
        0001f9f8: 00000000
        0001f9fc: 00030003
        0001fa00: 028fd010
        0001fa04: 00030003
        0001fa08: 00000020
        0001fa0c: 028fcf50
        0001fa10: 02000011
        0001fa14: 00000000
Ring end
space: 125356 wanted 131064

Fatal server error:
lockup


Backtrace:
0: /usr/bin/X(xf86SigHandler+0x6a) [0x495def]
1: /lib/libc.so.6 [0x7fd102b0a2c0]
2: /usr/bin/X(XkbEnableDisableControls+0x11) [0x55e46c]
3: /usr/bin/X(XkbRemoveResourceClient+0xaf) [0x55ff18]
4: /usr/bin/X [0x4498f4]
5: /usr/bin/X(CloseDownDevices+0x1f) [0x449af2]
6: /usr/bin/X(AbortServer+0x13) [0x593e9b]
7: /usr/bin/X(FatalError+0xd5) [0x59442f]
8: /usr/lib64/xorg/modules/drivers//intel_drv.so(I830WaitLpRing+0x181) [0x7fd100cbf858]
9: /usr/lib64/xorg/modules/drivers//intel_drv.so(I830Sync+0x1b3) [0x7fd100cbfc2d]
10: /usr/lib64/xorg/modules//libexa.so(exaWaitSync+0x35) [0x7fd0fff178ae]
11: /usr/lib64/xorg/modules//libexa.so(exaPrepareAccess+0x51) [0x7fd0fff18165]
12: /usr/lib64/xorg/modules//libexa.so(ExaCheckPutImage+0x3b) [0x7fd0fff1fdf7]
13: /usr/lib64/xorg/modules//libexa.so [0x7fd0fff19475]
14: /usr/bin/X [0x53e563]
15: /usr/bin/X(ProcPutImage+0x188) [0x44d284]
16: /usr/bin/X(Dispatch+0x33c) [0x4508c6]
17: /usr/bin/X(main+0x4a4) [0x4373cd]
18: /lib/libc.so.6(__libc_start_main+0xe6) [0x7fd102af6486]
19: /usr/bin/X(FontFileCompleteXLFD+0x291) [0x4366b9]

FatalError re-entered, aborting
Caught signal 11.  Server aborting
-------------------------------------

My X server now refuses to start.  (I have tried unloading and reloading the i915 kernel module, to no avail).  Subsequent attempts to start the X server fail with the following message:

-------------------------------------
Error in I830WaitLpRing(), timeout for 2 seconds
pgetbl_ctl: 0x7ffc0001 getbl_err: 0x00000010
ipeir: 0x00000000 iphdr: 0x02000011
LP ring tail: 0x0001f9f0 head: 0x0001fa14 len: 0x0001f001 start 0x00000000
eir: 0x0000 esr: 0x0010 emr: 0xffff
instdone: 0xfa41 instpm: 0x0000
memmode: 0x00000306 instps: 0x800f04c4
hwstam: 0xfffe ier: 0x0002 imr: 0x0000 iir: 0x00f0
Ring at virtual 0x7fa6b52eb000 head 0x1fa14 tail 0x1f9f0 count 32759
        0001f994: 03cc2000
        0001f998: 00280098
        0001f99c: 002c009c
        0001f9a0: 01000000
        0001f9a4: 00000000
        0001f9a8: 00000010
        0001f9ac: 02000000
        0001f9b0: 54f00006
        0001f9b4: 03cc2000
        0001f9b8: 0028009c
        0001f9bc: 002c00a0
        0001f9c0: 01000000
        0001f9c4: 00000000
        0001f9c8: 00000010
        0001f9cc: 02000000
        0001f9d0: 54f00006
        0001f9d4: 03cc2000
        0001f9d8: 002800a0
        0001f9dc: 002c00a4
        0001f9e0: 01000000
        0001f9e4: 00000000
        0001f9e8: 00000010
        0001f9ec: 02000000
        0001f9f0: 54f00006
        0001f9f4: 03cc0010
        0001f9f8: 00000000
        0001f9fc: 00030003
        0001fa00: 028fd010
        0001fa04: 00030003
        0001fa08: 00000020
        0001fa0c: 028fcf50
        0001fa10: 02000011
        0001fa14: 00000000
Ring end       
space: 28 wanted 32
               
Fatal server error:
lockup         
-------------------------------------

Repeated attempts to restart X result in the same error.

I haven't yet tried rebooting, as the machine has some other tasks to complete (compile jobs, backups, etc.).  If rebooting doesn't help, I'll leave another note.  (I should note the machine has been up--and running X, with resets every day--for 7 days now.)

Hope this helps.
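
For anyone decoding the two dumps above, here is a minimal Python sketch of how the "space" and "count" values appear to follow from the head, tail and len registers. The formulas (8 bytes of headroom, ring size taken from bits 12-20 of the length register) are assumptions inferred from the numbers in these logs, not copied from the driver source, and the function names are made up for illustration.

```python
RING_HEADROOM = 8  # assumed: bytes kept free so the tail never catches the head


def ring_size(len_reg: int) -> int:
    """Ring size in bytes, assuming bits 12-20 of the register hold (pages - 1)."""
    return (len_reg & 0x001FF000) + 4096


def ring_space(head: int, tail: int, size: int) -> int:
    """Free bytes between tail and head, minus the headroom, with wrap-around."""
    space = head - (tail + RING_HEADROOM)
    if space < 0:
        space += size
    return space


def ring_count(head: int, tail: int, size: int) -> int:
    """Number of 32-bit words queued between head and tail."""
    return ((tail - head) % size) // 4


size = ring_size(0x0001F001)  # "len: 0x0001f001" in both dumps

# First dump: head 0x1fa14, tail 0x1060
first = (ring_space(0x1FA14, 0x1060, size), ring_count(0x1FA14, 0x1060, size))

# Second dump: head 0x1fa14, tail 0x1f9f0
second = (ring_space(0x1FA14, 0x1F9F0, size), ring_count(0x1FA14, 0x1F9F0, size))

print(size, first, second)
```

Under these assumptions the sketch reproduces "space: 125356 ... count 1427" for the first dump and "space: 28 ... count 32759" for the second. In both dumps the head is stuck at 0x1fa14, which is consistent with the chip no longer consuming commands.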
Comment 10 Gordon Jin 2008-07-17 23:21:16 UTC
I tend to reassign such tough bugs to Jesse.
Comment 11 Jesse Barnes 2008-07-18 11:45:36 UTC
Gee, thanks Gordon. :p

Yeah, often you have to reboot after a chip lockup since we don't include code to do a full reset yet.

So Joshua is using KDE's compositing features. Joshua, are you using GL- or Render-based compositing? (I think there's a KDE setting for that.)

Tobias, does the crash happen after you've run 3D applications?

Often ring hangs like this are due to bad programming on the Mesa side, but they can also be due to the render acceleration code (most of the other code is simple enough not to trigger these problems), so you could try disabling render accel with Option "ExaNoComposite" "true" in the intel driver section of xorg.conf.  It would be good if we could narrow things down to one or the other; generic ring hangs are hard to debug.
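
As a rough illustration of the suggestion above, the option would sit in the Device section of xorg.conf roughly as follows. This is only a sketch: the Identifier string is a placeholder, and an existing configuration will already have its own Device section to which the Option line can be added.

```
Section "Device"
    Identifier "Intel Graphics"          # placeholder name, keep your existing one
    Driver     "intel"
    Option     "ExaNoComposite" "true"   # disable EXA Composite (render) acceleration
EndSection
```

The X server has to be restarted for the change to take effect, and the option should be removed again after testing, since it only serves to narrow down where the hang comes from.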
Comment 12 Joshua J. Berry 2008-07-18 12:02:23 UTC
(In reply to comment #11)
> So Joshua is using KDE's compositing features, Joshua are you using GL or
> Render based composition (I think there's a KDE setting for that).

I'm using OpenGL, in "Texture from Pixmap" mode.  Both "Direct rendering" and "Use VSync" are turned on.

Comment 13 Jesse Barnes 2008-07-18 12:29:12 UTC
Ok, that's a good data point.  Can you try reproducing with XRender based compositing instead?
Comment 14 Joshua J. Berry 2008-07-18 13:14:53 UTC
(In reply to comment #13)
> Ok, that's a good data point.  Can you try reproducing with XRender based
> compositing instead?

I'll try, but it happens very infrequently, so I don't know how much luck I'll have.
Comment 15 Tobias Jakobi 2008-07-18 13:48:52 UTC
First of all I'm using a standard xfce4 setup. No composite, no compiz and other fancy stuff.

(In reply to comment #11)
> Tobias, does the crash happen after you've run 3D applications?
Not really, the last time X locked up this way it was minutes after I had started the system. I was just doing regular browsing on the net, opened a virtual terminal to sync my gentoo portage tree. After typing in the command X went blank... and you know the rest :)



> 
> Often ring hangs like this are due to bad programming on the Mesa side, but
> they can also be due to the render acceleration code (most of the other code is
> simple enough not to trigger these problems), so you could try disabling render
> accel with Option "ExaNoComposite" "true" in the intel driver section of
> xorg.conf.  It would be good if we could narrow things down to one or the
> other; generic ring hangs are hard to debug.
> 

So you suggest switching render accel off and seeing whether the hang still appears?

Greets,
Tobias
Comment 16 Jesse Barnes 2008-07-18 14:03:50 UTC
Thanks for the update Tobias, yeah I'm just curious if the hang will happen with render accel disabled.  That's not a real fix of course (we want render accel to work) but it will tell us if the problem is likely there or not...
Comment 17 Tobias Jakobi 2008-07-21 04:45:30 UTC
Hi there, reporting back.

I haven't yet disabled EXA compositing in xorg.conf, but I encountered this.

As already said, I don't have anything composite-related enabled in xfce. Support is compiled in though, so I just tried turning composite on in the settings manager. It worked for a few seconds, but as soon as I opened Seamonkey and moved the window around, X locked up and restarted (the restart fails though).
So this seems to be a way to reproduce the problem.

I'm retesting this and then I'm going to disable EXA composite in xorg.conf; maybe this helps.

Greets,
Tobias
Comment 18 Tobias Jakobi 2008-07-21 05:08:47 UTC
Created attachment 17787 [details]
font after setting EXANoComposite=true

OK, disabling EXA composite isn't an option for me. It simply leaves all text garbled. It seems to be used somewhere even if not explicitly activated in the settings menu.
Comment 19 Tobias Jakobi 2008-07-24 16:23:00 UTC
Found a way to reproduce the crashing.

Just use the new fbo_firecube demo from the mesa git repository. It crashes X in under 5 seconds, at least for me...

Greets,
Tobias
Comment 20 Jesse Barnes 2008-07-24 16:38:31 UTC
Ooh good, that means I get to reassign to one of the 3D guys :)
Comment 21 Tobias Jakobi 2008-09-15 13:25:09 UTC
Reconfirming with different software setup:

xf86-video-i810-2.4.2-r1
libdrm-2.3.1
mesa-7.1
xorg-server-1.5.0
gentoo-sources-2.6.26-r1

The DRM kernel module was built from the kernel sources.

With this setup I don't have any FBO caps exported in the GL extension string, so reproducing the lockup with this setup and fbo_firecube is not possible.

I'm now attaching the new lockup xorg.log...
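
A note on the missing FBO caps mentioned above: one way to check is to look for the extension name in the GL extension string (for example as printed by glxinfo). The Python sketch below assumes the demo depends on GL_EXT_framebuffer_object; the function name and the sample string are made up for illustration.

```python
def has_gl_extension(extension_string: str, name: str) -> bool:
    """Return True if `name` appears as a whole token in the extension string.

    Handles both the space-separated form returned by glGetString and the
    comma-separated form printed by glxinfo. Matching whole tokens avoids
    false positives from names that are prefixes of longer extension names.
    """
    tokens = extension_string.replace(",", " ").split()
    return name in tokens


# Hypothetical extension string, not taken from the attached logs.
sample = "GL_ARB_multitexture, GL_EXT_framebuffer_object, GL_EXT_texture_env_add"

print(has_gl_extension(sample, "GL_EXT_framebuffer_object"))
print(has_gl_extension(sample, "GL_EXT_framebuffer"))
```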
Comment 22 Tobias Jakobi 2008-09-15 13:31:23 UTC
Created attachment 18889 [details]
latest lockup log (xorg-server-1.5 + intel-2.4.2-r1)
Comment 23 Tobias Jakobi 2008-10-12 10:46:47 UTC
Reconfirming with yet another setup (the one I currently use).

xf86-video-i810-9999 (git intel-2.5 branch)
libdrm-9999 (git master branch)
mesa-9999 (git master branch)
xorg-server-1.5.1
anholt's linux 2.6 tree (GEM enabled)

This is a GEM powered setup (and I can confirm GEM works with other 3D applications).

When starting fbo_firecube with this setup, X just stops responding and the graphics freeze. There is still some hard drive activity, but I can't seem to be able to shut down the system with the ACPI buttons or magic SysRq.

Turning it off by holding the power button is the only working method.

Notice that the behaviour is not the same as before. X doesn't crash (and try to restart) this time, it simply freezes. Everything: graphics + input.

I also can't find any good information in the system logs. Nothing there...
Comment 24 Pierre Willenbrock 2008-11-07 09:30:54 UTC
For me, fbo_firecube works with glutBitmapCharacter commented out. Further testing revealed the glBitmap call used by glutBitmapCharacter does not play nicely with fbos. It does not honour the borders of the fbo. For example, coordinates wrap at the right border to the left, and seem to be clipped at the width of the window. I don't know if the y-coordinates grow upwards, but if they do, glBitmap probably does the same in that direction, with the difference that the fbo's data storage ends there and some other object begins, and is corrupted. Either way, glBitmap drawing beyond the top of the fbo is an easy way to make my graphics hang. (additionally, the position where glBitmap renders seems to be a bit erratic when there are no other primitives drawn previously)
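
To make the missing bounds handling described above concrete, here is a small Python sketch of clipping a bitmap rectangle to the size of the render target before drawing. It is purely illustrative and is not Mesa's code; the Rect type and clip_to_target function are made-up names, and the coordinates are assumed to be relative to the FBO rather than the window.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def clip_to_target(bitmap: Rect, target_width: int, target_height: int) -> Optional[Rect]:
    """Clip a bitmap rectangle to the render target, or return None if nothing is visible."""
    x0 = max(bitmap.x, 0)
    y0 = max(bitmap.y, 0)
    x1 = min(bitmap.x + bitmap.width, target_width)
    y1 = min(bitmap.y + bitmap.height, target_height)
    if x1 <= x0 or y1 <= y0:
        return None  # entirely outside: skip drawing instead of writing past the buffer
    return Rect(x0, y0, x1 - x0, y1 - y0)


# A 16x16 glyph partly past the right/top edge of a 256x256 target is trimmed.
print(clip_to_target(Rect(250, 250, 16, 16), 256, 256))

# A glyph entirely above the target is rejected.
print(clip_to_target(Rect(10, 300, 8, 8), 256, 256))
```

The point is only that the clip must use the FBO's own dimensions; clipping against the window size, as observed above, lets writes land outside the FBO's storage.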
Comment 25 Sven Arvidsson 2008-11-15 15:47:48 UTC
I'm using a G45 and fbo_firecube is one of the many 3D apps which results in a complete X hang. I have previously reported this in bug 18081.

I'm not sure if it's the same problem or simply a symptom of something else.
Comment 26 Michael Fu 2008-11-18 00:56:06 UTC
(In reply to comment #18)
> Created an attachment (id=17787) [details]
> font after setting EXANoComposite=true
> 
> OK, disabling EXA composite isn't an option for me. It simply leaves all text
> garbled. Seems like it's used somewhere even if no explicitly activated in the
> settings menu.
> 

What if you use XAA instead of EXA? I suspect the fbo_firecube bug isn't the same issue that this bug was originally opened for...
Comment 27 Gordon Jin 2008-12-04 21:10:50 UTC
Haien/Jiewen, do you see this issue when running mesa demo fbo_firecube with the latest code?
Comment 28 liuhaien 2008-12-04 23:59:47 UTC
(In reply to comment #27)
> Haien/Jiewen, do you see this issue when running mesa demo fbo_firecube with
> the latest code?
> 
it works well with the latest code.
xf86_video_intel   xf86-video-intel-2.6-branch
       commit b156b3165e1aae5df0353737d0335ac2e653f5fd

mesa   intel-2008-q4 branch
       commit 39091cc6385e6253464900e436cd7e9c04409ce6

drm   shipped with kernel

libdrm  master branch
       commit b0d93c74d884b40bd94469a5ef75fdb2fef17680
GEM_kernel:       (for-airlied)66647dc60d16fae9f6963fd98b6d9baa1a8dac69
Comment 29 Philip Langdale 2008-12-05 12:05:36 UTC
Haien, Can you elaborate on the configuration you used to test this successfully?

I have updated my trees to match yours but cannot get any gl programs to run successfully except glxinfo.

fbo_firecube segfaults and glxgears asserts at 'vbo/vbo_save_api.c' which I believe is in intel_drm.so

This is with a 'master' xserver build from two days ago and I've tried with and without uxa - it makes no difference.
Comment 30 Pierre Willenbrock 2008-12-05 12:43:19 UTC
Created attachment 20843 [details] [review]
Patch making glBitmap fall back to software on fbo

Since there is definitely a bug in Intel's glBitmap handling code (using display coordinates on an fbo is definitely wrong), this patch makes it fall back to software if used on fbos. So, this should get that issue out of the way when debugging anything else that may be left.
Comment 31 Philip Langdale 2008-12-05 13:50:53 UTC
Ah, found the problem.

Haien, you mentioned the intel-2008-q4 branch but the commit you gave was off 'master'. I first tried with the branch and that gave me the error I mentioned so I switched back to master and it now works.

Thanks.
Comment 32 Gordon Jin 2008-12-06 03:44:24 UTC
(In reply to comment #31)
> Ah, found the problem.
> Haien, you mentioned the intel-2008-q4 branch but the commit you gave was off
> 'master'. I first tried with the branch and that gave me the error I mentioned
> so I switched back to master and it now works.

Good to know it works now, though it's odd that master works while intel-2008-q4 is said not to, as it's just a snapshot of master from about 2 days ago.
Comment 33 Gordon Jin 2008-12-06 03:46:12 UTC
it seems this bug can be closed?
Comment 34 Pierre Willenbrock 2008-12-06 06:10:13 UTC
Comment on attachment 20843 [details] [review]
Patch making glBitmap fall back to software on fbo

My patch did not even compile. Corrected version can be found in bug #18914 if anyone is still interested. Sorry for the noise.
Comment 35 Tobias Jakobi 2008-12-06 07:39:15 UTC
Still not fixed for me. Using Intel i915 with recent mesa git master, libdrm git master and drm-intel-next kernel branch.

I start some music, then start fbo_firecube. X freezes but the music continues to play.

VT switching doesn't work, ACPI buttons seem to trigger shutdown process, but it doesn't finish so the system never goes off.
MagicSysRq doesn't work at all.
Comment 36 Sven Arvidsson 2008-12-06 08:52:40 UTC
Still a problem for me on G45, using:

-- xf86-video-intel: 2e3c098c5ed9a8451713dc754a5f086992249336
-- xserver: 1.5.2
-- mesa: 6e0f8b174dddeb743b4bdc0d831eb1121f62ff50
-- drm: b0d93c74d884b40bd94469a5ef75fdb2fef17680
-- kernel: for-airlied 66647dc60d16fae9f6963fd98b6d9baa1a8dac69

Comment 37 Gordon Jin 2008-12-07 01:49:53 UTC
Haien/Jiewen, please test on more platforms to see if we can reproduce it.

All, we are using server-1.6-branch, and for-airlied kernel.
Comment 38 liuhaien 2008-12-07 18:22:35 UTC
(In reply to comment #31)
> Ah, found the problem.
> 
> Haien, you mentioned the intel-2008-q4 branch but the commit you gave was off
> 'master'. I first tried with the branch and that gave me the error I mentioned
> so I switched back to master and it now works.
> 
> Thanks.
> 

sorry, my fault.
Comment 39 Eric Anholt 2008-12-08 11:30:07 UTC
The fbo_firecube glBitmap problem is now fixed in master, but I'm reasonably certain that fbo_firecube has nothing to do with the original issue reported (unless you were running 3D applications using FBOs)
Comment 40 Tobias Jakobi 2008-12-08 12:06:38 UTC
Thanks Eric!

(In reply to comment #39)
> The fbo_firecube glBitmap problem is now fixed in master, but I'm reasonably
> certain that fbo_firecube has nothing to do with the original issue reported
> (unless you were running 3D applications using FBOs)

Well, if that's so, I think we can resolve this one as FIXED. The original issue, which also happened during normal work inside X, hasn't happened for some time now.

The fbo_firecube thing was only brought up by me because Gordon Jin was asking for some way to reproduce the hang. At least fbo_firecube did cause the same error messages in the logs back then.

So, how should I resolve it?
Comment 41 Eric Anholt 2008-12-08 16:21:53 UTC
Both the original issue and fbo_firecube failure are apparently gone now, so marking it fixed.
Comment 42 ajax at nwnk dot net 2009-08-24 12:30:29 UTC
Mass version move, cvs -> git