Bug 473

Summary: [Matrox/MGA] endless loop in MGAWaitForIdleDMA
Product: Mesa
Reporter: Kenan Esau <kenan.esau>
Component: Drivers/DRI/MGA
Assignee: mesa-dev
Status: RESOLVED WORKSFORME
QA Contact:
Severity: normal
Priority: high
CC: ajax, bugs, cmetzler, scottfk
Version: unspecified
Hardware: Other
OS: Linux (All)
Whiteboard:
Attachments:
  Xorg.0.log
  xorg.conf
  dmesg output
  Output of dmesg
  /var/log/messages with MGA_DMA_DEBUG and debug=1

Description Kenan Esau 2004-04-14 09:58:10 UTC
After starting a program which uses DRI (e.g. glxgears) the screen freezes. I can
still move the mouse cursor but nothing else happens. I can still log in to the
machine using ssh. The only way to revive the graphics card is to reboot
the whole system.
 
The Xorg.0.log is full of messages like this: 
(EE) MGA(0): [dri] Idle timed out, resetting engine... 
(EE) MGA(0): [dri] Idle timed out, resetting engine... 
(EE) MGA(0): [dri] Idle timed out, resetting engine... 
(EE) MGA(0): [dri] Idle timed out, resetting engine... 
(EE) MGA(0): [dri] Idle timed out, resetting engine... 
(EE) MGA(0): [dri] Idle timed out, resetting engine... 
(EE) MGA(0): [dri] Idle timed out, resetting engine... 
(EE) MGA(0): [dri] Idle timed out, resetting engine... 
 
 
This bug seems to have been around for quite a while -- I also saw it on XFree86 4.3.0
(Debian/unstable).

I am currently using a vanilla 2.6.5 Linux kernel, and I have tried both the
DRM kernel modules provided with the kernel and the ones provided
with the X sources -- neither made any difference.
Comment 1 Kenan Esau 2004-04-14 09:59:40 UTC
Created attachment 199 [details]
Xorg.0.log
Comment 2 Kenan Esau 2004-04-14 10:00:36 UTC
Created attachment 200 [details]
xorg.conf
Comment 3 Kenan Esau 2004-04-27 22:25:54 UTC
Is this the right place to file this bug, or should I try the DRI tree and see
if the bug is also present there?

If you need more information to work on this bug, please tell me!
Comment 4 Adam Jackson 2004-05-30 11:48:10 UTC
Do you get the same behavior with DRI from CVS?

Could you also post dmesg output?
Comment 5 Kenan Esau 2004-05-30 22:19:59 UTC
Created attachment 338 [details]
dmesg output

Here is the requested dmesg output.
Comment 6 Kenan Esau 2004-05-30 22:28:00 UTC
Yes, I get the same behaviour with everything from CVS. I installed everything
as described in the DRI wiki.

The DRM stuff from CVS didn't even build because of some silly errors:

****************************
make -C /lib/modules/2.6.7-rc1/build  SUBDIRS=`pwd` DRMSRCDIR=`pwd` modules
make[1]: Entering directory `/store/src/linux-2.6.7-rc1'
  CC [M]  /store/src/drm/linux/mga_drv.o
In file included from /store/src/drm/linux/mga_drv.c:52:
/store/src/drm/linux/drm_vm.h: In function `mga_do_vm_nopage':
/store/src/drm/linux/drm_vm.h:104: error: structure has no member named `count'
make[2]: *** [/store/src/drm/linux/mga_drv.o] Error 1
make[1]: *** [_module_/store/src/drm/linux] Error 2
make[1]: Leaving directory `/store/src/linux-2.6.7-rc1'
make: *** [modules] Error 2
****************************

After removing the access to the nonexistent member of the structure, it did build.

But there is still no change in behaviour. I guess the problem is either in the
XFree86 DRI module or in the mga kernel module (more likely the latter). And there
hasn't been a significant change in those modules for a while ...
Comment 7 Adam Jackson 2004-06-08 08:49:13 UTC
mm.  MGA development's been quiet for a while so i'm wondering if there's a
kernel change causing this.  can you test with, say, 2.6.0 or 2.4.whatever?
Comment 8 Brandon D. Valentine 2004-07-05 21:38:38 UTC
(In reply to comment #7)
> mm.  MGA development's been quiet for a while so i'm wondering if there's a
> kernel change causing this.  can you test with, say, 2.6.0 or 2.4.whatever?

Comment 9 Brandon D. Valentine 2004-07-05 22:36:47 UTC
(In reply to comment #7)
> mm.  MGA development's been quiet for a while so i'm wondering if there's a
> kernel change causing this.  can you test with, say, 2.6.0 or 2.4.whatever?

[ Apologies for the empty reply that preceded this.  Fat fingered inside
the browser while adding myself to the Cc list. ]

I wanted to confirm that this bug is also present on the following
system:

  523 dallben:/export/dallben/bandix% uname -a
  FreeBSD dallben 5.2.1-RELEASE-p8 FreeBSD 5.2.1-RELEASE-p8 #1: Mon Jun  7
  22:18:22 CDT 2004
  root@dallben.prydain.us:/usr/obj/usr/src/sys/DALLBEN  i386
  524 dallben:/export/dallben/bandix% pkg_info | grep XFree86
  XFree86-4.3.0,1     X11/XFree86 core distribution (complete, using
  mini/meta-po
  XFree86-FontServer-4.3.0_3 XFree86-4 font server
  XFree86-Server-4.3.0_14 XFree86-4 X server and related programs
  XFree86-clients-4.3.0_8 XFree86-4 client programs and related files
  XFree86-documents-4.3.0 XFree86-4 documentation
  XFree86-font100dpi-4.3.0 XFree86-4 bitmap 100 dpi fonts
  XFree86-font75dpi-4.3.0 XFree86-4 bitmap 75 dpi fonts
  XFree86-fontCyrillic-4.3.0 XFree86-4 Cyrillic fonts
  XFree86-fontDefaultBitmaps-4.3.0 XFree86-4 default bitmap fonts
  XFree86-fontEncodings-4.3.0 XFree86-4 font encoding files
  XFree86-fontScalable-4.3.0 XFree86-4 scalable fonts
  XFree86-libraries-4.3.0_7 XFree86-4 libraries and headers
  dri-4.3.0,1         OpenGL hardware acceleration drivers for XFree86
  imake-4.3.0_2       Imake and other utilities from XFree86
  wrapper-1.0_3       Wrapper for XFree86-4 server
  525 dallben:/export/dallben/bandix% glxinfo
  name of display: :0.0
  display: :0  screen: 0
  direct rendering: Yes
  server glx vendor string: SGI
  server glx version string: 1.2
  server glx extensions:
      GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_EXT_import_context
  client glx vendor string: SGI
  client glx version string: 1.2
  client glx extensions:
      GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_EXT_import_context
  GLX extensions:
      GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_EXT_import_context
  OpenGL vendor string: VA Linux Systems Inc.
  OpenGL renderer string: Mesa DRI G200 20020221 AGP 1x x86/MMX/3DNow!/SSE
  OpenGL version string: 1.2 Mesa 4.0.4

[ glxinfo OpenGL Extensions output trimmed ]

I realize I am not running the latest X, nor the Xorg X, but I have not
noticed any significant changes to MGA since the version included in
FreeBSD 5.2.1's DRM and XFree86 4.3.0's DRI.  I have not seen any
commits to the MGA drivers since the last update of this bug.  Has there
been any progress on diagnosing this issue?  I have the same "Idle timed
out" messages as the originator of this bug.  OpenGL applications are
unusable on my Matrox G200 due to this bug.  I would very much like to
resolve this issue without buying a new video card.  I am willing to
spend some time on it.  If there has been no progress, how can I help?
  
I am going to try running the glean suite on my system and will report
back whether or not it succeeds and where it fails if I can obtain that
information.
  
Thanks,
  
Brandon D. Valentine
Comment 10 Eric Anholt 2004-08-14 18:56:36 UTC
As I've said in other bugs, I'm inclined to blame AGP for many DRI hangs.  I
don't follow Linux kernel development, so I can't say much about the reporter's
issue, but Brandon, could you send me your dmesg?
Comment 11 Jona Brax 2005-10-09 05:13:00 UTC
Created attachment 3519 [details]
Output of dmesg

I can reproduce this; it happens every time I start GoogleEarth.exe under wine.

The mouse moves, but nothing else does. The only way to recover is to ssh into
the machine and reboot.

WHAT HAPPENS:
I start GoogleEarth in a terminal with the command:
WINEDLLOVERRIDES="ole32,oleaut32,rpcrt4=n" wine .wine/drive_c/Program\ Files/Google/Google\ Earth\ Plus/GoogleEarth.exe
I get the GoogleEarth splash screen, and then the display hangs. 

Terminal: 
fixme:font:ExtTextOutW called on an open path
fixme:font:ExtTextOutW called on an open path
fixme:font:ExtTextOutW called on an open path

...

Xorg.log: 
(EE) MGA(0): [dri] Idle timed out, resetting engine...
(EE) MGA(0): [dri] Idle timed out, resetting engine...
(EE) MGA(0): [dri] Idle timed out, resetting engine...
(EE) MGA(0): [dri] Idle timed out, resetting engine...

PLATFORM:
Matrox Millennium G450, Asus A7V880 motherboard
Fedora Core 4, Linux 2.6.13-1.1526_FC4 i686 athlon i386 GNU/Linux
Comment 12 Ian Romanick 2005-10-12 08:43:47 UTC
Reassigning to mesa3d-dev.
Comment 13 Ian Romanick 2005-10-12 08:45:24 UTC
*** Bug 715 has been marked as a duplicate of this bug. ***
Comment 14 Ian Romanick 2006-10-19 15:33:47 UTC
I am now also able to reproduce this bug.

The X server hangs in MGAWaitForIdleDMA.  drmCommandWrite is always returning
-EBUSY.  The DRI driver then hangs at a different location, in drmGetLock.

All of my bits (DRM, DRI, X server, and 2D driver) are current as of yesterday.
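
To illustrate the failure mode described above, here is a minimal sketch of a bounded wait-for-idle loop. It is not taken from the MGA driver source: `idle_cmd_fn`, `wait_for_idle_bounded`, `always_busy`, and `idle_on_third` are hypothetical names, and the callback merely stands in for a call such as drmCommandWrite that keeps returning -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the DRM call; NOT the real drmCommandWrite signature. */
typedef int (*idle_cmd_fn)(void *ctx);

/*
 * Poll the command while it reports -EBUSY, giving up after max_tries attempts.
 * Returns 0 on success, -EBUSY if the engine never went idle, or the first
 * other error code returned by the command.
 */
static int wait_for_idle_bounded(idle_cmd_fn cmd, void *ctx, int max_tries)
{
    int ret = -EBUSY;
    int i;

    assert(cmd != NULL && max_tries > 0);

    for (i = 0; i < max_tries && ret == -EBUSY; i++)
        ret = cmd(ctx);

    return ret;
}

/* Simulated engine that never goes idle, as reported in this bug. */
static int always_busy(void *ctx)
{
    int *calls = ctx;

    (*calls)++;
    return -EBUSY;
}

/* Simulated engine that goes idle on the third poll. */
static int idle_on_third(void *ctx)
{
    int *calls = ctx;

    return ++(*calls) >= 3 ? 0 : -EBUSY;
}
```

With `always_busy` the loop stops after `max_tries` polls and hands -EBUSY back to the caller. If the caller in turn retries for as long as it sees -EBUSY, the result is an endless loop of the kind named in this bug's summary.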
Comment 15 Ian Romanick 2006-10-20 09:40:10 UTC
Created attachment 7476 [details]
/var/log/messages with MGA_DMA_DEBUG and debug=1

I rebuilt mga.ko with MGA_DMA_DEBUG and loaded drm.ko with debug=1.  This is
the output of /var/log/messages when glxgears was run.

I examined this log and compared it to the code.  Here's the part that confuses
me.  It appears that mga_do_wait_for_idle returns success.  In that case, the
ioctl should return success and X should be happy.  Why is X acting like the
ioctl is returning -EBUSY?
Comment 16 Ian Romanick 2006-10-20 09:47:09 UTC
(In reply to comment #15)

> I examined this log and compared it to the code.  Here's the part that confuses
> me.  It appears that mga_do_wait_for_idle returns success.  In that case, the
> ioctl should return success and X should be happy.  Why is X acting like the
> ioctl is returning -EBUSY?

Ignore that bit.  I hadn't looked far enough down in the log.
Comment 17 Ian Romanick 2006-10-23 15:55:51 UTC
I've done a bunch more debugging on this problem, and I'm totally stumped.  With
glxgears, the chip processes the first 0x290 bytes (20 and a half vertices) in
the first vertex secondary buffer.  Then it locks.

I tried a different program that draws a single, flat triangle.  After
processing approximately the same amount of vertex data, the chip locks.
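
The "0x290 bytes (20 and a half vertices)" figure above implies a vertex size of 32 bytes (0x290 = 656, and 656 / 20.5 = 32). That size is inferred from the numbers in this comment, not read from the driver source. A small sketch of the arithmetic, with the hypothetical names `vertex_progress` and `split_progress`:

```c
#include <assert.h>

struct vertex_progress {
    unsigned whole;     /* complete vertices consumed */
    unsigned leftover;  /* bytes consumed of the next, partial vertex */
};

/* Split a byte count into whole vertices plus the remaining bytes. */
static struct vertex_progress split_progress(unsigned bytes, unsigned vertex_bytes)
{
    struct vertex_progress p;

    assert(vertex_bytes > 0);  /* guard the division below */

    p.whole = bytes / vertex_bytes;
    p.leftover = bytes % vertex_bytes;
    return p;
}
```

Calling `split_progress(0x290, 32)` gives 20 whole vertices and 16 leftover bytes, i.e. the chip stops halfway through the 21st vertex under the assumed 32-byte layout.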

However, if I take the same card and put it in a different system, it works
perfectly.  The non-functional system is an ASRock 939Dual-VSTA Socket 939 ULi
M1695 with an Athlon64 3000+.  The functional system is an Epox 8K3A+ with an
Athlon 2200+.  I'm leaning towards one of two possibilities.

1. ULi M1695 has AGP-related problems.

2. MGA driver has x86-64-related problems.

I have a PCI G450 and a PCI-e G550, so I'll give those a go in the ULi system. 
Stay tuned...
Comment 18 Chris Metzler 2006-10-24 07:11:09 UTC
Re: the last poster writing:

> I'm leaning towards one of two possibilities.
>
> 1. ULi M1695 has AGP-related problems.
>
> 2. MGA driver has x86-64-related problems.

I had this bug, and opened it at freedesktop.org (I'm in the CC for this bug #,
so I assume they got merged together), at XF86 (since the X.org stuff hadn't
happened yet), and one other place I can't remember (DRI?).  The system I had at
the time which generated this bug, as noted in my bug posts, was an Athlon XP
2000+ (so this wasn't just an x86-64 issue), on an Asus A7V333 mobo (VIA KT333
chipset -- so this wasn't just an issue for a single chipset either).

Unfortunately I can't provide any new/useful information about this; this bug
happened so long ago that I don't have the system on which it occurred anymore.
Comment 19 Ian Romanick 2006-10-30 15:59:51 UTC
It turns out that the problem I was seeing was actually bug #8666, which is now
fixed in xf86-video-mga-1.4.4.  I'm going to close this bug, too.  As old as
this is, the original cause may well be fixed / changed / moved / etc.  If this
problem crops up again, we'll open a new bug for it.
