Bug 25593 - Multiple video cards freeze x-server
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Lib/pciaccess
Version: 7.5 (2009.10)
Hardware: x86 (IA32) Linux (All)
Importance: highest critical
Assigned To: Xorg Project Team
 
Reported: 2009-12-11 11:21 UTC by jcook5376
Modified: 2011-09-24 22:44 UTC (History)

Attachments
lsmod, xorg.conf, lspci, uname -a, and xorg version information (10.00 KB, application/x-tar)
2009-12-11 11:21 UTC, jcook5376
Partial kernel log with vgaarb debug turned on (14.21 KB, text/x-log)
2010-01-25 06:47 UTC, Olivier Valentin

Description jcook5376 2009-12-11 11:21:08 UTC
Created attachment 31999 [details]
lsmod, xorg.conf, lspci, uname -a, and xorg version information

When using Fedora Core 12 (2.6.31.5-127.fc12.i686.PAE) with multiple video cards (so far tested with an nVidia Quadro VMS 450 and a Quadro FX 1700, using driver 190.42), the X server freezes after it comes up to the login screen and the mouse is moved.

Nothing appears in the Xorg logs, but the messages log contains a constant stream of messages stating:

vgaarb: this pci device is not a vga device

After 120 seconds, a call trace for the Xorg process is dumped to the log:

 localhost kernel: Call Trace:
 localhost kernel: [<c0450f78>] ? add_wait_queue+0x30/0x35
 localhost kernel: [<c062f6bc>] vga_get+0xef/0x113
 localhost kernel: [<c04396c8>] ? default_wake_function+0x0/0x12
 localhost kernel: [<c062f7d1>] vga_arb_write+0xa6/0x3c4
 localhost kernel: [<c04c89ee>] ? rw_verify_area+0x9d/0xc0
 localhost kernel: [<c062f72b>] ? vga_arb_write+0x0/0x3c4
 localhost kernel: [<c04c8dc8>] vfs_write+0x85/0xe4
 localhost kernel: [<c04cf4e6>] ? path_put+0x1a/0x1d
 localhost kernel: [<c04c8ec5>] sys_write+0x40/0x62
 localhost kernel: [<c0408f7b>] sysenter_do_call+0x12/0x28

I noticed the vga_arb_write call.  I know VGA Arbitration is a new feature in X server v1.7.  Possibly a conflict.

This was not an issue in Fedora 11 using X server 1.6.4 with the same graphic card and driver.

All of the displays are active at login time. X only seems to freeze once the mouse is moved. A single video card in either PCI-E slot works fine; the problem appears only when the second card is added and attempts to drive additional displays.
Comment 1 jcook5376 2009-12-15 12:22:04 UTC
Some additional information: there is a difference between using a Quadro FX 1700 card and a VMS 450. I do not receive the vgaarb errors when using the FX 1700 cards, but the freeze does not occur.

I stopped Fedora from starting GNOME or any window manager. It just starts Xorg and does an xsetroot -def. I configured my xorg.conf file for screens on each card.

All there is, is a mouse cursor that can be moved between displays.  If you move the mouse around for a bit, the machine will reboot.

No obvious errors in the logs.
Comment 2 jcook5376 2009-12-16 09:40:00 UTC
Looks like disabling the nVidia driver from communicating over the legacy VGA channel works as a workaround for this issue.

setting vga_set_legacy_decoding(dev, VGA_RSRC_NONE) at init time will allow multiple gpus to function properly. 
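
The call above is an in-kernel function used by drivers. As far as I understand the kernel's VGA arbiter documentation, user-space clients (libpciaccess, for example) can make a comparable request by writing text commands such as "target PCI:..." and "decodes none" to /dev/vga_arbiter. The following is only a sketch of that text protocol: the helper names and the PCI address in the usage note are made up for illustration, and whether the hardware really stops decoding legacy ranges still depends on the driver.

```python
import os

ARBITER_PATH = "/dev/vga_arbiter"  # device node exposed by the kernel VGA arbiter
VALID_RESOURCES = ("io+mem", "io", "mem", "none")


def target_command(domain, bus, dev, func):
    """Build the command that selects a card by PCI address (hypothetical helper)."""
    return "target PCI:%04x:%02x:%02x.%x" % (domain, bus, dev, func)


def decodes_command(resources):
    """Build the command that declares which legacy VGA resources a card decodes."""
    if resources not in VALID_RESOURCES:
        raise ValueError("unknown resource string: %r" % (resources,))
    return "decodes " + resources


def disable_legacy_decoding(domain, bus, dev, func, path=ARBITER_PATH):
    """Select a card, then tell the arbiter it decodes no legacy VGA ranges.

    Needs a kernel with the VGA arbiter and permission to open the device node.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        for cmd in (target_command(domain, bus, dev, func), decodes_command("none")):
            os.write(fd, cmd.encode("ascii"))  # one command per write
    finally:
        os.close(fd)
```

For a card at the hypothetical address 0000:01:00.0, disable_legacy_decoding(0, 1, 0, 0) would send "target PCI:0000:01:00.0" followed by "decodes none".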
Comment 3 Jeremy Rumpf 2010-01-11 10:06:10 UTC
(In reply to comment #2)
> Looks like disabling the nVidia driver from communicating over the legacy VGA
> channel works as a workaround for this issue.
> 
> setting vga_set_legacy_decoding(dev, VGA_RSRC_NONE) at init time will allow
> multiple gpus to function properly. 
> 

I have this same issue with a GeForce 9500 GT and a GeForce 8400 GS driving three monitors. If you could give me a patch or describe the code fixup, I'd like to verify that it fixes things for me as well.
Comment 4 Olivier Valentin 2010-01-25 01:48:31 UTC
I have the same problem with a dual-screen configuration on two cards. The server starts, the greeter appears on screen 0, screen 1 goes black, and everything is OK as long as the mouse pointer stays on screen 0.

Once hung, the CPU is idle and the Xorg process is in uninterruptible sleep.

xserver-xorg-core:2:1.7.4-2
kernel:2.6.32-trunk-686

I will try compiling vgaarb with debugging enabled to get the call history.
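
For anyone who wants to confirm the uninterruptible sleep mentioned above, here is a minimal sketch that reads the process state from /proc/<pid>/stat on Linux, where state "D" means uninterruptible sleep. The function names are made up for illustration, and the PID has to be looked up separately.

```python
def parse_proc_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so locate the last ')' instead of splitting on spaces.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    pid = int(stat_line[:open_idx])
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return pid, comm, state


def is_uninterruptible(stat_line):
    """True if the process is in state 'D' (uninterruptible sleep)."""
    return parse_proc_stat(stat_line)[2] == "D"


def read_state(pid):
    """Read and parse /proc/<pid>/stat for a live process (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        return parse_proc_stat(f.read())
```

Calling read_state() with the PID of the hung Xorg process should report "D" as the third field if the server is blocked inside the kernel.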

Comment 5 Olivier Valentin 2010-01-25 06:47:59 UTC
Created attachment 32805 [details]
Partial kernel log with vgaarb debug turned on

I recompiled a 2.6.32-2 (Debian) kernel with vgaarb debugging enabled.

One X server running: it hangs at 49.047368

If I understand correctly, at the end of the file, X tries to get a lock on the two cards at the same time.
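
To illustrate why a single client locking two cards at once could hang, here is a toy model. It is not the kernel's vgaarb code; it only assumes that the legacy VGA resources can be locked for one card at a time and that a card which does not decode legacy ranges takes no part in arbitration. All names are invented for the sketch.

```python
class ToyVgaArbiter:
    """Toy model: the legacy VGA lock can be held for one card at a time."""

    def __init__(self, decodes_legacy):
        # Maps card name -> True if that card decodes legacy VGA ranges.
        self.decodes_legacy = dict(decodes_legacy)
        self.owner = None  # card currently holding the legacy lock
        self.depth = 0     # nested lock count for the owner

    def trylock(self, card):
        """Return True if granted, False if the caller would have to wait."""
        if not self.decodes_legacy[card]:
            return True  # nothing to arbitrate for this card
        if self.owner in (None, card):
            self.owner = card
            self.depth += 1
            return True
        return False

    def unlock(self, card):
        """Drop one level of the lock held for this card, if any."""
        if self.decodes_legacy[card] and self.owner == card:
            self.depth -= 1
            if self.depth == 0:
                self.owner = None


def first_blocking_step(arbiter, steps):
    """Replay (op, card) steps issued by ONE client.

    Returns the index of the first "lock" that cannot be granted. Since the
    same client holds the conflicting lock, a blocking wait at that step
    would never end. Returns None if every step succeeds.
    """
    for i, (op, card) in enumerate(steps):
        if op == "lock":
            if not arbiter.trylock(card):
                return i
        else:
            arbiter.unlock(card)
    return None


nested = [("lock", "card0"), ("lock", "card1"),
          ("unlock", "card1"), ("unlock", "card0")]
both_decode = {"card0": True, "card1": True}
print(first_blocking_step(ToyVgaArbiter(both_decode), nested))  # 1
```

In this model the nested sequence blocks at its second step when both cards decode legacy VGA, and runs to completion if the locks are taken one after another or if one card is marked as not decoding legacy ranges.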
Comment 6 Jeremy Rumpf 2010-02-10 09:03:01 UTC
http://www.nvnews.net/vbulletin/showthread.php?t=142656

A thread where people describe how to disable the VGA arbiter with the NVIDIA binary driver kit.
Comment 7 Olivier Valentin 2010-02-15 00:05:05 UTC
(In reply to comment #6)

In my particular case, there is no nvidia card involved. My configuration is made of a Matrox card (mga) and an i915 (intel). So, no proprietary driver.

Also, contrary to the thread you mentioned, when my server hangs, the CPU is idle.

Either card works ok on its own even with VGAARB enabled.
But:
* if I start one server with the 2 cards, it hangs.
* if I start simultaneously two Xs with one card each, the second one hangs.

Are we sure that drivers release the arbiter from time to time when idle?
Comment 8 Jeremy Rumpf 2010-03-02 09:01:42 UTC
Agreed, I don't think this is tied to a particular driver. In the case of the link I provided, I added the code to the nvidia driver as they describe, but X still hangs. It hangs in the same fashion that you described.

X starts, but as soon as I move the mouse off the primary screen, X hangs.

My interim solution was to downgrade to X11R7.4 as it does not exhibit the problem:

X.Org X Server 1.5.1
Release Date: 23 September 2008
X Protocol Version 11, Revision 0
Build Operating System: Linux 2.6.31.9-174.fc12.i686 i686 

The server with the issue for me is:

X.Org X Server 1.7.5
Release Date: 2010-02-16
X Protocol Version 11, Revision 0
Build Operating System: x86-04 2.6.18-164.6.1.el5 

I have also tried an R7.5 Git build, with the same issue:

X.Org X Server 1.7.99.3
Release Date: (unreleased)
X Protocol Version 11, Revision 0
Build Operating System: Linux 2.6.31.9-174.fc12.i686 i686 


Comment 9 Olivier Valentin 2010-05-25 03:51:55 UTC
I tried once again this morning with Xorg server 1.7.7 and it still gives the same result: the two screens start, display what has to be displayed, but as soon as the mouse goes over the second screen -> hang

It does not seem to be related to the drivers, since the scenario is exactly the same if I use VESA as the driver for the two cards, or any other combination. It still looks like a deadlock involving vgaarb.

Olivier

Multi-screen has now been broken for more than a year...
Comment 10 Tiago Vignatti 2010-05-25 08:32:02 UTC
(In reply to comment #9)
> I tried once again this morning with Xorg server 1.7.7 and it still gives the
> same result: the two screens start, display what has to be displayed, but as
> soon as the mouse goes over the second screen -> hang

Very likely you're using a software (SW) cursor; this was fixed upstream some weeks ago:

commit 518f3b189b6c8aa28b62837d14309fd06163ccbb
Author: Pierre-Loup A. Griffais <pgriffais@nvidia.com>
Date:   Wed Apr 21 16:46:17 2010 -0700

    mi: don't thrash resources when displaying the software cursor across screens
    
    This changes the DC layer to maintain a persistent set of GCs/pixmaps/pictures
    for each pScreen instead of failing to thrash between them when changing
    screens.
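
As a rough illustration of the idea in that commit message (not the actual X server code), the toy sketch below contrasts a single resource set that is rebuilt on every screen change with a persistent set kept per screen. The class and function names are invented, and "allocations" simply counts how often resources would be created.

```python
class SharedResources:
    """One resource set, rebuilt every time the cursor changes screens."""

    def __init__(self):
        self.screen = None
        self.allocations = 0

    def get(self, screen):
        if screen != self.screen:
            self.screen = screen       # drop the old set, build a new one
            self.allocations += 1
        return ("resources", screen)


class PerScreenResources:
    """One persistent resource set per screen, built on first use only."""

    def __init__(self):
        self.by_screen = {}
        self.allocations = 0

    def get(self, screen):
        if screen not in self.by_screen:
            self.by_screen[screen] = ("resources", screen)
            self.allocations += 1
        return self.by_screen[screen]


def count_allocations(cache, path):
    """Move the cursor along a list of screen numbers; return the allocation count."""
    for screen in path:
        cache.get(screen)
    return cache.allocations


path = [0, 1, 0, 1, 0]  # cursor bouncing between two screens
print(count_allocations(SharedResources(), path))     # 5
print(count_allocations(PerScreenResources(), path))  # 2
```

With the cursor bouncing between two screens, the shared variant reallocates on every crossing while the per-screen variant allocates once per screen.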
Comment 11 Jeremy Huddleston 2011-09-24 22:44:01 UTC
No response since Tiago mentioned this is likely fixed.  Closing.