Bug 20816 - Make multi-card xorg work again
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/General
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Tiago Vignatti
QA Contact: Xorg Project Team
Depends on: 18160 18321 20817 20849
Reported: 2009-03-23 19:45 UTC by Tim Nelson
Modified: 2010-08-25 07:55 UTC (History)

Attachments
Xorg.0.log (50.29 KB, text/x-log)
2009-05-06 16:21 UTC, Anibal Avelar
xorg.conf (3.11 KB, application/octet-stream)
2009-05-06 16:22 UTC, Anibal Avelar
lspci -v output (7.06 KB, application/octet-stream)
2009-05-06 16:25 UTC, Anibal Avelar
Denny's Xorg.0.log (19.92 KB, text/x-log)
2009-12-18 13:33 UTC, Denny de la Haye
Denny's xorg.conf (2.40 KB, application/octet-stream)
2009-12-18 13:33 UTC, Denny de la Haye
Xorg.0.log with 2 PCI cards using the nvidia driver (10.19 KB, application/octet-stream)
2010-08-25 07:55 UTC, Thomas Spear

Description Tim Nelson 2009-03-23 19:45:48 UTC
This is supposed to be a master bug for multi-card xorg problems.  The basic problem at the moment is that, in the general case, xorg doesn't work with multiple video cards, and hasn't since the libpciaccess revisions.  There are cases where it works (if I recall correctly, sometimes a driver can handle multiple cards of the same brand), but in general, they don't seem to.  Until now, relevant discussions have been at bug #18160, but the problem mentioned in that bug seems to be fixed in the main trunk now (if I understood recent mailing list messages correctly).  If you find a specific problem, please open a sub-bug, so that we can address that specifically.
Comment 1 Tim Nelson 2009-03-23 20:15:37 UTC
In bug #18160, comment 44, Tiago Vignatti argues that the POSTing of secondary
video cards should be sorted out before writing a VGA Arbiter.  My question is,
if the int10 problem is solved (which was done in the xorg server by enabling
the card before the BIOS is read) then is there anything else that needs to be
done to initialise the secondary video cards before worrying about the VGA
arbiter?  
Comment 2 Bill Crawford 2009-03-24 05:03:34 UTC
It's looking as though POSTing still isn't working correctly on some systems (notably mine :o)).

With the older X server (Xorg 1.3 on Fedora 8), it works. Each card comes up in turn, and it looks as though the other two cards are briefly disabled as each POSTs (they turn black). With the latest X server in Fedora Rawhide (which is labelled as "1.6.0"), that doesn't happen, and I see random characters appear on the primary screen when the second is POSTing. When the display appears, the second screen is completely garbled, looking like an "encrypted" satellite or cable channel. I'll try to take a photo some time, but my phone doesn't have a very good camera.

What is the older X server doing differently? I've had a quick look at the sources for the int10 module and BIOS-reading, but it's not immediately obvious whether anything is wrong there; I'll look at the ATI driver source when I have a little more time.
Comment 3 Pedro Eugênio Rocha 2009-03-24 07:07:49 UTC
Hi!

I compared the instructions that x86emu executes in both Xorg versions, 1.4.2 and 1.6.99. In the newer Xorg all the instructions return 0, unlike in the older Xorg. It seems that x86emu is reading or writing the wrong memory location. As all the operations return 0, the loop in the function 'X86EMU_exec' in 'hw/xfree86/x86emu/decode.c' never exits, hanging the system. I compiled both Xorgs using the x86emu int10 backend. On my system, the BIOS is being read correctly. Sorry if I said something that you already know. I'm using a SiS and an ATI video card.
Comment 4 Tim Nelson 2009-03-24 15:53:24 UTC
Looks like I did the right thing by creating these bugzilla issues.  I'm hoping that we can have most of the discussion about specific problems in the bugs that block this one.  I've created bug #20849 for discussion of video cards still not POSTing properly; I'd recommend continuing these discussions there.  
Comment 5 Bill Crawford 2009-03-25 07:27:15 UTC
@Pedro: how are you tracing the emulated instructions? I can see DEBUG_IO_TRACE() etc but how are you setting the flags?
Comment 6 Pedro Eugênio Rocha 2009-03-25 08:24:36 UTC
I'm hacking the function 'X86EMU_exec' and printing the value of the variable 'op1' on each iteration of the loop. In the old Xorg, x86emu returns the right values, but in the new one it always returns 0 (at least in my case).
Comment 7 Anibal Avelar 2009-05-06 16:21:17 UTC
Created attachment 25574
Xorg.0.log

Added the latest Xorg.0.log, captured after libpciaccess was fixed.
Comment 8 Anibal Avelar 2009-05-06 16:22:52 UTC
Created attachment 25575
xorg.conf

My xorg.conf configuration.

I always use only two monitors (the two nVidia). I don't use the Intel card.
Comment 9 Anibal Avelar 2009-05-06 16:25:00 UTC
Created attachment 25576
lspci -v output

The output of the lspci -v command, to show the card models affected by the bug.

I was happy until this broke :( Now I can't use two monitors with Xinerama.
Comment 10 Anibal Avelar 2009-05-06 16:29:00 UTC

I added my files to show that my problem remains.

The kernel is 2.6.27, coming from a Jaunty upgrade (I had the same problem on Intrepid 2.6.24). Until Hardy I lived happy.
Comment 11 Denny de la Haye 2009-05-30 03:00:45 UTC
Also seeing this problem after an upgrade from Ubuntu Hardy to Jaunty.


Here is the xorg.conf for my three screen set-up, and the diffs to get two/one screen set-ups for testing (I just remove the unwanted screens from the server layout section - everything else stays the same):

http://pastebin.com/m6304c8a


Here is the Xorg.0.log for each of the set-ups:

http://pastebin.com/m7af1b62c (2 screens, PCIe card, works)
http://pastebin.com/m177b81ab (1 screen, PCI card, works)
http://pastebin.com/m721a574c (3 screens, both cards - broken)


I think I put a month time-out on those pastebins.  Let me know if you want them uploaded as attachments here, but I'm assuming they don't tell you anything you didn't already know - multi-card has been disabled on purpose from what I can gather.  Bit disappointing from my point of view - what did the project gain by taking out this feature before the replacement was written?
Comment 12 Tim Nelson 2009-05-30 03:36:16 UTC
My understanding is that the old system was that every driver did its own PCI access stuff, and this was causing some kind of problem.  So they wrote libpciaccess which would do all that for them, but no-one ever tested it with multi-card.  Some time afterwards, they realised the problem, and started working on a fix, but it was a bit late to roll back by then.  

Note that I don't know anything; this is all inferences and guesswork based on what I've seen others say.  
Comment 13 freedesktop 2009-06-06 01:08:39 UTC
I can confirm this situation as well. I have Ubuntu 8.04 and Ubuntu 9.04 installed on my multiseat system. I have two separate graphics cards (AGP and PCI):

00:08.0 VGA compatible controller: nVidia Corporation NV11DDR [GeForce2 MX200] (rev b2)
01:00.0 VGA compatible controller: nVidia Corporation NV11DDR [GeForce2 MX200] (rev b2)

Ubuntu 8.04 works like a charm. Ubuntu 9.04 doesn't work, and how it breaks is highly dependent on my BIOS settings. If I set the AGP card as primary in the BIOS, then the PCI seat comes up just fine and allows login, but the AGP seat's screen is corrupted. Both mice seem to be independent. If the PCI card is primary, then both seats come up, but the mice and keyboards are synchronized (moving the mouse and typing affect both screens), or possibly both displays show the same screen.

If there is anything I can test or supply information for please let me know.
Comment 14 Oli Wade 2009-06-06 01:51:56 UTC
Multiseat (ie: 2 xservers) on 9.04 works fine for me with two identical PCIe nVidia cards:
====
04:00.0 VGA compatible controller: nVidia Corporation NV43 [GeForce 6600 GT] (rev a2)
05:00.0 VGA compatible controller: nVidia Corporation NV43 [GeForce 6600 GT] (rev a2)
====

The trick to separate the keyboard and mouse input is to add the following to xorg.conf:
====
Section "ServerFlags"
        Option  "AutoAddDevices" "false"
        Option  "AutoEnableDevices" "false"
EndSection
====
Comment 15 Ancoron 2009-06-06 03:25:31 UTC
(In reply to comment #14)
> Multiseat (ie: 2 xservers) on 9.04 works fine for me with two identical PCIe
> nVidia cards:

Well, that should always be the case (though sometimes even that is not guaranteed), as two or more completely identical cards can use exactly the same BIOS, and the only difference in handling those devices is the different PCI bus ID.

Otherwise SLI or CrossFire wouldn't work for anyone with a recent xserver.

So one "may" work around this bug by using multiple identical graphics cards instead of different ones. I tried it myself with two ATI Radeon X1950 XTX cards with the exact same specs, but as they are from different vendors (I guess that's the reason), even that attempt failed for me.
Comment 16 freedesktop 2009-06-06 14:38:49 UTC
> The trick to separate the keyboard and mouse input is to add the following to
> xorg.conf:
> ====
> Section "ServerFlags"
>         Option  "AutoAddDevices" "false"
>         Option  "AutoEnableDevices" "false"
> EndSection
> ====

I already had Option "AutoAddDevices" "false". Now I added Option "AutoEnableDevices" "false" and it actually made things worse.

Now the primary display is always corrupted regardless of what I select in the BIOS as the primary card (AGP/PCI). As for input devices, the behavior is the same: both screens react to the same keyboard.

I also cannot get the mouse to work reliably because the behavior has changed between 8.04 and 9.04.

In 8.04 I had keyboard 1 at
/dev/input/by-path/pci-0000:00:10.4-usb-0:4.1:1.0-event-kbd
and mouse 1 at
/dev/input/by-path/pci-0000:00:10.4-usb-0:4.1:1.1-event-
and keyboard 2 at 
/dev/input/by-path/pci-0000:00:10.4-usb-0:4.2:1.0-event-kbd
and mouse 2 at
/dev/input/by-path/pci-0000:00:10.4-usb-0:4.2:1.1-event-

In 9.04 the keyboard paths stayed the same, but the mouse paths do not exist anymore. I have tried to configure them using /dev/input/eventXXX, but in the multiseat config it doesn't work.

Comment 17 Tim Nelson 2009-06-08 03:29:09 UTC
Please put stuff about keyboards and mice in a separate bug; this one is about multiple graphics cards.  
Comment 18 freedesktop 2009-06-11 00:01:18 UTC
I got multiseat on Ubuntu 9.04 fully working. The issue with the primary card not being correctly initialized so that the login screen wasn't displayed (was corrupted) was resolved by disabling the graphical boot. 

Once I disabled graphical boot, the primary card is switched from console mode to graphical mode for the first time when X is started and the login screen is displayed. The secondary card is likewise initialized for the first time when X is started and the login screen is displayed. I guess this makes the initialization deterministic, and that is why it works.
Comment 19 Tim Nelson 2009-06-11 00:19:11 UTC
I'm under the impression that it works for some combinations of cards and not others.  I'm not using graphical boot at the moment at all, and it still didn't work for me the last time I tried it.  
Comment 20 combyrm 2009-11-17 22:07:51 UTC
In all honesty, I've been coming back to this problem over and over again for over a year at this point, and in googling and IRC'ing I've run into a LOT of frustrated users and a LOT of discouraged attitudes related to this bug.  I don't want to hurt my reputation by making this change myself, but I would strongly suggest raising the priority from "medium."  I don't think I'd be exaggerating to say this bug is slowing the adoption of free software, because of the number of users who won't consider using Linux as long as multi-card on Windows "just works" and Linux doesn't even "work" period.  What would it take to get some more action on this, in all seriousness?  Because I think there's a large pool of users who would be willing to contribute something to getting this closed if we knew where and how to do it.  Thanks to all who've worked on it so far.
Comment 21 Tiago Vignatti 2009-11-18 14:54:24 UTC
Secondary card POSTing is working in 1.7 with Linux 2.6.32, using VGA arbitration. I have also double-checked x86emu, which seems to be okay. So, can you please verify with those versions of X and the kernel?

Anyway, sadomasochism: reassigning to myself.
Comment 22 Tiago Vignatti 2009-11-18 14:55:51 UTC
(second attempt)
Comment 23 Denny de la Haye 2009-12-18 13:33:22 UTC
Created attachment 32186
Denny's Xorg.0.log

I can't work out whether I'm seeing this same bug or something new.  My second card seems to be POSTing (I think?  It's in the Xorg.0.log anyway), but it's not detecting the monitors that are connected to it.  Physically removing the primary card from the system and trimming the ServerLayout section to only reference the second Screen results in the monitors being detected and the system coming up as expected.

Attaching xorg.conf and Xorg.0.log, in case they're of interest.


denny@serenity:~$ Xorg -version

X.Org X Server 1.6.4
Release Date: 2009-9-27
X Protocol Version 11, Revision 0
Build Operating System: Linux 2.6.24-23-server x86_64 Ubuntu
Current Operating System: Linux serenity 2.6.31-16-generic #53-Ubuntu SMP Tue Dec 8 04:02:15 UTC 2009 x86_64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-2.6.31-16-generic root=UUID=5aa26f7c-1246-416f-a9cd-4e5fe70a62fe ro quiet splash
Build Date: 26 October 2009  05:19:56PM
xorg-server 2:1.6.4-2ubuntu4 (buildd@) 
	Before reporting problems, check http://wiki.x.org
	to make sure that you have the latest version.
Comment 24 Denny de la Haye 2009-12-18 13:33:42 UTC
Created attachment 32187
Denny's xorg.conf
Comment 25 Tiago Vignatti 2009-12-18 14:12:05 UTC
You need to upgrade your system. It's already fixed in server 1.7 and kernel 2.6.32.
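
The version requirement above can be checked with a short script. This is only a minimal sketch in Python, not part of the X server or of any distribution tooling: the function names parse_version and meets_minimum_versions are made up for illustration, and the inputs are assumed to be the banner line printed by Xorg -version and the string printed by uname -r.

```python
import re

# Minimum versions quoted in this comment; everything else here is illustrative.
MIN_XORG = (1, 7)
MIN_KERNEL = (2, 6, 32)


def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum_versions(xorg_banner, kernel_release):
    """True if both the X server and the kernel are at least the quoted minimums."""
    xorg_ok = parse_version(xorg_banner) >= MIN_XORG
    kernel_ok = parse_version(kernel_release) >= MIN_KERNEL
    return xorg_ok and kernel_ok


# Values taken from the Xorg -version output in comment 23.
print(meets_minimum_versions("X.Org X Server 1.6.4", "2.6.31-16-generic"))  # False
```

Passing this check only means the system is new enough to contain the fix; it says nothing about whether a particular combination of cards and drivers works.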
Comment 26 Denny de la Haye 2009-12-19 01:50:16 UTC
I did see your November comment saying that, thanks.  I look forward to my distributor packaging 1.7 - April 2010 seems the most likely time for it to be available to me.

But as I pointed out in my comment/question, the second card _is_ showing up in my logs here already - it's only the monitors which aren't being detected - so I wasn't sure if this is the same bug or something different.  It'd be a shame to reach April and find that Xorg still can't see the monitors which are attached to my second card  :)
Comment 27 Thomas Spear 2010-08-24 11:16:28 UTC
I'm seeing issues similar to this as well on Fedora 13 with all of the latest updates. See below:

[root@tomcat ~]# cat /etc/redhat-release 
Fedora release 13 (Goddard)
[root@tomcat ~]# uname -r
2.6.33.6-147.2.4.fc13.x86_64
[root@tomcat ~]# X -version

X.Org X Server 1.8.2
Release Date: 2010-07-01
X Protocol Version 11, Revision 0
Build Operating System: x86-16 2.6.32-44.el6.x86_64 
Current Operating System: Linux tomcat.localdomain 2.6.33.6-147.2.4.fc13.x86_64 #1 SMP Fri Jul 23 17:14:44 UTC 2010 x86_64
Kernel command line: ro root=/dev/mapper/vg_tomcat-lv_root rd_LVM_LV=vg_tomcat/lv_root rd_LVM_LV=vg_tomcat/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=us rhgb quiet rdblacklist=nvidia 3
Build Date: 03 August 2010  05:10:46AM
Build ID: xorg-x11-server 1.8.2-3.fc13 
Current version of pixman: 0.18.0
	Before reporting problems, check http://bodhi.fedoraproject.org/
	to make sure that you have the latest version.

[root@tomcat ~]# lspci -nnv
01:00.0 VGA compatible controller [0300]: nVidia Corporation G86 [GeForce 8400 GS] [10de:0422] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Jaton Corp Device [1b13:0422]
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
	Memory at b0000000 (64-bit, prefetchable) [size=256M]
	Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
	I/O ports at dc80 [size=128]
	Expansion ROM at fde00000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 2
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nouveau
	Kernel modules: nvidia, nouveau, nvidiafb
05:00.0 VGA compatible controller [0300]: nVidia Corporation G98 [GeForce 8400 GS] [10de:06e4] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Device [196e:05cf]
	Flags: fast devsel, IRQ 18
	Memory at f8000000 (32-bit, non-prefetchable) [disabled] [size=16M]
	Memory at d0000000 (64-bit, prefetchable) [disabled] [size=256M]
	Memory at f6000000 (64-bit, non-prefetchable) [disabled] [size=32M]
	I/O ports at cc80 [disabled] [size=128]
	Expansion ROM at f9d00000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Kernel modules: nvidia, nouveau, nvidiafb

I've tried both the nvidia 256.44 driver and the nouveau driver to no avail. Sometimes I get hard lockups with the nvidia driver, but with nouveau, it just never initializes the secondary card. I did see something at one point in the Xorg.0.log about /dev/dri/card1 not being a valid file (it should be), so this may be a MESA bug.

I have 1 PCI-E GeForce 8400 GS and 1 PCI GeForce 8400 GS (as lspci reports them above), from different manufacturers though.

Interestingly, both cards are listed as being on the PCI bus in /sys, rather than one being on PCI-E and one being on PCI. The device on ID 01:00.0 in the output above is the PCI-E one, and ID 05:00.0 is the PCI one.

I plan to try a second (hopefully) identical PCI card in the other PCI slot of this machine a bit later today, without the PCI-E card. The motherboard does have a VGA chip on it; however, I am unsure of any of its info, as it is not listed in lspci and the machine is a work machine. It is a Dell Optiplex 780.
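
The difference between the two cards in the lspci output above (every memory and I/O region of 05:00.0 is marked [disabled], and it has no "Kernel driver in use" line) can be picked out mechanically. The following is a rough sketch in Python under stated assumptions: the function name vga_devices and the dictionary keys are invented for illustration, the input is assumed to be text in the same layout as the lspci -nnv output quoted above, and Expansion ROM lines are deliberately ignored because they are disabled on both cards.

```python
def vga_devices(lspci_text):
    """Summarise VGA controllers found in `lspci -nnv`-style text.

    Returns one dict per VGA device with its slot, the number of memory and
    I/O regions, how many of those are marked [disabled], and the bound
    kernel driver (None if no "Kernel driver in use" line is present).
    """
    devices = []
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new device record.
            current = None
            if "VGA compatible controller" in line:
                current = {"slot": line.split()[0], "regions": 0,
                           "disabled": 0, "driver": None}
                devices.append(current)
        elif current is not None:
            stripped = line.strip()
            if stripped.startswith(("Memory at", "I/O ports at")):
                current["regions"] += 1
                if "[disabled]" in stripped:
                    current["disabled"] += 1
            elif stripped.startswith("Kernel driver in use:"):
                current["driver"] = stripped.split(":", 1)[1].strip()
    return devices
```

A device whose regions are all disabled and whose driver is None matches the symptom described here, where the secondary card is never initialized. The text could be obtained with, for example, subprocess.run(["lspci", "-nnv"], capture_output=True, text=True).stdout.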
Comment 28 Thomas Spear 2010-08-25 07:53:11 UTC
I got the second PCI card today. It is 100% identical to the first. I've removed the PCI-E card for now and installed the second PCI card. I then booted the machine to runlevel 3, with the nouveau driver blacklisted and the nvidia driver loaded. No joy. So I blacklisted nvidia, unblacklisted nouveau, removed the Xorg.0.log and xorg.conf, and rebooted again to runlevel 3.

This time I got some errors during init from nouveau's drm, which I am going to report to them, but still could not get X to start even on just one screen. I hope to come back to this once I have a fix for the problem with nouveau.

As a side note, I tried using the onboard Intel chip and nouveau together... No joy there either. :-(
Comment 29 Thomas Spear 2010-08-25 07:55:12 UTC
Created attachment 38141
Xorg.0.log with 2 PCI cards using the nvidia driver

As you can see here when using 2 PCI cards, the NVIDIA driver for the first GPU goes into a wait state, which hard locks the system.

