Bug 18160 - X server lockup in int10 when booting a secondary card
Summary: X server lockup in int10 when booting a secondary card
Status: RESOLVED INVALID
Alias: None
Product: xorg
Classification: Unclassified
Component: Lib/pciaccess
Version: unspecified
Hardware: Other All
Importance: high normal
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
URL:
Whiteboard: 2011BRB_Reviewed
Keywords:
Depends on:
Blocks: 20816
 
Reported: 2008-10-21 14:50 UTC by Jan "Yenya" Kasprzak
Modified: 2018-06-12 19:07 UTC
CC List: 16 users


Attachments
xorg.conf from the x86_64 + 2x Radeon HD 3450 box (1010 bytes, text/plain)
2008-10-21 14:50 UTC, Jan "Yenya" Kasprzak
X.org log with s3 and nouveau (33.55 KB, text/plain)
2008-12-05 08:04 UTC, Andrew Randrianasulu
xorg conf: two screens with s3 and nouveau (10.21 KB, text/plain)
2008-12-05 08:06 UTC, Andrew Randrianasulu
Patch to add error checking and debugging to file on libpciaccess (3.68 KB, patch)
2008-12-05 08:21 UTC, Alex Villacís Lasso
Log file created with debug patch. (387 bytes, text/plain)
2008-12-05 08:21 UTC, Alex Villacís Lasso
Patch to add error checking and attempted fallback on libpciaccess, cleaned up. (1.02 KB, patch)
2008-12-05 08:22 UTC, Alex Villacís Lasso
strace output with debug patch (280.76 KB, text/plain)
2008-12-05 08:24 UTC, Alex Villacís Lasso
Xorg.0.log of xserver with recent git (319.72 KB, text/plain)
2008-12-17 10:17 UTC, Alex Villacís Lasso
Xorg.0.log from failed attempt to use secondary card (13.57 KB, text/x-log)
2008-12-19 06:19 UTC, Bill Crawford
Config file related to previous log (5.94 KB, text/plain)
2008-12-19 06:20 UTC, Bill Crawford
Hardware concerned with previous log and conf file. (12.56 KB, text/plain)
2008-12-19 06:21 UTC, Bill Crawford
Log from attempt to use primary and a secondary card together (62.07 KB, application/x-trash)
2008-12-19 06:31 UTC, Bill Crawford
Add enable/disable through sysfs around actual reading of ROM. (2.20 KB, patch)
2009-01-13 08:21 UTC, Alex Villacís Lasso
Add enable/disable through sysfs around actual reading of ROM (try 2). (2.31 KB, patch)
2009-01-16 08:21 UTC, Alex Villacís Lasso
Xorg log attempt with both screens enabled (60.36 KB, text/plain)
2009-03-07 00:44 UTC, Tim Nelson

Description Jan "Yenya" Kasprzak 2008-10-21 14:50:09 UTC
Created attachment 19793
xorg.conf from the x86_64 + 2x Radeon HD 3450 box

I have a dual-head/dual-seat setup, and I have problems starting up the secondary head on Fedora 9 (xorg-x11-server-Xorg-1.5.0-2.fc9 package). When starting the secondary X server, it locks up in the "int10" module. Downgrading the X server and all its drivers to the Fedora 8 version (xorg-x11-server-Xorg-1.3.0.0-46.fc8) fixes the problem.

I have tested it on two boxes that are different enough that I am fairly sure it is an int10 module problem rather than a driver problem:

Box 1:
32-bit x86 (Athlon XP), ATI Radeon 9200SE AGP as a primary card, NVidia Riva128 PCI as a secondary card - reported in lspci as follows:
00:13.0 VGA compatible controller: NVidia / SGS Thomson (Joint Venture) Riva128 (rev 10)
01:00.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon 9200 SE] (rev 01)

Box 2:
64-bit x86 (quad-core Phenom), two identical Radeon HD 3450 PCIe cards.

The last few lines from Xorg.1.log are the following:

[...]
(II) Loading sub module "int10"
(II) LoadModule: "int10"

(II) Loading /usr/lib64/xorg/modules//libint10.so
(II) Module int10: vendor="X.Org Foundation"
        compiled for 1.5.0, module version = 1.0.0
        ABI class: X.Org Video Driver, version 4.1
(II) RADEON(0): initializing int10

and then the X server locks up and consumes 100% of CPU time.

After installing a debuginfo package, running the X server under gdb, and interrupting it once it started looping, I got the following backtrace:

Program received signal SIGTERM, Terminated.
fetch_decode_modrm (mod=0x7fff913b0604, regh=0x7fff913b05fc, 
    regl=0x7fff913b0600) at ../x86emu/decode.c:163
163		*regh = (fetched >> 3) & 0x07;
(gdb) where
#0  fetch_decode_modrm (mod=0x7fff913b0604, regh=0x7fff913b05fc, 
    regl=0x7fff913b0600) at ../x86emu/decode.c:163
#1  0x00007f7b87e3dc1b in x86emuOp_add_byte_RM_R (op1=<value optimized out>)
    at ../x86emu/ops.c:120
#2  0x00007f7b87e47823 in X86EMU_exec () at ../x86emu/decode.c:122
#3  0x00007f7b87e35695 in xf86ExecX86int10 (pInt=0xc7e8d0) at xf86x86emu.c:40
#4  0x00007f7b87e36357 in xf86ExtendedInitInt10 (entityIndex=0, Flags=0)
    at generic.c:242
#5  0x00007f7b888a3a7b in RADEONPreInit ()
   from /usr/lib64/xorg/modules/drivers//radeon_drv.so
#6  0x0000000000463329 in InitOutput (pScreenInfo=0x7d7440, argc=4, 
    argv=0x7fff913b0978) at xf86Init.c:749
#7  0x000000000042ca16 in main (argc=4, argv=0x7fff913b0978, 
    envp=0x7fff913b09a0) at main.c:358
(gdb) 

(this one is from Box 2).
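
A note on the top two frames of that backtrace: frame #0 is the emulator splitting an x86 ModRM byte into its fields (the `(fetched >> 3) & 0x07` line is the middle field), and frame #1 is the handler for opcode 0x00, which is ADD r/m8, r8. The Python sketch below mirrors that field split. The function name and the zero-filled buffer are illustrative assumptions, not code from x86emu. It shows that a run of zero bytes decodes as the same two-byte ADD instruction over and over, which is one plausible way for an emulator to keep running without ever reaching a return.

```python
def decode_modrm(byte):
    """Split a ModRM byte into (mod, reg, rm), as x86 instruction decoding does."""
    mod = (byte >> 6) & 0x03  # addressing mode
    reg = (byte >> 3) & 0x07  # register operand ("regh" in the backtrace)
    rm = byte & 0x07          # register/memory operand (presumably "regl")
    return mod, reg, rm


# Hypothetical ROM buffer that was never filled in: all zero bytes.
buffer = bytes(8)

# Opcode 0x00 is ADD r/m8, r8 and is followed by one ModRM byte.
# With mod=0 and rm=0 in 16-bit mode there is no displacement,
# so every instruction here is exactly two bytes long.
decoded = []
offset = 0
while offset + 1 < len(buffer):
    opcode = buffer[offset]
    mod, reg, rm = decode_modrm(buffer[offset + 1])
    decoded.append((opcode, mod, reg, rm))
    offset += 2

print(decoded)
```

Whether the buffer in this particular hang actually held zeros is not established by the backtrace alone; the sketch only shows why such a buffer would keep the decoder busy.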

Some time ago I also reported it to the Fedora Bugzilla as https://bugzilla.redhat.com/show_bug.cgi?id=454864, but the x.org Bugzilla is probably a more appropriate place. More logs, configs, etc. from Box 1 can be found in that bug entry.

I will attach my xorg.conf from Box 2 here (I am starting the secondary X server with the "X :1 -layout Secondary" command line).
Comment 1 Bill Crawford 2008-10-23 08:35:01 UTC
I see similar problems with a bunch of radeon cards here, but was told this was due to current drivers not supporting soft-booting secondary cards at all since the libpciaccess changes. Have I misunderstood something here?
Comment 2 Tim Nelson 2008-12-02 02:36:55 UTC
Similar problem for me, except that I was going FC6 -> FC10.  

I was asked to add details of my hardware here:

VideoCard0: NV: Found NVIDIA GeForce4 MX 440
VideoCard1: Chipset is SiS6326 AGP (H0) (revision 0x0b)

Relevant thread at: http://lists.freedesktop.org/archives/xorg/2008-December/040961.html
Comment 3 Tim Nelson 2008-12-02 15:35:30 UTC
Here's some more interesting info from my X log.  


(--) PCI: (0@1:0:0) nVidia Corporation NV17 [GeForce4 MX 440] rev 163, Mem @ 0xf0000000/0, 0xe0000000/0, 0xe8000000/0, BIOS @ 0x????????/131072
(--) PCI:*(0@2:2:0) Silicon Integrated Systems [SiS] 86C326 5598/6326 rev 11, Mem @ 0xf4000000/0, 0xf3000000/0, I/O @ 0x00009800/0, BIOS @ 0x????????/65536

...

(II) Primary Device is: PCI 02@00:02:0
(--) NV: Found NVIDIA GeForce4 MX 440 at 01@00:00:0

I have no idea where that "Primary Device is" part comes from, but I note that it's neither of the previously mentioned PCI IDs.  
Comment 4 Tim Nelson 2008-12-02 15:40:26 UTC
...ok, maybe it is, just formatted differently.  Confusing :).  
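
The two log styles quoted in comment 3 can be reconciled with a small parser. This is only a sketch resting on an assumption read off those log lines: the "(--) PCI:" lines appear to print domain@bus:device:function in decimal, while the "Primary Device is" and "NV: Found" lines appear to print bus@domain:device:function in zero-padded hex. Both function names are hypothetical.

```python
import re


def parse_probe_id(text):
    """Parse the '(--) PCI:' style, ASSUMED to be domain@bus:device:function."""
    match = re.fullmatch(r"(\d+)@(\d+):(\d+):(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised PCI id: {text!r}")
    domain, bus, device, function = (int(group) for group in match.groups())
    return domain, bus, device, function


def parse_primary_id(text):
    """Parse the 'Primary Device is' style, ASSUMED to be bus@domain:device:function in hex."""
    match = re.fullmatch(r"([0-9a-fA-F]+)@([0-9a-fA-F]+):([0-9a-fA-F]+):([0-9a-fA-F]+)", text)
    if match is None:
        raise ValueError(f"unrecognised PCI id: {text!r}")
    bus, domain, device, function = (int(group, 16) for group in match.groups())
    return domain, bus, device, function


# Both spellings of the SiS card normalise to the same (domain, bus, device, function).
print(parse_probe_id("0@2:2:0"), parse_primary_id("02@00:02:0"))
```

Under that assumption, the starred SiS entry and the "Primary Device" line name the same device, and the NV line matches the first "(--) PCI:" entry.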
Comment 5 Andrew Randrianasulu 2008-12-05 08:03:39 UTC
I think the same bug hit me too.

I have two cards plugged in:
---
00:05.0 VGA compatible controller: S3 Inc. 86c764/765 [Trio32/64/64V+] (rev 54)
[old PCI s3]
01:00.0 VGA compatible controller: nVidia Corporation NV44A [GeForce 6200] (rev a1)
[newer agp card]
---

Until I added Option "NoINT10" "1" to the s3 "Device" section, I had just two black screens and one working button: power off.

X log and config will follow ...
Comment 6 Andrew Randrianasulu 2008-12-05 08:04:44 UTC
Created attachment 20835
X.org log with s3 and nouveau

At least nouveau screen is working ...
Comment 7 Andrew Randrianasulu 2008-12-05 08:06:33 UTC
Created attachment 20836
xorg conf: two screens with s3 and nouveau

My s3 has only 1 MB (and works fine as the primary or only device), so I must use two separate screens with different modes and bit depths, correct?
Comment 8 Alex Villacís Lasso 2008-12-05 08:19:48 UTC
All of the following applies to the stock linux 2.6 kernel from a fresh installation of Fedora 10.

I have been looking into the int10 hang when initializing the BIOS of a secondary card. Since the thread on xorg@lists.freedesktop.org suggested libpciaccess as the faulty component, I checked the code. This is what I found:
- The function responsible for reading the ROM of the PCI video card is pci_device_linux_sysfs_read_rom() for the Fedora 10 case. 
- This function pci_device_linux_sysfs_read_rom() is *not* exercised at all when using (only) the primary display, even when an option such as UseBIOS is in effect. So this function might as well be broken and nobody with a single display would notice.
- pci_device_linux_sysfs_read_rom() is exercised when initializing a secondary display (using "vesa" in my case) and its ROM needs to boot up.

I introduced a bit of logging in the patch libpciaccess-partial-fix-with-debug.patch that outputs messages to a file in /tmp . The basic problem is that, despite all the sysfs dance to enable the ROM, the kernel terminates the read with 0 bytes when trying to read the ROM:

Reading ROM from /sys/bus/pci/devices/0000:00:09.0/rom into address 0xb7f4a008
ROM size for /sys/bus/pci/devices/0000:00:09.0/rom is 32768 using 32768
Reading ROM from /sys/bus/pci/devices/0000:00:09.0/rom reached 0-sized read (EOF?) at offset 0
Dump of ROM from /sys/bus/pci/devices/0000:00:09.0/rom (0 bytes):
Reading ROM failed with short read, using /dev/mem to read from 0xdffe0000

I added a fallback that calls pci_device_linux_devmem_read_rom() when the total amount read is less than the expected ROM size. In current git for libpciaccess, the buffer remains uninitialized after the failed read, and the machine hangs. I hoped that the fallback would be enough to read the ROM and fix this problem, but I ran into another one. The fallback uses pread() on /dev/mem at the offset reported for the ROM, and this failed with EINVAL (Invalid argument). By running strace on the stock X server with the modified libpciaccess library, I saw that the pread implementation calls into pread64() with a very large offset of 18446744073172549632 (0xffffffffdffe0000), which is the required offset sign-extended to 64 bits instead of zero-extended as required. This might point to a bug in glibc headers or code, but I worked around it by replacing the call with a pread64() call, as seen in libpciaccess-partial-fix-without-debug.patch.
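
The offset arithmetic in that strace can be checked by hand. The sketch below is illustrative only: the helper names are made up, and it reproduces just the arithmetic, not the code path in glibc or libpciaccess where the conversion happens. The ROM base 0xdffe0000 has its top bit set, so treating it as a signed 32-bit value and widening it to 64 bits produces exactly the number seen in the trace.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def zero_extend_32(value):
    """Widen a 32-bit value to 64 bits, filling the upper half with zeros."""
    return value & MASK32


def sign_extend_32(value):
    """Widen a 32-bit value to 64 bits, copying bit 31 into the upper half."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as a negative signed 32-bit number
    return value & MASK64  # view the result as an unsigned 64-bit pattern


rom_base = 0xDFFE0000  # offset reported for the ROM in the debug log

print(hex(zero_extend_32(rom_base)))  # the offset that was intended
print(hex(sign_extend_32(rom_base)))  # the offset that pread64() received
print(sign_extend_32(rom_base))
```

Any physical address at or above 0x80000000 would be affected the same way when passed through a signed 32-bit offset type.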

Now, here comes the third problem: the passed address makes pread64() return EFAULT (Bad address). I did not have time to find out whether this address is intended or not. However, libpciaccess-partial-fix-without-debug.patch is enough to replace the hang with a graceful exit that allows the user to sort-of regain control of the machine. The final strace is attached; search for EFAULT in the text.

Please comment on this.
Comment 9 Alex Villacís Lasso 2008-12-05 08:21:23 UTC
Created attachment 20837
Patch to add error checking and debugging to file on libpciaccess

This is the patch that I used to create the debug log. Notice that the kernel terminates the read from ROM at offset 0.
Comment 10 Alex Villacís Lasso 2008-12-05 08:21:53 UTC
Created attachment 20838
Log file created with debug patch.
Comment 11 Alex Villacís Lasso 2008-12-05 08:22:48 UTC
Created attachment 20839
Patch to add error checking and attempted fallback on libpciaccess, cleaned up.

This patch is enough to turn the hang into the error it should have been.
Comment 12 Alex Villacís Lasso 2008-12-05 08:24:08 UTC
Created attachment 20840
strace output with debug patch

This is the strace output. You can search for EFAULT when trying to read using pread64() on /dev/mem .
Comment 13 Tim Nelson 2008-12-05 12:53:57 UTC
Wonderful!  I was hoping to get onto this sometime, but you got further than I would've been able to.  Am I correct in understanding that the real problem is probably that /sys/bus/pci/devices/0000:00:09.0/rom (or whatever) actually returns a 0 size, so this is really a kernel problem, rather than an Xorg problem?  
Comment 14 Alex Villacís Lasso 2008-12-05 13:40:37 UTC
(In reply to comment #13)
> Wonderful!  I was hoping to get onto this sometime, but you got further than I
> would've been able to.  Am I correct in understanding that the real problem is
> probably that /sys/bus/pci/devices/0000:00:09.0/rom (or whatever) actually
> returns a 0 size, so this is really a kernel problem, rather than an Xorg
> problem?  
> 

Apparently it is. Assuming that the sysfs interface is supposed to give access to any PCI ROM, not just the ones from VGA chipsets, then the interface is not (always) working as documented. My work machine has three chipsets with ROMs, as declared by sysfs:

[root@srv64 ~]# cd /sys/devices/
[root@srv64 devices]# find . -name rom
./pci0000:00/0000:00:01.0/0000:01:05.0/rom
./pci0000:00/0000:00:11.0/rom
./pci0000:00/0000:00:12.0/rom

These devices match the following declarations in the output of lspci -v:

00:01.0 PCI bridge: ATI Technologies Inc RS480 PCI Bridge (prog-if 00 [Normal decode])
        Flags: bus master, 66MHz, medium devsel, latency 99
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=68
        I/O behind bridge: 0000e000-0000efff
        Memory behind bridge: fde00000-fdefffff
        Prefetchable memory behind bridge: d8000000-dfffffff
        Capabilities: [b0] Subsystem: Intel Corporation Unknown device d600

01:05.0 VGA compatible controller: ATI Technologies Inc RC410 [Radeon Xpress 200] (prog-if 00 [VGA controller])
        Subsystem: Intel Corporation Unknown device d600
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 17
        Memory at d8000000 (32-bit, prefetchable) [size=128M]
        I/O ports at ee00 [size=256]
        Memory at fdef0000 (32-bit, non-prefetchable) [size=64K]
        [virtual] Expansion ROM at fde00000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 2
        Capabilities: [80] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
        Kernel driver in use: radeon


00:11.0 IDE interface: ATI Technologies Inc 437A Serial ATA Controller (rev 80) (prog-if 8f [Master SecP SecO PriP PriO])
        Subsystem: Intel Corporation Unknown device d600
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 23
        I/O ports at ff00 [size=8]
        I/O ports at fe00 [size=4]
        I/O ports at fd00 [size=8]
        I/O ports at fc00 [size=4]
        I/O ports at fb00 [size=16]
        Memory at fe02f000 (32-bit, non-prefetchable) [size=512]
        Expansion ROM at 40000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 2
        Capabilities: [50] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
        Kernel driver in use: sata_sil
        Kernel modules: sata_sil

00:12.0 IDE interface: ATI Technologies Inc 4379 Serial ATA Controller (rev 80) (prog-if 8f [Master SecP SecO PriP PriO])
        Subsystem: Intel Corporation Unknown device d600
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 22
        I/O ports at fa00 [size=8]
        I/O ports at f900 [size=4]
        I/O ports at f800 [size=8]
        I/O ports at f700 [size=4]
        I/O ports at f600 [size=16]
        Memory at fe02e000 (32-bit, non-prefetchable) [size=512]
        Expansion ROM at 40080000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 2
        Capabilities: [50] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable-
        Kernel driver in use: sata_sil
        Kernel modules: sata_sil

So I have a radeon chipset with ROM, and two SATA controllers. Now, consider the following Perl snippet:

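[root@srv64 0000:01:05.0]# perl -w -e '
use IO::Handle;
# Open the sysfs rom file read-write (mode 2 is O_RDWR).
sysopen(ROM, "rom", 2) or die($!); 
binmode(ROM); 
# Writing "1" enables reading of the ROM.
syswrite(ROM, "1", 1);
sysseek(ROM, 0, 0); 
$buffer = ""; 
# Copy the ROM to stdout in 1024-byte chunks until a 0-sized read.
while (sysread(ROM, $buffer, 1024) > 0) {
    print $buffer;
};
# Writing "0" disables ROM reading again.
sysseek(ROM, 0, 0);
syswrite(ROM, "0", 1);
close(ROM);' > /tmp/radeon_rom.bin 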

On my work machine, if I change directory to /sys/devices/pci0000:00/0000:00:01.0/0000:01:05.0/ (the radeon chipset) and paste the script, I get:

[root@srv64 0000:01:05.0]# ls -l /tmp/radeon_rom.bin 
-rw-r--r-- 1 root root 49152 dic  5 16:33 /tmp/radeon_rom.bin

...proving that the Radeon ROM is readable. However, if I try the same with either SATA ROM, I get a 0-sized file. So the kernel behavior (2.6.26.6-49.fc8) is at least inconsistent.
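
For anyone who wants to repeat the experiment without Perl, here is a rough Python equivalent of the snippet above. It is a sketch, not a tested tool: the function names are made up, it needs root to touch a real sysfs rom file, and the only fact it adds is that a PCI expansion ROM image starts with the bytes 0x55 0xAA, which gives a quick sanity check on whatever was read.

```python
import os


def read_rom_via_sysfs(rom_path, chunk_size=1024):
    """Enable a sysfs rom file, read it until a 0-sized read, then disable it."""
    fd = os.open(rom_path, os.O_RDWR)
    try:
        os.write(fd, b"1")              # writing "1" enables ROM reads
        os.lseek(fd, 0, os.SEEK_SET)
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:               # 0-sized read: end of ROM, or the failure case
                break
            chunks.append(chunk)
        os.lseek(fd, 0, os.SEEK_SET)
        os.write(fd, b"0")              # writing "0" disables ROM reads again
    finally:
        os.close(fd)
    return b"".join(chunks)


def has_pci_rom_signature(data):
    """Return True if the data starts with the PCI expansion ROM signature 0x55 0xAA."""
    return data[:2] == b"\x55\xaa"
```

Usage would look like `data = read_rom_via_sysfs("/sys/devices/pci0000:00/0000:00:01.0/0000:01:05.0/rom")`, followed by checking `len(data)` and `has_pci_rom_signature(data)`. An empty result corresponds to the 0-sized file seen for the SATA ROMs.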
Comment 15 Alex Villacís Lasso 2008-12-05 14:07:01 UTC
I have opened a kernel bug at http://bugzilla.kernel.org/show_bug.cgi?id=12168 for this issue.
Comment 16 Keith Packard 2008-12-09 22:04:07 UTC
So, am I right in thinking that this is not a bug in any X.org code? It's currently a blocker for the 1.6 release, so it needs to be dealt with.
Comment 17 Tim Nelson 2008-12-09 22:45:19 UTC
The short version: apply patch from comment #11.  

The long version:

No.  It's a combination of bugs.  So far, we've identified bugs in:
- Xorg
- libc possibly
- kernel PCI

Comment #11 has a patch attached which, if I understand correctly, solves at least one Xorg problem, and works around the libc problem.  That leaves only the kernel PCI problem.  We may identify more bugs when the kernel PCI problem is solved, but that's the only known problem.  

Comment 18 Alex Villacís Lasso 2008-12-10 08:26:43 UTC
(In reply to comment #17)
> The short version: apply patch from comment #11.  
> 
> The long version:
> 
> No.  It's a combination of bugs.  So far, we've identified bugs in:
> - Xorg
> - libc possibly
> - kernel PCI
> 
> Comment #11 has a patch attached which, if I understand correctly, solves at
> least one Xorg problem, and works around the libc problem.  That leaves only
> the kernel PCI problem.  We may identify more bugs when the kernel PCI problem
> is solved, but that's the only known problem.  
> 

The patch actually solves the libpciaccess problems, but there is still the issue that (apparently) xorg provides an invalid address for the ROM copy buffer, which botched the attempt at a fallback. I was too tired to check the provenance of the buffer address (the one that reports EFAULT in the strace). I will try to check that in the next few days, unless somebody beats me to it. Either the address is invalid, or I have misunderstood the arguments to the pread64 function as provided by libc from Fedora 10.
Comment 19 Alex Villacís Lasso 2008-12-17 10:17:24 UTC
Created attachment 21246
Xorg.0.log of xserver with recent git

News for this bug: at least on my home machine (ProSavageDDR + OAK Spitfire), a recent git tree for the xserver appears to be successful in reading the Oak video BIOS, but still hangs at startup. My scripts still fail at reading the same ROM. 

I notice that the strace output ends up with a couple of vm86old() calls failing with ENOSYS. My kernel is 2.6.28-rc7 on a Pentium-4 machine running 32-bit code only. No 64-bit support. So vm86 mode should be usable, right?
Comment 20 Jan "Yenya" Kasprzak 2008-12-19 03:34:39 UTC
News for my setup (Box 2 from the original report: x86_64 with 2x Radeon HD 3450 PCIe cards):

xorg-x11-server-Xorg-1.5.3-5.fc10.x86_64
xorg-x11-drv-ati-6.9.0-61.fc10.x86_64
kernel 2.6.28-rc8 (from kernel.org, not from Fedora)

I am now able to start the X server on both cards and use the dual-seat
setup, with the following problems:

1. when the secondary X server is started, the _primary_ card also gets rebooted
   (it displays blank screen, then switches to text mode, and in the first
   row there is written something that looks like a VGA BIOS version).
   The workaround is to start the primary X server _after_ the secondary card
   gets booted and initialized. Fortunately xdm can do this.

2. when I kill the secondary X server, the computer locks up (no response
   to ping, no reaction to pressing NumLock, etc.). It happens no matter
   how the X server is killed - I tried Ctrl-Alt-Backspace and
   sending SIGTERM - the computer stops responding to ping about two seconds
   after that.

Other than that, it seems that my configuration is mostly usable as dual-seat now.

Comment 21 Bill Crawford 2008-12-19 06:16:58 UTC
With a fairly recent kernel ("kernel-2.6.27.4-47.rc3.fc10.x86_64") and Xorg server, I'm still unable to use more than one screen (although it appears I can now use one of the secondary cards on its own). Will attach a log in a moment or two ...

[root@bill ~]# Xorg -version

X.Org X Server 1.5.3
Release Date: 5 November 2008
X Protocol Version 11, Revision 0
Build Operating System: Linux 2.6.18-92.1.18.el5 x86_64 
Current Operating System: Linux bill.wcn.co.uk 2.6.24.7-92.fc8 #1 SMP Wed May 7 16:26:02 EDT 2008 x86_64
Build Date: 11 December 2008  05:27:30PM
Build ID: xorg-x11-server 1.5.3-6.fc10 
        Before reporting problems, check http://wiki.x.org
        to make sure that you have the latest version.

Comment 22 Bill Crawford 2008-12-19 06:19:56 UTC
Created attachment 21318
Xorg.0.log from failed attempt to use secondary card

Oops. No, what actually happened was, attempting to use the primary plus one secondary card led to a working display, but only the primary card working. The secondary card still fails (in fact it appears from the log to misidentify the secondary cards, too).
Comment 23 Bill Crawford 2008-12-19 06:20:37 UTC
Created attachment 21319
Config file related to previous log
Comment 24 Bill Crawford 2008-12-19 06:21:39 UTC
Created attachment 21321
Hardware concerned with previous log and conf file.
Comment 25 Bill Crawford 2008-12-19 06:31:27 UTC
Created attachment 21323
Log from attempt to use primary and a secondary card together

This led to a display working, but only the primary card displaying anything (and mouse was bound to the one screen, so I didn't get the impression the second was working at all, as opposed to just failure to display anything on the screen).
Comment 26 Tim Nelson 2009-01-11 01:43:20 UTC
To transfer the knowledge back from the kernel bug, the file /sys/bus/pci/devices/<deviceID>/enable needs to have a 1 echoed into it before reading the ROM.  The libpciaccess code needs to do this.  In combination with the patch from comment #11, this should get things working either the same or better (depending on the setup).  I'm not likely to get around to that for 2-3 weeks, unfortunately; if someone else wants to supply such a patch, that would be wonderful.  HTH,
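
The enable step described here can be sketched as a small context manager. This is an illustration under stated assumptions, not the libpciaccess patch: the function name and the `sysfs_root` parameter are hypothetical, and the choice to disable the device again only if it was disabled beforehand is this sketch's own design decision, made to avoid switching off a device that something else already had enabled.

```python
import contextlib
import os


@contextlib.contextmanager
def pci_device_enabled(device_id, sysfs_root="/sys/bus/pci/devices"):
    """Write 1 to <device>/enable for the duration of the block, then restore it."""
    device_dir = os.path.join(sysfs_root, device_id)
    enable_path = os.path.join(device_dir, "enable")

    # Remember the previous state so an already-enabled device is left alone.
    with open(enable_path) as f:
        was_enabled = f.read().strip() != "0"

    if not was_enabled:
        with open(enable_path, "w") as f:
            f.write("1")
    try:
        # Hand back the path of the ROM file that is now expected to be readable.
        yield os.path.join(device_dir, "rom")
    finally:
        if not was_enabled:
            with open(enable_path, "w") as f:
                f.write("0")
```

A caller would wrap the ROM read in it, for example `with pci_device_enabled("0000:00:09.0") as rom_path: ...`, and still write "1" to the rom file itself before reading, as in the Perl snippet from comment 14.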
Comment 27 Keith Packard 2009-01-12 11:01:34 UTC
This won't get fixed in server-1.6 unless someone attaches a patch in the next week.
Comment 28 Alex Villacís Lasso 2009-01-13 08:21:48 UTC
Created attachment 21937
Add enable/disable through sysfs around actual reading of ROM.

Could you please check this patch for correctness? It includes the previous fixes plus an attempt to enable and disable the card around the reading of the ROM.
Comment 29 Tim Nelson 2009-01-13 18:25:03 UTC
Looks fine on a quick skim-over; I'll unfortunately have no chance to review it properly until at least Tuesday.  Hopefully someone else can.  
Comment 30 Alex Villacís Lasso 2009-01-16 08:21:29 UTC
Created attachment 22033
Add enable/disable through sysfs around actual reading of ROM (try 2).

Disregard the previous patch. It assumed that the ROM read had failed if the total amount read was less than the size reported via fstat(). However, it seems the kernel makes no attempt to calculate the actual size of the ROM on a simple listing, but only on an actual read. So that check always reported a failure, because the actual size is always less than the reported size (the reported size seems to be a multiple of 32 KB).

This one considers a failed read only in the case in which no data was read at all (total size == 0).
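
The difference between the two patch versions comes down to one predicate. The sketch below restates both conditions in Python; the function names are made up, and the sample numbers are borrowed from earlier comments purely for illustration (49152 bytes read from a ROM that lspci lists as 128K in comment 14, assuming the size reported by fstat() matches that figure, and the 0-byte read of a 32768-byte ROM from comment 8).

```python
def read_failed_try1(total_read, reported_size):
    """First patch: any read shorter than the fstat()-reported size counts as failure."""
    return total_read < reported_size


def read_failed_try2(total_read, reported_size):
    """Second patch: only a read that returned no data at all counts as failure."""
    return total_read == 0


# (total bytes read, size reported for the rom file)
samples = {
    "readable ROM, smaller than reported": (49152, 131072),
    "kernel returned EOF at offset 0": (0, 32768),
}

for label, (total_read, reported_size) in samples.items():
    print(label,
          read_failed_try1(total_read, reported_size),
          read_failed_try2(total_read, reported_size))
```

With the first condition a perfectly good ROM image would trigger the /dev/mem fallback; with the second, only the genuinely empty read does.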
Comment 31 Eric Anholt 2009-01-30 18:48:01 UTC
Removing from the 1.6 blocker list now that it's been identified as a libpciaccess issue. Hopefully jbarnes or idr can take a look.
Comment 32 Jesse Barnes 2009-01-30 21:19:21 UTC
Yeah, enabling the device is necessary otherwise the ROM read won't work (there's actually a kernel patch for this queued too; we don't make the "enable required" obvious enough).

Aside from whitespace issues the proposed patch seems ok to me.
Comment 33 Tim Nelson 2009-01-30 22:20:27 UTC
Just so it's all clear; the kernel patch is not required for this to work properly, but it does fix some problems we ran into while debugging the problem.  I'm still hoping to get around to testing the patch sometime; if someone decides we don't need more confirmation though, I'd be glad not to need to do that.  :)
Comment 34 Tim Nelson 2009-02-03 20:04:13 UTC
In order to test Alex's patch, I applied it to the sources contained in Fedora 10's libpciaccess.  It conflicted with a patch called libpciaccess-fd-cache.patch, so I got rid of that.  I also had to modify the file paths to get it to apply cleanly.  

After application of the patch, I tried starting Xorg in 3 different configurations.  None of the three resulted in a server lockup during the loading of the int10 module, so we appear to have overcome that particular problem.  

Primary video card: sis driver
Secondary video card: nv driver

When I started Xorg with just the primary configured, it worked fine just like it always has.  

When I started Xorg with just the secondary configured, the primary screen went black, and I couldn't get any output.  However, looking at the Xorg.0.log (yes, this was the correct logfile), it appeared to have done everything appropriate except actually display something.  Logging in remotely also showed all the appropriate processes running.  However, I was unable to switch to a text-based virtual terminal.  

When I started Xorg with both cards configured, the primary screen displayed that startup thing with the alternating black & white pixels, and then it appeared to get stuck there.  

It seems to me that the appropriate action is to apply the patch, mark the bug fixed, and then open a new bug for this new problem.  

Thoughts anyone?
Comment 35 Tim Nelson 2009-02-22 22:34:00 UTC
Incidentally, a message from Steven J. Newbury may be relevant here: http://lists.freedesktop.org/archives/xorg/2009-February/043918.html
Comment 36 Tim Nelson 2009-03-02 21:18:34 UTC
Steven J. Newbury (see comment #35) posted a follow-up basically stating that he agrees that his problems are like those I continue to get.  It seems that the wrong ROM is being invoked for his video card; both are read, but the wrong one is run.  I can't confirm this myself, because the error that shows this appears to come from the Radeon driver, which I'm not using.  

Anyway, if we get that, we'll be one step closer to having multi-video-card Xorg working again.  
Comment 37 Steven Newbury 2009-03-04 16:41:58 UTC
(In reply to comment #36)
> Steven J. Newbury (see comment #35) posted a follow-up basically stating that
> he agrees that his problems are like those I continue to get.  It seems that
> the wrong ROM is being invoked for his video card; both are read, but the wrong
> one is run.  I can't confirm this myself, because the error that shows this
> appears to come from the Radeon driver, which I'm not using.  
> 
> Anyway, if we get that, we'll be one step closer to having multi-video-card
> Xorg working again.  
> 

It does kind of work for me despite the log message regarding the wrong card, it may be that the BIOS in the X800 is capable of bringing up the 9250.  Interestingly they both report the same timings despite being rather different hardware, although that would be expected if it's using the timings from the same BIOS to program both cards!
Comment 38 Bill Crawford 2009-03-06 04:28:09 UTC
I'm still seeing a complete lockup (nothing further written to log, even with disk mounted using sync option; power button has to be held down to switch off) on my system. Three identical radeon PCI cards (same part numbers, etc) so it's not down to "the wrong BIOS" as far as I can tell. Primary card works fine as a single head, and I've swapped cards around to verify it's not that the other cards are "broken".

What can I do to help debug this?
Comment 39 Steven Newbury 2009-03-06 09:29:47 UTC
Given the move towards KMS, perhaps we should concentrate on getting cards initialised outside of the X server using a standalone int10 utility?  Has anybody got something like that working yet?
Comment 40 Tim Nelson 2009-03-06 16:11:20 UTC
Bill: The only thing this patch fixes is the lockup at the int10 point.  It now gets further than it did, and we possibly need a new bug report because the new bug may not be in libpciaccess, but as far as real results go there's no difference, at least from my point of view.  

Steven: No, it hadn't occurred to me.  I didn't even know that KMS was kernel modesetting until I Googled it.  
Comment 41 Tim Nelson 2009-03-07 00:44:55 UTC
Created attachment 23616
Xorg log attempt with both screens enabled

The attached log illustrates what happens on my computer when I try to start Xorg.  One interesting thing to note is that both screen cards seem to be attempting to use int10.  Is that a problem?  I was under the impression that int10 was by default used only for the secondary card.
Comment 42 Bill Crawford 2009-03-09 02:27:13 UTC
Tim: my problem was that I was still seeing the lockup as of Friday evening, and it looked from the changelogs in Fedora's rawhide packages as though this patch was included ... and it was still locking up for me. I'll have another try tonight ...
Comment 43 Tim Nelson 2009-03-20 19:55:47 UTC
Some additional work has been done on this by some others.  If I understand correctly, what's needed at the moment is something called a "VGA arbiter".  Presumably equivalent functionality was removed along with the libpciaccess update.  There has been some work at replacing it.  The work is documented here:

http://www.x.org/wiki/VgaArbiter

To summarise, three things are needed:
- kernel VGA arbiter
- userspace library that uses the arbiter
- xserver patch that makes xserver use the library and the new kernel interface

Unfortunately, while that summary is still accurate, a) the page above is a little out of date, and b) the code was never included in any of the appropriate projects.  The most up-to-date version (i.e. the version that works with the current code) is elsewhere (see below), but has not been tested, hence this message.  

The following patch first needs to be applied to your Linux kernel (with apologies to non-Linux types):

http://people.freedesktop.org/~airlied/kernel-vga-arbiter.patch

Then this patch needs to be applied to your xserver:

http://cgit.freedesktop.org/~airlied/xserver/log/?h=vga-arbiter

I'm unaware of any copy of the userspace library more up-to-date than the one on this page:

http://git.c3sl.ufpr.br/pub/scm/multiseat/libvgaaccess.git/

Instructions for using that last link are on the wiki page above.  

If anyone has a chance to test this, feedback could be useful.  

Yes, I know this isn't directly related to the original problem, but most of the people really interested in multi-card xorg are already watching this bug.  
Comment 44 Tiago Vignatti 2009-03-23 08:46:57 UTC
(In reply to comment #43)
> Some additional work has been done on this by some others.  If I understand
> correctly, what's needed at the moment is something called a "VGA arbiter". 
> Presumably equivalent functionality was removed along with the libpciaccess
> update.  There has been some work at replacing it.  The work is documented
> here:

The VGA arbiter is a second step to get the multiple card working correctly.

Not all video cards need to be programmed through the crappy VGA legacy registers. It's worse: it seems there are some drivers that could avoid the VGA interface entirely but still use it. So a good plan is to first solve the problem of secondary card initialization and then tackle VGA arbitration.
Comment 45 Bill Crawford 2009-03-23 08:53:46 UTC
Based on the visible garbage occasionally appearing on the primary screen whilst the BIOS is trying to POST the secondary card(s), I'd guess that the arbiter, or simply disabling the other card(s) while doing int10, would probably help a lot.

I'll try the patches, but I don't have a huge amount of time. If you have a .src.rpm available for libvgawhatsit, patched kernel etc. that would be marvellous :o)
Comment 46 Tim Nelson 2009-03-23 20:08:58 UTC
Ok, since the problem described in this bug (ie. the int10 one) is now, if I understand recent mailing lists posts correctly, fixed, I've created a few more bugs:

#20816 is a master bug for getting multi-card xorg working
#20817 is a bug specifically for the VGA arbiter

Anyone who wants to discuss what needs to be done to help with the VGA arbiter should from now on do so at bug #20817.  Discussion of whether the VGA arbiter is the best thing to do next should happen at #20816.  

Thanks,
Comment 47 Pedro Eugênio Rocha 2009-03-25 09:31:18 UTC
I'm using the patches for libpciaccess and my system is still hanging (sis and ati cards). My Xorg locks up when it tries to initialize the int10 module, specifically when it starts to emulate the operations, running into a problem that I described in 'http://bugs.freedesktop.org/show_bug.cgi?id=20816'.
It seems not fixed for me.
Comment 48 Anibal Avelar 2009-05-13 20:58:58 UTC

I have the same problem with two PCI nVidia cards using Xinerama.


I described my problems in bug 20816 [1] and attached my files (Xorg.0.log, xorg.conf, and lspci output) there.

My problem is simple: originally the system hung when it tried to load the int10 module. Now, with the fix to the pciaccess library, the system doesn't hang, but I get this error:

(II) LoadModule: "int10"
(II) Reloading /usr/lib/xorg/modules//libint10.so
(II) NV(1): Initializing int10
(EE) NV(1): Cannot read V_BIOS (3) Input/output error

and the system doesn't come up.

I found this:

+ If I use two nVidia PCI cards, the system hangs because I also have an Intel card (although I don't use it, Xorg can see it).

+ If I use one nVidia PCI card and one Matrox ATI AGP card, this card (the Matrox) inhibits the Intel one, but if I choose it as primary the result is the same (Cannot read V_BIOS (3) Input/output error). If I choose the Intel as primary (in the BIOS), I can load both cards, but only as two individual monitors; I can't load Xinerama support.

 

How can I fix this problem? I can't apply patches on my system.

I am still using Ubuntu Hardy while this problem remains (two new releases have been frozen).

Regards.


[1] https://bugs.freedesktop.org/show_bug.cgi?id=20816
 

Comment 49 Adam Jackson 2018-06-12 19:07:21 UTC
Mass closure: This bug has been untouched for more than six years, and is not
obviously still valid. Please reopen this bug or file a new report if you continue to experience issues with current releases.

