Bug 8894 - Xorg PCI scan misses video card - stops scan at sound card
Status: RESOLVED NOTOURBUG
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/General
Version: 7.1 (2006.05)
Hardware: x86 (IA32) Linux (All)
Importance: high critical
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
Duplicates: 9916
 
Reported: 2006-11-05 07:06 UTC by Charles Butterfield
Modified: 2007-04-15 16:47 UTC (History)

Attachments
Zip file of key text files: (xorg.conf, Xorg.0.log, scanpci, Xorg-scanpci, strace, etc) (42.69 KB, application/zip)
2006-11-05 07:09 UTC, Charles Butterfield

Description Charles Butterfield 2006-11-05 07:06:24 UTC
Description of problem: Xorg cannot find the video controller -- it stops its brute force
scan of /proc/bus/pci at the next-to-last existing entry (my device is the last).

History: After upgrading from Fedora FC5 to FC6, my integrated ATI Rage
XL is no longer detected, with Xorg failing with "no device found".

Some clues:
- scanpci shows my video device
- Xorg -scanpci does NOT show my video device
- strace indicates a brute force scan of all possible PCI devices which stops
abruptly at /proc/bus/pci/03/03.0, while my device is the next (and last) at
/proc/bus/pci/03/0e.0.
- The last device examined (PCI:03:03.0) is "Multimedia audio controller:
C-Media Electronics Inc CM8738 (rev 10)"
- My device (PCI:03:0e.0) is "VGA compatible controller: ATI Technologies Inc
Rage XL (rev 27)"
- See attached zip file containing useful output (xorg.conf, Xorg.0.log, scanpci,
Xorg-scanpci, strace, etc)
- My hardware: Dell PowerEdge 700, with integrated ATI Rage XL

Version-Release number of selected component (if applicable):
xorg-x11-server-Xorg-1.1.1-47.fc
xorg-x11-server-utils-7.1-4.fc6
xorg-x11-drv-ati-6.6.2-4.fc
Comment 1 Charles Butterfield 2006-11-05 07:09:01 UTC
Created attachment 7657
Zip file of key text files: (xorg.conf, Xorg.0.log, scanpci, Xorg-scanpci, strace, etc)

See attached zip file containing useful output (xorg.conf, Xorg.0.log, scanpci,
Xorg-scanpci, strace, etc)
Comment 2 Charles Butterfield 2006-11-05 08:35:46 UTC
Another clue - I removed the audio card (at PCI:03:03.0) and the "brute force"
scan now stops somewhere early in bus 01 (PCI:01:??.?).

Presumably the "brute force" search is terminated by some externally obtained
data item (such as the address of the last PCI device), rather than by what is
being read during the scan.

Could this be an "off-by-one" problem with whatever provides the "last valid PCI
address" data?

Is there some way for me to dump out said data with only the end-user distribution?

Is there some way to force the server to actually scan the existing entries in
/proc/bus/pci (in the same way that I might and/or what lspci does by default)?

Is there some option to force lspci to scan the same way the server does, to see
if that replicates the problem (currently lspci sees the video card)?
Comment 3 Charles Butterfield 2006-11-05 10:56:14 UTC
Some more clues:

1) adding BusID in the Device section still fails to detect the device (and has NO
effect on the scanning as seen via strace).  In particular:

Section "Device"
        Identifier  "Videocard0"
        Driver      "ati"
        BusID       "PCI:03:0e:0"
EndSection


2) changing the driver from "ati" to "vesa" allows the PCI scan to find the
device.  This is confusing.  Is the PCI scan logic built into each driver?
Wouldn't they be calling the same utility libraries?  Or is something else
causing the difference?  Very weird.

3) Per #2 above it would seem that using VESA would be a decent workaround.
However I can't get the refresh rate (for my FPD2185W 1680x1050 LCD) down to a
rate that the monitor can sync to, even after adding custom modelines that I
think should work (see below):
   Modeline "1680x1050@60" 154.20 1680 1712 2296 2328 1050 1071 1081 1103
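
As a sanity check on the modeline above, the rates it implies can be worked out from the usual modeline layout: pixel clock in MHz, then four horizontal and four vertical timings, the last of each group being the total. A minimal Python sketch follows; the function name `modeline_rates` is made up for illustration, and it assumes the quoted mode name contains no spaces.

```python
def modeline_rates(modeline: str) -> tuple[float, float]:
    """Return (horizontal kHz, vertical Hz) implied by an X modeline string."""
    fields = modeline.split()
    # Drop the leading "Modeline" keyword if it is present.
    if fields[0].lower() == "modeline":
        fields = fields[1:]
    # fields[0] is the quoted mode name, fields[1] the pixel clock in MHz.
    clock_mhz = float(fields[1])
    # Next eight numbers: hdisp hsyncstart hsyncend htotal vdisp vsyncstart vsyncend vtotal.
    timings = [int(value) for value in fields[2:10]]
    htotal, vtotal = timings[3], timings[7]
    hsync_khz = clock_mhz * 1000.0 / htotal
    vrefresh_hz = clock_mhz * 1_000_000.0 / (htotal * vtotal)
    return hsync_khz, vrefresh_hz
```

For the modeline quoted above this gives roughly 66.2 kHz horizontal and 60.05 Hz vertical, so the numbers themselves do target about 60 Hz. Whether the VESA driver honours a custom modeline is a separate question.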
Comment 4 Charles Butterfield 2006-11-06 20:01:58 UTC
Hmmm.  Looks like an OS issue.  After stepping through the Xorg scan with gdb I
noticed that we are stopping after scanning 14 PCI devices, although my video
card is the 15th (and last).

It turns out that there is a mismatch between the contents of
/proc/bus/pci/devices (14 devices) and the nodes in /proc/bus/pci/xx/* (which
number 15).

The device missing from /proc/bus/pci/devices is /proc/bus/pci/00/06.0.

Xorg is getting a count of PCI devices by counting the lines in
/proc/bus/pci/devices (function xf86OSLinuxGetPciDevs in lnx_pci.c).  Since this
is missing one PCI device, the subsequent scan stops prematurely, which is only a
problem if your video device is the last one.  Mine is.
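
The mismatch described above can be checked without a debugger. The following is a rough Python sketch, not the server's actual logic: it counts the lines in the `devices` index, lists the per-device nodes, and reports the nodes that a scan limited to the index count would never reach. The function names are hypothetical, and the sketch assumes two-digit hexadecimal bus directories (no PCI domain prefix) and a scan that visits nodes in bus/device/function order.

```python
import re
from pathlib import Path

# Assumed naming: bus directories like "03", nodes like "0e.0".
BUS_RE = re.compile(r"^[0-9a-f]{2}$")
NODE_RE = re.compile(r"^[0-9a-f]{2}\.[0-7]$")


def count_index_entries(root: Path) -> int:
    """Count the non-empty lines in the 'devices' index (one per device)."""
    text = (root / "devices").read_text()
    return sum(1 for line in text.splitlines() if line.strip())


def list_device_nodes(root: Path) -> list[str]:
    """List per-device nodes such as '03/0e.0' in bus/device/function order."""
    nodes = []
    for bus_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if not BUS_RE.match(bus_dir.name):
            continue
        for node in sorted(bus_dir.iterdir()):
            if NODE_RE.match(node.name):
                nodes.append(f"{bus_dir.name}/{node.name}")
    return nodes


def nodes_beyond_count(root: Path = Path("/proc/bus/pci")) -> list[str]:
    """Nodes that a scan stopping at the index count would not reach."""
    return list_device_nodes(root)[count_index_entries(root):]
```

On the system described here, `nodes_beyond_count()` would be expected to return the single node for the video card, since the index lists 14 devices while 15 nodes exist. An empty list means the two views agree in number.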

So there is clearly an OS problem, which I will try to figure out how to submit.
Can anybody suggest where?  For that matter, is anybody reading this stuff?
Maybe a triage team?  Some feedback would make me feel less lonely :-)
Comment 5 Simon Dean 2007-02-13 02:05:28 UTC
I have the same issue also on FC6.

lspci:
00:00.0 Host bridge: Intel Corporation 82875P/E7210 Memory Controller Hub (rev 02)
00:03.0 PCI bridge: Intel Corporation 82875P/E7210 Processor to PCI to CSA Bridge (rev 02)
00:1c.0 PCI bridge: Intel Corporation 6300ESB 64-bit PCI-X Bridge (rev 02)
00:1d.0 USB Controller: Intel Corporation 6300ESB USB Universal Host Controller (rev 02)
00:1d.4 System peripheral: Intel Corporation 6300ESB Watchdog Timer (rev 02)
00:1d.5 PIC: Intel Corporation 6300ESB I/O Advanced Programmable Interrupt Controller (rev 02)
00:1d.7 USB Controller: Intel Corporation 6300ESB USB2 Enhanced Host Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 0a)
00:1f.0 ISA bridge: Intel Corporation 6300ESB LPC Interface Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 6300ESB SATA Storage Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 6300ESB SMBus Controller (rev 02)
01:01.0 Ethernet controller: Intel Corporation 82547GI Gigabit Ethernet Controller
03:03.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
03:0e.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)

Xorg -scanpci

(0:0:0) Intel Corporation 82875P/E7210 Memory Controller Hub
(0:3:0) Intel Corporation 82875P/E7210 Processor to PCI to CSA Bridge
(0:6:0) Intel Corporation 82875P/E7210 Processor to I/O Memory Interface
(0:28:0) Intel Corporation 6300ESB 64-bit PCI-X Bridge
(0:29:0) unknown card (0x1028/0x0167) using a Intel Corporation 6300ESB USB Universal Host Controller
(0:29:4) unknown card (0x1028/0x0167) using a Intel Corporation 6300ESB Watchdog Timer
(0:29:5) unknown card (0x1028/0x0167) using a Intel Corporation 6300ESB I/O Advanced Programmable Interrupt Controller
(0:29:7) unknown card (0x1028/0x0167) using a Intel Corporation 6300ESB USB2 Enhanced Host Controller
(0:30:0) Intel Corporation 82801 PCI Bridge
(0:31:0) Intel Corporation 6300ESB LPC Interface Controller
(0:31:2) unknown card (0x1028/0x0167) using a Intel Corporation 6300ESB SATA Storage Controller
(0:31:3) unknown card (0x1028/0x0167) using a Intel Corporation 6300ESB SMBus Controller
(1:1:0) unknown card (0x1028/0x0167) using a Intel Corporation 82547GI Gigabit Ethernet Controller
(3:3:0) unknown card (0x10b7/0x1000) using a 3Com Corporation 3c905C-TX/TX-M [Tornado] 

There's an extra device that comes up in the -scanpci output: (0:6:0), the Processor to I/O Memory Interface. The VGA controller at 03:0e.0 is missing from it.
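
The two listings above are awkward to compare by eye because lspci prints bus:device.function in hexadecimal while the `Xorg -scanpci` lines above use decimal (00:1c.0 corresponds to (0:28:0)). A small Python sketch that normalizes both and reports the differences follows; the function names are invented for this example and the regular expressions only cover the line shapes shown in this comment.

```python
import re

# lspci lines start with "bb:dd.f" in hexadecimal.
LSPCI_RE = re.compile(r"^([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])")
# Xorg -scanpci lines start with "(b:d:f)" in decimal.
SCANPCI_RE = re.compile(r"^\((\d+):(\d+):(\d+)\)")


def parse_lspci(text: str) -> set:
    """Collect (bus, device, function) tuples from lspci output."""
    found = set()
    for line in text.splitlines():
        match = LSPCI_RE.match(line.strip())
        if match:
            found.add(tuple(int(part, 16) for part in match.groups()))
    return found


def parse_scanpci(text: str) -> set:
    """Collect (bus, device, function) tuples from Xorg -scanpci output."""
    found = set()
    for line in text.splitlines():
        match = SCANPCI_RE.match(line.strip())
        if match:
            found.add(tuple(int(part) for part in match.groups()))
    return found


def diff_listings(lspci_text: str, scanpci_text: str):
    """Return (only in lspci, only in scanpci) as sorted lists of tuples."""
    lspci = parse_lspci(lspci_text)
    scanpci = parse_scanpci(scanpci_text)
    return sorted(lspci - scanpci), sorted(scanpci - lspci)
```

Fed the two listings from this comment, the first list should contain (3, 14, 0), the Rage XL, and the second (0, 6, 0), the device that only the server sees.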

Cheers
Simon
Comment 6 Simon Dean 2007-02-13 02:19:38 UTC
You got yourself a Dell PowerEdge 700 by the way?

The device you mention, /proc/bus/pci/00/06.0, is missing in FC4, so Xorg appears to work... though it is an older version.

Looking forward to a fix. I don't have the skills to hack this code and recompile it.

Cheers
Simon
Comment 7 Daniel Stone 2007-02-27 01:34:23 UTC
Sorry about the phenomenal bug spam, guys.  Adding xorg-team@ to the QA contact so bugs don't get lost in future.
Comment 8 Andreas Mohr 2007-02-28 09:04:36 UTC
HERE COMES A SUCCESSFUL WORKAROUND: use ServerFlags PCIProbe1 true (or similar flags as found via Google). That fixed it for my Silicon Motion SM712 LynxEM+, which also failed to get enumerated (on 7.1.1).

The device is the last one in the lspci output, PCI ID is 02:09.0 (i.e. not on the primary bus!).
Since it's basic enumeration which is broken, X.org was totally unable to "see" the device, thus specifying BusID manually obviously didn't help either.
The VESA driver worked without any PCI hacks for me, too, BTW (most likely since it uses abstract device access via VESA interface calls).

A solution to this annoyance (most John Doe users just give up immediately on such an issue!) is highly desired, thus Priority "high" sounds fine.

OK, Charles Butterfield is entirely correct, I also have an additional /proc/bus/pci/00/06.0 device listed on my 2.6.18-1.2747.el5 kernel (RHEL5 beta 2) which causes the off-by-1 issue. I will investigate whether this issue is fixed in a current 2.6.20 kernel, otherwise Greg KH should immediately be contacted about this.
dmesg investigation shows a
"PCI: unable to reserve mem region #1:1000@fecf0000 for device 0000:00:06.0"
message, which makes the mismatch a bit more plausible (failed semi-registration of this device).

hexdump on /proc/bus/pci/00/06.0 shows PCI IDs 0x8086 0x257e, which is:

http://www.pcidatabase.com/vendor_details.php?id=1302

0x257E
Chip Number: 82875P/E7210
Chip Description: Overflow Configuration
Notes: delete

And what the heck would that be??? If this is an official entry by Intel, then "delete" may mean that OSes should completely disregard this device?
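
The hexdump step above can also be done programmatically. PCI configuration space starts with the 16-bit vendor ID followed by the 16-bit device ID, both little-endian, so the first four bytes of a node are enough. A minimal Python sketch, where `read_pci_ids` is a name made up for this example:

```python
def read_pci_ids(config_space: bytes) -> tuple[int, int]:
    """Return (vendor_id, device_id) from the start of PCI config space."""
    if len(config_space) < 4:
        raise ValueError("need at least 4 bytes of PCI configuration space")
    # Offset 0: vendor ID, offset 2: device ID, both little-endian 16-bit.
    vendor_id = int.from_bytes(config_space[0:2], "little")
    device_id = int.from_bytes(config_space[2:4], "little")
    return vendor_id, device_id
```

Reading four bytes from /proc/bus/pci/00/06.0 in binary mode and passing them to `read_pci_ids` should give the pair reported above, 0x8086 and 0x257e, on the affected machines.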

Thanks,

Andreas Mohr
Comment 9 Christian Hund 2007-02-28 23:40:55 UTC
*** Bug 9916 has been marked as a duplicate of this bug. ***
Comment 10 Andreas Mohr 2007-03-01 10:39:45 UTC
Just tested on 2.6.20-ck1.
The mem region bootup error does NOT exist there, and the 06.0 device is missing in both /proc places, in other words /proc output is consistent (13 devices listed each).
However, the bootup error message being non-existent means that 06.0 device handling WORKED this time, yet the fundamental problem (/proc mismatch in case of FAILED device initialization) may still exist. That is why I just contacted Greg KH to ask whether there's actually still a /proc mismatch issue in current kernels.
Comment 11 Greg Kroah-Hartman 2007-03-02 20:43:53 UTC
Yes, the kernel is skipping initializing that device, and because of that, you
might want to take its advice (it should say at boot time what the kernel command line option is to hopefully fix it.)

But, even though that happens, X shouldn't be missing your video card.  That's a bug in X.
Comment 12 Adam Jackson 2007-03-11 18:51:38 UTC
(In reply to comment #11)
> Yes, the kernel is skipping initializing that device, and because of that, you
> might want to take its advice (it should say at boot time what the kernel
> command line option is to hopefully fix it.)

We hit this in RHEL a while ago. The driver for the 82875 EDAC widget is missing a step, so the device never shows up in /proc/bus/pci/devices.  I would posit that it's absolutely insane that the kernel has two uncorrelated lists of PCI devices.

You want something like this (lifted from the RHEL kernel SRPM):

--- linux-2.6.18.noarch/drivers/edac/i82875p_edac.c.orig
+++ linux-2.6.18.noarch/drivers/edac/i82875p_edac.c
@@ -261,10 +261,6 @@ static void i82875p_check(struct mem_ctl
        i82875p_process_error_info(mci, &info, 1);
 }
 
-#ifdef CONFIG_PROC_FS
-extern int pci_proc_attach_device(struct pci_dev *);
-#endif
-       
 /* Return 0 on success or 1 on failure. */
 static int i82875p_setup_overfl_dev(struct pci_dev *pdev,
                struct pci_dev **ovrfl_pdev, void __iomem **ovrfl_window)
@@ -287,17 +283,12 @@ static int i82875p_setup_overfl_dev(stru

                if (dev == NULL)
                        return 1;
+
+               pci_bus_add_device(dev);
        }

        *ovrfl_pdev = dev;

-#ifdef CONFIG_PROC_FS
-       if ((dev->procent == NULL) && pci_proc_attach_device(dev)) {
-               i82875p_printk(KERN_ERR, "%s(): Failed to attach overflow "
-                              "device\n", __func__);
-               return 1;
-       }
-#endif  /* CONFIG_PROC_FS */
        if (pci_enable_device(dev)) {
                i82875p_printk(KERN_ERR, "%s(): Failed to enable overflow "
                               "device\n", __func__);

> But, even though that happens, X shouldn't be missing your video card.  That's
> a bug in X.

Certainly.  pci-rework will save us all...
Comment 13 Charles Butterfield 2007-04-15 16:47:10 UTC
My problem has been resolved by the latest Fedora Core 6 release, as discussed in the associated Fedora bug report at https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=214050

FYI - my most recent post thereto was as follows:

My problem is resolved by FC6 kernel 2.6.20-1.2944.fc6.  In this release the contents of /proc/bus/pci/devices and the nodes in /proc/bus/pci/xx/* agree in the number of devices (both indicate 15).

The previous release (2.6.20-1.2933.fc6) did NOT resolve the problem, so thank you to whoever fixed the problem between 2933 and 2944.

I have no idea if ALL of the issues discussed on this list have been fixed.  I suspect not, since there seem to be several different chunks of code that need to arrive at the same conclusion about what PCI devices exist, which is a recipe for future problems.  At present, on my particular hardware configuration, the various code paths seem to be in agreement.

Thanks again to all concerned!

