Bug 7248 - Multiple domains no longer work _even_ with disjoint bus numbers.
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/General
Version: git
Hardware: Other Linux (All)
Importance: high normal
Assignee: Xorg Project Team
Blocks: xorg-7.3 xorg-7.4
Reported: 2006-06-16 02:48 UTC by Marcin Kurek
Modified: 2008-04-08 08:54 UTC (History)

Attachments
Xorg log file from faulty start. (17.04 KB, text/plain)
2006-06-16 02:49 UTC, Marcin Kurek
Force use /proc (511 bytes, text/plain)
2006-09-22 06:12 UTC, Marcin Kurek
Xorg log from recent xorg-server start (48.81 KB, text/plain)
2008-03-24 12:58 UTC, Marcin Kurek

Description Marcin Kurek 2006-06-16 02:48:52 UTC
Since xorg-server moved to git, I decided to give the recent version a try too.
First I compiled recent snapshots of libX11 and mesa, then xorg-server.

But it doesn't work at all. It seems the radeon driver is unable to locate my
card (Radeon 9000). Looking at the Xorg log file I can see:

...
ATI Radeon X850 XT (R480) (AGP), ATI Radeon X850 XT PE (R480) (AGP)
(EE) No devices detected.

Fatal server error:
no screens found
No devices detected

Previously I used a CVS snapshot from 31.05.06 and it worked without any problems.
I took a quick look at the radeon driver and it seems to die in
radeon_probe.c/RADEONProbe(). I guess xf86GetPciVideoInfo() fails.

Maybe it's related to my machine, since I am using a PPC machine (Pegasos II).
Comment 1 Marcin Kurek 2006-06-16 02:49:29 UTC
Created attachment 5929
Xorg log file from faulty start.
Comment 2 Michel Dänzer 2006-06-16 02:52:20 UTC
Can you try and isolate the guilty commit with git bisect?
Comment 3 Marcin Kurek 2006-06-16 04:51:35 UTC
This problem also appeared with the CVS version from 06.06.06 some time ago. But
maybe it was related to symbols not being exported. I see this was added later on.

I can try to isolate it, but this would take a long time. Xorg takes ages to
compile here ;(
Comment 4 Michel Dänzer 2006-06-16 07:51:18 UTC
It's most likely related to Daniel Stone's PCI changes, so you could start by
verifying that say the tree from May 31st works and 1.1.99.2 doesn't.
Comment 5 Marcin Kurek 2006-06-16 08:58:29 UTC
One point for you. I located both PCI-related patches in git and reverted them:

http://gitweb.freedesktop.org/?p=xorg-xserver;a=commitdiff;h=56f21bda1ce95741c88c423b60bd709eef26eb12
http://gitweb.freedesktop.org/?p=xorg-xserver;a=commitdiff;h=8444bb77c91cf8a23d32b3cc9749e2a3d3f9f9eb

And after that Xorg starts fine.
Comment 6 Marcin Kurek 2006-06-20 15:27:22 UTC
I found a spare moment today, and it seems
http://gitweb.freedesktop.org/?p=xorg-xserver;a=commitdiff;h=56f21bda1ce95741c88c423b60bd709eef26eb12
is responsible for my problems. While looking at this one I also found another
problem with the recent git snapshot: https://bugs.freedesktop.org/show_bug.cgi?id=7285
Comment 7 Daniel Stone 2006-06-20 22:51:48 UTC
sounds about right.  if you revert just the changes to the for loop in
hw/xfree86/os-support/bus/Pci.c (i.e. idx < xf86MaxPciDevices, and idx ==
xf86MaxPciDevices in the if statement), does it work then?
Comment 8 Marcin Kurek 2006-06-21 01:05:05 UTC
Hmmm, it seems I am blind but I am unable to locate any loop with
xf86MaxPciDevices in this patch :/
Comment 9 Daniel Stone 2006-06-21 02:35:19 UTC
argh, sorry, I'm being dumb.

okay, if you force is26 to always be 0 at the top of the patch, does that work?
Comment 10 Marcin Kurek 2006-06-21 05:29:23 UTC
Yes, forcing is26 to 0 makes it work. Hmmm, I see it uses sysfs on 2.6, and as
far as I can see the /sys/bus/pci/XXX/config file it uses looks like binary data.
Could this be endianness-related?
Comment 11 Michel Dänzer 2006-06-21 05:34:43 UTC
Unlikely, works for me.
Comment 12 Marcin Kurek 2006-06-21 05:51:12 UTC
Maybe then it's a bug in Peggy OF ? 
Comment 13 Daniel Stone 2006-06-21 07:38:32 UTC
wouldn't be the first ...
Comment 14 Marcin Kurek 2006-06-22 01:56:20 UTC
Oops ... I looked at the wrong directory. libdrm.so is normally in
/usr/lib/xorg/modules/linux, but the server is unable to load it correctly.
I compared the libdrm binary with an older version (without the recent loader
changes) and it's identical, so I guess the fault is caused by the loader changes.
Comment 15 Marcin Kurek 2006-06-22 01:57:42 UTC
That was for #7285 sorry.
Comment 16 Erik Andren 2006-07-28 10:03:23 UTC
Is this still an issue or has a fix been posted to the git repository?
Comment 17 Daniel Stone 2006-07-28 10:05:54 UTC
no progress yet
Comment 18 David Woodhouse 2006-09-22 04:26:05 UTC
The new code appears to handle only PCI domain #0, while the video card on the
Pegasos is 0001:01:00.0

New gitweb URL for offending patch is
http://gitweb.freedesktop.org/?p=xorg/xserver.git;a=commitdiff;h=56f21bda1ce95741c88c423b60bd709eef26eb12
Comment 19 David Woodhouse 2006-09-22 04:35:11 UTC
It looks like /proc/bus/pci/devices doesn't contain domain information at all --
we should probably be listing the contents of /sys/bus/pci/devices/ instead of
using the old file in /proc.
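
For illustration, a minimal sketch of pulling the domain out of an entry name
under /sys/bus/pci/devices/, where entries are named domain:bus:device.function
in hex (e.g. 0001:01:00.0). The struct and function names are hypothetical and
are not taken from the X server source.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for illustration; not a type from the X server. */
struct pci_addr {
    unsigned int domain;
    unsigned int bus;
    unsigned int dev;
    unsigned int func;
};

/*
 * Parse a sysfs-style PCI address such as "0001:01:00.0".
 * Returns 0 on success, -1 if the string does not match DDDD:BB:DD.F.
 * On failure the contents of *out are unspecified.
 */
static int
parse_pci_addr(const char *name, struct pci_addr *out)
{
    char trailing;

    assert(name != NULL && out != NULL);

    /* All four fields are hexadecimal; the %c catches trailing garbage. */
    if (sscanf(name, "%x:%x:%x.%x%c", &out->domain, &out->bus,
               &out->dev, &out->func, &trailing) != 4)
        return -1;
    return 0;
}
```

Called on each directory entry name, this would give 1 as the domain for the
Pegasos video card mentioned in comment 18, which the /proc listing cannot
provide.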
Comment 20 David Woodhouse 2006-09-22 04:59:00 UTC
Looking at that patch... what purpose does the hunk in linuxPciOpenFile() serve?
Can we just revert that part from the patch (i.e.
http://gitweb.freedesktop.org/?p=xorg/xserver.git;a=blobdiff;h=3e82f211b4cd4e509adb3149c4e291a2592a0ffe;hp=092f28f0398a20a6bc086ad473c358366eab7b26;hb=56f21bda1ce95741c88c423b60bd709eef26eb12;f=hw/xfree86/os-support/bus/linuxPci.c
)? We mustn't switch to /sys/bus/pci/devices until we properly handle PCI domains.
Comment 21 Marcin Kurek 2006-09-22 06:12:31 UTC
Created attachment 7118
Force use /proc

Why not use something like this? I guess this would be easier to revert in the
future, once the /sys code handles PCI domains.
Comment 22 David Woodhouse 2006-09-22 10:13:48 UTC
Once upon a time, we used to handle the existence of multiple PCI domains as
long as the bus numbers were disjoint. Now, we're always assuming PCI domain #0
which may well be wrong -- this is a regression.

Until we handle domains properly, we should revert (or disable) the part of the
patch which touches linuxPciOpenFile() -- it doesn't actually affect the main
purpose of the patch in which it was introduced.
Comment 23 David Woodhouse 2006-09-22 10:14:28 UTC
(In reply to comment #21)
> Created an attachment (id=7118)
> Force use /proc
> 
> Why not use something like this? I guess this would be easier to revert in the
> future, once the /sys code handles PCI domains.

Yes, that would work.
Comment 24 Matthias Hopf 2006-09-26 07:49:44 UTC
I have some patches that add domain support by scanning /sys instead of
/proc; however, they do not 100% work ATM, especially with PPC. I was on
vacation last week, but I will hopefully fix these issues this or next week (and
hopefully the patches will still apply to git ;)
Comment 25 David Woodhouse 2006-09-26 08:01:03 UTC
That isn't necessary. This regression can be fixed trivially by reverting the
offending part of the 'optimisation' patch, which wasn't actually necessary anyway.
Comment 26 Matthias Hopf 2006-09-27 04:01:02 UTC
/proc is considered deprecated, so in the long term it *is* IMHO necessary.
Comment 27 David Woodhouse 2006-09-27 05:36:04 UTC
Yes, in the long term it _is_ necessary to get proper support for multiple
domains. In the meantime, however, we don't have to suffer this regression even
in the case of non-overlapping bus numbers. 

We're using /proc for the enumeration anyway -- which is why we don't have
proper domain numbers and can't just use PCI_TAG_TO_DOM() to get the right value
instead of '0000:'
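
To illustrate the point about the hard-coded '0000:' prefix, here is a sketch of
building the sysfs config path from an explicit domain number rather than
assuming domain 0. The helper name and signature are hypothetical; in the server
the domain would have to come from the enumeration code, which, as noted above,
does not have it while /proc is used.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format the sysfs config-space path for one PCI function.
 * Returns 0 on success, -1 if the buffer is too small.
 * Hypothetical helper for illustration only.
 */
static int
sysfs_config_path(char *buf, size_t len, unsigned int domain,
                  unsigned int bus, unsigned int dev, unsigned int func)
{
    int n;

    assert(buf != NULL);

    /* sysfs names devices as DDDD:BB:DD.F, all fields in hex. */
    n = snprintf(buf, len, "/sys/bus/pci/devices/%04x:%02x:%02x.%01x/config",
                 domain, bus, dev, func);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With domain 1, bus 1, device 0, function 0 this produces the path for
0001:01:00.0, whereas a fixed '0000:' prefix would point at a device that does
not exist on such a machine.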
Comment 28 Daniel Stone 2006-10-28 17:14:44 UTC
presumably the correct fix for 7.2 is just to define is26 = -1.
Comment 29 Matthias Hopf 2006-10-31 03:58:55 UTC
I have working patches for multidomain ia64 support here. Had to have them
tested more, seem to be stable now.

If no one objects I'll include them upstream this week.
Comment 30 Matthias Hopf 2006-11-03 10:14:16 UTC
Committed.

Some parts of the patches still use /proc alone, though.

morgoth, please test that this version works for you. I'm closing that as FIXED
right now, please reopen if there are any issues left.
Comment 31 David Woodhouse 2007-03-07 09:58:21 UTC
Seems broken again in 1.2.99.901

Fixed by http://david.woodhou.se/xorg-x11-server-1.2.99-unbreak-domain.patch
Comment 32 Adam Jackson 2007-04-08 13:38:00 UTC
Move to 7.3 tracker.
Comment 33 Brice Goglin 2007-05-05 02:55:42 UTC
Breakage reported in Debian on sparc64 (see [1]). I think it would also break with David Woodhouse's patch to disable domain support, since is26 would be 1 on this architecture anyway.

What puzzles me is that the breakage seemed to appear while fixing bug #6583 which was just about parsing /proc/bus/pci/devices only once [2]. Switching to /sys/ files at this point seems far from relevant. It looks like /sys/.../config files do not have an ioctl handler as /proc/bus/pci/... files do [3] (PCIIOC_MMAP_IS_IO/MEM is not used anywhere else). I don't see how it could work. And it seems to cause our sparc64 breakage (several ioctls fail and the server aborts in xf86MapPciMem).

Brice

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=422077
[2] http://gitweb.freedesktop.org/?p=xorg/xserver.git;a=commitdiff;h=56f21bda1ce95741c88c423b60bd709eef26eb12;hp=a9ed5a87902a839a5a135af03db78f113b18bd86
[3] http://www.linux-m32r.org/lxr/http/source/drivers/pci/proc.c?v=2.6.19#L205
Comment 34 Jesse Barnes 2007-08-31 10:35:07 UTC
It *looks* like this should be close to fixed.  The only thing that might be missing is support for looking at sysfs in linuxDomainSupport() (it just looks for domain numbers in /proc/bus/pci atm).

Matthias, it looks like the rest of the code should handle domains correctly (at least we're tracking domains in the Linux pci_dev structure), but I'm not sure if the PCITAGs being passed around have the domain encoded...

Jesse
Comment 35 Jesse Barnes 2007-08-31 10:36:06 UTC
Totally untested patch:

--- a/hw/xfree86/os-support/bus/linuxPci.c
+++ b/hw/xfree86/os-support/bus/linuxPci.c
@@ -516,6 +516,10 @@ linuxDomainSupport(void)
     struct dirent *dirent;
     char *end;

+    /* If we have sysfs, we can fill in the domain */
+    if (stat("/sys/bus/pci",&ignored) < 0)
+       return TRUE;
+
     if (!(dir = opendir("/proc/bus/pci")))
        return FALSE;
     while (1) {
Comment 36 Brice Goglin 2007-08-31 13:02:54 UTC
I can't test on a non-x86 machine, but I am still worried by the current code in master.

We (Debian) had to force is26 to 0 in Xserver 1.3. Without this, linuxMapPci opens a /sys/bus/pci/x.y/config file and runs an ioctl on it. Unless I am mistaken, these files don't have ioctl handlers in the kernel, so this was apparently causing failures on powerpc and sparc.

Current master (with pci-rework then) seems to keep doing the same, except if there's a /sys/class/pci_bus/z.t/legacy_mem. So, unless all problematic architectures (including powerpc and sparc) have such a legacy_mem file, I am afraid we would still need to force is26 to 0 in current master to get it to work on 2.6 kernels. I hope I am wrong...
Comment 37 Matthias Hopf 2007-09-03 03:48:36 UTC
I won't be able to look at that in the next 3-4 weeks, after that we should check whether libpciaccess is finally used or not. If not we'll probably have to dig into that, otherwise make libpciaccess run fine on this architecture as well.
Comment 38 Stefan Dirsch 2007-11-04 19:59:05 UTC
Matthias, any updates available?
Comment 39 Marcin Kurek 2008-03-24 12:55:34 UTC
Sorry for the late reply, I have had very little free time over the last few months. I compiled libpciaccess, xf86-video-ati and xorg-server from git (snapshot from yesterday) to verify whether there is still any problem.

The good news is that the X server starts normally without any additional patches to the radeon driver or the server itself, but it seems there is still a problem with DRI, as it doesn't work.

Result of glxinfo:
> name of display: :0.0
> libGL error: open DRM failed (Operation not permitted)
> libGL error: reverting to (slow) indirect rendering
> Segmentation fault

Looking at the server logs I can see it fails in a few places:

drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is -1, (No such device or address)
drmOpenDevice: open result is -1, (No such device or address)
drmOpenDevice: Open failed
drmOpenByBusid: Searching for BusID pci:0001:01:08.0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 6, (OK)
drmOpenByBusid: drmOpenMinor returns 6
drmOpenByBusid: drmGetBusid reports pci:0000:01:08.0

Hmmm, this may be normal, but why does the first open fail? And

drmGetBusid returned ''

Hmm, should this be empty? Anyway, the error is:

drmOpenByBusid: drmOpenMinor returns -19
(EE) AIGLX error: drmOpenOnce failed (Operation not permitted)
(EE) AIGLX: reverting to software rendering

and

(II) RADEON(0): [drm] failure adding irq handler, there is a device already using that irq
[drm] falling back to irq-free operation

I will attach a full log, of course.
Comment 40 Marcin Kurek 2008-03-24 12:58:17 UTC
Created attachment 15432
Xorg log from recent xorg-server start

I wonder why the card is initialized so many times? I see the DDC I2C interface is initialized 3 times.
Comment 41 Adam Jackson 2008-04-07 14:19:32 UTC
I'm going to be wildly optimistic here and call this fixed by pciaccess.  Yay!

Anyone with a domainful machine, please test and reopen this bug if it's still busted for you.  Note, only concerned with X launching here, DRI is a different kettle of worms.
Comment 42 Marcin Kurek 2008-04-07 22:02:23 UTC
Is this a known bug? I mean the DRI one. Or should I create a separate report about that?
Comment 43 David Woodhouse 2008-04-08 01:28:40 UTC
It's working on Pegasos in Fedora 9, certainly.
Comment 44 David Woodhouse 2008-04-08 01:29:58 UTC
Sorry, I mean X launching. I haven't looked at DRI at all.
Comment 45 Michel Dänzer 2008-04-08 01:46:02 UTC
(In reply to comment #42)
> Is this a known bug? I mean the DRI one. Or should I create a separate report
> about that?

See e.g. bug 14326 and duplicate.
Comment 46 Marcin Kurek 2008-04-08 08:54:38 UTC
I think in my case this may be a different problem, since this one should be fixed by libpciaccess? Anyway, I created a report with more details: #15404

