Summary: | Multiple domains no longer work _even_ with disjoint bus numbers. | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | xorg | Reporter: | Marcin Kurek <morgoth6> | ||||||||
Component: | Server/General | Assignee: | Xorg Project Team <xorg-team> | ||||||||
Status: | RESOLVED FIXED | QA Contact: | |||||||||
Severity: | normal | ||||||||||
Priority: | high | CC: | daniel, dberkholz, dwmw2, jbarnes, mat, morgoth6, notting, sndirsch | ||||||||
Version: | git | ||||||||||
Hardware: | Other | ||||||||||
OS: | Linux (All) | ||||||||||
Whiteboard: | |||||||||||
i915 platform: | i915 features: | ||||||||||
Bug Depends on: | |||||||||||
Bug Blocks: | 8888, 10101 | ||||||||||
Attachments: |
Description
Marcin Kurek
2006-06-16 02:48:52 UTC
Created attachment 5929
Xorg log file from faulty start.
Can you try to isolate the guilty commit with git bisect?
This problem also appeared with the CVS version from 06.06.06 some time ago, but maybe it was related to unexported symbols; I see those were added later on. I can try to isolate it, but it would take a long time. Xorg takes ages to compile here ;(
It's most likely related to Daniel Stone's PCI changes, so you could start by verifying that, say, the tree from May 31st works and 1.1.99.2 doesn't.
One point for you. I located both PCI-related patches in git and reverted them:
http://gitweb.freedesktop.org/?p=xorg-xserver;a=commitdiff;h=56f21bda1ce95741c88c423b60bd709eef26eb12
http://gitweb.freedesktop.org/?p=xorg-xserver;a=commitdiff;h=8444bb77c91cf8a23d32b3cc9749e2a3d3f9f9eb
After that, Xorg starts fine.
I found some time today, and it seems http://gitweb.freedesktop.org/?p=xorg-xserver;a=commitdiff;h=56f21bda1ce95741c88c423b60bd709eef26eb12 is responsible for my problems. While looking at this one, I also found another problem with a recent git snapshot: https://bugs.freedesktop.org/show_bug.cgi?id=7285
Sounds about right. If you revert just the changes to the for loop in hw/xfree86/os-support/bus/Pci.c (i.e. idx < xf86MaxPciDevices, and idx == xf86MaxPciDevices in the if statement), does it work then?
Hmmm, maybe I am blind, but I can't find any loop with xf86MaxPciDevices in this patch :/
Argh, sorry, I'm being dumb. Okay, if you force is26 to always be 0 at the top of the patch, does that work?
Yes, forcing is26 to 0 makes it work. Hmmm, I see it uses sysfs on 2.6, and the /sys/bus/pci/XXX/config file it uses looks like binary data; could this be endianness-related?
Unlikely, works for me.
Maybe it's a bug in Peggy OF, then?
Wouldn't be the first...
Oops... I looked in the wrong directory. libdrm.so is normally in /usr/lib/xorg/modules/linux, but the server is unable to load it correctly.
I compared the libdrm binary with an older version (without the recent loader changes) and it's identical, so I guess the fault is caused by the loader changes.
That was for #7285, sorry.
Is this still an issue, or has a fix been posted to the git repository?
No progress yet.
The new code appears to handle only PCI domain #0, while the video card on the Pegasos is 0001:01:00.0.
The new gitweb URL for the offending patch is http://gitweb.freedesktop.org/?p=xorg/xserver.git;a=commitdiff;h=56f21bda1ce95741c88c423b60bd709eef26eb12
It looks like /proc/bus/pci/devices doesn't contain domain information at all; we should probably be listing the contents of /sys/bus/pci/devices/ instead of using the old file in /proc.
Looking at that patch... what purpose does the hunk in linuxPciOpenFile() serve? Can we just revert that part of the patch (i.e. http://gitweb.freedesktop.org/?p=xorg/xserver.git;a=blobdiff;h=3e82f211b4cd4e509adb3149c4e291a2592a0ffe;hp=092f28f0398a20a6bc086ad473c358366eab7b26;hb=56f21bda1ce95741c88c423b60bd709eef26eb12;f=hw/xfree86/os-support/bus/linuxPci.c )?
We mustn't switch to /sys/bus/pci/devices until we properly handle PCI domains.
Created attachment 7118
Force use /proc
Why not use something like this? I guess this would be easier to revert in
the future, when /sys handles PCI domains.
Once upon a time, we used to handle multiple PCI domains as long as the bus numbers were disjoint. Now we always assume PCI domain #0, which may well be wrong; this is a regression. Until we handle domains properly, we should revert (or disable) the part of the patch that touches linuxPciOpenFile(); it doesn't actually affect the main purpose of the patch that introduced it.
(In reply to comment #21)
> Created an attachment (id=7118)
> Force use /proc
>
> Why no use something like this ? I guess this would be easier to revert in
> future when /sys handle pci domains.
Yes, that would work. I have some patches that add domain support by scanning /sys instead of /proc; however, they do not work 100% at the moment, especially on PPC. I was on vacation last week, but hopefully I will fix these issues this week or next (and hopefully the patches will still apply to git ;)
That isn't necessary. This regression can be fixed trivially by reverting the offending part of the 'optimisation' patch, which wasn't actually necessary anyway.
/proc is considered deprecated, so in the long term it *is* IMHO necessary.
Yes, in the long term it _is_ necessary to get proper support for multiple domains. In the meantime, however, we don't have to suffer this regression even in the case of non-overlapping bus numbers. We're using /proc for the enumeration anyway, which is why we don't have proper domain numbers and can't just use PCI_TAG_TO_DOM() to get the right value instead of '0000:'.
Presumably the correct fix for 7.2 is just to define is26 = -1.
I have working patches for multi-domain ia64 support here. They needed more testing and seem to be stable now. If no one objects, I'll include them upstream this week.
Committed. Some parts of the patches still use /proc alone, though. morgoth, please test whether this version works for you.
I'm closing this as FIXED right now; please reopen if there are any issues left.
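To make the '0000:' problem concrete: a device path built from a hard-coded domain 0 names a different (usually nonexistent) device than the card at 0001:01:00.0. Below is a minimal sketch that builds the config-space path from the full address instead. pci_config_path() is a hypothetical helper, its use_sysfs flag merely stands in for the is26 choice between /sys and /proc, and the /proc directory naming for non-zero domains ("DDDD:BB/DD.F") is an assumption about the kernel's layout, not something established in this report.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the config-space path for dom:bus:dev.func
 * into buf. use_sysfs plays the role of is26 (1 = /sys, 0 = /proc).
 * Returns the snprintf() result, i.e. the length the full path needs. */
static int pci_config_path(char *buf, size_t len, int use_sysfs,
                           unsigned dom, unsigned bus, unsigned dev,
                           unsigned func)
{
    if (use_sysfs)
        return snprintf(buf, len,
                        "/sys/bus/pci/devices/%04x:%02x:%02x.%x/config",
                        dom, bus, dev, func);
    /* Assumption: /proc prefixes the bus directory with the domain
     * only when the domain is non-zero. */
    if (dom != 0)
        return snprintf(buf, len, "/proc/bus/pci/%04x:%02x/%02x.%x",
                        dom, bus, dev, func);
    return snprintf(buf, len, "/proc/bus/pci/%02x/%02x.%x", bus, dev, func);
}
```

For the Pegasos card, pci_config_path(buf, sizeof buf, 1, 0x1, 0x01, 0x00, 0) yields /sys/bus/pci/devices/0001:01:00.0/config, whereas passing 0 as the domain yields /sys/bus/pci/devices/0000:01:00.0/config.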
Seems broken again in 1.2.99.901.
Fixed by http://david.woodhou.se/xorg-x11-server-1.2.99-unbreak-domain.patch
Move to 7.3 tracker.
A breakage was reported in Debian on sparc64 (see [1]). I think it would also break with David Woodhouse's patch to disable domain support, since is26 would be 1 on this architecture anyway. What puzzles me is that the breakage seemed to appear while fixing bug #6583, which was just about parsing /proc/bus/pci/devices only once [2]; switching to /sys/ files at that point seems far from relevant. It looks like /sys/.../config files do not have an ioctl handler as /proc/bus/pci/... files do [3] (PCIIOC_MMAP_IS_IO/MEM is not used anywhere else). I don't see how it could work, and it seems to cause our sparc64 breakage (several ioctls fail and the server aborts in xf86MapPciMem).
Brice
[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=422077
[2] http://gitweb.freedesktop.org/?p=xorg/xserver.git;a=commitdiff;h=56f21bda1ce95741c88c423b60bd709eef26eb12;hp=a9ed5a87902a839a5a135af03db78f113b18bd86
[3] http://www.linux-m32r.org/lxr/http/source/drivers/pci/proc.c?v=2.6.19#L205
It *looks* like this should be close to fixed. The only thing that might be missing is support for looking at sysfs in linuxDomainSupport() (it only looks for domain numbers in /proc/bus/pci at the moment). Matthias, it looks like the rest of the code should handle domains correctly (at least we're tracking domains in the Linux pci_dev structure), but I'm not sure whether the PCITAGs being passed around have the domain encoded...
Jesse
Totally untested patch:
--- a/hw/xfree86/os-support/bus/linuxPci.c
+++ b/hw/xfree86/os-support/bus/linuxPci.c
@@ -516,6 +516,10 @@ linuxDomainSupport(void)
     struct dirent *dirent;
     char *end;
 
+    /* If we have sysfs, we can fill in the domain */
+    if (stat("/sys/bus/pci",&ignored) < 0)
+        return TRUE;
+
     if (!(dir = opendir("/proc/bus/pci")))
         return FALSE;
     while (1) {
I can't test on a non-x86 machine, but I am still worried by the current code in master.
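The missing ioctl handler mentioned in the sparc64 analysis can be probed from user space. This is a minimal sketch assuming Linux and the PCIIOC_MMAP_IS_MEM request from the kernel's user-space <linux/pci.h> header (the request handled by the /proc/bus/pci code cited in [3]); request_mem_mapping() is a hypothetical helper. On a file whose driver implements no ioctls, the call is expected to fail (typically with ENOTTY), which would be consistent with the failing ioctls and the abort in xf86MapPciMem described above.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/pci.h>   /* PCIIOC_MMAP_IS_MEM (Linux user-space header) */

/* Hypothetical helper: open a PCI device file and ask the kernel to treat
 * a later mmap() of it as memory space. Returns 0 on success, or -1 with
 * errno set (e.g. ENOTTY if the file has no ioctl handler). */
static int request_mem_mapping(const char *path)
{
    int fd = open(path, O_RDWR);
    int ret, saved_errno;

    if (fd < 0)
        return -1;
    ret = ioctl(fd, PCIIOC_MMAP_IS_MEM);
    saved_errno = errno;   /* close() may overwrite errno */
    close(fd);
    errno = saved_errno;
    return ret < 0 ? -1 : 0;
}
```

Running it as root against a /proc/bus/pci/BB/DD.F file and against the matching sysfs config file would show whether the two behave differently on a given kernel; opening either file for writing normally requires root.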
We (Debian) had to force is26 to 0 in Xserver 1.3. Without this, linuxMapPci opens a /sys/bus/pci/x.y/config file and runs an ioctl on it. Unless I am mistaken, these files don't have ioctl handlers in the kernel, so this was apparently causing failures on powerpc and sparc. Current master (with pci-rework, then) seems to keep doing the same, except if there's a /sys/class/pci_bus/z.t/legacy_mem. So, unless all problematic architectures (including powerpc and sparc) have such a legacy_mem file, I am afraid we would still need to force is26 to 0 in current master to get it to work on 2.6 kernels. I hope I am wrong...
I won't be able to look at that in the next 3-4 weeks. After that we should check whether libpciaccess is finally used or not. If not, we'll probably have to dig into that; otherwise, we should make libpciaccess run fine on this architecture as well.
Matthias, any updates available?
Sorry for the late reply; I have had little free time over the last few months. I compiled libpciaccess, xf86-video-ati and xorg-server from git (yesterday's snapshot) to check whether there are any problems.
The good news is that the X server starts normally without any additional patches to the radeon driver or the server itself, but there still seems to be a problem with DRI, as it doesn't work.
Result of glxinfo:
> name of display: :0.0
> libGL error: open DRM failed (Operation not permitted)
> libGL error: reverting to (slow) indirect rendering
> Segmentation fault
Looking at the server log, I can see it fails in a few places:
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is -1, (No such device or address)
drmOpenDevice: open result is -1, (No such device or address)
drmOpenDevice: Open failed
drmOpenByBusid: Searching for BusID pci:0001:01:08.0
drmOpenDevice: node name is /dev/dri/card0
drmOpenDevice: open result is 6, (OK)
drmOpenByBusid: drmOpenMinor returns 6
drmOpenByBusid: drmGetBusid reports pci:0000:01:08.0
Hmmm, this may be normal, but why does it fail at the first open? And
drmGetBusid returned ''
Hmm, should this be empty? Anyway, the error is:
drmOpenByBusid: drmOpenMinor returns -19
(EE) AIGLX error: drmOpenOnce failed (Operation not permitted)
(EE) AIGLX: reverting to software rendering
and
(II) RADEON(0): [drm] failure adding irq handler, there is a device already using that irq
[drm] falling back to irq-free operation
I will attach the full log, of course.
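In the log above, drmOpenByBusid searches for pci:0001:01:08.0 while drmGetBusid reports pci:0000:01:08.0, so the two differ only in the domain field (and a later query returns an empty string). A minimal sketch of a domain-aware BusID comparison shows why such a pair would not match. parse_busid() and busid_equal() are hypothetical helpers, not libdrm's own matching code.

```c
#include <stdio.h>

/* Hypothetical helper: parse "pci:DDDD:BB:DD.F" into v[0..3]
 * (domain, bus, device, function). Returns 1 on success, 0 otherwise. */
static int parse_busid(const char *s, unsigned v[4])
{
    int consumed = 0;

    if (sscanf(s, "pci:%x:%x:%x.%x%n", &v[0], &v[1], &v[2], &v[3],
               &consumed) != 4)
        return 0;
    return s[consumed] == '\0';   /* reject trailing garbage */
}

/* Hypothetical helper: 1 if both BusIDs parse and all four fields,
 * including the domain, are equal; 0 otherwise. */
static int busid_equal(const char *a, const char *b)
{
    unsigned x[4], y[4];
    int i;

    if (!parse_busid(a, x) || !parse_busid(b, y))
        return 0;
    for (i = 0; i < 4; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}
```

With this comparison, the searched-for pci:0001:01:08.0 and the reported pci:0000:01:08.0 do not match, and neither does the empty BusID.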
Created attachment 15432
Xorg log from recent xorg-server start
I wonder why the card is initialized so many times? I see the DDC I2C interface is initialized 3 times.
I'm going to be wildly optimistic here and call this fixed by pciaccess. Yay! Anyone with a domainful machine, please test and reopen this bug if it's still busted for you. Note: this is only concerned with X launching; DRI is a different kettle of worms.
Is this a known bug? I mean the DRI one, or should I create a separate report about that?
It's working on Pegasos in Fedora 9, certainly.
Sorry, I mean X launching. I haven't looked at DRI at all.
(In reply to comment #42)
> Is this a known bug ? I mean the DRI one or I should create a separate report
> here about that ?
See e.g. bug 14326 and its duplicate.
I think in my case this may be a different problem, as this one should be fixed thanks to libpciaccess? Anyway, I created a report with more details: #15404