The int10 module directly accesses PCI config space via the 0xCF8 and 0xCFC registers. This causes system lockups on multiprocessor systems when multiple processes are trying to access PCI config space.
Also, we currently have code to use the platform specific access method that coordinates access, but it was #ifdef'd out. Egbert and I discussed this issue and we agreed that the #ifdefs were not necessary. The patch that I will attach next removes them and fixes this bug.
Created attachment 2285 [details] [review] Use system method to access PCI config space Credit for this patch goes to Dell.
Comment on attachment 2285 [details] [review] Use system method to access PCI config space Make the int10 module use the system method to access PCI config space. Egbert and I reviewed this patch. It has already been checked into HEAD.
This patch breaks Video POST code in i810 driver! xf86Int10InfoPtr pInt = xf86InitInt10(pI830->pEnt->index); xf86DrvMsg(pScrn->scrnIndex, X_INFO, "SetDisplayDevices failed, re-trying.\n"); /* Now perform our warm boot */ if (pInt) { pInt->num = 0xe6; xf86ExecX86int10 (pInt); xf86FreeInt10 (pInt); xf86DrvMsg(pScrn->scrnIndex, X_INFO, "Re-POSTing via int10.\n"); } This code with "Use system method to access PCI config space" messes up registers of video card. Without that patch it works OK.
Alan, any ideas on this one? I'm at a loss here why this change would cause problems with the i810.
Created attachment 2531 [details] [review] Initialize val to 0 I'm not sure yet, but a quick glance at the previous patch on this bug shows that maybe this will help.
This needs to be done in helper_exec.c and it all works. void x_outl(CARD16 port, CARD32 val) { #ifdef PRINT_PORT ErrorF(" outl(%#x, %8.8x)\n", port, val); #endif - if (!pciCfg1out(port, val)) + /* if (!pciCfg1out(port, val))*/ outl(Int10Current->ioBase + port, val); }
(In reply to comment #6) > Created an attachment (id=2531) [edit] > Initialize val to 0 > > I'm not sure yet, but a quick glance at the previous patch on this bug shows > that maybe this will help. I think the two cases are covered since val will be set in the pciCfg1in function if it returns 1, and if it returns 0, then val will be set by the call to inl().
(In reply to comment #7) > This needs to be done in helper_exec.c and it all works. > > void > x_outl(CARD16 port, CARD32 val) > { > #ifdef PRINT_PORT > ErrorF(" outl(%#x, %8.8x)\n", port, val); > #endif > - if (!pciCfg1out(port, val)) > + /* if (!pciCfg1out(port, val))*/ > outl(Int10Current->ioBase + port, val); > } > Unfortunately, this will cause lockups (especially on multiprocessor systems). The problem is that when the X server is trying to access PCI config space through CF8/CFC, there may be another process that is trying to do the same thing. If, for example, X sets the config addr and a second process sets it to something else, then when the X reads/writes the value with a call to x_inl/x_outl, the it will be reading/writing from the incorrect config space addr. The same is true if you reverse the roles of the two processes. Using the pciReadLong/pciWriteLong functions make the reading/writing of PCI config space atomic. I'm suprised this caused problems on the i810. Also, adding Alan back to CC list...
Alan, might this be caused by only having x_inl/x_outl use the system method? We could easily make the same changes to x_inb/x_outb and x_inw/x_outw.
(In reply to comment #9) > Unfortunately, this will cause lockups (especially on multiprocessor systems). > The problem is that when the X server is trying to access PCI config space > through CF8/CFC, there may be another process that is trying to do the same > thing. When I add ErrorF to see which address is used in x_outl, then I have full log of messages like this: outl(0x1800, 00061204) outl(0x1804, 00000001) that is far different address then 0xcf8 or 0xcfc. So it is strange for me why should pciCfg1out even influence the behaviour, but maybe I'm something missing.
Created attachment 2538 [details] [review] Write 0xCF8 without value This patch fixes the problem. In the patches case it never writes 0xCF8 until 0xCFC is written. I've not debugged exactly why, but obviously something needs 0xCF8 to be written. Lukas tested this patch and it works.
(In reply to comment #12) > This patch fixes the problem. In the patches case it never writes 0xCF8 until > 0xCFC is written. I've not debugged exactly why, but obviously something needs > 0xCF8 to be written. Lukas tested this patch and it works. Alan, this sound suspiciously like someone is setting the pci config space address with outl at cf8 and then using an inb/w or outb/w at cfc. I'll create a patch to make byte and word reads/writes of config space also use the system method.
Created attachment 2540 [details] [review] Use system method for byte and word accesses also Lukas, could you test this patch without Alan's patch to see if it also fixes the problem?
O.k, but it's not the driver that's mucking about with config space. It's more than likely the Video BIOS that's doing it.
(In reply to comment #15) > O.k, but it's not the driver that's mucking about with config space. It's more > than likely the Video BIOS that's doing it. Sorry, I meant the BIOS. I'm guessing that the BIOS on Lukas' system is trying to read/write bytes and/or words from config space instead of just longs. This is the first one that I've seen do this, but I'm not surprised. Anyway, I think we need to use the system method for all accesses to config space; otherwise, we will eventually end up with problems.
(In reply to comment #14) > Created an attachment (id=2540) [edit] > Use system method for byte and word accesses also > > Lukas, could you test this patch without Alan's patch to see if it also fixes > the problem? Just partly. Video registers are mostly restored ok (about 50 pixels stripe on the top of screen is ok, but the rest contains garbage and seems like different resolution)
(In reply to comment #17) > Just partly. Video registers are mostly restored ok (about 50 pixels stripe on > the top of screen is ok, but the rest contains garbage and seems like different > resolution) Anyway, tomorrow I will get new notebook, so I will be unable to test this issue any more. (It will have i915 video card, but maybe better bios)
> Anyway, tomorrow I will get new notebook, so I will be unable to test this issue > any more. (It will have i915 video card, but maybe better bios) Lukas, thank you very much for testing the patches. I've found a couple problems with my patch. I'll attach another one shortly. If you still have the laptop today, could you try the next patch out? Also, could you let me know the make and model of your laptop so that I can see if I can find a similar one here for testing?
Created attachment 2571 [details] [review] Use system method for byte/word accesses (take 2) This one fixes the masking problems with the config address.
(In reply to comment #20) > Created an attachment (id=2571) [edit] > Use system method for byte/word accesses (take 2) > > This one fixes the masking problems with the config address. Did not help. I have Acer TM 242FX laptop.
(In reply to comment #20) > Created an attachment (id=2571) [edit] > Use system method for byte/word accesses (take 2) > > This one fixes the masking problems with the config address. > However, it does not fail completely, after VT switch to text console and back it is ok. But this does not happen with previous version without pciCfg patches or with Alan's patch.
(In reply to comment #22) > However, it does not fail completely, after VT switch to text console and back > it is ok. But this does not happen with previous version without pciCfg patches > or with Alan's patch. That's very interesting. I'd like to figure out what's going on here, but that will probably not be possible until I get a system that can reproduce the problem. In the mean time, we could either combine Alan's and my later patch, or we could just use Alan's patch. Alan, any preferences?
For now, I think we should probably combine them. Undoubtably the system method is the right way, but at least my workaround still uses the system method to avoid the lockups. Thanks Kevin.
Egbert asked that this be assigned to him, so that he can look into it.
Just to say that it would appear that this bug is one of the two reasons for problems with Matrox G550 cards (and others) in FC3Tx and FC4 https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=161242 and others! The gcc volatile problem causes the Cont/Alt/F1-6 problem The removal of #Patch9337: xorg-x11-6.8.2-use-linux-native-pciscan-by-default.patch solves the Cont/Alt/F7 corruption problem See comment 132 among many others https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=161242#c132 I apologise if this cross reference is well known and under control
Could it be that the BIOS outl at 0xCF8 and inb at 0xCFD (or 0xCFE or 0xCFF) ? It would explain why Alan's workaround works whereas Kevin's patch misses inb.
Created attachment 3129 [details] [review] fixes endian problem with swapped r and b colors This patch is based on Kevin's patch. It considers odd video bioses that access bytes or words at 0xCF9, 0xCFA, 0xCFB and 0xCFD, 0xCFE, 0xCFF. Unfortunatly, I dont't have such a hardware. Can someone test this?
Created attachment 3130 [details] Xorg.0.log shows 1600x1200 mode settings and the error message Errr, sorry, it should be better with this one.
I have tested the extra patch on Athlon64 i386 FC4 G550 I used Mike's latest xorg-x11-6.8.2-43.src.rpm and the "standard" FC4 gcc, not the very latest modified one. I works !!!!!!!!!!! No screen corruption either at Cont/Alt/F7 or Cont/Alt/F1-6 I have only done superficial couple of boots/reboots Hope other people find the same Great work Regards John
Created attachment 3135 [details] Server side of SolarisIA extension Fixed interval in pciCfg1outb().
Oliver, great catch. I'll see if we can test this shortly, and post the results. Thanks!
We've tested Olivier's updated patch in rawhide for a while now and it appears to be working as expected. I'm going to check his patch in to CVS to get additional testing. If no further problems are found, we can close this bug. Unfortunately, it seems that fdo bugzilla has lost his patch as a result of the recent fdo hard disk failure, so I'll attach it here shortly plus a version with both fixes in a single patch that could be applied to the 6.8 branch Thanks again Olivier!
Created attachment 2987 [details] [review] Use system method for byte/word accesses (take 3) This is the patch from Olivier Baudron that was lost in the recent fdo disk crash.
Created attachment 2989 [details] [review] Use system method for all accesses This is the combined patch to be nominated for 6.8.3.
Comment on attachment 2989 [details] [review] Use system method for all accesses Updated patch to make the int10 module use the system method to access PCI config space. It has already received testing in rawhide and has been checked into CVS HEAD. This patch combines the previous two patches into one patch to be applied to the next 6.8 branch release.
marking as FIXED since this is applied to HEAD. for 6.8.3, search on the approval flag, not on bug open status.
Created attachment 3382 [details] [review] Patch to implement byte and word level pci config access The PCI spec only specifies that access to the standardized header of the PCI config space shall not have side effects when reading and writing from it. Data beyond that header doesn't seem to have that guarantee. The matrox vesa bios generates byte reads and writes beyond the standard header, and when using the linux /proc way of accessing PCI config space, everything is implemented as 32 bit reads and writes. This apparently breaks the vesa driver on matrox cards. The /proc method will do outw and outb if you do 2-byte and 1-byte reads from the /proc file, and this updated patch adds support for that. With this change, the vesa driver works on matrox again.
Reopening bug for discussion of updated patch.
We've had this patch in the Red Hat and Fedora RPMs for a while now and it's working pretty well. I'll commit this to HEAD in a few days unless somebody sees a problem with it. Kristian
Patch committed to HEAD.
Comment on attachment 3382 [details] [review] Patch to implement byte and word level pci config access This is a better fix for pci config space accesses.
Kristian, it may exist 2 small typos in your final patch: --- xorg-x11-6.8.2-use-linux-native-pciscan-by-default.patch 2006-03-02 21:19:21.000000000 +0100 +++ xorg-x11-6.8.2-use-linux-native-pciscan-by-default.patch 2006-03-02 21:25:19.000000000 +0100 @@ -121,7 +121,7 @@ +linuxPciCfgReadWord(PCITAG tag, int off) +{ + int fd; -+ CARD16 val = 0xff; ++ CARD16 val = 0xffff; + + if (-1 != (fd = linuxPciOpenFile(tag))) { + lseek(fd, off, SEEK_SET); @@ -163,7 +163,7 @@ + + if ((bus >= 0) && ((bus < pciNumBuses) || inProbe) && pciBusInfo[bus] && + pciBusInfo[bus]->funcs->pciReadWord) { -+ CARD32 rv = (*pciBusInfo[bus]->funcs->pciReadWord)(tag, offset); ++ CARD16 rv = (*pciBusInfo[bus]->funcs->pciReadWord)(tag, offset); + + return(rv); + } else {
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.