Bug 8513

Summary: 64-bit Base Address Register truncated to 32 bits
Product: xorg
Reporter: Michael Werner <werner>
Component: Server/General
Assignee: Ian Romanick <idr>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: major    
Priority: high    
Version: 7.2 (2007.02)   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Attachments:
Description Flags
Xserver log file when BAR size is 512MB none

Description Michael Werner 2006-10-05 14:16:50 UTC
An x86_64 system running Ubuntu 6.06 fails to start the X server when a HyperTransport device
in the system advertises a 64-bit PCI memory BAR sized larger than or equal to 4GB.
When the BAR size is 2GB, the Xserver runs.
Comment 1 Michael Werner 2006-10-05 14:58:35 UTC
In xf86str.h:

pciVideoRec has a field "int size[6]" that should be "unsigned long size[6]".
Comment 2 Michael Werner 2006-10-05 15:03:30 UTC
Nope, looks like the size is tracked in terms of bits so "int" works there.
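
For illustration, a minimal sketch of why an "int" is enough if the field holds a bit count (the base-2 logarithm of the BAR size) rather than a byte count. The helper names are hypothetical and not part of xf86str.h; treating size[] as a log2 value is an assumption based on the comment above.

```python
def size_to_bits(size_bytes: int) -> int:
    """Return log2 of a power-of-two BAR size (the 'size in bits')."""
    if size_bytes <= 0 or size_bytes & (size_bytes - 1):
        raise ValueError("BAR sizes are powers of two")
    return size_bytes.bit_length() - 1


def bits_to_size(bits: int) -> int:
    """Expand a bit count back into a size in bytes."""
    return 1 << bits


# Bit counts for a 4GB BAR and for the 256MB framebuffer BAR.
four_gb_bits = size_to_bits(4 << 30)
fb_bits = size_to_bits(256 << 20)
```

The exponent itself always fits in an int. Any code that expands it back into a byte count still needs a type wider than 32 bits once BARs of 4GB and up are involved.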
Comment 3 Michael Werner 2006-12-27 13:31:39 UTC
Onboard ATI Rage XL chip support works.
However, Nvidia binary driver version 9631 causes a board reset even with KDB enabled in the kernel.
Comment 4 Michael Werner 2007-01-06 18:07:28 UTC
0000:02:00.0 VGA compatible controller: nVidia Corporation NV43 [GeForce 6600] (rev a2) (prog-if 00 [VGA])
        Subsystem: ASUSTeK Computer Inc.: Unknown device 81b0
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Memory at f0000000 (32-bit, non-prefetchable) [size=64M]
        Memory at c000000000 (64-bit, prefetchable) [size=256M]
        Memory at f4000000 (64-bit, non-prefetchable) [size=16M]
        Expansion ROM at f5000000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 2
        Capabilities: [68] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [78] #10 [0001]

console log shows after running startx:
(WW) ****INVALID MEM ALLOCATION**** b: 0xc000000000 e: 0xc00fffffff correcting
Requesting insufficient memory window!: start: 0x0 end: 0xfffffff size 0xc010000000
Requesting insufficient memory window!: start: 0xf0000000 end: 0xf50fffff size 0xc010000000
(EE) Cannot find a replacement memory range
Comment 5 Michael Werner 2007-01-22 09:55:49 UTC
Created attachment 8477 [details]
Xserver log file when BAR size is 512MB 

Even though the Xserver and Nvidia driver appear to work, the log file shows a problem:
the Nvidia framebuffer region 0xfce0000000 shows up at 0xe0000000 after the mem window
is supposedly "fixed" by the Xserver.
Comment 6 Michael Werner 2007-01-22 10:03:13 UTC
This could be a problem in memory window handling when 40-bit addresses are involved.
Comment 7 Michael Werner 2007-01-25 13:12:40 UTC
lspci -xxx -s 2:0 before running startx

02:00.0 VGA compatible controller: nVidia Corporation NV43 [GeForce 6600] (rev a2)
00: de 10 41 01 47 01 10 00 a2 00 00 03 10 00 00 00
10: 00 00 00 f0 0c 00 00 e0 fc 00 00 00 04 00 00 f4
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 b0 81
30: 00 00 00 f5 60 00 00 00 00 00 00 00 00 01 00 00
40: 43 10 b0 81 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 00 00 00 01 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 02 00 00 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 00 01 00 c0 04 00 00
80: 10 28 00 00 01 4d 01 00 08 00 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 0c 08 40 c1 01 04 40 c1
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

lspci output after running startx:

00: de 10 41 01 47 01 10 00 a2 00 00 03 00 00 00 00
10: 00 00 00 f0 0c 00 00 e0 00 00 00 00 04 00 00 f4
20: 00 00 00 00 00 00 00 00 00 00 00 00 43 10 b0 81
30: 00 00 00 00 60 00 00 00 00 00 00 00 00 01 00 00
40: 43 10 b0 81 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 00 00 00 01 00 00 00 ce d6 23 00 00 00 00 00
60: 01 68 02 00 00 00 00 00 05 78 80 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 10 00 01 00 c0 04 00 00
80: 10 28 00 00 01 4d 01 00 08 00 01 11 00 00 00 00
90: 00 00 00 00 00 00 00 00 0c 08 40 c1 01 04 40 c1
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

You can see that the framebuffer BAR has been overwritten by Xorg
from 0xfce0000000 -> 0x00e0000000 
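
To make the change easier to see, here is a small sketch that decodes the framebuffer BAR from the "10:" row of the two dumps above. It assumes the standard PCI layout, where a 64-bit memory BAR at config offset 0x14 keeps its upper 32 bits in the dword at 0x18. The function names are made up for this example.

```python
def parse_row(row: str) -> bytes:
    """Turn one 'lspci -xxx' row such as '10: 00 00 ...' into 16 raw bytes."""
    _, _, payload = row.partition(":")
    return bytes(int(tok, 16) for tok in payload.split())


def dword_at(row_bytes: bytes, offset_in_row: int) -> int:
    """Read a little-endian 32-bit value at the given offset within the row."""
    return int.from_bytes(row_bytes[offset_in_row:offset_in_row + 4], "little")


def framebuffer_bar(row_0x10: str) -> int:
    """Combine the dwords at config offsets 0x14 and 0x18 into one address."""
    data = parse_row(row_0x10)
    low = dword_at(data, 0x4)    # config offset 0x14, low 4 bits are flags
    high = dword_at(data, 0x8)   # config offset 0x18, upper half of the BAR
    return (high << 32) | (low & 0xFFFFFFF0)


# The "10:" rows copied from the dumps before and after startx.
BEFORE = "10: 00 00 00 f0 0c 00 00 e0 fc 00 00 00 04 00 00 f4"
AFTER = "10: 00 00 00 f0 0c 00 00 e0 00 00 00 00 04 00 00 f4"

print(hex(framebuffer_bar(BEFORE)), "->", hex(framebuffer_bar(AFTER)))
```

Only the dword at offset 0x18 differs between the two rows, which is consistent with the upper half of the 64-bit BAR being cleared.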

Here are the MTRR registers:
reg00: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg01: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1

One reason this doesn't immediately kill the machine, I presume, is that the
truncated address happens to fall into the MMIO hole in the 3-4GB range,
and no other device is using that truncated address range.
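
A quick way to check that presumption against the MTRR listing above is sketched below. The table is copied from that listing; the helper name is hypothetical, and the sketch only reports which ranges contain an address without modelling how overlapping MTRR types are resolved.

```python
MB = 1 << 20

# (name, base, size in bytes, type) as reported in the MTRR listing above.
MTRRS = [
    ("reg00", 0x00000000, 4096 * MB, "write-back"),
    ("reg01", 0x100000000, 1024 * MB, "write-back"),
    ("reg02", 0xc0000000, 1024 * MB, "uncachable"),
]


def covering_mtrrs(addr: int) -> list:
    """Return (name, type) for every MTRR whose range contains addr."""
    return [
        (name, kind)
        for name, base, size, kind in MTRRS
        if base <= addr < base + size
    ]


truncated = covering_mtrrs(0xe0000000)    # address after truncation
original = covering_mtrrs(0xfce0000000)   # address the BIOS assigned
```

Under these assumptions the truncated address lands inside the uncachable 3-4GB range covered by reg02, while the original 40-bit address is not covered by any of the three MTRRs.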

Here is the Opteron's address map:
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00: 22 10 01 11 00 00 00 00 00 00 00 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 03 00 00 00 00 00 3f 01 00 00 00 00 01 00 00 00
50: 00 00 00 00 02 00 00 00 00 00 00 00 03 00 00 00
60: 00 00 00 00 04 00 00 00 00 00 00 00 05 00 00 00
70: 00 00 00 00 06 00 00 00 00 00 00 00 07 00 00 00
80: 00 00 00 00 00 00 00 00 03 0a 00 00 00 0c 00 00
90: 03 00 fc 00 20 1f fc 00 03 00 f0 fc 20 ff ef fc
a0: 03 20 fc 00 10 1f fc 00 03 00 c0 fc 10 ff df fc
b0: 03 00 f0 00 00 1f f7 00 03 00 e0 fc 00 ff ef fc
c0: 30 f0 8f 01 37 e0 fb 01 03 40 00 00 20 30 00 00
d0: 03 40 00 00 10 30 00 00 13 10 00 00 00 30 00 00
e0: 03 00 00 02 03 01 40 40 03 02 80 82 00 00 00 00
f0: 01 40 00 c0 00 00 00 00 00 00 00 00 00 00 00 00

Comment 8 Daniel Stone 2007-02-27 01:33:54 UTC
Sorry about the phenomenal bug spam, guys.  Adding xorg-team@ to the QA contact so bugs don't get lost in future.
Comment 9 Adam Jackson 2007-05-17 19:08:17 UTC
Can you attach the output of 'lspci -v' from a configuration with a >4GB BAR?
Comment 10 Michael Werner 2007-05-17 22:34:19 UTC
Comment #4 shows a case where BAR > 4GB. The Nvidia framebuffer has a 40-bit address, 0xc000000000,
so when the Xserver truncates the Nvidia BAR it becomes 0x0 and the machine hangs.
The BAR > 2G just has the side effect of increasing the alignment of the Nvidia framebuffer, as the
algorithm used by Linuxbios to allocate the prefmem64 resources is quite simple.
Comment 11 Michael Werner 2007-07-15 13:09:27 UTC
As stated above, the underlying problem is present even when there is no BAR > 2G in size.
dmesg suggests nvidia.ko might also be buggy; the NVRM debug output reports an incorrect BAR as well:

[  368.383230] NVRM: probing 0x10de 0x141, class 0x30000
[  368.383242] PCI: Setting latency timer of device 0000:02:00.0 to 64
[  368.383365] NVRM: 02:00.0 10de:0141 - 0xf0000000 [size=64M]
[  368.383369] NVRM: 02:00.0 10de:0141 - 0xe0000000 [size=256M]
[  368.383381] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  100.14.11  Wed Jun 13 16:33:22 PDT 2007
[  368.383410] saved orig pats as 0x70406 0x70406
[  368.383798] changed pats to 0x70106 0x70406

It could just be that the debug message itself is incorrect and is printing only BAR1 and ignoring BAR2
even though it's a 64-bit BAR.

lspci reports:
02:00.0 VGA compatible controller: nVidia Corporation NV43 [GeForce 6600] (rev a2) (prog-if 00 [VGA])
        Subsystem: ASUSTeK Computer Inc. Unknown device 81b0
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Memory at f0000000 (32-bit, non-prefetchable) [size=64M]
        Memory at fce0000000 (64-bit, prefetchable) [size=256M]
        Memory at f4000000 (64-bit, non-prefetchable) [size=16M]
        Expansion ROM at f5000000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 2
        Capabilities: [68] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
        Capabilities: [78] Express Endpoint IRQ 0

Another interesting note is that after starting Xorg, the Xserver rewrites the 64-bit BAR to be 0xe0000000,
and then when Xorg is shut down I've seen the BAR rewritten back to 0xfce0000000.
I'm assuming nvidia.ko is using power management code and is calling pci_restore_state at some point. In other words, the kernel's PCI layer still thinks the BAR is 0xfce0000000, and the
Opteron's MMIO Address Map register shows (using lspci -xxx -s 0:18.1):

b0: 03 00 f0 00 00 1f f7 00 03 00 e0 fc 00 ff ef fc
---------------------------^^^^^^
So requests to that HT address go to the correct PCIe bridge (nodeid=0, link id=0),
which is just the CK804 Nvidia Southbridge. Also, the bridge's prefetchable memory base/limit registers were programmed by the BIOS consistent with that original BAR value:

00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        Memory behind bridge: f0000000-f50fffff
        Prefetchable memory behind bridge: 000000fce0000000-000000fcefffffff
        Capabilities: [40] Power Management version 2
        Capabilities: [48] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable-
        Capabilities: [58] HyperTransport: MSI Mapping
        Capabilities: [80] Express Root Port (Slot+) IRQ 0
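
For reference, a sketch of how the base/limit pair at 0xb8/0xbc in the Opteron address map row can be decoded. The bit layout used here (bits 31:8 holding address bits 39:16, read/write enables in bits 0 and 1 of the base register, destination node in bits 2:0 and destination link in bits 5:4 of the limit register) is my recollection of the K8 documentation, so treat it as an assumption. The function name is hypothetical.

```python
def decode_mmio_pair(base_reg: int, limit_reg: int) -> dict:
    """Decode one K8 MMIO base/limit register pair (assumed bit layout)."""
    return {
        "base": (base_reg >> 8) << 16,                # address bits 39:16
        "limit": ((limit_reg >> 8) << 16) | 0xFFFF,   # inclusive upper bound
        "read_enable": bool(base_reg & 0x1),
        "write_enable": bool(base_reg & 0x2),
        "dst_node": limit_reg & 0x7,
        "dst_link": (limit_reg >> 4) & 0x3,
    }


# The "b0:" row from lspci -xxx -s 0:18.1 quoted above.
ROW_B0 = "b0: 03 00 f0 00 00 1f f7 00 03 00 e0 fc 00 ff ef fc"
raw = bytes(int(tok, 16) for tok in ROW_B0.partition(":")[2].split())

base_reg = int.from_bytes(raw[8:12], "little")     # config offset 0xb8
limit_reg = int.from_bytes(raw[12:16], "little")   # config offset 0xbc
window = decode_mmio_pair(base_reg, limit_reg)

print(hex(window["base"]), hex(window["limit"]))
```

With that layout the window comes out as 0xfce0000000-0xfcefffffff routed to node 0, link 0, which matches the bridge's prefetchable range above.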
Comment 12 Michael Werner 2007-07-16 11:24:59 UTC
debug output from nvidia kernel driver after running startx:
[  411.314410] NVRM: nv_kern_open...
[  411.314416] NVRM: nv_kern_ctl_open
[  411.314419] NVRM: nv_acpi_init: acpi_bus_register_driver() failed (-19)!
[  411.314422] NVRM: failed to register with the ACPI subsystem!
[  411.314789] NVRM: ioctl(0xd2, 0xd0b0b130, 0x48)
[  411.314804] NVRM: ioctl(0xc8, 0xdd9b2c00, 0xe0)
[  411.314928] NVRM: ioctl(0x22, 0xd0b0b290, 0xc)
[  411.322541] NVRM: ioctl(0x2a, 0xd0b0b1b0, 0x20)
[  411.322640] NVRM: ioctl(0x4d, 0xd0b0b150, 0x40)
[  411.322659] NVRM: ioctl(0x4d, 0xd0b0b150, 0x40)
[  411.322686] NVRM: ioctl(0x2a, 0xd0b0b0a0, 0x20)
[  411.323285] NVRM: nv_kern_open...
[  411.323289] NVRM: nv_kern_open on device 0
[  411.323294] NVRM: Incorrect BAR1 = 0x4000000c, restoring 0xe0000000
[  411.323298] NVRM: Incorrect BAR1 = 0x000000004000000c, restoring 0x00000000e0000000
[  411.323313] NVRM: RmInitAdapter: 2:0
[  411.323316] NVRM: RmSetupRegisters for 0x10de:0x141
[  411.323317] NVRM: pci config info:
[  411.323319] NVRM:   registers look  like: 0xf0000000 0x4000000
[  411.323322] NVRM:   fb        looks like: 0xe0000000 0x10000000
[  411.323324] NVRM: warning, kernel thinks our registers are 67108864, scaling back to 16777216
[  411.323355] NVRM: Successfully mapped framebuffer and registers
[  411.323357] NVRM: final mappings:
[  411.323359] NVRM:    regs: 0xf0000000 0x1000000 0x580000
Comment 13 Michael Werner 2007-07-16 11:30:52 UTC
changed summary to reflect broader problem found 
Comment 14 Michael Werner 2007-07-16 11:37:50 UTC
changed version to 7.2 
changed severity to major since it hangs the machine when the truncated address is 0x0
Comment 15 Michael Werner 2007-07-24 20:05:30 UTC
I just booted with no RPU and observed that after starting the Xserver
the machine is unusable. It behaves differently to the case where the RPU
BAR > 2GB but it still is unable to run the Xserver.

In this case the BIOS allocates 0xfcf0000000 as the framebuffer address
and 0xf0000000 for BAR0 on the nvidia card. Hence the truncated FB address
is the same as BAR0 and it breaks.
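
The collision can be sketched as follows, assuming BAR0 sits at 0xf0000000 with size 64M and the framebuffer BAR is 256M, as in the lspci output in comment 4. The helper names are made up for this example.

```python
MB = 1 << 20


def truncate32(addr: int) -> int:
    """Keep only the low 32 bits, as a 32-bit read of the BAR would."""
    return addr & 0xFFFFFFFF


def overlaps(a_base: int, a_size: int, b_base: int, b_size: int) -> bool:
    """True if the half-open ranges [base, base + size) intersect."""
    return a_base < b_base + b_size and b_base < a_base + a_size


BAR0_BASE, BAR0_SIZE = 0xf0000000, 64 * MB
FB_BASE, FB_SIZE = 0xfcf0000000, 256 * MB

# The full 40-bit address is far away from BAR0; the truncated one is not.
full_collides = overlaps(FB_BASE, FB_SIZE, BAR0_BASE, BAR0_SIZE)
truncated_collides = overlaps(truncate32(FB_BASE), FB_SIZE, BAR0_BASE, BAR0_SIZE)
```

The same truncation explains the other cases in this report: 0xc000000000 becomes 0x0 and 0xfce0000000 becomes 0xe0000000.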
Comment 16 Michael Werner 2007-08-30 10:42:56 UTC
pciBusAddrToHostAddr is broken on 64-bit x86 Linux platforms because xf86GetOSOffsetFromPCI
does not check whether the BAR has the 64-bit memory attribute bit set. The routine therefore cannot
correctly compare base addresses, since it uses only 32 bits of a 40-bit physical address.
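
As a rough illustration of the missing check, here is a sketch of a BAR walk that honours the memory type bits. This is not the Xserver code. It assumes the standard PCI encoding (bit 0 = I/O space, bits 2:1 = 0b10 for a 64-bit memory BAR, bit 3 = prefetchable), and all names are hypothetical.

```python
from typing import List, NamedTuple


class Bar(NamedTuple):
    index: int
    address: int
    is_64bit: bool
    prefetchable: bool


def decode_memory_bars(dwords: List[int]) -> List[Bar]:
    """Decode the six raw BAR dwords, pairing up 64-bit memory BARs."""
    bars = []
    i = 0
    while i < len(dwords):
        raw = dwords[i]
        if raw == 0 or raw & 0x1:
            i += 1  # empty slot or I/O BAR: skipped in this sketch
            continue
        is_64bit = ((raw >> 1) & 0x3) == 0x2
        address = raw & 0xFFFFFFF0
        if is_64bit:
            if i + 1 >= len(dwords):
                raise ValueError("64-bit BAR has no upper dword")
            address |= dwords[i + 1] << 32  # next slot holds the upper half
        bars.append(Bar(i, address, is_64bit, bool(raw & 0x8)))
        i += 2 if is_64bit else 1
    return bars


# Config offsets 0x10..0x27 from the "before startx" dump in comment 7.
RAW = [0xf0000000, 0xe000000c, 0x000000fc, 0xf4000004, 0x00000000, 0x00000000]
BARS = decode_memory_bars(RAW)
```

Reading each slot as an independent 32-bit BAR would instead report 0xe0000000 for slot 1 and treat the upper half in slot 2 as a separate BAR.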
Comment 17 Michael Werner 2007-08-30 14:39:54 UTC
Changing xf86GetOSOffsetFromPCI to use pci_device_cfg_read_u32(dev, &savePtr, offset) after merging
the pcirework branch still doesn't take into account whether the BAR is a 64-bit BAR or not.
Comment 18 Michael Werner 2007-09-21 10:57:28 UTC
After hacking the linuxbios code to allocate prefetchable PCI resources in the non-prefetchable region for PCI devices in the display class, the Xserver happily works again with the Nvidia card.
This hack effectively just means the framebuffer BAR is allocated below 4GB.

IMO, this is further evidence 64-bit BAR support is broken.

Below is part of Xorg log file showing the Nvidia framebuffer allocated at offset 0xd0000000
with size 256MB. 

        [18] -1 0       0xfcf0200000 - 0xfcf021ffff (0x20000) MX[B]
        [19] -1 0       0xfce0000000 - 0xfcffffffff (0x20000000) MX[B]
        [20] -1 0       0xe5000000 - 0xe501ffff (0x20000) MX[B](B)
        [21] -1 0       0xe4000000 - 0xe4ffffff (0x1000000) MX[B](B)
        [22] -1 0       0xd0000000 - 0xdfffffff (0x10000000) MX[B](B)
        [23] -1 0       0xe0000000 - 0xe3ffffff (0x4000000) MX[B](B)

02:00.0 VGA compatible controller: nVidia Corporation NV43 [GeForce 6600] (rev a2) (prog-if 00 [VGA])
        Subsystem: ASUSTeK Computer Inc. Unknown device 81b0
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Memory at e0000000 (32-bit, non-prefetchable) [size=64M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at e4000000 (64-bit, non-prefetchable) [size=16M]
        Expansion ROM at e5000000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 2
        Capabilities: [68] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
        Capabilities: [78] Express Endpoint IRQ 0
Comment 19 Ian Romanick 2007-09-21 11:05:23 UTC
Which bits is this with?  Have you tried core xserver since the pci-rework merge?  AFAIK, that should make this a non-issue.
Comment 20 Ian Romanick 2008-06-02 07:55:19 UTC
Michael,

Please respond with a status update.  I'm assuming that this problem no longer exists since PCI-rework landed in the trunk.  If I don't hear back soon, I'm going to continue this assumption and close the bug.

Thanks.
Comment 21 Ian Romanick 2009-06-22 18:41:26 UTC
I waited a year.  Closing as stale.
