Bug 15360

Summary: [GM965] Fail to set up write-combining range with 4G (works with 2G)
Product: xorg Reporter: Bryce Harrington <bryce>
Component: Driver/intel Assignee: Wang Zhenyu <zhenyu.z.wang>
Status: RESOLVED NOTOURBUG QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium CC: bugzilla, chris.sorisio, d13f00l, hugh, marcus, mika.fischer
Version: 7.3 (2007.09)   
Hardware: Other   
OS: All   
URL: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/210780
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
dmesg output with 4GB RAM none
/proc/mtrr with 4GB RAM none
dmesg output with 2GB RAM none
/proc/mtrr with 2GB RAM none
Script to run at boot fixing my specific issue none

Description Bryce Harrington 2008-04-04 18:42:05 UTC
I'm forwarding the following Ubuntu bug here:
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/210780

"Today I upgraded my RAM from 1GB to 4GB. Everything worked fine but I noticed that scrolling in Firefox and dragging windows is noticeably slower than before. I checked this again by removing 2GB and got the same results (i.e. 2GB -> fast scrolling, 4GB -> slower scrolling)"

Xorg.0.log w/ 2G:  http://launchpadlibrarian.net/13097075/Xorg.0.log.2GB
Xorg.0.log w/ 4G:  http://launchpadlibrarian.net/13097084/Xorg.0.log.4GB
xorg.conf:  http://launchpadlibrarian.net/13046569/xorg.conf
lspci:      http://launchpadlibrarian.net/13046585/lspci.txt

00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller [8086:2a02] (rev 03) (prog-if 00 [VGA controller])
	Subsystem: Samsung Electronics Co Ltd Unknown device [144d:c510]

Difference between the two log files includes:


-(II) intel(0): Kernel reported 488960 total, 1 used
-(II) intel(0): I830CheckAvailableMemory: 1955836 kB available
+(II) intel(0): Kernel reported 1006592 total, 1 used
+(II) intel(0): I830CheckAvailableMemory: 4026364 kB available

@@ -568,7 +568,7 @@ drmOpenByBusid: drmGetBusid reports pci:
 (II) intel(0): [drm] Initialized kernel agp heap manager, 33554432
 (II) intel(0): [dri] visual configs initialized
 (II) intel(0): Page Flipping disabled
-(==) intel(0): Write-combining range (0xd0000000,0x10000000)
+(WW) intel(0): Failed to set up write-combining range (0xd0000000,0x10000000)
 (II) intel(0): vgaHWGetIOBase: hwp->IOBase is 0x03d0, hwp->PIOOffset is 0x0000
 (WW) intel(0): EXA greedy migration mode enabled.
 (II) EXA(0): Forcing greedy migration option
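
For readers decoding the log line: the pair in "Write-combining range (0xd0000000,0x10000000)" appears to be (base, size) in bytes, which is consistent with the 256MB write-combining entry shown later in the 2GB /proc/mtrr. A minimal Python sketch of the arithmetic follows; the helper name `describe_range` is made up for illustration and is not part of the driver.

```python
MIB = 1024 * 1024

def describe_range(base: int, size: int) -> str:
    """Render a (base, size) byte range in MB, plus its last address."""
    end = base + size - 1
    return f"base=0x{base:08x} ({base // MIB}MB), size={size // MIB}MB, end=0x{end:08x}"

# Values taken from the Xorg.0.log excerpt above.
print(describe_range(0xD0000000, 0x10000000))
# prints: base=0xd0000000 (3328MB), size=256MB, end=0xdfffffff
```

So the range the driver asks for is the 256MB window from 0xd0000000 to 0xdfffffff.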
Comment 1 Mika Fischer 2008-04-06 01:09:29 UTC
Hi, I'm the reporter of the Ubuntu bug.

I can add that this issue still occurs with a recent git version of the intel driver. The version I tried is this: ac763634069fe070b3afc073ce437959612d39fe

The Xorg.0.log for this version with 4GB of RAM can be found here:
http://launchpadlibrarian.net/13129535/Xorg.0.log.4GB.git20080318.ac763634

But there are only minor differences. I'll post the diff below (with hunks with changed dates or input event index removed).

If I can do anything to help resolve this bug, please let me know.

------
--- Xorg.0.log.4GB 2008-04-03 19:50:56.000000000 +0200
+++ Xorg.0.log.4GB.git20080318.ac763634 2008-04-05 10:24:11.000000000 +0200
@@ -11,7 +11,7 @@
 Release Date: 5 September 2007
 X Protocol Version 11, Revision 0
 Build Operating System: Linux Ubuntu (xorg-server 2:1.4.1~git20080131-1ubuntu6)
-Current Operating System: Linux arthur 2.6.24-12-generic #1 SMP Wed Mar 12 23:01:54 UTC 2008 i686
+Current Operating System: Linux arthur 2.6.24-14-generic #1 SMP Thu Apr 3 04:49:29 UTC 2008 i686
 Build Date: 30 March 2008 04:42:53PM

        Before reporting problems, check http://wiki.x.org
@@ -286,7 +286,7 @@
 (II) LoadModule: "intel"
 (II) Loading /usr/lib/xorg/modules/drivers//intel_drv.so
 (II) Module intel: vendor="X.Org Foundation"
- compiled for 1.4.0.90, module version = 2.2.1
+ compiled for 1.4.0.90, module version = 2.2.0
        Module class: X.Org Video Driver
        ABI class: X.Org Video Driver, version 2.0
 (II) LoadModule: "kbd"
@@ -462,6 +462,7 @@
 (==) intel(0): video overlay key set to 0x101fe
 (==) intel(0): Will not try to enable page flipping
 (==) intel(0): Triple buffering disabled
+(==) intel(0): Intel XvMC decoder disabled
 (==) intel(0): Using gamma correction (1.0, 1.0, 1.0)
 (==) intel(0): DPI set to (96, 96)
 (II) Loading sub module "fb"
@@ -553,12 +554,11 @@
 (II) intel(0): [drm] added 1 reserved context for kernel
 (II) intel(0): X context handle = 0x1
 (II) intel(0): [drm] installed DRM signal handler
-(==) intel(0): VideoRam: 262144 KB
 (**) intel(0): Framebuffer compression disabled
 (**) intel(0): Tiling enabled
+(==) intel(0): VideoRam: 262144 KB
 (II) intel(0): Attempting memory allocation with tiled buffers.
-(II) intel(0): Success.
-(II) intel(0): Increasing the scanline pitch to allow tiling mode (3008 -> 3072).
+(II) intel(0): Tiled allocation successful.
 (II) intel(0): [drm] Registers = 0xf0000000
 (II) intel(0): [drm] ring buffer = 0xd0000000
 (II) intel(0): [drm] mapped front buffer at 0xd0100000, handle = 0xd0100000
@@ -612,6 +612,7 @@
 (II) intel(0): RandR 1.2 enabled, ignore the following RandR disabled message.
 (**) Option "dpms"
 (**) intel(0): DPMS enabled
+(II) intel(0): Set up textured video
 (II) intel(0): Set up overlay video
 (II) intel(0): direct rendering: Enabled
 (--) RandR disabled
Comment 2 Wang Zhenyu 2008-04-08 00:24:13 UTC
Please attach your dmesg output.
Comment 3 Mika Fischer 2008-04-08 01:01:56 UTC
Created attachment 15752 [details]
dmesg output with 4GB RAM

If you need the dmesg output with 2GB as well, please tell me. I'll provide it in the evening then.
Comment 4 Wang Zhenyu 2008-04-08 01:24:01 UTC
And "cat /proc/mtrr" output please.
Comment 5 Wang Zhenyu 2008-04-08 01:42:23 UTC
We "luckily" have a Q965 box with this problem too. The Dell machine has 1G of memory and runs Fedora 8 with a 2.6.23 kernel. After boot, /proc/mtrr has this weird line:
reg00: base=0x00000000 (   0MB), size=65536MB: write-back, count=1
...

which is obviously wrong: it covers the graphics memory address range, which caused the wc setup to fail in X.

I fixed it with:
        echo "disable=0" > /proc/mtrr
        echo "base=0x00000000 size=0x40000000 type=write-back" > /proc/mtrr
which sets main memory (1G in size) to wb. X can then set up wc correctly at start, giving
...
reg05: base=0xc0000000 (3072MB), size= 256MB: write-combining, count=1

This problem might be a kernel bug that wrongly detects the physical memory size, or a BIOS bug that reports the wrong memory size.
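
As an illustration of the two /proc/mtrr writes above, here is a small Python sketch that only builds the command strings (a dry run; it never touches /proc/mtrr). The helper names `mtrr_disable` and `mtrr_add` are hypothetical, and the register number and sizes are the ones from this 1G machine, so they would need to be adapted for any other system.

```python
GIB = 1 << 30

def mtrr_disable(reg: int) -> str:
    """Command string that removes MTRR number `reg` when written to /proc/mtrr."""
    return f"disable={reg}"

def mtrr_add(base: int, size: int, mem_type: str) -> str:
    """Command string that adds a region when written to /proc/mtrr."""
    return f"base=0x{base:08x} size=0x{size:08x} type={mem_type}"

# Drop the bogus 64GB entry, then cover the 1G of main memory as write-back.
commands = [mtrr_disable(0), mtrr_add(0, 1 * GIB, "write-back")]
for cmd in commands:
    print(cmd)  # dry run only; nothing is written to /proc/mtrr
```

Printing the strings first makes it easy to compare them against the current /proc/mtrr contents before writing anything as root.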
Comment 6 Mika Fischer 2008-04-08 03:25:44 UTC
Created attachment 15756 [details]
/proc/mtrr with 4GB RAM

This is the output with /proc/mtrr with 4GB RAM.

I don't have such a weird line...

I'll try to get the output with 2GB soon.
Comment 7 Mika Fischer 2008-04-08 03:37:59 UTC
Created attachment 15758 [details]
dmesg output with 2GB RAM
Comment 8 Mika Fischer 2008-04-08 03:43:23 UTC
Created attachment 15759 [details]
/proc/mtrr with 2GB RAM

OK, I've attached dmesg output and contents of /proc/mtrr when I have only 2GB installed.

The 2GB case looks much more sane:
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x7f700000 (2039MB), size=   1MB: uncachable, count=1
reg02: base=0x7f800000 (2040MB), size=   8MB: uncachable, count=1
reg03: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=1

where probably reg00 is system memory and reg03 is the video memory.

In the 4GB case things look very differently:
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
reg03: base=0xbf700000 (3063MB), size=   1MB: uncachable, count=1
reg04: base=0xbf800000 (3064MB), size=   8MB: uncachable, count=1

Now reg01 is probably the system memory and reg03 and reg04 are the same as with 2GB. But instead of a 256MB range for video memory I've now got two with 1024MB, one of which is uncachable and overlaps the system memory...

I hope this is useful. I have no idea what to make of this.

Thanks for any help!
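
To make the overlap concrete, the following Python sketch parses the 4GB /proc/mtrr listing above and reports which entries intersect the video range from the Xorg log (base 0xd0000000, size 0x10000000). The function names are invented for this example, and the regular expression assumes the line format shown in this comment.

```python
import re

MIB = 1 << 20
LINE = re.compile(
    r"reg(\d+): base=0x([0-9a-f]+) \(\s*\d+MB\), size=\s*(\d+)MB: ([\w-]+), count=\d+"
)

def parse_mtrr(text):
    """Return (reg, base, size_in_bytes, type) tuples from /proc/mtrr text."""
    return [(int(r), int(b, 16), int(s) * MIB, t) for r, b, s, t in LINE.findall(text)]

def overlapping(entries, base, size):
    """Entries whose [base, base+size) interval intersects the given range."""
    return [e for e in entries if e[1] < base + size and base < e[1] + e[2]]

MTRR_4GB = """\
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
reg03: base=0xbf700000 (3063MB), size=   1MB: uncachable, count=1
reg04: base=0xbf800000 (3064MB), size=   8MB: uncachable, count=1
"""

hits = overlapping(parse_mtrr(MTRR_4GB), 0xD0000000, 0x10000000)
for reg, base, size, kind in hits:
    print(f"reg{reg:02d} ({kind}) overlaps the video range")
```

With this input, reg00 (uncachable) and reg01 (write-back) are the two entries reported as overlapping.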
Comment 9 Wang Zhenyu 2008-04-08 18:41:28 UTC
(In reply to comment #8)
> 
> The 2GB case looks much more sane:
> reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
> reg01: base=0x7f700000 (2039MB), size=   1MB: uncachable, count=1
> reg02: base=0x7f800000 (2040MB), size=   8MB: uncachable, count=1
> reg03: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=1
> 
> where probably reg00 is system memory and reg03 is the video memory.

Correct.

> 
> In the 4GB case things look very differently:
> reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
> reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
> reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
> reg03: base=0xbf700000 (3063MB), size=   1MB: uncachable, count=1
> reg04: base=0xbf800000 (3064MB), size=   8MB: uncachable, count=1
> 
> Now reg01 is probably the system memory and reg03 and reg04 are the same as
> with 2GB. But instead of a 256MB range for video memory I've now got two with
> 1024MB, one of which is uncachable and overlaps the system memory...
> 

Yeah, and that 1G range caused the failure to set the graphics memory at 0xd0000000 to wc. This seems to be a BIOS issue; maybe you can try upgrading the BIOS or reporting this to the vendor? Or you might use my method to work around this for now.

This is not a bug in our driver, so I'll close it.
Comment 10 Mika Fischer 2008-04-09 01:43:40 UTC
(In reply to comment #9)
> > In the 4GB case things look very differently:
> > reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
> > reg01: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
> > reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
> > reg03: base=0xbf700000 (3063MB), size=   1MB: uncachable, count=1
> > reg04: base=0xbf800000 (3064MB), size=   8MB: uncachable, count=1
> > 
> > Now reg01 is probably the system memory and reg03 and reg04 are the same as
> > with 2GB. But instead of a 256MB range for video memory I've now got two with
> > 1024MB, one of which is uncachable and overlaps the system memory...
> 
> yeah, and that 1G range caused failure to set graphics memory from 0xd0000000
> to wc.

But the main memory also overlaps this region (reg01). Isn't this also a problem?
I always thought that the kernel used the region >3GB for its purposes. Maybe it marked this region as uncachable and moved the real memory to >4GB (the other 1024MB range) somehow?

> This seems to be a bios issue, maybe you can try upgrade bios

I just did. Unfortunately it did not change anything.

> or report this to vendor?

I'll try to do this. But I don't expect much help from Samsung. They're not exactly known to be linux-friendly.

> or you might use my method to workaround this for now. 

I'd love to but I don't quite understand what I have to do for this. Shall I remove the two 1024MB lines? Are you sure that's safe? And add the 256MB wc line for the video memory? But then the main memory would still overlap the video memory...

I'd appreciate if you could give me a pointer. If I should ask elsewhere it'd be great if you could tell me where.

Thanks for all the help!
Comment 11 Bryce Harrington 2008-04-09 13:58:10 UTC
(Reopening as per remaining questions from original reporter)
Comment 12 Wang Zhenyu 2008-04-09 20:30:35 UTC
(In reply to comment #10)
> 
> But the main memory also overlaps this region (reg01). Isn't this also a
> problem?
> I always thought that the kernel used the region >3GB for its purposes. Maybe
> it marked this region as uncachable and moved the real memory to >4GB (the
> other 1024MB range) somehow?

Have you tried removing reg00 first? reg01 also looks wrong.

> I'd love to but I don't quite understand what I have to do for this. Shall I
> remove the two 1024MB lines? Are you sure that's safe? And add the 256MB wc line
> for the video memory? But then the main memory would still overlap the video
> memory...
> 

I think the right thing to do is to follow the e820 memory map in your dmesg.
Remove reg00 and reg01. From your dmesg, "BIOS-e820: 0000000000100000 - 00000000bf6d0000 (usable)" is the range you should set to wb (just as "BIOS-e820: 0000000100000000 - 0000000140000000 (usable)" is already covered by reg02).

You can refer to the kernel's Documentation/mtrr.txt for more /proc/mtrr operations, and you may raise this issue on lkml; maybe other kernel people have encountered this problem too.
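
As a rough aid for reading the e820 map, here is a Python sketch that pulls the "usable" ranges out of dmesg text. It assumes the "BIOS-e820: start - end (type)" line format quoted above and treats the second address as an exclusive end; the function name `usable_ranges` is made up for this example.

```python
import re

E820 = re.compile(r"BIOS-e820: ([0-9a-f]{16}) - ([0-9a-f]{16}) \((\w[\w ]*)\)")

def usable_ranges(dmesg_text):
    """Return (start, end) pairs for every 'usable' BIOS-e820 line."""
    return [
        (int(start, 16), int(end, 16))
        for start, end, kind in E820.findall(dmesg_text)
        if kind == "usable"
    ]

# The two lines quoted from the reporter's dmesg.
SAMPLE = """\
BIOS-e820: 0000000000100000 - 00000000bf6d0000 (usable)
BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
"""

for start, end in usable_ranges(SAMPLE):
    print(f"0x{start:09x}-0x{end:09x} usable, {(end - start) // (1 << 20)} MB (rounded down)")
```

The first range stops just short of 3GB and the second is the 1GB block above 4GB, which is what the write-back MTRRs need to cover.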



Comment 13 Mika Fischer 2008-04-10 05:30:48 UTC
(In reply to comment #12)
> What I think might be right to do is following e820 memory map in your dmesg.
> Remove reg00 and reg01. From your dmesg "BIOS-e820: 0000000000100000 -
> 00000000bf6d0000 (usable)", you should set that range to wb. (like "BIOS-e820:
> 0000000100000000 - 0000000140000000 (usable)", which has already been set in
> reg02).

I've been able to find a workaround, although it was not as simple as this.

If I remove reg00 (>3GB) first, the machine locks up hard instantly. So this line is definitely there for a reason. The reason probably being that reg01 (wb) overlaps this region.

So I first have to remove reg01 (0-4GB). Then add back (0-2GB) and (2-3GB) as wb ranges. Then I can remove reg00, and add back everything except the video memory as uncachable just in case.

After that the X server will happily add the write-combine range for me and everything is fast again.

I'll attach the script I used in case someone stumbles over this bug looking for a workaround.
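
The attached script is not reproduced in this thread. Purely as an illustration of the ranges described above, the Python sketch below splits an address interval into blocks whose size is a power of two and whose base is aligned to that size, which is the shape a variable-range MTRR requires. The function name `split_aligned` and the choice to cover 3GB to 4GB minus the video window are assumptions drawn from this comment, not from the attachment, and the sketch only prints candidate /proc/mtrr lines without writing them.

```python
GIB = 1 << 30
VIDEO = (0xD0000000, 0xE0000000)  # video range from the Xorg log, end exclusive

def split_aligned(start: int, end: int):
    """Split [start, end) into power-of-two blocks whose base is size-aligned."""
    blocks = []
    while start < end:
        if start:
            align = start & -start          # largest power of two dividing start
        else:
            align = 1 << (end - 1).bit_length()  # address 0 is aligned to anything
        fit = 1 << ((end - start).bit_length() - 1)  # largest power of two that fits
        size = min(align, fit)
        blocks.append((start, size))
        start += size
    return blocks

wb = split_aligned(0, 3 * GIB)                       # 0-2GB and 2-3GB
uc = split_aligned(3 * GIB, VIDEO[0]) + split_aligned(VIDEO[1], 4 * GIB)

for base, size in wb:
    print(f"base=0x{base:08x} size=0x{size:08x} type=write-back")
for base, size in uc:
    print(f"base=0x{base:08x} size=0x{size:08x} type=uncachable")
```

The sketch deliberately leaves out the disable steps: as noted above, the order in which existing registers are removed matters, and removing the uncachable entry first locked the machine up.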

> You can refer kernel's Documentation/mtrr.txt for more /proc/mtrr operations, and
> you may raise this issue to lkml, maybe other kernel people have encountered
> this problem too.

OK, I'll write a mail to the kernel list with my problem and the workaround to see if anything can be done about it.

Other than that there's really nothing the intel driver can do about this, so we can close this bug again.

Thanks again for all the help! :)
Comment 14 Mika Fischer 2008-04-10 05:31:28 UTC
Created attachment 15804 [details]
Script to run at boot fixing my specific issue
Comment 15 Gordon Jin 2008-07-03 22:28:35 UTC
*** Bug 16591 has been marked as a duplicate of this bug. ***
Comment 16 Dylan 2008-09-27 08:17:55 UTC
I had an option in my BIOS, called "Memory Remap Feature"
I turned it off, it's some stupid hack for Windows XP.  Now I don't get that message and my MTRR tables look much better.

reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size= 256MB: write-back, count=1
reg03: base=0xc7e00000 (3198MB), size=   2MB: uncachable, count=1
reg04: base=0xc8000000 (3200MB), size= 128MB: uncachable, count=1
reg05: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=1
Comment 17 D. Hugh Redelmeier 2008-09-27 08:50:03 UTC
Re comment #16 from Dylan:

I think that if you turn off Remap, you lose RAM.  My understanding is that Remap maps the top of your 4GiB (or whatever) of RAM to addresses above 4GiB, leaving some of the 3GiB of address space available for PCI devices and the like.  Without remap, this chunk of memory is thrown away.

I have written a program to reorganize MTRRs.  Have a look at it.
 ftp://ftp.cs.utoronto.ca/pub/hugh/mtrr-uncover-2008sept27.tgz
(When I make changes, the date portion of the name changes.)
Comment 18 Dylan 2008-09-27 10:33:04 UTC
*** Bug 17782 has been marked as a duplicate of this bug. ***
Comment 19 Dylan 2008-09-27 11:29:21 UTC
No dice with that app, it says my hardware needs 11 entries, but it only supports 8.
Comment 20 D. Hugh Redelmeier 2008-09-27 20:20:38 UTC
Re Dylan's posting #19:

I showed how to invoke mtrr-uncover in message #6 of bug 17782:
$ ./mtrr-uncover 0x0d0000000-0x0dfffffff


When you tell it the range you care about, many fewer MTRRs are required.

To actually get the effect you want, you need to change the MTRRs before X starts.
Comment 21 Luis A. Florit 2009-07-10 05:25:04 UTC
Hi,

I filed a bug report some days ago that seems closely related to this one. Please take a look:

https://bugzilla.redhat.com/show_bug.cgi?id=510169

I would like to know two things:

1. Is my bug a duplicate of this one?
2. I've read this bug, but I didn't understand how it would help to fix mine. If this bug is related to mine, could you please tell me how to fix mine?

Thanks a lot!
L.
Comment 22 Wang Zhenyu 2009-07-12 19:24:38 UTC
(In reply to comment #21)
> Hi,
> 
> I have filed a bug report some days ago, that seems closely related to this
> one. Please, take a look:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=510169
> 
> I would like to know two things:
> 
> 1. Is my bug a duplicate of this one?

I think yes, although you're using a more recent kernel than this one.

> 2. I've read this bug, but I didn't understand how it would help to fix mine.
> If this bug is related to mine, could you please indicate me how to fix mine?
> 

Check your /proc/mtrr after boot to see whether it's sane for your memory configuration,
as this normally indicates a BIOS configuration bug. There's an example in this bug of how to fix the MTRRs if something is really wrong.


Comment 23 Luis A. Florit 2009-07-12 19:38:03 UTC
> > 1. Is my bug a duplicate of this one?
> 
> I think yes, although you're using a more recent kernel than this one.

In fact, after I downgraded mesa stuff, foobillard does not freeze my box anymore.
 
> > 2. I've read this bug, but I didn't understand how it would help to fix mine.
> > If this bug is related to mine, could you please indicate me how to fix mine?
> > 
> 
> check your /proc/mtrr after boot to see if it's sane with your mem config or
> not,

Which mem config? The one in grub? I have 4GB of RAM; which mem config should I add?

> as this normally means a bios config bug. There's example here to fix mtrr if
> something really go wrong.

This is my /proc/mtrr:

reg00: base=0x000000000 (    0MB), size= 4096MB, count=1: write-back
reg01: base=0x100000000 ( 4096MB), size= 1024MB, count=1: write-back
reg02: base=0x0bdd00000 ( 3037MB), size=    1MB, count=1: write-through
reg03: base=0x0bde00000 ( 3038MB), size=    2MB, count=1: uncachable
reg04: base=0x0be000000 ( 3040MB), size=   32MB, count=1: uncachable
reg05: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable


But I am not sure what this means, nor what I should change (if anything)...

Thanks!!
L.
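
One way to read a listing like the one above is to work out the effective memory type at a given address. The Python sketch below does that for the newer "size=..., count=N: type" line format, using the overlap precedence documented for Intel processors (uncachable wins over everything, write-through wins over write-back, other mixed overlaps are undefined). The function names and the sample addresses are chosen for illustration only; the graphics aperture address of this particular machine is not given in the thread.

```python
import re

MIB = 1 << 20
LINE = re.compile(
    r"reg(\d+): base=0x([0-9a-f]+) \(\s*\d+MB\), size=\s*(\d+)MB, count=\d+: ([\w-]+)"
)

def parse(text):
    """Return (reg, base, size_in_bytes, type) tuples from /proc/mtrr text."""
    return [(int(r), int(b, 16), int(s) * MIB, t) for r, b, s, t in LINE.findall(text)]

def effective_type(entries, addr):
    """Combine all variable MTRRs covering `addr` into one effective type."""
    types = {t for _, base, size, t in entries if base <= addr < base + size}
    if not types:
        return "default"          # no MTRR covers it: the default type applies
    if "uncachable" in types:
        return "uncachable"       # UC takes precedence over any other type
    if types == {"write-back", "write-through"}:
        return "write-through"    # WT takes precedence over WB
    if len(types) == 1:
        return types.pop()
    return "undefined"            # e.g. write-combining on top of write-back

LUIS_MTRR = """\
reg00: base=0x000000000 (    0MB), size= 4096MB, count=1: write-back
reg01: base=0x100000000 ( 4096MB), size= 1024MB, count=1: write-back
reg02: base=0x0bdd00000 ( 3037MB), size=    1MB, count=1: write-through
reg03: base=0x0bde00000 ( 3038MB), size=    2MB, count=1: uncachable
reg04: base=0x0be000000 ( 3040MB), size=   32MB, count=1: uncachable
reg05: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable
"""

entries = parse(LUIS_MTRR)
for addr in (0x00000000, 0xBDD00000, 0xC0000000):
    print(f"0x{addr:09x}: {effective_type(entries, addr)}")
```

Under these rules the whole 3GB to 4GB window is effectively uncachable because of reg05, the same pattern as the 4GB listing in comment 8.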
