Kernel: (drm-intel-nightly) b5a833707960154164cf450647c76547be43a167
Bug detailed description:
It fails on Sandy Bridge with the -nightly branch. It doesn't happen on the -fixes branch.
mismatch at 254208: -378754475
The last known bad commit: b5a833707960154164cf450647c76547be43a167 (Merge: afef67f 4a8dece)
The last known good commit: 032e254cefb0485c95aceca269be499b91f48aa0 (Merge: 8c74a16 b6e0e54)
Can you please bisect this regression?
Bisect shows: 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 is the first bad commit
Author: Jianguo Wu <email@example.com>
Date: Mon Oct 8 16:33:06 2012 -0700
mm: fix-up zone present pages
I think zone->present_pages indicates pages that buddy system can management,
it should be:
zone->present_pages = spanned pages - absent pages - bootmem pages,
but is now:
zone->present_pages = spanned pages - absent pages - memmap pages.
spanned pages: total size, including holes.
absent pages: holes.
bootmem pages: pages used in system boot, managed by bootmem allocator.
memmap pages: pages used by page structs.
This may cause zone->present_pages less than it should be. For example,
numa node 1 has ZONE_NORMAL and ZONE_MOVABLE, it's memmap and other
bootmem will be allocated from ZONE_MOVABLE, so ZONE_NORMAL's
present_pages should be spanned pages - absent pages, but now it also
minus memmap pages(free_area_init_core), which are actually allocated from
ZONE_MOVABLE. When offlining all memory of a zone, this will cause
zone->present_pages less than 0, because present_pages is unsigned long
type, it is actually a very large integer, it indirectly caused
zone->watermark[WMARK_MIN] becomes a large
integer(setup_per_zone_wmarks()), than cause totalreserve_pages become a
large integer(calculate_totalreserve_pages()), and finally cause memory
allocating failure when fork process(__vm_enough_memory()).
[root@localhost ~]# dmesg
-bash: fork: Cannot allocate memory
I think the bug described in
is also caused by wrong zone present pages.
This patch intends to fix-up zone->present_pages when memory are freed to
buddy system on x86_64 and IA64 platforms.
Two things to test:
- Can you please check whether reverting the bisected commit on top of dinq resolves the issue?
- Before we report this problem upstream, it's good to test whether it's already fixed. I've pushed out a for-QA branch with the latest dinq, -fixes and upstream git from Linus all merged together. Please test that.
Created attachment 70052
It works well with the bisected commit reverted.
It also fails on the for-QA branch.
Tested on commit 104ec25077751a0abbd9f523a48b7f84e6842ea3 (Merge: c8928b6 9924a19)
For paranoia: Can you please run a memtester on the affected box, to rule out memory corruptions?
I also observe the bug on a SNB i5-2520m (32-bit PAE with 3GiB), and can confirm the revert fixes gem_tiled_swapping.
Can you please test the patch at https://lkml.org/lkml/2012/11/5/866 ?
Patch worksforme. I see it already is in mmotm, so close?
Hmm, machine later died completely whilst idle. Possibly unrelated, but unlikely...
Looks like Chris has answered, so clearing needinfo.
Offending patch has been reverted in upstream Linus' git:
Author: Andrew Morton <firstname.lastname@example.org>
Date: Fri Nov 16 14:15:06 2012 -0800
revert "mm: fix-up zone present pages"
It's not yet in any of the branches merged together with -nightly though.
Fixed on -nightly branch.
Still happens on -queued branch.
*** Bug 59095 has been marked as a duplicate of this bug. ***
Closing old verified.