System Environment:
--------------------------
Arch: i386
Platform: Sandybridge
Mesa: (master)5cbc0f00368b9ddc127007be2bd7f60940aa93ed
Kernel: (drm-intel-nightly)b5a833707960154164cf450647c76547be43a167

Bug detailed description:
-------------------------
It fails on Sandybridge with the -nightly branch. It doesn't happen on the -fixes branch.

output:
mismatch at 254208: -378754475

The last known bad commit: b5a833707960154164cf450647c76547be43a167 (Merge: afef67f 4a8dece)
The last known good commit: 032e254cefb0485c95aceca269be499b91f48aa0 (Merge: 8c74a16 b6e0e54)

Reproduce steps:
----------------
1. ./gem_tiled_swapping
Can you please bisect this regression?
Bisect shows:

7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 is the first bad commit
commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
Author: Jianguo Wu <wujianguo@huawei.com>
Date:   Mon Oct 8 16:33:06 2012 -0700

    mm: fix-up zone present pages

    I think zone->present_pages indicates pages that buddy system can management, it should be:

        zone->present_pages = spanned pages - absent pages - bootmem pages,

    but is now:

        zone->present_pages = spanned pages - absent pages - memmap pages.

    spanned pages: total size, including holes.
    absent pages: holes.
    bootmem pages: pages used in system boot, managed by bootmem allocator.
    memmap pages: pages used by page structs.

    This may cause zone->present_pages less than it should be. For example, numa node 1 has ZONE_NORMAL and ZONE_MOVABLE, it's memmap and other bootmem will be allocated from ZONE_MOVABLE, so ZONE_NORMAL's present_pages should be spanned pages - absent pages, but now it also minus memmap pages(free_area_init_core), which are actually allocated from ZONE_MOVABLE.

    When offlining all memory of a zone, this will cause zone->present_pages less than 0, because present_pages is unsigned long type, it is actually a very large integer, it indirectly caused zone->watermark[WMARK_MIN] becomes a large integer(setup_per_zone_wmarks()), than cause totalreserve_pages become a large integer(calculate_totalreserve_pages()), and finally cause memory allocating failure when fork process(__vm_enough_memory()).

    [root@localhost ~]# dmesg
    -bash: fork: Cannot allocate memory

    I think the bug described in

        http://marc.info/?l=linux-mm&m=134502182714186&w=2

    is also caused by wrong zone present pages.

    This patch intends to fix-up zone->present_pages when memory are freed to buddy system on x86_64 and IA64 platforms.
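For readers unfamiliar with the failure mode that commit message describes, here is a minimal sketch of the unsigned wraparound and how it could propagate to an allocation check. This is an illustrative model, not kernel code: `WORD_BITS`, `sub_unsigned`, `enough_memory`, the page counts, and the `// 256` reserve ratio are all assumptions made up for the example. The real `setup_per_zone_wmarks()`, `calculate_totalreserve_pages()`, and `__vm_enough_memory()` are considerably more involved.

```python
# Illustrative model only. Names and figures are hypothetical; the arithmetic
# is simplified and does not reproduce the kernel's actual calculations.

WORD_BITS = 64  # assumption: unsigned long is 64 bits wide, as on x86_64
MASK = (1 << WORD_BITS) - 1


def sub_unsigned(a: int, b: int) -> int:
    """Subtract the way C unsigned arithmetic does: wrap modulo 2**WORD_BITS."""
    return (a - b) & MASK


def enough_memory(free_pages: int, totalreserve_pages: int, request: int) -> bool:
    """Hypothetical stand-in for a reserve check: the request is allowed only
    if it fits in what is left once the reserve has been set aside."""
    if totalreserve_pages >= free_pages:
        return False
    return request <= free_pages - totalreserve_pages


FREE_PAGES = 500_000  # made-up amount of free memory, in pages

# Healthy accounting: fewer pages removed than the zone was credited with.
healthy_present = sub_unsigned(1000, 200)
healthy_reserve = healthy_present // 256  # hypothetical reserve ratio

# Broken accounting: more pages removed than were ever credited to the zone,
# so the "negative" count wraps around to a huge positive value.
wrapped_present = sub_unsigned(1000, 1200)
wrapped_reserve = wrapped_present // 256

print("healthy:", healthy_present, enough_memory(FREE_PAGES, healthy_reserve, 1))
print("wrapped:", hex(wrapped_present), enough_memory(FREE_PAGES, wrapped_reserve, 1))
```

In this model even a one-page request is refused once the page count has wrapped, which is consistent with the `fork: Cannot allocate memory` symptom quoted in the commit message. It does not by itself explain the `gem_tiled_swapping` mismatch reported in this bug.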
Two things to test:
- Can you please check whether reverting the bisected commit on top of dinq resolves the issue?
- Before we report this problem upstream, it's good to test whether it's fixed already. I've pushed out a for-QA branch with the latest dinq, -fixes and upstream git from Linus all merged together. Please test that.
Created attachment 70052 [details] dmesg
It works well with the bisected commit reverted. It still fails on the for-QA branch, tested at commit 104ec25077751a0abbd9f523a48b7f84e6842ea3 (Merge: c8928b6 9924a19).
For paranoia: can you please run memtester on the affected box to rule out memory corruption?
I also observe the bug on a SNB i5-2520m (32-bit PAE with 3GiB), and can confirm the revert fixes gem_tiled_swapping.
Can you please test the patch at https://lkml.org/lkml/2012/11/5/866 ?
Patch worksforme. I see it already is in mmotm, so close?
Hmm, machine later died completely whilst idle. Possibly unrelated, but unlikely...
Looks like Chris has answered, so clearing needinfo.
Offending patch has been reverted in upstream Linus' git:

commit 5576646f3c1abd60d72d19829de6f5d8c2ca8ecf
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Fri Nov 16 14:15:06 2012 -0800

    revert "mm: fix-up zone present pages"

It's not yet in any of the branches merged together with -nightly though.
Fixed on -nightly branch. Still happens on -queued branch.
Verified. Fixed.
*** Bug 59095 has been marked as a duplicate of this bug. ***
Closing old verified.