|Summary:||[SNB regression]i-g-t gem_tiled_swapping fails|
|Product:||DRI||Reporter:||lu hua <huax.lu>|
|Component:||DRM/Intel||Assignee:||Daniel Vetter <daniel>|
|Status:||CLOSED FIXED||QA Contact:||Intel GFX Bugs mailing list <intel-gfx-bugs>|
|Priority:||high||CC:||ben, bingx.a.yan, chris, daniel, jbarnes, yi.sun|
|i915 platform:||i915 features:|
Description lu hua 2012-11-08 08:02:43 UTC
System Environment:
--------------------------
Arch: i386
Platform: Sandybridge
Mesa: (master)5cbc0f00368b9ddc127007be2bd7f60940aa93ed
Kernel: (drm-intel-nightly)b5a833707960154164cf450647c76547be43a167

Bug detailed description:
-------------------------
It fails on sandybridge with -nightly branch. It doesn't happen on -fixes branch.

output:
mismatch at 254208: -378754475

The last known bad commit: b5a833707960154164cf450647c76547be43a167 (Merge: afef67f 4a8dece)
The last known good commit: 032e254cefb0485c95aceca269be499b91f48aa0 (Merge: 8c74a16 b6e0e54)

Reproduce steps:
----------------
1. ./gem_tiled_swapping
Comment 1 Daniel Vetter 2012-11-08 15:25:10 UTC
Can you please bisect this regression?
Comment 2 lu hua 2012-11-13 05:48:39 UTC
Bisect shows: 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 is the first bad commit

commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
Author: Jianguo Wu <email@example.com>
Date: Mon Oct 8 16:33:06 2012 -0700

    mm: fix-up zone present pages

    I think zone->present_pages indicates pages that buddy system can management, it should be:
        zone->present_pages = spanned pages - absent pages - bootmem pages,
    but is now:
        zone->present_pages = spanned pages - absent pages - memmap pages.

    spanned pages: total size, including holes.
    absent pages: holes.
    bootmem pages: pages used in system boot, managed by bootmem allocator.
    memmap pages: pages used by page structs.

    This may cause zone->present_pages less than it should be. For example, numa node 1 has ZONE_NORMAL and ZONE_MOVABLE, it's memmap and other bootmem will be allocated from ZONE_MOVABLE, so ZONE_NORMAL's present_pages should be spanned pages - absent pages, but now it also minus memmap pages(free_area_init_core), which are actually allocated from ZONE_MOVABLE. When offlining all memory of a zone, this will cause zone->present_pages less than 0, because present_pages is unsigned long type, it is actually a very large integer, it indirectly caused zone->watermark[WMARK_MIN] becomes a large integer(setup_per_zone_wmarks()), than cause totalreserve_pages become a large integer(calculate_totalreserve_pages()), and finally cause memory allocating failure when fork process(__vm_enough_memory()).

    [root@localhost ~]# dmesg
    -bash: fork: Cannot allocate memory

    I think the bug described in http://marc.info/?l=linux-mm&m=134502182714186&w=2 is also caused by wrong zone present pages.

    This patch intends to fix-up zone->present_pages when memory are freed to buddy system on x86_64 and IA64 platforms.
Comment 3 Daniel Vetter 2012-11-13 10:21:12 UTC
Two things to test:
- Can you please check whether reverting the bisected commit on top of dinq resolves the issue?
- Before we report this problem upstream it's good to test whether it's fixed already. I've pushed out a for-QA branch with latest dinq, -fixes and upstream git from Linus all merged together. Please test that.
Comment 5 lu hua 2012-11-14 05:46:24 UTC
It works well when the bisected commit is reverted. It also fails on the for-QA branch. Tested on commit 104ec25077751a0abbd9f523a48b7f84e6842ea3 (Merge: c8928b6 9924a19).
Comment 6 Daniel Vetter 2012-11-14 09:40:32 UTC
For paranoia: Can you please run a memtester on the affected box, to rule out memory corruptions?
Comment 7 Chris Wilson 2012-11-14 09:56:46 UTC
I also observe the bug on a SNB i5-2520m (32-bit PAE with 3GiB), and can confirm the revert fixes gem_tiled_swapping.
Comment 8 Daniel Vetter 2012-11-14 13:48:40 UTC
Can you please test the patch at https://lkml.org/lkml/2012/11/5/866 ?
Comment 9 Chris Wilson 2012-11-14 14:34:53 UTC
Patch worksforme. I see it already is in mmotm, so close?
Comment 10 Chris Wilson 2012-11-14 15:21:04 UTC
Hmm, machine later died completely whilst idle. Possibly unrelated, but unlikely...
Comment 11 Gordon Jin 2012-11-15 02:28:11 UTC
Looks like Chris has answered, so clearing needinfo.
Comment 12 Daniel Vetter 2012-12-05 20:27:28 UTC
Offending patch has been reverted in upstream Linus' git:

commit 5576646f3c1abd60d72d19829de6f5d8c2ca8ecf
Author: Andrew Morton <firstname.lastname@example.org>
Date: Fri Nov 16 14:15:06 2012 -0800

    revert "mm: fix-up zone present pages"

It's not yet in any of the branches merged together with -nightly though.
Comment 13 lu hua 2012-12-10 05:34:08 UTC
Fixed on -nightly branch. Still happens on -queued branch.
Comment 14 lu hua 2012-12-27 08:12:25 UTC
Comment 15 Daniel Vetter 2013-01-08 08:09:22 UTC
*** Bug 59095 has been marked as a duplicate of this bug. ***
Comment 16 Elizabeth 2017-10-06 14:47:52 UTC
Closing old verified.