56859 – [SNB regression]i-g-t gem_tiled_swapping fails

Summary: [SNB regression]i-g-t gem_tiled_swapping fails

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	unspecified
Hardware:	All Linux (All)

Importance:	high major
Assignee:	Daniel Vetter
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:
Keywords:

Duplicates (1):	59095
Depends on:
Blocks:

Reported:	2012-11-08 08:02 UTC by lu hua
Modified:	2017-10-06 14:47 UTC
CC List:	6 users

See Also:
i915 platform:
i915 features:

Attachments
dmesg (49.48 KB, text/plain) 2012-11-14 05:45 UTC, lu hua

Description lu hua 2012-11-08 08:02:43 UTC

System Environment:
--------------------------
Arch:           i386
Platform:       Sandybridge
Mesa:	(master)5cbc0f00368b9ddc127007be2bd7f60940aa93ed
Kernel:	(drm-intel-nightly) b5a833707960154164cf450647c76547be43a167

Bug detailed description:
-------------------------
It fails on Sandybridge with the -nightly branch. It doesn't happen on the -fixes branch.
Output:
mismatch at 254208: -378754475

The last known bad commit: b5a833707960154164cf450647c76547be43a167 (Merge: afef67f 4a8dece)
The last known good commit: 032e254cefb0485c95aceca269be499b91f48aa0 (Merge: 8c74a16 b6e0e54)

Reproduce steps:
----------------
1. ./gem_tiled_swapping

Comment 1 Daniel Vetter 2012-11-08 15:25:10 UTC

Can you please bisect this regression?

Comment 2 lu hua 2012-11-13 05:48:39 UTC

Bisect shows: 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 is the first bad commit
commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125
Author: Jianguo Wu <wujianguo@huawei.com>
Date:   Mon Oct 8 16:33:06 2012 -0700

    mm: fix-up zone present pages

    I think zone->present_pages indicates pages that buddy system can management,
    it should be:

        zone->present_pages = spanned pages - absent pages - bootmem pages,

    but is now:
        zone->present_pages = spanned pages - absent pages - memmap pages.

    spanned pages: total size, including holes.
    absent pages: holes.
    bootmem pages: pages used in system boot, managed by bootmem allocator.
    memmap pages: pages used by page structs.

    This may cause zone->present_pages less than it should be.  For example,
    numa node 1 has ZONE_NORMAL and ZONE_MOVABLE, it's memmap and other
    bootmem will be allocated from ZONE_MOVABLE, so ZONE_NORMAL's
    present_pages should be spanned pages - absent pages, but now it also
    minus memmap pages(free_area_init_core), which are actually allocated from
    ZONE_MOVABLE.  When offlining all memory of a zone, this will cause
    zone->present_pages less than 0, because present_pages is unsigned long
    type, it is actually a very large integer, it indirectly caused
    zone->watermark[WMARK_MIN] becomes a large
    integer(setup_per_zone_wmarks()), than cause totalreserve_pages become a
    large integer(calculate_totalreserve_pages()), and finally cause memory
    allocating failure when fork process(__vm_enough_memory()).

    [root@localhost ~]# dmesg
    -bash: fork: Cannot allocate memory

    I think the bug described in

      http://marc.info/?l=linux-mm&m=134502182714186&w=2

    is also caused by wrong zone present pages.

    This patch intends to fix-up zone->present_pages when memory are freed to
    buddy system on x86_64 and IA64 platforms.

Comment 3 Daniel Vetter 2012-11-13 10:21:12 UTC

Two things to test:

- Can you please check whether reverting the bisected commit on top of dinq resolves the issue?

- Before we report this problem upstream it's good to test whether it's fixed already. I've pushed out a for-QA branch with the latest dinq, -fixes and upstream git from Linus all merged together. Please test that.

Comment 4 lu hua 2012-11-14 05:45:10 UTC

Created attachment 70052
dmesg

Comment 5 lu hua 2012-11-14 05:46:24 UTC

It works well with the bisected commit reverted.

It still fails on the for-QA branch.
Tested on commit 104ec25077751a0abbd9f523a48b7f84e6842ea3 (Merge: c8928b6 9924a19)

Comment 6 Daniel Vetter 2012-11-14 09:40:32 UTC

For paranoia: Can you please run a memtester on the affected box, to rule out memory corruption?

Comment 7 Chris Wilson 2012-11-14 09:56:46 UTC

I also observe the bug on a SNB i5-2520m (32-bit PAE with 3GiB), and can confirm the revert fixes gem_tiled_swapping.

Comment 8 Daniel Vetter 2012-11-14 13:48:40 UTC

Can you please test the patch at https://lkml.org/lkml/2012/11/5/866 ?

Comment 9 Chris Wilson 2012-11-14 14:34:53 UTC

Patch worksforme. I see it already is in mmotm, so close?

Comment 10 Chris Wilson 2012-11-14 15:21:04 UTC

Hmm, machine later died completely whilst idle. Possibly unrelated, but unlikely...

Comment 11 Gordon Jin 2012-11-15 02:28:11 UTC

Looks like Chris has answered, so clearing needinfo.

Comment 12 Daniel Vetter 2012-12-05 20:27:28 UTC

Offending patch has been reverted in upstream Linus' git:

commit 5576646f3c1abd60d72d19829de6f5d8c2ca8ecf
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Fri Nov 16 14:15:06 2012 -0800

    revert "mm: fix-up zone present pages"

It's not yet in any of the branches merged together with -nightly though.

Comment 13 lu hua 2012-12-10 05:34:08 UTC

Fixed on -nightly branch.
Still happens on -queued branch.

Comment 14 lu hua 2012-12-27 08:12:25 UTC

Verified. Fixed.

Comment 15 Daniel Vetter 2013-01-08 08:09:22 UTC

*** Bug 59095 has been marked as a duplicate of this bug. ***

Comment 16 Elizabeth 2017-10-06 14:47:52 UTC

Closing old verified.
