Bug 77474

Summary: [PNV/IVB/HSW/BDW]igt/gem_tiled_swapping is slow
Product: DRI
Reporter: Guo Jinxian <jinxianx.guo>
Component: DRM/Intel
Assignee: Chris Wilson <chris>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: low
CC: chris, intel-gfx-bugs, przanoni, yi.sun
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform:
i915 features:
Bug Depends on: 72742    
Bug Blocks:    
Attachments:
Description                Flags
gem_tiled_swapping dmesg   none
dmesg                      none

Description Guo Jinxian 2014-04-15 09:18:00 UTC
Created attachment 97383
gem_tiled_swapping dmesg

System Environment:
--------------------------
Platform: PNV IVB HSW
Kernel:(drm-intel-nightly)cf8c74f4e33678bc5151df4b68cb7f8c2b51bf6e

Bug detailed description:
-----------------------------
igt/gem_tiled_swapping is unable to finish within 10 minutes on the -nightly kernel.


output:
IGT-Version: 1.6-g43c2ed7 (x86_64) (Linux: 3.14.0_drm-intel-nightly_cf8c74_20140414+ x86_64)
Using 4861 1MiB objects (available RAM: 3733/3862, swap: 3999)


Reproduce steps:
----------------------------
1. ./gem_tiled_swapping
Comment 1 Imre Deak 2014-04-15 09:47:35 UTC
This looks similar to bug 72742 (CC'ing Chris). The dmesg contains HSW unclaimed register warnings (CC'ing Paulo).

For the unclaimed register warnings, could you provide a dmesg from the HSW with the drm.debug=0xe kernel option?
Comment 2 Chris Wilson 2014-04-15 09:59:32 UTC
The dmesg is irrelevant for this bug though. Those warnings should be filed separately.

gem_tiled_swapping is meant to be slow: it has to thrash the swapfile and make sure we exercise all memory.
Comment 3 Guo Jinxian 2014-04-16 02:50:46 UTC
Created attachment 97441
dmesg

Updated dmesg on HSW with "drm.debug=0xe".
Comment 4 Guo Jinxian 2014-04-17 02:53:41 UTC
This case takes about 25 minutes. Thanks.

[root@x-hsw24 tests]# date;./gem_tiled_swapping;date
Thu Apr 17 10:18:39 CST 2014
IGT-Version: 1.6-g43c2ed7 (x86_64) (Linux: 3.14.0_drm-intel-nightly_45912b_20140416+ x86_64)
Using 4861 1MiB objects (available RAM: 3723/3862, swap: 3999)
Test assertion failure function check_bo, file gem_tiled_swapping.c:113:
Last errno: 0, Success
Failed assertion: data[j] == j
mismatch at 96022: 97430
Subtest threaded: FAIL
Segmentation fault (core dumped)
Thu Apr 17 10:43:21 CST 2014
Comment 5 Chris Wilson 2014-04-17 07:27:37 UTC
Why haven't you reported the failure? ARGH.
Comment 6 Guo Jinxian 2014-04-21 02:34:23 UTC
(In reply to comment #5)
> Why haven't you reported the failure? ARGH.

Checked on the latest -nightly: the case passed, but the test took 53 minutes.

[root@x-hsw24 tests]# date;./gem_tiled_swapping;date
Mon Apr 21 09:36:03 CST 2014
IGT-Version: 1.6-g78e4c2b (x86_64) (Linux: 3.14.0_drm-intel-nightly_1e771b_20140420+ x86_64)
Using 4861 1MiB objects (available RAM: 3690/3862, swap: 3999)
Subtest threaded: SUCCESS
Mon Apr 21 10:29:59 CST 2014
You have new mail in /var/spool/mail/root
Comment 7 Daniel Vetter 2014-05-19 13:58:58 UTC
Please retest with latest -nightly.
Comment 8 Chris Wilson 2014-05-20 08:58:23 UTC
commit ceabbba524fb43989875f66a6c06d7ce0410fe5c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Mar 25 13:23:04 2014 +0000

    drm/i915: Include bound and active pages in the count of shrinkable objects
    
    When the machine is under a lot of memory pressure and being stressed by
    multiple GPU threads, we quite often report fewer than shrinker->batch
    (i.e. SHRINK_BATCH) pages to be freed. This causes the shrink_control to
    skip calling into i915.ko to release pages, despite the GPU holding onto
    most of the physical pages in its active lists.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=72742
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Robert Beckett <robert.beckett@intel.com>
    Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
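
The effect described in the commit message above can be illustrated with a toy model. This is only a sketch, not the i915 or mm/vmscan.c code: the function names and the page counts are hypothetical, and the "skip when fewer than one batch is reported" rule is a simplification of what the commit message describes. The only value taken from the kernel is the default SHRINK_BATCH of 128.

```python
SHRINK_BATCH = 128  # default batch size defined in mm/vmscan.c


def reported_count(unbound, bound, active, include_bound_and_active):
    """Pages the driver reports as freeable (hypothetical toy model)."""
    if include_bound_and_active:
        return unbound + bound + active
    return unbound


def shrinker_called(count, batch=SHRINK_BATCH):
    """Simplified rule: the driver is only asked to free pages when it
    reported at least one full batch of freeable pages."""
    return count >= batch


# Hypothetical numbers: the GPU holds most pages on its active lists.
pages = dict(unbound=40, bound=900, active=3000)

before = reported_count(**pages, include_bound_and_active=False)
after = reported_count(**pages, include_bound_and_active=True)

print("before the commit:", before, "pages reported, called:", shrinker_called(before))
print("after the commit: ", after, "pages reported, called:", shrinker_called(after))
```

In this model the driver is skipped when only unbound pages are counted, and is called once bound and active pages are included, which is the behaviour change the commit message describes.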
Comment 9 liulei 2014-05-28 00:57:58 UTC
I tested on the latest -nightly; it takes about 47 minutes to finish. I also checked out the kernel at commit ceabbba524fb43989875f66a6c06d7ce0410fe5c. The result is as below:
--------------------------------
[root@x-bdw tests]# time  ./gem_tiled_swapping
IGT-Version: 1.6-gff3c122 (x86_64) (Linux: 3.14.0_kcloud_ceabbb_20140521+ x86_64)
Using 4832 1MiB objects (available RAM: 3718/3865, swap: 3871)
Subtest threaded: SUCCESS

real    47m4.744s
user    121m46.095s
sys     2m8.743s
Comment 10 Chris Wilson 2014-07-19 11:09:49 UTC
As a point of reference:

=0 ickle:/opt/xorg/src/intel-gpu-tools/tests$ time sudo ./gem_tiled_swapping 
IGT-Version: 1.7-gd0dc2c5 (x86_64) (Linux: 3.16.0-rc5+ x86_64)
Using 9598 1MiB objects (available RAM: 7558/7905, swap: 6773)
Subtest threaded: SUCCESS

real    13m6.927s
user    0m3.881s
sys     4m17.993s

Using 2x the amount of data, my machine is about 4x quicker. An order of magnitude difference suggests machine configuration, but just maybe it is worth retesting.
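
The arithmetic behind that comparison can be checked with a short sketch. It assumes the run being compared against is the one in comment 9 (4832 objects, real 47m4.744s), which is not stated explicitly; the helper name parse_real is made up for this illustration.

```python
def parse_real(text):
    """Convert a `time` value such as '13m6.927s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# (number of 1MiB objects, wall-clock seconds) taken from the pasted output
fast_objects, fast_secs = 9598, parse_real("13m6.927s")  # comment 10
slow_objects, slow_secs = 4832, parse_real("47m4.744s")  # comment 9 (assumed baseline)

data_ratio = fast_objects / slow_objects
wall_ratio = slow_secs / fast_secs
rate_ratio = (fast_objects / fast_secs) / (slow_objects / slow_secs)

print(f"data: {data_ratio:.2f}x, wall clock: {wall_ratio:.2f}x quicker, "
      f"throughput: {rate_ratio:.2f}x")
```

Under that assumption the wall-clock time is roughly 3.6x shorter with about 2x the data, so the per-object throughput differs by roughly 7x, which is what makes the gap approach an order of magnitude.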
Comment 11 Guo Jinxian 2014-07-21 03:01:04 UTC
(In reply to comment #10)
> As a point of reference:
> 
> =0 ickle:/opt/xorg/src/intel-gpu-tools/tests$ time sudo ./gem_tiled_swapping 
> IGT-Version: 1.7-gd0dc2c5 (x86_64) (Linux: 3.16.0-rc5+ x86_64)
> Using 9598 1MiB objects (available RAM: 7558/7905, swap: 6773)
> Subtest threaded: SUCCESS
> 
> real    13m6.927s
> user    0m3.881s
> sys     4m17.993s
> 
> Using 2x the amount of data, my machine is about 4x quicker. An order of
> magnitude difference suggests machine configuration - but just maybe it is
> worth retesting.

The test result on the latest -nightly (8734408c113bb38234ed03ec51c723b3deff579b) is shown below:
[root@x-bdw01 tests]# time ./gem_tiled_swapping
IGT-Version: 1.7-g4d4f4b2 (x86_64) (Linux: 3.16.0-rc5_drm-intel-nightly_873440_20140721+ x86_64)
Using 4877 1MiB objects (available RAM: 3702/3882, swap: 3983)
Subtest threaded: SUCCESS

real    12m17.901s
user    0m1.578s
sys     33m10.704s

It is still unable to finish within 10 minutes.
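
As a sketch of how the 10-minute criterion could be checked mechanically, the snippet below parses the `time` output pasted above and compares the wall-clock time with the limit. The 600-second budget comes from this report; the function name time_fields is hypothetical, and this is not part of intel-gpu-tools.

```python
import re

BUDGET_SECONDS = 10 * 60  # the 10-minute limit mentioned in this report


def time_fields(output):
    """Return the real/user/sys values from `time` output, in seconds."""
    fields = {}
    pattern = r"^(real|user|sys)\s+(\d+)m([\d.]+)s$"
    for name, minutes, seconds in re.findall(pattern, output, re.MULTILINE):
        fields[name] = int(minutes) * 60 + float(seconds)
    return fields


output = """real    12m17.901s
user    0m1.578s
sys     33m10.704s
"""

t = time_fields(output)
over = t["real"] - BUDGET_SECONDS
print(f"real {t['real']:.1f}s, over budget by {over:.1f}s")
print("sys exceeds real:", t["sys"] > t["real"])
```

The sys time being larger than the real time is consistent with the threaded subtest spending kernel time on several CPUs at once.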
Comment 12 Chris Wilson 2014-07-21 06:11:28 UTC
What is the test like on production hardware?
Comment 13 Guo Jinxian 2014-07-21 07:23:02 UTC
(In reply to comment #12)
> What is the test like on production hardware?

The test passed on production hardware on the HSW and IVB platforms.

[root@x-hsw27 tests]# time ./gem_tiled_swapping
IGT-Version: 1.7-g4d4f4b2 (x86_64) (Linux: 3.16.0-rc5_drm-intel-nightly_873440_20140721+ x86_64)
Using 8168 1MiB objects (available RAM: 7490/7669, swap: 1999)
Killed

real    0m6.091s
user    0m1.149s
sys     0m3.213s

[root@x-ivb6 tests]# time ./gem_tiled_swapping
IGT-Version: 1.7-g3f50598 (x86_64) (Linux: 3.16.0-rc5_drm-intel-nightly_873440_20140721+ x86_64)
Using 8382 1MiB objects (available RAM: 7727/7883, swap: 1999)
Killed

real    0m9.995s
user    0m1.609s
sys     0m3.441s
Comment 14 Chris Wilson 2014-07-21 07:27:10 UTC
"Killed" is not good, but I presume that it was just a victim of bug 72742.
Comment 15 Thomas Wood 2014-12-11 18:01:14 UTC
Please re-test with a version of intel-gpu-tools that includes commit 42b02c2.
Comment 16 Thomas Wood 2014-12-17 14:48:47 UTC
(In reply to Thomas Wood from comment #15)
> Please re-test with a version of intel-gpu-tools that includes commit
> 42b02c2.

With this commit, the test takes under 10 minutes to run on a Haswell system, so this issue should be fixed.
Comment 17 Guo Jinxian 2014-12-18 08:00:18 UTC
Verified

[root@x-hsw24 tests]# time ./gem_tiled_swapping
IGT-Version: 1.9-g6262f35 (x86_64) (Linux: 3.18.0_drm-intel-nightly_664366_20141218+ x86_64)
Using 640 1MiB objects (available RAM: 352/3861, swap: 3999)
Subtest non-threaded: SUCCESS (15.362s)
Subtest threaded: SUCCESS (396.799s)

real    6m56.492s
user    0m0.228s
sys     0m19.193s
Comment 18 Elizabeth 2017-10-06 14:38:41 UTC
Closing old verified.
