Bug 80578 - [All Regression] igt/gem_fence_upload/thread-contention sporadically takes a long time to execute
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: high normal
Assignee: Mika Kuoppala
QA Contact: Intel GFX Bugs mailing list
Reported: 2014-06-27 02:53 UTC by Guo Jinxian
Modified: 2017-09-04 10:27 UTC

Attachments
dmesg (105.80 KB, text/plain)
2014-06-27 02:53 UTC, Guo Jinxian

Description Guo Jinxian 2014-06-27 02:53:36 UTC
Created attachment 101834
dmesg

==System Environment==
--------------------------
Regression: Yes.

The test already reported a failure before this regression (see Bug 80079).

Non-working platforms: All

==kernel==
--------------------------
origin/drm-intel-nightly: 1087d4bf01e79523898c6c31615bf0c369e0039a(fails)
    drm-intel-nightly: 2014y-06m-25d-13h-11m-05s integration manifest
origin/drm-intel-next-queued: 91565c85b66db820f01894a971d39aaef60c4325(fails)
    drm/i915: Don't try to look up object for non-existent fb
origin/drm-intel-fixes: 8525a235c96a548873c6c5644f50df32b31f04c6(fails)
    drm/i915: vlv_prepare_pll is only needed in case of non DSI interfaces

==Bug detailed description==
-----------------------------
igt/gem_fence_upload/thread-contention sporadically takes a long time to execute

Output:
root@x-bdw05:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# time ./gem_fence_upload --run-subtest thread-contention
IGT-Version: 1.7-g7ef5372 (x86_64) (Linux: 3.16.0-rc2_drm-intel-nightly_1087d4_20140626+ x86_64)
Contended upload rate for 1 linear threads:     866.974MiB/s
Contended upload rate for 1 tiled threads:      888.099MiB/s
Contended upload rate for 2 linear threads:     587.835MiB/s
Contended upload rate for 2 tiled threads:      513.421MiB/s
Contended upload rate for 4 linear threads:     438.407MiB/s
Contended upload rate for 4 tiled threads:      422.278MiB/s
Contended upload rate for 8 linear threads:       2.486MiB/s
Contended upload rate for 8 tiled threads:        2.449MiB/s
Contended upload rate for 16 linear threads:      2.313MiB/s
Contended upload rate for 16 tiled threads:       2.328MiB/s
Contended upload rate for 32 linear threads:      3.088MiB/s
Contended upload rate for 32 tiled threads:       3.089MiB/s
Contended upload rate for 64 linear threads:      3.837MiB/s
Contended upload rate for 64 tiled threads:       3.838MiB/s
Test assertion failure function thread_contention, file gem_fence_upload.c:321:
Last errno: 0, Success
Failed assertion: linear[1] > 0.75 * linear[0]
Subtest thread-contention: FAIL

real    19m50.070s
user    0m0.142s
sys     78m47.751s
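
The failed assertion compares two of the measured linear upload rates: one must stay above 75% of the other. The sketch below restates that check in plain C. The function names are hypothetical, and because the report does not show which thread counts `linear[0]` and `linear[1]` hold, the two rates are passed in as parameters rather than assumed.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the failed assertion
 * "linear[1] > 0.75 * linear[0]" from the output above.
 * Which thread counts the two array slots correspond to is
 * not shown in the report, so both rates are parameters.
 */
bool passes_contention_check(double reference_mib_s, double contended_mib_s)
{
    return contended_mib_s > 0.75 * reference_mib_s;
}

/* Sanity check against rates printed in the report. */
void check_reported_rates(void)
{
    /* 1 thread vs. 2 threads: 587.835 is below 0.75 * 866.974. */
    assert(!passes_contention_check(866.974, 587.835));
    /* 4 threads vs. 8 threads: throughput collapses to a few MiB/s. */
    assert(!passes_contention_check(438.407, 2.486));
}
```

With the rates from this run, the check fails for the 1-thread/2-thread pair and fails by a far wider margin once the 8-thread rates are involved.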

==Reproduce steps==
---------------------------- 
1. ./gem_fence_upload --run-subtest thread-contention
Comment 1 Chris Wilson 2014-07-11 19:23:22 UTC
Fwiw, the proposed fix on lkml is:

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index dacc32142fcc..a417733ea9f4 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -285,7 +285,7 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 {
        struct task_struct *owner;
-       bool on_cpu = true;
+       bool on_cpu = false;
 
        if (need_resched())
                return 0;
Comment 2 Chris Wilson 2014-07-22 09:32:03 UTC
commit 37e9562453b813d2ea527bd9531fef2c3c592847
Author: Jason Low <jason.low2@hp.com>
Date:   Fri Jul 4 20:49:32 2014 -0700

    locking/rwsem: Allow conservative optimistic spinning when readers have lock
    
    Commit 4fc828e24cd9 ("locking/rwsem: Support optimistic spinning")
    introduced a major performance regression for workloads such as
    xfs_repair which mix read and write locking of the mmap_sem across
    many threads. The result was xfs_repair ran 5x slower on 3.16-rc2
    than on 3.15 and using 20x more system CPU time.
    
    Perf profiles indicate in some workloads that significant time can
    be spent spinning on !owner. This is because we don't set the lock
    owner when readers(s) obtain the rwsem.
    
    In this patch, we'll modify rwsem_can_spin_on_owner() such that we'll
    return false if there is no lock owner. The rationale is that if we
    just entered the slowpath, yet there is no lock owner, then there is
    a possibility that a reader has the lock. To be conservative, we'll
    avoid spinning in these situations.
    
    This patch reduced the total run time of the xfs_repair workload from
    about 4 minutes 24 seconds down to approximately 1 minute 26 seconds,
    back to close to the same performance as on 3.15.
    
    Retesting of AIM7, which were some of the workloads used to test the
    original optimistic spinning code, confirmed that we still get big
    performance gains with optimistic spinning, even with this additional
    regression fix. Davidlohr found that while the 'custom' workload took
    a performance hit of ~-14% to throughput for >300 users with this
    additional patch, the overall gain with optimistic spinning is
    still ~+45%. The 'disk' workload even improved by ~+15% at >1000 users.
    
    Tested-by: Dave Chinner <dchinner@redhat.com>
    Acked-by: Davidlohr Bueso <davidlohr@hp.com>
    Signed-off-by: Jason Low <jason.low2@hp.com>
    Signed-off-by: Peter Zijlstra <peterz@infradead.org>
    Cc: Tim Chen <tim.c.chen@linux.intel.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Link: http://lkml.kernel.org/r/1404532172.2572.30.camel@j-VirtualBox
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
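
The decision described in the commit message can be sketched as a small user-space model. This is not the kernel code: `model_task`, `model_rwsem` and `model_can_spin_on_owner` are hypothetical stand-ins, and `need_resched` is passed in as a plain flag. The `default_on_cpu` parameter selects the initial value that the one-line diff in Comment 1 changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct model_task {
    bool on_cpu;                /* is the task currently running on a CPU? */
};

struct model_rwsem {
    struct model_task *owner;   /* NULL when no owner is recorded, which
                                   includes the case where readers hold it */
};

/*
 * User-space model of the spin decision described in the commit message.
 * default_on_cpu is the value used when no owner is recorded:
 * true models the old behaviour, false the patched behaviour.
 */
bool model_can_spin_on_owner(const struct model_rwsem *sem,
                             bool need_resched, bool default_on_cpu)
{
    bool on_cpu = default_on_cpu;

    if (need_resched)
        return false;           /* the caller should yield, not spin */

    if (sem->owner != NULL)
        on_cpu = sem->owner->on_cpu;

    return on_cpu;
}

/* Self-check: only the no-owner case differs between old and patched. */
void model_self_check(void)
{
    struct model_task writer = { .on_cpu = true };
    struct model_rwsem write_held = { .owner = &writer };
    struct model_rwsem no_owner = { .owner = NULL };

    assert(model_can_spin_on_owner(&no_owner, false, true));    /* old */
    assert(!model_can_spin_on_owner(&no_owner, false, false));  /* patched */
    assert(model_can_spin_on_owner(&write_held, false, false)); /* unchanged */
}
```

In this model the only behavioural change is the case with no recorded owner, which is the situation the commit message associates with readers holding the lock.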
Comment 3 Guo Jinxian 2014-08-01 05:39:56 UTC
Verified on the latest -nightly (c95053d599112ec3b8c27a632e3c1544558891a4)

root@x-bdw05:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# time ./gem_fence_upload --run-subtest thread-contention
IGT-Version: 1.7-gde1e877 (x86_64) (Linux: 3.16.0-rc6_drm-intel-nightly_c95053_20140731+ x86_64)
Contended upload rate for 1 linear threads:     977.577MiB/s
Contended upload rate for 1 tiled threads:      1029.535MiB/s
Contended upload rate for 2 linear threads:     561.581MiB/s
Contended upload rate for 2 tiled threads:      548.894MiB/s
Contended upload rate for 4 linear threads:     415.698MiB/s
Contended upload rate for 4 tiled threads:      419.414MiB/s
Contended upload rate for 8 linear threads:     443.218MiB/s
Contended upload rate for 8 tiled threads:      429.411MiB/s
Contended upload rate for 16 linear threads:    398.678MiB/s
Contended upload rate for 16 tiled threads:     413.333MiB/s
Contended upload rate for 32 linear threads:    402.941MiB/s
Contended upload rate for 32 tiled threads:     384.500MiB/s
Contended upload rate for 64 linear threads:    379.671MiB/s
Contended upload rate for 64 tiled threads:      45.291MiB/s
Test assertion failure function thread_contention, file gem_fence_upload.c:321:
Failed assertion: linear[1] > 0.75 * linear[0]
Subtest thread-contention: FAIL

real    0m30.227s
user    0m1.788s
sys     1m3.923s


The remaining assertion failure is tracked by bug 80079.
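
The two `time` outputs in this report can be compared directly. The sketch below, with hypothetical helper names, parses the `NmS.SSSs` format printed by `time` and computes the ratio of system time to wall-clock time, which roughly indicates how many CPUs were kept busy in kernel code during the run.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse a duration such as "19m50.070s" into seconds.
 * Returns a negative value if the string does not match.
 */
double parse_time_seconds(const char *text)
{
    double minutes = 0.0, seconds = 0.0;

    if (sscanf(text, "%lfm%lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* System time divided by wall-clock time for one run. */
double sys_to_real_ratio(const char *sys_text, const char *real_text)
{
    return parse_time_seconds(sys_text) / parse_time_seconds(real_text);
}

/* Self-check using the wall-clock time of the original run. */
void timing_self_check(void)
{
    double real_s = parse_time_seconds("19m50.070s");

    assert(real_s > 1190.069 && real_s < 1190.071);
}
```

For the original run this gives a ratio of roughly 4 (78m47.751s of system time over 19m50.070s of wall time); for the verified run it is roughly 2 (1m3.923s over 0m30.227s), which is consistent with far less time being spent busy in the kernel.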
Comment 4 Jari Tahvanainen 2017-09-04 10:27:14 UTC
Closing old verified+fixed.

