Bug 103804 - [IGT] benchmark/gem_exec_nop does not permit to select execution ring
Summary: [IGT] benchmark/gem_exec_nop does not permit to select execution ring
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Reported: 2017-11-17 22:30 UTC by Dmitry Rogozhkin
Modified: 2018-01-05 16:48 UTC

i915 platform: ALL
i915 features: GEM/execlists



Description Dmitry Rogozhkin 2017-11-17 22:30:37 UTC
Judging by the code, igt/benchmark/gem_exec_nop should allow selecting which ring to load through its -e option. However, this feature does not work. For example, assuming that the i915 PMU patches https://patchwork.freedesktop.org/series/29735/ are applied to the kernel, try:

# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ -a ./gem_exec_nop -e rcs
  4.433

 Performance counter stats for 'system wide':

     2,002,891,967 ns   i915/rcs0-busy/
           280,244 ns   i915/vcs0-busy/
           118,222 ns   i915/vcs1-busy/
           361,440 ns   i915/vecs0-busy/
           365,253 ns   i915/bcs0-busy/

       3.033127723 seconds time elapsed

# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ -a ./gem_exec_nop -e vcs
  4.531

 Performance counter stats for 'system wide':

     2,005,028,005 ns   i915/rcs0-busy/
           304,735 ns   i915/vcs0-busy/
           100,476 ns   i915/vcs1-busy/
           348,364 ns   i915/vecs0-busy/
           383,365 ns   i915/bcs0-busy/

       3.048972240 seconds time elapsed

# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ -a ./gem_exec_nop -e bcs
  4.548

 Performance counter stats for 'system wide':

     2,003,302,067 ns   i915/rcs0-busy/
           229,991 ns   i915/vcs0-busy/
            50,410 ns   i915/vcs1-busy/
           249,257 ns   i915/vecs0-busy/
           267,072 ns   i915/bcs0-busy/

       3.050740036 seconds time elapsed

# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ -a ./gem_exec_nop -e vecs
  4.547

 Performance counter stats for 'system wide':

     2,002,918,507 ns   i915/rcs0-busy/
           251,940 ns   i915/vcs0-busy/
           134,314 ns   i915/vcs1-busy/
           345,163 ns   i915/vecs0-busy/
           366,121 ns   i915/bcs0-busy/

       3.054508956 seconds time elapsed

# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ -a ./gem_exec_nop -e all
  4.488

 Performance counter stats for 'system wide':

     2,004,461,103 ns   i915/rcs0-busy/
           194,267 ns   i915/vcs0-busy/
           104,581 ns   i915/vcs1-busy/
           306,019 ns   i915/vecs0-busy/
           291,113 ns   i915/bcs0-busy/

       3.061850018 seconds time elapsed

As you can see, the load always goes to rcs0, whichever engine is passed with -e. The cause seems to be this commit:

commit 05ca171aa9a6902614241f9685de2f62f30126d8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 3 10:43:09 2016 +0100

    benchmarks/gem_exec_nop: Extend submission to check write inter-engine sync

    Currently, we look at the throughput for submitting a read batch to a
    single engine or any. The kernel optimises for this by allowing multiple
    engine to read at the same time, but writes are exclusive to a single
    engine. So lets try to measure the impact of inserting the barriers
    between writes on different engines.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

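which reuses the ring parameter as the counter of the engine-probing loop in the loop function, overwriting the value the caller passed in. In the excerpt below, ring is always 16 once the for loop finishes, so the ring == -1 check can never be true and the requested engine is lost: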

static int loop(unsigned ring, int reps, int ncpus, unsigned flags) {

        all_nengine = 0;
        for (ring = 1; ring < 16; ring++) {
                execbuf.flags &= ~ENGINE_FLAGS;
                execbuf.flags |= ring;
                if (__gem_execbuf(fd, &execbuf) == 0)
                        all_engines[all_nengine++] = ring;
        }

        if (ring == -1) {
                nengine = all_nengine;
                memcpy(engines, all_engines, all_nengine*sizeof(engines[0]));
        } else {
                nengine = 1;
                engines[0] = ring;
        }

}
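
The overwrite can be reproduced outside the benchmark. The following is only a minimal sketch, not the merged patch: probe_engine() is a hypothetical stand-in for __gem_execbuf() that pretends engines 1 to 4 exist, and the names select_engines_buggy(), select_engines_fixed(), MAX_ENGINES and ALL_ENGINES are made up for illustration. It contrasts the pattern from the excerpt above with one possible fix, a separate loop counter.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENGINES 16
#define ALL_ENGINES ((unsigned)-1) /* hypothetical sentinel for "-e all" */

/* Hypothetical stand-in for __gem_execbuf(): 0 means the engine exists. */
static int probe_engine(unsigned engine)
{
        return (engine >= 1 && engine <= 4) ? 0 : -1;
}

/* Pattern from the excerpt: the parameter doubles as the loop counter. */
static size_t select_engines_buggy(unsigned ring, unsigned *engines)
{
        unsigned all_engines[MAX_ENGINES];
        size_t all_nengine = 0;

        for (ring = 1; ring < 16; ring++) /* caller's value is lost here */
                if (probe_engine(ring) == 0)
                        all_engines[all_nengine++] = ring;

        if (ring == ALL_ENGINES) { /* never true: ring is 16 by now */
                memcpy(engines, all_engines,
                       all_nengine * sizeof(engines[0]));
                return all_nengine;
        }

        engines[0] = ring; /* always 16, whatever was requested */
        return 1;
}

/* One possible fix: probe with a separate counter, keep ring intact. */
static size_t select_engines_fixed(unsigned ring, unsigned *engines)
{
        unsigned all_engines[MAX_ENGINES];
        size_t all_nengine = 0;

        for (unsigned r = 1; r < 16; r++)
                if (probe_engine(r) == 0)
                        all_engines[all_nengine++] = r;

        if (ring == ALL_ENGINES) {
                memcpy(engines, all_engines,
                       all_nengine * sizeof(engines[0]));
                return all_nengine;
        }

        engines[0] = ring; /* the engine the caller asked for */
        return 1;
}
```

With an output array of MAX_ENGINES entries, select_engines_buggy(2, out) stores 16 in out[0], while select_engines_fixed(2, out) stores 2, and select_engines_fixed(ALL_ENGINES, out) returns the four probed engines.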
Comment 1 Dmitry Rogozhkin 2017-11-20 19:54:07 UTC
https://patchwork.freedesktop.org/patch/189372/ - will this patch do?
Comment 2 Dmitry Rogozhkin 2017-11-22 00:35:04 UTC
The patch referenced in comment 1 was merged. Bug is fixed.