Looking at the code, igt/benchmarks/gem_exec_nop should allow selecting which ring (engine) to load. However, this feature does not work. For example, with the i915 PMU patches from https://patchwork.freedesktop.org/series/29735/ applied to the kernel, try:

# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ -a ./gem_exec_nop -e rcs
4.433

 Performance counter stats for 'system wide':

     2,002,891,967 ns   i915/rcs0-busy/
           280,244 ns   i915/vcs0-busy/
           118,222 ns   i915/vcs1-busy/
           361,440 ns   i915/vecs0-busy/
           365,253 ns   i915/bcs0-busy/

       3.033127723 seconds time elapsed

# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ -a ./gem_exec_nop -e vcs
4.531

 Performance counter stats for 'system wide':

     2,005,028,005 ns   i915/rcs0-busy/
           304,735 ns   i915/vcs0-busy/
           100,476 ns   i915/vcs1-busy/
           348,364 ns   i915/vecs0-busy/
           383,365 ns   i915/bcs0-busy/

       3.048972240 seconds time elapsed

# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ -a ./gem_exec_nop -e bcs
4.548

 Performance counter stats for 'system wide':

     2,003,302,067 ns   i915/rcs0-busy/
           229,991 ns   i915/vcs0-busy/
            50,410 ns   i915/vcs1-busy/
           249,257 ns   i915/vecs0-busy/
           267,072 ns   i915/bcs0-busy/

       3.050740036 seconds time elapsed

# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ -a ./gem_exec_nop -e vecs
4.547

 Performance counter stats for 'system wide':

     2,002,918,507 ns   i915/rcs0-busy/
           251,940 ns   i915/vcs0-busy/
           134,314 ns   i915/vcs1-busy/
           345,163 ns   i915/vecs0-busy/
           366,121 ns   i915/bcs0-busy/

       3.054508956 seconds time elapsed

# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ -a ./gem_exec_nop -e all
4.488

 Performance counter stats for 'system wide':

     2,004,461,103 ns   i915/rcs0-busy/
           194,267 ns   i915/vcs0-busy/
           104,581 ns   i915/vcs1-busy/
           306,019 ns   i915/vecs0-busy/
           291,113 ns   i915/bcs0-busy/

       3.061850018 seconds time elapsed

So the load always goes to rcs0, whichever engine is requested with -e. The cause appears to be this commit:

commit 05ca171aa9a6902614241f9685de2f62f30126d8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 3 10:43:09 2016 +0100

    benchmarks/gem_exec_nop: Extend submission to check write inter-engine sync

    Currently, we look at the throughput for submitting a read batch to a
    single engine or any. The kernel optimises for this by allowing multiple
    engine to read at the same time, but writes are exclusive to a single
    engine. So lets try to measure the impact of inserting the barriers
    between writes on different engines.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

It reuses the ring parameter of loop() as the counter of the engine-probing loop, so the caller's selection is overwritten before it is ever used:

static int loop(unsigned ring, int reps, int ncpus, unsigned flags)
{
	all_nengine = 0;
	for (ring = 1; ring < 16; ring++) { /* overwrites the ring argument */
		execbuf.flags &= ~ENGINE_FLAGS;
		execbuf.flags |= ring;
		if (__gem_execbuf(fd, &execbuf) == 0)
			all_engines[all_nengine++] = ring;
	}

	if (ring == -1) { /* never true: ring is 16 after the loop */
		nengine = all_nengine;
		memcpy(engines, all_engines, all_nengine*sizeof(engines[0]));
	} else {
		nengine = 1;
		engines[0] = ring; /* always 16, not the requested engine */
	}
}
https://patchwork.freedesktop.org/patch/189372/ - will this patch do?
The patch referenced in comment 1 was merged. Bug is fixed.