I have observed a strange phenomenon where benchmark scores can vary, to differing degrees, between benchmark runs.

Looking at what is happening on the system, I have noticed that the fast case only ever uses the RCS engine, while the slow case uses BCS at 100% and RCS at a lower percentage (exactly how much depends on the benchmark).

I have already chatted with Kenneth (and some other people) about this, and apparently it is somewhat known and caused by different BO upload paths. It was supposed to be alleviated in the master branch, but for me the results are exactly the same (bad) as what I originally discovered.

The most obvious example is the gl_driver test:

Slow run, 17.5% RCS busy, 100% BCS busy:
========================================

root@e31:~/benchmarks/gfxbench3_desktop# INTEL_DEBUG=perf ~/bin/run-mesa ~/mesa ./gfxbench-driver.sh
Running following GfxBench 3.x test-cases:
- gl_driver
In following resolutions:
- 1920x1080
Fullscreened:
- 1
On/offscreen:
- on

COMMAND: build/linux/gfxbench_Release/mainapp/mainapp -w 1920 -ow 1920 -h 1080 -oh 1080 -t gl_driver -fullscreen 1
ATTENTION: default value of option vblank_mode overridden by environment.
Scanning index buffer to compute index buffer bounds. Use glDrawRangeElements() to avoid this.
CPU mapping a busy "statebuffer" BO stalled and took 0.017 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.035 ms.
CPU mapping a busy "streamed data" BO stalled and took 0.012 ms.
CPU mapping a busy "statebuffer" BO stalled and took 0.010 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.022 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.013 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.025 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.038 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.039 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.018 ms.
#Name, FPS, Score, Unit, Width, Height, GL version:
GLB30_gl_driver, 34.2, 1026.0, frames, 1920, 1080, 3.0 Mesa 17.3.0-devel (git-31fb7bbe0b)

Results are in:
- gfxbench-result-fullscreen-1.csv
Full output logs are in:
- gfxbench-result-fullscreen-1.txt

Fast run, 75% RCS busy, 0% BCS busy:
=====================================

root@e31:~/benchmarks/gfxbench3_desktop# INTEL_DEBUG=perf ~/bin/run-mesa ~/mesa ./gfxbench-driver.sh
Running following GfxBench 3.x test-cases:
- gl_driver
In following resolutions:
- 1920x1080
Fullscreened:
- 1
On/offscreen:
- on

COMMAND: build/linux/gfxbench_Release/mainapp/mainapp -w 1920 -ow 1920 -h 1080 -oh 1080 -t gl_driver -fullscreen 1
ATTENTION: default value of option vblank_mode overridden by environment.
Scanning index buffer to compute index buffer bounds. Use glDrawRangeElements() to avoid this.
CPU mapping a busy "batchbuffer" BO stalled and took 0.010 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.020 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.021 ms.
CPU mapping a busy "streamed data" BO stalled and took 0.019 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.015 ms.
CPU mapping a busy "statebuffer" BO stalled and took 0.017 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.037 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.036 ms.
CPU mapping a busy "statebuffer" BO stalled and took 0.021 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.029 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.019 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.017 ms.
CPU mapping a busy "streamed data" BO stalled and took 0.019 ms.
CPU mapping a busy "statebuffer" BO stalled and took 0.016 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.023 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.011 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.014 ms.
CPU mapping a busy "statebuffer" BO stalled and took 0.015 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.011 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.013 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.015 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.027 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.010 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.015 ms.
CPU mapping a busy "statebuffer" BO stalled and took 0.011 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.031 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.038 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.023 ms.
CPU mapping a busy "batchbuffer" BO stalled and took 0.026 ms.
#Name, FPS, Score, Unit, Width, Height, GL version:
GLB30_gl_driver, 101.4, 3042.0, frames, 1920, 1080, 3.0 Mesa 17.3.0-devel (git-31fb7bbe0b)

Results are in:
- gfxbench-result-fullscreen-1.csv
Full output logs are in:
- gfxbench-result-fullscreen-1.txt

Other tests show a smaller margin between slow and fast modes, but the crux of the problem is still there.

I think this is quite a problem for benchmarking, and unless people say something along the lines of "Don't use gfxbench3/4 ever because A B C", I think it would be good to get to the bottom of this and have something which behaves somewhat more predictably.

I also wonder whether something in the kernel (i915) could be causing Mesa to behave like this, because I am pretty sure I used these benchmarks before and I don't remember noticing this issue. A 3x difference in gl_driver certainly looks like something people would have noticed before if it were old behaviour.
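As a rough sanity check on the "3x" figure, the two gl_driver result lines quoted above can be compared with a short script. This is only a sketch: the regular expression and the helper name parse_fps are assumptions about the result-line layout made for illustration and are not part of gfxbench or Mesa.

```python
import re

# Assumed layout, taken from the result lines quoted in this report:
#   GLB30_gl_driver, 34.2, 1026.0, frames, 1920, 1080, 3.0 Mesa ...
RESULT_RE = re.compile(r"^(GLB\w+),\s*([\d.]+),\s*([\d.]+),\s*frames", re.MULTILINE)


def parse_fps(log_text):
    """Return {test_name: fps} for every result line found in log_text.

    parse_fps is a hypothetical helper written for this comment.
    """
    return {m.group(1): float(m.group(2)) for m in RESULT_RE.finditer(log_text)}


# Result lines copied verbatim from the slow and fast runs above.
slow_log = "GLB30_gl_driver, 34.2, 1026.0, frames, 1920, 1080, 3.0 Mesa 17.3.0-devel (git-31fb7bbe0b)"
fast_log = "GLB30_gl_driver, 101.4, 3042.0, frames, 1920, 1080, 3.0 Mesa 17.3.0-devel (git-31fb7bbe0b)"

slow = parse_fps(slow_log)["GLB30_gl_driver"]
fast = parse_fps(fast_log)["GLB30_gl_driver"]
ratio = fast / slow

print(f"slow={slow} fps, fast={fast} fps, fast/slow = {ratio:.2f}x")
```

With the two runs above this gives a ratio of about 2.96, which is where the "3x" comes from. Feeding it the full gfxbench-result-fullscreen-1.txt logs from several runs should make the slow/fast split easy to spot, assuming the result lines keep this layout.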
Created attachment 134902
Only use staged uploads for the same batch.

An idea to cut out the flip-flops on staging uploads.
(In reply to Chris Wilson from comment #1)
> Created attachment 134902
> Only use staged uploads for the same batch.
>
> An idea to cut out the flip flops on staging uploads.

No apparent effect in testing with this one.
Btw, should I be seeing "Using a blit copy to avoid stalling on..." messages, since I have INTEL_DEBUG=perf turned on? Or is there some other path without perf_debug that does blitter uploads as well?
(In reply to Tvrtko Ursulin from comment #3)
> Btw, should I be seeing "Using a blit copy to avoid stalling on..." messages
> since I have INTEL_DEBUG=perf turned on? Or there is some other path wo/
> perf_debug which does blitter uploads as well?

Yes... And you still see high BCS usage on master? That too shouldn't happen for brw_blorp_copy_buffers, so another indication of barking up the wrong tree. Let's see if we can perf_debug() the switch from RCS to BCS.
Tvrtko, do you see the same issue with the offscreen version of the test as well?

Benchmarks shouldn't normally be doing uploads after test startup, unless it's a benchmark for texture upload. The only thing I've seen using a lot of blitter during test run-time is the X server, when it copies the non-vsynced frame. This would be most visible when using the Intel DDX with DRI2.

Which X server and X driver are you using? The Intel DDX, or modesetting? If the former, do you use DRI2 or DRI3? (LIBGL_DEBUG=verbose should output whether Mesa uses DRI2 or DRI3.)
On that machine I have the Intel DDX with DRI 3 turned on, and Mesa confirms it is using DRI 3.

Offscreen version of the test does not seem to suffer from this problem.

So I guess user error of some sort?
(In reply to Tvrtko Ursulin from comment #6)
> On that machine I have the Intel DDX with DRI 3 turned on, and Mesa confirms
> it is using DRI 3.
>
> Offscreen version of the test does not seem to suffer from this problem.
>
> So I guess user error of some sort?

I think the dual results issue we've discussed is still real.

You could also test with modesetting, to make sure blitter usage really goes away and, if it does, whether that makes the performance results more consistent (mostly/partly CPU bound tests like gl_driver are still going to have at least 5x more variance than GPU bound tests have).

Even if the cause were the Intel DDX instead of Mesa, it's quite suspicious that it would randomly use the blitter for frame copies.
Can't repro with modesetting. Let's see if I can move the bug to xorg/driver/intel..
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-intel/issues/150.