Arch: i386
Platform: Pineview
Libdrm: (master)2.4.22-1-g96214860bb0a5e11e7d346351a1be248e3716144
Mesa: (master)d0bfb3c5144a9434efd4d53ced149d42016b5bdc
Xserver: (master)xorg-server-1.9.0-120-g853d7ebfa3e2d281d92890a39010ff5787a00ffd
Xf86_video_intel: (master)2.12.902-18-g5472359d6860af655a3c286d30558540376c9fdb
Cairo: (master)cb0bc64c16b3a38cbf0c622830c18ac9ea6e2ffe
Kernel: (drm-intel-next) 2d7b8366ae4a9ec2183c30e432a4a9a495c82bcd

Bug detailed description:
--------------------------------------
On our Pineview platform, the 2D performance will drop 40%~55% if we test with compiz enabled. I test with x11perf.
x11perf -aa10text: 1070k(no compiz) 460k(with compiz)
x11perf -rgb10text: 699k(no compiz) 450k(with compiz)

Reproduce steps:
--------------------------------------
1. xinit&
2. x11perf -aa10text
450k, you should be happy! ;-) More seriously, rgb10text should be well over 1Mglyphs/s on PNV on bare X. Time to learn perf. :)

If you don't already have the tool installed (it should be available in something like a perf or linux-tools package), go into the kernel source directory and run cd tools/perf && make. Then do:
1. xinit&
2. sudo perf record -f -g -a x11perf -rgb10text

I think symbol resolution happens at report time, so then do sudo perf report > rgb10text.txt and attach that file. Thanks.
Created attachment 39438: the log file produced by perf while running x11perf.
I have collected some data with perf while running x11perf: one run on the GNOME desktop without compiz and another with compiz. The performance data is:
x11perf -aa10text: 1020k (without compiz) 446k (with compiz)
The logs from perf are in the attachment.
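For reference, the relative drop implied by the figures quoted so far can be worked out with a few lines of Python. This is only a sketch: the helper name percent_drop is made up for illustration, and the inputs are the k glyphs/s numbers copied from the comments in this report.

```python
def percent_drop(without_compiz: float, with_compiz: float) -> float:
    """Return the throughput lost under the compositor, as a percentage."""
    if without_compiz <= 0:
        raise ValueError("baseline throughput must be positive")
    return 100.0 * (without_compiz - with_compiz) / without_compiz


# Figures (in k glyphs/s) copied from the comments in this report.
results = {
    "aa10text (original report)": (1070, 460),
    "rgb10text (original report)": (699, 450),
    "aa10text (GNOME re-test)": (1020, 446),
}

for name, (base, composited) in results.items():
    print(f"{name}: {percent_drop(base, composited):.1f}% slower with compiz")
```

Swapping in other x11perf results only requires editing the results dictionary.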
Ok, that doesn't show the processor hotspots I have seen in the past when tuning the glyph performance; the no-compiz profile is in line with what I see here. Similarly, with compiz there are no true hotspots, so the throughput drop is purely due to the extra rendering latency incurred through the compositor round-trip. There may be some room for improving the batching between the compositor and X, but the only way to truly eliminate the compositor latency is by moving to Wayland [viz a compositing X server].
I'm currently seeing 800k/1400k aa10text on PineView with and without mutter respectively. Considering the overheads, I'm lowering my acceptance threshold for mutter/compiz to 40% of raw speed. (There's the damage computation, plus the smaller batches and extra copies which all add up). The only way to rectify this is to integrate the compositor with X, a story I have heard before.
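As a rough illustration of the acceptance threshold mentioned above, the check amounts to comparing composited throughput against 40% of the raw figure. The names ACCEPTANCE_RATIO and meets_threshold are hypothetical and not part of any test suite; the numbers are the ones quoted in this comment.

```python
ACCEPTANCE_RATIO = 0.40  # "40% of raw speed", as stated in the comment above


def meets_threshold(composited: float, raw: float,
                    ratio: float = ACCEPTANCE_RATIO) -> bool:
    """True if composited throughput is at least `ratio` of the raw speed."""
    if raw <= 0:
        raise ValueError("raw throughput must be positive")
    return composited >= ratio * raw


# aa10text on PineView in k glyphs/s: 800k with mutter, 1400k without.
composited, raw = 800, 1400
print(f"ratio: {composited / raw:.2f}")
print(f"acceptable: {meets_threshold(composited, raw)}")
```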