Most display functions are faster in Xorg 7.2/7.3 than in Xorg 7.0, but some common image functions are substantially slower. I have linked (see URL) to example code, written in C++/Qt4 and Python/Gtk2, that demonstrates this issue very clearly when comparing Debian Etch (Xorg 7.0) with the current pre-release of Debian Lenny (Xorg 7.2). Details are included in the tarball.

I reported this issue to the Xorg mailing list (see Subject: "Regression Problem in Xorg 7.3", December 12, 2007) and ran some additional tests suggested by the users there, which directed this report to the Intel driver component. Specifically, running the example code with the VESA drivers or with Xvfb shows that Xorg 7.2 is faster than Xorg 7.0. Running the scripts with the i810/intel driver shows that Xorg 7.2 is much slower than Xorg 7.0 when displaying overlaying images.

I am available to run any other suggested tests and will be watching this bug report for patches or suggestions.

Tony
Can you get profiles with something like sysprof or oprofile?
Created attachment 13109: An opreport of Xorg 7.0 while running image_perf.py
Created attachment 13110: An opreport of Xorg 7.2 while running image_perf.py
I uploaded two oprofile reports, taken while running the image_perf.py Python/Gtk2 example script from my tarball. This script displays an image and turns a transparent overlay on and off 500 times. It runs in about 17 seconds on Etch and about 25 seconds on Lenny (on a 1.2GHz Toughbook CF-18). Let me know if you would like something else run or more information on the issue.
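For readers who do not want to unpack the tarball, here is a minimal sketch of the kind of timing loop described above. It is not the actual image_perf.py: `time_toggles` and the stand-in callbacks are hypothetical names, and treating "500 times" as 500 show/hide pairs is an assumption. In a real PyGTK script the two callbacks would presumably wrap the overlay widget's show and hide calls.

```python
import time


def time_toggles(show, hide, count=500):
    """Call show() then hide() `count` times; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(count):
        show()  # assumption: makes the overlay visible and forces a redraw
        hide()  # assumption: removes the overlay again
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in callbacks so the sketch runs without X or GTK installed.
    events = []
    elapsed = time_toggles(lambda: events.append("show"),
                           lambda: events.append("hide"))
    print("Time to flip the blue overlay 500 times: %f" % elapsed)
```

With stand-in callbacks the reported time is meaningless; the point is only the shape of the measurement, which covers the whole show/hide loop rather than a single draw.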
You should use EXA if you're limited by Render performance. It's on by default in the 2.2 driver.
I *am* running with EXA...

The image_perf.py benchmark running with "AccelMethod" "EXA":

    ideal@dhcp-141:~/Xorg_Performance/pygtk$ ./image_perf.py
    Time to flip the blue overlay 500 times: 25.657996

The image_perf.py benchmark running with "AccelMethod" "XAA":

    ideal@dhcp-141:~/Xorg_Performance/pygtk$ ./image_perf.py
    Time to flip the blue overlay 500 times: 81.217041 <--- LOOK

I know it is a total inconvenience to download my tarball (see URL above) and run my benchmark Python/Gtk2 utilities, but I swear you'll see exactly what I'm talking about. Using Gtk.Image to place the images and then calling show_now() and hide() is uselessly slow. Ditto using Qt4's QPixmap object and toggling setVisible(). These are probably the most common methods of displaying images in the two most common libraries for interfacing with X Window on Linux... and the performance is nearly twice as bad now!

This performance regression is absolutely killing us: we can't run the newest Intel hardware on Xorg 7.0 with acceleration working, and because of this bug the performance on Xorg 7.2+ is so poor that it makes our application virtually unusable.
So this regression is caused by XAA->EXA, instead of Xorg 7.0->7.2/7.3?
Gordon, if you are asking me, I don't know enough about X internals to make that call. I can only point out what I'm seeing when trying to run an application that worked well in Xorg-7.0 and now doesn't in Xorg-7.2 *regardless* of which acceleration method is used.
So, by running image_perf.py:

    Xorg 7.0 (with XAA): 17.24
    Xorg 7.3 (with XAA): 81.22
    Xorg 7.3 (with EXA): 25.65

It's not caused by using EXA in intel 2.2.0 release.
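The three figures above can be expressed as slowdown factors relative to the Xorg 7.0 run. This is a small arithmetic sketch; the names `BASELINE_SECONDS`, `RUNS`, and `slowdown` are chosen here for illustration and do not come from the benchmark.

```python
# Timings quoted in this comment, in seconds for 500 overlay flips.
BASELINE_SECONDS = 17.24  # Xorg 7.0 with XAA
RUNS = {
    "Xorg 7.3 (with XAA)": 81.22,
    "Xorg 7.3 (with EXA)": 25.65,
}


def slowdown(seconds, baseline=BASELINE_SECONDS):
    """Return how many times longer a run took than the baseline run."""
    return seconds / baseline


for label, seconds in RUNS.items():
    print("%s: %.2fx the Xorg 7.0 time" % (label, slowdown(seconds)))
```

That works out to about 4.7 times the baseline for XAA and about 1.5 times for EXA, so both acceleration methods regress on 7.3, with XAA regressing much more.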
I guess I can take off this bug, with cworth joining.
Thanks Gordon. For performance of the Intel driver, we'll be looking to Keith Packard's recent "UXA" work (which is EXA but with the problematic migration code removed). That will be landing shortly, after which I'll reevaluate this bug. Obviously, there was a non-EXA-specific performance regression here as well, but optimizing the XAA experience is just not interesting at this point. And thanks, Anthony, for the test case. This should be quite useful as we do further tuning. -Carl
Hello again a year and a half later. I am still seeing this regression issue in the latest Xorg. Here are my versions:

Hardware
    CPU: Intel(R) Pentium(R) M processor 1.20GHz
    Graphics: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller (rev 04)

Software
    Base OS: Debian Lenny (gcc 4.3.2)
    Xorg: 7.4/1.6.0 built from source
    Intel: 2.6.3 built from source
    Kernel: 2.6.29.1 built from kernel.org source

NOTE: The regression exists on all Intel graphics hardware tested up to GM45.

So I ran my image_perf.py to benchmark the image display regression again:

XAA:
    debian:~/Xorg_Performance/pygtk$ DISPLAY=:0.0 ./image_perf.py
    Time to flip the blue overlay 500 times: 53.669110

EXA:
    debian:~/Xorg_Performance/pygtk$ DISPLAY=:0.0 ./image_perf.py
    Time to flip the blue overlay 500 times: 97.717825

UXA:
    debian:~/Xorg_Performance/pygtk$ DISPLAY=:0.0 ./image_perf.py
    Time to flip the blue overlay 500 times: 86.004156

Wow, you actually managed to speed up XAA! Too bad I hear you guys are planning on dumping both XAA and EXA... that really fills me with dread and despair.

During the 30 minutes I spent testing this again I still noted the font display anomaly that kept us from using EXA (outside of the performance problems). Namely, EXA doesn't display the TrueType font correctly, while XAA and UXA modes both do it just fine with the same code. (This isn't part of my regression case, just something I noticed while testing.)

Also, the UXA mode crashed (as in kernel oops) two different times when I started X after changing the AccelMethod from XAA or EXA. The UXA method seems to work fine after a clean reboot, but $deity help you if you run another method first and gunk up the kernel module.

Now, I know this is older Intel hardware, but in most cases I would think that this would also make it better understood hardware. I guess I'm wrong. I see the same things on the box with the GM45 chipset. I just wanted my performance numbers to be on the same box from the original bug report 1.5 years ago.

I don't really hold out hope any more of you guys ever fixing this (or the EXA font display issue that's been there for 1.5 years as well, or the UXA kernel oops I'm seeing today for the first time)... but I'll still include my Xorg.log files and a kernel oops for giggles.
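Dividing each total from this comment by the 500 flips gives a rough per-flip cost, which is closer to what a user feels when a menu overlay appears. This sketch assumes the time is spread evenly across flips, which the benchmark output does not confirm; `per_flip_ms` and `TOTALS` are illustrative names.

```python
FLIPS = 500  # number of overlay flips reported by image_perf.py

# Totals in seconds, copied from the three runs in this comment.
TOTALS = {
    "XAA": 53.669110,
    "EXA": 97.717825,
    "UXA": 86.004156,
}


def per_flip_ms(total_seconds, flips=FLIPS):
    """Average cost of one flip in milliseconds, assuming an even spread."""
    return total_seconds / flips * 1000.0


for method, total in TOTALS.items():
    print("%s: %.1f ms per flip" % (method, per_flip_ms(total)))
```

On these numbers the averages are roughly 107 ms (XAA), 195 ms (EXA), and 172 ms (UXA) per flip.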
Created attachment 25432: kernel oops from messages
Created attachment 25434: Xorg.log from the XAA run
Created attachment 25435: Xorg.log from the EXA run
Created attachment 25437: Xorg.log from the UXA run
Tested on master:

    Time with vesa: 7.750472
    Time with UXA: 6.438378

With 2.6.3, UXA+KMS would do dri_bo_map on the screen to do the software fallback of that image (sadly, the client is using SHM pixmaps, which are very harmful if your intention is to do repeated draws of the same image). With the current code, UXA+KMS does drm_intel_bo_map_gtt, which speeds up the software fallback significantly.
Thanks for looking into this, Eric. The problem that led to this bug isn't simply displaying the same image over and over. The application where we first saw this issue is a touchscreen application that doesn't use standard GUI widgets. Because the operator may be wearing gloves, the buttons and menus must be sized appropriately. To accomplish this, the application uses large-ish images to provide buttons and menus in a way similar to the examples included in the test case zip file.

In the application, the user presses a button image on the screen and expects a menu to appear. When this application was first developed three years ago, the Xorg version in Debian Etch was very responsive when displaying overlaying images like this. To support newer hardware, it was necessary to use newer and newer kernel / Xorg versions. At one point the delay to overlay a menu was so significant that the user was left thinking the menu had not been activated and pressed the button again and again. A quarter-second or half-second delay may seem insignificant, but it can *really* impact the user experience.

This issue affected the perception of our software and made users complain that newer versions "felt slow" compared to older versions, even though we could demonstrate quantitatively that the actual performance of the application was faster. This has been a sore point with the customer for over a year now and has been brought up over and over at meetings.

I am glad the functions are now faster and look forward to trying out the latest version. Hopefully performance will continue to improve from now on. I would request that at least some attention be paid to supporting the older Intel chips (i810/i915) better. The performance with UXA on those platforms is now so bad as to be truly useless. Running a Python/GTK application, I can actually watch GTK buttons and widgets draw in on the screen.
We finally have a decent desktop benchmarking tool (cairo-perf-trace), and we've had major performance wins thanks to finally being able to quantify outside of microbenchmarks. I've got a 10% win in firefox queued to land post 2.8 that will affect some general GTK rendering as well. Sorry for the rough times -- a lot of it has been due to rewriting the whole stack, and I think we're set up to continue seeing performance wins at this point as things have settled down. However, we still need someone to work on the GTK side to fix it to not use SHM pixmaps for reused images.
Anthony, the benchmarking tool you pointed to earlier has gone. I would very much like to verify that we perform reasonably for your use case. Hopefully better late than never!
Wow! Thanks for the follow-up, no matter how late it is. I actually found the old tarball (God, I'm such a packrat...) and stuck it back in the previous link location. I was very happy to see that at least the pygtk code still just works (under current Debian Unstable). The qt4 code may need some tweaks, but I can't compile/run on my box right now. When I get a sec, I'll grab those old platforms and run apples-to-apples numbers on the current software load (based on Squeeze currently with a planned upgrade to Wheezy late this year). Thanks again!
This is what I currently see on t61, the most recent machine supported by UMS/XAA I have. (It's a bad choice of machine for a variety of other reasons though :-p)

crestline (965gm)
                         ./pixmap_perf.py   ./image_perf.py
    etch (baseline 915):     25.002718         17.241337
    lenny (baseline 915):    12.012802         25.642715
    xaa:                      0.513530          6.157524
    exa:                      0.443099         15.865394
    uxa:                      0.433677         12.950995
    sna:                      0.397831          8.594444

    xaa/exa on xorg-1.5 with -intel-2.6
    uxa/sna on xorg-1.12 with -intel-2.20

Judging by that I still have a small but still significant regression from the UMS/XAA heyday. Can you share any recent results from your baseline machine?
OK, the result for sna is actually bistable, depending upon the order of execution (migration heuristics at play): it oscillates between 6s and 8s for image_perf.py. So the potential to perform as well as xaa is hidden in there...
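Since a single run can land on either side of a bistable result, one way to expose the spread is to run the benchmark several times and report the minimum and maximum. The sketch below parses the "Time to flip the blue overlay 500 times:" line shown earlier in this report; the helper names (`parse_time`, `run_repeatedly`, `summarize`) are hypothetical, and it assumes the script prints that line on standard output. Because the result depends on execution order, identical back-to-back runs may not be enough to trigger both states; interleaving with another benchmark may be needed.

```python
import re
import subprocess

# Output format taken from the image_perf.py runs quoted in this report.
TIME_RE = re.compile(r"Time to flip the blue overlay 500 times:\s*([0-9.]+)")


def parse_time(output):
    """Extract the elapsed seconds from one benchmark run's output."""
    match = TIME_RE.search(output)
    if match is None:
        raise ValueError("no timing line found in benchmark output")
    return float(match.group(1))


def run_repeatedly(command, runs=5):
    """Run `command` several times and collect the reported timings."""
    times = []
    for _ in range(runs):
        result = subprocess.run(command, capture_output=True,
                                text=True, check=True)
        times.append(parse_time(result.stdout))
    return times


def summarize(times):
    """Report the fastest run, the slowest run, and the gap between them."""
    return {"min": min(times), "max": max(times),
            "spread": max(times) - min(times)}


# Example use on a machine with the tarball unpacked (path is an assumption):
#   print(summarize(run_repeatedly(["./image_perf.py"])))
```

A large spread relative to the minimum would suggest the migration heuristics are flipping between states, in which case the minimum is the better indicator of what the driver can achieve.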