Created attachment 137456: test case

After updating Fedora from mesa-17.2.4 to 17.3.5, I noticed that throughput in XPutImage/XShmPutImage-based workloads dropped significantly. I had seen this before with a self-compiled build of mesa-18.0-rc, but assumed it had something to do with the chosen compile flags.

One test went from 40 fps to 15 fps (very small XPutImage requests immediately followed by XRenderComposite), while others degraded by about 30%.

System details:
* AMD Kaveri 7650k
* 4k + FullHD displays
* linux 4.15.3
* radeon kernel driver

How to test:
Run the attached Java program and enable the antialiasing checkbox:
java -jar JGears2.jar
currently bisecting...
The commit causing this regression is:

8b3a257851905ff444d981e52938cbf2b36ba830 is the first bad commit
commit 8b3a257851905ff444d981e52938cbf2b36ba830
Author: Marek Olšák <marek.olsak@amd.com>
Date: Tue Jul 18 16:08:44 2017 -0400

    radeonsi: set a per-buffer flag that disables inter-process sharing (v4)

    For lower overhead in the CS ioctl.
    Winsys allocators are not used with interprocess-sharable resources.

    v2: It shouldn't crash anymore, but the kernel will reject the new flag.
    v3 (christian): Rename the flag, avoid sending those buffers in the BO list.
    v4 (christian): Remove setting the kernel flag for now

    Reviewed-by: Marek Olšák <marek.olsak@amd.com>

:040000 040000 b775b6b0ea5b971d2165a644ea8912c120f54431 2e4b2737f37ede2bbdbbe6815fe0fa562177c2b7 M src

x11perf -putimage10 regressed from 75 kOps/s to 22 kOps/s after this patch, running Xephyr with vblank disabled.

before:
400000 reps @ 0.0130 msec ( 77000.0/sec): PutImage 10x10 square

after:
120000 reps @ 0.0457 msec ( 21900.0/sec): PutImage 10x10 square
Thanks for the bisection Clemens. For the future feel free to add the commit author/reviewer in the CC list. It should help flag the issue amongst the dozens of others.
Just some unrelated but interesting numbers:

Sync time adjustment is 0.0355 msecs.
8000000 reps @ 0.0012 msec (816000.0/sec): ShmPutImage 10x10 square
8000000 reps @ 0.0012 msec (818000.0/sec): ShmPutImage 10x10 square

These are the results achieved by a GeForce 8800 GTS (an 11-year-old dGPU) using the proprietary driver in the same system. This confirms my subjective experience: the glamor-based open-source driver stack is really slow for some operations. The proprietary nvidia driver seems to have far lower driver overhead (considering that a 10x10 putimage won't saturate the GPU).
Marek, any ideas? My Polaris 20 is somewhat faster, but nowhere near the Nvidia blob. 'git revert xxx' does NOT apply cleanly. Someone on Phoronix mentioned that fglrx was much faster even than Mesa git before your commit.
Strange: after tinkering around with my system, I cannot reproduce the issue anymore. Even with Mesa-17.3.x, x11perf -shmput10 is now at ~70-80 kOps/s, so maybe it was a configuration issue that was somehow triggered by the commit in question? This still leaves the question of how/why the nvidia blob can be orders of magnitude faster for XPutImage-based workloads.
(In reply to Clemens Eisserer from comment #6) > This still leaves the question to be answered, how/why the nvidia blob can > be magnitudes faster for XPutImage based workloads. If somebody wants to improve this, the place to start is probably glamor rather than the drivers.
> If somebody wants to improve this, > the place to start is probably glamor rather than the drivers. I wonder, what could glamor do better (especially for small uploads) than call into glTexSubImage2D?
So, shmput10 is now equally fast with Mesa-17.3.6 and Mesa-17.2.4; however, the real-world workload still suffers. Please have a look at http://93.83.133.214/downloads/JXRenderMark-1.0.1.zip - it is a stand-alone benchmark which emulates the XRender sequences generated by the Java XRender backend.

CentOS-7 + updates (Mesa 17.0.1):
./render 3 32 3 32 3 32 3 32 3 32 3 32 3 32
18621.335408 Ops/s; put composition (!); 32x32
18901.781304 Ops/s; put composition (!); 32x32
18903.572785 Ops/s; put composition (!); 32x32

Fedora 27 + updates (Mesa 17.3.6):
./render 3 32 3 32 3 32 3 32 3 32 3 32 3 32
[ce@localhost temp]$ ./JXRenderMark-1.0.1 3 32 3 32 3 32 3 32 3 32 3 32
6938.738245 Ops/s; put composition (!); 32x32
6825.050537 Ops/s; put composition (!); 32x32
6955.692404 Ops/s; put composition (!); 32x32

So there it is: a slowdown by a factor of about 2.5 :/
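The slowdown factor can be checked directly from the pasted output. The following is only a minimal sketch: it assumes the `<number> Ops/s; ...` line format shown in this comment, and the helper name `mean_ops` is made up for illustration (it is not part of JXRenderMark).

```python
import re
from statistics import mean

# Output lines copied verbatim from the comment above.
CENTOS_MESA_17_0_1 = """\
18621.335408 Ops/s; put composition (!); 32x32
18901.781304 Ops/s; put composition (!); 32x32
18903.572785 Ops/s; put composition (!); 32x32
"""

FEDORA_MESA_17_3_6 = """\
6938.738245 Ops/s; put composition (!); 32x32
6825.050537 Ops/s; put composition (!); 32x32
6955.692404 Ops/s; put composition (!); 32x32
"""

# Matches the leading throughput number of each result line.
OPS_LINE = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?) Ops/s;", re.MULTILINE)


def mean_ops(text):
    """Return the mean Ops/s over all result lines found in `text`."""
    values = [float(v) for v in OPS_LINE.findall(text)]
    if not values:
        raise ValueError("no 'Ops/s' result lines found")
    return mean(values)


before = mean_ops(CENTOS_MESA_17_0_1)
after = mean_ops(FEDORA_MESA_17_3_6)
slowdown = before / after

print(f"before: {before:.1f} Ops/s, after: {after:.1f} Ops/s, "
      f"slowdown factor: {slowdown:.2f}")
```

With the six samples above the ratio comes out at roughly 2.7, in the same range as the factor quoted in the comment. Note that the two runs are on different distributions, so the ratio reflects more than just the Mesa version.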
I bisected the regression again, this time with the benchmark mentioned in the post above (JXRenderMark), and I was again led to the following commit:

[ce@localhost mesa]$ git bisect good
8b3a257851905ff444d981e52938cbf2b36ba830 is the first bad commit
commit 8b3a257851905ff444d981e52938cbf2b36ba830
Author: Marek Olšák <marek.olsak@amd.com>
Date: Tue Jul 18 16:08:44 2017 -0400

    radeonsi: set a per-buffer flag that disables inter-process sharing (v4)

So regardless of the different manifestations, this commit seems to introduce regressions for antialiased rendering using the XRender Java2D backend.
https://patchwork.freedesktop.org/patch/210907/ helps for this benchmark with the r600 driver, but radeonsi already has the same code... Clemens, are you still seeing the problem with current Mesa Git master?
(In reply to Clemens Eisserer from comment #10)
> I bisected the regression again, this time with the benchmark mentioned in
> the post above (JXRenderMark) and I was again led to the following commit:
>
> [ce@localhost mesa]$ git bisect good
> 8b3a257851905ff444d981e52938cbf2b36ba830 is the first bad commit
> commit 8b3a257851905ff444d981e52938cbf2b36ba830
> Author: Marek Olšák <marek.olsak@amd.com>
> Date: Tue Jul 18 16:08:44 2017 -0400
>
> radeonsi: set a per-buffer flag that disables inter-process sharing (v4)
>
>
> So regardless of different manifestations, this commit seems to introduce
> regressions for antialiased rendering using the Xrender Java2D backend.

8b3a257851905ff444d981e52938cbf2b36ba830 indeed regressed performance, but it was fixed later. The regression is not reproducible with branches 17.3, 18.0, and master.
For my Kaveri system I got the following numbers (composition manager disabled, Xephyr):

./JXRenderMark-1.0.1 3 32 3 32 3 32

#amdgpu, IOMMU enabled
12325.077581 Ops/s; put composition (!); 32x32 # mesa-17.2.4 self-compiled
10582.511406 Ops/s; put composition (!); 32x32 # mesa-17.3.6, fedora 27, updates repo
8636.834555 Ops/s; put composition (!); 32x32 # mesa-18.1.0-devel self-compiled

#radeon, IOMMU enabled
12060.500868 Ops/s; put composition (!); 32x32 # mesa-17.2.4, self-compiled
6330.459659 Ops/s; put composition (!); 32x32 # mesa-17.3.6, fedora 27, updates repo
6100.570157 Ops/s; put composition (!); 32x32 # mesa-18.1.0-devel self-compiled

So amdgpu didn't regress as badly as radeon, but performance is steadily decreasing.
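To make the per-driver trend easier to compare, the numbers above can be expressed as a percentage change relative to the mesa-17.2.4 baseline. This is an illustrative sketch only: the values are copied from this comment, and the `RESULTS` layout and the `percent_change` helper are hypothetical names chosen for the example.

```python
# Ops/s values copied from the comment above
# (Kaveri, Xephyr, composition manager disabled, IOMMU enabled).
RESULTS = {
    "amdgpu": {
        "17.2.4": 12325.077581,
        "17.3.6": 10582.511406,
        "18.1.0-devel": 8636.834555,
    },
    "radeon": {
        "17.2.4": 12060.500868,
        "17.3.6": 6330.459659,
        "18.1.0-devel": 6100.570157,
    },
}

BASELINE = "17.2.4"


def percent_change(results, baseline):
    """Return {driver: {version: change in % relative to the baseline version}}."""
    changes = {}
    for driver, by_version in results.items():
        base = by_version[baseline]
        changes[driver] = {
            version: 100.0 * (ops - base) / base
            for version, ops in by_version.items()
            if version != baseline
        }
    return changes


changes = percent_change(RESULTS, BASELINE)
for driver, by_version in changes.items():
    for version, pct in by_version.items():
        print(f"{driver:7s} mesa-{version:13s} {pct:+6.1f}% vs mesa-{BASELINE}")
```

On these numbers amdgpu loses roughly 14% and then 30% against the baseline, while radeon loses close to half of its throughput in both newer versions.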
Can you test this patch? https://patchwork.freedesktop.org/patch/210920/
(In reply to Marek Olšák from comment #14)
> Can you test this patch?
> https://patchwork.freedesktop.org/patch/210920/

I see hardly any changes with radeonsi on RX 580.
Same here: on my Kaveri 7650k, results with the patch are basically unchanged:

amdgpu:
8557.942992 Ops/s; put composition (!); 32x32

Should I test with radeon too?

Dieter: just out of curiosity, which values do you get with your Polaris GPU?
(In reply to Clemens Eisserer from comment #16)
> same here, on my Kaveri 7650k results with the patch are basically unchanged:
>
> amdgpu:
> 8557.942992 Ops/s; put composition (!); 32x32
>
> should I test with radeon too?
>
> Dieter: Just to be curious, which values do you obtain with your polaris GPU?

RX580 (DC enabled)
'cpupower frequency-set -g performance'

composit (faster): !!! ;-)
./JXRenderMark-1.0.1 3 32 3 32 3 32 3 32 3 32 3 32
29845.626072 Ops/s; put composition (!); 32x32
30745.957643 Ops/s; put composition (!); 32x32
30922.973502 Ops/s; put composition (!); 32x32
30460.302141 Ops/s; put composition (!); 32x32
30330.232018 Ops/s; put composition (!); 32x32
30757.257217 Ops/s; put composition (!); 32x32

without (slower):
28507.546115 Ops/s; put composition (!); 32x32
29570.588821 Ops/s; put composition (!); 32x32
29909.051450 Ops/s; put composition (!); 32x32
29839.934108 Ops/s; put composition (!); 32x32
30024.853684 Ops/s; put composition (!); 32x32
29852.673826 Ops/s; put composition (!); 32x32
(In reply to Dieter Nützel from comment #17)
> (In reply to Clemens Eisserer from comment #16)
> > same here, on my Kaveri 7650k results with the patch are basically unchanged:
> >
> > amdgpu:
> > 8557.942992 Ops/s; put composition (!); 32x32
> >
> > should I test with radeon too?
> >
> > Dieter: Just to be curious, which values do you obtain with your polaris GPU?
>
> RX580 (DC enabled)
> 'cpupower frequency-set -g performance'
>
> composit (faster): !!! ;-)
> ./JXRenderMark-1.0.1 3 32 3 32 3 32 3 32 3 32 3 32
> 29845.626072 Ops/s; put composition (!); 32x32
> 30745.957643 Ops/s; put composition (!); 32x32
> 30922.973502 Ops/s; put composition (!); 32x32
> 30460.302141 Ops/s; put composition (!); 32x32
> 30330.232018 Ops/s; put composition (!); 32x32
> 30757.257217 Ops/s; put composition (!); 32x32
>
> without (slower):
> 28507.546115 Ops/s; put composition (!); 32x32
> 29570.588821 Ops/s; put composition (!); 32x32
> 29909.051450 Ops/s; put composition (!); 32x32
> 29839.934108 Ops/s; put composition (!); 32x32
> 30024.853684 Ops/s; put composition (!); 32x32
> 29852.673826 Ops/s; put composition (!); 32x32

This was with Marek's patch from Comment 14.
Possibly related problem on r300 code paths:

https://bugs.freedesktop.org/show_bug.cgi?id=110781
https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/1099745-how-to-tell-if-a-driver-is-gallium-or-just-mesa-slow-renderng-with-radeon/page10
https://bbs.archlinux32.org/viewtopic.php?pid=5973#p5973

It took me a lot of time to track down the source of the problem, and this is the commit that causes the slowdown for me too.

What really helped was running strace before and after this commit: the number of GEM_CREATE calls rises from around 7-11 to thousands during just 10 seconds of glxgears, which is clearly wrong and causes my slowdown.

Maybe it would be useful to test whether the problem is also related to GEM/TTM in this case?

Just informing you because I found this earlier issue when googling the commit hash...

prenex
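The strace comparison described above can be repeated by counting buffer-creation ioctls in a saved trace. The sketch below is an assumption-laden illustration, not the commenter's actual procedure: it assumes a log captured with something like `strace -f -e trace=ioctl -o trace.log glxgears` and that strace prints the ioctl request by name. The sample lines, including the file descriptor, PID and addresses, are hypothetical.

```python
import re
from collections import Counter

# Extracts the symbolic request name from a line such as
#   1234 ioctl(5, DRM_IOCTL_RADEON_GEM_CREATE, 0x7ffd0a1b2c30) = 0
IOCTL_NAME = re.compile(r"ioctl\(\d+, ([A-Z0-9_]+)")


def count_ioctls(lines, needle="GEM_CREATE"):
    """Count ioctl calls whose request name contains `needle`, grouped by name."""
    counts = Counter()
    for line in lines:
        match = IOCTL_NAME.search(line)
        if match and needle in match.group(1):
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines, for illustration only.
SAMPLE_LOG = [
    "1234 ioctl(5, DRM_IOCTL_RADEON_GEM_CREATE, 0x7ffd0a1b2c30) = 0",
    "1234 ioctl(5, DRM_IOCTL_RADEON_CS, 0x7ffd0a1b2d10) = 0",
    "1234 ioctl(5, DRM_IOCTL_RADEON_GEM_CREATE, 0x7ffd0a1b2c30) = 0",
    "1234 ioctl(5, DRM_IOCTL_GEM_CLOSE, 0x7ffd0a1b2bf0) = 0",
]

sample_counts = count_ioctls(SAMPLE_LOG)
print(dict(sample_counts))
```

To run it on a real trace, pass an open file instead of the sample list, for example `count_ioctls(open("trace.log"))`, once for a build before the commit and once for a build after it, and compare the totals.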
Hi Richard,

Unfortunately there was very little interest in tackling the issue itself, even though bisecting it was a real pain. For me the problem was "fixed" by switching to amdgpu, a luxury the r300/r600 code paths don't have, so I guess the report is still valid.

Thanks for re-opening it.
Let's assume this is the same as bug 110781, which is now fixed. *** This bug has been marked as a duplicate of bug 110781 ***