Bug 26599 - torcs performance regression (git 47136fa)
Summary: torcs performance regression (git 47136fa)
Status: RESOLVED DUPLICATE of bug 28771
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/Radeon
Version: 7.5 (2009.10)
Hardware: x86-64 (AMD64) Linux (All)
Priority: medium
Severity: normal
Assignee: xf86-video-ati maintainers
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-02-16 20:40 UTC by Dave Witbrodt
Modified: 2010-08-22 21:37 UTC
CC List: 1 user

See Also:


Attachments
Xorg.0.log (27.67 KB, text/plain)
2010-02-16 20:41 UTC, Dave Witbrodt

Description Dave Witbrodt 2010-02-16 20:40:30 UTC
Overview:

I recently built two packages from the xf86-video-ati git tree to
update my system:

    xf86-video-ati-6.12.99+git20100205.4f9d171  [Feb. 5]
    xf86-video-ati-6.12.99+git20100215.47136fa  [Feb. 15]

Using xorg-server builds 1.7.4.902 and 1.7.5, I have found a serious
performance regression:  in the game 'torcs', which displays a frame
rate indicator in the upper right corner, my average frame rate drops
from ~50 fps to ~20 fps.  This may not be a rigorous benchmark, but game
play clearly becomes sluggish and the drop in frame rate is obvious.

  Steps to reproduce:

1.  Install 'torcs', build and install radeon from xf86-video-ati at
commit 4f9d171, and play the game observing the reported frame rate.

2.  Build and install radeon from xf86-video-ati at commit 47136fa, and
play the game observing the reported frame rate.

  Actual results:

Performance on my particular system is cut back by 1/2 to 2/3 from the
Feb. 5 version to the Feb. 15 version.  This behavior is 100% reproducible.

  Expected results:

Performance should have been the same -- or even better, given the two
optimization commits by Pauli Nieminen (78e7047 and 3ec25e5).


  System info:

GPU:
    GIGABYTE GV-R485MC-1GI Radeon HD 4850 (RV770)
    1GB VRAM, 256-bit GDDR3
    PCI Express 2.0 x16

Kernel + architecture:  [uname -r -m]
    2.6.33-rc8-git.100213.desktop.kms  x86_64

Linux distribution:
    Debian unstable

Machine:  self-built
    MSI 790FX-GD70 motherboard
        socket AM3
        AMD 790FX and SB750 Chipset
    OCZ OCZ3P1600EB4GK 2x2GB DDR3 1600
    AMD Phenom II X4 955

Software versions:
    xf86-video-ati:     [see above]

    mesa:               mesa-7.7+100211-git04d3571       [from git]

    libdrm:             libdrm-2.4.17-1-dw+2.6.33-rc8    [from Debian git repo]

    xorg-server:        xorg-server-1.7.4.902-0upstream  [from tarball]
                        xorg-server-1.7.5-0upstream      [from tarball]
                        xserver-xorg-core-2:1.7.5-1      [from Debian unstable]

    torcs:              1.3.1-2                          [from Debian unstable]


  Additional Information:

I actually updated the "radeon" driver and the X server to the following versions at the same time:

    xf86-video-ati-6.12.99+git20100215.47136fa
    xorg-server-1.7.5-0upstream

I wasn't sure which package was causing the problem (or whether both were).
I tried each
combination of old/new DDX driver with old/new X server, and found that the
X server version had little or no effect on performance with the Feb. 5 DDX,
but both (1.7.4.902 & 1.7.5) X servers performed miserably with the Feb. 15
DDX.

I don't actually play games much, except to test the performance of new
hardware and software.  Since November 2009, with each update of kernels,
mesa packages, X server packages, and DDX packages, I have tested using a long
(and growing) list of 3D games -- looking for crashes, bugs, and/or performance
problems; this is part of my own testing of Linux support for HD 4850 (RV770),
which I purchased last Fall.  In this case, it happens to be 'torcs' that
best reveals the difference between versions of the "radeon" driver.

I would like to bisect this with git, since there are only 6 new commits
between these two versions of xf86-video-ati.  Unfortunately I have to work
for the next few days.  I may get a chance to try a bisect on Friday, though
maybe someone else will acknowledge the problem and provide a fix before then.

Wasn't sure what to set for "Severity" and "Priority," so I left them at the
Bugzilla defaults.
Comment 1 Dave Witbrodt 2010-02-16 20:41:15 UTC
Created attachment 33350
Xorg.0.log
Comment 2 Alex Deucher 2010-02-16 23:22:24 UTC
This is caused by the new vline support, which prevents tearing; that is why frame rates have dropped.  I suppose we could add an option to disable it.
Comment 3 Michel Dänzer 2010-02-17 00:33:41 UTC
It would probably be better to implement DRI2 page flipping support, which should eliminate tearing for fullscreen apps with little if any performance impact, at least with triple buffering.
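For reference, here is a minimal sketch of what a page-flipped swap looks like at the libdrm level, as opposed to the copy-plus-vline-wait path the driver currently uses.  This is illustrative only, not the driver's actual code; it uses the standard libdrm drmModePageFlip/drmHandleEvent calls, and assumes the DRM fd, CRTC id, and back-buffer framebuffer id (fd, crtc_id, back_fb_id) come from ordinary KMS setup that is not shown here.

    /* Illustrative sketch only: latch the back buffer at the next vblank
     * instead of copying it and stalling on a vline wait.  fd, crtc_id
     * and back_fb_id are assumed to come from normal KMS setup. */
    #include <stdint.h>
    #include <stdio.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    static void flip_done(int fd, unsigned int frame,
                          unsigned int sec, unsigned int usec, void *data)
    {
        *(int *)data = 1;   /* the new framebuffer is now being scanned out */
    }

    static int swap_by_page_flip(int fd, uint32_t crtc_id, uint32_t back_fb_id)
    {
        int done = 0;
        drmEventContext evctx = {
            .version = DRM_EVENT_CONTEXT_VERSION,
            .page_flip_handler = flip_done,
        };

        /* No copy is performed, so there is nothing to tear and no need
         * to stall the GPU on a scanline position. */
        if (drmModePageFlip(fd, crtc_id, back_fb_id,
                            DRM_MODE_PAGE_FLIP_EVENT, &done) < 0) {
            perror("drmModePageFlip");
            return -1;
        }

        /* Wait for the flip-complete event (real code would poll the fd). */
        while (!done)
            drmHandleEvent(fd, &evctx);

        return 0;
    }

With triple buffering the client can keep rendering into a third buffer while a flip is pending, which is why the performance impact is expected to be small.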
Comment 4 Dave Witbrodt 2010-02-18 20:31:44 UTC
I made a local branch in git off of the vline commit, reverted it in the branch, and merged master into the branch.  Performance is restored, plus everything seems a little bit faster than before!  (Maybe Pauli's changes are causing that?  Or maybe I'm hallucinating....)

From what I understand about vline, it's something I want.  Is the performance hit something that an app (like 'torcs') could find a workaround for, or is the hit unavoidable?

For now I will keep merging updates to my "no-vline" branch.
Comment 5 Alex Deucher 2010-02-18 20:37:49 UTC
(In reply to comment #4)
> I made a local branch in git off of the vline commit, reverted it in the
> branch, and merged master onto the branch.  Performance is restored, plus
> everything seems a little bit faster than before!  (Maybe Pauli's stuff causing
> that?  Or maybe hallucinating....)  
> 

His changes only affect Xv.

> From what I understand about vline, it's something I want.  Is the performance
> hit something that an app (like 'torcs') could find a workaround for, or is the
> hit unavoidable?

It avoids tearing on GL buffer swaps by waiting until the scanout is past the part of the screen being updated.
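To make that mechanism concrete, here is a tiny conceptual sketch of the wait.  It is not the driver's code: the real driver emits a vline wait packet so the GPU's command processor does the stalling, and read_current_scanline() below is a purely hypothetical helper standing in for a hardware register read.

    /* Conceptual sketch of the vline wait described above (not the actual
     * driver implementation).  read_current_scanline() is hypothetical. */
    extern int read_current_scanline(void);   /* hypothetical helper */

    /* Spin until the scanout position is outside [top, bottom), so that a
     * copy into that region cannot race the beam and show a visible tear. */
    static void wait_for_scanout_past(int top, int bottom)
    {
        int line;

        do {
            line = read_current_scanline();
        } while (line >= top && line < bottom);
    }

The cost is what this report observes: when the beam happens to be inside the destination region at swap time, the copy (and the application's next frame) stalls until the beam has moved on, which is why the frame rate drops even though nothing about the rendering itself got slower.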
Comment 6 Dave Witbrodt 2010-03-09 20:04:04 UTC
Updating this report with more recent experiences:

On Mar. 8 I upgraded from Mesa 7.8 (git be1b7d1) to Mesa 7.9-devel (git 3ca9336), and noticed an all-around performance increase.  Immediately following that, I built, installed, and booted the newly-released 2.6.34-rc1 kernel.

Just booting that new kernel to a command-line-only runlevel revealed a noticeable performance improvement.  Running X revealed that the performance gains were across the board:  everything was running faster.

I've been building my own xf86-video-ati packages with the "vline" feature reverted, but with these performance gains I decided to test it again.

I built and ran radeon from git commit 3a44f1c (Mar. 9) in two versions -- one with "vline" reverted, and one with "vline" included.  As before, the "vline" version is perceptibly more sluggish than the version with that feature reverted.  However, the most dramatic differences in my original post here were seen in the game 'torcs'; when I tried that game again, it was playable this time (last time, the sluggishness made it nearly impossible to play).  Interestingly, I reported before that the 'torcs' frame rate seemed to be capped at 30 fps (half my monitor's vertical refresh rate), but this time the game was often able to jump into the 50 fps range for short bursts.

Other apps I've been testing with, besides 'torcs', are also affected negatively... but less noticeably so.

The "vline" feature itself definitely works as intended.  No cutting/tearing ever occurs.  I wasn't having bad problems with tearing without "vline," but in games like 'torcs' (and even 'prboom') the difference is noticeable.  I definitely prefer the "vline" version for clarity, but the "no-vline" version still outperforms it enough that I'll be sticking with my git branch with the reversion for a little while longer.

I am using quite fast hardware here, Radeon HD 4850, so I wouldn't be surprised if less powerful cards were impacted more seriously.
Comment 7 Dave Witbrodt 2010-04-01 21:25:40 UTC
I last tried to test performance on 3/23, but bug #27284 caused me to cancel testing until it was fixed.

Today I saw that the fix had arrived for "radeon," and that the latest DRM work was accepted into the kernel.  I am currently running:

linux:           2.6.34-rc3-git+100401.42be79e
libdrm:          2.4.18-2+git100315.a88e94d
mesa:            7.9-devel+git100310.40adcd6
xf86-video-ati:  6.12.192+git100401.476a1c6

Performance on many applications has actually improved significantly since 6.12.192 was released in mid-March.  The vline feature still caps the framerate of 'torcs' at 30 fps most of the time, sometimes slipping into the low 20's, but the playability of the game is no longer as seriously affected as it was when I first opened this bug report in February.

Vline itself works great and no cutting of frames occurs, though I still keep a local git branch with vline removed so I can compare versions of the driver with and without it.
Comment 8 AttilaN 2010-05-08 06:04:02 UTC
After upgrading to Ubuntu Lucid, fullscreen video playback is slow/choppy.  A git bisect identified commit 78e7047 ("Allocate Xv buffers to GTT.") as the culprit.  I reverted that one on the master branch and video playback was smooth again.

However, if I do other, seemingly unrelated things (such as starting and closing Google Earth, or logging in to and out of a guest session) and then play the video again, fullscreen playback is bad again.

Perhaps someone can explain?

(not sure if it matters, I'm on the 2.6.34-rc6 kernel)
Comment 9 AttilaN 2010-05-08 07:14:50 UTC
(In reply to comment #8)
> However, if I do other, seemingly unrelated stuff like start (and close) Google
> Earth, or log in to & out of a guest session, then play the video again,
> fullscreen playback is bad again.

I'm seeing this behavior on 2.6.33/34 kernels but not on 2.6.32.
Also, a weird workaround: if I play an HD video (50 fps) first, the other, regular video then plays fine in fullscreen again.
Comment 10 Michel Dänzer 2010-05-11 03:07:59 UTC
(In reply to comment #8)
> After upgrading to Ubuntu Lucid, fullscreen video playback is slow/choppy.

This report is about torcs. Please track your problem in another report or just discuss it on the xorg-driver-ati mailing list.

P.S. Dave, can this report be resolved per comment #7?
Comment 11 Dave Witbrodt 2010-05-12 18:12:32 UTC
[Hmmm, I replied via email yesterday, but apparently that didn't work... since nothing appeared here in 24 hours.]

For the record, this bug was about the "vline" feature in the radeon driver, and 'torcs' was merely the metric being used to identify and measure the impact it had on performance.

Regarding closing the bug:  I assumed that the developers would close the "bug" after Alex identified (in comment #2) that the problem was caused by vline... and it was actually a feature, not a bug.

When the bug was NOT closed by any developer, I assumed (maybe wrongly?) that the impact of vline (versus radeon with vline removed) was still of some interest.  As a result, I began reporting, once or twice a month, my results from testing the latest radeon update from git with and without the vline feature.


In hindsight, I think this was not a bug; it is true that when vline was first introduced, the 'torcs' track I use for testing went from playable to unplayable.  Vline does cause a decrease in max frame rate, but something else got fixed (who knows what?  DRM?  Mesa?) which rendered 'torcs' playable even with vline.


I sympathize with AttilaN, though:  one of the programs I use for testing OpenGL performance (prboom) has suffered a horrible performance regression since January or February.  The problem is, I've only been testing 'prboom' using one map which doesn't expose the regression.  I accidentally played another map a few days ago -- previously perfectly playable -- which was now experiencing performance issues that made it difficult or impossible to play.

Wanting to report a bug, I tried a great variety of older packages (from Debian, and locally built from git) -- Linux kernels/DRM, radeon drivers, Mesa libraries, libdrm, X servers -- going as far back as the beginning of March.  I was not able to find a combination of old software that made the performance regression disappear.

At this point, I'm totally baffled:  I don't know when the problems were introduced, or in what software.  I also just discovered that a different track in 'torcs' only allows me 5 frames per second (on HD 4850 hardware!), even with vline removed from radeon.  I'm afraid I can't locate the cause of these regressions, and for now have given up trying.  I guess all I can do is hope the developers are seeing the same problems, and are able to find fixes or workarounds.

None of this has much to do with vline, though.  If a developer closes this bug, I will understand.  If not, I will continue to report here what performance differences I'm seeing in the radeon driver with and without the vline feature.


Thanks,
Dave W.
Comment 12 Dave Witbrodt 2010-08-22 04:21:30 UTC
Bug #28771 seems to be about the same issue.  If they are the same, can this bug be merged with that one?

BTW, subsequent performance improvements allowed me to use vline, and I am no longer maintaining a local branch of xf86-video-ati with the vline feature reverted.  (I was doing so only for testing purposes, but haven't updated my novline branch for months.)

If a merge is not appropriate, feel free to close this bug -- as Michel Dänzer suggested.
Comment 13 Alex Deucher 2010-08-22 21:37:56 UTC

*** This bug has been marked as a duplicate of bug 28771 ***

