Bug 7771 - Insanely slow startup on Radeon X300 SE
Summary: Insanely slow startup on Radeon X300 SE
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/Radeon (show other bugs)
Version: 7.0.0
Hardware: x86 (IA32) Linux (All)
: high normal
Assignee: xf86-video-ati maintainers
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-08-04 13:34 UTC by Alon Ziv
Modified: 2007-08-23 03:57 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
xorg.conf (2.90 KB, text/plain)
2006-08-06 13:30 UTC, Alon Ziv
no flags Details
Latest Xorg.log (55.94 KB, text/plain)
2006-08-06 13:32 UTC, Alon Ziv
no flags Details
Sane looping patch (1.06 KB, patch)
2007-07-22 10:18 UTC, Alon Ziv
no flags Details | Splinter Review
Sane looping patch, take 2 (1.99 KB, patch)
2007-07-24 11:52 UTC, Alon Ziv
no flags Details | Splinter Review
Take 3 (1.99 KB, patch)
2007-07-25 05:55 UTC, Alon Ziv
no flags Details | Splinter Review
Take 3, for real this time :-/ (1.65 KB, patch)
2007-07-26 08:57 UTC, Alon Ziv
no flags Details | Splinter Review
Take 4 (3.25 KB, patch)
2007-07-28 00:43 UTC, Alon Ziv
no flags Details | Splinter Review
Take 5: this actually works :-/ (3.70 KB, patch)
2007-07-30 12:49 UTC, Alon Ziv
no flags Details | Splinter Review

Description Alon Ziv 2006-08-04 13:34:27 UTC
X.org startup on my card is insanely slow--it spends around 20 seconds in
RADEONRestoreMemMapRegisters() (between the "Map Changed !" and "Map applied"
messages).

System is Ubuntu Dapper; the ATI driver version is 1:6.5.7.3-0ubuntu7.

Display controller is identified by "lspci -v" as

0000:01:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60 [Radeon
X300 (PCIE)] (prog-if 00 [VGA])
        Subsystem: Unknown device 196d:1070
        Flags: bus master, fast devsel, latency 0, IRQ 5
        Memory at d8000000 (32-bit, prefetchable) [size=128M]
        I/O ports at ac00 [size=256]
        Memory at feaf0000 (32-bit, non-prefetchable) [size=64K]
        Expansion ROM at fea00000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] #10 [0001]
        Capabilities: [80] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-

0000:01:00.1 Display controller: ATI Technologies Inc RV370 [Radeon X300SE]
        Subsystem: Unknown device 196d:1071
        Flags: bus master, fast devsel, latency 0
        Memory at feae0000 (32-bit, non-prefetchable) [disabled] [size=64K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] #10 [0001]
Comment 1 Erik Andren 2006-08-06 10:49:17 UTC
Please post your xorg.conf and xorg log.
Do you have the same problem if you use the newest version of the ati xorg
driver (6.6.1)? 
Comment 2 Alon Ziv 2006-08-06 13:30:45 UTC
Created attachment 6476 [details]
xorg.conf

My current xorg.conf (redirecting ATI drivers to home directory)
Comment 3 Alon Ziv 2006-08-06 13:32:25 UTC
Created attachment 6477 [details]
Latest Xorg.log

Latest Xorg.log

Note that the ATI driver is the bog-standard version 6.6.0--it incorrectly
identifies itself as version 6.5.7
Comment 4 Alon Ziv 2006-08-06 13:36:29 UTC
(In reply to comment #1)
> Do you have the same problem if you use the newest version of the ati xorg
> driver (6.6.1)? 

I cannot build version 6.6.1 of the driver with my current version of X (7.0.0),
apparently due to API changes related to xf86_libc.h.
Comment 5 Michel Dänzer 2006-08-06 14:37:39 UTC
(In reply to comment #4)
> I cannot build version 6.6.1 of the driver with my current version of X (7.0.0),
> apparently due to API changes related to xf86_libc.h.

Current xf86-video-ati git should build and work with 7.0. If it doesn't, you
could try the ati-1-0-branch. If the problem persists, please try to isolate
where exactly it's spending so much time by adding additional output between the
lines you mentioned.
Comment 6 Alon Ziv 2006-08-07 10:10:46 UTC
(In reply to comment #5)
> Current xf86-video-ati git should build and work with 7.0.
Well, it doesn't.  Dave Aielie's latest cleanup patches (commit 58c6aac0) broke
it; I succeeded in compiling the latest by reverting some of these (removing
some extra #include's and re-introducing the #undef's at the head of
theatre_detect.c).

Should I open a separate bug on this?  (I don't think so, but...)


And now to actually testing this latest driver...
Comment 7 Alon Ziv 2006-08-07 12:04:47 UTC
(In reply to comment #5)
> If the problem persists, please try to isolate
> where exactly it's spending so much time by adding additional output between the
> lines you mentioned.

OK, done that...
The culprit is RADEONWaitForVerticalSync2().  I do not have a second monitor (as
correctly identified by the driver--the log lists "Monitor -- NONE" for the
secondary), and yet this function waits for its vsync until it times out--2000
calls to usleep(1).  And given that my kernel is configured with HZ=250, every
such usleep(1) appears to end up using two time slices of 4ms each...  Resulting
in a total wait of 16 seconds.

I am not sure how to correctly solve this; as an experiment, I have commented
out this call, and indeed the radeon driver's startup time went down to 1.5
seconds (total X startup time went down to ~5 seconds).
Comment 8 Michel Dänzer 2006-08-16 02:17:17 UTC
I'm afraid it's not so simple; this is used to make sure the secondary display
controller no longer makes any requests to the memory controller, which could
theoretically happen even if no secondary display is actually being used. If we
can't come up with a better test for the secondary controller triggering
vertical blank events, the loop should probably time out after a fixed amount of
time, say 100ms.
Comment 9 Daniel Stone 2007-02-27 01:33:07 UTC
Sorry about the phenomenal bug spam, guys.  Adding xorg-team@ to the QA contact so bugs don't get lost in future.
Comment 10 Alon Ziv 2007-07-22 10:17:05 UTC
OK, I'm finally back to hacking on this bug...

I am attaching a different take on the fix: this time, I simply change the loop in RADEONWaitForVerticalSync2 so it actually waits only RADEON_TIMEOUT microseconds.  The older code assumed a usleep(1) really sleeps for 1us, which is very wrong--the minimum actual sleep period on Linux appears to be around 2ms.

IMO all wait loops in this code (and elsewhere...) should change to this kind of waiting; as they stand they all wait for far longer than designed (or desired).
Comment 11 Alon Ziv 2007-07-22 10:18:22 UTC
Created attachment 10834 [details] [review]
Sane looping patch
Comment 12 Michel Dänzer 2007-07-23 03:33:57 UTC
+        if (tv2.tv_sec - tv.tv_sec > RADEON_TIMEOUT/1000000 ||
+            (tv2.tv_sec - tv.tv_sec == 0 && tv2.tv_usec > tv.tv_usec))

The second part of this would seem to generally pass as soon as tv2.tv_usec != tv.tv_usec. It might be better to compute the usec difference first and then test that.
Comment 13 Alon Ziv 2007-07-24 11:52:28 UTC
Created attachment 10871 [details] [review]
Sane looping patch, take 2

OK, this version of the patch abstracts timeout handling a bit better...

I also changed the timeout to 200ms, instead of 2 seconds.  IMO even this is too much (I can't see any hardware taking more than ~10ms to respond), but whatever.
Comment 14 Michel Dänzer 2007-07-25 01:52:20 UTC
(In reply to comment #13)
> OK, this version of the patch abstracts timeout handling a bit better...

I'm afraid it still doesn't handle tv_usec wraparounds properly though. There are also some minor cosmetic concerns such as using uppercase for the names of proper functions.

> I also changed the timeout to 200ms, instead of 2 seconds.  IMO even this is
> too much (I can't see any hardware taking more than ~10ms to respond), but
> whatever.

In this particular case, waiting for vertical blank can take up to 16 ms (at 60 Hz refresh rate). Other cases such as waiting for engine idle might legitimately take much longer. I'd suggest passing the timeout value as an argument to radeon_timedout() (and using an internal struct timeval in both functions so not every caller has to provide its own).
Comment 15 Alon Ziv 2007-07-25 05:50:51 UTC
I believe you are wrong regarding wraparound handling: wraparound is handled in RADEON_INIT_TIMEOUT, and the following comparison (in RADEON_TIMEDOUT) is modeled exactly after the timercmp #define from /usr/include/sys/time.h.

Re usage of an internal timeval value: it would need to be a global variable, as it is shared between the two functions.  Such a change will also break the possibility to use two timeout values in parallel (e.g. nested).  IMO it is a bad idea.

I am attaching a further-cleaned-up patch.
Comment 16 Alon Ziv 2007-07-25 05:55:34 UTC
Created attachment 10882 [details] [review]
Take 3
Comment 17 Michel Dänzer 2007-07-26 00:35:17 UTC
(In reply to comment #15)
> I believe you are wrong regarding wraparound handling: wraparound is handled in
> RADEON_INIT_TIMEOUT, and the following comparison (in RADEON_TIMEDOUT) is
> modeled exactly after the timercmp #define from /usr/include/sys/time.h.

Ah yes, I see now.

> Re usage of an internal timeval value: it would need to be a global variable,
> as it is shared between the two functions.  

That shouldn't be a problem as long as the timeout is handled within a single driver entrypoint, as the X server is single threaded.

> Such a change will also break the possibility to use two timeout values in
> parallel (e.g. nested).

Is there any existing such case, or any expectation it'll be useful in the future?

OTOH, I guess leaving the struct timeval argument and passing in a global or RADEONInfoRec member by default wouldn't be too bad.

> I am attaching a further-cleaned-up patch.

I don't see any difference to take 2, did you upload the right file?
Comment 18 Alon Ziv 2007-07-26 08:57:13 UTC
Created attachment 10888 [details] [review]
Take 3, for real this time :-/

Ouch...  I goofed with the previous attachment.  This is the real take 3.
Comment 19 Alon Ziv 2007-07-28 00:43:02 UTC
Created attachment 10909 [details] [review]
Take 4

Further cleanups: RADEONWaitForVerticalSync() and RADEONWaitForVerticalSync2()  are now handled with the same logic.  Also documented the rationale behind the patch and added a signoff line (I know, X.org doesn't use these...  But I do.)
Comment 20 Michel Dänzer 2007-07-30 01:42:57 UTC
Looks good, I'd like to push it but there are still some minor issues:

* The minimum actual sleep time probably depends on CONFIG_HZ and related factors.
* The timeout value in RADEONWaitForVerticalSync() is missing a digit. Might be worth using a define such as RADEON_VSYNC_TIMEOUT to avoid mistakes like this.
* In standard cases like this, I still think it would be better to use a global struct timeval or one in RADEONInfoRec instead of adding a struct timeval declaration for each caller.

Do you want to do another spin, or are you fine with me pushing with these fixed?
Comment 21 Alon Ziv 2007-07-30 08:03:48 UTC
Ah...  I would better leave it in your capable hands now; it's probably easier for you, being more familiar with this code base.

After all, I have proved my point, in the proper open-source way (i.e., with code) :)
Comment 22 Alon Ziv 2007-07-30 12:49:25 UTC
Created attachment 10918 [details] [review]
Take 5: this actually works :-/

Take 4 didn't actually work (I found out I didn't install it, by accident).  Spelling error :-(

This one does a few other cleanups as well, e.g. by using symbolic constants for the timeout values.  I still use a separate timeval struct, though; I detest globals too much.
Comment 23 Alon Ziv 2007-08-11 07:34:42 UTC
ping?
Comment 24 Michel Dänzer 2007-08-23 03:20:30 UTC
Pushed (with some cosmetical changes to the log and a compile warning fix) to GIT master and 6.7 branch, thanks.
Comment 25 Alon Ziv 2007-08-23 03:45:58 UTC
Am I imagining things, or is the fix in the committed version broken?

+#include <sys/time.h>          /* For
+#include <time.h>               * gettimeofday() */

The second #include is entirely commented out...
Comment 26 Michel Dänzer 2007-08-23 03:57:13 UTC
Right, so the #include <time.h> seems superfluous... I've removed it on the master branch.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.