Bug 6991 - ppracer "doing" level locks r300 after drawing n frames
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/r300
Hardware: x86 (IA32) Linux (All)
Importance: high critical
Assigned To: Default DRI bug account
Reported: 2006-05-22 05:01 UTC by Aapo Tahkola
Modified: 2009-08-24 12:23 UTC (History)

r300 initialization voodoo (14.24 KB, text/x-csrc)
2006-06-24 08:01 UTC, Jerome Glisse
R300 dynamic clock initialization (2.06 KB, text/x-c)
2006-06-26 07:15 UTC, Jerome Glisse
Log for a minimal X session with a ppracer hang. (41.25 KB, text/plain)
2006-06-26 16:59 UTC, multinymous
xorg.conf for above log. (1.45 KB, text/plain)
2006-06-26 17:00 UTC, multinymous
Badly rendered glxgears (for comment 19). (47.95 KB, image/jpeg)
2006-06-27 07:48 UTC, multinymous

Description Aapo Tahkola 2006-05-22 05:01:25 UTC
Reproducible with earlier versions?
Comment 1 multinymous 2006-06-23 21:15:18 UTC
This problem is reproducible on my machine with latest CVS/GIT versions of Mesa,
X.org and DRM.

Reverting Mesa to CVS as of 2006-04-05 fixes the ppracer hang, but the hangs
persist with other applications (Google Earth).
Comment 2 Jerome Glisse 2006-06-24 03:32:57 UTC
Does the lockup always happen at the same time?
Comment 3 multinymous 2006-06-24 04:24:43 UTC
At least roughly so, but I'm not sure it's exactly the same frame.
Comment 4 Jerome Glisse 2006-06-24 08:01:13 UTC
Created attachment 6030 [details]
r300 initialization voodoo

Please try running this program before starting X
(switch to a console, stop X, run the program, restart X),
then check whether you still see the lockup. You will have to
edit the source and change #define ADDR youraddress,
where your address is the one on the second "Memory at" line
in the output of lspci -v, e.g.:

0000:01:00.1 Display controller: ATI Technologies Inc RV350 NJ [Radeon 9800 XT]
	Subsystem: Micro-Star International Co., Ltd.: Unknown device 9561
	Flags: bus master, stepping, 66MHz, medium devsel, latency 64
	Memory at e8000000 (32-bit, prefetchable) [size=128M]
	Memory at fe9e0000 (32-bit, non-prefetchable) [size=64K]
	Capabilities: <available only to root>

In this case, set ADDR to 0xfe9e0000,

then gcc r300init.c -o initr300
./initr300 (as root)
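
The address lookup described above can also be scripted. This is a minimal sketch and not part of the attached r300init.c. It assumes the register region is the first non-prefetchable "Memory at" entry in the lspci -v output, which matches the listing above; the name find_register_base is made up for illustration.

```python
import re

# Matches lines such as:
#   Memory at fe9e0000 (32-bit, non-prefetchable) [size=64K]
_MEM_RE = re.compile(
    r"Memory at ([0-9a-fA-F]+) \((?:32|64)-bit, non-prefetchable\)"
)


def find_register_base(lspci_text):
    """Return the first non-prefetchable memory address as an int, or None.

    Assumption: that region is the one ADDR should point at.
    """
    for line in lspci_text.splitlines():
        match = _MEM_RE.search(line)
        if match:
            return int(match.group(1), 16)
    return None


# The lspci -v excerpt quoted in this comment.
SAMPLE = """\
0000:01:00.1 Display controller: ATI Technologies Inc RV350 NJ [Radeon 9800 XT]
\tFlags: bus master, stepping, 66MHz, medium devsel, latency 64
\tMemory at e8000000 (32-bit, prefetchable) [size=128M]
\tMemory at fe9e0000 (32-bit, non-prefetchable) [size=64K]
"""

if __name__ == "__main__":
    base = find_register_base(SAMPLE)
    print("#define ADDR 0x%08x" % base)
```

On a real machine the text would come from running lspci -v as root; check the result by eye before writing it into the source, since a wrong address means poking the wrong registers.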
Comment 5 multinymous 2006-06-24 09:16:25 UTC
Should I feed the getchar() calls?
If I do so from textmode, I get this:

S 01
RD error 0x0000001F get 0x00000013
RD error 0x151557FF get 0x1515577F
RD error 0x151557FF get 0x1515577F
S 02
S 04
S 05
S 06

Textmode is corrupted, and running X gives a corrupted display and a hang.

lspci -v:

01:00.0 VGA compatible controller: ATI Technologies Inc M22 [Radeon Mobility
M300] (prog-if 00 [VGA])
        Subsystem: IBM Unknown device 056e
        Flags: bus master, fast devsel, latency 0, IRQ 11
        Memory at c0000000 (32-bit, prefetchable) [size=128M]
        I/O ports at 3000 [size=256]
        Memory at b0100000 (32-bit, non-prefetchable) [size=64K]
        [virtual] Expansion ROM at b0120000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 2
        Capabilities: [58] Express Endpoint IRQ 0
        Capabilities: [80] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [100] Advanced Error Reporting

ADDR set to 0xb0100000.
Comment 6 Jerome Glisse 2006-06-24 09:31:29 UTC
Could you comment out the code between S01 and S02 and see if this works?
Comment 7 Jerome Glisse 2006-06-24 09:37:46 UTC
Also try commenting out the code up to S05.
Comment 8 multinymous 2006-06-24 10:07:17 UTC
Commenting out the code between the "S 01" and "S 02" printfs still corrupts
textmode and X.

With "S 01" through "S 05" commented out there is no corruption, but things
behave the same as when not running initr300: ppracer hangs with current Mesa
CVS but not with CVS 2006-04-05, and googleearth hangs even with CVS 2006-04-05.

BTW, I ran initr300 after loading the radeon DRM module (from textmode).
Comment 9 mrsteven 2006-06-26 05:31:22 UTC
Do you have page flipping enabled? If you have, disable it and see if it 
Comment 10 multinymous 2006-06-26 07:07:35 UTC
All of my reports are with the default EnablePageFlip = off.
Comment 11 Jerome Glisse 2006-06-26 07:15:37 UTC
Created attachment 6047 [details]
R300 dynamic clock initialization

Could you try this one? As with the other, change the addr at the top of the
file to reflect your address. This time you can run it without leaving X, but
run it as root with no application running, and do a sync beforehand.

Or you may try the latest radeon driver in git (the same fix goes in few hour
Comment 12 multinymous 2006-06-26 13:36:39 UTC
ppracer still crashes with the latest CVS/GIT versions of Mesa+DRM+X.org
(including the "radeon: force CP and VIP clocks on some r300 and rv100 chips"
change to xf86-video-ati).

BTW, the crash is not fully deterministic. Here, if I pick the first ppracer
practice track and let it run without touching anything, the crash occurs
somewhere between 10 and 12 seconds into the game.
Comment 13 Jerome Glisse 2006-06-26 16:23:53 UTC
Could you attach your xorg.conf and an Xorg.0.log to the bug, please? Did you
try running an Xorg server with fglrx first and then another one with the open
driver? Although this doesn't seem to help in your case, we might be missing
another piece of initialization voodoo.
Comment 14 multinymous 2006-06-26 16:59:48 UTC
Created attachment 6056 [details]
Log for a minimal X session with a ppracer hang.

Using current CVS/GIT of Mesa, DRM and X.org. Ignore the fact that X is on :1
instead of :0. This is the first instance of X invoked since reboot.
Comment 15 multinymous 2006-06-26 17:00:15 UTC
Created attachment 6057 [details]
xorg.conf for above log.
Comment 16 multinymous 2006-06-26 17:04:01 UTC
Log and xorg.conf attached.
I don't have a working fglrx installation at the moment.
Comment 17 Elie Morisse 2006-06-26 22:40:16 UTC
Try with fglrx and check your card temperature. I had a bad surprise when I
found out that after I ran the first r300init, lockups remained even with fglrx
and Windows. Further investigation pointed out that r300init modified some
memory timing parameters. Those very aggressive parameters mistreated the AGP
stuff of my MB (an A7N8X) and then my second MB (an A7N8X-E D). And that
damaged MB mistreated my second GC (another 9800 Pro). Result: 2 MBs and 2
GCs almost dead, unusable (artifacts, no 3D, crash after 30 minutes even in 2D).
Comment 18 Jerome Glisse 2006-06-27 01:23:22 UTC
Could you try disabling AIGLX (Option "AIGLX" "false" in the ServerLayout
section)? Is there anything in your kernel log file after a lockup (grep for
drm)? You might need to mount your partition with the sync option in order to
get an interesting log.
Comment 19 multinymous 2006-06-27 07:46:53 UTC
The ppracer hang persists when r300 is loaded after fglrx without a reboot.
Moreover, r300's 3D rendering is now corrupted (see attachment). The laptop's
fan doesn't speed up, so if there is overheating it's too brief to be detected.

Running today's CVS/GIT of Mesa/X.org/DRM, with Option "AIGLX" "false".

ppracer runs fine with fglrx 2.25.18, BTW.

A couple of suggestions:
1. ppracer ran OK in Mesa CVS 2006-04-05. You can try to isolate the change
since then.
2. If you think it's a clocks issue, you may want to see what fglrx's "aticonfig
--set-power-state" does, it might re-init the relevant registers.
Comment 20 multinymous 2006-06-27 07:48:26 UTC
Created attachment 6060 [details]
Badly rendered glxgears (for comment 19).
Comment 21 multinymous 2006-06-28 07:46:37 UTC
The last buildable Mesa CVS on which ppracer runs well is 2006-04-07.
The first buildable Mesa CVS demonstrating the hangup is 2006-05-03.

In between these dates I can't build Mesa CVS:
../common/dri_util.c: In function ‘driCreateNewDrawable’:
../common/dri_util.c:634: error: ‘__DRIdrawable’ has no member named ‘copySubBuffer’

Running with -dumbSched and 'Option "Silkenmouse" "false"' makes no difference.
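
The date narrowing reported in this comment can be written down as a small scan. This is only a sketch under assumptions: probe is a hypothetical callback that would check out and test one day's snapshot, and the regression is assumed to stay "bad" once it appears. The example probe simply replays the dates given above, with the unbuildable days in between marked as "skip".

```python
from datetime import date, timedelta


def narrow_regression(first, last, probe):
    """Scan each day from first to last (inclusive).

    probe(day) is a hypothetical callback returning "good", "bad",
    or "skip" (snapshot did not build).
    Returns (last_good, first_bad); either may be None.
    """
    last_good = None
    day = first
    while day <= last:
        result = probe(day)
        if result == "good":
            last_good = day
        elif result == "bad":
            return last_good, day
        # "skip": leave the range open and move on to the next day.
        day += timedelta(days=1)
    return last_good, None


def example_probe(day):
    """Replays the outcome reported in this comment; not a real test."""
    if day <= date(2006, 4, 7):
        return "good"
    if day >= date(2006, 5, 3):
        return "bad"
    return "skip"


if __name__ == "__main__":
    good, bad = narrow_regression(date(2006, 4, 5), date(2006, 5, 10),
                                  example_probe)
    print("last good:", good, "first bad:", bad)
```

The skipped days are why the suspect window stays almost four weeks wide: any commit in that range could be responsible.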
Comment 22 Jerome Glisse 2006-07-25 10:39:39 UTC
Could you try ppracer with fog disabled? One of the changes
between the two versions is fog. If this doesn't do anything, try
launching ppracer with the R300_SPAN_DISABLE_LOCKING
environment variable set to 1.
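
For repeated test runs, the environment variable suggested above can be set from a small launcher. A minimal sketch: the variable name comes from this comment, while the helper names and the assumption that a ppracer executable is on PATH are illustrative only.

```python
import os
import subprocess


def r300_debug_env(base=None):
    """Return a copy of the environment with span locking disabled."""
    env = dict(os.environ if base is None else base)
    env["R300_SPAN_DISABLE_LOCKING"] = "1"
    return env


def launch(command=("ppracer",)):
    """Run the game with the modified environment; returns its exit code.

    Assumes the executable is on PATH.
    """
    return subprocess.call(list(command), env=r300_debug_env())
```

From a shell, prefixing the command with R300_SPAN_DISABLE_LOCKING=1 has the same effect for a single run.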
Comment 23 multinymous 2006-07-26 00:23:25 UTC
No change with fog disabled or R300_SPAN_DISABLE_LOCKING=1, it still hangs
(current CVS).

BTW, note the link to bug 7371. With the old non-crashing version I also got the
colored trails.
Comment 24 multinymous 2006-07-31 11:04:58 UTC
The hang is indeed triggered by rendering of the snow tracks, as suggested in
bug 7371. The hangs go away when I set "set track_marks false" in
~/.ppracer/options, and come back when I flip it to "true".

Does this help in isolating the DRI bug?
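
Flipping the option back and forth while testing can be scripted. This is a sketch that assumes ~/.ppracer/options holds one "set <name> <value>" entry per line, which is inferred only from the single line quoted above; set_option is a made-up helper, and all other lines are passed through unchanged.

```python
def set_option(options_text, name, value):
    """Return options_text with `set <name> <value>` replaced or appended."""
    wanted = "set {} {}".format(name, value)
    lines = options_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "set" and parts[1] == name:
            lines[i] = wanted
            replaced = True
    if not replaced:
        # Option not present yet: add it at the end.
        lines.append(wanted)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(set_option("set track_marks true\n", "track_marks", "false"), end="")
```

To apply it, read the options file, pass its contents through set_option, and write the result back while ppracer is not running.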
Comment 25 Aapo Tahkola 2006-07-31 16:26:53 UTC
(In reply to comment #24)
> The hang is indeed triggered by rendering of the snow tracks, as suggested in
> bug 7371. The hangs go away when I set "set track_marks false" in
> ~/.ppracer/options, and come back when I flip it to "true".
> Does this help in isolating the DRI bug?

Should be fixed in CVS now. Please reopen if not.

(In reply to comment #20)
> Created an attachment (id=6060)
> Badly rendered glxgears (for comment 19).

I recall this happens if color tiling is not enabled.
Doesn't seem worthwhile tracking down IMHO.
Comment 26 multinymous 2006-07-31 17:31:49 UTC
Current CVS fixes the ppracer hang on my box.

Also, either this or some other recent commit solved the Google Earth hang.
Comment 27 ajax at nwnk dot net 2009-08-24 12:23:55 UTC
Mass version move, cvs -> git