Bug 26051 - Regression in 7.6, glxgears locks the machine (rv280)
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/r200
Version: 7.6
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assigned To: Default DRI bug account
Reported: 2010-01-14 13:10 UTC by ken moffat
Modified: 2010-02-24 13:46 UTC

Attachments
glxinfo without the reverted commit (6.33 KB, text/plain)
2010-01-26 14:12 UTC, ken moffat
glxinfo with 25b492b... reverted (6.33 KB, text/plain)
2010-01-26 14:13 UTC, ken moffat
Print visual used by glxgears (452 bytes, patch)
2010-01-28 00:55 UTC, Michel Dänzer
stderr from glxgears with mesa-7.6.1 after a cold boot (573 bytes, text/plain)
2010-02-04 09:55 UTC, ken moffat

Description ken moffat 2010-01-14 13:10:00 UTC
Building a fresh system, xorg-7.5, in particular xorg-server-1.7.4 (built without dbus or hal), xf86-video-ati-6.12.4. xf86-input-evdev-2.3.2, gcc-4.4.2, linux kernel 2.6.32.2.

 I was running some of my later buildscripts, after I'd got xscreensaver installed, and twice I left it running and returned to a blank screen and only MagicSysRQ functioning.  Nothing in the logs.

 Did some testing, established that glxinfo reported
direct rendering: yes
 then attempted to run glxgears.  The small application window appeared, with a
black background, but no gears, and from that point only MagicSysRQ worked.
This was repeatable.

 I've now ripped out 7.6.1 and built 7.6 (and rebuilt the server against the 7.6 headers), and it works fine - no problem with glxgears, left the screensaver running for 30 minutes without any sign of a problem.

 This is an old RV280 [Radeon 9200 SE].

ĸen [or 'ken' if you can't read that]
Comment 1 Fabio Pedretti 2010-01-15 00:30:07 UTC
It would be useful if you could do a git bisect to find the problematic commit.
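
For readers unfamiliar with the workflow, a bisect session might look roughly like the sketch below. It runs on a throwaway repository, and the `state` file, the commit messages and the test command are all made up for the illustration; nothing here comes from the Mesa tree. For a lockup that only shows after a cold boot, each step would have to be marked by hand with `git bisect good` or `git bisect bad` rather than automated with `git bisect run`.

```shell
#!/bin/sh
set -e

# Throwaway repository standing in for the real source tree (assumption).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "tester@example.invalid"
git config user.name "Tester"

# Eight commits; from commit 6 onwards the hypothetical "state" file is BAD.
for i in 1 2 3 4 5 6 7 8; do
    if [ "$i" -ge 6 ]; then echo BAD > state; else echo ok > state; fi
    echo "$i" > counter
    git add state counter
    git commit -q -m "commit $i"
done

first=$(git rev-list --max-parents=0 HEAD)

# Usage: git bisect start <known-bad> <known-good>
git bisect start HEAD "$first" >/dev/null

# Exit status 0 marks the checked-out commit good, 1 marks it bad.
git bisect run sh -c '! grep -q BAD state' >/dev/null

# refs/bisect/bad now points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $culprit"

git bisect reset >/dev/null 2>&1
```

In the manual variant, the `git bisect run` line is replaced by building and testing each candidate and then typing `git bisect good` or `git bisect bad` until git names the first bad commit.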
Comment 2 ken moffat 2010-01-15 11:20:32 UTC
Bad news for me: I was getting ready to start bisecting, and left the screensaver running (with 7.6) - when I came back, the screen was again black and the keyboard non-operational.  After rebooting, I gave glxgears another try - this time the window appeared but stayed black and the keyboard was again dead.

So, 7.6 has the same problem, but not reliably.  So this isn't a regression in 7.6.1.  Not sure where to go from here (maybe the problem isn't in Mesa).
Comment 3 ken moffat 2010-01-16 12:32:35 UTC
(In reply to comment #2)
> So, 7.6 has the same problem, but not reliably.  So this isn't a regression in
> 7.6.1.  Not sure where to go from here (maybe the problem isn't in Mesa).
> 
 I'm starting to think that 7.5 with current xorg is good, and I've got a test
pattern that might prove that.  Unfortunately, the pattern takes several hours
and needs separate runs and times with the box switched off trying to replicate
whatever caused the problem to eventually show - runs are mostly glxgears, but
with reboots / power off and just letting random screensavers run under load.

 At the moment, I'm attempting to "prove" 7.5 passes this, then I'll rebuild 7.6
and retest to "prove" that is bad (i.e. to try to prove the tests mark as "bad" a
version I know eventually fails) - still don't understand how my initial results
with 7.6 were ok, and I'm a bit worried there is something else I haven't taken
into account.  Not yet sure if the tests will be reliable, or if bisecting this
will be practical.
Comment 4 ken moffat 2010-01-17 11:36:09 UTC
Corrected the description.

I *think* I've now got the test situation repeatable.  The machine needs to have
been powered off after a "good" version of mesa is used; merely rebooting seems to
save some "state" so that "bad" versions still work.

Test case is thus: run glxgears.  In a bad version, the machine locks up as soon
as the application window is drawn (before any gearwheels appear).

Will attempt to bisect.
Comment 5 Fabio Pedretti 2010-01-17 23:53:33 UTC
Can you try with mesa 7.7 or current git from master or mesa_7_7_branch to see if it's already fixed there?
Comment 6 ken moffat 2010-01-18 17:14:45 UTC
 Git tells me
28471cfa970702128d822c2ecbb1703eedbca245 is the first bad commit

 but all I can find about that is from when I tested it:
[28471cfa970702128d822c2ecbb1703eedbca245] Merge branch 'mesa_7_5_branch'

 I can't find references to that commit in git log, or in cgit.freedesktop,
much less work out how to see what changed.  As usual with git, I don't
think I'm cut out to use it.

 Will try 7.7 later.
Comment 7 ken moffat 2010-01-19 11:44:54 UTC
Correction: there is more than one bug.  The last version I tested was 'good' (i.e. I could run glxgears), so I left it in place.  Tonight, the machine locked up when a screensaver was running.
Comment 8 ken moffat 2010-01-25 17:08:43 UTC
The problem (specifically, glxgears locks up before showing any gear wheels) still exists in HEAD from last Friday (124a6b1958c630ea049025e2b72547096fdc8f2c).

I tried git-bisect against this, and again it came back to 28471cfa970702128d822c2ecbb1703eedbca245, which is a merge. As I'm sure you
already know, this usage of git (apparently, merging from cvs) is not nice for those of us asked to bisect.

 Some of the apparent commits in the range shown for the log from that merge seem to have wrong (SHA1?) IDs, or be missing.  This time, I diffed against the "last good" commit and identified 8 commits on cgit which seem to match the commit messages.

 To cut a long story short, when I got back to reverting
25b492b976632269dfa3de164545d50a53c090ce

|From: Michel Dänzer <daenzer@vmware.com>
|Date: Tue, 07 Jul 2009 11:52:35 +0000
|Subject: GLX/DRI1: Mark GLX visuals with depth != screen depth non-conformant.
|
|Such visuals are subject to automatic compositing in the X server, so DRI1
|can't render to them properly.

 I again have a working glxgears (the test includes letting the machine sit powered-off, not just a reboot - power off seems to be important).

 So, at this point I tried to revert it from HEAD to prove it would fix it.
Problem: it has already been reverted in HEAD (and, to repeat, that still
locks up - unless I mistested).

 At the moment, I have one other commit to try reverting :
|From 71633abafc935c25da9731bab48c228ceb9b4097 Mon Sep 17 00:00:00 2001
|From: Michel Dänzer <daenzer@vmware.com>
|Date: Tue, 07 Jul 2009 12:49:52 +0000
|Subject: gallium: Fixes for clobbering stencil values in combined depth/stencil textures.
|
|Also fix one case where a 32 bit depth value was incorrectly converted to a
|combined depth/stencil value.

 But since my previous attempt isn't helping on HEAD, perhaps I'm doing something wrong ?  At the moment I feel like the narrator in "the third policeman" (google Flann O'Brien) - I'm beginning to think I've died and
gone to a place for ungood people (or even ++ungood).

 What I really don't get is that neither of these commits is for r200
(so, apparently I've misidentified the area) - I can understand problems
in r200 hanging around because these agp cards are now old and probably
not much used by people close to the leading edge.  But don't people use
glxgears any more ?

 This is all on linux-2.6.32.2 - maybe I should upgrade that before
trying to get any further into this mess ?

 Thanks for listening.
Comment 9 Michel Dänzer 2010-01-26 02:44:35 UTC
(In reply to comment #8)
> The problem (specifically, glxgears locks up before showing any gear wheels)
> still exists in HEAD from last Friday
> (124a6b1958c630ea049025e2b72547096fdc8f2c).

'HEAD' in Git refers to the most recent commit of the currently checked out branch. 124a6b1958c630ea049025e2b72547096fdc8f2c is a rather old (June/July 2009) commit from the master branch.

> I tried git-bisect against this, again it came back to
> 28471cfa970702128d822c2ecbb1703eedbca245 which is a merge, and as I'm sure you
> already know, this usage of git (apparently, merging from cvs)

It has nothing to do with CVS. A merge in Git is simply the folding of the histories of several branches into a single one. Looking at the commit in question in gitk or another GUI frontend may help visualize this.

> is not nice for those of us asked to bisect.

Please check whether the problem occurs with either of the parent commits of the merge commit. (Although the fact that git bisect identified the merge commit as the first bad one indicates that you already did in the course of the bisection, and neither of them had the problem)
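
Without a GUI frontend, the parents of a merge commit can also be listed on the command line. The following is a minimal sketch on a throwaway repository; the branch name `side` and the commit messages are invented for the demo. Against the real tree, the merge commit ID would take the place of `HEAD`, and each printed parent could then be checked out and tested.

```shell
#!/bin/sh
set -e

# Throwaway repository with one real merge (all names are assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "tester@example.invalid"
git config user.name "Tester"

echo base > f; git add f; git commit -q -m "base"
main=$(git symbolic-ref --short HEAD)

git checkout -q -b side
echo side > s; git add s; git commit -q -m "side work"

git checkout -q "$main"
echo main > m; git add m; git commit -q -m "main work"

git merge -q --no-ff -m "Merge branch 'side'" side

# %P prints the parent IDs of a commit; a two-way merge has two of them.
parents=$(git log -1 --format=%P HEAD)
set -- $parents
nparents=$#

echo "merge has $nparents parents:"
for p in "$@"; do
    git log -1 --format='  %h %s' "$p"
done

# Text-mode view of how the two histories were folded together.
git log --graph --oneline
```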


>  To cut a long story short, when I got back to reverting
> 25b492b976632269dfa3de164545d50a53c090ce
> 
> |From: Michel Dänzer <daenzer@vmware.com>
> |Date: Tue, 07 Jul 2009 11:52:35 +0000
> |Subject: GLX/DRI1: Mark GLX visuals with depth != screen depth non-conformant.
> |
> |Such visuals are subject to automatic compositing in the X server, so DRI1
> |can't render to them properly.
> 
>  I again have a working glxgears (the test includes letting the machine sit
> powered-off, not just a reboot - power off seems to be important).
> 
>  So, at this point I tried to revert it from HEAD to prove it would fix it.
> Problem: it has already been reverted in HEAD (and, to repeat, that still
> locks up - unless I mistested).

AFAICT that change is still in the current master branch. I suspect 'HEAD' doesn't mean what you thought.

(As you don't seem to feel very comfortable with Git yet, it may be a good idea to run 'git status' frequently to verify the current state matches your expectation)
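
As an illustration of the point about 'HEAD', the sketch below shows that after checking out an older commit (as git bisect does at every step) HEAD names that older commit, not the tip of the branch. It uses a throwaway repository with invented commits; only the git commands themselves carry over to a real tree.

```shell
#!/bin/sh
set -e

# Throwaway repository with three commits (contents are assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "tester@example.invalid"
git config user.name "Tester"

for n in 1 2 3; do
    echo "$n" > f
    git add f
    git commit -q -m "commit $n"
done

branch=$(git symbolic-ref --short HEAD)
tip=$(git rev-parse HEAD)
old=$(git rev-parse HEAD~2)

# Check out the oldest commit directly; HEAD is now detached.
git checkout -q "$old"

head_now=$(git rev-parse HEAD)
branch_tip=$(git rev-parse "$branch")

git log -1 --format='HEAD is:       %h %s' HEAD
git log -1 --format='branch tip is: %h %s' "$branch"

# Reports the detached state, which is easy to overlook otherwise.
git status
```

Printing the commit date as well (for example with `%cd` in the format string) is a quick way to notice that a commit is months old.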


>  At the moment, I have one other commit to try reverting :
> |From 71633abafc935c25da9731bab48c228ceb9b4097 Mon Sep 17 00:00:00 2001
> |From: Michel Dänzer <daenzer@vmware.com>
> |Date: Tue, 07 Jul 2009 12:49:52 +0000
> |Subject: gallium: Fixes for clobbering stencil values in combined
> |depth/stencil textures.

This commit is Gallium specific and irrelevant for this problem.


>  What I really don't get is that neither of these commits are for r200

The former commit could be relevant, though it probably couldn't cause the problem itself but merely trigger it by making glxgears choose a different visual. If you can confirm that reverting this commit avoids the problem, please attach the glxinfo output both with and without the commit.


>  This is all on linux-2.6.32.2 - maybe I should upgrade that before
> trying to get any further into this mess ?

Shouldn't hurt, but don't expect any miracles.
Comment 10 ken moffat 2010-01-26 14:12:43 UTC
Created attachment 32834
glxinfo without the reverted commit
Comment 11 ken moffat 2010-01-26 14:13:47 UTC
Created attachment 32835
glxinfo with 25b492b... reverted
Comment 12 ken moffat 2010-01-26 14:18:14 UTC
Thanks for that calm, helpful, and thoughtful reply.  Indeed I should
have spotted that was an old commit.  Thanks for the comment on gitk,
it looks as if my plan to build git on a server without xorg (or tcl,
tk) was not the best idea.

As a non-developer, I have again misunderstood what git-bisect would do.
I apologise for the out of order comments.

Glxinfo attached for 28471cfa970702128d822c2ecbb1703eedbca245 and with
25b492b976632269dfa3de164545d50a53c090ce reverted - the only thing that
changed is the caveat for 32-bit GLX Visuals - 'Ncon' when not working,
'None' when working.

ken@bluesbreaker ~ $diff -u glxinfo-with*
--- glxinfo-without-revert	2010-01-26 21:47:14.000000000 +0000
+++ glxinfo-with-revert	2010-01-26 16:04:37.000000000 +0000
@@ -83,7 +83,7 @@
 0x87 24 dc  0 32  0 r  .  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
 0x88 24 dc  0 32  0 r  .  .  8  8  8  8  0 24  8 16 16 16 16  0 0 Slow
 0x89 24 dc  0 32  0 r  y  .  8  8  8  8  0 24  8 16 16 16 16  0 0 Slow
-0x6c 32 tc  0 32  0 r  y  .  8  8  8  8  0 24  0  0  0  0  0  0 0 Ncon
+0x6c 32 tc  0 32  0 r  y  .  8  8  8  8  0 24  0  0  0  0  0  0 0 None
 
 16 GLXFBConfigs:
    visual  x  bf lv rg d st colorbuffer ax dp st accumbuffer  ms  cav
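
The same check can be made without diffing whole files by filtering the visual table on its caveat column. This is a small sketch using rows copied from the diff above as sample input; it assumes the first field is the visual ID, the second the depth and the last the caveat.

```shell
#!/bin/sh
# Sample rows copied from the glxinfo diff (the build without the revert).
visuals=$(cat <<'EOF'
0x87 24 dc  0 32  0 r  .  .  8  8  8  8  0 24  8  0  0  0  0  0 0 None
0x88 24 dc  0 32  0 r  .  .  8  8  8  8  0 24  8 16 16 16 16  0 0 Slow
0x89 24 dc  0 32  0 r  y  .  8  8  8  8  0 24  8 16 16 16 16  0 0 Slow
0x6c 32 tc  0 32  0 r  y  .  8  8  8  8  0 24  0  0  0  0  0  0 0 Ncon
EOF
)

# Assumption: field 1 = visual ID, field 2 = depth, last field = caveat.
nonconformant=$(printf '%s\n' "$visuals" |
    awk '$1 ~ /^0x/ && $NF == "Ncon" { print $1, "depth", $2 }')

echo "$nonconformant"
```

On a live system the same awk filter could be fed from `glxinfo` directly instead of the sample text.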

Comment 13 Dan Nicholson 2010-01-26 16:13:17 UTC
(In reply to comment #12)
> Thanks for that calm, helpful, and thoughtful reply.  Indeed I should
> have spotted that was an old commit.  Thanks for the comment on gitk,
> it looks as if my plan to build git on a server without xorg (or tcl,
> tk) was not the best idea.

You can try tig (http://jonas.nitro.dk/tig/), which is in ncurses and has a graph mode ('g') that kind of looks like gitk but less pretty.
Comment 14 Pauli 2010-01-27 02:25:04 UTC
Is all this testing done with KMS disabled?

There are a few possible reasons for the lockup that could be checked:

1. You have glxgears running from a local console, and mesa is trying to output something from inside a DRM-locked section, which causes recursive lock acquisition from different processes and locks everything out of the graphics card. (A solution to this is to run glxgears from an ssh connection with the "DISPLAY=:0.0 glxgears" command.)

2. I remember seeing similar locking problems that were solved when I cleanly rebuilt the correct versions of xf86-video-ati and mesa.

3. Try using the master branch of xf86-video-ati.

But the screensaver freeze sounds like an incorrect command stream locking the graphics card into an infinite loop.
Comment 15 Michel Dänzer 2010-01-28 00:55:46 UTC
Created attachment 32866
Print visual used by glxgears

(In reply to comment #12)
> Glxinfo attached for 28471cfa970702128d822c2ecbb1703eedbca245 and with
> 25b492b976632269dfa3de164545d50a53c090ce reverted - the only thing that
> changed is the caveat for 32-bit GLX Visuals - 'Ncon' when not working,
> 'None' when working.

Okay. Please apply this patch for mesa/progs/xdemos/glxgears and run it on a Mesa build which would result in the problem. It will just print out which visual it would use and exit.

I'm still wondering why git bisect blamed the merge commit rather than 25b492b976632269dfa3de164545d50a53c090ce. Can you try building 25b492b976632269dfa3de164545d50a53c090ce itself again and see if the problem occurs with that as well?
Comment 16 ken moffat 2010-01-31 14:18:11 UTC
(In reply to comment #15)
> Okay. Please apply this patch for mesa/progs/xdemos/glxgears and run it on a
> Mesa build which would result in the problem. It will just print out which
> visual it would use and exit.
> 
 Sorry for the delay replying, had to do my tax return, and then get back to
a state where I was fit to test things.

Using dc2914ab2645d2947898f96f9535f557c7c188cf from 21st January (the last
time I updated, and for which reverting the blamed merge doesn't help) I
got the following message, and then it locked up -
glxgears: Using visual 0x21

 Perhaps I should note that I used to use 16-bit depth until xorg-7.5,
and I now think I'm using 24-bit.

> I'm still wondering why git bisect blamed the merge commit rather than
> 25b492b976632269dfa3de164545d50a53c090ce. Can you try building
> 25b492b976632269dfa3de164545d50a53c090ce itself again and see if the problem
> occurs with that as well?
> 
 Will try this, but not tonight.


Comment 17 ken moffat 2010-01-31 14:20:04 UTC
(In reply to comment #13)
> (In reply to comment #12)
> 
> You can try tig (http://jonas.nitro.dk/tig/), which is in ncurses and has a
> graph mode ('g') that kind of looks like gitk but less pretty.
> 

 Thanks, Dan.  I'll give that a try (first time in a long while that I've
heard a tk app described as "pretty"!)
Comment 18 ken moffat 2010-01-31 14:39:58 UTC
(In reply to comment #14)
> Is this all testing done with KMS disabled?
> 

 Yes, I'm reluctant to try things in the kernel's "staging" area.

> There is few possible reasons for locking that could be checked:
> 
> 1. You have glxgears running from local console and mesa is trying to output
> something from inside the drmlocked output which causes recursive lock
> acquiring from different processes locking everything from graphics card.
> (Solution to this is to run glxgears from ssh connection with "DISPLAY=:0.0
> glxgears" command)

 That seems like a sledgehammer to crack a nut (I only run glxgears
as a minimal test that dri seems to be working).

> 
> 2. I remember seeing similar locking problems that were solved when I cleanly
> rebuild correct versions of xf86-video-ati and mesa.

 For builds during bisection, I updated (dri2proto, glproto, libdrm)
when I had to.  In the end, I stopped rebuilding xf86-video-ati
because it didn't seem to be necessary for the immediate glxgears
problem.

 Having said that, the eventual lockup in a random screensaver
persists even if I remove /usr/lib/dri so it probably isn't a
mesa problem (although it *was* the original aggravation that
got me looking at this).

> 
> 3. Try using master branch from xf86-video-ati.
> 
> But screensaver freezing sounds like incorrect command stream locking the
> graphics card to a infinite loop.
> 

 Looking at what changed since my last build (back in September)
my guess is that libdrm or xf86-video-ati are now the most likely
places to look for the random lockup.  But, I'd prefer to get the
glxgears issue out of the way first (if the problem is in libdrm,
I might have to build mesa-7.5 to bisect).

 I've seen a report from someone with a different old radeon that
xscreensaver locked the box for him too, but he was upgrading from
a *very* old system (back in the days when xscreensaver often
locked up on radeons) and used his existing workaround to just
set xscreensaver to blank the screen.  For me, it had been years
since I last saw the lockups, so I knew this wasn't expected.

Comment 19 Michel Dänzer 2010-02-02 02:01:10 UTC
(In reply to comment #16)
> Using dc2914ab2645d2947898f96f9535f557c7c188cf from 21st January (the last
> time I updated, and for which reverting the blamed merge doesn't help)

BTW, does reverting commit 5ed440400573631f540701f3efd479d8c1592007 as well as 25b492b976632269dfa3de164545d50a53c090ce work around the problem with a current snapshot?
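
Reverting two commits on top of a snapshot for a test build can be done without creating new commits. The sketch below shows the idea on a throwaway repository; "change A" and "change B" are invented stand-ins for the two commits named above.

```shell
#!/bin/sh
set -e

# Throwaway repository (all file names and messages are assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "tester@example.invalid"
git config user.name "Tester"

echo base > base; git add base; git commit -q -m "base"
echo a > a; git add a; git commit -q -m "change A"; a=$(git rev-parse HEAD)
echo b > b; git add b; git commit -q -m "change B"; b=$(git rev-parse HEAD)
echo c > c; git add c; git commit -q -m "later work"

# Undo both changes in the working tree and index only, newest first.
git revert --no-commit "$b" "$a"

# Files touched by the pending reverts; the build and test would happen here.
staged=$(git diff --cached --name-only)
echo "$staged"

# Return to the unmodified snapshot afterwards.
git reset -q --hard
```

If a file has moved since the commit was made, the revert may not apply cleanly and has to be redone by hand.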

> I got the following message, and then it locked up -
> glxgears: Using visual 0x21

Whoops sorry, I meant to attach a patch with exit(1) after printing the information, but apparently I accidentally attached an old version.


> > I'm still wondering why git bisect blamed the merge commit rather than
> > 25b492b976632269dfa3de164545d50a53c090ce. Can you try building
> > 25b492b976632269dfa3de164545d50a53c090ce itself again and see if the problem
> > occurs with that as well?
> > 
>  Will try this, but not tonight.

Any chance yet? This would be very interesting.


(In reply to comment #17)
> (first time in a long while that I've heard a tk app described as "pretty"!)

The gitk UI most definitely isn't 'pretty' but highly functional. :)


(In reply to comment #18)
> > 1. You have glxgears running from local console and mesa is trying to output
> > something from inside the drmlocked output which causes recursive lock
> > acquiring from different processes locking everything from graphics card.
> > (Solution to this is to run glxgears from ssh connection with "DISPLAY=:0.0
> > glxgears" command)
> 
>  That seems like a sledgehammer to crack a nut (I only run glxgears
> as a minimal test that dri seems to be working).

Of course it's not a final solution, but it would be interesting if e.g. redirecting both stdout and stderr with something like

glxgears &>/dev/null

works around the problem.
Comment 20 ken moffat 2010-02-02 14:25:56 UTC
(In reply to comment #19)
> (In reply to comment #16)
> > Using dc2914ab2645d2947898f96f9535f557c7c188cf from 21st January (the last
> > time I updated, and for which reverting the blamed merge doesn't help)
> 
> BTW, does reverting commit 5ed440400573631f540701f3efd479d8c1592007 as well as
> 25b492b976632269dfa3de164545d50a53c090ce work around the problem with a current
> snapshot?
> 
 Yes, it does!

[...] 
> 
> > > I'm still wondering why git bisect blamed the merge commit rather than
> > > 25b492b976632269dfa3de164545d50a53c090ce. Can you try building
> > > 25b492b976632269dfa3de164545d50a53c090ce itself again and see if the problem
> > > occurs with that as well?
> > > 
> >  Will try this, but not tonight.
> 
> Any chance yet? This would be very interesting.
> 
25b492b9 works fine.

[...]
> (In reply to comment #18)
> > > 1. You have glxgears running from local console and mesa is trying to output
> > > something from inside the drmlocked output which causes recursive lock
> > > acquiring from different processes locking everything from graphics card.
> > > (Solution to this is to run glxgears from ssh connection with "DISPLAY=:0.0
> > > glxgears" command)
> > 
> >  That seems like a sledgehammer to crack a nut (I only run glxgears
> > as a minimal test that dri seems to be working).
> 
> Of course it's not a final solution, but it would be interesting if e.g.
> redirecting both stdout and stderr with something like
> 
> glxgears &>/dev/null
> 
> works around the problem.
> 
 Seems to - I had moved /usr/lib/dri from 7.6.1 to /usr/lib/dri.7.6.1
in case I had any use for it.  Symlinking /usr/lib/dri to it, and
then running 'glxgears & >/dev/null' works - I'm surprised that
I still see the output for the number of frames in 5 seconds.  
Comment 21 Michel Dänzer 2010-02-03 02:19:22 UTC
(In reply to comment #20)
> > BTW, does reverting commit 5ed440400573631f540701f3efd479d8c1592007 as well as
> > 25b492b976632269dfa3de164545d50a53c090ce work around the problem with a current
> > snapshot?
> > 
>  Yes, it does!
> 
> [...] 
> 
> 25b492b9 works fine.

The plot thickens... the problem seems related to using a visual with stencil bits.


> > Of course it's not a final solution, but it would be interesting if e.g.
> > redirecting both stdout and stderr with something like
> > 
> > glxgears &>/dev/null
> > 
> > works around the problem.
> > 
>  Seems to - I had moved /usr/lib/dri from 7.6.1 to /usr/lib/dri.7.6.1
> in case I had any use for it.  Symlinking /usr/lib/dri to it, and
> then running 'glxgears & >/dev/null' works - I'm surprised that
> I still see the output for the number of frames in 5 seconds.

That is indeed very surprising: due to the space between '&' and '>', this invocation doesn't actually redirect stdout or stderr AFAICT. If

LIBGL_DEBUG=verbose glxgears >/tmp/stdout.txt 2>/tmp/stderr.txt

works as well (with the exact same whitespace this time :), please attach the resulting std*.txt files.
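
The difference the whitespace makes can be reproduced without any graphics hardware. In the sketch below, `noisy` is a made-up stand-in for glxgears that writes one line to each stream. Because `&>` is a bash extension, the sketch uses the portable spelling `>/dev/null 2>&1` for the "redirect both" case; shells without the extension may parse `&>` like the spaced form.

```shell
#!/bin/sh
# Hypothetical stand-in for glxgears: one line on stdout, one on stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Both streams discarded (portable equivalent of bash's `cmd &>/dev/null`).
quiet=$( { noisy >/dev/null 2>&1; } 2>&1 )

# `cmd & >/dev/null` is two commands: `cmd &` runs in the background with
# nothing redirected, then an empty command has its stdout sent to /dev/null.
loud=$( { noisy & >/dev/null; wait; } 2>&1 )

echo "redirected:   [$quiet]"
echo "backgrounded: [$loud]"
```

The first variable ends up empty, while the second still contains both lines, which matches the frame counts remaining visible with the spaced form.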
Comment 22 ken moffat 2010-02-03 11:18:54 UTC
(In reply to comment #21)


> >  Seems to - I had moved /usr/lib/dri from 7.6.1 to /usr/lib/dri.7.6.1
> > in case I had any use for it.  Symlinking /usr/lib/dri to it, and
> > then running 'glxgears & >/dev/null' works - I'm surprised that
> > I still see the output for the number of frames in 5 seconds.
> 
> That is indeed very surprising: due to the space between '&' and '>', this
> invocation doesn't actually redirect stdout or stderr AFAICT. If
> 
> LIBGL_DEBUG=verbose glxgears >/tmp/stdout.txt 2>/tmp/stderr.txt
> 
> works as well (with the exact same whitespace this time :), please attach the
> resulting std*.txt files.
> 

 Apologies, I seem to have reverted to "bear of little brain" mode
yesterday. That was a *warm* reboot after running a good version.
With a cold start and correct syntax it still locks up.

 Would the verbose debugging info be useful in this situation ?
Comment 23 Michel Dänzer 2010-02-04 03:37:48 UTC
(In reply to comment #22)
> With a cold start and correct syntax it still locks up.
> 
>  Would the verbose debugging info be useful in this situation ?

The above leaves me unsure if you've actually tested redirecting all output to files.
Comment 24 ken moffat 2010-02-04 09:55:41 UTC
Created attachment 33073
stderr from glxgears with mesa-7.6.1 after a cold boot

Locked hard, wasn't able to use SysRq, stdout.txt was empty
Comment 25 ken moffat 2010-02-04 10:09:38 UTC
Apropos the stderr attachment -

First, I ran the build of recent mesa with the specified 2
commits reverted (i.e. working around the issue).  The output
on stderr was similar, apart from an XIO message where I
closed the window.

Stdout was: 3192 frames in 5.0 seconds = 638.292 FPS

Came out of xorg, swapped the /usr/lib/dri symlink to point
to 7.6.1.  startx, ran glxgears again for a "warm boot" : The
information was similar (different numbers of requests, and
frames, of course).

Kept the symlink pointing to 7.6.1, powered off, waited a
minute, booted, ran glxgears, created the file I've attached
(in ~/ because /tmp is volatile across reboots on my systems).
Waited 2 or 3 seconds, tried to sync etc but Alt-SysRq
had no effect and I had to hit the reset button.  The stdout
file from the 'cold' run is empty.
Comment 26 Michel Dänzer 2010-02-16 02:56:06 UTC
If commit 862488075c5537b0613753b0d14c267527fc6199 plus 25b492b976632269dfa3de164545d50a53c090ce applied manually doesn't lock up, please do another bisect between 862488075c5537b0613753b0d14c267527fc6199 and 9982821cdaf2205443c6297368eaab4115bf92f6, always with 25b492b976632269dfa3de164545d50a53c090ce applied manually.
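
A bisect with one commit "always applied manually" could be organised roughly as below. This is only a sketch on a throwaway repository: the `driver` and `trigger` files, the step script and the commit messages are assumptions chosen so that the bug shows only when the trigger change is present, loosely mirroring the situation here. With a lockup that needs a cold boot, the script body would be carried out by hand and each step marked with `git bisect good` or `git bisect bad`.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "tester@example.invalid"
git config user.name "Tester"

# Main history: the hypothetical driver regresses at change 4.
for i in 1 2 3 4 5 6; do
    if [ "$i" -ge 4 ]; then echo stencil-bug > driver; else echo fine > driver; fi
    echo "$i" > rev
    git add driver rev
    git commit -q -m "driver change $i"
done
tip=$(git rev-parse HEAD)
root=$(git rev-list --max-parents=0 HEAD)

# Separate commit that merely exposes the regression.
git checkout -q -b trigger "$root"
echo "pick stencil visual" > trigger
git add trigger
git commit -q -m "trigger: choose a visual with stencil bits"
trigger=$(git rev-parse HEAD)

# Per-step script: apply the trigger, test, then clean up again.
step=$(mktemp)
cat > "$step" <<'EOF'
#!/bin/sh
# $1: commit to apply on top of every bisection candidate
git cherry-pick --no-commit "$1" >/dev/null 2>&1 || { git reset -q --hard; exit 125; }
if [ -f trigger ] && grep -q stencil-bug driver; then status=1; else status=0; fi
git reset -q --hard   # drop the manually applied change before the next step
exit "$status"
EOF

git bisect start "$tip" "$root" >/dev/null
git bisect run sh "$step" "$trigger" >/dev/null

culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit with trigger applied: $culprit"

git bisect reset >/dev/null 2>&1
rm -f "$step"
```

Exit status 125 tells `git bisect run` to skip a candidate where the extra commit does not apply, rather than misreporting it as good or bad.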
Comment 27 ken moffat 2010-02-16 13:56:30 UTC
(In reply to comment #26)
> If commit 862488075c5537b0613753b0d14c267527fc6199 plus
> 25b492b976632269dfa3de164545d50a53c090ce applied manually doesn't lock up,

 It does.  The tree from 862488075c is ok, the tree from 25b492b is
similarly ok.  Adding that one commit to the tree from 862488075c
locks up in glxgears.


> please do another bisect between 862488075c5537b0613753b0d14c267527fc6199 and
> 9982821cdaf2205443c6297368eaab4115bf92f6, always with
> 25b492b976632269dfa3de164545d50a53c090ce applied manually.
> 

Comment 28 Michel Dänzer 2010-02-17 04:06:31 UTC
Ugh, thinko in comment #26 - 862488075c5537b0613753b0d14c267527fc6199 would need to be replaced by 666e5bf4a6728484b4bc0c7e2583f141f1f2b2b7, which is where mesa_7_5_branch branched off master. However, bisecting between 666e5bf4a6728484b4bc0c7e2583f141f1f2b2b7 and 9982821cdaf2205443c6297368eaab4115bf92f6 is probably pointless as the radeon-rewrite branch was merged between them...

So, it seems like most likely the lockups are due to a stencil related radeon-rewrite regression.

Do apps which actually use the stencil buffer (e.g. mesa/progs/tests/stencil*) still lock up with current code patched so that glxgears doesn't lock up?
Comment 29 ken moffat 2010-02-17 17:53:52 UTC
(In reply to comment #28)
> Ugh, thinko in comment #26 - 862488075c5537b0613753b0d14c267527fc6199 would
> need to be replaced by 666e5bf4a6728484b4bc0c7e2583f141f1f2b2b7, which is where
> mesa_7_5_branch branched off master. However, bisecting between
> 666e5bf4a6728484b4bc0c7e2583f141f1f2b2b7 and
> 9982821cdaf2205443c6297368eaab4115bf92f6 is probably pointless as the
> radeon-rewrite branch was merged between them...
> 
> So, it seems like most likely the lockups are due to a stencil related
> radeon-rewrite regression.
> 
> Do apps which actually use the stencil buffer (e.g. mesa/progs/tests/stencil*)
> still lock up with current code patched so that glxgears doesn't lock up?
> 

 stencilwrap locks, stencil_twoside doesn't run, stencil_wrap appears to be ok.
Full details, using 846cf495226be78b05e74064662eba4e7eb8280e from Tuesday,
reverting 5ed440400573631f540701f3efd479d8c1592007 and manually reverting
25b492... (dri_glx.c is no longer in x11 subdirectory).

stencil_twoside
GL_RENDERER = Mesa DRI R200 (RV280 5964) 20090101 AGP 8x  TCL
GL_VERSION = 1.3 Mesa 7.8-devel
Sorry, this program requires either GL_ATI_separate_stencil or OpenGL 2.0.

stencilwrap [ selected details written down by hand ]
Failed GL_KEEP test (got 247, expected 255)
then reported 'OK!' for GL_ZERO GL_REPLACE GL_INCR GL_DECR
GL_INVERT GL_INCR_WRAP_EXT GL_DECR_WRAP_EXT
then locked up, so either at end of GL_DECR_WRAP_EXT *after* reporting
'OK!' or in whatever comes next.

stencil_wrap
appears to work, 5 grey squares on blue background, message:
All 5 squares should be the same color.
Stencil bits = 8, maximum stencil value = 0x000000ff
(that last line repeats from time to time, possibly triggered by
moving windows).  Terminated by closing the window.

Comment 30 Michel Dänzer 2010-02-24 02:22:45 UTC
Do the lockups still happen with mesa Git as of commit 3f5bac8960a5c6d1f08f0dc849676139b9d6ce5c (master) or bc7e12e5e3c063b8f29fecad43d85b09fa6b205d (mesa_7_7_branch)?
Comment 31 ken moffat 2010-02-24 13:46:33 UTC
(In reply to comment #30)
> Do the lockups still happen with mesa Git as of commit
> 3f5bac8960a5c6d1f08f0dc849676139b9d6ce5c (master) or
> bc7e12e5e3c063b8f29fecad43d85b09fa6b205d (mesa_7_7_branch)?
> 

No!  Seems to be fixed by 3f5bac8960a5c6d1f08f0dc849676139b9d6ce5c.

On master, glxgears is ok, but 'make' failed in progs/tests
make: *** No rule to make target `../../src/mesa/glapi/gl_API.xml', needed by `getproclist.h'.  Stop.

I thought this was in fptexture.c but maybe I was mistaken,
after hacking the Makefile to not build fptexture, and then
getprocaddress, I have the following results for the stencil
tests -

still missing a dependency for stencil_twoside (or perhaps
it is now old), stencil_wrap is ok, stencilwrap still seems
to be hung after

Testing GL_DECR_WRAP_EXT...
OK!

but the box isn't locked and I can kill that test.

Similar results for bc7e12e5e3c063b8f29fecad43d85b09fa6b205d
except there is no problem building progs/tests there.

FWIW, I applied 3f5bac89 to 7.6.1 - last hunk in radeon_ioctl.c
doesn't apply (it tries to remove a call to radeonEmitState that
isn't there) but the remainder fixes 7.6.1.

Many thanks.  Closing.