Bug 50655

Summary:	[r600g][RV670 HD3870] Ioquake games causes GPU lockup (waiting for 0x00003039 last fence id 0x00003030)
Product:	Mesa	Reporter:	Bryan Quigley <gquigs+bugs>
Component:	Drivers/Gallium/r600	Assignee:	Default DRI bug account <dri-devel>
Status:	RESOLVED FIXED	QA Contact:
Severity:	major
Priority:	medium	CC:	archon-123, dinolib, maraeo, myckel, rminkler
Version:	git
Hardware:	x86-64 (AMD64)
OS:	Linux (All)
Whiteboard:
Attachments:	kern.log; syslog; Xorg log; weird screen; good+bad git bisects; possible fix; 3 outputs of syslog: before the patch, after, and after really bad; possible fix; possible fix; Possible fix for R600 hw deadlock; flush fix 1/4; flush fix 2/4; flush fix 3/4; flush fix 4/4; new attempt 1/5; new attempt 2/5; new attempt 3/5; new attempt 4/5; new attempt 5/5; simple fix; alternate simple fix; better alternative fix

Description Bryan Quigley 2012-06-03 14:23:01 UTC

Created attachment 62474 [details]
kern.log

Tested and reproducible with Urban Terror, Warsow, and World of Padman. I used the Phoronix Test Suite, and at some point during the run of each game it would either freeze or eventually display weird output on the screen (attached).

Occasionally a VT switch would let the game "appear" again.  Other times you can hear the sound of the game continue.

I am using the 3.4 kernel with drivers, X, etc. from the Xorg Edgers PPA: the ati driver is 6.14.99+git20120525.b1e9c308 and mesa is 8.1~git20120530.ff3eef1a.

You should be able to reproduce this by:
Installing phoronix test suite (http://www.phoronix-test-suite.com/?k=downloads)
and then running: phoronix-test-suite benchmark urbanterror
(or warsow or padman)
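
The reproduction steps above can be scripted so that each game is run several times in a row. This is only a minimal sketch: it assumes `phoronix-test-suite` is on the PATH, and the helper names and the repeat count are illustrative, not part of the test suite.

```python
import subprocess

# Test profiles named in this report.
GAMES = ["urbanterror", "warsow", "padman"]


def benchmark_commands(games=GAMES, repeats=3):
    """Return one argv list per run: every game, `repeats` times each."""
    return [
        ["phoronix-test-suite", "benchmark", game]
        for game in games
        for _ in range(repeats)
    ]


def run_all(commands):
    """Run each command in turn; return the commands that exited non-zero."""
    failed = []
    for argv in commands:
        result = subprocess.run(argv)
        if result.returncode != 0:
            failed.append(argv)
    return failed
```

Calling `run_all(benchmark_commands())` would start the runs. The exit status is at best a rough signal here; the kernel log is still the place to confirm whether a GPU lockup actually happened.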

Comment 1 Bryan Quigley 2012-06-03 14:23:32 UTC

Created attachment 62475 [details]
syslog

Comment 2 Bryan Quigley 2012-06-03 14:23:51 UTC

Created attachment 62476 [details]
Xorg log

Comment 3 Bryan Quigley 2012-06-03 15:41:37 UTC

Created attachment 62478 [details]
weird screen

Comment 4 Alexandre Demers 2012-06-04 05:22:26 UTC

Would it be possible to test the same thing, but with kernel 3.2? I'd like to know if we are experiencing the same problem that I reported some time ago.

Comment 5 Alex Deucher 2012-06-04 05:33:01 UTC

Would it be possible to narrow down which component (kernel, ddx, or mesa) is causing the problem and bisect?  I'd guess it's a mesa issue.

Comment 6 Bryan Quigley 2012-06-04 07:37:58 UTC

I tested with both a 3.2 and the same 3.4 kernel, using the stable mesa/X/drivers that came with Precise.  This did not cause a crash.

I think 3.2 with the git mesa/X/drivers does cause the crash; I'll confirm tonight.

Comment 7 Bryan Quigley 2012-06-04 18:50:08 UTC

Just upgrading Mesa (which does pull in libdrm upgrades) causes the bug, even on the 3.2 kernel without Xorg/drivers upgraded. I think this confirms it is a mesa bug.

Comment 8 Michel Dänzer 2012-06-05 09:02:35 UTC

(In reply to comment #7)
> I think this confirms it is a mesa bug.

Would be great if you could bisect mesa Git then.

Comment 9 Bryan Quigley 2012-06-06 21:39:53 UTC

I think I did everything right in this bisect (I didn't on the first attempt).

fbebd431ec4e2e461a0cbcd5f3a04a000b8f6bbf is the first bad commit
commit fbebd431ec4e2e461a0cbcd5f3a04a000b8f6bbf
Author: Marek Olšák <maraeo@gmail.com>
Date:   Fri Feb 3 05:05:31 2012 +0100

    r600g: move invariant register updates into start_cs for r6xx-r7xx

:040000 040000 dd9232a0c49e54e0cd536fa858dc131982dc2fbe 379e1d61c53d98a8706f32da5020dc22c0c0ee33 M	src

Comment 10 Bryan Quigley 2012-06-06 21:42:42 UTC

Created attachment 62689 [details]
good+bad git bisects

Both the good and bad git bisect logs. For the good one I ran warsow, padman, and urbanterror looking for the bug.
The bad one seems to have missed some occurrences.

Comment 11 Michel Dänzer 2012-06-07 00:40:00 UTC

Marek, any ideas? (bug 47116 might be related)

Comment 12 Alex Deucher 2012-06-07 07:16:57 UTC

I think I know what's going on here.  There's a hw bug on r6xx where you need to re-emit a CB register if some state further up the pipeline changes even if the CB state has not changed.  I remember fixing it in r600c, but I can't find the commit...

Comment 13 Alex Deucher 2012-06-07 07:20:16 UTC

IIRC, the fix is to always re-emit a CB reg between draw calls if some other state changed.
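
The workaround described above can be pictured with a toy model of dirty-state tracking. This is a sketch only, written in Python rather than the driver's C; the `Atom` class and `atoms_to_emit` function are hypothetical names and do not correspond to r600c or r600g code.

```python
class Atom:
    """A block of GPU state with a dirty flag (hypothetical, not Mesa's type)."""

    def __init__(self, name):
        self.name = name
        self.dirty = False


def atoms_to_emit(atoms, cb_atom):
    """Return the names of the atoms to emit before a draw call.

    Workaround being modelled: if any atom is dirty, the CB atom is
    re-emitted as well, even when its own state did not change.
    `cb_atom` is expected to be one of the entries in `atoms`.
    """
    if any(atom.dirty for atom in atoms):
        cb_atom.dirty = True
    emitted = [atom.name for atom in atoms if atom.dirty]
    for atom in atoms:
        atom.dirty = False  # everything emitted is clean again
    return emitted
```

With atoms `[vertex_shader, cb]`, marking only the vertex shader dirty makes the function emit both, while a draw with no changes emits nothing.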

Comment 14 Marek Olšák 2012-06-07 09:19:52 UTC

(In reply to comment #11)
> Marek, any ideas? (bug 47116 might be related)

Sorry I've got none. All the regs were really invariant at the time I wrote the commit. A hardware bug like Alex suggested is one possible explanation...

Comment 15 Bryan Quigley 2012-08-15 14:49:33 UTC

Bug still occurs in git from yesterday.

I'm willing to test patches or even do some basic programming (no graphics experience).  I wasn't able to just revert the problem patch and am not sure which parts I should be trying to keep.

Comment 16 Marek Olšák 2012-08-24 01:14:51 UTC

Created attachment 66040 [details] [review]
possible fix

Could you please try this patch?

Comment 17 Bryan Quigley 2012-08-24 04:31:57 UTC

The patch doesn't seem to work.  It may have made the crash more likely to bring the system down, but I'd have to do more testing to confirm that.

Attaching 3 syslog results in 1 file containing:
Before the patch
After the patch
After the patch - broke so much it needed a restart

Comment 18 Bryan Quigley 2012-08-24 04:32:48 UTC

Created attachment 66047 [details]
3 outputs of syslog: before the patch, after, and after really bad

Comment 19 Bryan Quigley 2012-12-10 06:09:36 UTC

Would any other output help debug this?  Register dumps using avivotool?

Comment 20 Alex Deucher 2012-12-10 15:49:48 UTC

Created attachment 71271 [details] [review]
possible fix

Does this patch help?

Comment 21 Bryan Quigley 2012-12-10 21:21:38 UTC

Nope, but the patch didn't work as is, so I changed it to:
rctx->framebuffer.atom.dirty = true;

Which may not be what the patch was actually trying to do...

Comment 22 Marek Olšák 2012-12-11 02:01:04 UTC

(In reply to comment #21)
> Nope, but the patch didn't work as is, so I changed it to:
> rctx->framebuffer.atom.dirty = true;
> 
> Which may not be what the patch was actually trying to do...

Your modification is correct. So did it work or not?

Comment 23 Bryan Quigley 2012-12-11 04:15:33 UTC

No, the new patch doesn't fix it either.

Comment 24 Myckel Habets 2012-12-11 06:59:23 UTC

Some more info is in Bug 58058, because I think this is the same problem.

Comment 25 Alex Deucher 2012-12-11 17:15:29 UTC

Created attachment 71346 [details] [review]
possible fix

Try this patch.  It re-emits most of the invariant state at draw time.  If it helps, please try commenting out (change the #if 1 to #if 0) each new section until you are able to trigger the lock ups again so we can narrow down which state needs to be re-emitted at draw time.

Comment 26 Myckel Habets 2012-12-11 18:05:01 UTC

(In reply to comment #25)
> Created attachment 71346 [details] [review]
> possible fix
> 
> Try this patch.  It re-emits most of the invariant state at draw time.  If
> it helps, please try commenting out (change the #if 1 to #if 0) each new
> section until you are able to trigger the lock ups again so we can narrow
> down which state needs to be re-emitted at draw time.

In my case it didn't help, although I had the impression it took longer before it hung (could also be random?)

Comment 27 Myckel Habets 2012-12-11 18:13:07 UTC

The second time it locked up sooner. Some observations: the screen gets distorted after one or more resets (wrong rendering order?). The resets keep going, even after switching to the console (tty interface), until X is killed or shut down.

Comment 28 Bryan Quigley 2012-12-11 21:38:48 UTC

I confirm that patch 71346 didn't help either.

Comment 29 Bryan Quigley 2012-12-11 21:54:42 UTC

I get a similar lockup when starting Team Fortress 2 (native). It happens at startup, so it's much easier to reproduce.

Comment 30 Andy Furniss 2012-12-13 16:17:36 UTC

I've just put my rv670 (HD3850) card back in my AGP box and can reliably get etqw to lock after a few seconds with waiting for fence.

My setup may be too different from the OP's for this to be relevant to this bug. The differences:

AGP, 32 bit, running drm-fixes kernel, no writebacks and my bisect came up with a commit postdating the original report.

But for me - 

1eedebc65b02130ef7a27062a1ed67972a317a08 is first bad commit
commit 1eedebc65b02130ef7a27062a1ed67972a317a08
Author: Marek Olšák <maraeo@gmail.com>
Date:   Thu Nov 1 02:00:37 2012 +0100

    r600g: re-enable handling of DISCARD_RANGE, improving performance
    
    It seems to work for me now. Even the graphics corruption is gone.
    
    This also boosts performance in Reaction Quake.

Gives a reliable rv670 lock up with etqw.

This is testing with mesa built with --disable-llvm (as R600_LLVM doesn't work at all on this card)

It may (or may not) be worth anyone testing with mesa master to try resetting it to the commit before the one above like -

make distclean
git clean -dfx
git reset --hard fa58644855e44830e0b91dc627703c236fa6712a

Comment 31 Bryan Quigley 2012-12-13 16:34:11 UTC

Andy,

How many times did you try it at that commit?  I ask because I originally bisected it wrong because it didn't always reproduce consistently for me.  (Would take >1 run)

I'll test it out though.
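
Because the lockup does not show up on every run, a commit is only trustworthy as "good" after several clean runs. A minimal sketch of that rule, mapped onto the exit codes that `git bisect run` understands (0 for good, 125 for skip, other codes from 1 to 127 for bad); the function name and its inputs are illustrative:

```python
GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def bisect_verdict(build_ok, run_locked_up):
    """Map one commit's test results to a `git bisect run` exit code.

    build_ok:      False if the commit could not be built.
    run_locked_up: list with one boolean per test run, True if that
                   run ended in a GPU lockup.
    """
    if not build_ok or not run_locked_up:
        return SKIP  # nothing could be tested at this commit
    if any(run_locked_up):
        return BAD   # a single lockup is enough
    return GOOD      # only when every run stayed clean
```

The asymmetry is the point: one lockup proves a commit bad, but one clean run proves very little, so the number of runs per commit should be chosen with the observed failure rate in mind.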

Comment 32 Andy Furniss 2012-12-13 17:08:27 UTC

Looks like this was a separate issue - I've just managed to get openarena to lock GPU with mesa set to before

r600g: re-enable handling of DISCARD_RANGE

Comment 33 Andy Furniss 2012-12-13 17:11:35 UTC

(In reply to comment #31)
> Andy,
> 
> How many times did you try it at that commit?  I ask because I originally
> bisected it wrong because it didn't always reproduce consistently for me. 
> (Would take >1 run)
> 
> I'll test it out though.

I am still testing - for etqw it looks good, but as I just posted I can after some time get openarena to lock.

Comment 34 Andy Furniss 2012-12-13 22:54:58 UTC

(In reply to comment #9)
> I think I did everything right in this bisect (I didn't on the first attempt).
> 
> fbebd431ec4e2e461a0cbcd5f3a04a000b8f6bbf is the first bad commit
> commit fbebd431ec4e2e461a0cbcd5f3a04a000b8f6bbf
> Author: Marek Olšák <maraeo@gmail.com>
> Date:   Fri Feb 3 05:05:31 2012 +0100
> 
>     r600g: move invariant register updates into start_cs for r6xx-r7xx
> 
> :040000 040000 dd9232a0c49e54e0cd536fa858dc131982dc2fbe
> 379e1d61c53d98a8706f32da5020dc22c0c0ee33 M	src

This seems correct, I can get a lock after a few minutes on this commit, but have so far failed to lock on the one before it.

Comment 35 Myckel Habets 2012-12-16 21:12:44 UTC

(In reply to comment #30)

> make distclean
> git clean -dfx
> git reset --hard fa58644855e44830e0b91dc627703c236fa6712a

OK, I did this and rebuilt everything, but the problem remains in my case.

Comment 36 Bryan Quigley 2013-02-19 01:53:30 UTC

I believe this bug is now triggered much faster (within 10 seconds of starting one of these games).  But on the plus side it seems to usually just crash the game in question.  (Running xorg-edgers (git) on Ubuntu Raring, kernel 3.8) 

Excerpt from kern.log
346488] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
346502] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000ced last fence id 0x0000000000000cea)
347653] radeon 0000:01:00.0: Saved 121 dwords of commands on ring 0.
347664] radeon 0000:01:00.0: GPU softreset: 0x00000003
348246] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xE7730130
348253] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00FF0103
348259] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200000C0
348265] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x02000000
348271] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00040804
348277] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00028284
348283] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80878645
348289] radeon 0000:01:00.0:   R_008020_GRBM_SOFT_RESET=0x00007FEE
363165] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
378050] radeon 0000:01:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
378056] radeon 0000:01:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
378062] radeon 0000:01:00.0:   R_000E50_SRBM_STATUS      = 0x200080C0
378068] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
378074] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
378079] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
378085] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80100000
382501] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
400412] [drm] probing gen 2 caps for device 1022:9603 = 2/0
400422] [drm] PCIE gen 2 link speeds already enabled
405272] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
405369] radeon 0000:01:00.0: WB enabled
405379] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffdccc00
405388] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffdccc0c
436612] [drm] ring test on 0 succeeded in 0 usecs
436678] [drm] ring test on 3 succeeded in 1 usecs
438392] [drm] ib test on ring 0 succeeded in 0 usecs
438423] [drm] ib test on ring 3 succeeded in 1 usecs
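
For comparing logs across runs, the fence IDs in the lockup message can be pulled out mechanically. A small sketch, assuming the message format shown in the excerpt above; the function and pattern names are made up for illustration:

```python
import re

# Matches the "waiting for ... last fence id ..." lockup line only,
# not the preceding "CP stall" line.
LOCKUP_RE = re.compile(
    r"GPU lockup \(waiting for 0x([0-9a-fA-F]+) last fence id 0x([0-9a-fA-F]+)\)"
)


def find_lockups(log_text):
    """Return a (waiting_for, last_fence_id) pair of ints per lockup line."""
    return [
        (int(waiting, 16), int(last, 16))
        for waiting, last in LOCKUP_RE.findall(log_text)
    ]
```

Applied to the excerpt above this yields one pair, `(0xced, 0xcea)`, a gap of 3 between the two IDs.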

End of an apitrace:
4700817 glClientActiveTextureARB(texture = GL_TEXTURE1)
4700818 glBindTexture(target = GL_TEXTURE_2D, texture = 0)
4700819 glActiveTextureARB(texture = GL_TEXTURE0)
4700820 glClientActiveTextureARB(texture = GL_TEXTURE0)
4700821 glBindTexture(target = GL_TEXTURE_2D, texture = 0)
4700822 glXMakeCurrent(dpy = 0xb7f7c80, drawable = 0, ctx = NULL) = True
4700823 glXDestroyContext(dpy = 0xb7f7c80, ctx = 0xb830e08)
4700339 glDrawElements(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_INT, indices = blob(72)) // incomplete
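
In a long trace, the call that never completed is easy to miss. The sketch below scans a text dump for lines marked `// incomplete`, assuming the dump looks like the excerpt above (call number, function name, arguments); the helper name is hypothetical.

```python
import re

# Call number followed by the function name and an opening parenthesis.
CALL_RE = re.compile(r"^(\d+)\s+(\w+)\(")


def incomplete_calls(trace_dump):
    """Return (call_number, function_name) for calls marked '// incomplete'."""
    found = []
    for line in trace_dump.splitlines():
        if not line.rstrip().endswith("// incomplete"):
            continue
        match = CALL_RE.match(line.strip())
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found
```

For the excerpt above it reports call 4700339, `glDrawElements`, as the one left incomplete.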

Comment 37 Myckel Habets 2013-02-19 22:42:51 UTC

Some attempts from my side:

I've been going back in the tree to see if I could find a point where it doesn't show this bug. I've gone back as far as the end of 2011, and it still locks up (although it seems to take more time). At my last check (early 2011) I was unable to build the code; it seems to be incompatible going back that far. I'll see if I can find the spot where I can build it again and test it.

Comment 38 Bryan Quigley 2013-02-19 23:21:44 UTC

@Myckel Habets in comment #37

What do you mean by more time?  I would test with 3 games, running 3 times each, automatically via the Phoronix Test Suite.

With the latest git mesa does it crash very quickly for you?

Comment 39 Erik Jørgensen 2013-02-21 23:03:24 UTC

Created attachment 75272 [details] [review]
Possible fix for R600 hw deadlock

The patch has been tested on a system with an AMD K8 CPU and a Radeon AGP card (AMD RV670 / Radeon HD 3850), with both the 3.6.11-030611-generic kernel (from the Ubuntu mainline kernel PPA) and a kernel built from recent drm-fixes git. This patch may also be relevant to Bug 47116.

Comment 40 Alex Deucher 2013-02-21 23:36:50 UTC

(In reply to comment #39)
> Created attachment 75272 [details] [review]
> Possible fix for R600 hw deadlock
> 
> Patch has been tested on a system with AMD K8 CPU and Radeon AGP card (AMD
> RV670 / Radeon HD 3850) with both 3.6.11-030611-generic kernel (from Ubuntu
> kernel PPA mainline) and kernel built from recent drm-fixes git in the
> testing. This patch may also be relevant to reported Bug 47116 .

There is a lot of unrelated stuff going on in that patch.  Can you narrow down what part fixes the issue?

Comment 41 Alex Deucher 2013-02-22 00:14:42 UTC

Created attachment 75274 [details] [review]
flush fix 1/4

Please try this patch series.  The 4th patch is optional.  It just enables CP DMA assuming that the previous flushing fixes fix the CP DMA issues.

Comment 42 Alex Deucher 2013-02-22 00:15:11 UTC

Created attachment 75275 [details] [review]
flush fix 2/4

patch 2 of 4.

Comment 43 Alex Deucher 2013-02-22 00:15:47 UTC

Created attachment 75276 [details] [review]
flush fix 3/4

patch 3 of 4.

Comment 44 Alex Deucher 2013-02-22 00:16:21 UTC

Created attachment 75277 [details] [review]
flush fix 4/4

Optional patch to enable CP DMA on 6xx.

Comment 45 Alex Deucher 2013-02-22 00:18:12 UTC

*** Bug 47116 has been marked as a duplicate of this bug. ***

Comment 46 Andy Furniss 2013-02-22 01:03:32 UTC

(In reply to comment #39)
> Created attachment 75272 [details] [review]
> Possible fix for R600 hw deadlock
> 
> Patch has been tested on a system with AMD K8 CPU and Radeon AGP card (AMD
> RV670 / Radeon HD 3850) with both 3.6.11-030611-generic kernel (from Ubuntu
> kernel PPA mainline) and kernel built from recent drm-fixes git in the
> testing. This patch may also be relevant to reported Bug 47116 .

Testing AGP HD3850 - this patch regresses etqw which since my previous post in this bug had become stable. GPU lock within seconds with or without llvm. Testing on 3.7.6 (purely because I have a separate issue with gpu locks provoking oops with current kernels).

It does however seem to fix openarena and nexuiz which without this patch would gpu lock, or really hard lock respectively after a couple of minutes. Haven't had time to test really long runs yet though.

Comment 47 Bryan Quigley 2013-02-22 05:24:05 UTC

The series of 4 patches by Alex (41-44) doesn't fix the issue for me.

The patch in Comment #39 does fix it for me!  I tested it repeatedly with 6 runs of padman, urbanterror and openarena each. (using 3.8 kernel)

Comment 48 Andy Furniss 2013-02-22 13:34:41 UTC

(In reply to comment #42)
> Created attachment 75275 [details] [review]
> flush fix 2/4
> 
> patch 2 of 4.

This patch (patch 1 also applied) regresses etqw.

Comment 49 Alex Deucher 2013-02-22 14:36:10 UTC

Created attachment 75317 [details] [review]
new attempt 1/5

Another attempt to fix the issue.  Patch 5 is optional and not related to bug per se.

Comment 50 Alex Deucher 2013-02-22 14:36:38 UTC

Created attachment 75318 [details] [review]
new attempt 2/5

2/5

Comment 51 Alex Deucher 2013-02-22 14:37:05 UTC

Created attachment 75319 [details] [review]
new attempt 3/5

3/5

Comment 52 Alex Deucher 2013-02-22 14:37:35 UTC

Created attachment 75320 [details] [review]
new attempt 4/5

4/5

Comment 53 Alex Deucher 2013-02-22 14:38:07 UTC

Created attachment 75321 [details] [review]
new attempt 5/5

optional 5/5.

Comment 54 Alex Deucher 2013-02-22 15:36:39 UTC

Latest patches 1 and 4 alone are enough to fix the hangs for me on an rs780.

Comment 55 Alex Deucher 2013-02-22 16:06:19 UTC

actually just patch 4 alone seems to fix it.

Comment 56 Alex Deucher 2013-02-22 16:53:55 UTC

Created attachment 75331 [details] [review]
simple fix

Just this patch alone seems to fix the issue here.

Comment 57 Andy Furniss 2013-02-22 17:40:48 UTC

(In reply to comment #51)
> Created attachment 75319 [details] [review]
> new attempt 3/5
> 
> 3/5

FWIW now it's obsolete this still regressed etqw.

Also tried 1+2+4 and 1+2+3+4 with openarena/nexuiz and still had lockups.

Will try 4 alone next.

Comment 58 Bryan Quigley 2013-02-22 18:19:55 UTC

The simple patch appears to have fixed it for me. (comment 56).  Just did 9 total runs, will test more later today.

Comment 59 Myckel Habets 2013-02-22 19:25:25 UTC

(In reply to comment #56)
> Created attachment 75331 [details] [review]
> simple fix
> 
> Just this patch alone seems to fix the issue here.

I just had a lock up after ~30min in the game (openarena).

Comment 60 Alex Deucher 2013-02-22 19:43:08 UTC

(In reply to comment #59)
> (In reply to comment #56)
> > Created attachment 75331 [details] [review]
> > simple fix
> > 
> > Just this patch alone seems to fix the issue here.
> 
> I just had a lock up after ~30min in the game (openarena).

Can you try just the patch "new attempt 4/5" (attachment 75320 [details] [review]) by itself?

Comment 61 Alex Deucher 2013-02-22 19:53:28 UTC

(In reply to comment #57)
> (In reply to comment #51)
> > Created attachment 75319 [details] [review]
> > new attempt 3/5
> > 
> > 3/5
> 
> FWIW now it's obsolete this still regressed etqw.
> 
> Also tried 1+2+4 and 1+2+3+4 with openarena/nexuiz and still had lockups.
> 
> Will try 4 alone next.

Can you also try the simple fix (attachment 75331 [details] [review])?

Comment 62 Alex Deucher 2013-02-22 19:59:16 UTC

Created attachment 75363 [details] [review]
alternate simple fix

Another patch to try.

Comment 63 Myckel Habets 2013-02-22 20:04:31 UTC

I tried the simple fix together with Erik's patch, and haven't been able to get it to lock up yet after ~30 minutes.

I'll also try the alternate simple fix later.

Comment 64 Andy Furniss 2013-02-22 20:56:59 UTC

(In reply to comment #56)
> Created attachment 75331 [details] [review]
> simple fix
> 
> Just this patch alone seems to fix the issue here.

I can still lockup with this and 0004.

It took longer with 0004 and generally llvm seems to take longer to lock than R600_LLVM=0.

Will try the new patch.

Comment 65 Andy Furniss 2013-02-22 21:15:15 UTC

(In reply to comment #64)
> (In reply to comment #56)
> > Created attachment 75331 [details] [review]
> > simple fix
> > 
> > Just this patch alone seems to fix the issue here.
> 
> I can still lockup with this 

Ignore this - I messed up when testing simple fix and was testing unpatched - it's running now and hasn't locked yet.

Comment 66 Andy Furniss 2013-02-22 21:25:02 UTC

(In reply to comment #65)
> (In reply to comment #64)
> > (In reply to comment #56)
> > > Created attachment 75331 [details] [review]
> > > simple fix
> > > 
> > > Just this patch alone seems to fix the issue here.
> > 
> > I can still lockup with this 
> 
> Ignore this - I messed up when testing simple fix and was testing unpatched
> - it's running now and hasn't locked yet.

It eventually hard locked with nexuiz.

Comment 67 Alex Deucher 2013-02-22 21:42:21 UTC

Created attachment 75373 [details] [review]
better alternative fix

Please try this one instead of the previous one.

Comment 68 Bryan Quigley 2013-02-22 22:56:38 UTC

The better alternative fix just worked fine for me running: openarena, nexuiz, padman, tremulous, and urbanterror.
I'm going to run it again to be sure.  Will report back if it breaks. (http://openbenchmarking.org/result/1302221-RA-BUGTESTIN78)

Comment 69 Alex Deucher 2013-02-22 23:31:33 UTC

I went ahead and pushed a split up version of attachment 75373 [details] [review] to mesa:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=7ebf83f109db9dde89830d5844107c936cf42e4d
http://cgit.freedesktop.org/mesa/mesa/commit/?id=8442b67f5f3aedbfdb4446164dd09d4eaeda4888
9.1 is supposed to be released today and even if the patch isn't perfect for everyone yet, it's a lot better than it was before.  I'll keep this bug open and we can continue to work on this until we get it nailed.

Comment 70 Andy Furniss 2013-02-22 23:33:56 UTC

(In reply to comment #67)
> Created attachment 75373 [details] [review]
> better alternative fix
> 
> Please try this one instead of the previous one.

I can still hard lock with this and previous - nexuiz is easiest and it normally hard locks. openarena with vanilla is nicer and gpu locks and recovery is possible, but with these patches I did get a hard lock from it.

There is a difference to vanilla in that I am getting the locks after a level/timedemo has run rather than during.

With the patch before this I played 40 minutes of openarena got bored and typed disconnect then after it had exited the level it locked.

With nexuiz I just run the demos in order and again the locks are coming after a demo has finished and the game has been showing a text screen for several seconds.

Comment 71 Andy Furniss 2013-02-22 23:49:49 UTC

(In reply to comment #69)
> I went ahead and pushed a split up version of attachment 75373 [details] [review]
> to mesa:
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=7ebf83f109db9dde89830d5844107c936cf42e4d
> http://cgit.freedesktop.org/mesa/mesa/commit/?id=8442b67f5f3aedbfdb4446164dd09d4eaeda4888
> 9.1 is supposed to be released today and even if the patch isn't perfect for
> everyone yet, it's a lot better than it was before.  I'll keep this bug open
> and we can continue to work on this until we get it nailed.

That was quick - I've only just got to try with etqw and with v5 it quickly causes a GPU reset.

Comment 72 Andy Furniss 2013-02-23 12:09:01 UTC

(In reply to comment #71)
> (In reply to comment #69)
> > I went ahead and pushed a split up version of attachment 75373 [details] [review]
> > to mesa:
> > http://cgit.freedesktop.org/mesa/mesa/commit/?id=7ebf83f109db9dde89830d5844107c936cf42e4d
> > http://cgit.freedesktop.org/mesa/mesa/commit/?id=8442b67f5f3aedbfdb4446164dd09d4eaeda4888
> > 9.1 is supposed to be released today and even if the patch isn't perfect for
> > everyone yet, it's a lot better than it was before.  I'll keep this bug open
> > and we can continue to work on this until we get it nailed.
> 
> That was quick - I've only just got to try with etqw and with v5 it quickly
> causes a GPU reset.

On vanilla master now. Can still get etqw to provoke a gpu reset but it seems like it's the initial use of the text console when on the main screen that provokes it. If I avoid using it then I can run without locks.

Comment 73 Alex Deucher 2013-02-23 17:41:47 UTC

Does disabling hyperZ help?  Set env var R600_HYPERZ=0
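
When a test is launched from a script, a driver variable such as `R600_HYPERZ=0` (or `R600_LLVM=0`, mentioned in comment 64) can be set for that one process only. A minimal sketch; the helper name is illustrative:

```python
import os


def env_with_overrides(overrides, base=None):
    """Return a copy of the environment with the given variables set.

    `base` defaults to the current process environment and is never
    modified in place.
    """
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run`, so the setting does not leak into other runs that are meant to use the defaults.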

Comment 74 Andy Furniss 2013-02-23 20:01:28 UTC

(In reply to comment #73)
> Does disabling hyperZ help?  Set env var R600_HYPERZ=0

No, that doesn't help.

I have just found another way to avoid it though: with my card set to "low" I can't get it to lock. When I turn it up to "high", as I normally do, it locks on the first (but not subsequent) use of the text console every time.

Comment 75 Myckel Habets 2013-02-24 08:22:23 UTC

(In reply to comment #72)
> (In reply to comment #71)
> > (In reply to comment #69)
> > > I went ahead and pushed a split up version of attachment 75373 [details] [review]
> > > to mesa:
> > > http://cgit.freedesktop.org/mesa/mesa/commit/?id=7ebf83f109db9dde89830d5844107c936cf42e4d
> > > http://cgit.freedesktop.org/mesa/mesa/commit/?id=8442b67f5f3aedbfdb4446164dd09d4eaeda4888
> > > 9.1 is supposed to be released today and even if the patch isn't perfect for
> > > everyone yet, it's a lot better than it was before.  I'll keep this bug open
> > > and we can continue to work on this until we get it nailed.
> > 
> > That was quick - I've only just got to try with etqw and with v5 it quickly
> > causes a GPU reset.
> 
> On vanilla master now. Can still get etqw to provoke a gpu reset but it
> seems like it's the initial use of the text console when on the main screen
> that provokes it. If I avoid using it then I can run without locks.

I'm also on vanilla master now, and just got a lock up in openarena (after ~40 min). I'm trying Erik's patch again, because I have yet to get it to lock up with that one (after ~2h of playing).

Comment 76 Bryan Quigley 2014-07-30 14:04:08 UTC

I haven't seen this bug since my last comment. (and for the last month been on a different video card).

Does anyone else still see this issue or shall I close it Fix Released?

Comment 77 Marek Olšák 2014-07-30 15:02:55 UTC

(In reply to comment #76)
> I haven't seen this bug since my last comment. (and for the last month been
> on a different video card).
> 
> Does anyone else still see this issue or shall I close it Fix Released?

I tested RV670 with piglit and DOTA 2 in April this year and it worked fine.

Comment 78 Myckel Habets 2014-07-30 19:14:59 UTC

(In reply to comment #76)
> I haven't seen this bug since my last comment. (and for the last month been
> on a different video card).
> 
> Does anyone else still see this issue or shall I close it Fix Released?

Give me a few days to test (not so much spare time now) and see if I can still trigger the bug.

Comment 79 Bryan Quigley 2017-09-01 20:27:33 UTC

Per my comment on 2014-07-30 and no other updates since that year I'm going to go ahead and mark this Fixed.   Thanks all!
