Bug 37028 - Amnesia/HPL2 Demo: Strange graphical bugs on r600g
Amnesia/HPL2 Demo: Strange graphical bugs on r600g
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600
git
x86-64 (AMD64) Linux (All)
: medium normal
Assigned To: Default DRI bug account
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-05-09 10:39 UTC by Maggioni Marcello
Modified: 2011-06-07 14:45 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Screenshot (166.29 KB, image/jpeg)
2011-05-09 10:39 UTC, Maggioni Marcello
Details
game log (18.07 KB, text/x-log)
2011-05-09 10:40 UTC, Maggioni Marcello
Details
Screenshot (269.74 KB, image/jpeg)
2011-05-09 10:45 UTC, Maggioni Marcello
Details
No problem (322.50 KB, image/jpeg)
2011-05-25 02:46 UTC, Maggioni Marcello
Details
Problem (295.63 KB, image/jpeg)
2011-05-25 02:48 UTC, Maggioni Marcello
Details
proposed patch (1.06 KB, patch)
2011-06-06 13:28 UTC, Pierre-Eric Pelloux-Prayer
Details | Splinter Review
simple test app (2.95 KB, text/plain)
2011-06-07 05:18 UTC, Pierre-Eric Pelloux-Prayer
Details
Further tests (3.38 KB, text/x-c++src)
2011-06-07 09:15 UTC, Maggioni Marcello
Details
Correct test (3.38 KB, text/x-c++src)
2011-06-07 10:12 UTC, Maggioni Marcello
Details
2nd patch (1.06 KB, patch)
2011-06-07 12:52 UTC, Pierre-Eric Pelloux-Prayer
Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Maggioni Marcello 2011-05-09 10:39:49 UTC
Created attachment 46492 [details]
Screenshot

Hi, this game (amnesia dark descent) seems to run on the r600g driver, but with some graphical bugs that make the experience "not enjoyable".

There's a big black corruption in the bottom part of the screen that covers 1/5 (almost) of the screen and strange white dots appear in the air. Water also flickers.

The most serious glitch is the first one, the black corruption.

I've attached a screenshot of this.

I also attached the screenshot and the log from the game.

The screenshot and the log is taken from the demo version of the game.
Comment 1 Maggioni Marcello 2011-05-09 10:40:21 UTC
Created attachment 46493 [details]
game log
Comment 2 Maggioni Marcello 2011-05-09 10:41:11 UTC
I also tried to lower every graphical setting to the minimum, but the bug remains.
Comment 3 Maggioni Marcello 2011-05-09 10:45:13 UTC
Created attachment 46494 [details]
Screenshot

I uploaded the wrong screenshot the first time
Comment 4 Sven Arvidsson 2011-05-09 12:52:05 UTC
Yeah, I'm having the same problem.

The Amnesia devs posted a nifty compatibility test for the game which can be used to to quickly recreate these bugs (the black bar seems to start at test 5). It isn't available from their site any longer so I uploaded it here:
http://dl.dropbox.com/u/28577999/RendererFeatTestRound2-Linux32.zip

The game (or at least the RenderFeatTest tool) is rendering correctly with llmvpipe.
Comment 5 Maggioni Marcello 2011-05-25 02:45:09 UTC
I tried the RenderTest and it seemed to work fine (the SSAO test was completely white though) .

I tried the latest git today and then tested Amnesia again.

The situation is improved. The game is not bug free, but now the black area in the bottom of the screen has been substituted by a white area that is present only under certain lighting conditions (I've attached two screenshots , one in which the problem isn't present and one in which it is). Also, when present the "disturbance" flickers at a very high frequency, so it's quite difficult to capture it in a screenshot (you catch it while it is appearing or disappearing on the screen very often), so it is bigger than the one that can be seen in the screenshot.
Comment 6 Maggioni Marcello 2011-05-25 02:46:50 UTC
Created attachment 47127 [details]
No problem

This is the screenshot where the problem is not present.
Comment 7 Maggioni Marcello 2011-05-25 02:48:12 UTC
Created attachment 47128 [details]
Problem

This is the screenshot where the problem is present
Comment 8 Maggioni Marcello 2011-05-25 03:46:01 UTC
An update.

I noticed that in fullscreen the problem is not present with the new drivers (no corruption at the bottom of the screen) , but the problem with the white dots persists.
I also noticed that these dots are in random locations, but they surround point light sources like candles. I thought they were random, because you can see them trough walls , so the candles generating the dots were not directly visible, but behind the wall and I haven't immediatly connected this problem with the lights.
Comment 9 Pierre-Eric Pelloux-Prayer 2011-06-06 13:28:16 UTC
Created attachment 47621 [details] [review]
proposed patch

Could you please try this patch ? It should fix the issue.
Comment 10 Maggioni Marcello 2011-06-06 17:10:42 UTC
Hi, after applying the patch the problem seems to be still there ... my card is a Radeon 5850.

The lights have a white halo surrounding them flickering. Also , the white area at the bottom of the screen problem is still present in fullscreen mode. It was just scene dependent and I thought it was gone because I checked in a scene in which it wasn't present.
Comment 11 Pierre-Eric Pelloux-Prayer 2011-06-07 05:18:56 UTC
Created attachment 47657 [details]
simple test app

Weird, here the patch solves the white-dot-around-lights issue. I'm using a HD4850, maybe that's why...

I've attached a small test app, which exhibits the problem fixed by this patch. Without the patch applied it fails almost every time (the failure is immediate).

Could you please compile it and launch it a few times and tell me :
1) if it fails without the patch 
2) if so, does the patch brings some improvement ?
Comment 12 Maggioni Marcello 2011-06-07 09:15:59 UTC
Created attachment 47666 [details]
Further tests

Hi, your test application crashes on my system with and without the patch.

I determined that the result that the query should return is 68698 samples each execution.

So I modified your test application to see how many times that value is returned instead of some other random values (the modded app is attached) and the output is this:

[hades@artemis ~]$ ./query 
Correct values: 17
Not correct values: 33
Correct values: 50
Not correct values: 0
Correct values: 50
Not correct values: 0
Correct values: 50
Not correct values: 0
Correct values: 50
Not correct values: 0
Correct values: 50

.....

As you can see at the beginning the probability of wrong queries is high and then it corrects itself out. All the subsequent queries seem to be correct. This happens both with your patch and without it.

I tried to step inside the gallium code (r600_query_result) to see what values are returned.
When the value is correct I get values like:

(gdb) n
1641                            query->result += end - start;
2: /x end = 0x80000001824291f0
1: /x start = 0x8000000182424e0c
...

when I get an error I get values for start and end like :
(gdb) n
1641                            query->result += end - start;
2: /x end = 0xffffffffffff0000
1: /x start = 0xffff0000ffffffff
(gdb) n
1636            for (i = 0; i < size; i += 4) {
2: /x end = 0xffffffffffff0000
1: /x start = 0xffff0000ffffffff
(gdb) 


That are obviously wrong.

The very first r600_query_result execution gives correct results, then from the second r600_query_result execution on the results are incorrect and then , after some runs, the values become correct again.

I don't know how to interpret this. Maybe a drm bug? I use kernel 2.6.39
Comment 13 Maggioni Marcello 2011-06-07 10:12:20 UTC
Created attachment 47669 [details]
Correct test

Ups sorry, I completely botched the test application :D

I placed a "!=" instead of a "==" and so all the results in the previous post are inverted.

The query are correct for a little in the beginning and then they get completely wrong afterwards :

[hades@artemis ~]$ ./query
Correct values: 25
Not correct values: 25
Correct values: 0
Not correct values: 50
Correct values: 0
Not correct values: 50
Correct values: 0
Not correct values: 50
Correct values: 0
Not correct values: 50
Correct values: 0
Not correct values: 50
Correct values: 0
Not correct values: 50
Correct values: 0

...

Sorry for the mess, I'm quite tired in this period :p
Comment 14 Pierre-Eric Pelloux-Prayer 2011-06-07 10:19:00 UTC
Remote debugging is hard :-/

Anyway, your modified app is wrong I think (not only the != / == inversion) :
The app is supposed to :
- draw a cube to a random position
- use query to count how many pixels were drawn (so this number change every
time the cube position changes)
- then it compares the result of all queries : if not equal it fails

Now I have no idea why it crashes on your system, did you try to debug it ?
maybe there's an obvious mistake in it ? or maybe you could try to reduce the
num_queries param ?

The issue you've seen with values obviously wrong in 'r600_query_result' is the
problem the patch is supposed to address. Without the patch, the 'size'
variable is wrong (too big) and the 'results' buffer is read too far.

Last, if you have piglit installed, I found a test case in it probably more
robust than my simple app : occlusion_query.
This test sometimes fails when using non-patched git mesa like this : 
$ ./bin/occlusion_query -auto
samples passed = 400, expected = 400
samples passed = 0, expected = 0
samples passed = 2600, expected = 2600
samples passed = 2147483647, expected = 0  [<- ERROR]
samples passed = 400, expected = 400
[...]
PIGLIT: {'result': 'fail' }
Comment 15 Maggioni Marcello 2011-06-07 11:46:29 UTC
Yes, you are perfectly right, I looked better to the application and you are right, but considering that you don't set the "random seed" value with srand the random sequence is always the same, so the first execution of the of the "while" cycle should always have the same 68698 samples drawn. So at least in the first cycle my test app should report

Correct: 50
Non correct: 0

right?


This is an example of a sequence of values reported in the first cycle:

Param is: 68698
Param is: 2147483647
Param is: 2147483647
Param is: 2147483647
Param is: 2147483647
Param is: 68698
Param is: 2147483647
Param is: 68698
Param is: 68698
Param is: 2147483647
Param is: 68698
Param is: 68698
Param is: 68698
Param is: 2147483647
Param is: 2147483647
Param is: 2147483647
Param is: 2147483647
Param is: 68698
Param is: 68698
Param is: 68698
Param is: 68698

The patch is applied, of this I'm sure.

I tried reducing the value of the "size" variable (as you suggested) to 16 (instead of the value that usually is set by num_results that is 32), and the result is :

Param is: 34594
Param is: 2147483647
Param is: 34594
Param is: 34594
Param is: 34594
Param is: 2147483647
Param is: 34594
Param is: 34594
Param is: 34594
Param is: 34594
Param is: 2147483647
Param is: 34594
Param is: 34594
Param is: 2147483647
Param is: 34594

The correct values are almost half of the ones with "size = query->num_results" and the non correct ones are the same ...

It seems like the memory gets corrupted or not correctly initialized sometimes.

Any idea?
Comment 16 Sven Arvidsson 2011-06-07 12:29:19 UTC
I'm using a 5670 (Redwood) and the patch seems to solve the problems here. The test application works with the patch and asserts without it:

 query error : 2147483647 != 68698
 query: query.cpp:104: int main(int, char**): Assertion `false' failed.

I have only given Amnesia a quick try, but there are no longer any obvious rendering errors, and the piglit test general/occlusion_query now passes. Great job! \(^ ^)/
Comment 17 Maggioni Marcello 2011-06-07 12:47:00 UTC
What kernel version do you use Sven?
Comment 18 Pierre-Eric Pelloux-Prayer 2011-06-07 12:52:35 UTC
Created attachment 47680 [details] [review]
2nd patch

Here's a second patch to add.
It removes some memory initialization logic which seems to be wrong (thanks agd5f for the help).

Marcello could you please test it ? (in addition to the 1st patch)

If it's still wrong, could you tell me which value the 'num_backends' and 'ctx->max_db' variables have (both in r600_query_begin for instance) ?
Comment 19 Sven Arvidsson 2011-06-07 13:08:38 UTC
(In reply to comment #17)
> What kernel version do you use Sven?

I'm using 2.6.39.
Comment 20 Maggioni Marcello 2011-06-07 13:56:47 UTC
The second patch doesn't solve for ... I don't know if it is a mine specific problem or if it is universal for all 5850 owners.

this is the output of GDB for the variables you requested in the begin_query function:

r600_query_begin (ctx=0x635918, query=0x7fe690) at r600_hw_context.c:1651
1651    {
(gdb) 
1653            int num_backends = r600_get_num_backends(ctx->radeon);
(gdb) 
1657                    required_space = 16;
(gdb) disp ctx->max_db
3: ctx->max_db = 8
(gdb) disp num_backends
4: num_backends = 8
(gdb) 


Also, I have two cards (using only one of them on linux), I don't know if this actually may create problems.
Comment 21 Pierre-Eric Pelloux-Prayer 2011-06-07 14:14:30 UTC
Could you please update your git repo ?
 
agd5f pushed a patch (r600g: always clear query memory) which could correct the problem (you still need to apply the first one too)
Comment 22 Maggioni Marcello 2011-06-07 14:31:57 UTC
IT WORKS! Yay! :D

I kind of imagined that the problem was in the initialization of the memory somehow and this afternoon I was looking at that piece of cone in begin_query, but I completely missing the fact that it wasn't executed!

Great guys, you are awesome :)

I still have that strange "corruption" at the bottom of the screen (depending on the scene) , but this is a very minor problem when compared with the one you just solved.

I'll let you know if I find out other major problems with the game :)
Comment 23 Alex Deucher 2011-06-07 14:45:44 UTC
Fixes pushed:
7c1d4781920e34ad5419b4c5cf9d54ae14938844
bdf2e112856659816d000699fce606adc4ee9926