Summary: | Texture flicker in native Dota2 in mesa 9.2.0rc1 | ||
---|---|---|---|
Product: | Mesa | Reporter: | Peter Kraus <peter.kraus> |
Component: | Drivers/Gallium/r600 | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | alexandre.f.demers, postmaster, vrodic |
Version: | 9.2 | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=70042 | ||
Attachments: | Corruption 1 (polygon facing bottom left); Corruption 2 (hardly visible brown noise, above middle tower); possible fix; patch; fix 2 |
Description
Peter Kraus
2013-08-22 20:59:55 UTC
Can you bisect to track down what commit caused the problem?

RE: Emil: Possibly. I couldn't get a screenshot of the corruption, mine looked a lot different, but it's definitely possible.
RE: Alex: I'll try when I get back home and have some time (i.e. tomorrow). I might try and follow the bisect of bug 67887, to see whether it's indeed a duplicate.

Hello, I have not had time to bisect yet (seems to be a bit more complex than I thought, or I'm just daft). The behaviour is not fixed by yesterday's game update, and it's not fixed in 9.2rc2 either. What's the best way of doing the bisect if I don't want to pollute my Arch Linux install? Cheers.

Hello again, I'm trying to bisect, but the compilation fails with:

    glsl_parser.cpp:2603:41: error: 'scanner' was not declared in this scope

That's on git revision 56114, which is the first one git bisect asks me to build.

Try `make clean` or `make distclean` before rebuilding.

(In reply to comment #5)
> glsl_parser.cpp:2603:41: error: 'scanner' was not declared in this scope

Arch uses Bison 3, thus you might need to apply one or all of the following:

    eb7c8c7fb6e49a04f3fe84a6d438160dc4a14ac0
    f043381334a0760ec118d07b6fb7785b5692572a
    de917b4c4c4dfc949d5f8e3d9eb2dd48b63a3de5
    6d2a9220b832d9a0c0cf35fcc5b9de1542af267d
    5ffa28df4e4cc22481b4ed41c78632f35765f41d

Hello, Alex, your suggestion doesn't seem to work. Emil, sorry, I don't really know what to do with these hashes - if you give me the git command to run in the repo, I could try that... I can get current HEAD (58251) compiled, running, and the regression is still there. Once I start bisecting, though, the "scanner" error as in comment 5 happens. Also, I've managed to get some screenshots of the corruption, see attached.

Created attachment 84668
Corruption 1 (polygon facing bottom left).
Created attachment 84669
Corruption 2 (hardly visible brown noise, above middle tower)
(In reply to comment #8)
> Alex, your suggestion doesn't seem to work.

Regardless, you would need to do make clean/distclean after each bisection step.

> Emil, sorry, I don't really know what to do with these hashes - if you give
> me the git command to run in the repo, I could try that...

This should do the job:

    $ git cherry-pick $hash

Note: you might need to pick one, two, etc. of the hashes depending on where exactly your HEAD is (i.e. some of the patches may be already applied). In theory all of those should apply cleanly (git will not complain), although you may not be so lucky.

> I can get current HEAD (58251) compiled, running, and the regression is
> still there. Once I start bisecting, though, the "scanner" error as in
> comment 5 happens.

Once you take a quick look at the patches mentioned you will see which one resolves what issue (git show $hash), and you'll be able to pick the correct ones.

> Also, I've managed to get some screenshots of the corruption, see attached.

Indeed your issue seems different from the one I've mentioned.

Hello, I guess I'm just really unlucky. Compilation of "required merge" according to git bisect fails with:

    CXXLD libOSMesa.la
    /usr/bin/ld: i386:x86-64 architecture of input file `../../../../src/mesa/.libs/libmesa.a(ast_expr.o)' is incompatible with i386 output

Which is similar to a bug report here: bug 50754. Adding those variables into the build script doesn't help (in fact, they've been in before). Looks like I'm stuffed yet again!

Found it:

7948ed1250cae78ae1b22dbce4ab23aceacc6159 is the first bad commit
commit 7948ed1250cae78ae1b22dbce4ab23aceacc6159
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Jun 30 19:57:59 2013 +0200

r600g: only flush the caches that need to be flushed during CP DMA operations

This should increase performance if constant uploads are done with the CP DMA, because only the cache that needs to be flushed is flushed.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 69219cf4b797f08ed91c367342386a00f87b1c45 ae2f808ee6d4c10bc631f7327664da6d349b0a83 M src

That commit does break something, but it was fixed in:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=0d7f087483d014305ec96a84ce5a28355f843c86

In other words, can you check out 7948ed1250cae78ae1b22dbce4ab23aceacc6159, then cherry-pick 0d7f087483d014305ec96a84ce5a28355f843c86, and then see if the issue is still there?

Hello, yes, the issue is still there, even after this patch.

Hi there, any update on this one? Can I / do I need to test anything? It's a bit of a bummer... Or is this a bug in Dota? The behaviour is there also with linux-3.11.
Peter

Hi,
I am also having this behavior in Dota2. Peter, are you sure 7948ed1250cae78ae1b22dbce4ab23aceacc6159 is the first non-working? Can you try one earlier commit, 1b40398d024d2ac5c8e8b78d0f4941e2a007de2c, and confirm that it works? Because, I am seeing the corruption on even earlier (haven't figured out yet how much earlier).

Maybe I am doing something wrong, I am compiling 32-bit mesa on otherwise 64-bit system, but I can see my own printf()'s I've put in to make sure the correct library is loaded when running Dota2.

(In reply to comment #17)
> Hi,
> I am also having this behavior in Dota2. Peter, are you sure
> 7948ed1250cae78ae1b22dbce4ab23aceacc6159 is the first non-working? Can you
> try one earlier commit, 1b40398d024d2ac5c8e8b78d0f4941e2a007de2c, and
> confirm that it works? Because, I am seeing the corruption on even earlier
> (haven't figured out yet how much earlier).
>
> Maybe I am doing something wrong, I am compiling 32-bit mesa on otherwise
> 64-bit system, but I can see my own printf()'s I've put in to make sure the
> correct library is loaded when running Dota2.

Hello, I have just checked, commit 1b40398d024d2ac5c8e8b78d0f4941e2a007de2c (+ the 5 patches for Bison 3, above) still works fine. Current HEAD is still broken.
Peter

Created attachment 86107
possible fix

Could you please try this patch?

Hello,
I won't be able to test it until October. Will update then, unless someone else can test it instead.
Peter

(In reply to comment #20)
> Hello,
> I won't be able to test it until October. Will update then, unless someone
> else can test it instead.
> Peter

I may give it a try this weekend, but I'm using a HD6950 (I'm also suffering from texture flickering problem with Dota 2).

Doesn't seem to fix anything over here.

Created attachment 87165
patch

Could you please test this patch?

The patch reverting 7948ed1250cae78ae1b22dbce4ab23aceacc6159 fixes it for me on 6630M (prime).

The issues I was experiencing:
- Corruption 1 (polygon facing bottom left)
- texture flickering

Tested patch in attachment 87165 on HD 6950, fixes the corruption reported in bug 70042 also.

Yes, this "fixes" the issue here too. Cheers.

Fun observation:

Instead of reverting, setting this at the end of r600_cp_dma_copy_buffer() appears to fix it for me:

    rctx->b.flags |= R600_CONTEXT_INV_VERTEX_CACHE;

(R600_CONTEXT_INV_CONST_CACHE will also work)

The thing I don't understand about this is that if I instead set the flag just before r600_flush_emit() is called (2 places) the corruption is still there. I must be missing something.

(In reply to comment #27)
> fun observation:
>
> Instead of reverting, setting this at the end of r600_cp_dma_copy_buffer()
> appears to fix it for me:
> rctx->b.flags |= R600_CONTEXT_INV_VERTEX_CACHE;
>
> (R600_CONTEXT_INV_CONST_CACHE will also work)
>
> Thing I don't understand about this is that if I instead set the flag just
> before r600_flush_emit() is called (2 places) the corruption is still there.
> I must be missing something.

I'd be interested to hear about it from Marek.
(In reply to comment #27)
> fun observation:
>
> Instead of reverting, setting this at the end of r600_cp_dma_copy_buffer()
> appears to fix it for me:
> rctx->b.flags |= R600_CONTEXT_INV_VERTEX_CACHE;
>
> (R600_CONTEXT_INV_CONST_CACHE will also work)
>
> Thing I don't understand about this is that if I instead set the flag just
> before r600_flush_emit() is called (2 places) the corruption is still there.
> I must be missing something.

Your observation also applies to my HD 6950.

Is there any performance regression? If there isn't, I'm okay with the revert.

Marek, you asked about it, so... If I'm right, I see some on my poor RV730 AGP under git, but not with Dota2 or the like, because I don't have such software. I see it with the Q3A demo (my old testing one) and mesa-demos/objviewer with bobcat.obj/buddha.obj/bunny.obj/GreatLakesBiplaneHP.obj. Mostly with the high frame rate ones (bobcat.obj).

(In reply to comment #30)
> Is there any performance regression? If there isn't, I'm okay with the
> revert.

Marek, do you have any application you propose that I could use to benchmark it? I'll gladly give it a run.

(In reply to comment #27)
> fun observation:
>
> Instead of reverting, setting this at the end of r600_cp_dma_copy_buffer()
> appears to fix it for me:
> rctx->b.flags |= R600_CONTEXT_INV_VERTEX_CACHE;
>
> (R600_CONTEXT_INV_CONST_CACHE will also work)

Well, if we are using CP DMA to update a constant buffer or vertex buffer, we need to flush the appropriate shader read caches.

(In reply to comment #33)
> (In reply to comment #27)
> > fun observation:
> >
> > Instead of reverting, setting this at the end of r600_cp_dma_copy_buffer()
> > appears to fix it for me:
> > rctx->b.flags |= R600_CONTEXT_INV_VERTEX_CACHE;
> >
> > (R600_CONTEXT_INV_CONST_CACHE will also work)
>
> Well, if we are using CP DMA to update a constant buffer or vertex buffer,
> we need to flush the appropriate shader read caches.
So, if I understand correctly what you mean, before reverting commit 7948ed1250cae78ae1b22dbce4ab23aceacc6159, the problem was that we were not flushing caches correctly (read "when expected"). Am I understanding correctly? Why would adding either rctx->b.flags |= R600_CONTEXT_INV_VERTEX_CACHE or rctx->b.flags |= R600_CONTEXT_INV_CONST_CACHE work in fixing the texture glitches (which are coming from an unknown buffer type for now) if they are not intended for the same buffer type?

Also, I'm still interested in benchmarking with and without commit 7948ed1250cae78ae1b22dbce4ab23aceacc6159, so I'll gladly run any suggestion. Would something like Phoronix test suite be of any interest?

Feel free to do some benchmarking if you want.

The question is why this code at the end of the function didn't set one of the flush flags:

    r600_flag_resource_cache_flush(rctx, dst);

Constants are usually read with the shader cache and indirect addressing of the constants goes through the vertex cache. Also some chips don't have the vertex cache, so they have to use the texture cache instead. That's why both VERTEX and CONST work for some chips in this situation.

The purpose of the flags is to express what type of buffer was changed. The flushing code will then figure out which cache should be flushed.

(In reply to comment #30)
> Is there any performance regression? If there isn't, I'm okay with the
> revert.

I've run with the original commit 7948ed (+ little workaround) and with it removed (both using latest git as of yesterday). I do see a performance difference.

I have the following result with the original commit (+ workaround):
Nexuiz -> 58.59
OpenArena -> 59.47
World of Padman -> 59.70
Urban Terror -> 38.03
Warsow -> 157.43

Once removed:
Nexuiz -> 58.47
OpenArena -> 59.20
World of Padman -> 59.63
Urban Terror -> 37.07
Warsow -> 141.55

Now, I should run it again and be sure I'm not enabling vsync here and there.
Warsow seems to be the one showing the greatest difference since it was not hitting the refresh limit.

(In reply to comment #36)
> (In reply to comment #30)
> > Is there any performance regression? If there isn't, I'm okay with the
> > revert.
>
> I've run with the original commit 7948ed (+ little workaround) and with it
> removed (both using latest git as of yesterday). I do see a performance
> difference.
>
> I have the following result with the original commit (+ workaround):
> Nexuiz -> 58.59
> OpenArena -> 59.47
> World of Padman -> 59.70
> Urban Terror -> 38.03
> Warsow -> 157.43
>
> Once removed:
> Nexuiz -> 58.47
> OpenArena -> 59.20
> World of Padman -> 59.63
> Urban Terror -> 37.07
> Warsow -> 141.55
>
> Now, I should run it again and be sure I'm not enabling vsync here and
> there. Warsow seems to be the one showing the greatest difference since it
> was not hitting the refresh limit.

I ran it again with vsync disabled and, while the scores went up, the results are pretty close to one another. This time, Warsow scored a bit lower with 7948ed than without it. So, it has no real impact on performance from what I can see.

This issue was fixed by the revert. Closing.

Sorry to reopen this, but is there any reason this has not been committed to 9.2.3?

I am also seeing the corruption on Mesa 10.0.1.

(In reply to comment #40)
> I am also seeing the corruption on Mesa 10.0.1.

What GPU are you using?

(In reply to comment #41)
> (In reply to comment #40)
> > I am also seeing the corruption on Mesa 10.0.1.
>
> What GPU are you using?

Radeon 4850. Does it work with others?

(In reply to comment #42)
> (In reply to comment #41)
> > (In reply to comment #40)
> > > I am also seeing the corruption on Mesa 10.0.1.
> >
> > What GPU are you using?
>
> Radeon 4850. Does it work with others?

Was fixed on 6950 (I'll test it again later today to be sure things are still OK). Patches point to evergreen code and above.
Marek could tell if it applies (or if a similar solution should be made) to R700 (HD4XXX).

Got flickering increasing in time here too. Radeon HD 5750 (Juniper PRO, evergreen).
(that on an up-to-date fedora 19 x86-64, yes I did install that for all the gaming c++ kludge bloody hell)

(In reply to comment #44)
> Got flickering increasing in time here too. Radeon HD 5750 (Juniper PRO,
> evergreen).
> (that on a up-to-date fedora 19 x86-64, yes I did install that for all the
> gaming c++ kludge bloody hell)

Do you mean the flickering increases in time even if you stay where you are in DOTA 2? If so, it may be a different bug. The patch fixed the flickering I was seeing (and I think it was the same for Peter) in some specific areas (always the same flickering intensity in a given area; it could be related to some elements in the area, like the river or towers for example). Peter reopened this bug because the patch was not backported to mesa 9.2, but it had been fixed for him too when applied manually.

(In reply to comment #43)
> (In reply to comment #42)
> > (In reply to comment #41)
> > > (In reply to comment #40)
> > > > I am also seeing the corruption on Mesa 10.0.1.
> > >
> > > What GPU are you using?
> >
> > Radeon 4850. Does it work with others?
>
> Was fixed on 6950 (I'll test it again later today to be sure things are
> still OK). Patches point to evergreen code and above. Marek could tell if it
> applies (or if a similar solution should be made) to R700 (HD4XXX).

Retested with latest mesa on kernel 3.13-rc3 and there is no flickering at all on Cayman.

(In reply to comment #40)
> I am also seeing the corruption on Mesa 10.0.1.

Same here: lots of flickering textures. rv730 (HD4xxx).

Present with the tip of the master branch. Not influenced by 7948ed1250.
So far I haven't been able to find a revision that does *not* show the flickering. I went as far back as December 2012.

The original bug was fixed by mesa-10.0.1 release. I suggest you open a new bug...
(In reply to comment #47)
> (In reply to comment #40)
> > I am also seeing the corruption on Mesa 10.0.1.
>
> Same here: lots of flickering textures. rv730 (HD4xxx).
>
> Present with the tip of the master branch. Not influenced by 7948ed1250.
> So far I haven't been able to find a revision that does *not* show the
> flickering. I went as far back as December 2012.

Please set this environment variable:

    R600_DEBUG=nodma,nocpdma

and tell us if it fixes anything.

(In reply to comment #49)
> (In reply to comment #47)
> > (In reply to comment #40)
> > > I am also seeing the corruption on Mesa 10.0.1.
> >
> > Same here: lots of flickering textures. rv730 (HD4xxx).
> >
> > Present with the tip of the master branch. Not influenced by 7948ed1250.
> > So far I haven't been able to find a revision that does *not* show the
> > flickering. I went as far back as December 2012.
>
> Please set this environment variable:
>
> R600_DEBUG=nodma,nocpdma
>
> and tell us if it fixes anything.

These two options do not help. Do you want me to file a new bug?

(In reply to comment #50)
> (In reply to comment #49)
> > (In reply to comment #47)
> > > (In reply to comment #40)
> > > > I am also seeing the corruption on Mesa 10.0.1.
> > >
> > > Same here: lots of flickering textures. rv730 (HD4xxx).
> > >
> > > Present with the tip of the master branch. Not influenced by 7948ed1250.
> > > So far I haven't been able to find a revision that does *not* show the
> > > flickering. I went as far back as December 2012.
> >
> > Please set this environment variable:
> >
> > R600_DEBUG=nodma,nocpdma
> >
> > and tell us if it fixes anything.
>
> These two options do not help. Do you want me to file a new bug?

Please try also:

    R600_DEBUG=noinvalrange

There will be a severe performance hit though.

(In reply to comment #51)
> R600_DEBUG=noinvalrange

Yes, that one fixes the problem.
Created attachment 93328
fix 2

(In reply to comment #52)
> (In reply to comment #51)
> > R600_DEBUG=noinvalrange
>
> Yes, that one fixes the problem.

Please try the attached patch with and without: R600_DEBUG=nodma,nocpdma

(In reply to comment #53)
> Created attachment 93328
> fix 2
>
> (In reply to comment #52)
> > (In reply to comment #51)
> > > R600_DEBUG=noinvalrange
> >
> > Yes, that one fixes the problem.
>
> Please try the attached patch with and without: R600_DEBUG=nodma,nocpdma

I'm seeing additional texture corruption with that patch. I can't say if nodma,nocpdma helps or makes them worse.

(In reply to comment #54)
> (In reply to comment #53)
> > Created attachment 93328
> > fix 2
> >
> > (In reply to comment #52)
> > > (In reply to comment #51)
> > > > R600_DEBUG=noinvalrange
> > >
> > > Yes, that one fixes the problem.
> >
> > Please try the attached patch with and without: R600_DEBUG=nodma,nocpdma
>
> I'm seeing additional texture corruption with that patch. I can't say if
> nodma,nocpdma helps or makes them worse.

Oops, I only just noticed that my kernel (3.13.2) is rejecting CS buffers:

    [ 377.460557] radeon 0000:01:00.0: r600_cs_track_check:756 mask 0x00000777 | 0x00000FFF no cb for 2
    [ 377.460563] radeon 0000:01:00.0: r600_packet3_check:1708 invalid cmd stream
    [ 377.460566] [drm:radeon_cs_ib_chunk] *ERROR* Invalid command stream !

This is with libdrm 2.4.52 and Mesa f47e5.

Make sure your kernel has this patch:

    drm/radeon: skip colorbuffer checking if COLOR_INFO.FORMAT is set to INVALID

Alright, that fixed the CS rejections as expected. However the Mesa patch doesn't fix the flickering.

@Tilman Sauerbeck: did you file a new bug? If not, shouldn't this bug be reopened?

FYI, the flickering on R600 and R700 should be fixed by my latest Mesa commits.