Bug 84627 - (bisected) 32bit corruption with PIPE_USAGE_STREAM reverted
Summary: (bisected) 32bit corruption with PIPE_USAGE_STREAM reverted
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: x86 (IA32) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
Duplicates: 85526
Reported: 2014-10-03 09:21 UTC by smoki
Modified: 2015-08-02 11:33 UTC

Attachments
corruption (103.86 KB, image/jpeg)
2014-10-06 07:27 UTC, smoki
game corruption (2.68 MB, image/png)
2014-10-06 08:17 UTC, smoki
mesa: Only use two caching buffer managers again (888 bytes, patch)
2014-10-07 04:01 UTC, Michel Dänzer
drm/radeon: Ignore RADEON_GEM_GTT_WC on Kabini (841 bytes, patch)
2014-10-10 09:48 UTC, Michel Dänzer
Added Aruba GPU to Ignore RADEON_GEM_GTT_WC on Kabini (894 bytes, patch)
2014-11-26 00:05 UTC, Zbigniew Luszpinski
drm/radeon: Ignore RADEON_GEM_GTT_WC on 32-bit x86 (902 bytes, patch)
2014-11-27 08:07 UTC, Michel Dänzer

Description smoki 2014-10-03 09:21:49 UTC
As Michel asked me in https://bugs.freedesktop.org/show_bug.cgi?id=82050#add_comment:

> (In reply to comment #65)
> Keep in mind that the revert broke 32-bit completely, a lot of corruption :)

> I haven't been able to reproduce that. If you still can, please file a bug for it, as there's nothing preventing the kernel from using GTT instead of VRAM when the latter is full.

I can still reproduce it today on 32-bit (64-bit is not affected, at least not by corruption) with the drm-next-3.18 kernel, 3.17-rc7, and current Mesa git with this reverted:

 http://lists.freedesktop.org/archives/mesa-dev/2014-August/066746.html

For the Mesa part, I already bisected that it starts at the commit below (might be the same cause as bug 83436, but let's leave that one alone for now):

 http://cgit.freedesktop.org/mesa/mesa/commit/?id=07c65b85eada8dd34019763b6e82ed4257a9b4a6

The kernel part is not bisected yet, but it is somewhere between 3.16 and 3.17-rc1, so hopefully I will bisect that, maybe today :)
Comment 1 smoki 2014-10-03 09:37:33 UTC
OK, I bisected the kernel quickly, by lucky guessing :) So for the kernel it is:

 02376d8282b88f07d0716da6155094c8760b1a13
 
 drm/radeon: Allow write-combined CPU mappings of BOs in GTT (v2)
v2: fix rebase onto drm-fixes
Comment 2 Michel Dänzer 2014-10-06 06:56:23 UTC
What kind of corruption are we talking about? Can you attach a screenshot?
Comment 3 smoki 2014-10-06 07:02:51 UTC
How to put it: it is some kind of immediate and growing corruption everywhere. It appears as soon as I log in on the 32-bit machine; basically nearly everything is corrupted. Fonts, images, games... everything.
Comment 4 smoki 2014-10-06 07:27:04 UTC
Created attachment 107392
corruption

OK, I just built again with the stream change reverted; maybe this screenshot can explain something.
Comment 5 smoki 2014-10-06 08:17:36 UTC
Created attachment 107401
game corruption

(In reply to smoki from comment #4)
> Created attachment 107392
> corruption
> 
> OK, I just built again with the stream change reverted; maybe this screenshot
> can explain something.

That screenshot does not show that icons are also corrupted; the mouse cursor also disappears, or it is just a dot or some kind of mess on screen, etc... Here is one screenshot from a game; I don't know how I got there without a visible cursor and fonts, etc :D
Comment 6 Michel Dänzer 2014-10-06 08:24:09 UTC
And this goes away if RADEON_FLAG_GTT_WC isn't set in r600_buffer_common.c:r600_init_resource()?
Comment 7 smoki 2014-10-06 08:40:32 UTC
(In reply to Michel Dänzer from comment #6)
> And this goes away if RADEON_FLAG_GTT_WC isn't set in
> r600_buffer_common.c:r600_init_resource()?

 No, that does not help for this corruption. That helps only where games stutter when GTT loads.
Comment 8 Michel Dänzer 2014-10-06 09:53:02 UTC
(In reply to smoki from comment #7)
> No, that does not help for this corruption.

Note that I mean disabling *all* uses of RADEON_FLAG_GTT_WC. With those disabled, the behaviour should be basically the same as before the commit you bisected to. If that really doesn't work anymore, can you bisect which other commit in the meantime broke things without RADEON_FLAG_GTT_WC?
Comment 9 smoki 2014-10-06 10:07:37 UTC
(In reply to Michel Dänzer from comment #8)
> (In reply to smoki from comment #7)
> > No, that does not help for this corruption.
> 
> Note that I mean disabling *all* uses of RADEON_FLAG_GTT_WC. With those
> disabled, the behaviour should be basically the same as before the commit
> you bisected to. If that really doesn't work anymore, can you bisect which
> other commit in the meantime broke things without RADEON_FLAG_GTT_WC?

I tried disabling it only in the default case, then both RADEON_FLAG_GTT_WC uses together, and I also tried your patch from https://bugs.freedesktop.org/show_bug.cgi?id=84662#c12 ... it returns to the old behavior, so there is no stutter but there is no performance gain anymore... and the corruption is still there.

I don't know which other commit to bisect: it does not work at Mesa commit 07c65b85eada8dd34019763b6e82ed4257a9b4a6 and kernel commit 02376d8282b88f07d0716da6155094c8760b1a13 ... even with both RADEON_FLAG_GTT_WC uses disabled, and the same change does not work with the current stack either.
Comment 10 smoki 2014-10-06 10:09:39 UTC
Whatever I do, the corruption is still there, and I can't compile 32-bit Mesa with normal mtune, as in bug 84627 - it is the same bisect.
Comment 11 Michel Dänzer 2014-10-07 04:01:32 UTC
Created attachment 107452
mesa: Only use two caching buffer managers again

Hmm, but if it's not the GTT_WC flag which triggers the corruption, I'm not sure what it could be...

Here's a shot in the dark, does this patch work around the problem?
Comment 12 smoki 2014-10-07 09:40:53 UTC
(In reply to Michel Dänzer from comment #11)
> Created attachment 107452
> mesa: Only use two caching buffer managers again
> 
> Hmm, but if it's not the GTT_WC flag which triggers the corruption, I'm not
> sure what it could be...

Actually that is it; I double-checked now. Sorry, I think I did not remove all GTT_WC uses from there, so yes, if I remove all GTT_WC uses from r600_buffer_common.c there is no problem anymore.
Comment 13 smoki 2014-10-07 10:33:27 UTC
It may be worth mentioning what I was playing with yesterday: it is not only the PIPE_USAGE_STREAM revert that triggers corruption on 32-bit. For example, if I only add res->domains = RADEON_DOMAIN_GTT; to the default case, I can't log in anymore:

	default:
		/* Not listing GTT here improves performance in some apps. */
		res->domains = RADEON_DOMAIN_VRAM;
+		res->domains = RADEON_DOMAIN_GTT;
		flags |= RADEON_FLAG_GTT_WC;
		break;

On 64-bit this is not a problem. After adding just this line, performance goes up by 25% in games like Torchlight, Dota2, etc... I only found a slight decrease of up to 1-3% in other apps (Xonotic, Openarena, etc), and it also removes the stutter for me in Unigine Valley. So this seems to be a better default now, but...

Then again, that is only because 'flags |= RADEON_FLAG_GTT_WC;' is there, which keeps the fps from dropping much. If I comment that out (with DOMAIN_GTT still there) I see an even greater boost of 35%-40% in Torchlight, but it is not so good anymore for other apps (Valley suffers). So some apps like it very much, and some not at all :D

On 32-bit I can't play with those; various problems are triggered with GTT_WC there.
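
For illustration, the experiment described in this comment can be modelled as a tiny standalone sketch. It is not the Mesa code: the type, the constants and the function sketch_init_default() are hypothetical stand-ins for res->domains, RADEON_DOMAIN_* and RADEON_FLAG_GTT_WC, and the numeric values are made up. It only shows that the added assignment overrides the VRAM default and that the write-combining flag is an independent toggle.

```c
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions live in Mesa's radeon winsys. */
enum sketch_domain { SKETCH_DOMAIN_GTT = 1, SKETCH_DOMAIN_VRAM = 2 };
#define SKETCH_FLAG_GTT_WC (1u << 0)

struct sketch_resource {
	enum sketch_domain domains;
	uint32_t flags;
};

/*
 * Models only the "default:" case from the diff above.
 * force_gtt: the added "res->domains = RADEON_DOMAIN_GTT;" line is present.
 * use_wc:    the "flags |= RADEON_FLAG_GTT_WC;" line is kept (not commented out).
 */
static void sketch_init_default(struct sketch_resource *res, int force_gtt, int use_wc)
{
	res->domains = SKETCH_DOMAIN_VRAM;
	if (force_gtt)
		res->domains = SKETCH_DOMAIN_GTT; /* the later assignment wins */

	res->flags = 0;
	if (use_wc)
		res->flags |= SKETCH_FLAG_GTT_WC;
}
```

The two variants compared in the comment correspond to sketch_init_default(&res, 1, 1) and sketch_init_default(&res, 1, 0).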
Comment 14 smoki 2014-10-09 23:34:38 UTC
The revert is upstream now with commit http://cgit.freedesktop.org/mesa/mesa/commit/?id=7b4276d7acf2e0f77044cb50caa6ad936fa78786 so I am now in the corruption environment :) The only good thing is that -mtune=generic now works OK performance-wise.

So on top of that, the corruption goes away only if I disable all RADEON_FLAG_GTT_WC lines in r600_buffer_common.c.
Comment 15 Michel Dänzer 2014-10-10 09:48:58 UTC
Created attachment 107655
drm/radeon: Ignore RADEON_GEM_GTT_WC on Kabini

There seems to be a general problem with write-combined CPU mappings of GTT on Kabini. We previously disabled write-combined mappings of the ring buffers again because other users reported ring test failures on Kabini with that, but from this report it looks like the problem runs deeper.

This kernel patch should work around it for now.
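
As a rough sketch of the idea only (the attached patch itself is not reproduced here): a per-family workaround amounts to clearing the write-combining request for the affected chip before the buffer is created. The enum, the flag value and sketch_filter_flags_by_family() below are hypothetical names for illustration, not the radeon driver's API.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the driver's chip family enum and GEM flag. */
enum sketch_family { SKETCH_CHIP_OTHER, SKETCH_CHIP_KABINI };
#define SKETCH_GEM_GTT_WC (1u << 2)

/* Drop the write-combined GTT request on the affected family, keep all other flags. */
static uint32_t sketch_filter_flags_by_family(enum sketch_family family, uint32_t flags)
{
	if (family == SKETCH_CHIP_KABINI)
		flags &= ~SKETCH_GEM_GTT_WC;
	return flags;
}
```

Userspace can keep setting the flag; with this kind of filter the kernel simply falls back to its non-write-combined mapping on the listed family.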
Comment 16 smoki 2014-10-10 16:20:17 UTC
Yes, that patch works around the corruption.
Comment 17 smoki 2014-10-15 01:28:58 UTC
This starts to work (on a proper 3.17 kernel), but on a multiarch 64-bit OS. I don't know what the issue is on a pure 32-bit OS. I tried a 64-bit kernel there too and the issue was still there, so this might not be a radeon bug at all...

Seems to me this will be a partly invalid bug, Michel; not sure how many people even try to run a 32-bit OS on Kabini, maybe none :)
Comment 18 smoki 2014-10-15 01:58:00 UTC
Is it something like a glamor 32-bit bug? So it might not be specific to the ASIC... who knows, but this is really wild guessing... I don't know :)
Comment 19 Michel Dänzer 2014-10-15 07:05:41 UTC
(In reply to smoki from comment #17)
> This starts to work (on a proper 3.17 kernel), but on a multiarch 64-bit OS.
> I don't know what the issue is on a pure 32-bit OS. I tried a 64-bit kernel
> there too and the issue was still there, so this might not be a radeon bug
> at all...

Running a 64 bit kernel is by definition not a 'pure 32 bit' OS.

So what exactly is the difference between the working and broken cases? It sounds like you're using a different distro installation in each case. Maybe there's a difference between them which isn't directly related to the graphics stack? E.g. one of them updating the CPU (not GPU!) microcode at boot, but the other one not.


(In reply to smoki from comment #18)
> Is it something like a glamor 32-bit bug?

Seems unlikely.
Comment 20 smoki 2014-10-15 13:53:28 UTC
(In reply to Michel Dänzer from comment #19)
> 
> Running a 64 bit kernel is by definition not a 'pure 32 bit' OS.
> 

I run a 32-bit PAE kernel on the 32-bit OS of course; I just tried a 64-bit kernel there in the hope that it might help, but nope.
 
> So what exactly is the difference between the working and broken cases? It
> sounds like you're using a different distro installation in each case. Maybe
> there's a difference between them which isn't directly related to the
> graphics stack? E.g. one of them updating the CPU (not GPU!) microcode at
> boot, but the other one not.

The difference is the corruption. I have them installed on separate partitions, one pure 32-bit Debian and one 64-bit; both are identical, up-to-date Sid installations, so basically only the arch differs.

So on the pure 32-bit OS there are these problems; on the 64-bit one, where I run the same 32-bit programs via multiarch, there is no corruption.

I also tried building the stack on 32-bit but running it on 64-bit multiarch... it works there, but not in the pure 32-bit OS installation. I also tried a fresh 32-bit OS installation and, nope, again it does not work: the corruption is there.
Comment 21 smoki 2014-10-15 15:21:56 UTC
(In reply to Michel Dänzer from comment #19)
> E.g. one of them updating the CPU (not GPU!) microcode at
> boot, but the other one not.

Nope, I tried both with and without updating the CPU microcode, and also tried the distro kernel from experimental, currently linux-image-3.17-rc5-686-pae... in all cases the corruption is still there.

Well, to reproduce it someone must install a 32-bit OS :)
Comment 22 Michel Dänzer 2014-10-16 03:22:27 UTC
(In reply to smoki from comment #20)
> (In reply to Michel Dänzer from comment #19)
> > Running a 64 bit kernel is by definition not a 'pure 32 bit' OS.
> 
> I run a 32-bit PAE kernel on the 32-bit OS of course; I just tried a 64-bit
> kernel there in the hope that it might help, but nope.

What I meant is that there's no fundamental difference between doing that and running 32 bit apps in a 64-bit distro installation.

To clarify though, for that test did you install a linux-image-*-amd64_*_i386.deb in the 32-bit install, or did you manually copy the kernel from the 64-bit install? If the former, can you also try the latter to rule out any difference between the *_i386.deb and *_amd64.deb?


> So on the pure 32-bit OS there are these problems; on the 64-bit one, where I
> run the same 32-bit programs via multiarch, there is no corruption.

So you need to isolate what exactly makes the difference. If it's not the kernel and not 'the stack' (what does that mean exactly?), what is it?
Comment 23 smoki 2014-10-16 04:13:17 UTC
(In reply to Michel Dänzer from comment #22)
 
> What I meant is that there's no fundamental difference between doing that
> and running 32 bit apps in a 64-bit distro installation.

I know, there should not be any difference, but I don't know what happens...
 
> To clarify though, for that test did you install a
> linux-image-*-amd64_*_i386.deb in the 32-bit install, or did you manually
> copy the kernel from the 64-bit install? If the former, can you also try the
> latter to rule out any difference between the *_i386.deb and *_amd64.deb?


I normally run the PAE kernel there, which is the default on pure 32-bit for me. Yes, I tried that one too: the amd64 kernel packaged in an i386.deb. Both of those and my own compiled kernel have the corruption.

I have now tried what you said, the pure 64-bit kernel packaged as amd64.deb, and voila, no corruption with it! Both are distro kernels from experimental, and my own kernels behave the same there: the one packaged as i386.deb gives corruption, while the 64-bit one packaged as amd64.deb works fine. But hmm, how and why do those differ, if both are 64-bit and only the package is different?
Comment 24 smoki 2014-10-16 04:28:14 UTC
So I double-checked to be sure both are 64-bit. This one works:

linux-image-3.17-rc5-amd64_3.17~rc5-1~exp1_amd64.deb

 this one - corruption:

linux-image-3.17-rc5-amd64_3.17~rc5-1~exp1_i386.deb
Comment 25 Michel Dänzer 2014-10-16 06:45:07 UTC
I think you should work with the Debian kernel maintainers to isolate what exactly makes the difference between the two kernel packages.
Comment 26 Michel Dänzer 2014-10-29 02:08:59 UTC
*** Bug 85526 has been marked as a duplicate of this bug. ***
Comment 27 Hamish Wilson 2014-11-08 23:17:00 UTC
Not that I suspect this to surprise anyone, but I felt I should mention that this bug now also appears with Mesa 10.3.3 as well.
Comment 28 Nils Holland 2014-11-23 17:33:45 UTC
I first started seeing this bug after upgrading my (completely 32 bit) system from Mesa 10.3.1 to Mesa 10.3.4. Previously I was using version 3.17.3 of the kernel, but upon seeing that there were quite a few changes made to the radeon code between kernels 3.17.3 and 3.17.4, I decided to try 3.17.4 for a change. It seems that the corruption I'm seeing is not as bad when using Mesa 10.3.4 with kernel 3.17.4 compared to 3.17.3, but it is still there and makes things barely usable. I'm downgrading to Mesa 10.3.1 again, which makes things work properly for me. Still, is any more help needed in diagnosing this, and if so, is there anything I can do to help?
Comment 29 Michel Dänzer 2014-11-25 07:33:03 UTC
(In reply to Nils Holland from comment #28)
> any more help needed in diagnosing this, and if so, anything I can do to help?

See comment 22. From comment 23, it sounds like it might actually be a 32-bit toolchain issue, which somehow leaks into 64-bit kernel builds as well.

If there's any way you can run a 'real' 64-bit kernel on your distro, that's probably the best way to avoid the problem for now.
Comment 30 Zbigniew Luszpinski 2014-11-26 00:05:17 UTC
Created attachment 110024
Added Aruba GPU to Ignore RADEON_GEM_GTT_WC on Kabini

(In reply to Hamish Wilson from comment #27)
> I should mention that this bug now also appears with Mesa 10.3.3 as well.

Actually this bug appeared first in Mesa 10.3.2:
http://cgit.freedesktop.org/mesa/mesa/log/?h=10.3&ofs=100
with this patch:
2014-10-13	r600g,radeonsi: Always use GTT again for PIPE_USAGE_STREAM buffers	Michel Dänzer
http://cgit.freedesktop.org/mesa/mesa/commit/?h=10.3&id=64c2bdc334ba472603b1e7cd2c3046cfbce285b6
I reported this finding to Michel and he responded with a link to this bug.
This patch is still present in later releases, so all new Mesa releases are affected on 32-bit Linux.

The quick workaround for Mesa 10.3.2 and later on 32-bit Linux is to just revert Michel's patch above
OR
add your GPU to Michel's exclusion patch "drm/radeon: Ignore RADEON_GEM_GTT_WC on Kabini" attached to this bug report.

Apply only one of these solutions. I prefer adding my GPU to the exclusion list in the "drm/radeon: Ignore RADEON_GEM_GTT_WC on Kabini" patch instead of patching Mesa.

See the patch attached here, where I modified Michel's patch to add my Aruba GPU.
Comment 31 Michel Dänzer 2014-11-26 02:52:41 UTC
(In reply to Zbigniew Luszpinski from comment #30)
> Just apply only one of these solutions. I prefer to add my GPU to exclusion
> list "drm/radeon: Ignore RADEON_GEM_GTT_WC on Kabini" patch instead of
> patching mesa.

While that patch works around the problem, it doesn't make much sense conceptually: Since everything is working fine with 64-bit kernels, there doesn't seem to be anything wrong with any particular GPUs.

Also, disabling write-combining decreases performance.

For those reasons, until someone figures out the root cause of this problem, IMHO the preferred workaround is to run a 64-bit kernel. Any particular reason why you can't do that?
Comment 32 Michel Dänzer 2014-11-26 02:54:38 UTC
(In reply to smoki from comment #23)
> I have now tried what you said, the pure 64-bit kernel packaged as amd64.deb,
> and voila, no corruption with it! Both are distro kernels from experimental,
> and my own kernels behave the same there: the one packaged as i386.deb gives
> corruption, while the 64-bit one packaged as amd64.deb works fine. But hmm,
> how and why do those differ, if both are 64-bit and only the package is
> different?

Is there any difference in the /boot/config-3.* file between the two packages?
Comment 33 smoki 2014-11-26 06:55:02 UTC
(In reply to Michel Dänzer from comment #32)
> Is there any difference in the /boot/config-3.* file between the two
> packages?

As far as I can see, only one: the amd64.amd64 package has CONFIG_DEBUG_INFO=y while amd64.i386 has CONFIG_DEBUG_INFO not set ... diff shows only this:

< CONFIG_DEBUG_INFO=y
< # CONFIG_DEBUG_INFO_REDUCED is not set
< # CONFIG_DEBUG_INFO_SPLIT is not set
< # CONFIG_DEBUG_INFO_DWARF4 is not set
---
> # CONFIG_DEBUG_INFO is not set

But the 686-pae kernel, which is also affected, has CONFIG_DEBUG_INFO=y too, so I guess that config option is not the issue.
Comment 34 Nils Holland 2014-11-26 21:56:56 UTC
Ok, I believe I can confirm smoki's findings that only 32 bit kernels are affected while identically configured 64 bit ones are not.

I'm on Gentoo Linux where I've custom-compiled my whole system, including the kernel (which, originally, was a 32 bit one). I now took my kernel config and changed only two things: I turned on the option to compile a 64 bit kernel as well as the option that allows me to run 32 bit binaries with the resulting kernel. I then compiled the exact same kernel sources I had used before and upon booting the new 64 bit kernel Mesa worked absolutely fine without any visible corruption.

I guess that proves Michel's point that this issue seems to be 32 bit toolchain related.
Comment 35 Michel Dänzer 2014-11-27 08:07:08 UTC
Created attachment 110118
drm/radeon: Ignore RADEON_GEM_GTT_WC on 32-bit x86

Can one of you confirm that this patch works around the problem for a 32-bit kernel build?
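
A minimal sketch of what "ignore the flag on 32-bit x86" can look like, under stated assumptions: the attached patch is not reproduced here, SKETCH_GEM_GTT_WC and sketch_filter_flags_by_arch() are hypothetical names, and the sketch tests the compiler's __i386__ macro so that it builds in user space, whereas in-kernel code would normally test a Kconfig symbol such as CONFIG_X86_32.

```c
#include <stdint.h>

/* Hypothetical stand-in for the GEM write-combining flag. */
#define SKETCH_GEM_GTT_WC (1u << 2)

/* On 32-bit x86 builds, silently drop the write-combined GTT request. */
static uint32_t sketch_filter_flags_by_arch(uint32_t flags)
{
#if defined(__i386__)
	flags &= ~SKETCH_GEM_GTT_WC;
#endif
	return flags;
}
```

Because the check is made at compile time, a 64-bit kernel build keeps write-combining for every GPU, which matches the observation in this thread that 64-bit kernels are not affected.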
Comment 36 smoki 2014-11-27 08:52:12 UTC
(In reply to Michel Dänzer from comment #35)
> Created attachment 110118
> drm/radeon: Ignore RADEON_GEM_GTT_WC on 32-bit x86
> 
> Can one of you confirm that this patch works around the problem for a 32-bit
> kernel build?

Yes, it works for the 686-pae kernel I've just built :)
Comment 37 Nils Holland 2014-11-27 18:39:34 UTC
(In reply to Michel Dänzer from comment #35)
> Created attachment 110118
> drm/radeon: Ignore RADEON_GEM_GTT_WC on 32-bit x86
> 
> Can one of you confirm that this patch works around the problem for a 32-bit
> kernel build?

Yes, just like smoki, I can confirm that with this patch, I can run current Mesa builds just fine on my 32 bit kernel.
Comment 38 Igor Gnatenko 2014-12-04 13:56:59 UTC
Any news? Seems the patch works OK for people. Can you send it for 10.3 stable also?
Comment 39 Nils Holland 2014-12-04 18:27:20 UTC
(In reply to Igor Gnatenko from comment #38)
> Any news? Seems the patch works OK for people. Can you send it for 10.3 stable
> also?

The patch that fixed the issue for us is actually a kernel patch and not a Mesa patch and as such should work regardless of what Mesa version you're using.

Regarding the question of why this issue only seems to affect 32 bit kernels and not 64 bit ones, and whether the proposed patch can serve as a permanent fix or comes with too many undesirable side-effects ... well, Michel can probably give more qualified information about that.

As it stands, the patch disables write-combining and thus will probably have negative effects on performance, although I don't really notice any difference on my own machine compared to the time "when things worked fine without any patching at all" (but I'm not really running very demanding stuff). Just using a 64 bit kernel instead of going down the patch route should fix the problem without these side-effects.
Comment 40 Hamish Wilson 2015-01-08 19:29:58 UTC
Did the kernel patch ever get merged?
Comment 42 Marek Olšák 2015-08-02 11:33:34 UTC
(In reply to Nils Holland from comment #37)
> (In reply to Michel Dänzer from comment #35)
> > Created attachment 110118
> > drm/radeon: Ignore RADEON_GEM_GTT_WC on 32-bit x86
> > 
> > Can one of you confirm that this patch works around the problem for a 32-bit
> > kernel build?
> 
> Yes, just like smoki, I can confirm that with this patch, I can run current
> Mesa builds just fine on my 32 bit kernel.

(In reply to Hamish Wilson from comment #40)
> Did the kernel patch ever get merged?

(In reply to Alex Deucher from comment #41)
> Yes:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a08b588e4199e4200d26027ffcdf3ab2fa906412

OK. Closing then.

