Bug 83436

Summary: Sudden framerate drops in multiple games when compiling with -mtune=generic (as well as with -mtune=pentium-mmx and older CPUs)
Product: Mesa Reporter: Maciej <gutigen>
Component: Drivers/Gallium/radeonsi Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED NOTOURBUG QA Contact:
Severity: normal    
Priority: medium CC: erik.badman, muhomor.d, pedretti.fabio
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Attachments: dmesg
Xorg.0.log

Description Maciej 2014-09-03 12:30:31 UTC
Created attachment 105680
dmesg

I'm seeing a lot of fps drops in TF2, Unigine Heaven, Xonotic and Wine games. The framerate goes from a stable ~100 fps down to 5-15 fps, stays there for a few seconds, and then goes back up to normal.

On top of that, I've noticed a general performance decrease of up to 20% in the same apps.

This has been happening since last week (maybe Friday? somewhere around the weekend).
Comment 1 Maciej 2014-09-03 12:30:45 UTC
Created attachment 105681
Xorg.0.log
Comment 2 smoki 2014-09-03 12:37:43 UTC
 I'm seeing that too now. Are all of those 32bit apps?

 Mesa compiled with -O3 seems to fix it for me.
Comment 3 Maciej 2014-09-03 12:41:44 UTC
Yes, all 32bit. Xonotic (64bit) is not affected by the sudden drops, only by the performance decrease. I should have made that clearer, sorry.
Comment 4 Maciej 2014-09-03 12:42:35 UTC
Wait a minute... Unigine is affected, but it's 64bit. Reporting bugs is confusing, especially when you can't edit your post ;)
Comment 5 smoki 2014-09-03 12:50:27 UTC
(In reply to comment #4)
> Wait a minute... Unigine is affected, but it's 64bit. Reporting bugs is
> confusing, especially when you can't edit your post ;)

 As far as I can see, Unigine Heaven ships both 64bit and 32bit variants. Are you sure it isn't loading the 32bit one?

 If you build Mesa yourself, try the -O3 GCC optimization level; that seems to fix it... I don't know yet what the particular issue with the default optimization is :).
Comment 6 Maciej 2014-09-03 12:58:23 UTC
Yes, it's 64bit for sure. 

As for your fix, I get my Mesa from the Oibaf PPA; I don't know enough to compile it myself.
Comment 7 smoki 2014-09-03 13:48:14 UTC
 Actually it seems to have nothing to do with -O3, but with -mtune/-march.

 On Debian the default is -mtune=generic -march=i586, and maybe that is the problem: if I pass -mtune=native -march=native, it performs fine in most cases :) But not in all of them; glretrace is sometimes slow, sometimes fast... so the build currently seems borked, even more so on 32bit :)
Comment 8 Michel Dänzer 2014-09-04 03:42:09 UTC
(In reply to comment #4)
> Unigine is affected, but it's 64bit.

You're saying it's affected by both the framerate drops and performance decrease?

Can you guys bisect?
Comment 9 Maciej 2014-09-04 10:37:16 UTC
I updated today: the performance decrease is still there, but the fps drops are gone. I had no other apps running in the background, so I'm not sure what's up. However, the fps drops in TF2 are still a thing.

As for bisecting, I really don't have the skills to do that; I'm just a gamer with an AMD card :/
Comment 10 smoki 2014-09-04 12:04:27 UTC
 OK, I will bisect this; I now have a pretty clear case here, something like a 3x performance drop in OpenJK :)
Comment 11 smoki 2014-09-04 18:38:38 UTC
(In reply to comment #8) 
> Can you guys bisect?

 So the last good commit is 37d43ebb28ce8be38f3d9b0805b8b14354ce786d. After 07c65b85eada8dd34019763b6e82ed4257a9b4a6 there is corruption (so I couldn't test those commits) all the way until 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938, one week later, which contains a fix for the 3.17 kernel. I took that patch, tried 07c65b85eada8dd34019763b6e82ed4257a9b4a6 again and bingo, blah, blah...

 So PIPE_USAGE_STREAM seems to be the main problem again, but unlike on 64bit, on 32bit it can't be reverted to the old behavior: on 32bit that is unusable and produces a lot of corruption.
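A bisect like the one above can be partly automated with `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit" and other codes from 1 to 127 as bad. The following is only a minimal sketch, not what was used in this report: `bisect_check.py` and the benchmark command it wraps are hypothetical (the command is assumed to rebuild Mesa, run a test case and print one frame time in milliseconds as its last output token), and the 50 ms threshold is an assumption derived from the timings reported later in comment 27. A benchmark that exits non-zero, for example because the build failed or rendering was corrupted as between 07c65b85 and 150ac07b, causes the commit to be skipped instead of patched.

```python
#!/usr/bin/env python3
"""bisect_check.py: hypothetical helper for `git bisect run`.

Runs a user-supplied benchmark command that prints a frame time in
milliseconds as the last token on stdout, and maps the result to an
exit code that `git bisect run` understands.
"""
import subprocess
import sys

# Exit codes interpreted by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125

# Assumed threshold: good builds measured ~4-10 ms, bad ones ~200 ms.
THRESHOLD_MS = 50.0


def classify(frame_ms, threshold_ms=THRESHOLD_MS):
    """Map a measured frame time (or None if untestable) to an exit code."""
    if frame_ms is None:
        return SKIP  # build failed or rendering was corrupted
    return BAD if frame_ms > threshold_ms else GOOD


def measure(cmd):
    """Run the benchmark; return its last stdout token as a float, or None."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except OSError:
        return None  # command not found or not executable
    if result.returncode != 0:
        return None
    tokens = result.stdout.split()
    try:
        return float(tokens[-1])
    except (IndexError, ValueError):
        return None


def main(argv):
    if len(argv) < 2:
        print("usage: bisect_check.py BENCHMARK_CMD [ARGS...]", file=sys.stderr)
        return SKIP
    return classify(measure(argv[1:]))


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Usage would then look like `git bisect start HEAD 37d43ebb28ce8be38f3d9b0805b8b14354ce786d` followed by `git bisect run python3 bisect_check.py ./run_benchmark.sh`, where `run_benchmark.sh` is the hypothetical build-and-measure script.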
Comment 12 smoki 2014-09-04 18:45:19 UTC
 That happens with the default ./autogen blah blah

 As I said earlier, if I pass the -mtune=native -march=native flags, performance is normal; very weird and funny at the same time :D
Comment 13 smoki 2014-09-04 19:11:24 UTC
(In reply to comment #12)
>  That happens with default ./autogen blah blah
> 
>  As i said earlier if i pass -mtune=native -march=native flags perf is
> normal, very weird and funny thing in the same time :D

 Well, almost normal: as I also said earlier, glretrace for example is slowish again :) Very weird issues; everything is fine on 37d43ebb28ce8be38f3d9b0805b8b14354ce786d.
Comment 14 Emil Velikov 2014-09-04 19:26:25 UTC
(In reply to comment #12)
>  That happens with default ./autogen blah blah
> 
>  As i said earlier if i pass -mtune=native -march=native flags perf is
> normal, very weird and funny thing in the same time :D

Unless you've provided --enable-debug, Mesa does not mess around with the compiler optimisation/debug options (-O* and -g*), and it never touches -mtune or -march. Might it be that the compiler is using different heuristics before/after the commit, causing substantially different code to be generated?
Comment 15 smoki 2014-09-04 21:48:22 UTC
 By default, what Debian 32bit passes in addition to Mesa's own options is:
 
/usr/lib/gcc/i586-linux-gnu/4.9/cc1 -E -quiet -v -imultiarch i386-linux-gnu - -mtune=generic -march=i586

 That is with -mtune=generic -march=i586. With -mtune=native -march=native on AMD Kabini it passes this:

 /usr/lib/gcc/i586-linux-gnu/4.9/cc1 -E -quiet -v -imultiarch i386-linux-gnu - -march=btver2 -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mbmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mno-rdrnd -mf16c -mno-fsgsbase -mno-rdseed -mprfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=btver2

 So the Debian default works fine for the 10.2 branch, and for 10.3-devel up to commit 37d43ebb28ce8be38f3d9b0805b8b14354ce786d. But anything newer is affected: the 10.3 branch and git master.

 Maybe some of those options need to be added to the defaults; I'm not sure which of them make it work normally :)
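Since Mesa's build leaves -march/-mtune alone (comment 14), the values come from the compiler's defaults, and the cc1 lines above are what `gcc -v -E -` prints when it runs the preprocessor. A minimal sketch for extracting the effective -march/-mtune from such a line, so different machines or flag sets can be compared; the function names are illustrative, and keeping the last occurrence of each option assumes GCC's usual last-option-wins behaviour.

```python
import shlex
import subprocess


def effective_target_flags(cc1_line):
    """Return {'march': ..., 'mtune': ...} found on a cc1 command line.

    When an option appears more than once, the last value is kept.
    """
    flags = {}
    for token in shlex.split(cc1_line):
        for key in ("march", "mtune"):
            prefix = "-" + key + "="
            if token.startswith(prefix):
                flags[key] = token[len(prefix):]
    return flags


def query_gcc(extra_flags=(), compiler="gcc"):
    """Run `<compiler> <extra_flags> -v -E -` on empty input, parse cc1's line."""
    try:
        proc = subprocess.run(
            [compiler, *extra_flags, "-v", "-E", "-"],
            input="", capture_output=True, text=True,
        )
    except OSError:
        return {}  # compiler not installed
    for line in proc.stderr.splitlines():
        # With -v, gcc echoes its internal cc1 invocation on stderr.
        if "cc1" in line and " -E " in line:
            return effective_target_flags(line)
    return {}


if __name__ == "__main__":
    print("default:", query_gcc())
    print("native: ", query_gcc(["-march=native"]))
```

On the Debian 32bit setup described above, the default query would be expected to report generic/i586, matching the first cc1 line.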
Comment 16 Michel Dänzer 2014-09-05 02:15:57 UTC
(In reply to comment #11)
> So PIPE_USAGE_STREAM seems to be a main problem again,

Note that this could be due to the application or non-driver Gallium code incorrectly using PIPE_USAGE_STREAM.


> but unlike 64bit on 32bit it can't be reverted to old behavior, on 32bit it is
> unusable produce much corruption.

Weird, sounds like a kernel issue.
Comment 17 smoki 2014-09-05 14:03:04 UTC
 OK, I found that -mtune=generic is the culprit for the performance problem :). I played a little with -mtune to find the minimum this code needs to run fast:
 
 -mtune=i586 = slow
 -mtune=pentium = slow
 -mtune=pentium-mmx = slow
 -mtune=pentium-pro = fast
 -mtune=i686 = fast
 -mtune=pentium3 = fast
 etc...
 
  So -mtune=generic seems to set a lower CPU target than this code needs to perform fast.
Comment 18 smoki 2014-09-05 18:17:30 UTC
 Just tried copying oibaf's radeonsi_dri.so here on Debian (just to be sure :D), and it suffers from the same problem with -march=generic ;)
Comment 19 smoki 2014-09-05 18:24:04 UTC
(In reply to comment #18)
>  Just tried oibaf's radeonsi_dri.so to copy here on Debian (just to be sure
> :D) and it suffers from the same problem with -march=generic ;)

 I mean -mtune=generic of course :).

 Now to bisect where that SSE4.1 code started compiling this slowly (4 minutes on Kabini just for that) :D The issue is also not present on 64bit ;)
Comment 20 Andy Furniss 2014-09-05 20:17:40 UTC
(In reply to comment #19)
>  I mean -mtune=generic of course :).
> 
>  Now to bisect where that SSE4.1 code start to compile this slow (4
> minututes on Kabini just for that) :D Issue also not present on 64bit ;)

I haven't seen the perf issue on my pure 64bit setup, but I have noticed recently that the compile sits on sse4.1 for a while, using just one core, as if it's blocking the other threads of a make -j5 until it's done.
Comment 21 smoki 2014-09-05 20:23:10 UTC
(In reply to comment #20)
> I haven't seen the perf issue on my pure 64 bit setup, but have noticed that
> the compile sits on sse4.1 for a while recently - just using one core, like
> it's blocking the other threads of a make -j5 till it's done.

 Oh, I bisected that one: it takes 200x more time to compile on 32bit :D (actually compiling libmesa_gallium). I think this is the commit:

 http://cgit.freedesktop.org/mesa/mesa/commit/?id=d55f77b503ab7b59ecdd8f31c4f7dc498710e75b
Comment 22 Dieter Nützel 2014-09-05 20:56:36 UTC
(In reply to comment #21)
>  Oh i bisected that ine, it takes 200X more time to compile on 32bit :D ,
> actually compile libmesa_gallium i think this is the commit:
> 
>  http://cgit.freedesktop.org/mesa/mesa/commit/
> ?id=d55f77b503ab7b59ecdd8f31c4f7dc498710e75b

For this, look here:

http://lists.freedesktop.org/archives/mesa-dev/2014-August/065823.html

Greetings and happy 'bisecting'...;-)

Dieter
Comment 23 smoki 2014-09-05 21:02:23 UTC
(In reply to comment #22)
> For this look, here:
> 
> http://lists.freedesktop.org/archives/mesa-dev/2014-August/065823.html
> 
> Greetings and happy 'bisecting'...;-)
> 
> Dieter

 So what is the solution for that after a month? Simply use -O0, maybe :)
Comment 24 Dieter Nützel 2014-09-05 21:05:32 UTC
(In reply to comment #23)
>  So what is the solution for that after one month? Simply to use -O0 maybe :)

Ping Jason Ekstrand, maybe 8-)
Comment 25 smoki 2014-09-05 21:08:44 UTC
(In reply to comment #24)
> Ping Jason Ekstrand, maybe 8-)

 Ping.
Comment 26 smoki 2014-09-06 09:05:18 UTC
 Hah, I actually found that glamor on a 32bit OS is also affected by this, e.g. when selecting icons on the desktop in LXDE, SpaceFM, etc. So other use cases may well be affected by recent radeon Mesa compiled with -mtune=generic too.
Comment 27 smoki 2014-09-07 21:07:39 UTC
 OpenJK even with -O0 is ~2.5x faster than with -O2 -mtune=generic :)

 Anyway, I just found that the code example from bug 83442 is useful here too, so maybe more people can easily test whether they are affected.
  
32bit on commit 37d43ebb28ce8be38f3d9b0805b8b14354ce786d:
 
 -mtune=generic 4-7ms
 -mtune=native  6-10ms
 -O0            6-10ms
 
32bit current git:
 
 -mtune=generic  197-200ms
 -mtune=native  5-8ms
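To put the timings above in proportion, here is the arithmetic as a tiny Python sketch (the helper name is illustrative). It compares the reported ranges conservatively: the smallest factor divides the fastest slow run by the slowest fast run, the largest does the reverse.

```python
def slowdown_range(slow_ms, fast_ms):
    """Given (low, high) timings of a slow and a fast build, return the
    smallest and largest slowdown factors consistent with both ranges."""
    slow_lo, slow_hi = slow_ms
    fast_lo, fast_hi = fast_ms
    return slow_lo / fast_hi, slow_hi / fast_lo


# Figures from the measurements above (32bit, current git).
generic = (197, 200)  # -mtune=generic, ms
native = (5, 8)       # -mtune=native, ms
low, high = slowdown_range(generic, native)
print(f"-mtune=generic is {low:.1f}x to {high:.1f}x slower than -mtune=native")
# prints: -mtune=generic is 24.6x to 40.0x slower than -mtune=native
```

Against the -mtune=generic build of 37d43ebb (4-7 ms) the same calculation gives roughly 28x to 50x.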
Comment 28 Fabio Pedretti 2014-09-12 12:19:34 UTC
Just to let you know: the oibaf PPA now disables -mtune=generic to avoid this issue.

So don't rely on the PPA, as it is now, to check whether this issue is fixed.
Comment 29 Fabio Pedretti 2014-09-17 12:09:01 UTC
> Oh i bisected that ine, it takes 200X more time to compile on 32bit :D , actually compile libmesa_gallium i think this is the commit:

The slow compile should be fixed with:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=cfeb394224f2daeb2139cf4ec489a4dd8297a44d


Is the original issue - slow performance with -mtune=generic - still an issue?
Comment 30 smoki 2014-09-17 17:39:27 UTC
(In reply to comment #29)
> > Oh i bisected that ine, it takes 200X more time to compile on 32bit :D , actually compile libmesa_gallium i think this is the commit:
> 
> The slow compile should be fixed with:
> http://cgit.freedesktop.org/mesa/mesa/commit/
> ?id=cfeb394224f2daeb2139cf4ec489a4dd8297a44d
> 
> 
> Is the original issue - slow performance with -mtune=generic - still an
> issue?

 I already assumed yesterday that the slow-compile fix might fix this too, but no, it is not fixed...
Comment 31 smoki 2014-09-17 17:46:43 UTC
 As I mentioned in your Phoronix thread, after much further testing I am now pretty sure that generic tuning causes those random lockups on 64bit :)

 I don't use Chromium, but the people from bug 81644 should try a build that is not tuned for generic :)
Comment 32 smoki 2014-10-09 23:06:23 UTC
 As of Mesa commit 7b4276d7acf2e0f77044cb50caa6ad936fa78786, -mtune=generic works normally.

 But now everything is full of corruption, as I mentioned in comment 11 and in bug 84627.
Comment 33 smoki 2014-10-10 16:28:52 UTC
(In reply to smoki from comment #32)
>  As of mesa commit 7b4276d7acf2e0f77044cb50caa6ad936fa78786 -mtune=generic
> works normal. 
> 

 After further testing: this actually fixed the major fps drops with -mtune=generic, but there is still very bad stutter in (m)any games with it, so I still don't consider this fixed.

 I also tried this patch https://bugs.freedesktop.org/attachment.cgi?id=107655 to remove the corruption after that commit, and it helps with that... but -mtune=generic is still not fixed. I also need to patch Mesa to remove all occurrences of GTT_WC; then -mtune=generic works fine for me.
Comment 34 Marek Olšák 2015-08-02 11:01:11 UTC
Is this issue still happening on current Mesa git?
Comment 35 smoki 2015-08-02 18:31:41 UTC
(In reply to Marek Olšák from comment #34)
> Is this issue still happening on current Mesa git?

 32bit apps on a 32bit kernel are fine, as GTT_WC is disabled for X86_32:

 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a08b588e4199e4200d26027ffcdf3ab2fa906412

 But now the issue also appears on a 64bit kernel running 32bit apps, so it is still an issue. A 64bit kernel with 64bit apps is not affected.

 Unrelated to this driver, but I also found that 32bit fglrx is slow in the same situation... GTT_WC simply made things much slower while running some 32bit apps.
 Of course, I have no idea how to do anything with that driver to confirm that 100% :)
Comment 36 egon2003 2015-08-14 22:59:58 UTC
I think I have this as well. Mesa from git, updated today, compiled with -march=native. Unigine Heaven and Interstellar Marines are where I notice it most; other games seem to work OK.

Every time I fire the gun in Interstellar Marines, fps drops to almost 0 for about 1-2 seconds; otherwise fps is pretty good, around 50-90. If there is anything I can do to help, please let me know; I am on IRC as egon2003.
Comment 37 egon2003 2015-08-22 18:37:40 UTC
I updated my video card BIOS and that made everything a lot better. There are still some minor drops here and there, but Unigine Heaven, for example, runs as it should now.
Comment 38 smoki 2015-08-22 18:52:00 UTC
 egon2003, please open another bug for your specific issue.

 This one is only about some 32bit apps running much slower, with the compiler (so GCC) involved, maybe libc... that sort of thing.
Comment 39 smoki 2015-08-23 12:31:19 UTC
(In reply to smoki from comment #38)
>  egon2003, please open another bug for specific issue.
> 
>  This one is related to only 32bit some apps run much slower, compiler so
> gcc involved, libc maybe... that sort of things.

 Huh, as soon as I mentioned libc I found it: libc6-i686 simply was not installed. What a blunder :D

 Yeah, I will close this one now, but someone needs to write down somewhere that Mesa, and even Catalyst, depend heavily on libc6-i686.
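Whether a process actually got the optimized libc can be checked from its memory map. A minimal sketch, assuming glibc and a Debian layout in which libc6-i686 installs its libc under an `i686/cmov` directory (e.g. `/lib/i386-linux-gnu/i686/cmov/`); that path and the helper names are assumptions, not something confirmed in this report. Pass the PID of a running 32bit game, since a 64bit Python process maps the 64bit libc rather than the 32bit one.

```python
import os
import sys

# Assumption: Debian's libc6-i686 installs its libc under this directory.
OPTIMIZED_MARKER = "/i686/cmov/"


def loaded_libc_paths(maps_text):
    """Return the paths of glibc's libc found in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a path
        path = fields[5].strip()
        name = os.path.basename(path)
        # Older glibc names it e.g. libc-2.19.so, newer ones libc.so.6.
        if name.startswith("libc-") or name.startswith("libc.so"):
            paths.add(path)
    return paths


def uses_optimized_i686_libc(paths):
    """True if any mapped libc comes from the i686/cmov directory."""
    return any(OPTIMIZED_MARKER in p for p in paths)


if __name__ == "__main__":
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    try:
        with open(f"/proc/{pid}/maps") as maps:
            libcs = loaded_libc_paths(maps.read())
    except OSError:
        libcs = set()
    print("libc mapped from:", ", ".join(sorted(libcs)) or "not found")
    print("libc6-i686 libc in use:", uses_optimized_i686_libc(libcs))
```

Run it as `python3 check_libc.py <pid-of-game>` while the game is running; reading another process's maps may require running as the same user.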
