Bug 87886 - constant fps drops with Intel and Radeon
Status: RESOLVED NOTABUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Mesa core
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: mesa-dev
Reported: 2014-12-31 01:42 UTC by Stéphane Travostino
Modified: 2015-01-21 17:57 UTC (History)


Description Stéphane Travostino 2014-12-31 01:42:50 UTC
I'm experiencing constant fps drops every few seconds with both intel and radeon cards (the latter with DRI_PRIME=1) on Source games.

When using the intel card and INTEL_DEBUG=perf I see "recompiling fragment shader" messages during drops, see http://pastebin.com/5zV9uuRb
These messages repeat on every drop, and the game goes from the maximum of 40 fps for a couple of seconds to ~10 fps for 3-4 seconds.

This happens with the radeon card too, and with GALLIUM_HUD=fps,buffer-wait-time I see the latter parameter increasing to around 5k during fps drops.

This happens with Counter Strike: Global Offensive and Left 4 Dead 2 (only games available for testing), also when standing still in an empty black room.

$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) -- SANDYBRIDGE
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Whistler [Radeon HD 6630M/6650M/6750M/7670M/7690M] (rev ff)

OS Archlinux x86-64 + Linux 3.18
Mesa git version 67105.0c7f895
Comment 1 Eero Tamminen 2014-12-31 13:42:14 UTC
Has your performance regressed?  This fall there have been some compiler frontend improvements that allow e.g. more inlining to be done for some shaders, and as result compilation can take (in worst case even several times) longer than earlier. See bug 86140.

Recompile messages come from the backend which is separate for AMD and Intel, i.e. you may need to file separate bugs for each (Intel one would be for "Drivers/DRI/i965" component).


> This happens with Counter Strike: Global Offensive and Left 4 Dead 2 (only games available for testing), also when standing still in an empty black room.

Please give detailed instructions, how one can reach an "empty black room" where Mesa will do constant shader recompiles.  Preferably in single player / tutorial level which doesn't require hours of playing.

(I haven't seen anything like that on HSW in those games.)

Alternatively, you could provide Apitrace trace.
Comment 2 Kenneth Graunke 2015-01-01 01:19:03 UTC
(In reply to Eero Tamminen from comment #1)
> (I haven't seen anything like that on HSW in those games.)

You won't - all of the reporter's recompiles are due to EXT_texture_swizzle or DEPTH_TEXTURE_MODE swizzling, which only happen on pre-Haswell.
Comment 3 Kenneth Graunke 2015-01-01 04:08:32 UTC
I discovered a bug in the i965 driver: the precompile was guessing the texture swizzle incorrectly.

These two patches should cut around 40% of the recompiles:
http://lists.freedesktop.org/archives/mesa-dev/2014-December/073483.html
http://lists.freedesktop.org/archives/mesa-dev/2014-December/073484.html
Comment 4 Kenneth Graunke 2015-01-01 06:52:06 UTC
And it turns out we can eliminate the rest of them pretty easily too:
http://lists.freedesktop.org/archives/mesa-dev/2014-December/073490.html
http://lists.freedesktop.org/archives/mesa-dev/2014-December/073489.html

With those four patches on top of Mesa master, I see no recompiles at all in CSGO on my Sandybridge.
Comment 5 Stéphane Travostino 2015-01-03 00:37:14 UTC
(In reply to Eero Tamminen from comment #1)
> Has your performance regressed?  This fall there have been some compiler
> frontend improvements that allow e.g. more inlining to be done for some
> shaders, and as result compilation can take (in worst case even several
> times) longer than earlier. See bug 86140.

I've always seen lag with the Radeon card, although it seems to be slowly getting better since my first tests 12 months ago. I don't have numbers to support that claim though.

> 
> Recompile messages come from the backend which is separate for AMD and
> Intel, i.e. you may need to file separate bugs for each (Intel one would be
> for "Drivers/DRI/i965" component).

I've opened a generic bug report since this affects two different card vendors, and even if Intel has some specific bugs or performance issues, I suspect the problem is in the driver-independent code.

> 
> 
> > This happens with Counter Strike: Global Offensive and Left 4 Dead 2 (only games available for testing), also when standing still in an empty black room.
> 
> Please give detailed instructions, how one can reach an "empty black room"
> where Mesa will do constant shader recompiles.  Preferably in single player
> / tutorial level which doesn't require hours of playing.

Left 4 Dead 2 instructions: start a single player game on Dead Center map 1. As soon as the game starts (helicopter scene) you get a noticeable slowdown, probably due to map loading.
From that point on, you should get a noticeable lag every few seconds. You can get a bigger slowdown when meleeing zombies or during the explosion about 30 seconds into the game.

Note: this affects every map, and is constant at every point of the map. I wrote about the empty room just to point out that this happens even when there is nothing being rendered on the screen (apart from the HUD).

> 
> (I haven't seen anything like that on HSW in those games.)
> 
> Alternatively, you could provide Apitrace trace.

Will do.
Comment 6 Jason Ekstrand 2015-01-03 00:40:47 UTC
Could you please also test with Ken's 4 patches.  That will tell us if it was just the recompiles or if there's something else we should be looking for.
Comment 7 Stéphane Travostino 2015-01-03 02:08:52 UTC
(In reply to Jason Ekstrand from comment #6)
> Could you please also test with Ken's 4 patches.  That will tell us if it
> was just the recompiles or if there's something else we should be looking
> for.

So, with the patches applied I confirm I no longer get messages about EXT_texture_swizzle recompiles, but the lag and performance issues are still present.

Here's an updated log output from the game and INTEL_DEBUG=perf: http://pastebin.com/1bp76x5e
Comment 8 Stéphane Travostino 2015-01-03 02:42:26 UTC
Here's the apitrace with mesa-git and the 4 patches from comments #3 and #4: https://drive.google.com/file/d/0BwBQBTnr5Iv6WHBfeE50RUxvRUU/view?usp=sharing

Warning: 290MB file, ~1GB uncompressed.

The first 30 seconds are the game booting up and map loading, then there's about 1:40 of actual game trace.
Comment 9 almos 2015-01-03 15:25:20 UTC
I just checked L4D2 with mesa 10.3.2 (AMD Barts), and I see no such fps drops. Sure, in the beginning there are some hiccups, but once all shaders have been used at least once, everything is smooth.
Comment 10 Stéphane Travostino 2015-01-03 18:41:49 UTC
(In reply to almos from comment #9)
> I just checked L4D2 with mesa 10.3.2 (AMD Barts), and I see no such fps
> drops. Sure, in the beginning there are some hiccups, but once all shaders
> have been used at least once, everything is smooth.

That's interesting... so I've decided to test 10.3.2 with CS and L4D2 and:

Intel card: fps drops are still present constantly and with the same magnitude. 
  Haven't tried Kenneth's patches as I'm having errors compiling vanilla mesa 10.3.2 from git on Archlinux, so I'm using the old upstream packages.

Radeon card: no more fps drops. When standing still, the fps graph is flat, whereas with mesa 10.4 or HEAD it describes a sine wave. Performance is generally worse (40% slower), probably because 10.3.2 lacks the r600 improvements of later releases.

For reference, mine is an AMD TURKS card.
Comment 11 Stéphane Travostino 2015-01-04 02:40:43 UTC
After spending half a day bisecting, I don't think there's any real difference between Mesa 10.3.2 and master: the fps drops happen in both releases, although they seem to be more infrequent in 10.3.2 and almost predictable in master.

Due to this behaviour it's hard to bisect and tell which commit introduced these performance issues.

I'm still on square one.
Comment 12 Eero Tamminen 2015-01-05 13:26:55 UTC
(In reply to Stéphane Travostino from comment #11)
> After spending half a day bisecting, I don't think there's any real
> difference between Mesa 10.3.2 and master: the fps drops happen in both
> releases, although they seem to be more infrequent in 10.3.2 and almost
> predictable in master.
> 
> Due to this behaviour it's hard to bisect and tell which commit introduced
> these performance issues.
> 
> I'm still on square one.

Your comment is from early yesterday and Kenneth's patches landed in Mesa at the end of yesterday.  Do your _Intel_SNB_ issues go away if you use Mesa from today?
Comment 13 Michel Dänzer 2015-01-07 08:21:10 UTC
Does it still happen with the Radeon card with a 3.19-rc kernel?
Comment 14 Gustaw Smolarczyk 2015-01-07 12:32:01 UTC
If you are using the 3.18 kernel, you could also try the previous one (3.17.x). I have a similar problem on radeon (though it's TAHITI, so radeonsi) and found that it is a kernel regression.

In my case, I can easily reproduce it by playing Minecraft: after loading a world, there will always be a series of 1-3s pauses in the first minute.

https://bugzilla.kernel.org/show_bug.cgi?id=90741
Comment 15 almos 2015-01-07 19:53:19 UTC
I tried l4d2 again with mesa 10.5-dev (git-1829f9c), and still nothing. Kernel is the same as before (3.17.7). Do I need to underclock my CPU to see the lag spikes?
Comment 16 Stéphane Travostino 2015-01-07 20:03:08 UTC
Is it possible there's a weird interaction with PRIME? @almos, does your system have a muxless setup?

My CPU isn't underclocked nor undervolted, and using the performance governor doesn't help in any way.

Also, I had the same problem with 3.17.6 -- I'll soon try again with an updated mesa and Linux 3.19-rc
Comment 17 Stéphane Travostino 2015-01-10 01:01:31 UTC
Status as of Linux 3.19-rc3, mesa HEAD e28f9d0

Resolution 1280x800 out of native 1600x900, all graphic detail at minimum.

Radeon: min fps 20, avg fps 65, max fps 95
  In game FPS averages 65 FPS, with no issues during the first minute of the game.
  After that, the FPS chart shows a valley of about 3 seconds @ 20 FPS, repeating throughout the game every 15 seconds or so.
  FPS drops are independent of the scene complexity, as the same scene after the drop goes back to the average of 65 FPS.

Intel: min fps 10, avg fps 45, max fps 75
  Same as Radeon, although the FPS drops seem to occur immediately after the actual game starts.
  FPS drops cycle faster than with radeon: about 5 seconds around 45 fps, then 5 seconds around 20 fps, a constant up and down throughout the gameplay.
  No relation with scene complexity either.
  Truncated log: http://pastebin.com/xHsekJUD -- this is less than 1 minute of gameplay.

  I confirm I no longer get EXT_texture_swizzle messages with Intel.

These values are from Left 4 Dead 2
Same effect with Counter Strike: Global Offensive, although the FPS values are lower probably due to the higher complexity of the map and textures.
Comment 18 Stéphane Travostino 2015-01-19 21:01:17 UTC
OK, from further experiments, this bug affects NOT ONLY Source games but any 3D/OpenGL application.

I've experienced the same issue with varying degrees of severity also on:

- Sauerbraten: average FPS around 180, random quick <= 1 second drops to 40 FPS every 10 seconds, severely affecting gameplay (being a multiplayer FPS)

- WebGL on both Chromium and Firefox, example http://brm.io/matter-js-demo/#stress
  While moving a box around I get FPS drops every few seconds, from 60 FPS to 40, resolving by themselves after a couple seconds.

Hopefully with these "free" tests I can find someone else experiencing the same issue.
Comment 19 Hohahiu 2015-01-20 02:15:15 UTC
Maybe a wild guess, but what are the temperatures of the GPUs in your laptop? Is this an overheating issue?
Comment 20 Eero Tamminen 2015-01-20 09:12:34 UTC
Are you still getting INTEL_DEBUG=perf output from them?

-> if you're still getting recompile messages, re-check you have latest Mesa

-> if there are no perf warnings, check that:
* your dmesg doesn't have any suspicious warnings
* "top" output doesn't show things to be CPU limited and you having some background CPU / X hog occasionally stalling things for the foreground app

(You need another machine to monitor this when running things at fullscreen)
Comment 21 Stéphane Travostino 2015-01-20 20:31:18 UTC
(In reply to Hohahiu from comment #19)
> Maybe a wild guess, but what are the temperatures of the GPUs in your
> laptop? Is this an overheating issue?

No, but GOOD NEWS: I've managed to reproduce the same problem with glxgears + RADEON.

Here's a self-explanatory screenshot with Gallium FPS HUD enabled:

https://dl.dropboxusercontent.com/u/64733/Screenshot%20from%202015-01-20%2020%3A25%3A42.png

I'm trying to reproduce the same thing with Intel, but it's V-synced and can't manage to have it run more than 59 FPS.

One bizarre thing I've noticed is that no matter the complexity of the game the fan speed is relatively slow compared to Windows, where any AAA game makes my laptop sound like a jet engine. I'd empirically say it's running at 50%, where 0% is normal operation and 100% is jet-fighter loud.

Hope this helps.
Comment 22 Stéphane Travostino 2015-01-20 20:37:48 UTC
(In reply to Eero Tamminen from comment #20)
> Are you still getting INTEL_DEBUG=perf output from them?
> 
> -> if you're still getting recompile messages, re-check you have latest Mesa
> 
> -> if there are no perf warnings, check that:
> * your dmesg doesn't have any suspicious warnings
> * "top" output doesn't show things to be CPU limited and you having some
> background CPU / X hog occasionally stalling things for the foreground app
> 
> (You need another machine to monitor this when running things at fullscreen)

No perf warnings, and no dmesg warnings whatsoever.

Following up my latest update, here's my dmesg after running glxgears: http://pastebin.com/0zvVCXdB

There are multiple power state switches as I ran glxgears w/PRIME multiple times in a row.
Comment 23 Michel Dänzer 2015-01-21 02:10:02 UTC
(In reply to Stéphane Travostino from comment #21)
> I'm trying to reproduce the same thing with Intel, but it's V-synced and
> can't manage to have it run more than 59 FPS.

Have you tried 'vblank_mode=0 glxgears'?
Comment 24 Stéphane Travostino 2015-01-21 02:31:44 UTC
Thanks Michel,

yes I confirm I can reproduce the same FPS drops with glxgears, Intel and vsync disabled.

No FPS HUD for Intel, but I can see the FPS numbers change between ~5.6k and 1.6k every 10 seconds on average.
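
A minimal sketch of how the glxgears numbers above could be checked automatically rather than by eye. It assumes the usual glxgears report line ("N frames in 5.0 seconds = X FPS"); the function names and the 0.5 threshold are hypothetical and only for illustration.

```python
import re

# glxgears periodically prints lines such as
# "28000 frames in 5.0 seconds = 5600.000 FPS".
FPS_LINE = re.compile(r"(\d+) frames in ([\d.]+) seconds = ([\d.]+) FPS")


def parse_fps(lines):
    """Return the FPS value from every glxgears report line, in order."""
    values = []
    for line in lines:
        match = FPS_LINE.search(line)
        if match:
            values.append(float(match.group(3)))
    return values


def find_drops(fps_values, ratio=0.5):
    """Return the indices of samples below `ratio` times the best sample.

    `ratio` is a hypothetical threshold, not something from this report.
    """
    if not fps_values:
        return []
    peak = max(fps_values)
    return [i for i, fps in enumerate(fps_values) if fps < peak * ratio]
```

One way to use it would be to save the output of 'vblank_mode=0 glxgears' to a file and pass the file's lines to parse_fps(); a system alternating between ~5.6k and ~1.6k FPS would have every low sample flagged by find_drops().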
Comment 25 Michel Dänzer 2015-01-21 04:17:44 UTC
(In reply to Stéphane Travostino from comment #24)
> yes I confirm I can reproduce the same FPS drops with glxgears, Intel and
> vsync disabled.

So, it seems like something is causing the performance of your system as a whole to degrade at regular intervals.

Does top show any additional CPU load while performance is degraded? Or does something like iotop or vmstat show I/O during those times?

Does it also affect pure CPU applications, e.g. audio or video encoding / transcoding?
Comment 26 Eero Tamminen 2015-01-21 11:16:14 UTC
as root, mount debugfs and monitor CAGF (actual GPU frequency) value from "/sys/kernel/debug/dri/0/i915_frequency_info", or if you have older kernel from "/sys/kernel/debug/dri/0/i915_cur_delayinfo"  Does it keep up, or go down when you get FPS drop?

(If it goes down, this seems like kernel power management issue.)
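
A rough sketch of the monitoring described above, for anyone who wants to log the value instead of watching it by hand. It assumes the file contains a line of the form "CAGF: 1100MHz" (the exact format may differ between kernel versions), that debugfs is mounted, and that the script runs as root. The function names, interval and sample count are hypothetical.

```python
import re
import time

# Path quoted in the comment above; older kernels use i915_cur_delayinfo.
FREQ_INFO = "/sys/kernel/debug/dri/0/i915_frequency_info"

# Assumed line format: "CAGF: 1100MHz".
CAGF_LINE = re.compile(r"CAGF:\s*(\d+)\s*MHz")


def parse_cagf(text):
    """Return the CAGF value in MHz, or None if no such line is found."""
    match = CAGF_LINE.search(text)
    return int(match.group(1)) if match else None


def watch_cagf(path=FREQ_INFO, interval=0.5, samples=60):
    """Print a timestamped CAGF reading every `interval` seconds."""
    for _ in range(samples):
        with open(path) as f:
            print(time.strftime("%H:%M:%S"), parse_cagf(f.read()), "MHz")
        time.sleep(interval)
```

Running watch_cagf() in a terminal next to the OpenGL application makes it easy to see whether the frequency dips line up with the FPS drops.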
Comment 27 Stéphane Travostino 2015-01-21 13:57:36 UTC
(In reply to Eero Tamminen from comment #26)
> as root, mount debugfs and monitor CAGF (actual GPU frequency) value from
> "/sys/kernel/debug/dri/0/i915_frequency_info", or if you have older kernel
> from "/sys/kernel/debug/dri/0/i915_cur_delayinfo"  Does it keep up, or go
> down when you get FPS drop?
> 
> (If it goes down, this seems like kernel power management issue.)

Good call! Watching i915_frequency_info while running glxgears on Intel, I see it starts at 1100 MHz and after a few seconds it starts to drop to 650 MHz and up again, repeatedly.
Frequency changes correlate with FPS changes in glxgears, so yeah, it seems it is a kernel pm issue.

I tried booting up with i915.powersave=0 i915.enable_rc6=0, also tried disabling Runtime PM on the Intel card via powertop but I can't find a way to force the frequency to max values to be 100% sure it's frequency switching related.

Any idea where to go from here?
Comment 28 Stéphane Travostino 2015-01-21 13:58:15 UTC
Forgot to specify that the min/max frequencies of my intel card are 650/1200 MHz.
Comment 29 Eero Tamminen 2015-01-21 14:46:16 UTC
You need to do one more test.

If you echo the max freq value to /sys/kernel/debug/dri/0/i915_min_freq, the kernel will not drop the GPU frequency.

If that gets rid of the issue, it's kernel PM issue.

If the CAGF value still goes below max GPU frequency, you have issues outside of what Linux can control -> the CPU speed gets limited by HW / firmware, potentially because of temperature issues.

Temperature you can track with lmsensors.  If frequency gets limited when temperature reaches a certain limit, you need to make sure your CPU is better cooled.

Updating BIOS may also help, I think newer BIOS versions try to keep fluctuations smaller (drop freq less, but earlier).
Comment 30 Stéphane Travostino 2015-01-21 17:57:02 UTC
Solved!

Yes, forcing the min/max i915 freq didn't stop the GPU from going to the low frequency by itself, so as you say it's something outside the control of the kernel.

This machine is a Sony Vaio VPCSA series, and has a "sony_laptop" module to control keyboard backlight, and... thermal control.

By default "/sys/devices/platform/sony-laptop/thermal_control" is set to "balanced"; after changing it to "performance" I get:

- Stable FPS on both Intel & Radeon
- Intel CAGF frequency stable on max when running intensive OpenGL operations
- No more FPS drops
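
For reference, a small sketch of checking and switching that setting from a script. It assumes the sysfs file reads back just the profile name and accepts the name written to it, as described above; writing needs root, and the function names are hypothetical.

```python
from pathlib import Path

# sysfs attribute exposed by the sony_laptop module on this machine.
THERMAL_CONTROL = Path("/sys/devices/platform/sony-laptop/thermal_control")


def get_profile(path=THERMAL_CONTROL):
    """Return the current thermal profile, e.g. "balanced"."""
    return Path(path).read_text().strip()


def set_profile(profile, path=THERMAL_CONTROL):
    """Write a new thermal profile, e.g. "performance" (needs root)."""
    Path(path).write_text(profile + "\n")
```

The change made this way is presumably not persistent across reboots, so it would have to be reapplied at boot.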

Thanks everybody for the help troubleshooting this, marking this as NOTABUG.

