Starting recently, whenever I run a fullscreen GL app and stop it after some time (be it a native Linux game, a Wine D3D game, an OpenGL screensaver on X, or the simple-egl demo on a Wayland compositor), my machine keeps running at a much higher temperature than it should. It is normal for this laptop to run at > 90 °C (100 °C is critical, but it has never reached that) when running a game on Linux (~80 °C on MS Windows), since its cooling system is not that good, but the normal temperature is around 63 °C. I have discovered that after I leave my machine idle for some time, it turns on its screensaver, the Blue Screen of Death screensaver from the xscreensaver package, which is also an OpenGL screensaver, and the machine stays cool and quiet. But when I start using the machine again (terminating the screensaver), it starts running at > 85 °C, even though I did not run anything else. CPU usage is reported to be < 1% on all 4 CPU threads. The same happens with any other OpenGL app, a Wine D3D game, or the weston-simple-egl demo when run fullscreen on a Wayland compositor (Weston). I do not recall this being an issue with Mesa 9.2 and some version from before the 10.0 branching, but it started to occur somewhere between 9.2 and the 10.0 branching, iirc. I'll try to investigate more.
Installed packages:
    linux-3.12.5
    libdrm-2.4.50
    wayland-1.3.0-21-g1521c62
    llvm-195929 (svn revision)
    Mesa-10.1.0-g2b404a6
    weston-1.3.0-272-ga5059eb
    xorg-server-1.14.99.904
    xf86-video-intel-2.99.906-98-g9289e2c
All built with gcc-4.8.2, binutils-2.24 and glibc-2.18 (custom system, not Gentoo). If more info is needed, let me know.
Most likely this is a kernel/power management problem, not a Mesa problem. I've seen reports that the Sandybridge GPU sometimes gets stuck at the maximum clock frequency. You can check that via cat /sys/class/drm/card0/gt_{min,cur,max}_freq_mhz. It's highly unlikely this is a Mesa issue.
Thanks for the response. It is indeed a kernel issue, as you said.
Before starting xonotic-glx:
    $ cat /sys/class/drm/card0/gt_{min,cur,max}_freq_mhz
    650
    650
    1200
After quitting xonotic-glx after playing it for a minute or two:
    $ cat /sys/class/drm/card0/gt_{min,cur,max}_freq_mhz
    650
    1200
    1200
And it won't go down. It is possible that I got kernel 3.12 some time before the Mesa 10.0 branching, and that is why I was unable to reproduce the issue. It seems I have all kinds of weird power management problems, with radeon as well as intel (hybrid GPUs). Kernel config and dmesg output attached.
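The check above can be wrapped in a small helper. This is only a sketch: the function name `gt_freq_report` and its optional directory argument are made up for illustration, while the sysfs file names are the ones used in this thread. It flags the case where the current frequency equals the maximum, which is expected under load and only suspicious when the GPU should be idle.

```shell
#!/bin/sh
# Hypothetical helper: print min/cur/max GPU frequency and flag cur == max.
# The directory argument is an assumption added for illustration, so the
# function can be pointed at another card (or at test files).
gt_freq_report() {
    dir="${1:-/sys/class/drm/card0}"
    min=$(cat "$dir/gt_min_freq_mhz") || return 1
    cur=$(cat "$dir/gt_cur_freq_mhz") || return 1
    max=$(cat "$dir/gt_max_freq_mhz") || return 1
    echo "min=$min cur=$cur max=$max"
    # "at-max" is only a hint: it is normal while something is rendering.
    if [ "$cur" -ge "$max" ] && [ "$max" -gt "$min" ]; then
        echo "at-max"
    else
        echo "below-max"
    fi
}
```

Running `gt_freq_report` with no argument reads card0; running it a few minutes after the GL app has exited shows whether the frequency has dropped back.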
Created attachment 91065: dmesg
Created attachment 91066: Kernel config
Try a 3.13 kernel for a different algorithm for tuning RPS frequencies. The original one will maintain a frequency (even the highest) if there is any OpenGL activity. However, this should not prevent the GPU from entering rc6 (sleep mode) unless there is very frequent GPU activity.
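Whether the GPU actually enters rc6 can be estimated by sampling the rc6 residency counter twice. The following is a rough sketch, not a definitive tool: the function names are hypothetical, and it assumes the i915 driver exposes `power/rc6_residency_ms` under the card's sysfs directory, which may not hold for every kernel version.

```shell
#!/bin/sh
# Pure helper: integer percentage of an interval spent in rc6, computed
# from two residency readings in milliseconds.
rc6_percent() {
    before_ms=$1
    after_ms=$2
    interval_ms=$3
    echo $(( ($after_ms - $before_ms) * 100 / $interval_ms ))
}

# Sample the counter twice, $1 seconds apart (default 5, must be > 0).
# Assumption: the residency file path below exists on this kernel; the
# second argument overrides it.
rc6_watch() {
    secs="${1:-5}"
    file="${2:-/sys/class/drm/card0/power/rc6_residency_ms}"
    a=$(cat "$file") || return 1
    sleep "$secs"
    b=$(cat "$file") || return 1
    echo "rc6: $(rc6_percent "$a" "$b" $((secs * 1000)))%"
}
```

A value near 100% on an idle desktop would suggest that rc6 is being entered and that the heat is more likely related to the GPU frequency; a low value would point at rc6 itself.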
I am running the xfwm4 window manager, which I believe has Render-based compositing enabled, but everything worked fine before the 3.12 upgrade. I have i915.i915_enable_rc6=1 on the kernel command line; it has been there for a long time and I never noticed anything like this. I'll try a 3.13 kernel shortly, but I believe it might be worth investigating what broke this.
iirc, what changed in 3.12 was that the rc6 activation period was increased, and we have had a number of reports that that has reduced rc6 efficacy on many machines.
From what I have noticed, that period is more than 15 minutes here, and I usually just reboot since I can't stand the fans running so loud for so long. Do you have a link to that commit, maybe? I want to try reverting it to see whether that fixes my problem.
Seems to be working fine with kernel 3.13-rc5.
For the rc6 commits, see:

commit 351aa5666d02062b52329bcfe4bcf9d1f882fba9
Author: Stéphane Marchesin <marcheu@chromium.org>
Date:   Tue Aug 13 11:55:17 2013 -0700

    drm/i915: tune the RC6 threshold for stability

    It's basically the same deal as the RC6+ issues on ivy bridge except this time with RC6 on sandy bridge. Like last time the core of the issue is that the timings don't work 100% with our voltage regulator. So from time to time, the kernel will print a warning message about the GPU not getting out of RC6. In particular, I found this fairly easy to reproduce during suspend/resume.

    Changing the threshold to 125000 instead of 50000 seems to fix the issue. The previous patch used 150000 but as it turns out this doesn't work everywhere. After getting such a machine, I bisected the highest value which works, which is 125000, so here it is.

    I also measured the idle power usage before/after this patch and didn't see a difference on a sandy bridge laptop. On haswell and up, it makes a big difference, so we want to keep it at 50k there. It also seems like haswell doesn't have the RC6 issues that sandy bridge has so the 50k value is fine.

    Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
    Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

commit 29c78f609e661e663a239a37923adb1d61f6386c
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sat Nov 16 16:04:26 2013 +0100

    Partially revert "drm/i915: tune the RC6 threshold for stability"

    This reverts commit 351aa5666d02062b52329bcfe4bcf9d1f882fba9.

    It breaks rc6 on at least one snb machine. Since we don't yet have a report for ivb let's keep it there for now.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71656
    Cc: Stéphane Marchesin <marcheu@chromium.org>
    Cc: erik@vontaene.de
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

But if 3.13 is cooler, that suggests it was RPS (GPU frequency).
Probably related to (the same as!) bug 68807.
If we are happy with the updated RPS tuning, let's leave it at that.