Bug 71455 - Thermal management in nouveau running hot 3.12.0+ kernel
Summary: Thermal management in nouveau running hot 3.12.0+ kernel
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: high critical
Assignee: Nouveau Project
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-11-10 14:17 UTC by Bob Gleitsmann
Modified: 2015-10-29 23:59 UTC (History)

Attachments
dmesg text (98.05 KB, text/plain)
2015-02-15 02:51 UTC, Evan Foss

Description Bob Gleitsmann 2013-11-10 14:17:05 UTC
Thermal management now works, except that it regulates between 91 and 101 deg C. Those are the values the sensors command shows as the hysteresis and critical temperatures. I think that is way too hot. With 3.12.0-rc7+ it runs at about 65 deg C, probably at full fan. My card is a 6800 Ultra. AFAICT, there is only one clock speed available. I don't know what is in the BIOS tables or what method is used to compute the temperature limits.

Best Wishes,

Bob
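
The limits that the sensors command prints can also be read directly from the kernel's hwmon sysfs interface. The following is a minimal sketch, assuming the card exposes the standard hwmon attribute files (temp1_input, temp1_crit, temp1_crit_hyst and so on, in millidegrees Celsius); the helper name read_temps is hypothetical, and which of these files a given card actually provides is not stated in this report.

```python
from pathlib import Path

# Standard hwmon attribute names; values are in millidegrees Celsius.
ATTRS = ("temp1_input", "temp1_max", "temp1_max_hyst",
         "temp1_crit", "temp1_crit_hyst")


def read_temps(hwmon_dir):
    """Return {attribute: degrees C} for the temp1_* files found in hwmon_dir."""
    temps = {}
    for name in ATTRS:
        path = Path(hwmon_dir) / name
        if not path.is_file():
            continue
        try:
            temps[name] = int(path.read_text().strip()) / 1000.0
        except (OSError, ValueError):
            # Some sensors refuse reads or return non-numeric data; skip them.
            continue
    return temps


if __name__ == "__main__":
    # List every hwmon device with its name and whatever limits it exposes.
    for dev in sorted(Path("/sys/class/hwmon").glob("hwmon*")):
        name_file = dev / "name"
        name = name_file.read_text().strip() if name_file.is_file() else "?"
        print(dev.name, name, read_temps(dev))
```

Running it on the affected kernel and on a working one would show whether the thresholds themselves differ or only the fan behaviour does.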
Comment 1 Martin Peres 2013-11-12 11:30:03 UTC
Hi Bob,

Could you send me your kernel logs please? Extracting your vbios could also be of some help later on.

Is fan management really working properly for you? I know it is an old card and those were really running hot, but still!
Comment 2 Bob Gleitsmann 2013-12-01 17:25:18 UTC
Here is an additional data point. The problematic kernel sets 0xb20010f0 to 0x83ff03ff. The driver reports that the PWM is 100%. In fact it is 100% off. As can be expected, the temperature immediately begins rising rapidly. Interestingly, the card seems to self-protect by turning the fan on when the temperature gets above the "critical" temperature and then restoring the default PWM (fan off). I suppose that is what "auto" means. The value in 0xb20010f0 never changes while the fan speed goes up and down.
The blob sets the same mmio address to 0x81ff03ff and reports the fan to be 50%. I can't tell you if "100%" PWM is fan off on all NV cards. I'm sure you already know anyway. The fan speeds up significantly if the PWM setting is either 0x00007fff or 0x80007fff. 
I guess that's more than one data point. Other than that, I'm completely in the dark. However, I think I will patch my kernel so that it sets the PWM to 50% instead of 100%. 50% off is a lot better than 100% off.

Bob
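
The two register values quoted above can be split into fields to see why one reads as 100% and the other as 50%. This is a minimal sketch, assuming a layout inferred only from those values: bit 31 as an enable flag, bits 30:16 as the duty count and bits 14:0 as the divider. The function names are hypothetical and the layout is an assumption, not something confirmed in this report.

```python
ENABLE_BIT = 0x80000000  # assumed: bit 31 enables the PWM


def decode_pwm(reg):
    """Split a register value into (enabled, duty, divs) under the assumed layout."""
    enabled = bool(reg & ENABLE_BIT)
    duty = (reg >> 16) & 0x7FFF  # assumed: bits 30:16
    divs = reg & 0x7FFF          # assumed: bits 14:0
    return enabled, duty, divs


def raw_percent(reg):
    """Duty as a percentage of the divider, or None if the PWM is not enabled."""
    enabled, duty, divs = decode_pwm(reg)
    if not enabled or divs == 0:
        return None
    return round(duty * 100 / divs)


for value in (0x83FF03FF, 0x81FF03FF, 0x80007FFF, 0x00007FFF):
    print(hex(value), decode_pwm(value), raw_percent(value))
```

Under this reading, 0x83ff03ff has duty equal to the divider (100%) and 0x81ff03ff has duty at roughly half of it (50%), which matches the percentages reported by the driver and the blob. It says nothing by itself about whether a high duty means fan on or fan off.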
Comment 3 Bob Gleitsmann 2013-12-01 21:06:26 UTC
The fan PWM setting is derived from priv->cstate. The value it gets from there is zero. So the setting of PWM for NV40 is working correctly, "inverting" the percent setting to get the value set in 0xb20010f0. Probably priv->cstate is never initialized. I'll bet that wasn't intentional.

Bob
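
The inversion described above can be illustrated with a short calculation. This is a sketch under stated assumptions, not the driver's code: the register layout (enable in bit 31, duty in bits 30:16, divider in the low bits), the divider of 0x3ff and the round-up step are all assumptions, chosen because they reproduce the two register values reported in comment 2. The function name pwm_register is hypothetical.

```python
ENABLE_BIT = 0x80000000  # assumed: bit 31 enables the PWM


def pwm_register(percent, divs=0x3FF, inverted=True):
    """Build the register value for a requested fan percentage.

    With inverted=True, a request of 0% produces the maximum duty count,
    which on this card corresponds to the fan being off.
    """
    percent = max(0, min(100, percent))
    duty = (divs * percent + 99) // 100  # assumed: round up to a whole count
    if inverted:
        duty = divs - duty
    return ENABLE_BIT | (duty << 16) | divs


print(hex(pwm_register(0)))   # 0x83ff03ff, the value set by the problematic kernel
print(hex(pwm_register(50)))  # 0x81ff03ff, the value set by the blob
```

So a request of 0%, as would result from an uninitialized value, ends up as a full duty count in the register, while a request of 50% gives the value the blob programs.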
Comment 4 Bob Gleitsmann 2013-12-01 21:09:52 UTC
Oh, the value reported by the driver for the PWM setting is the value before it is set by the driver. It doesn't say that in the kernel message.
Comment 5 Ben Skeggs 2013-12-01 21:11:26 UTC
(In reply to comment #3)
> The fan PWM setting is derived from priv->cstate. The value it gets from
> there is zero. So the setting of PWM for NV40 is working correctly,
> "inverting" the percent setting to get the value set in 0xb20010f0. Probably
> priv->cstate is never initialized. I'll bet that wasn't intentional.
> 
Not intentional. But also found and fixed already in -rc1.

Ben. 

> Bob
Comment 6 Bob Gleitsmann 2013-12-02 04:59:27 UTC
In view of Ben's comment, I am stopping further research. However, I will offer as a final remark that perfE.pstate = 0x20 on my card. It gets into nv40_clock_ctor but the fan speed is never set. According to nvbios, for this card the default setting for the fan speed is 50. 

Best Wishes,

Bob
Comment 7 Evan Foss 2015-02-15 02:51:23 UTC
Created attachment 113500
dmesg text

I am running a mid-2012 MacBook Pro with an nVidia 650M chip. There is something funky with the GLX rendering. I tried the various things in the troubleshooting guide and got nowhere. If need be I can email the following two images to the list, but I thought I would save everyone's mailboxes some bloat.

https://www.flickr.com/photos/evanfoss/15912299653/
https://www.flickr.com/photos/evanfoss/16507165346/in/photostream/

For those in the far future who may find the above links dead: the images show glxgears rendering at more than 3K frames a second. The glxgears window, however, has blocks missing, with only a single pixel in each one being rendered.

I have tested the following kernel versions and found this to be true for all of the following. 

gentoo-sources-3.12.13
gentoo-sources-3.14.6
gentoo-sources-3.16.1
Comment 8 Ilia Mirkin 2015-10-28 00:38:03 UTC
Bob, please confirm that newer kernels resolve this issue.

Evan, your issue is wholly unrelated to the original one. And most likely also fixed in newer kernels (large page size mismatch).
Comment 9 Evan Foss 2015-10-29 23:59:17 UTC
On Wed, Oct 28, 2015 at 12:38 AM,  <bugzilla-daemon@freedesktop.org> wrote:
> Comment # 8 on bug 71455 from Ilia Mirkin
>
> Bob, please confirm that newer kernels resolve this issue.
>
> Evan, your issue is wholly unrelated to the original one. And most likely
> also fixed in newer kernels (large page size mismatch).

Yes and it was dealt with by other people. (my thanks again by the way)

I think I attached it to this bug by accident before I put it with the right one.


