Bug 87885 - [NV94] Fan speed of Nvidia GeForce 9600GT (G94) constantly too high
Summary: [NV94] Fan speed of Nvidia GeForce 9600GT (G94) constantly too high
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-12-30 23:59 UTC by TM
Modified: 2015-05-03 01:55 UTC
0 users

See Also:
i915 platform:
i915 features:


Attachments
vbios (63.50 KB, text/plain)
2014-12-30 23:59 UTC, TM
dmesg output (58.77 KB, text/plain)
2014-12-31 00:00 UTC, TM
"sensors" output (527 bytes, text/plain)
2014-12-31 00:01 UTC, TM

Description TM 2014-12-30 23:59:00 UTC
Created attachment 111546
vbios

The above-mentioned graphics card is very noisy. The fan speed never changes (it hasn't in years ;) ), and judging from the noise level it must be running at maximum.
People on IRC (#nouveau) asked me to file a bug report here with all the necessary information. If I missed anything you need, feel free to ask!
Comment 1 TM 2014-12-31 00:00:44 UTC
Created attachment 111547
dmesg output
Comment 2 TM 2014-12-31 00:01:07 UTC
Created attachment 111548
"sensors" output
Comment 3 Martin Peres 2014-12-31 17:10:39 UTC
Thanks TM for reporting this bug!

Could you test the following cases, please?

- inverted PWM: The fan may be inverted. 35% would then translate to 65%, and the fan would speed up as the temperature falls. Could you try manual fan management and set the PWM to 50%? Please tell us what happens to the fan.

- If the previous test does not change the fan speed at all, then either we are driving the wrong PWM controller ... or our settings are being ignored. PTHERM may override the fan speed and set it to 100%. Could you please dump PTHERM's register space? "nvapeek 20000 1000" will do :) Could you also add "nvapeek e114 20", please?
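The inversion hypothesis in the first point amounts to a simple mapping. A minimal Python sketch, assuming a purely linear inversion (the function name `effective_duty` is hypothetical and not part of nouveau):

```python
def effective_duty(requested_percent: int, inverted: bool) -> int:
    """Duty cycle (in percent) the fan actually sees for a requested value.

    Assumption: on an inverted PWM line, a request of N% drives the fan at
    (100 - N)%, so lowering the request (falling temperature) speeds it up.
    """
    if not 0 <= requested_percent <= 100:
        raise ValueError("duty cycle must be between 0 and 100")
    return 100 - requested_percent if inverted else requested_percent


print(effective_duty(35, inverted=True))  # 65, Martin's example
print(effective_duty(20, inverted=True))  # 80: a cooler GPU, a faster fan
```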

Looking forward to hearing back from you!

PS: nvapeek can be found in envytools: https://github.com/envytools/envytools/
Comment 4 TM 2014-12-31 17:52:29 UTC
Dear Martin, thanks for your reply.

I already tried setting the PWM value manually yesterday, but it didn't work; I simply forgot to mention it here, sorry.

Doing the following has no effect on the fan speed (sound):
# cat pwm1{,_enable}
35
2
# echo 1 > pwm1_enable 
# echo 50 > pwm1
# cat pwm1{,_enable}   # just to prove that the changes have been applied and not overwritten by some crazy application
50
1
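The shell session above can be sketched in Python as follows. This is an illustration only: `set_manual_pwm` is a hypothetical helper, and it assumes (as the values in this report suggest) that nouveau exposes `pwm1` as a percentage rather than the generic hwmon 0-255 range. In the hwmon convention, `pwm1_enable` = 1 means manual control and 2 means automatic.

```python
from pathlib import Path


def set_manual_pwm(hwmon_dir: str, percent: int) -> tuple[str, str]:
    """Switch pwm1 to manual mode, request a duty cycle, and read both back."""
    d = Path(hwmon_dir)
    (d / "pwm1_enable").write_text("1\n")    # 1 = manual fan control
    (d / "pwm1").write_text(f"{percent}\n")  # requested duty (percent, assumed)
    # Reading back only proves the driver stored the value, not that the fan obeyed.
    return (d / "pwm1").read_text().strip(), (d / "pwm1_enable").read_text().strip()
```

Run as root, e.g. `set_manual_pwm("/sys/class/drm/card0/device/hwmon/hwmon0", 50)`. As in TM's test, a successful readback does not mean the fan speed actually changed.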

About your other suggestions: here's the output of nvapeek:
# ./nvapeek 20000 1000
00020000: 0a2003ff 00000000 c00836f7 00009c40
00020010: 00000000 fe4c038d 9e9c9a9a fe0202ee
00020020: 02080e07 07000a09 000a0802 00050601
00020030: 02020801 00091006 15020805 1a056000
00020040: 07000201 00ff019a 000000ff 44072a0b
00020050: 00000a07 ffffffff ffffffff 00000000
00020060: 00040000 40400000 ffff8480 ffffffff
00020070: ffffffff 00000000 00000000 00000000
00020080: 100c0736 00000000 00000000 00000000
...
000200a0: 01000000 00000003 c10910de 00000004
...
00020100: 00000000 1a114c99 082a0000 00000000
00020110: 00000000 00000000 00000000 00876530
00020120: 00000241 00000200 00000000 00000000
...
00020400: 00000032 00000000 00000000 00000001
00020410: 00000000 0000005c 00000000 00000000
...
00020440: 00000000 00000080 00000000 00000000
...
00020480: 00000087 00000002 00000000 00000000
...
000204c0: 00000064 00000057 00000080 0000005d
...
000207f0: 00000000 00000000 000010de 00000001
...

And:
# ./nvapeek e114 20
0000e114: 00000001 00000000 0000021c 0000010e
0000e124: 10000000 00000000 0001010e 00000000
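To compare dumps like these without reading hex by eye, the output can be turned into an address-to-value map. A minimal sketch; the regex only covers the line format visible in this report (an 8-digit address, a colon, then 32-bit words, 16 bytes per line), and `parse_nvapeek` is a hypothetical name, not an envytools tool:

```python
import re

# One dump line: 8 hex digits of address, a colon, then one or more 32-bit words.
LINE_RE = re.compile(r"^([0-9a-fA-F]{8}):((?:\s+[0-9a-fA-F]{8})+)\s*$")


def parse_nvapeek(text: str) -> dict[int, int]:
    """Map each register address to its 32-bit value.

    Prompts and "..." elisions do not match and are skipped, so elided
    ranges are simply absent from the result.
    """
    regs: dict[int, int] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        base = int(m.group(1), 16)
        for i, word in enumerate(m.group(2).split()):
            regs[base + 4 * i] = int(word, 16)  # consecutive words are 4 bytes apart
    return regs


dump = """\
0000e114: 00000001 00000000 0000021c 0000010e
0000e124: 10000000 00000000 0001010e 00000000
"""
regs = parse_nvapeek(dump)
print(hex(regs[0xe11c]), hex(regs[0xe120]))  # 0x21c 0x10e
```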

Happy new year :)
Comment 5 Martin Peres 2014-12-31 18:04:44 UTC
Thanks TM :) Now is not the time to have a look at your dump :p I'll have a look at your ptherm dump tomorrow!
Comment 6 Martin Peres 2015-01-01 09:23:45 UTC
So, I had a look at the dumps. The temperature thresholds are OK and the reported temperature seems sensible.

I now wonder whether we somehow select the wrong PWM controller. Could you try running the following writes?

nvapoke e114 21c
nvapoke e118 8000010e
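Read as a divisor/duty pair, these pokes ask for 0x10e / 0x21c = 270 / 540, i.e. a 50% duty cycle, with bit 31 set on the duty write. A minimal sketch of that arithmetic; masking off the top byte is an assumption (it appears to carry flags such as bit 31 rather than the duty count), and `pwm_percent` is a hypothetical name:

```python
DUTY_MASK = 0x00FFFFFF  # assumption: the top byte holds flag bits, not the duty count


def pwm_percent(divisor: int, duty_reg: int, inverted: bool = False) -> float:
    """Duty cycle in percent for a PWM divisor/duty register pair."""
    if divisor == 0:
        raise ValueError("divisor is 0: PWM not configured")
    duty = min(duty_reg & DUTY_MASK, divisor)  # clamp: cannot exceed 100%
    if inverted:
        duty = divisor - duty
    return 100.0 * duty / divisor


print(pwm_percent(0x21c, 0x8000010e))  # 50.0
```

The same arithmetic applied to a divisor of 0x21c and a duty of 0xbd gives 35.0, which is worth keeping in mind when reading the later dumps.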

If this does not work, have you ever tried the proprietary driver? If the fan behaves with it, could you run "./nvapeek e114 20" while it is loaded?

Your vbios reports an adt7473, which is an external fan-management IC. Some vbioses report the device even though the board doesn't actually have it. In your case, it may be present but undetected. We have had a lot of problems with this chip. If nothing else works, we'll try to get this freaking adt7473 working!

Happy new year :)
Comment 7 TM 2015-01-01 17:03:21 UTC
Unfortunately, the two nvapoke commands had no effect on the fan speed.
I'll try to install the proprietary driver soon, but probably not today. I'll let you know.
Comment 8 TM 2015-01-03 02:53:29 UTC
Hmm; I have now installed the proprietary driver (strangely enough, I had to use the G02 driver, not the G03 one that web pages like https://en.opensuse.org/SDB:NVIDIA_drivers led me to expect). It's running fine, and the nvidia-settings utility reports the fan speed as 35%.

Running
  nvapoke e114 21c
  nvapoke e118 8000010e
doesn't have any effect.

Nevertheless, here's the output of
./nvapeek e114 20
_after_ running the above commands:
# ./nvapeek e114 20
0000e114: 0000021c 0000010e 0000021c 030000bd
0000e124: 10000000 00000000 0001010e 00000000

Does this help in any way?
Comment 9 Ilia Mirkin 2015-01-03 02:58:50 UTC
(In reply to TM from comment #8)

Martin's comments weren't entirely clear... what he meant was

(a) See if those commands fix things on top of nouveau. And if not, then:

(b) With the blob loaded, run the nvapeek. Running the pokes first overwrites the values, which defeats the purpose of the peek.

Actually, to preempt further questions, just run

nvapeek 20000 1000
nvapeek e114 20

when running blob drivers. (And no pokes.)
Comment 10 TM 2015-01-03 03:17:02 UTC
Thanks for the clarification. I was already suspecting I was doing something wrong :)

So, after a reboot, here's what nvapeek looks like:
# ./nvapeek 20000 1000
00020000: ca000114 00000018 c00836f7 00009c40
00020010: 00000000 fe4c038d 9e9c9a9a fe0202ee
00020020: 02080e07 07000a09 000a0802 00050601
00020030: 02020801 00091006 15020805 1a056000
00020040: 07000201 00ff0239 000000ff 44072a0b
00020050: 00000a07 ffffffff ffffffff 00000000
00020060: 00040001 40400000 ffff8480 ffffffff
00020070: ffffffff 40000104 01100000 00000000
00020080: 100c0736 00000000 00000000 00000000
...
000200a0: 01000000 00000003 c10910de 00000004
...
00020100: 00000000 1a114c99 082a0000 00000000
00020110: 00000000 00000000 00000000 00876530
00020120: 00000241 00000200 00000000 00000000
...
00020400: 00000032 00000000 00000000 00000001
00020410: 00000000 00000069 00000000 00000000
...
00020440: 00000000 00000080 00000000 00000000
...
00020480: 00000087 0000000a 00000000 00000000
...
000204c0: 00000068 00000000 00000080 0000005d
...
000207f0: 00000000 00000000 000010de 00000001
...

# ./nvapeek e114 20
0000e114: 00000001 00000000 0000021c 030000bd
0000e124: 10000000 00000000 0001010e 00000000
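Comparing this blob dump with the nouveau dump from comment 4 is a matter of listing the registers whose values differ. A minimal sketch over a few words transcribed by hand from the two comments (`diff_regs` is a hypothetical helper; no register meanings are implied). Note that the nouveau dump was taken after pwm1 had been set to 50 manually, which fits its e120 value of 0x10e (270/540).

```python
def diff_regs(a: dict[int, int], b: dict[int, int]) -> list[tuple[int, int, int]]:
    """(address, value_in_a, value_in_b) for registers in both dumps that differ."""
    return [(addr, a[addr], b[addr])
            for addr in sorted(a.keys() & b.keys())
            if a[addr] != b[addr]]


# A handful of words transcribed from comment 4 (nouveau) and comment 10 (blob).
nouveau = {0x20000: 0x0a2003ff, 0x20004: 0x00000000, 0x20414: 0x0000005c,
           0xe114: 0x00000001, 0xe120: 0x0000010e, 0xe124: 0x10000000}
blob = {0x20000: 0xca000114, 0x20004: 0x00000018, 0x20414: 0x00000069,
        0xe114: 0x00000001, 0xe120: 0x030000bd, 0xe124: 0x10000000}

for addr, old, new in diff_regs(nouveau, blob):
    print(f"{addr:08x}: nouveau={old:08x} blob={new:08x}")
```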
Comment 11 Martin Peres 2015-01-03 07:32:25 UTC
(In reply to TM from comment #10)

Thank you TM and Ilia!

OK, so on your card the bits controlling the function that drives the fan to maximum speed when the temperature exceeds a certain threshold are set.

Let's try setting them on Nouveau and see if we can change the fan speed afterwards:

nvapoke e120 830000bd

If the theory is right, the fan speed should drop considerably and Nouveau's automatic fan management should then work fine. If it does not, I would like you to try the following command while the blob is running (execute it several times in a row):

nvapoke e120 8300021c

If the fan speed goes to 100% for a second, it means we screwed up some initial configuration :)
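Both poke values can be derived from the e120 value read with the blob loaded (0x030000bd in comment 10): the first sets bit 31 on top of it, the second additionally replaces the duty 0xbd with the divisor 0x21c (duty equal to divisor, i.e. 100%). A minimal sketch of that composition; `poke_value` is a hypothetical helper, and which of the top-byte bits implement the threshold function is not spelled out in this report.

```python
BIT31 = 0x80000000  # set in every duty write suggested in this thread


def poke_value(value: int) -> str:
    """Set bit 31 and format as 8 hex digits without 0x, as passed to nvapoke."""
    return f"{(value | BIT31) & 0xFFFFFFFF:08x}"


blob_e120 = 0x030000bd  # e120 as read with the blob loaded
print("nvapoke e120", poke_value(blob_e120))  # prints: nvapoke e120 830000bd
# Keep the upper 16 bits, replace the duty with the divisor 0x21c (100% duty).
print("nvapoke e120", poke_value((blob_e120 & ~0xFFFF) | 0x21c))  # prints: nvapoke e120 8300021c
```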
Comment 12 TM 2015-01-03 18:13:33 UTC
Hi again,

I tried both things you suggested, but neither had any effect on the fan speed. Do you have any other ideas? I'd be happy to keep testing your suggestions!
Comment 13 TM 2015-01-18 21:41:17 UTC
Quick update, not sure if valuable or not:
I changed some Xorg config, which reminded me of this thread, so I tried the last advice again, this time with nouveau. Here are the steps:

# ./nvapoke e120 830000bd
# cd /sys/class/drm/card0/device/hwmon/hwmon0
# echo 1 > pwm1_enable
# echo 65 > pwm1
# cat pwm1_min
65

# cd /path/to/envytools/nva
# ./nvapoke e120 830000bd
# cd /sys/class/drm/card0/device/hwmon/hwmon0
# cat pwm1_min 
35
# cat pwm1
14

I found it odd that pwm1 reads below pwm1_min after issuing the nvapoke command.
Apart from that, the fan speed again didn't change :/
Hope it helps, if not: sorry for the noise.
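The inconsistency TM noticed (pwm1 reading 14 while pwm1_min reads 35) is easy to check for after each experiment. A minimal sketch; `pwm_below_min` and `read_int` are hypothetical helpers that only compare the two sysfs values and say nothing about the cause.

```python
from pathlib import Path


def read_int(path: Path) -> int:
    """Read a single integer from a sysfs-style attribute file."""
    return int(path.read_text().strip())


def pwm_below_min(hwmon_dir: str) -> bool:
    """True if pwm1 reads lower than pwm1_min (e.g. 14 < 35 as observed above)."""
    d = Path(hwmon_dir)
    return read_int(d / "pwm1") < read_int(d / "pwm1_min")
```

For example, `pwm_below_min("/sys/class/drm/card0/device/hwmon/hwmon0")` would have returned True right after the second nvapoke in the session above.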
Comment 14 JonS 2015-01-22 09:30:43 UTC
Hi all,

Came across this thread while googling for a similar problem on a CentOS 7 install.

I have a solution.

echo 10000 > /sys/class/drm/controlD64/device/hwmon/hwmon0/temp1_auto_point1_temp

echo 50000 > /sys/class/drm/controlD64/device/hwmon/hwmon0/temp1_auto_point1_temp

Note that 50000 is a custom value; you could also use the 90000 it was originally set to (hwmon temperatures are in millidegrees Celsius, so 50000 means 50 °C).

I put both commands in rc.local.

Anyway, it does the trick for me.

I'm not sure why this is necessary, but I expect the nouveau developers upstream can identify the cause and code a proper fix. In the meantime, this solves the issue for me.
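JonS's rc.local workaround writes the same attribute twice: a low value, then the desired threshold. A minimal Python sketch of that sequence; `retrigger_auto_point` is a hypothetical name, the `controlD64` path is specific to JonS's system (TM's card appears under `card0`), and why the double write helps is not established in this report.

```python
from pathlib import Path


def retrigger_auto_point(hwmon_dir: str, final_mc: int, bump_mc: int = 10000) -> int:
    """Write temp1_auto_point1_temp twice, as in the workaround above.

    Values are in millidegrees Celsius (hwmon convention): 10000 = 10 °C.
    Returns the value read back after the second write.
    """
    f = Path(hwmon_dir) / "temp1_auto_point1_temp"
    f.write_text(f"{bump_mc}\n")   # first write: temporarily low threshold
    f.write_text(f"{final_mc}\n")  # second write: the threshold actually wanted
    return int(f.read_text().strip())
```

Run as root, e.g. `retrigger_auto_point("/sys/class/drm/controlD64/device/hwmon/hwmon0", 50000)`.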
Comment 15 TM 2015-02-02 01:11:55 UTC
Hi all,

JonS's trick doesn't work for me either; perhaps my card is just broken in some way? I guess that would be hard to confirm. Dear developers, do you have any more ideas on how we could proceed?
Comment 16 Martin Peres 2015-02-16 20:28:10 UTC
(In reply to JonS from comment #14)

Hey,

It seems your vbios is broken. Please open a separate bug report, as this is not the same bug. I will have a look at it!
Comment 17 Martin Peres 2015-02-16 21:40:08 UTC
(In reply to TM from comment #13)

I'm very sorry for not giving you any feedback on this. I don't yet understand what is going on, but this is very helpful! I'll spend my Wednesday evening on it!
Comment 18 TM 2015-03-03 00:36:09 UTC
Quick update: I see this message as well (http://lists.freedesktop.org/archives/nouveau/2015-January/019778.html):

[    1.625309] nouveau W[   VBIOS][0000:05:00.0] M0203T not found
[    1.625311] nouveau W[   VBIOS][0000:05:00.0] M0203E not matched!

In case that helps...
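Warnings like these can be pulled out of a full dmesg log rather than searched for by hand. A minimal sketch; the regex matches only the log format shown above (the format of this era of nouveau, which may differ in other kernel versions), and `nouveau_warnings` is a hypothetical helper.

```python
import re

# Matches lines such as:
#   [    1.625309] nouveau W[   VBIOS][0000:05:00.0] M0203T not found
NOUVEAU_RE = re.compile(
    r"nouveau\s+(?P<level>[A-Z])\[\s*(?P<subdev>[^\]]+?)\]"
    r"\[(?P<pci>[0-9a-fA-F:.]+)\]\s+(?P<msg>.*)$"
)


def nouveau_warnings(dmesg_text: str, subdev: str = "VBIOS") -> list[str]:
    """Messages of warning-level (W) nouveau lines for the given subdevice."""
    out = []
    for line in dmesg_text.splitlines():
        m = NOUVEAU_RE.search(line)
        if m and m.group("level") == "W" and m.group("subdev") == subdev:
            out.append(m.group("msg"))
    return out
```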
Comment 19 TM 2015-04-06 00:55:39 UTC
Ping :)

(no need to rush; just making sure this doesn't get forgotten)
Comment 20 Martin Peres 2015-04-08 06:49:23 UTC
(In reply to TM from comment #19)
> Ping :)
> 
> (no need to rush; just making sure this doesn't get forgotten)

Another bug report provided me with the necessary information. I also got access to one of the GPUs with this weird behavior. I am pretty sure you have one of those weird cards where there is a divisor that I cannot find.

To confirm this, you can switch to manual fan management and then run:
nvapoke e118 80000005

This should silence the fan. Try adjusting the last value (5) to find out how high you need to go to reach maximum speed. Please also make sure that the fan keeps running and that the temperature is not skyrocketing.
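When adjusting the last value, it can help to prepare the candidate commands in advance and try them one at a time while watching the temperature. A minimal sketch that only prints command lines (it does not touch the hardware); `duty_sweep` and the chosen step values are hypothetical, and bit 31 is set as in Martin's example.

```python
def duty_sweep(values, register: str = "e118") -> list[str]:
    """nvapoke command lines for each duty value, with bit 31 set."""
    return [f"nvapoke {register} {0x80000000 | v:08x}" for v in values]


# Arbitrary example steps; run each printed command manually, one at a time.
for cmd in duty_sweep(range(5, 31, 5)):
    print(cmd)
```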

Fixing this problem is still my number one priority for nouveau, but I don't get to work on it very often. Fixing bugs isn't always fun...
Comment 21 TM 2015-05-03 01:55:34 UTC
Hi, sorry for the late reply - Busy times...

Unfortunately the new command didn't work either :/

I also reinstalled the proprietary driver for testing purposes. The "nvidia-settings" GUI still reported a fan speed of 35% (as always), but this time (I guess I missed it before) I saw a line saying that adjusting the fan speed is not supported. Were you already aware of this? I don't think I was...

Scrolling through this bug report again, I saw that you suspected an adt7473 chip on the graphics card board. I had a look and couldn't find one. Here are the first lines of the markings on every chip I could find on the board (apart from the memory chips):
- AT24C16B7
- AZ358M
- RT9214
- RT9259
- NXP 74HC08D

I hope that helps in some way :/

