Bug 98852 - Nvidia graphics card fan not running or too slow, danger of overheating
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2016-11-25 09:10 UTC by Egon Niessner
Modified: 2016-12-16 19:18 UTC

Attachments
Output of the dmesg command (53.56 KB, text/plain), 2016-11-25 09:12 UTC, Egon Niessner
installed nouveau drivers (568 bytes, text/plain), 2016-11-25 09:14 UTC, Egon Niessner
Output of the lspci command (2.06 KB, text/plain), 2016-11-25 09:15 UTC, Egon Niessner
content of the yast-hwscreen tool (339.62 KB, text/plain), 2016-11-25 09:18 UTC, Egon Niessner
Content of vbios.rom (60.00 KB, application/octet-stream), 2016-11-25 11:56 UTC, Egon Niessner
Session in a Terminal Window with the attempt to start the graphic card fan (14.76 KB, text/plain), 2016-11-26 17:21 UTC, Egon Niessner
Content of the vbios (1.00 MB, application/octet-stream), 2016-12-08 11:58 UTC, Egon Niessner

Description Egon Niessner 2016-11-25 09:10:02 UTC
I upgraded from openSUSE Leap 42.1 to 42.2.

I have an Nvidia GeForce GT 240 graphics card in my system.
When I start openSUSE Leap 42.2, the fan of the graphics card
stops. It starts running again, very slowly, only after the cooler
of the card has become very hot.

During the installation of openSUSE 42.2 the nouveau driver is used.

(When Windows 10 is booted on this dual-boot system,
or when the system is rebooted into the BIOS,
the fan spins up immediately.)
Comment 1 Egon Niessner 2016-11-25 09:12:53 UTC
Created attachment 128183
Output of the dmesg command
Comment 2 Egon Niessner 2016-11-25 09:14:17 UTC
Created attachment 128184
installed nouveau drivers
Comment 3 Egon Niessner 2016-11-25 09:15:02 UTC
Created attachment 128185
Output of the lspci command
Comment 4 Egon Niessner 2016-11-25 09:18:35 UTC
Created attachment 128186
content of the yast-hwscreen tool
Comment 5 Egon Niessner 2016-11-25 09:25:27 UTC
How can I set the graphics card fan to full speed, or to
a certain value greater than 0 rpm?
Comment 6 Karol Herbst 2016-11-25 09:45:03 UTC
Can you attach your vbios.rom file from /sys/kernel/debug/dri/0/vbios.rom?
Comment 7 Egon Niessner 2016-11-25 11:56:25 UTC
Created attachment 128188
Content of vbios.rom
Comment 8 Martin Peres 2016-11-25 18:48:52 UTC
The documentation on how to set the fan speed is located here: https://www.kernel.org/doc/Documentation/thermal/nouveau_thermal

However, it is quite likely that we do not configure the PWM controller properly. Looking at your vbios, I see that the expected frequency is 42 kHz, which is quite high and may be the reason why your fan does not rotate properly (unless it is set to 100%).

It would be really useful if you could try to force the fan to 100% by following the guide linked above.
Comment 9 Martin Peres 2016-11-25 18:54:06 UTC
(In reply to Martin Peres from comment #8)

I guess you should try other fan speeds too; report back when you have tried.
The expected result is for the fan to barely spin at 10% and gradually spin up until it reaches 100%. At 100%, the noise should be quite loud.

If this is the behaviour you see, then I would like you to set the fan mode to automatic (mode 2) and monitor the fan speed set by Nouveau by reading pwm1.

If the fan speed does not change with temperature, it would be useful to paste the kernel logs here with nouveau.debug="ptherm=debug" set on your kernel command line (you can edit this directly from GRUB when booting).
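
The test sequence requested in this comment can be sketched as a small script. This is only an illustration and not part of the original exchange. It assumes the pwm1_enable and pwm1 attributes described in the nouveau_thermal documentation (mode 1 = manual, mode 2 = automatic, pwm1 as a percentage). The function names, the step values and the hwmon directory passed in are hypothetical, and writing to these files on real hardware requires root.

```python
import time
from pathlib import Path

MODE_MANUAL = 1     # pwm1_enable value: fan driven by writes to pwm1
MODE_AUTOMATIC = 2  # pwm1_enable value: fan driven by temperature


def set_fan_mode(hwmon_dir, mode):
    """Write the fan management mode to pwm1_enable."""
    (Path(hwmon_dir) / "pwm1_enable").write_text(str(mode))


def sweep_fan(hwmon_dir, percents=(10, 25, 50, 75, 100), settle_s=5.0):
    """Step pwm1 through the given percentages in manual mode.

    Returns (requested, read_back) pairs that can be pasted into the report.
    """
    # Validate everything first so the fan mode is not changed on bad input.
    for percent in percents:
        if not 0 <= percent <= 100:
            raise ValueError(f"pwm1 expects a percentage, got {percent}")
    set_fan_mode(hwmon_dir, MODE_MANUAL)
    pwm1 = Path(hwmon_dir) / "pwm1"
    results = []
    for percent in percents:
        pwm1.write_text(str(percent))
        time.sleep(settle_s)  # let the fan react before reading back
        results.append((percent, int(pwm1.read_text().strip())))
    return results
```

A possible use would be `sweep_fan("/sys/class/hwmon/hwmon0")` followed by `set_fan_mode("/sys/class/hwmon/hwmon0", MODE_AUTOMATIC)`. The directory name hwmon0 is a placeholder: the index of the nouveau hwmon device differs between systems.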
Comment 10 Egon Niessner 2016-11-26 17:18:29 UTC
The attachment lists the commands I ran as root.

The fan did not start, however.

Did I use the wrong values?

I am a little confused by the documentation, which says that
some parameters are in millidegrees, but I cannot tell which parameters these are.

So I tried values from 1 to 1000 blindly, hoping to hit
a value at which the fan would start.

The log of my session is in the attachment hwmon-handling.
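
On the units question: according to the nouveau_thermal documentation linked in comment 8, the temperature attributes (temp1_input and the temp1_* thresholds) are stored in millidegrees Celsius, while pwm1 is a power percentage, so values above 100 are not meaningful for pwm1. The sketch below reads the attributes back in human units. It is an illustration only: the function names and the directory argument are hypothetical, and attributes that are absent on a given card are reported as None.

```python
from pathlib import Path


def millidegrees_to_celsius(raw):
    """Convert a millidegree Celsius reading (e.g. '54000') to degrees."""
    return int(raw) / 1000.0


def read_status(hwmon_dir):
    """Read temperature, PWM percentage, fan RPM and fan mode, if present."""
    hwmon = Path(hwmon_dir)

    def read_int(name):
        path = hwmon / name
        return int(path.read_text().strip()) if path.exists() else None

    temp_raw = read_int("temp1_input")
    return {
        "temp_c": None if temp_raw is None else millidegrees_to_celsius(temp_raw),
        "pwm_percent": read_int("pwm1"),    # 0 to 100
        "fan_rpm": read_int("fan1_input"),  # not available on every card
        "mode": read_int("pwm1_enable"),    # 0 untouched, 1 manual, 2 automatic
    }
```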
Comment 11 Egon Niessner 2016-11-26 17:21:19 UTC
Created attachment 128205
Session in a Terminal Window with the attempt to start the graphic card fan
Comment 12 Egon Niessner 2016-12-05 10:23:59 UTC
I did some further tests, setting boot-time parameters
in the YaST kernel parameter line.

With either

nouveau.runpm=0  or
nouveau.runpm=1
the fan on the graphics card still does not start.

Besides this, I also use the SystemRescueCd distribution for PC tests;
its DVD images can be found at
http://www.system-rescue-cd.org/Download

I observed that up to version 4.6.1 the fan on the graphics card was running.
Starting with version 4.7.1 the fan stops after the rescue system has loaded.
Comment 13 Martin Peres 2016-12-07 04:43:59 UTC
(In reply to Egon Niessner from comment #12)

This is really weird, as there are no changes related to the fan between 4.6 and 4.7 :s

I guess, at this point, we need to check what nouveau is doing versus what the blob is doing.

Please download and compile envytools[0], then run the following command once while running Nouveau and once while using the proprietary driver:
 - nvapeek e114 10

Please report back.

[0] https://github.com/envytools/envytools/commits/master
Comment 14 Egon Niessner 2016-12-07 11:16:59 UTC
I downloaded the envytools software from your link and tried to build it.
I got the following error messages:

linux-234d:/home/nie1/envytools/envytools-master # cmake . -G Ninja
CMake Error: CMake was unable to find a build program corresponding to "Ninja".  CMAKE_MAKE_PROGRAM is not set.  You probably need to select a different build tool.
CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!

linux-234d:/home/nie1/envytools/envytools-master # cmake .
CMake Error: CMake was unable to find a build program corresponding to "Ninja".  CMAKE_MAKE_PROGRAM is not set.  You probably need to select a different build tool.
CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!

The whole Linux kernel development environment is installed on the system,
as are all packages mentioned in the envytools description.
What do I have to do to make the compilation work?
Comment 15 Karol Herbst 2016-12-07 12:14:37 UTC
(In reply to Egon Niessner from comment #14)

drop the "-G Ninja"
Comment 16 Pierre Moreau 2016-12-07 12:19:54 UTC
There is a ninja package on Arch Linux; I would guess that something similar exists on openSUSE.

Try `cmake -G"Unix Makefiles" .` instead, or clear the CMake cache (deleting CMakeCache.txt and CMakeFiles should be enough, I think) and run `cmake .` again.
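
The second failed run in comment 14 is consistent with CMake remembering the generator from the first attempt in CMakeCache.txt, which is why clearing the cache is suggested here. A minimal sketch of both steps follows. The helper names are hypothetical, and the check for a `ninja` executable is an assumption: some distributions install the binary under a different name.

```python
import shutil
from pathlib import Path


def reset_cmake_cache(build_dir):
    """Delete CMakeCache.txt and CMakeFiles/ so a new generator can be chosen.

    Returns the names of the entries that were actually removed.
    """
    build = Path(build_dir)
    removed = []
    cache = build / "CMakeCache.txt"
    if cache.is_file():
        cache.unlink()
        removed.append(cache.name)
    files_dir = build / "CMakeFiles"
    if files_dir.is_dir():
        shutil.rmtree(files_dir)
        removed.append(files_dir.name)
    return removed


def configure_command(have_ninja=None):
    """Build the cmake command line, falling back to Unix Makefiles."""
    if have_ninja is None:
        have_ninja = shutil.which("ninja") is not None
    generator = "Ninja" if have_ninja else "Unix Makefiles"
    return ["cmake", "-G", generator, "."]
```

The returned list could be passed to `subprocess.run(..., cwd=build_dir, check=True)` after the cache has been reset.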
Comment 17 Egon Niessner 2016-12-07 14:30:58 UTC
Thanks for your help!
With `cmake -G"Unix Makefiles" .`
I was able to compile the envytools package.

Here is the output with the nouveau driver and the fan not running:
nvapeek e114 10
0000e114: 00000001 00000000 00000000 80000000


Here is the output with the proprietary NVIDIA driver
(installed from the original NVIDIA package
NVIDIA-Linux-x86_64-340.96.run):

nvapeek e114 120
0000e114: 00000001 00000000 00000020 00000003
0000e124: 10000000 00000000 0001010e 00000000
0000e134: 000f4240 00000007 10000000 00000000
0000e144: 0001010e 00000000 000f4240 00000007
0000e154: 10000000 00000000 0001010e 00000000
0000e164: 000f4240 00000007 10000000 00000000
0000e174: 0001010e 00000000 000f4240 00000007
0000e184: 00000012 00000000 00010000 00000000
0000e194: 00000000 00000000 00000000 0000000d
0000e1a4: 00000001 0000000c 0022ffff 00000000
0000e1b4: 00000001 fe7fffff 00003047 00000002
...
0000e1d4: 00000000 0000001f 00000001 00000000
0000e1e4: 00000000 00000001 00000003 00000003
0000e1f4: 0000000c 00000002 00000002 0003103c
0000e204: 00000002 00000000 00000000 00000000
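
The two dumps can be compared word by word with a short script. This sketch assumes only the output format visible in this comment (a hexadecimal start address, a colon, then 32-bit words at consecutive 4-byte offsets). The function names are hypothetical, and the script makes no claim about what the registers mean.

```python
def parse_nvapeek(text):
    """Map register address -> 32-bit value from nvapeek-style output."""
    regs = {}
    for line in text.splitlines():
        addr_part, sep, words_part = line.partition(":")
        if not sep:
            continue  # command lines, blank lines and "..." markers
        try:
            base = int(addr_part.strip(), 16)
        except ValueError:
            continue  # not a register row
        for i, word in enumerate(words_part.split()):
            regs[base + 4 * i] = int(word, 16)
    return regs


def diff_registers(first, second):
    """Return {address: (first_value, second_value)} where the dumps differ."""
    common = sorted(first.keys() & second.keys())
    return {a: (first[a], second[a]) for a in common if first[a] != second[a]}


# First row of each dump, copied from this comment.
NOUVEAU_DUMP = "0000e114: 00000001 00000000 00000000 80000000"
BLOB_DUMP = "0000e114: 00000001 00000000 00000020 00000003"

if __name__ == "__main__":
    changes = diff_registers(parse_nvapeek(NOUVEAU_DUMP), parse_nvapeek(BLOB_DUMP))
    for addr, (nouveau, blob) in changes.items():
        print(f"{addr:08x}: nouveau={nouveau:08x} blob={blob:08x}")
```

For the rows above, the words at 0xe11c and 0xe120 differ between the two drivers, while those at 0xe114 and 0xe118 are identical.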
Comment 18 Martin Peres 2016-12-08 10:28:00 UTC
(In reply to Egon Niessner from comment #17)
> Thanks for your help!
> With `cmake -G"Unix Makefiles" .
> I could translate the envytools Package.
> 
> Here the Output with the nouveau driver and not runnung fan:
> nvapeek e114 10
> 0000e114: 00000001 00000000 00000000 80000000
> 
> 
> Here the Output with the original nvidia driver
> (Installed is the original NVIDIA Package
> NVIDIA-Linux-x86_64-340.96.run)
> 
> nvapeek e114 120
> 0000e114: 00000001 00000000 00000020 00000003

Exactly what I did not want to see :s

Could you attach your vbios here? You can get it by running: nvagetbios -s prom > vbios.rom

Please also add the output of: nvapeek 101000

Thanks in advance, you uncovered a deeper bug!
Comment 19 Egon Niessner 2016-12-08 11:57:06 UTC
Here is the output of the nvapeek command:

nvapeek e114 10
0000e114: 00000001 00000000 00000000 80000000

I added the vbios.rom as an attachment.
Comment 20 Egon Niessner 2016-12-08 11:58:55 UTC
Created attachment 128379
Content of the vbios
Comment 21 Martin Peres 2016-12-08 12:04:46 UTC
(In reply to Egon Niessner from comment #19)
> Here the output of the nvapeek command:
> 
> nvapeek e114 10
> 0000e114: 00000001 00000000 00000000 80000000
> 
> I added the vbios.com as attachment.

no, you should run: nvapeek 101000 :)
Comment 22 Egon Niessner 2016-12-08 12:17:36 UTC
Sorry, the last output was not the one you asked for.
Here is the correct output:

nvapeek 101000
00101000: 80408c8e
Comment 23 Martin Peres 2016-12-09 10:11:00 UTC
Thanks for being super helpful and responsive! We compute the clock tree correctly, so the problem now is that we do not detect which PWM controller we need to write to.

So, let's have a look at nouveau's code now and see what could go wrong :)
Comment 24 Martin Peres 2016-12-09 10:26:12 UTC
(In reply to Martin Peres from comment #23)

Nothing obvious comes to mind, especially not anything related to the fan. If it is a regression between 4.6 and 4.7, is it too much to ask for a bisect? :s
Comment 25 Egon Niessner 2016-12-09 12:56:45 UTC
Hello,
what do I have to do to produce a bisect?

Do I have to install a special version of the kernel?

I can do the installation on another hard disk.
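
For context, a bisect is a binary search over the commits between a known-good and a known-bad version. With git it is normally driven by `git bisect start`, `git bisect good <tag>` and `git bisect bad <tag>`; each commit that git then checks out has to be built, booted and marked good or bad. The sketch below only illustrates the search itself on a made-up list of commits. It assumes, as comment 24 does, that the bug was introduced at a single point and stays present afterwards; the function name and the example numbers are hypothetical.

```python
def bisect_first_bad(commits, is_bad):
    """Find the first bad commit in a list ordered from oldest to newest.

    Assumes commits[0] is good, commits[-1] is bad, and every commit after
    the first bad one is bad too. Returns (first_bad_commit, tests_run).
    """
    good, bad = 0, len(commits) - 1
    tests_run = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests_run += 1  # on real hardware: build and boot this kernel
        if is_bad(commits[mid]):
            bad = mid   # fan fails here: the culprit is at or before mid
        else:
            good = mid  # fan works here: the culprit is after mid
    return commits[bad], tests_run


if __name__ == "__main__":
    history = list(range(1000))  # stand-in for a range of kernel commits
    culprit, tests = bisect_first_bad(history, lambda c: c >= 613)
    print(f"first bad commit: {culprit}, kernels tested: {tests}")
```

Because the range halves on every test, even a thousand commits need only about ten build-and-boot cycles.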
Comment 26 Egon Niessner 2016-12-16 19:18:46 UTC
I have obtained a second GeForce GT 240 card, which shows the same fan symptoms.
Would it help if I sent the PCIe card to you for tests on your own
test equipment?

If you run your own tests, you do not have to send the card back to me.
Regards
Egon

