Bug 80901 - [NVCF] PWM fan speed too high
Summary: [NVCF] PWM fan speed too high
Status: RESOLVED MOVED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Duplicates: 80900
Reported: 2014-07-04 07:33 UTC by Invalid Invalid
Modified: 2019-12-04 08:46 UTC

Attachments
Dmesg (88.40 KB, text/plain), 2014-07-04 07:34 UTC, Invalid Invalid
vbios.rom (60.00 KB, text/plain), 2014-07-04 07:34 UTC, Invalid Invalid
vbios.rom (60.00 KB, application/octet-stream), 2014-07-04 07:35 UTC, Invalid Invalid
xorg.log (38.45 KB, text/plain), 2014-07-04 07:36 UTC, Invalid Invalid
sensors.nouveau (584 bytes, text/plain), 2014-07-04 07:57 UTC, Invalid Invalid
sensors.nvidia_levels (583 bytes, text/plain), 2014-07-04 07:59 UTC, Invalid Invalid
nvidia-smi output (1.28 KB, text/plain), 2014-07-04 08:17 UTC, Invalid Invalid
lspci -vv output (2.77 KB, text/plain), 2014-07-04 08:20 UTC, Invalid Invalid
PATCH: add LINEAR_MIN and LINEAR_MAX to sysfs (5.07 KB, text/plain), 2014-07-08 12:01 UTC, Invalid Invalid
PATCH v2 [1/2]: add LINEAR_MIN and LINEAR_MAX to sysfs (5.61 KB, text/plain), 2014-07-10 16:48 UTC, Invalid Invalid
PATCH v2 [1/2]: add LINEAR_MIN and LINEAR_MAX to sysfs (5.74 KB, text/plain), 2014-07-10 16:54 UTC, Invalid Invalid
PATCH v2 [2/2]: add new attributes to documentation (1.19 KB, text/plain), 2014-07-10 16:55 UTC, Invalid Invalid
Hwmon email (1.32 KB, text/plain), 2014-07-14 16:49 UTC, Invalid Invalid
GeForce GT 630 vbios.rom (60.50 KB, application/octet-stream), 2014-12-24 12:27 UTC, K.-P. Schrage
vbios.rom from NVIDIA Corporation GF106GL (59.50 KB, application/octet-stream), 2014-12-25 00:48 UTC, Lars E Pettersson

Description Invalid Invalid 2014-07-04 07:33:44 UTC
Since updating to kernel 3.15, the PWM fan speed of my 550 Ti card is too high (above 35% in all situations) compared with the proprietary NVIDIA driver (which hovers around 30%).
The only way to make it silent again is to manually change the pwm1_max value under sysfs to 30 (the default is 100).
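
The workaround described above could be scripted roughly as follows. This is only a sketch: `set_pwm1_max` is a hypothetical helper name, the hwmon directory has to be passed in because its location differs between machines, and the 0 to 100 range check is an assumption based on the default of 100 mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: cap the fan duty cycle by writing pwm1_max.
# $1 = hwmon directory of the card (machine-specific), $2 = percentage.
set_pwm1_max() {
    dir=$1
    value=$2
    # Accept only whole numbers (assumed valid range: 0 to 100).
    case $value in
        ''|*[!0-9]*) echo "invalid value: $value" >&2; return 1 ;;
    esac
    if [ "$value" -gt 100 ]; then
        echo "value out of range: $value" >&2
        return 1
    fi
    if [ ! -w "$dir/pwm1_max" ]; then
        echo "cannot write $dir/pwm1_max" >&2
        return 1
    fi
    echo "$value" > "$dir/pwm1_max"
}

# Example (as root), with the placeholder replaced by the real hwmon directory:
#   set_pwm1_max /path/to/hwmon 30
```

The setting is presumably lost on reboot, so it would have to be reapplied at startup.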
Comment 1 Invalid Invalid 2014-07-04 07:34:17 UTC
Created attachment 102244 [details]
Dmesg
Comment 2 Invalid Invalid 2014-07-04 07:34:49 UTC
Created attachment 102245 [details]
vbios.rom
Comment 3 Invalid Invalid 2014-07-04 07:35:25 UTC
Created attachment 102246 [details]
vbios.rom
Comment 4 Invalid Invalid 2014-07-04 07:36:13 UTC
Created attachment 102247 [details]
xorg.log
Comment 5 Ilia Mirkin 2014-07-04 07:52:31 UTC
*** Bug 80900 has been marked as a duplicate of this bug. ***
Comment 6 Invalid Invalid 2014-07-04 07:57:02 UTC
Created attachment 102251 [details]
sensors.nouveau
Comment 7 Invalid Invalid 2014-07-04 07:59:54 UTC
Created attachment 102252 [details]
sensors.nvidia_levels
Comment 8 Invalid Invalid 2014-07-04 08:01:23 UTC
Sorry, I forgot to mention: temperature is around 40-50°C in both cases.
This is not a heavy-duty machine, so usually it is dead silent.
Comment 9 Invalid Invalid 2014-07-04 08:17:53 UTC
Created attachment 102255 [details]
nvidia-smi output
Comment 10 Invalid Invalid 2014-07-04 08:20:15 UTC
Created attachment 102256 [details]
lspci -vv output
Comment 11 Martin Peres 2014-07-04 08:48:09 UTC
Thanks for all this information. It seems like NVIDIA changed the default minimum temperature before increasing the fan speed (http://code.woboq.org/linux/linux/drivers/gpu/drm/nouveau/core/subdev/therm/fan.c.html#195).

Based on your information, I would guess it is set to 60°C instead of the 40°C I saw on some cards. It is very possible that they set a different value per chipset.

I'll check again with a recent version of the blob to see what the default temperature is at which the fan speed starts being increased, but I don't think the fix should be to bump the value (which I will set to the minimum value I found across the boards). The proper fix would be to let users change this value in sysfs.

I'll keep the bug open as a reminder but my time is very limited this summer. If you feel like writing the patch (in nouveau_hwmon.c), I would gladly review it!

Thanks for reporting :)
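
To illustrate why the threshold matters, the sketch below models a fan duty that stays at its minimum below a threshold temperature and then rises linearly up to an upper temperature. The function name, the argument order, the example numbers and the exact formula are assumptions made for illustration; this is not the code from fan.c.

```shell
#!/bin/sh
# Illustrative model only: fan duty (%) as a function of temperature (°C).
# Arguments: temp linear_min_temp linear_max_temp min_duty max_duty
linear_duty() {
    temp=$1; tmin=$2; tmax=$3; dmin=$4; dmax=$5
    if [ "$temp" -le "$tmin" ]; then
        echo "$dmin"
    elif [ "$temp" -ge "$tmax" ]; then
        echo "$dmax"
    else
        # Integer interpolation between (tmin, dmin) and (tmax, dmax).
        echo $(( dmin + (temp - tmin) * (dmax - dmin) / (tmax - tmin) ))
    fi
}

# Hypothetical numbers: a card sitting at 50°C with a 40°C threshold...
linear_duty 50 40 80 30 100
# ...and the same card with a 60°C threshold.
linear_duty 50 60 80 30 100
```

Under these made-up parameters, the lower threshold already has the fan ramping at 50°C, while the higher one leaves it at the minimum duty, which would match the kind of difference reported here.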
Comment 12 Invalid Invalid 2014-07-08 12:01:58 UTC
Created attachment 102433 [details]
PATCH: add LINEAR_MIN and LINEAR_MAX to sysfs

Tentative patch. 
NOTE: It compiles, but I've not tested it yet, since at the moment I don't have an NVIDIA machine around.
Comment 13 Martin Peres 2014-07-08 12:41:47 UTC
Comment on attachment 102433 [details]
PATCH: add LINEAR_MIN and LINEAR_MAX to sysfs

Thanks, this patch is perfectly sound!

However, it doesn't follow the sysfs interface of hwmon as defined here: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface

I don't particularly like this interface, but I've tried to stay as close to it as possible. One possibility would be to use the trip points to expose linear_min/max. However, we would be bending the definition of a trip point by a long shot: we are not supposed to scale linearly between trip points, or at least that's how NVIDIA defines them. That's a question worth asking the hwmon guys.

If we are to keep your patch more or less intact, you would have to move the end result to nouveau_sysfs.c. You would also need to change the names to temp1_fan_linear_min/max, to make it clearer what values are expected in there :) Finally, you would need to update the documentation (http://cgit.freedesktop.org/nouveau/linux-2.6/tree/Documentation/thermal/nouveau_thermal).

Thanks again for your interest in fixing this!
Comment 14 Invalid Invalid 2014-07-09 16:05:27 UTC
The problem I see with trip points is that they let you set a fixed PWM value when the sensors detect a certain temperature. The nouveau driver instead raises the fan speed "continuously" once a certain temperature is reached (would we need infinitely many trip points for that?).
Please correct any error in my understanding here :)

Maybe we can use the trip points as start/end points and then move autonomously, disregarding the point[0-*]_pwm values?


For now, if you agree, I will move the sysfs values as suggested (and change the documentation), hopefully before/during next weekend.
So this bug will still have an almost "clean" patch for anyone (me included) who wants it.
Then I will ask around on the lm-sensors mailing list (is it the correct one?). Hopefully someone can point me to a better solution.

Thanks for the review! :)
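
The distinction raised in this comment can be sketched as a step function: with trip points, each temperature band maps to one fixed PWM value. The three trip points and their duty values below are invented for illustration and `trip_duty` is a hypothetical name.

```shell
#!/bin/sh
# Illustrative only: one fixed PWM value per trip point (step function).
# The trip temperatures and duties are hypothetical example values.
trip_duty() {
    temp=$1
    duty=30                                    # below the first trip point
    if [ "$temp" -ge 50 ]; then duty=45; fi    # trip point 1
    if [ "$temp" -ge 65 ]; then duty=70; fi    # trip point 2
    if [ "$temp" -ge 80 ]; then duty=100; fi   # trip point 3
    echo "$duty"
}

trip_duty 50
trip_duty 64
```

Both calls give the same duty because they fall in the same band, whereas a linear ramp would change with every degree. A finite table of trip points can therefore only approximate the continuous behaviour.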
Comment 15 Martin Peres 2014-07-09 16:35:04 UTC
(In reply to comment #14)

Yes, you understood my review and have a perfect todo list! Good luck with it :)
Comment 16 Invalid Invalid 2014-07-10 16:48:55 UTC
Created attachment 102562 [details]
PATCH v2 [1/2]: add LINEAR_MIN and LINEAR_MAX to sysfs
Comment 17 Invalid Invalid 2014-07-10 16:54:57 UTC
Created attachment 102563 [details]
PATCH v2 [1/2]: add LINEAR_MIN and LINEAR_MAX to sysfs
Comment 18 Invalid Invalid 2014-07-10 16:55:29 UTC
Created attachment 102564 [details]
PATCH v2 [2/2]: add new attributes to documentation
Comment 19 Invalid Invalid 2014-07-10 16:56:09 UTC
Those new patches should (hopefully) be cleaner than the previous one.
Comment 20 Martin Peres 2014-07-12 22:23:56 UTC
(In reply to comment #19)
> Those new patches should (hopefully) be cleaner than the previous one.

They are and they seem to work as expected. I would now like to hear back from the hwmon guys. Could you contact them (please CC: nouveau@lists.freedesktop.org)?
Comment 21 Invalid Invalid 2014-07-14 16:49:20 UTC
Created attachment 102785 [details]
Hwmon email

Email sent. Text in attachment for reference.
Comment 22 Martin Peres 2014-08-22 12:01:49 UTC
(In reply to comment #21)
> Created attachment 102785 [details]
> Hwmon email
> 
> Email sent. Text in attachment for reference.

Any update on this?
Comment 23 Invalid Invalid 2014-08-25 11:36:23 UTC
Sorry, but I find myself short on time currently.
I will start working on this again as soon as I can.
Comment 24 Martin Peres 2014-08-25 12:54:57 UTC
(In reply to comment #23)
> Sorry, but I find myself short on time currently.
> I will start working on this again as soon as I can.

Ok, good luck! TTYL then ;)
Comment 25 K.-P. Schrage 2014-12-23 17:07:57 UTC
It seems that I have been hit by this very bug as well, for several months now.

Fedora 21, kernel-3.17.7-300.fc21.x86_64 (but it started on Fedora 20, around kernel-3.15.5-200.fc20.x86_64)

Graphics card:
01:00.0 VGA compatible controller: NVIDIA Corporation GF108 [GeForce GT 630] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 8a90
        Flags: bus master, fast devsel, latency 0, IRQ 43
        Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e8000000 (64-bit, prefetchable) [size=128M]
        Memory at f0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        Expansion ROM at f7000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: nouveau
        Kernel modules: nouveau

It seems as if problems started with this commit:
http://lists.freedesktop.org/archives/nouveau/2014-March/016589.html
At least reverting this patch in the nouveau source is a workaround; see the discussion on the Red Hat bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1121331

BTW, isn't bug 84721 a duplicate of this one? It is the very same NVIDIA controller as mine that is affected.
Comment 26 Martin Peres 2014-12-23 22:00:29 UTC
(In reply to K.-P. Schrage from comment #25)
> BTW, isn't bug 84721 a duplicate of this one? It is the very same NVIDIA
> controller as mine that is affected.

It depends on your problem. If by "too high" you mean the fan at 100% constantly, then the bug is a duplicate. If you mean the default fan speed is 35% instead of 30%, then the bugs are not related and this is a bug hijacking :p Please send your vbios in your answer and I will advise you whether or not to open a new bug report.

I am currently in the process of moving to another home, but I can still write code for you to try :)
Comment 27 K.-P. Schrage 2014-12-24 12:27:26 UTC
Created attachment 111288 [details]
GeForce GT 630 vbios.rom

vbios.rom: cat /sys/kernel/debug/dri/0/vbios.rom >vbios.rom
Comment 28 K.-P. Schrage 2014-12-24 12:33:38 UTC
(In reply to Martin Peres from comment #26)

> > BTW, isn't bug 84721 a duplicate of this one? It is the very same NVIDIA
> > controller as mine that is affected.
> 
> It depends on your problem. If by "too high", you mean fan at 100%
> constantly, then the bug is a duplicate. If you mean the default fan speed
> is 35% instead of 30%, then the bugs are not related and this is a bug
> hijacking :p Please send your vbios in your answer and I will advise you
> whether or not to open a new bug report.
> 
> I am currently in the process of moving to another home, but I can still
> write code for you to try :)

Thanks for caring ... yes, the fan is running at full speed constantly.
Comment 29 Lars E Pettersson 2014-12-25 00:46:41 UTC
I am the original poster of the Fedora bug report mentioned by K.-P. Schrage,
https://bugzilla.redhat.com/show_bug.cgi?id=1121331

I have the exact same problem as the original poster of this bug report, but with a different graphics card. That is, the fan on the card works fine with kernels below 3.15, but all kernels from 3.15 up peg the fan.

Reverting the commit mentioned by K.-P. Schrage,
http://lists.freedesktop.org/archives/nouveau/2014-March/016589.html
solves the issue for me. The fan returns to how it behaved before kernel 3.15.

I am using a Nvidia GF106GL (Quadro 2000) card, lspci:
01:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: NVIDIA Corporation Device 084a
	Flags: bus master, fast devsel, latency 0, IRQ 30
	Memory at f0000000 (32-bit, non-prefetchable) [size=32M]
	Memory at e0000000 (64-bit, prefetchable) [size=128M]
	Memory at e8000000 (64-bit, prefetchable) [size=64M]
	I/O ports at e000 [size=128]
	Expansion ROM at f2000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [b4] Vendor Specific Information: Len=14 <?>
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nouveau
	Kernel modules: nouveau

At the moment I am using the following kernel
Linux tux 3.17.7-300.fc21.x86_64 #1 SMP Wed Dec 17 03:08:44 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

vbios.rom will follow...
Comment 30 Lars E Pettersson 2014-12-25 00:48:33 UTC
Created attachment 111308 [details]
vbios.rom from NVIDIA Corporation GF106GL
Comment 31 Lars E Pettersson 2015-02-08 00:27:54 UTC
Any news on this issue? I just had to re-compile the module for yet another kernel update...

Is more information needed to solve the issue?
Comment 32 K.-P. Schrage 2015-02-10 12:43:40 UTC
(In reply to Lars E Pettersson from comment #31)
> Any news on this issue? I just had to re-compile the module for yet another
> kernel update...

Yes, for me as well, the issue has reached the 3.18 kernel line (now on 3.18.3-201.fc21.x86_64).
Comment 33 Martin Peres 2015-02-25 22:02:49 UTC
Hey guys,

I may finally have managed to reproduce your bug. To check that, I need you to install envytools and send me the result of the following command:

nvapeek e114 10

When selecting the manual fan management mode, you should be able to bring down the fan speed by running:

nvapoke e118 80000005 or nvapoke e120 80000005

In any case, please open a separate bug report as this bug clearly is not related to your bug.
Comment 34 K.-P. Schrage 2015-02-26 10:24:14 UTC
(In reply to Martin Peres from comment #33)

Hello, Martin,

thank you for caring!


# nvapeek e114 10
0000e114: 0000021c 000000d8 00000001 00000000

(that's with the latest nouveau driver from darktama)

After enabling manual fan control mode (pwm1_enable = 1),
'nvapoke e118 80000005' somewhat reduces fan speed audibly, but still seems to be too high (pwm1 shows a value of 0).

FWIW, after 'nvapoke e118 80000005', the output of the nvapeek command has changed to
0000e114: 0000021c 00000005 00000001 00000000
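
Reading these dumps by eye is error-prone, so here is a small sketch that pulls the value of one register out of an nvapeek line. It assumes the output format shown in this comment (a start address followed by consecutive 32-bit words); `reg_value` is a hypothetical helper and not part of envytools.

```shell
#!/bin/sh
# Hypothetical helper: extract the word for a given register from a line
# such as "0000e114: 0000021c 000000d8 00000001 00000000".
# $1 = the nvapeek output line, $2 = register address in hex (e.g. e118).
reg_value() {
    line=$1
    addr=$(( 0x$2 ))
    base=$(( 0x${line%%:*} ))
    # Each word is 4 bytes, so the word index is the byte offset / 4.
    index=$(( (addr - base) / 4 + 1 ))
    # Drop the "address:" prefix, then split the remaining words.
    set -- ${line#*:}
    if [ "$index" -lt 1 ] || [ "$index" -gt "$#" ]; then
        echo "register not in this line" >&2
        return 1
    fi
    shift $(( index - 1 ))
    echo "$1"
}

# The first dump in this comment, queried for register e118:
reg_value "0000e114: 0000021c 000000d8 00000001 00000000" e118
```

With the two dumps from this comment, the same query shows the e118 word going from 000000d8 to 00000005 after the poke.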
Comment 35 Martin Peres 2015-02-26 21:27:10 UTC
(In reply to K.-P. Schrage from comment #34)

Ok, try to boot with nouveau blacklisted then run nvapeek e114 10 again and send me the result. We may be on to something here.
Comment 36 K.-P. Schrage 2015-02-27 09:16:44 UTC
(In reply to Martin Peres from comment #35)
 
> Ok, try to boot with nouveau blacklisted then run nvapeek e114 10 again and
> send me the result. We may be on to something here.


# nvapeek e114 10
0000e114: 0000021c 00000002 00000001 00000000

(nouveau hopefully killed: blacklist.conf, grub commandline, dracut)
Comment 37 Martin Peres 2015-02-27 09:23:52 UTC
(In reply to K.-P. Schrage from comment #36)

It worked. Set fan management to manual then nvapoke e118 80000002 and you'll get quietness.

I kind of have the same problem with the other nvc1 I have access to at work. I'll be digging into this to fix this problem for good. Good that I got access to this board, your bug would have been a mystery otherwise...

Thanks!
Comment 38 K.-P. Schrage 2015-02-27 10:40:05 UTC
(In reply to Martin Peres from comment #37)

> It worked. Set fan management to manual then nvapoke e118 80000002 and
> you'll get quietness.
> 

Yes, it works (Silence Is Golden ...)

I put the two lines
echo 1 > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/hwmon/hwmon1/pwm1_enable
/usr/local/bin/nvapoke e118 80000002
into /etc/rc.d/rc.local (don't know any other way to make this workaround permanent).

Thanks again!
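
One fragile part of this workaround is the hard-coded hwmon1: the number in hwmonN depends on probe order and is not guaranteed to stay the same. A sketch that looks the file up below the PCI device instead is shown here; `find_pwm_enable` is a hypothetical helper, and the device path in the example is the one from this comment.

```shell
#!/bin/sh
# Hypothetical helper: print the pwm1_enable path below a PCI device
# directory, whatever number the hwmon entry happens to have.
find_pwm_enable() {
    for f in "$1"/hwmon/hwmon*/pwm1_enable; do
        if [ -e "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    echo "no pwm1_enable under $1" >&2
    return 1
}

# Example for rc.local (as root), using the device path from this comment:
#   f=$(find_pwm_enable /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0) &&
#       echo 1 > "$f" && /usr/local/bin/nvapoke e118 80000002
```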
Comment 39 Lars E Pettersson 2015-02-27 20:08:51 UTC
OK, let's see...

# ./nvapeek e114 10
0000e114: 0000021c 000000f9 00000001 00000000

sometimes I get the following answer:

0000e114: 0000021c 000000fe 00000001 00000000

I then select manual control and do the following:

[root@tux nva]# ./nvapoke e118 80000005
[root@tux nva]# ./nvapeek e114 10
0000e114: 0000021c 00000005 00000001 00000000

Silent fan. Check with the 'sensors' command: RPM is 0! Temperature rising! Set fan control back to the old setting and the fan returns to normal.

Restart with the module blacklisted.

[root@tux nva]# ./nvapeek e114 10
0000e114: 0000021c 000000a2 00000001 00000000

Restart with the nouveau module in place again.

You asked K.-P. Schrage to use 'nvapoke e118 80000002'. As I have 000000a2, I instead tried the following:

[root@tux nva]# ./nvapoke e118 800000a2

(If I try 'nvapoke e118 80000002' the RPM goes down to 0 rpm, so I think that 800000a2 is correct for me to use. The RPM then stays slightly above 2000 rpm and the temperature is at about 60-65 degC, as it was before the change in the 3.15 kernel.)

Not sure what all these numbers mean though... :)
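
Going by the commands used in this thread, the value written appears to be the word read back with nouveau blacklisted, combined with 0x80000000. What that top bit means is not stated here, so the sketch below only reproduces the pattern; `poke_value` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: build the nvapoke argument from the register word
# observed with nouveau blacklisted (e.g. 000000a2 -> 800000a2).
poke_value() {
    printf '%08x\n' $(( 0x80000000 | 0x$1 ))
}

poke_value 000000a2   # the word reported in this comment
poke_value 00000002   # the word reported in comment 36
```

The results correspond to the two nvapoke arguments that worked for the respective cards above.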
Comment 40 poma 2015-03-11 15:59:41 UTC
(In reply to K.-P. Schrage from comment #38)
...
> I put the two lines
> echo 1 >
> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/hwmon/hwmon1/pwm1_enable
> /usr/local/bin/nvapoke e118 80000002
> into /etc/rc.d/rc.local (don't know any other way to make this workaround
> permanent).
> 

$ udevadm info -a -p /sys/class/drm/card0 | grep -m2 'ATTRS{device}\|ATTRS{vendor}'
    ATTRS{device}=="0xabcd"
    ATTRS{vendor}=="0x1234"

/etc/udev/rules.d/10-unladen-swallow.rules
ACTION=="add", ATTRS{vendor}=="0x1234", ATTRS{device}=="0xabcd", RUN+="/bin/sh -c '/bin/echo 1 > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/hwmon/hwmon1/pwm1_enable'", RUN+="/usr/local/bin/nvapoke e118 80000002"
Comment 41 poma 2015-03-11 16:35:46 UTC
(In reply to Lars E Pettersson from comment #39)
...
> 
> Not sure what all these numbers means though... :)


'nvapeek' might be something like "nvidia peek(read)"
and
'nvapoke' might be something like "nvidia poke(write)",
MMIO regs.

Addresses 'e114'&'e118' with their 'values' should fall into this range:
- G80:GF100 MMIO map
  0x00e000 	all 	PNVIO 	GPIOs, I2C buses, PWM fan control, and other external devices
OR
- GF100+ MMIO map
  0x00e000 	all 	PNVIO 	GPIOs, I2C buses, PWM fan control, and other external devices

How Martin reached the actual addresses and values, he knows better. :)


Ref.
PEEK and POKE
http://en.wikipedia.org/wiki/PEEK_and_POKE

MMIO
http://en.wikipedia.org/wiki/Memory-mapped_I/O

MMIO register ranges
http://envytools.readthedocs.org/en/latest/hw/mmio.html

PNVIO: external device interface
http://envytools.readthedocs.org/en/latest/hw/io/pnvio.html

Pokémon
http://en.wikipedia.org/wiki/Pokémon
Comment 42 K.-P. Schrage 2015-03-13 12:36:46 UTC
(In reply to poma from comment #40)

> /etc/udev/rules.d/10-unladen-swallow.rules
> ACTION=="add", ATTRS{vendor}=="0x1234", ATTRS{device}=="0xabcd",
> RUN+="/bin/sh -c '/bin/echo 1 >
> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/hwmon/hwmon1/pwm1_enable'",
> RUN+="/usr/local/bin/nvapoke e118 80000002"

Thanks, Poma, this rule (all on one line, with my own device and vendor IDs) seems to work correctly, but it only reduces the fan speed for a second or so during the boot process; then the speed goes up again, and the value that nvapoke writes is overwritten (800000d8 instead of 80000002).
Perhaps this rule runs too early, but changing the prefix number from 10 to e.g. 99 doesn't help.
Comment 43 poma 2015-03-13 16:05:11 UTC
(In reply to K.-P. Schrage from comment #42)


Exactly, these are oneliners.

There is no GPU fan here, so I tested the CPU fan, and this is how it works:
/etc/udev/rules.d/10-cpu-fan-manual-mode.rules
RUN+="/bin/sh -c '/bin/echo 1 > /sys/devices/platform/it87.656/pwm1_enable'"

Therefore, remove the ACTION and ATTRS parts so that it runs unconditionally.

Before you reboot, check with:
# udevadm trigger
Comment 44 K.-P. Schrage 2015-03-13 18:51:40 UTC
(In reply to poma from comment #43)

> There is no GPU fan here, so I tested the CPU fan, and this is how it
> works:
> /etc/udev/rules.d/10-cpu-fan-manual-mode.rules
> RUN+="/bin/sh -c '/bin/echo 1 > /sys/devices/platform/it87.656/pwm1_enable'"
> 
> Therefore, remove the ACTION and ATTRS parts so that it runs unconditionally.
> 
> Before you reboot, check with:
> # udevadm trigger

Now my 10-...rules file in /etc/udev/rules.d/ looks like this:
RUN+="/bin/sh -c '/bin/echo 1 > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/hwmon/hwmon1/pwm1_enable'", RUN+="/usr/local/bin/nvapoke e118 80000002"

During boot, it sounds as if it gets triggered several times (noise-silence-noise-silence), but it ends up with a silent gpu fan when the graphical desktop has started.
Startup logs are flooded with messages like:

nvapoke:473 conflicting memory types e8000000-f0000000 uncached-minus<->write-combining
reserve_memtype failed [mem 0xe8000000-0xefffffff], track uncached-minus, req uncached-minus
Comment 45 poma 2015-03-13 21:16:10 UTC
(In reply to K.-P. Schrage from comment #44)

> Now my 10-...rules file in /etc/udev/rules.d/ looks like this:
> RUN+="/bin/sh -c '/bin/echo 1 >
> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/hwmon/hwmon1/pwm1_enable'",
> RUN+="/usr/local/bin/nvapoke e118 80000002"
> 
> During boot, it sounds as if it gets triggered several times
> (noise-silence-noise-silence), but it ends up with a silent gpu fan when the
> graphical desktop has started.
> Startup logs are flooded with messages like:
> 
> nvapoke:473 conflicting memory types e8000000-f0000000
> uncached-minus<->write-combining
> reserve_memtype failed [mem 0xe8000000-0xefffffff], track uncached-minus,
> req uncached-minus


# rm /etc/udev/rules.d/10-gpu-fan.rules

And try this:
/etc/systemd/system/gpu-fan.service 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Unit]
Description=GPU fan lower speed

[Service]
Type=oneshot
ExecStart=/bin/sh -c '/bin/echo 1 > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/hwmon/hwmon1/pwm1_enable'
ExecStart=/usr/local/bin/nvapoke e118 80000002
StandardOutput=null
StandardError=null

[Install]
WantedBy=basic.target
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# systemctl enable gpu-fan.service
# systemctl start gpu-fan.service
# systemctl status gpu-fan.service

If everything is OK:
# systemctl reboot
...

BOOT
...
# systemctl status gpu-fan.service
Comment 46 K.-P. Schrage 2015-03-14 10:53:43 UTC
(In reply to poma from comment #45)

> And try this:
> /etc/systemd/system/gpu-fan.service 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> [Unit]
> Description=GPU fan lower speed
> 
> [Service]
> Type=oneshot
> ExecStart=/bin/sh -c '/bin/echo 1 >
> /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/hwmon/hwmon1/pwm1_enable'
> ExecStart=/usr/local/bin/nvapoke e118 80000002
> StandardOutput=null
> StandardError=null
> 
> [Install]
> WantedBy=basic.target
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> # systemctl enable gpu-fan.service
> # systemctl start gpu-fan.service
> # systemctl status gpu-fan.service
> 
> If everything is OK:
> # systemctl reboot
> ...
> 
> BOOT
> ...
> # systemctl status gpu-fan.service

After reboot:
[root@linux_keller kp]# systemctl status gpu-fan.service
● gpu-fan.service - GPU fan lower speed
   Loaded: loaded (/etc/systemd/system/gpu-fan.service; enabled)
   Active: inactive (dead) since Sa 2015-03-14 11:40:03 CET; 33s ago
  Process: 683 ExecStart=/usr/local/bin/nvapoke e118 80000002 (code=exited, status=0/SUCCESS)
  Process: 672 ExecStart=/bin/sh -c /bin/echo 1 > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/hwmon/hwmon1/pwm1_enable (code=exited, status=0/SUCCESS)
 Main PID: 683 (code=exited, status=0/SUCCESS)

Mär 14 11:40:03 linux_keller systemd[1]: Starting GPU fan lower speed...
Mär 14 11:40:03 linux_keller systemd[1]: Started GPU fan lower speed.
---------------
BUT: Fan speed is high, and register e118 shows the value 000000d8 instead of the expected 00000002. I have to restart the service manually to calm down the fan.

Let me be honest, poma: I am very grateful for all your help, but I think I'll stick to the old-school method rc.local which seems to be rather straightforward to me (although even that is now governed by systemd and not so old-school anymore).
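
The observation that e118 reads d8 again after the service has run suggests that something rewrites the register later during boot; that is an inference from this thread, not a confirmed cause. Under that assumption, a bounded re-check loop like the sketch below could stand in for the one-shot write. `keep_applied` is a hypothetical helper, and the nvapeek invocation with a length of 4 in the example is also an assumption.

```shell
#!/bin/sh
# Hypothetical helper: for a limited number of rounds, run a check command
# and re-run the apply command whenever the check fails.
# $1 = rounds, $2 = delay in seconds, $3 = check command, $4 = apply command
# Prints how many times the apply command had to be run.
keep_applied() {
    rounds=$1; delay=$2; check=$3; apply=$4
    applied=0
    i=0
    while [ "$i" -lt "$rounds" ]; do
        if ! sh -c "$check" >/dev/null 2>&1; then
            sh -c "$apply" >/dev/null 2>&1
            applied=$(( applied + 1 ))
        fi
        i=$(( i + 1 ))
        sleep "$delay"
    done
    echo "$applied"
}

# Example (as root): re-check every 5 seconds for about a minute.
#   keep_applied 12 5 \
#       "/usr/local/bin/nvapeek e118 4 | grep -q 00000002" \
#       "/usr/local/bin/nvapoke e118 80000002"
```

This only papers over the overwrite for a while; it does not address whatever keeps resetting the value.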
Comment 47 poma 2015-03-14 12:05:13 UTC
(In reply to K.-P. Schrage from comment #46)


No problemos. ;)
Comment 48 Martin Peres 2015-03-18 20:42:10 UTC
Sorry guys, I'm back at the problem ... again ...

I really want to fix this upstream!
Comment 49 K.-P. Schrage 2015-03-19 12:38:26 UTC
(In reply to Martin Peres from comment #48)
> Sorry guys, I'm back at the problem ... again ...
> 
> I really want to fix this upstream!

Fine. Tell me if I can supply any more information.
Comment 50 Lars E Pettersson 2016-03-29 21:39:54 UTC
Any news on this issue?

I removed the lines mentioned in comment 39 to see what happens, and up goes the fan speed. I.e. the problem still seems to exist... :(

Running kernel at the moment is:
Linux tux.home.rpz 4.4.6-300.fc23.x86_64 #1 SMP Wed Mar 16 22:10:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Comment 51 Lars E Pettersson 2016-08-04 10:09:08 UTC
The bug is still there.

I am now running Fedora 24 with the following kernel:

Linux tux.home.rpz 4.6.4-301.fc24.x86_64 #1 SMP Tue Jul 12 11:50:00 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

The fan noise is annoying...

Any news on this issue?
Comment 52 Lars E Pettersson 2016-12-11 11:11:02 UTC
Just an update that the bug is still there. Running kernel at the moment:

Linux tux.home.rpz 4.8.11-200.fc24.x86_64 #1 SMP Mon Nov 28 19:36:57 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Will this bug be fixed?
Comment 53 Martin Peres 2016-12-11 12:12:24 UTC
(In reply to Lars E Pettersson from comment #52)
> Just an update that the bug is still there. Running kernel at the moment:
> 
> Linux tux.home.rpz 4.8.11-200.fc24.x86_64 #1 SMP Mon Nov 28 19:36:57 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
> 
> Will this bug be fixed?

Hey,

I have not forgotten you. I found the information in the table needed to do the right thing for your fan and I sort of managed to make sense of it... but I am apparently unable to build a model of what the proprietary driver does :s It is so frustrating because I compute the right value most of the time, but when I don't, the error is quite catastrophic.

I will now swallow my pride and ask for help from Nvidia :s
Comment 54 Martin Peres 2019-12-04 08:46:53 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/116.

