Bug 78116

Summary: Auto fan speed management doesn't do anything in non critical temperature range (NVC0)
Product: xorg
Reporter: Marcel Dopita <mdop>
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: enhancement    
Priority: medium    
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: All   
Whiteboard:
Attachments (description, flags):
  dmesg with PTHERM=debug (flags: none)
  drm/nouveau/therm/fan: add debug information in fan_update() (flags: none)
  dmesg test after patch (flags: none)
  Please patch -p1 < additional_debug.patch (flags: none)

Description Marcel Dopita 2014-04-30 14:45:19 UTC
I have a Gigabyte GTX 560 Ti Ultra Durable card. (Logs for this issue are the same as in Bug 78106.)

1a) When I finish booting, nouveau selects AUTO mode (pwm1_enable: 2) and sets the fan power to 55% (pwm1: 55).

1b) Enable manual mode, set pwm1 to 100, enable auto mode.

In both cases I would expect AUTO mode to lower the PWM, going as low as pwm1_min if the temperature is low enough. My card temperature is 31-33 °C, but pwm1 is never lowered in AUTO mode (neither from 55 nor from 100).

2) On Windows the PWM goes as low as pwm1_min (which is 40 by default for the GTX 560 Ti, but it also works with my modded value of 30). I thought that an Nvidia card could manage this by itself (is it really done by the operating system/software?).
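
For anyone wanting to repeat case 1b) in a scripted way, here is a minimal Python sketch. It only uses the hwmon attribute names mentioned above (pwm1_enable, pwm1, pwm1_min). The helper names, the directory argument and the value 1 for manual mode (the usual hwmon convention) are assumptions, not something taken from the logs.

```python
from pathlib import Path
import time


def read_int(hwmon: Path, name: str) -> int:
    """Read one integer hwmon attribute, e.g. pwm1."""
    return int((hwmon / name).read_text().strip())


def write_int(hwmon: Path, name: str, value: int) -> None:
    """Write one integer hwmon attribute."""
    (hwmon / name).write_text(f"{value}\n")


def reproduce_case_1b(hwmon: Path, settle_seconds: float = 10.0) -> dict:
    """Manual mode -> pwm1 = 100 -> auto mode, then report the exposed state."""
    write_int(hwmon, "pwm1_enable", 1)  # assumed: 1 = manual (hwmon convention)
    write_int(hwmon, "pwm1", 100)
    write_int(hwmon, "pwm1_enable", 2)  # 2 = AUTO, as reported above
    time.sleep(settle_seconds)          # give the driver time to react
    return {name: read_int(hwmon, name)
            for name in ("pwm1_enable", "pwm1", "pwm1_min")}
```

Writing to these files normally needs root, and the hwmon directory of the card has to be located first (its path differs between systems). If pwm1 is still 100 after the wait while the card is cool, that matches the behaviour reported here.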
Comment 1 Martin Peres 2014-05-02 17:19:31 UTC
Hi Marcel,

Which version of the kernel are you using?
Comment 2 Marcel Dopita 2014-05-03 15:23:45 UTC
I'm using Arch Linux so I'm on the latest stable: 3.14.2 (3.14.2-1-ARCH).
Comment 3 Martin Peres 2014-05-03 21:31:55 UTC
(In reply to comment #2)
> I'm using Arch Linux so I'm on the latest stable: 3.14.2 (3.14.2-1-ARCH).

Hmm, OK. Can you confirm manual fan management works as expected?

I have just tried on my nvc4 and automatic fan management brings back my fan speed to 40% from the 100% set manually.

As for your question about fan management being done by the hw or software, it is always scaled in software (either in an RTOS running on the card or in the kernel driver). In the Fermi family, the first chipsets were actually using a dedicated hardware chip for fan management (ADT7473), but your card is not one of them. So, in our case, we run the code in the Linux kernel, but it will be moved to the RTOS when I'm done pushing my patches to fix timers and when I take the time to write the code for fan management.
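
To make "scaled in software" a bit more concrete, here is a rough Python sketch of one common approach: linear interpolation of the fan duty between a minimum and a maximum temperature, clamped to the duty limits. This is only an illustration of the idea, not the driver's actual code; the function name and every threshold in the usage note are hypothetical.

```python
def linear_fan_duty(temp_c: int, min_temp_c: int, max_temp_c: int,
                    min_duty: int, max_duty: int) -> int:
    """Map a temperature to a fan duty (percent) by linear interpolation.

    Below min_temp_c the fan stays at min_duty, above max_temp_c at max_duty.
    All parameters are illustrative, not values read from a vbios.
    """
    if temp_c <= min_temp_c:
        return min_duty
    if temp_c >= max_temp_c:
        return max_duty
    span = max_temp_c - min_temp_c
    return min_duty + (temp_c - min_temp_c) * (max_duty - min_duty) // span
```

With hypothetical limits of 40 °C to 85 °C and a duty range of 40 to 100, a card idling at 32 °C would be driven at the minimum duty, which is the behaviour the reporter expects from AUTO mode.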
Comment 4 Marcel Dopita 2014-05-09 07:31:53 UTC
Yes, manual control works and fan speed does change accordingly.
Comment 5 Martin Peres 2014-05-10 09:21:37 UTC
(In reply to comment #4)
> Yes, manual control works and fan speed does change accordingly.

Can you load nouveau with the following parameter and send us the kernel logs?

debug="PTHERM=debug"
Comment 6 Marcel Dopita 2014-05-10 13:12:58 UTC
Created attachment 98820 [details]
dmesg with PTHERM=debug

Boot with debug parameter, after which I tried case 1b) (switching to manual mode, changing pwm1 to 100 and switching back to automatic).
Comment 7 Martin Peres 2014-05-10 20:39:35 UTC
Thanks Marcel. Can I ask you for your vbios?

http://nouveau.freedesktop.org/wiki/DumpingVideoBios/
Comment 8 Marcel Dopita 2014-05-10 20:42:17 UTC
I already attached my bios to bug 78106 if that's ok.
Comment 9 Martin Peres 2014-05-10 21:00:57 UTC
(In reply to comment #8)
> I already attached my bios to bug 78106 if that's ok.

Great, thanks, I'll have a look at both bugs tomorrow :)
Comment 10 Martin Peres 2014-05-21 12:35:48 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > I already attached my bios to bug 78106 if that's ok.
> 
> Great, thanks, I'll have a look at both bugs tomorrow :)

I can't reproduce the bug, but I'll send you a patch for us to get more visibility on what's going wrong. Sorry for being so slow :s
Comment 11 Marcel Dopita 2014-05-21 12:53:52 UTC
Thanks for taking a look. I still have an older card (GTX 260), so I can try that (another distro too?) and maybe other cards as well.
Can you confirm that, for you, it actually resets the fan to pwm 55 (or any other low/default value) once auto mode is enabled and the previous state was different/higher (say pwm 100)?
Comment 12 Martin Peres 2014-05-21 17:48:13 UTC
(In reply to comment #11)
> Thanks for taking a look. I still have older card (GTX 260) so I can try
> that (other disto too?) and maybe other cards as well. 

Well, you can try to check if they work but the point remains, your current card doesn't work and it needs to be fixed.

> Can you confirm that it actually resets for you fan rpm to pwm 55 (or any
> other low/default value) once auto mode is enabled and the previous state
> was different/higher (say pwm 100)?

Yes, I confirm this. We'll check why it doesn't work on your card and fix it!
Comment 13 Marcel Dopita 2014-05-23 16:25:30 UTC
It seems that my old card (GTX 260) doesn't support manual mode, nor does it report any temperature or fan speed.

[    0.491166] nouveau  [  PTHERM][0000:01:00.0] FAN control: none / external
[    0.491175] nouveau  [  PTHERM][0000:01:00.0] fan management: automatic
[    0.491179] nouveau  [  PTHERM][0000:01:00.0] internal sensor: yes
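
When comparing several cards, it can help to pull these PTHERM lines out of dmesg programmatically. The sketch below is a small Python parser written against the three lines pasted above. The regular expression and the function name are my own and may need adjusting for other kernel versions or log formats.

```python
import re

# Matches lines such as:
# [    0.491166] nouveau  [  PTHERM][0000:01:00.0] FAN control: none / external
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+nouveau\s+"
    r"\[\s*(?P<subdev>\w+)\]\[(?P<pci>[0-9a-fA-F:.]+)\]\s+(?P<msg>.*)$"
)


def parse_ptherm(lines):
    """Return {setting: value} for 'key: value' messages logged by PTHERM."""
    settings = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group("subdev") != "PTHERM":
            continue  # not a nouveau PTHERM line
        key, sep, value = match.group("msg").partition(": ")
        if sep:  # keep only messages of the form 'key: value'
            settings[key] = value
    return settings
```

Feeding it the output of dmesg split into lines gives a dictionary keyed by the message prefixes, for example "FAN control" and "fan management".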
Comment 14 Martin Peres 2014-05-24 04:41:02 UTC
(In reply to comment #13)
> Seems that my old card - GTX 260 doesn't support manual mode nor does it
> report any temperature/fan speed. 
> 
> [    0.491166] nouveau  [  PTHERM][0000:01:00.0] FAN control: none / external
> [    0.491175] nouveau  [  PTHERM][0000:01:00.0] fan management: automatic
> [    0.491179] nouveau  [  PTHERM][0000:01:00.0] internal sensor: yes

It doesn't surprise me. However, it should find an external temperature sensor and fan management chip (the ADT7473). Can you confirm that or paste your logs here?
Comment 15 Marcel Dopita 2014-05-25 10:22:53 UTC
I put the logs into a separate bug, bug 79204, so as not to clutter this unrelated issue.
Comment 16 Martin Peres 2014-05-25 14:55:25 UTC
(In reply to comment #15)
> I put logs into separate bug 79204 not to clutter this not related issue.

Perfect!
Comment 17 Martin Peres 2014-05-25 14:57:45 UTC
Created attachment 99775 [details] [review]
drm/nouveau/therm/fan: add debug information in fan_update()

Please add this patch and send me the dmesg output (add the debug info for ptherm).
Comment 18 Marcel Dopita 2014-05-25 16:19:51 UTC
Created attachment 99782 [details]
dmesg test after patch

I had to modify the patch a bit, as in my source there were gotos instead of returns (I'm not sure which is newer; I used the source in the Arch Build System). However, I checked that it applied correctly.
Comment 19 Martin Peres 2014-05-25 18:53:35 UTC
Created attachment 99790 [details] [review]
Please patch -p1 < additional_debug.patch

Hmm, it seems the problem is not where I thought it was. Let's see what the output is with this patch applied!
Comment 20 Marcel Dopita 2014-05-25 21:36:09 UTC
(In reply to comment #19)
> Hmm, seems like the problem is not where I thought it was. Let's see what's
> the output with this patch applied!

I cannot compile because of the following error:

In file included from drivers/gpu/drm/nouveau/core/include/core/object.h:5:0,
                 from drivers/gpu/drm/nouveau/core/subdev/therm/base.c:25:
drivers/gpu/drm/nouveau/core/subdev/therm/base.c: In function ‘nouveau_therm_update’:
drivers/gpu/drm/nouveau/core/subdev/therm/base.c:136:25: error: ‘struct nvbios_therm_fan’ has no member named ‘fan_mode’
    duty, priv->fan->bios.fan_mode);
                         ^
drivers/gpu/drm/nouveau/core/include/core/printk.h:14:45: note: in definition of macro ‘nv_printk’
   nv_printk_(nv_object(o), NV_DBG_##l, f, ##a);                  \
                                             ^
drivers/gpu/drm/nouveau/core/subdev/therm/base.c:132:2: note: in expansion of macro ‘nv_debug’
  nv_debug(therm, "nouveau_therm_update; mode = %d, poll = %d, "
  ^
  CC [M]  drivers/infiniband/ulp/ipoib/ipoib_cm.o
  CC [M]  drivers/infiniband/hw/qib/qib_ud.o
scripts/Makefile.build:308: recipe for target 'drivers/gpu/drm/nouveau/core/subdev/therm/base.o' failed
make[4]: *** [drivers/gpu/drm/nouveau/core/subdev/therm/base.o] Error 1
scripts/Makefile.build:455: recipe for target 'drivers/gpu/drm/nouveau' failed
make[3]: *** [drivers/gpu/drm/nouveau] Error 2
scripts/Makefile.build:455: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:455: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Comment 21 Martin Peres 2014-05-25 22:35:21 UTC
Could you try using a 3.15-rcX kernel instead of a 3.14?
Comment 22 Marcel Dopita 2014-05-26 17:19:53 UTC
(In reply to comment #21)
> Could you try using a 3.15-rcX kernel instead of a 3.14?

It works correctly under 3.15-rc7 (fan starts at lowest pwm1 value and re-sets after re-enabling auto mode) so I guess that it resolves this issue.
Comment 23 Emil Velikov 2014-05-26 18:51:12 UTC
(In reply to comment #22)
> (In reply to comment #21)
> > Could you try using a 3.15-rcX kernel instead of a 3.14?
> 
> It works correctly under 3.15-rc7 (fan starts at lowest pwm1 value and
> re-sets after re-enabling auto mode) so I guess that it resolves this issue.

Most likely fixed with the following

commit dcd9262b3baf881285e9e0fd5459d54723cc992e
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Mon Mar 24 13:34:47 2014 +1000

    drm/nouveau/therm: check for sensor presence with requested mode, not current


Perhaps we can forward it to linux-stable?
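
Going only by the commit title, the class of bug can be illustrated with a toy Python model: a mode switch that validates the sensor against the mode the driver is currently in, versus one that validates against the mode being requested. This is a hypothetical simplification for illustration. It is not the kernel code, and the thread does not show the commit's diff.

```python
MANUAL, AUTO = 1, 2  # mirrors the pwm1_enable values used in this report


class ThermModel:
    """Toy model of a fan-mode switch; names and logic are illustrative only."""

    def __init__(self, sensor_ok: bool, mode: int = MANUAL):
        self.sensor_ok = sensor_ok
        self.mode = mode

    def set_mode_checking_current(self, requested: int) -> bool:
        # Validates against the mode we are leaving, not the one we enter.
        if self.mode == AUTO and not self.sensor_ok:
            return False
        self.mode = requested
        return True

    def set_mode_checking_requested(self, requested: int) -> bool:
        # Validates against the mode that is actually being requested.
        if requested == AUTO and not self.sensor_ok:
            return False
        self.mode = requested
        return True
```

In this model, checking the current mode lets a card without a working sensor enter AUTO and then refuses to let it leave, while checking the requested mode rejects the switch up front.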
Comment 24 Martin Peres 2014-05-26 19:55:29 UTC
(In reply to comment #22)
> (In reply to comment #21)
> > Could you try using a 3.15-rcX kernel instead of a 3.14?
> 
> It works correctly under 3.15-rc7 (fan starts at lowest pwm1 value and
> re-sets after re-enabling auto mode) so I guess that it resolves this issue.

I'm glad it works! That means we fixed all the bugs you reported (with the exception of the i2c issue that still needs to be pulled by Ben before I can close it)? If so, I would like to thank you again for being an exemplary bug reporter!

Sorry for being slow to answer your bug reports though...

Should we close this bug now?
Comment 25 Marcel Dopita 2014-05-26 21:00:02 UTC
Yep, please switch it to the appropriate state (I'm not sure if resolved is right, as it was already fixed in the rc). I will recheck it once stable is out. Thanks, Martin.
Comment 26 Martin Peres 2014-05-26 21:07:39 UTC
Closing this bug as fixed then!
