Bug 66022

Summary: pwm1[_enable] faults after hibernate/restore related to automatic (or otherwise) fan management
Product: xorg
Reporter: Mr-4 <mr.dash.four>
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED DUPLICATE
QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
Attachments:
Description                                  Flags
fix for bug#1                                none
Allow "automatic" fan mode from the start    none

Description Mr-4 2013-06-21 19:58:27 UTC
This report covers 2 bugs that I think are related (which is why I am reporting them together), as both happen as a result of hibernate/restore:

1. After a hibernate/restore cycle, the fan management option (the value of pwm1_enable) is not restored; and

2. After a hibernate/restore cycle, when automatic fan management is applied ("echo 2 > .../pwm1_enable"), this doesn't work - the nouveau driver takes the fan speed to 2 and keeps it there *regardless* of the temperature of the video card, which may cause the card to BURN OUT!

The kernel in use is 3.9.6 with the stock-supplied nvidia driver. For more details please see https://bugzilla.redhat.com/show_bug.cgi?id=976658
Comment 1 Emil Velikov 2013-06-22 01:24:11 UTC
Thanks for the report.

Please attach the output of dmesg after a hibernate/resume cycle. Does the issue also occur on suspend/resume?
Comment 2 Emil Velikov 2013-06-22 01:26:27 UTC
Created attachment 81189
fix for bug#1

This patch should resolve the first bug. Please give it a try and as usual observe the temperatures

Cheers
Emil
Comment 3 Mr-4 2013-06-24 03:29:37 UTC
Created attachment 81291
Allow "automatic" fan mode from the start
Comment 4 Mr-4 2013-06-24 03:30:29 UTC
(In reply to comment #2)
> This patch should resolve the first bug. Please give it a try and as usual
> observe the temperatures
OK, I've had mixed results so far.

The patch does work, but not always. Sometimes, even though the hibernate/restore is successful (and the fans are running at low speed from the beginning), my system locks up completely, usually a few seconds after restore has done its job.

I'll keep testing this and post any new developments.

In the meantime, I have just attached a patch of my own: I've modified nouveau_pm.c to start in "automatic" fan mode from the very start. This is enabled by specifying "nouveau.therm_fan_mode_auto=1" as a kernel parameter, so that I don't have to rely on my rc.local and other such scripts.
Comment 5 Mr-4 2013-06-25 22:10:12 UTC
Some more feedback on this, after I was able to test this more thoroughly:

The automatic thermal management seems, at least for my card (Chipset G71 - NV49), completely buggy!

After hibernate/resume, when my machine doesn't lock up a few seconds after restore is complete (see Comment #4), I see the following scenarios:

pwm1=0
pwm1_enable=2
pwm1_max=100
pwm1_min=0
temp1_auto_point1_pwm=100
temp1_auto_point1_temp=90000
temp1_auto_point1_temp_hyst=3000
temp1_crit=115000
temp1_crit_hyst=2000
temp1_emergency=135000
temp1_emergency_hyst=5000
temp1_input=28000
temp1_max=95000
temp1_max_hyst=3000
update_rate=1000

[...]

pwm1=0
pwm1_enable=2
pwm1_max=100
pwm1_min=0
temp1_auto_point1_pwm=100
temp1_auto_point1_temp=90000
temp1_auto_point1_temp_hyst=3000
temp1_crit=115000
temp1_crit_hyst=2000
temp1_emergency=135000
temp1_emergency_hyst=5000
temp1_input=52000
temp1_max=95000
temp1_max_hyst=3000
update_rate=1000

As evident, pwm1_enable is restored, but the nouveau driver doesn't seem to bother with any sort of fan management at all (pwm1 is reduced from 100 to 0 after restore and stays there forever, even as the temperature rises from 28 to 52 degrees!).
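
For anyone wanting to reproduce listings like the ones above, here is a minimal sketch of how such a name=value snapshot could be collected. It assumes the attributes live in a hwmon directory such as /sys/class/hwmon/hwmon0 (the path used in the commands in this comment; the hwmon index may differ on other machines). The helper name dump_hwmon is made up for illustration and is not part of nouveau or any existing tool.

```shell
# Print every readable regular file in a hwmon directory as name=value.
# dump_hwmon is a hypothetical helper name, not an existing utility.
dump_hwmon() {
    dir=$1
    for f in "$dir"/*; do
        # Skip subdirectories and links to directories (device, subsystem, ...).
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f" 2>/dev/null)"
    done
}
```

Calling `dump_hwmon /sys/class/hwmon/hwmon0` before and after a hibernate/restore cycle gives two snapshots that can be compared directly. The temp1_* values are reported in millidegrees Celsius, so temp1_input=52000 corresponds to 52 degrees.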

When I do this:

echo 2 > /sys/class/hwmon/hwmon0/pwm1_enable 

I still get:

pwm1=0
pwm1_enable=2
pwm1_max=100
pwm1_min=0
temp1_auto_point1_pwm=100
temp1_auto_point1_temp=90000
temp1_auto_point1_temp_hyst=3000
temp1_crit=115000
temp1_crit_hyst=2000
temp1_emergency=135000
temp1_emergency_hyst=5000
temp1_input=55000
temp1_max=95000
temp1_max_hyst=3000
update_rate=1000

Next, I tried the following sequence:

echo 1 > /sys/class/hwmon/hwmon0/pwm1_enable 
echo 2 > /sys/class/hwmon/hwmon0/pwm1_enable 

pwm1=26
pwm1_enable=2
pwm1_max=100
pwm1_min=0
temp1_auto_point1_pwm=100
temp1_auto_point1_temp=90000
temp1_auto_point1_temp_hyst=3000
temp1_crit=115000
temp1_crit_hyst=2000
temp1_emergency=135000
temp1_emergency_hyst=5000
temp1_input=53000
temp1_max=95000
temp1_max_hyst=3000
update_rate=1000

[...]

pwm1=26
pwm1_enable=2
pwm1_max=100
pwm1_min=0
temp1_auto_point1_pwm=100
temp1_auto_point1_temp=90000
temp1_auto_point1_temp_hyst=3000
temp1_crit=115000
temp1_crit_hyst=2000
temp1_emergency=135000
temp1_emergency_hyst=5000
temp1_input=47000
temp1_max=95000
temp1_max_hyst=3000
update_rate=1000

As evident, the pwm1 "twitched" for a bit, and then "froze" at 26%, reducing the card temperature from about 55 degrees to 47. I then did this:

echo 1 > /sys/class/hwmon/hwmon0/pwm1_enable 
echo 100 > /sys/class/hwmon/hwmon0/pwm1 
echo 2 > /sys/class/hwmon/hwmon0/pwm1_enable 

pwm1=88
pwm1_enable=2
pwm1_max=100
pwm1_min=0
temp1_auto_point1_pwm=100
temp1_auto_point1_temp=90000
temp1_auto_point1_temp_hyst=3000
temp1_crit=115000
temp1_crit_hyst=2000
temp1_emergency=135000
temp1_emergency_hyst=5000
temp1_input=44000
temp1_max=95000
temp1_max_hyst=3000
update_rate=1000

[...]

pwm1=13
pwm1_enable=2
pwm1_max=100
pwm1_min=0
temp1_auto_point1_pwm=100
temp1_auto_point1_temp=90000
temp1_auto_point1_temp_hyst=3000
temp1_crit=115000
temp1_crit_hyst=2000
temp1_emergency=135000
temp1_emergency_hyst=5000
temp1_input=47000
temp1_max=95000
temp1_max_hyst=3000
update_rate=1000

pwm1=13
pwm1_enable=2
pwm1_max=100
pwm1_min=0
temp1_auto_point1_pwm=100
temp1_auto_point1_temp=90000
temp1_auto_point1_temp_hyst=3000
temp1_crit=115000
temp1_crit_hyst=2000
temp1_emergency=135000
temp1_emergency_hyst=5000
temp1_input=58000
temp1_max=95000
temp1_max_hyst=3000
update_rate=1000

Again, pwm1 "froze" at a value of 13 and stayed there. Since I do not want to burn my card out, I then switched to manual and set the pwm1 value to 100.
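
The manual fallback described above can be sketched as a small helper. This is only an illustration under stated assumptions: the hwmon directory is passed in as an argument, pwm1_enable=1 selects manual mode (as in the command sequences in this comment), and pwm1 takes a value up to pwm1_max (100 in the listings here). The function name force_fan_manual is hypothetical.

```shell
# Switch the fan to manual control and pin it at a given pwm1 value.
# force_fan_manual is a hypothetical helper; the attribute names and the
# "1 = manual" meaning of pwm1_enable follow the commands in this report.
force_fan_manual() {
    dir=$1
    duty=${2:-100}                            # default: full speed
    echo 1 > "$dir/pwm1_enable" || return 1   # manual mode first
    echo "$duty" > "$dir/pwm1"                # then set the fan value
}
```

Run as root, `force_fan_manual /sys/class/hwmon/hwmon0` would be equivalent to the two echo commands for manual mode and a pwm1 value of 100. Manual mode is set first because the earlier listings show the driver overriding pwm1 while pwm1_enable is 2.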

I also discovered 3 additional bugs, but will submit separate reports (and patches) for these.
Comment 6 Martin Peres 2013-08-13 02:58:16 UTC

*** This bug has been marked as a duplicate of bug 66177 ***
