Bug 60704 - [nouveau, git regression] - I2C PWM fan control broken on nv50 adt7475 on kernels 3.3.x+
[nouveau, git regression] - I2C PWM fan control broken on nv50 adt7475 on ker...
Status: RESOLVED FIXED
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
git
x86-64 (AMD64) Linux (All)
: medium major
Assigned To: Nouveau Project
Xorg Project Team
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2013-02-11 23:25 UTC by Marc Meledandri
Modified: 2014-01-23 07:06 UTC (History)
1 user (show)

See Also:


Attachments
ic2-bit-banging-nouveau-git-bisect.log (6.43 KB, text/plain)
2013-02-11 23:25 UTC, Marc Meledandri
no flags Details
properly-functioning-sensors-modules.log (3.60 KB, text/plain)
2013-02-11 23:30 UTC, Marc Meledandri
no flags Details
failed-dmesg.log (58.65 KB, text/plain)
2013-02-11 23:35 UTC, Marc Meledandri
no flags Details
video-bios-gtx-260-nv50.rom (53.50 KB, application/octet-stream)
2013-02-11 23:37 UTC, Marc Meledandri
no flags Details
git-master-NAK-Bailout-dmesg.log (54.35 KB, text/plain)
2013-02-12 01:52 UTC, Marc Meledandri
no flags Details
Use 25kbps datarate for i2c transfers (1.02 KB, patch)
2013-02-12 20:14 UTC, Emil Velikov
no flags Details | Splinter Review
i2c datarate transfer patch ported upstream (1.31 KB, patch)
2013-02-13 13:28 UTC, Marc Meledandri
no flags Details | Splinter Review
Another try (4.37 KB, patch)
2013-08-28 20:17 UTC, Emil Velikov
no flags Details | Splinter Review
This patch should work (8.92 KB, patch)
2013-10-21 00:02 UTC, Martin Peres
no flags Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Marc Meledandri 2013-02-11 23:25:30 UTC
Created attachment 74657 [details]
ic2-bit-banging-nouveau-git-bisect.log

When resuming from suspend on affected kernels the lack of fan pwm control causes the GPU fan to run at 100%. This is noisy, annoying and inefficient.

After performing a git bisect (attached), I've determined the commit that broke pwm fan control functionality for all kernels 3.3.x+ with adt7475 controller (nv50 GPU in my case):

commit f553b79c03f0dbd52f6f03abe8233a2bef8cbd0d
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Wed Dec 21 18:09:12 2011 +1000

drm/nouveau/i2c: handle bit-banging ourselves

Along with the many kernels I built and tested during bisection, I've also verified this broke sensors detecting on mainline builds of 3.3.8, 3.4.29, 3.7.3 and the current Nouveau git master.

I can verify proper PWM fan control/i2c enumeration on kernel 3.2.35 and 3.2.38. I've not tested earlier kernels than the 3.2 series, but git bisection went back to 3.2-rc1 and was working there.

I'm happy to provide any further testing to help expedite getting this patched for stable kernels. I'm particularly interested in the 3.4.x LTS series which is what led me to discover this issue on my system after I strayed from the happy confines of the Debian 3.2.x LTS series kernel.
Comment 1 Marc Meledandri 2013-02-11 23:30:10 UTC
Created attachment 74658 [details]
properly-functioning-sensors-modules.log
Comment 2 Marc Meledandri 2013-02-11 23:35:06 UTC
Created attachment 74659 [details]
failed-dmesg.log
Comment 3 Marc Meledandri 2013-02-11 23:37:45 UTC
Created attachment 74660 [details]
video-bios-gtx-260-nv50.rom
Comment 4 Marc Meledandri 2013-02-12 01:52:27 UTC
Created attachment 74661 [details]
git-master-NAK-Bailout-dmesg.log

With kernels 3.4.29, 3.7.3 and the nouveau git master observing:

i2c i2c-4: sendbytes: NAK bailout.

messages in syslog.
Comment 5 Vincent Pelletier 2013-02-12 07:11:31 UTC
I looked into this issue a year ago, reported the breakage, bisect result and my attempts to fix in this thread:
http://lists.freedesktop.org/archives/nouveau/2012-January/009779.html
Comment 6 Vincent Pelletier 2013-02-12 07:14:34 UTC
The hash in my mail doesn't seem to exist anymore. I believe this is because the branch was since rebased after a merge to linus' tree, and I'm quite sure it was the commit you bisected to.
Comment 7 Emil Velikov 2013-02-12 20:14:37 UTC
Created attachment 74708 [details] [review]
Use 25kbps datarate for i2c transfers

The offending commit has been almost completely reverted later on

The only differences being
* timeout (5000 > 2200)
Data is based on the VESA spec
* bitrate (12,5kbps > 50kbps)
Bitrate of 50kbps is quite reasonable, considering the standards

There are a few possible solutions
A select correct delay depending on the i2c port type
B presence of adt7475(and other) i2c sensor(s) in the extdev vbios table
C use 25kbps for all transfers
D quirk it

Patch for option C is attached
Comment 8 Marc Meledandri 2013-02-13 13:23:48 UTC
Thanks Emil! I confirm that your patch resolves the issue for my hardware.

I can now control my GPU fan with the PWM bits. This also solves the suspend/resume case with automatic fan control enabled.
Comment 9 Marc Meledandri 2013-02-13 13:28:24 UTC
Created attachment 74749 [details] [review]
i2c datarate transfer patch ported upstream

I built Emil's patch:  Use 25kbps datarate for i2c transfers on kernel against kernel 3.4.30.

To test this fix upstream on kernels 3.7.7 and 3.8-rc7, I've attached the same fix ported to the new file/directory structure as the code has been rebased and is no longer in the same file or location.

I hope I documented properly to credit the original author!
Comment 10 Martin Peres 2013-06-15 21:41:03 UTC
Marc: Has this patch been applied yet?
Comment 11 Marc Meledandri 2013-06-15 21:51:19 UTC
Martin: Applied, as in accepted to the upstream kernel tree? No.

I continue building each 3.4.x kernel with this patch though. It still patches cleanly and fixes my issue.

For the latest stable kernels, it was applying cleanly to 3.8.x kernels, but I've not yet built or tested any from 3.9.x series.
Comment 12 Martin Peres 2013-06-26 15:41:40 UTC
@Marc: Ok, thanks for the precision.

@Darktama: what do we do with this problem? I know that merging this would break other stuff so what can we do?
Comment 13 Marc Meledandri 2013-07-29 23:32:40 UTC
FWIW, on kernel series 3.10.x, adjusting the i2c datarate still provides a working solution on my nv50 hardware. Sensors detect properly and fan control is enabled. The necessary patch has changed again:

--- a/drivers/gpu/drm/nouveau/core/subdev/i2cbase.c	2013-07-25 18:16:45.000000000 -0400
+++ b/drivers/gpu/drm/nouveau/core/subdev/i2cbase.c.patched	2013-07-29 19:26:09.393689862 -0400
@@ -122,7 +122,7 @@
 		if (!bit)
 			return -ENOMEM;
 
-		bit->udelay = 10;
+		bit->udelay = 20;
 		bit->timeout = usecs_to_jiffies(2200);
 		bit->data = port;
 		bit->pre_xfer = nouveau_i2c_pre_xfer;
Comment 14 Marc Meledandri 2013-07-29 23:36:22 UTC
Disregard previous diff. This rather:

--- a/drivers/gpu/drm/nouveau/core/subdev/i2c/base.c    2013-07-25 18:16:45.000000000 -0400
+++ b/drivers/gpu/drm/nouveau/core/subdev/i2c/base.c    2013-07-29 19:26:09.393689862 -0400
@@ -122,7 +122,7 @@
 		if (!bit)
 			return -ENOMEM;
 
-		bit->udelay = 10;
+		bit->udelay = 20;
 		bit->timeout = usecs_to_jiffies(2200);
 		bit->data = port;
 		bit->pre_xfer = nouveau_i2c_pre_xfer;
Comment 15 Emil Velikov 2013-07-31 12:34:55 UTC
To make things even better adt7473 seems to load fine in some cases :)

User has the original nv50, running a 3.9.x kernel.

[  DEVICE][0000:01:00.0] BOOT0  : 0x450300a2
[  DEVICE][0000:01:00.0] Chipset: G80 (NV50)
...
adt7475 4-002e: ADT7473 device, revision 0
adt7475 4-002e: Optional features: fan4
[  PTHERM][0000:01:00.0] Found an adt7473 at address 0x2e (controlled by lm_sensors, temp offset +0 C)
[     I2C][0000:01:00.0] detected monitoring device: adt7473

https://bugs.freedesktop.org/show_bug.cgi?id=67382
Comment 16 Marc Meledandri 2013-07-31 12:50:50 UTC
That's good news. I've been so used to the required patch to build the 3.4.x LTS kernels I didn't bother to build a vanilla 3.10.x.

I'll try a vanilla build tonight as I got a nouveau PGRAPH crash/hard lockup last night on 3.10.4 with the bit->udelay = 20.
Comment 17 Marc Meledandri 2013-07-31 23:49:41 UTC
Unfortunately, no adt7473 is detected for my gpu hardware (NVA0 (GT200)) with vanilla kernel 3.10.4 build. Still seeing:

[  2.083495] i2c i2c-4: sendbytes: NAK bailout.
[  2.086618] i2c i2c-4: sendbytes: NAK bailout.

I'll still need to patch, which is working for me. We must be closing in on another LTS kernel becoming maintained, and I was hoping a fix would have made it into the stable tree.

I'll file a separate bug regarding the pgraph crasher.
Comment 18 Emil Velikov 2013-08-28 20:17:37 UTC
Created attachment 84814 [details] [review]
Another try

Please give the attached patch a try. It should handle the case in a more appropriate manner.

Cheers
Emil
Comment 19 Martin Peres 2013-10-21 00:02:33 UTC
Created attachment 87907 [details] [review]
This patch should work

This patch should work, at least it does for me. Sorry it took me so long to actually do something about it.
Comment 20 Ilia Mirkin 2014-01-23 07:06:48 UTC
Martin's patch should be in 3.13. Marking this as fixed, since there hasn't been any response from the reporter for several months.