Bug 82834 - [NV92] GeForce 8800 GS shows incorrect temperature with nouveau
Summary: [NV92] GeForce 8800 GS shows incorrect temperature with nouveau
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: git
Hardware: x86 (IA32) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2014-08-20 03:16 UTC by Andrew Randrianasulu
Modified: 2017-11-25 12:19 UTC (History)

Attachments
dmesg with nouveau.debug=debug (65.91 KB, text/plain)
2014-08-20 03:16 UTC, Andrew Randrianasulu
VBIOS (57.50 KB, application/octet-stream)
2014-08-20 03:17 UTC, Andrew Randrianasulu
dmesg from nouveau/linux-3.17 + increased udelay (115.22 KB, text/plain)
2014-08-20 16:07 UTC, Andrew Randrianasulu

Description Andrew Randrianasulu 2014-08-20 03:16:08 UTC
Created attachment 104931 [details]
dmesg with nouveau.debug=debug

If I try to see how hot my new card is, I get:

nouveau-pci-0500
Adapter: PCI adapter
temp1:         +0.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +100.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)


This is using kernel 3.17-rc1 (32-bit), but the same happens with older kernels, both 32-bit and 64-bit.

lspci -vvn:
05:00.0 0300: 10de:0606 (rev a2) (prog-if 00 [VGA controller])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 28
        Region 0: Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
        Region 5: I/O ports at 9c00 [size=128]
        Expansion ROM at fbfe0000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee0300c  Data: 4122
        Capabilities: [78] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <512ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [128 v1] Power Budgeting <?>
        Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: nouveau
Comment 1 Andrew Randrianasulu 2014-08-20 03:17:29 UTC
Created attachment 104932 [details]
VBIOS
Comment 2 Ilia Mirkin 2014-08-20 03:24:19 UTC
<AndrewR> nvapeek 20400 only gives  me "..." ! 
<AndrewR> 00020008: c008360d
<AndrewR> 00020008: c0083615
<AndrewR> 00020008: c008361b 
<AndrewR> 00020008: c0083622 ... after  fullscreen ...
<AndrewR> 00020008: c0083643 (after GALLIUM_MSAA=8 was added, fps  dropped down ...)
Comment 3 Martin Peres 2014-08-20 08:07:06 UTC
Your card is supposed to have an external temperature probe (an ADT7473). For some reason, it fails to be detected *AGAIN*.

Here is what your vbios says: EXTDEV 0: type 0x70 [ADT7473] at 0x5c defbus 0

You may try to put 100 instead of 40 at this line: http://cgit.freedesktop.org/~darktama/nouveau/tree/nvkm/subdev/therm/ic.c#n63
If it still does not appear, then I'm sorry, you'll have to check whether it works with the proprietary driver and, if so, I would need a mmiotrace :s
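
A note on the address above: the EXTDEV entry lists the ADT7473 at 0x5c, while the probe table in ic.c uses 7-bit addresses such as 0x2e. Assuming the VBIOS stores the 8-bit (shifted) form of the I2C address, the two agree, since 0x5c >> 1 = 0x2e. A minimal sketch of that conversion follows; the helper name is hypothetical and is not taken from nouveau:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a nouveau function): convert an 8-bit I2C
 * address, where bit 0 is the read/write flag, to the 7-bit form used
 * by I2C_BOARD_INFO(). Assumes the VBIOS value is in 8-bit form.
 */
static uint8_t i2c_addr_8bit_to_7bit(uint8_t addr8)
{
    return (uint8_t)(addr8 >> 1);
}
```

Under that assumption, calling i2c_addr_8bit_to_7bit(0x5c) returns 0x2e, which matches the first adt7473 entry in the table.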
Comment 4 Andrew Randrianasulu 2014-08-20 16:07:14 UTC
Created attachment 104991 [details]
dmesg from nouveau/linux-3.17 + increased  udelay

Changes:

diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c b/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
index ca9ad9f..8afd3ba 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
@@ -60,9 +60,9 @@ static struct nouveau_i2c_board_info
 nv_board_infos[] = {
        { { I2C_BOARD_INFO("w83l785ts", 0x2d) }, 0 },
        { { I2C_BOARD_INFO("w83781d", 0x2d) }, 0  },
-       { { I2C_BOARD_INFO("adt7473", 0x2e) }, 40  },
-       { { I2C_BOARD_INFO("adt7473", 0x2d) }, 40  },
-       { { I2C_BOARD_INFO("adt7473", 0x2c) }, 40  },
+       { { I2C_BOARD_INFO("adt7473", 0x2e) }, 100  },
+       { { I2C_BOARD_INFO("adt7473", 0x2d) }, 100  },
+       { { I2C_BOARD_INFO("adt7473", 0x2c) }, 100  },
        { { I2C_BOARD_INFO("f75375", 0x2e) }, 0  },
        { { I2C_BOARD_INFO("lm99", 0x4c) }, 0  },
        { { I2C_BOARD_INFO("lm90", 0x4c) }, 0  },

on top of 4898ac046d24894d7b2a5a96a1cff4e095844323 ("drm/nouveau/platform: fix compilation error")

It doesn't help - same 0.0 C
Comment 5 Martin Peres 2014-08-20 18:58:54 UTC
(In reply to comment #4)
> Created attachment 104991 [details]
> dmesg from nouveau/linux-3.17 + increased  udelay
> 
> Changes:
> 
> diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
> b/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
> index ca9ad9f..8afd3ba 100644
> --- a/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
> +++ b/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
> @@ -60,9 +60,9 @@ static struct nouveau_i2c_board_info
>  nv_board_infos[] = {
>         { { I2C_BOARD_INFO("w83l785ts", 0x2d) }, 0 },
>         { { I2C_BOARD_INFO("w83781d", 0x2d) }, 0  },
> -       { { I2C_BOARD_INFO("adt7473", 0x2e) }, 40  },
> -       { { I2C_BOARD_INFO("adt7473", 0x2d) }, 40  },
> -       { { I2C_BOARD_INFO("adt7473", 0x2c) }, 40  },
> +       { { I2C_BOARD_INFO("adt7473", 0x2e) }, 100  },
> +       { { I2C_BOARD_INFO("adt7473", 0x2d) }, 100  },
> +       { { I2C_BOARD_INFO("adt7473", 0x2c) }, 100  },
>         { { I2C_BOARD_INFO("f75375", 0x2e) }, 0  },
>         { { I2C_BOARD_INFO("lm99", 0x4c) }, 0  },
>         { { I2C_BOARD_INFO("lm90", 0x4c) }, 0  },
> 
> on top of 4898ac046d24894d7b2a5a96a1cff4e095844323 ("drm/nouveau/platform:
> fix compilation error")

Looks good.

> 
> It doesn't help - same 0.0 C

It is not supposed to fix the internal temperature sensor; it is supposed to create another temperature sensor. Can I see the kernel logs with this patch applied? Make sure your initramfs doesn't contain another version of nouveau.ko.
Comment 6 Andrew Randrianasulu 2014-08-20 19:16:32 UTC
Hm, the log should be available as https://bugs.freedesktop.org/attachment.cgi?id=104991. At 11.55 you can see "[11.544288] nouveau D[     I2C][0000:05:00.0] using custom udelay 100 instead of 10", so I assumed it worked at this stage. But then I saw no other nouveau adapter. What if nouveau just stops adding sensors after finding the 'ghost' internal thermal sensor?
Comment 7 Martin Peres 2014-08-20 21:42:36 UTC
(In reply to comment #6)
> Hm, log  should  be available as
> https://bugs.freedesktop.org/attachment.cgi?id=104991 (at 11.55 you  can see
> "[11.544288] nouveau D[     I2C][0000:05:00.0] using custom udelay 100
> instead of 10" - I assumed  it  worked at this stage. But then - I saw  no
> another  nouveau adapter ..what if nouveau just stops  adding sensors  after
> finding  'ghost' internal thermal sensor  ?

Nah, that's not how the code works. Can you test with the proprietary driver to see if it exposes a temperature? If it does work, could you make a mmiotrace of it, please?
Comment 8 Andrew Randrianasulu 2014-08-21 09:02:03 UTC
It did not work with NVIDIA-Linux-x86-325.15.run (no temperature data in nvidia-settings, no temperature data in nvidia-smi output).

I also found these threads:
http://en.expreview.com/2008/01/21/review-palit-8800gs-384mb-768mb/214.html/25
http://www.techpowerup.com/forums/threads/palit-geforce-8800gs-384mb-with-no-sensor-and-two-pin-fan.61530/

So maybe this is a hardware defect ...
Comment 9 Martin Peres 2014-08-21 12:07:12 UTC
(In reply to comment #8)
> Not  worked  with NVIDIA-Linux-x86-325.15.run (no temperature  data in
> nvidia-settings,  no temperature  data  in nvidia-smi output).
> 
> I also  found  those  threads:
> http://en.expreview.com/2008/01/21/review-palit-8800gs-384mb-768mb/214.html/
> 25
> http://www.techpowerup.com/forums/threads/palit-geforce-8800gs-384mb-with-no-
> sensor-and-two-pin-fan.61530/
> 
> so, may be  this  is  hardware defect ...

Possibly... I'll add a test and only display the temperature if the test succeeds. This way, people won't be misled into thinking their temperature probe works but is not accurate.

Thank you for reporting the bug though! I'll close it when I have submitted the patch.
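
The idea of only displaying the temperature when a test succeeds could be sketched roughly as below. This is an illustration under stated assumptions, not the patch that was actually submitted: the struct, its fields, and both function names are hypothetical, and how the driver would really detect a missing sensor is not described in this report.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical sensor state; the real driver's data structures differ. */
struct temp_sensor {
    bool present;     /* result of whatever presence test the driver runs */
    int  raw_celsius; /* last value read from the hardware */
};

/*
 * Return the temperature in degrees Celsius, or -ENODEV when the sensor
 * failed the presence test. For simplicity this sketch treats every
 * negative return as an error, so it cannot report sub-zero readings.
 */
static int temp_get(const struct temp_sensor *s)
{
    if (!s || !s->present)
        return -ENODEV;
    return s->raw_celsius;
}

/* Only expose a temp1-style attribute when a reading can be obtained. */
static bool should_expose_temp(const struct temp_sensor *s)
{
    return temp_get(s) >= 0;
}
```

With a check like this, a card whose sensor is absent would show no temp1 line at all instead of a misleading +0.0°C.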
Comment 10 Andrew Randrianasulu 2017-11-25 12:19:17 UTC
It works on 4.14 and 4.12 at least:

nouveau-pci-0100
Adapter: PCI adapter
GPU core:     +1.05 V  (min =  +0.95 V, max =  +1.10 V)
temp1:        +42.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +100.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)
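
For anyone checking the value outside of sensors: the hwmon sysfs interface reports temp1_input in millidegrees Celsius, so +42.0°C corresponds to the text 42000. A minimal sketch of parsing such a value follows; the function name is hypothetical, and reading the file itself (its path depends on the hwmon index assigned on a given system) is left out:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the text of a hwmon temp*_input file
 * (millidegrees Celsius, e.g. "42000\n") into degrees Celsius.
 * Returns 0 on success or -EINVAL if no number could be parsed.
 */
static int parse_millidegrees(const char *text, double *celsius)
{
    char *end = NULL;
    long value;

    if (!text || !celsius)
        return -EINVAL;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || errno == ERANGE)
        return -EINVAL;

    *celsius = (double)value / 1000.0;
    return 0;
}
```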

