Created attachment 104931 [details]
dmesg with nouveau.debug=debug

If I try to see how hot my new card is, I get:

nouveau-pci-0500
Adapter: PCI adapter
temp1:         +0.0°C  (high = +95.0°C, hyst = +3.0°C)
                       (crit = +100.0°C, hyst = +5.0°C)
                       (emerg = +135.0°C, hyst = +5.0°C)

This is using kernel 3.17-rc1 (32-bit), but the same happens with older kernels, both 32-bit and 64-bit.

lspci -vvn:

05:00.0 0300: 10de:0606 (rev a2) (prog-if 00 [VGA controller])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 28
    Region 0: Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
    Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Region 3: Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
    Region 5: I/O ports at 9c00 [size=128]
    Expansion ROM at fbfe0000 [disabled] [size=128K]
    Capabilities: [60] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee0300c Data: 4122
    Capabilities: [78] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <512ns, L1 <1us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis+
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB
    Capabilities: [100 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status: NegoPending- InProgress-
    Capabilities: [128 v1] Power Budgeting <?>
    Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Kernel driver in use: nouveau
Created attachment 104932 [details] VBIOS
<AndrewR> nvapeek 20400 only gives me "..." !
<AndrewR> 00020008: c008360d
<AndrewR> 00020008: c0083615
<AndrewR> 00020008: c008361b
<AndrewR> 00020008: c0083622
... after fullscreen ...
<AndrewR> 00020008: c0083643 (after GALLIUM_MSAA=8 was added, fps dropped down ...)
Your card is supposed to have an external temperature probe (an ADT7473). For some reason, it fails to be detected *AGAIN*.

Here is what your vbios says:
EXTDEV 0: type 0x70 [ADT7473] at 0x5c defbus 0

You may try putting 100 instead of 40 on this line:
http://cgit.freedesktop.org/~darktama/nouveau/tree/nvkm/subdev/therm/ic.c#n63

If it still does not appear, then I'm sorry, you'll have to check whether it works with the proprietary driver and, if so, I would need a mmiotrace :s
Created attachment 104991 [details]
dmesg from nouveau/linux-3.17 + increased udelay

Changes:

diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c b/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
index ca9ad9f..8afd3ba 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
@@ -60,9 +60,9 @@ static struct nouveau_i2c_board_info nv_board_infos[] = {
 	{ { I2C_BOARD_INFO("w83l785ts", 0x2d) }, 0 },
 	{ { I2C_BOARD_INFO("w83781d", 0x2d) }, 0 },
-	{ { I2C_BOARD_INFO("adt7473", 0x2e) }, 40 },
-	{ { I2C_BOARD_INFO("adt7473", 0x2d) }, 40 },
-	{ { I2C_BOARD_INFO("adt7473", 0x2c) }, 40 },
+	{ { I2C_BOARD_INFO("adt7473", 0x2e) }, 100 },
+	{ { I2C_BOARD_INFO("adt7473", 0x2d) }, 100 },
+	{ { I2C_BOARD_INFO("adt7473", 0x2c) }, 100 },
 	{ { I2C_BOARD_INFO("f75375", 0x2e) }, 0 },
 	{ { I2C_BOARD_INFO("lm99", 0x4c) }, 0 },
 	{ { I2C_BOARD_INFO("lm90", 0x4c) }, 0 },

on top of 4898ac046d24894d7b2a5a96a1cff4e095844323 ("drm/nouveau/platform: fix compilation error")

It doesn't help - same 0.0 C
(In reply to comment #4)
> Created attachment 104991 [details]
> dmesg from nouveau/linux-3.17 + increased udelay
>
> Changes:
>
> diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
> b/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
> index ca9ad9f..8afd3ba 100644
> --- a/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
> +++ b/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
> @@ -60,9 +60,9 @@ static struct nouveau_i2c_board_info
> nv_board_infos[] = {
> { { I2C_BOARD_INFO("w83l785ts", 0x2d) }, 0 },
> { { I2C_BOARD_INFO("w83781d", 0x2d) }, 0 },
> - { { I2C_BOARD_INFO("adt7473", 0x2e) }, 40 },
> - { { I2C_BOARD_INFO("adt7473", 0x2d) }, 40 },
> - { { I2C_BOARD_INFO("adt7473", 0x2c) }, 40 },
> + { { I2C_BOARD_INFO("adt7473", 0x2e) }, 100 },
> + { { I2C_BOARD_INFO("adt7473", 0x2d) }, 100 },
> + { { I2C_BOARD_INFO("adt7473", 0x2c) }, 100 },
> { { I2C_BOARD_INFO("f75375", 0x2e) }, 0 },
> { { I2C_BOARD_INFO("lm99", 0x4c) }, 0 },
> { { I2C_BOARD_INFO("lm90", 0x4c) }, 0 },
>
> on top of 4898ac046d24894d7b2a5a96a1cff4e095844323 ("drm/nouveau/platform:
> fix compilation error")

Looks good.

> It doesn't help - same 0.0 C

It is not supposed to fix the internal temperature sensor; it is supposed to create another temperature sensor.

Can I see the kernel logs with this patch applied? Make sure your initramfs doesn't contain another version of nouveau.ko.
Hm, the log should be available as https://bugs.freedesktop.org/attachment.cgi?id=104991 (at around 11.54 you can see "[11.544288] nouveau D[ I2C][0000:05:00.0] using custom udelay 100 instead of 10"), so I assumed it worked at this stage. But then I saw no other nouveau adapter... What if nouveau just stops adding sensors after finding the 'ghost' internal thermal sensor?
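For readers following the patch and the "using custom udelay 100 instead of 10" log line, here is a minimal sketch of how a per-chip delay in such a table could override a bus default. It is not the nouveau code: `struct board_info`, `find_board_info`, `effective_udelay` and `DEFAULT_UDELAY` are hypothetical names, and treating a delay of 0 as "keep the default" is an assumption based only on the table entries and that log message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of one row of the patched table: chip name,
 * I2C address, and a per-chip bus delay in microseconds. */
struct board_info {
    const char *type;
    unsigned short addr;
    unsigned int udelay; /* assumption: 0 means "keep the bus default" */
};

/* Rows copied from the patched table in the diff above. */
static const struct board_info board_infos[] = {
    { "w83l785ts", 0x2d, 0 },
    { "w83781d",   0x2d, 0 },
    { "adt7473",   0x2e, 100 },
    { "adt7473",   0x2d, 100 },
    { "adt7473",   0x2c, 100 },
    { "f75375",    0x2e, 0 },
    { "lm99",      0x4c, 0 },
    { "lm90",      0x4c, 0 },
};

/* Default taken from the log line ("instead of 10"). */
#define DEFAULT_UDELAY 10u

/* Returns the first row matching both chip name and address, or NULL. */
const struct board_info *find_board_info(const char *type, unsigned short addr)
{
    size_t i;

    assert(type != NULL);
    for (i = 0; i < sizeof(board_infos) / sizeof(board_infos[0]); i++) {
        if (strcmp(board_infos[i].type, type) == 0 &&
            board_infos[i].addr == addr)
            return &board_infos[i];
    }
    return NULL;
}

/* Picks the row's delay when it is non-zero, otherwise the bus default. */
unsigned int effective_udelay(const struct board_info *info,
                              unsigned int bus_default)
{
    assert(info != NULL);
    return info->udelay ? info->udelay : bus_default;
}
```

With this table, an "adt7473" row yields 100 and an "lm90" row falls back to 10. A custom delay only changes bus timing during probing; it does not by itself guarantee that a chip answers.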
(In reply to comment #6)
> Hm, log should be available as
> https://bugs.freedesktop.org/attachment.cgi?id=104991 (at 11.55 you can see
> "[11.544288] nouveau D[ I2C][0000:05:00.0] using custom udelay 100
> instead of 10" - I assumed it worked at this stage. But then - I saw no
> another nouveau adapter ..what if nouveau just stops adding sensors after
> finding 'ghost' internal thermal sensor ?

Nah, that's not how the code works.

Can you test with the proprietary driver to see if it exposes a temperature? If it does work, could you make a mmiotrace of it, please?
It does not work with NVIDIA-Linux-x86-325.15.run either (no temperature data in nvidia-settings, no temperature data in nvidia-smi output).

I also found these threads:
http://en.expreview.com/2008/01/21/review-palit-8800gs-384mb-768mb/214.html/25
http://www.techpowerup.com/forums/threads/palit-geforce-8800gs-384mb-with-no-sensor-and-two-pin-fan.61530/

so maybe this is a hardware defect ...
(In reply to comment #8)
> Not worked with NVIDIA-Linux-x86-325.15.run (no temperature data in
> nvidia-settings, no temperature data in nvidia-smi output).
>
> I also found those threads:
> http://en.expreview.com/2008/01/21/review-palit-8800gs-384mb-768mb/214.html/25
> http://www.techpowerup.com/forums/threads/palit-geforce-8800gs-384mb-with-no-sensor-and-two-pin-fan.61530/
>
> so, may be this is hardware defect ...

Possibly... I'll add a test and only display the temperature if the test succeeds. This way, people won't be misled into thinking their temperature probe works but is not accurate.

Thank you for reporting the bug though! I'll close it when I have submitted the patch.
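The thread does not say what that test will be. As a rough illustration of the idea only, the sketch below hides the reading whenever the sensor read reports an error, instead of printing a misleading +0.0°C. Everything in it is hypothetical: `temp_read_fn`, `format_temp`, and the two stand-in readers are invented for this example and are not nouveau functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical reader: returns whole degrees Celsius, or a negative
 * errno when the sensor is absent. (Simplification: this convention
 * cannot represent sub-zero temperatures.) */
typedef int (*temp_read_fn)(void);

/* Stand-in for a card whose sensor does not respond. */
int probe_missing(void)
{
    return -ENODEV;
}

/* Stand-in for a card with a working sensor. */
int probe_ok(void)
{
    return 42;
}

/* Writes "temp1: +42 C" into buf only when the read succeeds.
 * On failure buf is left empty and the error code is returned,
 * so the caller shows no temperature line at all. */
int format_temp(temp_read_fn read_temp, char *buf, size_t len)
{
    int t;

    assert(read_temp != NULL && buf != NULL && len > 0);
    t = read_temp();
    if (t < 0) {
        buf[0] = '\0';
        return t;
    }
    snprintf(buf, len, "temp1: %+d C", t);
    return 0;
}
```

The design point is that a failed read and a real 0 degree reading are kept distinct: the first produces no output, so users are not shown a number that looks like a measurement.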
It works on 4.14 and 4.12 at least:

nouveau-pci-0100
Adapter: PCI adapter
GPU core:     +1.05 V  (min = +0.95 V, max = +1.10 V)
temp1:        +42.0°C  (high = +95.0°C, hyst = +3.0°C)
                       (crit = +100.0°C, hyst = +5.0°C)
                       (emerg = +135.0°C, hyst = +5.0°C)
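To cross-check a reading like the one above without the sensors tool, the value can be read from the kernel's hwmon sysfs files, where `temp*_input` holds millidegrees Celsius as text. The sketch below assumes the caller supplies the path (for example a `temp1_input` file under `/sys/class/hwmon/`; the hwmon index differs between machines), and the helper names `parse_millideg` and `read_hwmon_temp` are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Converts hwmon text such as "42000\n" (millidegrees Celsius) to
 * degrees. Returns 0 on success, -1 if no number could be parsed. */
int parse_millideg(const char *text, double *celsius)
{
    char *end;
    long milli;

    assert(text != NULL && celsius != NULL);
    errno = 0;
    milli = strtol(text, &end, 10);
    if (end == text || errno != 0)
        return -1;
    *celsius = milli / 1000.0;
    return 0;
}

/* Reads one temp*_input file. The path is supplied by the caller.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_hwmon_temp(const char *path, double *celsius)
{
    char buf[32];
    FILE *f;
    int ok;

    assert(path != NULL && celsius != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = fgets(buf, sizeof(buf), f) != NULL;
    fclose(f);
    return ok ? parse_millideg(buf, celsius) : -1;
}
```

A file containing 42000 corresponds to the +42.0°C shown above. A file containing 0 would correspond to the +0.0°C from the original report, which is why a plain zero cannot be told apart from a missing probe at this level.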