Bug 37922 - NV86: too high power usage.
Status: NEW
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: git
Hardware: Other All
Importance: medium normal
Assigned To: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2011-06-04 06:38 UTC by maximlevitsky
Modified: 2015-01-17 05:24 UTC (History)
Attachments
1st part (6.99 KB, text/plain)
2011-06-04 06:39 UTC, maximlevitsky
2nd part (427 bytes, text/plain)
2011-06-04 06:40 UTC, maximlevitsky
3rd part (1.96 KB, text/plain)
2011-06-04 06:42 UTC, maximlevitsky
4th part (1.56 KB, text/plain)
2011-06-04 06:43 UTC, maximlevitsky
5th part (982 bytes, text/plain)
2011-06-04 06:45 UTC, maximlevitsky
6th part (1.87 KB, text/plain)
2011-06-04 06:46 UTC, maximlevitsky
even smaller trace (11.75 KB, text/plain)
2011-06-04 10:16 UTC, maximlevitsky
latest trace (9.27 KB, text/plain)
2011-07-08 11:12 UTC, maximlevitsky
almost minimal trace (1.37 KB, text/plain)
2011-07-29 07:14 UTC, maximlevitsky
final trace (1.47 KB, text/plain)
2011-07-29 10:31 UTC, maximlevitsky
Enable xfer only when we need it to save power (1.83 KB, patch)
2011-07-30 14:18 UTC, Martin Peres
full reclocking trace (34.87 KB, text/plain)
2011-08-01 00:16 UTC, maximlevitsky
new reclocking trace (28.97 KB, text/plain)
2011-08-02 20:13 UTC, maximlevitsky
c040 woes (3.16 KB, text/plain)
2011-08-02 20:13 UTC, maximlevitsky
trace with reclock culled (4.28 KB, text/plain)
2011-08-03 18:01 UTC, maximlevitsky
minimal trace of non reclock magic (2.42 KB, text/plain)
2011-08-03 19:38 UTC, maximlevitsky
minimal trace of non reclock magic - v2 (3.02 KB, text/plain)
2011-08-04 16:57 UTC, maximlevitsky
minimal trace of non reclock magic - v3 (3.02 KB, text/plain)
2011-08-04 17:03 UTC, maximlevitsky
minimal trace of non reclock magic - v4 (2.51 KB, text/plain)
2011-08-27 14:43 UTC, maximlevitsky
information about DAC clock regs I gathered (4.59 KB, application/octet-stream)
2011-08-28 12:18 UTC, maximlevitsky

Description maximlevitsky 2011-06-04 06:38:04 UTC
I want to track all the progress I made on reverse-engineering power-management on my NV86 card.

First, some statistics that should give us an idea of what to expect.
(All measurements were done using ACPI counters in a more or less controlled environment: 0% brightness, using the VGA console, with the X server running on another VT.
The VGA console was running powertop.)
Each measurement is what appeared to be the minimum value.
Wireless was on, and no special knobs to decrease power usage were applied (like unloading the ethernet driver).

Following a suggestion from Emil Velikov, I changed the vbios slightly so that it shows 0 available power levels, so the nvidia driver wasn't able to reclock the card
(unless I say otherwise).

no drivers loaded..........:  22.6W
nouveau....................:  22.6W
nouveau, pm level 0........:  20.9W

nvidia (no reclock)........:  19.4W
no driver + mmiotrace
replay of above............:  19.4W

nvidia + reclocking........:  15.8W
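
To put the table above in perspective, here is a minimal sketch that computes how much each configuration saves relative to the "no drivers loaded" baseline. The wattages are copied from the table; the dictionary labels and the `savings` helper are illustrative names, not part of any tool used in this report.

```python
# Idle power readings from the table above, in watts.
readings = {
    "no drivers loaded": 22.6,
    "nouveau": 22.6,
    "nouveau, pm level 0": 20.9,
    "nvidia (no reclock)": 19.4,
    "mmiotrace replay": 19.4,
    "nvidia + reclocking": 15.8,
}


def savings(table, baseline="no drivers loaded"):
    """Return {label: (watts saved, percent saved)} relative to the baseline."""
    base = table[baseline]
    return {
        label: (round(base - watts, 1), round(100 * (base - watts) / base, 1))
        for label, watts in table.items()
    }


if __name__ == "__main__":
    for label, (saved, pct) in savings(readings).items():
        print(f"{label:<22} saves {saved:4.1f} W ({pct:4.1f}%)")
```

By this arithmetic the reclocking blob sits about 6.8W (roughly 30%) below the baseline, which is the gap the rest of this report tries to close.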


I was able to reduce the mmiotrace to a very small list,
and with it I can bring the power level down to 19.4W (or sometimes even less).
I will attach it here.
Note that the trace consists of 6 parts. If I concatenate them, power usage seems to
increase. Removing many of the commands from the above traces also increases power usage a bit, so the conclusion is that the traces contain many commands that each turn down one part of the card or another.
Comment 1 maximlevitsky 2011-06-04 06:39:45 UTC
Created attachment 47519 [details]
1st part

lots of PTHERM writes. They do matter (I trimmed some of them though).
Comment 2 maximlevitsky 2011-06-04 06:40:58 UTC
Created attachment 47520 [details]
2nd part

Changes the clock of one of the DACs.
Appears to decrease power usage by 0.1W.
Comment 3 maximlevitsky 2011-06-04 06:42:39 UTC
Created attachment 47521 [details]
3rd part

The most interesting part is the PMC.ENABLE write.
I tried to merge this part with part 1, but got higher power usage.
PFB accesses appear to lower power usage a bit too (~0.1W).
Comment 4 maximlevitsky 2011-06-04 06:43:59 UTC
Created attachment 47522 [details]
4th part

Blob touches a clock register here.
Comment 5 maximlevitsky 2011-06-04 06:45:11 UTC
Created attachment 47523 [details]
5th part

PCRYPT, PBSP, PVP accesses...
Comment 6 maximlevitsky 2011-06-04 06:46:50 UTC
Created attachment 47524 [details]
6th part

More PVP and PCRYPT writes.
I tried merging this with the previous part, but it didn't seem to work
(I guess a wait is needed between them, but I'll check that again later).
Comment 7 maximlevitsky 2011-06-04 10:16:14 UTC
Created attachment 47531 [details]
even smaller trace

I checked, actually double-checked, that this trace gives the same power usage as the non-reclocked blob.
Comment 8 maximlevitsky 2011-07-08 11:12:11 UTC
Created attachment 48901 [details]
latest trace

Latest trace. Might give a bit higher power usage, dunno.
Comment 9 maximlevitsky 2011-07-29 07:07:35 UTC
So that's it: I reduced and more or less understood the register writes involved. The end result is about 0.150W higher wattage than the blob, using this trace:


# enable everything in PMC.ENABLE - VPE bit decreases power usage dramatically
[0] 1226.455757 MMIO32 W 0x000200 0xffffffff PMC.ENABLE <= everything

# some power magic
[0] 1226.458120 MMIO32 W 0x001098 0x21ca003c PBUS+0x98 <= 0x21ca003c
[0] 1226.458192 MMIO32 W 0x001604 0x00020804 PBUS+0x604 <= 0x20804
[0] 1226.458299 MMIO32 W 0x001588 0x00000001 PBUS+0x588 <= 0x1

#disable VC2 xtensa clock
[0] 1226.471062 MMIO32 W 0x00c040 0x2ee01233 0xc040 <= 0x2ee01233

# disable secondary DAC 
[0] 1226.425262 MMIO32 W 0x61a010 0x80000002 PDISPLAY.DAC_REGS[0].CLK_CTRL1 <= { CONNECTED = 0 | 0x80000002 }
[0] 1226.425394 MMIO32 W 0x61a004 0xd0150000 PDISPLAY.DAC_REGS[0].DPMS_CTRL <= { PENDING | 0x50150000 }
[0] 1226.425610 MMIO32 W 0x61a810 0x00000003 PDISPLAY.DAC_REGS[0x1].CLK_CTRL1 <= { CONNECTED = 0 | 0x3 }
[0] 1226.425717 MMIO32 W 0x61a804 0xd0150000 PDISPLAY.DAC_REGS[0x1].DPMS_CTRL <= { PENDING | 0x50150000 }

#PFB magic - some dram optimization ???
[0] 1226.423318 MMIO32 W 0x100000 0x0000c042 PFB+0 <= 0xc042
[0] 1226.423388 MMIO32 W 0x100004 0x0000c042 PFB+0x4 <= 0xc042
[0] 1226.423457 MMIO32 W 0x100008 0x0000c042 PFB+0x8 <= 0xc042
[0] 1226.423527 MMIO32 W 0x100b78 0x0000c042 PFB+0xb78 <= 0xc042
[0] 1226.423596 MMIO32 W 0x100c0c 0x0000c042 PFB+0xc0c <= 0xc042
[0] 1226.423735 MMIO32 W 0x100d04 0x0000c042 PFB+0xd04 <= 0xc042
[0] 1226.423804 MMIO32 W 0x100e0c 0x0000c042 PFB+0xe0c <= 0xc042
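
A minimal sketch of how write lines in the format shown above could be turned into a replayable list of (address, value) pairs. The regular expression only assumes the `[cpu] timestamp MMIO32 W addr value` layout visible in this trace; `parse_writes` and `SAMPLE` are illustrative names. The output reuses the `nvapoke <reg> <value>` form that appears in Comment 22, on the assumption that the tool takes bare hex arguments as shown there.

```python
import re

# Matches decoded mmiotrace write lines such as:
#   [0] 1226.455757 MMIO32 W 0x000200 0xffffffff PMC.ENABLE <= everything
WRITE_RE = re.compile(
    r"^\[\d+\]\s+\d+\.\d+\s+MMIO32\s+W\s+"
    r"(?P<addr>0x[0-9a-fA-F]+)\s+(?P<value>0x[0-9a-fA-F]+)"
)


def parse_writes(text):
    """Return (address, value) for every MMIO32 write line, in file order."""
    writes = []
    for line in text.splitlines():
        match = WRITE_RE.match(line.strip())
        if match:  # '#' comments and blank lines simply do not match
            writes.append(
                (int(match.group("addr"), 16), int(match.group("value"), 16))
            )
    return writes


SAMPLE = """\
# enable everything in PMC.ENABLE
[0] 1226.455757 MMIO32 W 0x000200 0xffffffff PMC.ENABLE <= everything

# some power magic
[0] 1226.458120 MMIO32 W 0x001098 0x21ca003c PBUS+0x98 <= 0x21ca003c
"""

if __name__ == "__main__":
    for addr, value in parse_writes(SAMPLE):
        # Print commands instead of touching hardware.
        print(f"nvapoke {addr:x} {value:x}")
```

The sketch keeps the writes in file order rather than sorting by timestamp, since the trace above is grouped by purpose and its timestamps are not monotonic.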
Comment 10 maximlevitsky 2011-07-29 07:14:55 UTC
Created attachment 49720 [details]
almost minimal trace

Also attach it here
Comment 11 maximlevitsky 2011-07-29 10:31:42 UTC
Created attachment 49725 [details]
final trace

That's it, I found which register write spoiled the work done by the trace.
It would seem to me that it is some sort of PFIFO idle mode register, clock gating or whatever. nouveau sets register 0x400824, PGRAPH.CTXCTL_FLAGS_0, to 0x4000 and spoils things. The latest trace (or, more correctly, set of register writes) now undoes that register setting. So that's it: 3W less while running a normal desktop and compiz.
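
The fix described here amounts to clearing one bit while leaving the rest of the register alone. A minimal sketch of that read-modify-write step follows; the register address and the 0x4000 value come from the comment above, while `clear_flag` and the constant names are illustrative, and the current register value is passed in rather than read from hardware.

```python
CTXCTL_FLAGS_0 = 0x400824  # PGRAPH.CTXCTL_FLAGS_0, per the decoded trace
FLAG_4000 = 0x4000         # the value nouveau writes at init


def clear_flag(current, flag=FLAG_4000):
    """Return a 32-bit register value with `flag` cleared, other bits kept."""
    return current & ~flag & 0xFFFFFFFF


if __name__ == "__main__":
    before = 0x4000  # what nouveau leaves in the register
    after = clear_flag(before)
    print(f"0x{CTXCTL_FLAGS_0:06x}: 0x{before:08x} -> 0x{after:08x}")
```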
Comment 12 Martin Peres 2011-07-30 14:18:52 UTC
Created attachment 49752 [details] [review]
Enable xfer only when we need it to save power

drm/nv50/ctxprog: enable xfer only when we need it to save power
    
This patch adds instructions to the ctxprog and, by doing so, impacts context-switching performance.
My test case showed a 1% performance cost using glxgears, which is a context-switch-bound application.

Please test and report bugs/performance/power/other.
Comment 13 maximlevitsky 2011-08-01 00:16:20 UTC
Created attachment 49772 [details]
full reclocking trace

Thanks! I haven't tested this yet, but I am 100% sure that the patch will work on my system.

Meanwhile, I was playing with a full reclocking trace, and I attach it here.
I kept all (or at least the vast majority) of the PFB accesses so that you can look at what it does to memory timings.

It reduces power usage to 15.5W (and the blob is at about 15.3W).
When I replay it, then load nouveau, then fix the flag register and power down the unused DAC, I get 15.8W. If I run a full-blown desktop, I get ~16W when I let the system go idle. Pretty impressive, considering that on stock nouveau, 21W was the lower bound for power usage....
Comment 14 maximlevitsky 2011-08-02 20:12:30 UTC
@Martin: I tested your patch and of course it works.
One small note is that you probably also want nouveau to stop writing 0x4000 to that register initially. With your patch that flag is unset on the first ctxprog invocation, so it doesn't matter much, yet nouveau shouldn't set it if the blob doesn't.

I also reduced the trace further.
I also sadly note that my full reclocking trace reduces system stability a lot (supertuxkart crashes after just 1~2 laps and hangs the system).
(Although, due to the low clocks involved, it might just better expose hangs I already had due to the nouveau drivers....)

Also, register 0xc040, which I poked a lot this evening, does play some role in blob reclocking, though I don't have any idea how to nail down its contents.
Comment 15 maximlevitsky 2011-08-02 20:13:02 UTC
Created attachment 49857 [details]
new reclocking trace
Comment 16 maximlevitsky 2011-08-02 20:13:59 UTC
Created attachment 49858 [details]
c040 woes

My poor man's attempts to understand what that thing really does
Comment 17 maximlevitsky 2011-08-03 18:01:27 UTC
Created attachment 49894 [details]
trace with reclock culled

It's actually simpler than I expected.
I took my last trace and removed all reclocking scripts.
(So the difference between this and the non-reclocking trace is fiddling with GPIO ports and PPCI writes.) The result, guess what, is 16.0W, just 0.5W higher than with the full reclocking trace and 0.7W higher than with the blob (either I missed something in the trace or some commands are uploaded via the ring buffer, and I can't trace these)....

Also, the system is now stable and as fast as pm level 1 allows it to be (fine with me).

Yay!
Comment 18 maximlevitsky 2011-08-03 18:06:49 UTC
Of course that is power usage while idle and with many things (like wireless) turned off, but I have measured power usage in this setup for a very long time (and to be sure, I have a boot script triggered by a grub parameter that sets these for me).

While nouveau is loaded, power usage is 16.5W on the framebuffer console (and I check and print it every 1 sec, so the screen redraw load isn't that low).

While in a full KDE session with compiz running, but the system otherwise idle, power usage is 16.9W.

Of course when I actually use the system, power usage is at 20-22W, but that's the same as with the blob.

Full screen brightness (compared to lowest) of course adds ~3W, but I always set it to lowest while on battery, and in room conditions it's more than enough.

So standard stuff, you know....
Comment 19 maximlevitsky 2011-08-03 19:38:29 UTC
Created attachment 49897 [details]
minimal trace of non reclock magic

So this is a more or less minimal trace. It reduces power usage from 21W to 16W.

If, in addition to that, I use Martin's reclocking code, I can reduce the power usage further by 0.5W (of course nouveau adds 0.5W on its own, so the end power usage on level 0 is ~16W in the fb console).
Comment 20 maximlevitsky 2011-08-04 16:57:49 UTC
Created attachment 49933 [details]
minimal trace of non reclock magic - v2

New trace with comments on how each register setting affects power usage and default values of registers.
Comment 21 maximlevitsky 2011-08-04 17:03:23 UTC
Created attachment 49934 [details]
minimal trace of non reclock magic - v3

some cosmetic updates
Comment 22 Martin Peres 2011-08-09 15:24:31 UTC
Maxim, could you try this?

nvapoke 1588 30 # disable VPE
nvapoke 1590 3c00 # disable PBSP and PCRYPT

Please tell me if it helps with power consumption :)

Also, you can have fun by disabling PGRAPH:
nvapoke 1588 3 (or 33 if you also want to disable VPE)

Looking forward to your answer
Comment 23 maximlevitsky 2011-08-27 14:43:10 UTC
Created attachment 50622 [details]
minimal trace of non reclock magic - v4

Another version, with GPIO bits commented out as they are probably wrong on other systems.

Also some cosmetic cleanup
Comment 24 maximlevitsky 2011-08-27 19:39:31 UTC
1098/1588/1590 information as requested:


register 1098:

0x21ca0004 - default
0x21ca0034 - set by blob
0xf9fe007e - writeable bits
0x00000020 - enable automatic clock gating for register 0x1588
0x00000040 - enable automatic clock gating for register 0x1590


register 1588/1590:

* the registers consist of 2-bit subregisters (or maybe 4-bit subregisters???):
0x00 - nothing
0x01 - automatic clock gating, doesn't affect engine usability 
        (tested on PGRAPH only). see register 0x1098

0x02 - nothing/reserved - tested this in all combinations
0x03 - disable clock - stops the engine - slightly larger power saving than 0x01, but the difference is small


1588 subregisters:
0x00000003 - PGRAPH - (or part of it).
0x00000030 - VPE - no power savings at all.

1590 subregisters:
0x00003000 - some VP2 engine
0x00000300 - another VP2 engine
0x00000c00 - supposed to be also VP2 related - no reaction

also note that writing 0x8000 to this register while 1098 bit 0x40 is set harms power usage
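
The 2-bit subregister layout described above can be sketched as a small decoder. The masks and mode meanings are taken from this comment; the field labels, `MODES`, and `decode` are illustrative names, and the 2-bit width is the working assumption stated above (the comment itself leaves open that the fields might be 4 bits wide).

```python
# Meaning of each 2-bit subregister value, as listed above.
MODES = {
    0b00: "nothing",
    0b01: "automatic clock gating",
    0b10: "nothing/reserved",
    0b11: "clock disabled",
}

# Subregister masks reported above for registers 0x1588 and 0x1590.
FIELDS_1588 = {"PGRAPH": 0x00000003, "VPE": 0x00000030}
FIELDS_1590 = {
    "VP2 engine (0x3000)": 0x00003000,
    "VP2 engine (0x0300)": 0x00000300,
    "VP2 related? (0x0c00)": 0x00000C00,
}


def decode(value, fields):
    """Map each named field to the mode selected by its 2 bits in `value`."""
    decoded = {}
    for name, mask in fields.items():
        shift = (mask & -mask).bit_length() - 1  # position of the lowest set bit
        decoded[name] = MODES[(value & mask) >> shift]
    return decoded


if __name__ == "__main__":
    # Values suggested in Comment 22.
    print(decode(0x30, FIELDS_1588))
    print(decode(0x3C00, FIELDS_1590))
```

Decoding the values from Comment 22 this way shows `1588 = 0x30` stopping the VPE clock and leaving PGRAPH untouched, while `1590 = 0x3c00` stops the 0x3000 and 0x0c00 fields.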
Comment 25 maximlevitsky 2011-08-28 12:18:40 UTC
Created attachment 50637 [details]
information about DAC clock regs I gathered

This is information on DAC registers I gathered recently
Comment 26 maximlevitsky 2011-08-28 13:09:23 UTC
After applying the power-down magic, the power usage looks different:

---------------- after whole magic --------------------


DAC0 - should be for LVDS 

0x00000000 - 16.400W
0x00000001 - 16.400W
0x00000002 - 16.400W
0x00000003 - 16.400W

0x80000000 - 16.480W
0x80000001 - 16.200W
0x80000002 - 16.085W
0x80000003 - 16.200W



DAC1 - VGA/...

0x00000000 - 16.400W
0x00000001 - 16.400W
0x00000002 - 16.100W
0x00000003 - 16.100W

0x80000000 - 16.400W
0x80000001 - 16.200W
0x80000002 - 16.085W
0x80000003 - 16.175W
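
A minimal sketch that picks the lowest-power value for each DAC from the two tables above. The numbers are copied from the tables; `lowest_power` and the dictionary names are illustrative.

```python
# Measured idle power (watts) for each value tried, copied from the tables above.
DAC0 = {
    0x00000000: 16.400, 0x00000001: 16.400, 0x00000002: 16.400, 0x00000003: 16.400,
    0x80000000: 16.480, 0x80000001: 16.200, 0x80000002: 16.085, 0x80000003: 16.200,
}
DAC1 = {
    0x00000000: 16.400, 0x00000001: 16.400, 0x00000002: 16.100, 0x00000003: 16.100,
    0x80000000: 16.400, 0x80000001: 16.200, 0x80000002: 16.085, 0x80000003: 16.175,
}


def lowest_power(table):
    """Return the (value, watts) pair with the smallest measured power."""
    return min(table.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for name, table in (("DAC0", DAC0), ("DAC1", DAC1)):
        value, watts = lowest_power(table)
        print(f"{name}: 0x{value:08x} -> {watts:.3f}W")
```

For both DACs the minimum in these measurements is 0x80000002 at 16.085W, about 0.3W below the 16.400W readings.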
Comment 27 Maarten Maathuis 2011-08-28 13:20:41 UTC
Why would LVDS use DAC registers?
Comment 28 maximlevitsky 2011-08-28 13:24:14 UTC
Dunno, these writes only affect VGA visually. On LVDS and DVI the only effect is lower power usage.
Comment 29 maximlevitsky 2011-08-30 17:59:20 UTC
Once again, stupid me.
I somehow assumed that DAC == CRTC, thus all this nonsense about hardwiring, LVDS, etc...

It's actually much simpler:
My system has 2 DACs:

The first one is assigned to TV-OUT, which isn't yet supported (and I might help add that support, although it is somewhat pointless in our digital age - these days it's even hard to buy an S-Video cable without resorting to eBay, etc.)

The second DAC is for the VGA output then.
So the bits that confuse it, and are set only for the first DAC, probably just enable some TV-oriented knobs.
Comment 30 Florian Mickler 2012-01-21 08:51:57 UTC
A patch referencing this bug report has been merged in Linux v3.2-rc1:

commit fbba036a56fe0e5c5e8c91daf3fa211f88d94a03
Author: Martin Peres <martin.peres@ensi-bourges.fr>
Date:   Sat Jul 30 23:08:45 2011 +0200

    drm/nv50/gr: enable ctxprog xfer only when we need it to save power
Comment 31 Pierre Moreau 2015-01-17 00:04:39 UTC
Is this still an issue with latest kernel version (3.18, 3.19-rc4)?
Comment 32 Martin Peres 2015-01-17 05:24:01 UTC
(In reply to Pierre Moreau from comment #31)
> Is this still an issue with latest kernel version (3.18, 3.19-rc4)?

Yes, it is :) There is still a lot of work to be done in clock and power gating.