I want to track all the progress I have made on reverse-engineering power management on my NV86 card.

First, the statistics that should give us an idea of what to expect. (All measurements were done using ACPI counters, in a more or less controlled environment: 0% brightness, using the VGA console with the X server running on another VT. The VGA console was running powertop.) Each measurement is what appeared to be the minimum value. Wireless was on, and no special knobs to decrease power usage were applied (like unloading the ethernet driver).

Following a suggestion from Emil Velikov, I changed the vbios slightly so it shows 0 available power levels, thus the nvidia driver wasn't able to reclock the card (unless I say otherwise).

no drivers loaded..........: 22.6W
nouveau....................: 22.6W
nouveau, pm level 0........: 20.9W
nvidia (no reclock)........: 19.4W
no driver + mmiotrace replay
of above...................: 19.4W
nvidia + reclocking........: 15.8W

I was able to reduce the mmiotrace to a very small list, and with it I can bring power usage down to 19.4W (or even less sometimes). I will attach it here.

Note that the trace consists of 6 parts. If I concatenate them, power usage seems to increase. Removing many commands from the above traces also increases power usage a bit, so the conclusion is that the traces contain many commands that each turn down one part of the card or another.
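To make the figures above easier to compare, here is a small sketch that computes how much each configuration saves relative to the no-driver baseline. The readings are copied from this comment; the names READINGS_W and savings_vs_baseline are made up for illustration only.

```python
# Idle power readings in watts, copied from the measurements in this comment.
READINGS_W = {
    "no drivers loaded": 22.6,
    "nouveau": 22.6,
    "nouveau, pm level 0": 20.9,
    "nvidia (no reclock)": 19.4,
    "no driver + mmiotrace replay": 19.4,
    "nvidia + reclocking": 15.8,
}


def savings_vs_baseline(readings, baseline="no drivers loaded"):
    """Return {configuration: watts saved relative to the baseline}."""
    base = readings[baseline]
    # Round to 0.1 W, which is the resolution of the readings themselves.
    return {name: round(base - watts, 1) for name, watts in readings.items()}


if __name__ == "__main__":
    saved = savings_vs_baseline(READINGS_W)
    for name, watts in READINGS_W.items():
        print(f"{name:30s} {watts:5.1f} W  (saves {saved[name]:.1f} W)")
```

The interesting gap is the one between the replayed trace and the reclocking blob, since that is what the remaining register writes would have to account for.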
Created attachment 47519 [details] 1st part lots of PTHERM writes. They do matter (I trimmed some of them though).
Created attachment 47520 [details] 2nd part Changes the clock of one of the DACs. Appears to decrease power usage by 0.1W
Created attachment 47521 [details] 3rd part The most interesting part is the PMC.ENABLE write. I tried to merge that part with part 1, but got higher power usage. The PFB accesses appear to lower power usage a bit too (~0.1W)
Created attachment 47522 [details] 4th part Blob touches a clock register here.
Created attachment 47523 [details] 5th part PCRYPT, PBSP, PVP accesses...
Created attachment 47524 [details] 6th part More PVP and PCRYPT writes. I tried merging this with the former part, but it didn't seem to succeed (a wait is needed between them, I guess, but I'll check that again later).
Created attachment 47531 [details] even smaller trace I checked, actually double-checked that this trace gives same power usage as non-reclocked blob.
Created attachment 48901 [details] latest trace Latest trace. Might give a bit higher power usage, dunno.
So that's it, I reduced and more or less understood the involved register writes. The end result is about 0.150W higher wattage than the blob, and this trace:

# enable everything in PMC.ENABLE - VPE bit decreases power usage dramatically
[0] 1226.455757 MMIO32 W 0x000200 0xffffffff PMC.ENABLE <= everything

# some power magic
[0] 1226.458120 MMIO32 W 0x001098 0x21ca003c PBUS+0x98 <= 0x21ca003c
[0] 1226.458192 MMIO32 W 0x001604 0x00020804 PBUS+0x604 <= 0x20804
[0] 1226.458299 MMIO32 W 0x001588 0x00000001 PBUS+0x588 <= 0x1

# disable VC2 xtensa clock
[0] 1226.471062 MMIO32 W 0x00c040 0x2ee01233 0xc040 <= 0x2ee01233

# disable secondary DAC
[0] 1226.425262 MMIO32 W 0x61a010 0x80000002 PDISPLAY.DAC_REGS[0].CLK_CTRL1 <= { CONNECTED = 0 | 0x80000002 }
[0] 1226.425394 MMIO32 W 0x61a004 0xd0150000 PDISPLAY.DAC_REGS[0].DPMS_CTRL <= { PENDING | 0x50150000 }
[0] 1226.425610 MMIO32 W 0x61a810 0x00000003 PDISPLAY.DAC_REGS[0x1].CLK_CTRL1 <= { CONNECTED = 0 | 0x3 }
[0] 1226.425717 MMIO32 W 0x61a804 0xd0150000 PDISPLAY.DAC_REGS[0x1].DPMS_CTRL <= { PENDING | 0x50150000 }

# PFB magic - some dram optimization ???
[0] 1226.423318 MMIO32 W 0x100000 0x0000c042 PFB+0 <= 0xc042
[0] 1226.423388 MMIO32 W 0x100004 0x0000c042 PFB+0x4 <= 0xc042
[0] 1226.423457 MMIO32 W 0x100008 0x0000c042 PFB+0x8 <= 0xc042
[0] 1226.423527 MMIO32 W 0x100b78 0x0000c042 PFB+0xb78 <= 0xc042
[0] 1226.423596 MMIO32 W 0x100c0c 0x0000c042 PFB+0xc0c <= 0xc042
[0] 1226.423735 MMIO32 W 0x100d04 0x0000c042 PFB+0xd04 <= 0xc042
[0] 1226.423804 MMIO32 W 0x100e0c 0x0000c042 PFB+0xe0c <= 0xc042
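For anyone who wants to post-process a trace like the one above, here is a minimal sketch that pulls the (address, value) pairs out of the decoded write lines. The line layout is assumed from the lines quoted in this comment only (raw kernel mmiotrace output uses a different layout), and LINE_RE, parse_writes and SAMPLE are names invented for this example.

```python
import re

# Matches decoded trace lines such as:
#   [0] 1226.455757 MMIO32 W 0x000200 0xffffffff PMC.ENABLE <= everything
# This layout is an assumption based on the lines quoted in this thread.
LINE_RE = re.compile(
    r"^\[(?P<card>\d+)\]\s+(?P<time>\d+\.\d+)\s+MMIO(?P<width>\d+)\s+"
    r"(?P<op>[RW])\s+(?P<addr>0x[0-9a-fA-F]+)\s+(?P<value>0x[0-9a-fA-F]+)"
)


def parse_writes(text):
    """Return a list of (address, value) tuples for every MMIO write line."""
    writes = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        # Comment lines and reads are skipped; only writes are kept.
        if match and match.group("op") == "W":
            writes.append((int(match.group("addr"), 16),
                           int(match.group("value"), 16)))
    return writes


SAMPLE = """\
# some power magic
[0] 1226.458120 MMIO32 W 0x001098 0x21ca003c PBUS+0x98 <= 0x21ca003c
[0] 1226.458192 MMIO32 W 0x001604 0x00020804 PBUS+0x604 <= 0x20804
[0] 1226.458299 MMIO32 W 0x001588 0x00000001 PBUS+0x588 <= 0x1
"""

if __name__ == "__main__":
    for addr, value in parse_writes(SAMPLE):
        print(f"write 0x{addr:06x} <= 0x{value:08x}")
```

Keeping the writes as plain tuples makes it easy to diff two traces or to drop single writes while bisecting which one matters.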
Created attachment 49720 [details] almost minimal trace Also attach it here
Created attachment 49725 [details] final trace That's it, I found which register write spoiled the work done by the trace. It seems to me that it is some sort of PFIFO idle mode register, clock gating or whatever. nouveau sets register 0x400824, PGRAPH.CTXCTL_FLAGS_0, to 0x4000 and spoils things. The latest trace, or more correctly the latest set of register writes, now undoes that register setting. So that's it: 3W less while running a normal desktop and compiz.
Created attachment 49752 [details] [review] Enable xfer only when we need it to save power drm/nv50/ctxprog: enable xfer only when we need it to save power This patch adds instructions to ctxprog and, by doing so, impacts context-switching performance. My test case showed a 1% performance cost using glxgears, which is a context-switch-bound application. Please test and report bugs/performance/power/other.
Created attachment 49772 [details] full reclocking trace Thanks! I haven't tested this yet, but I am 100% sure that the patch will work on my system. Meanwhile, I was playing with the full reclocking trace, and I attach it here. I kept all (or at least the vast majority) of the PFB accesses so that you can look at what it does to memory timings. It reduces power usage to 15.5W (and the blob is at about 15.3W). When I replay it, then load nouveau, then fix the flag register and power down the unused DAC, I get 15.8W. If I run a full-blown desktop, I get ~16W when I let the system go idle. Pretty impressive, considering that on stock nouveau, 21W was the lower bound for power usage....
@Martin: I tested your patch and of course it works. One small note is that you probably also want nouveau to stop writing 0x4000 to that register initially. With your patch that flag is unset on the first ctxprog invocation, so it doesn't matter much, yet nouveau shouldn't set it if the blob doesn't. I also reduced the trace further. I also sadly note that my full reclocking trace reduces system stability a lot (supertuxkart crashes after just 1~2 laps and hangs the system). (Although, due to the low clocks involved, it might just better expose hangs I already had due to the nouveau drivers....) Also, register 0xc040, which I poked a lot this evening, does play some role in blob reclocking, yet I don't have any ideas on how to nail down its contents.
Created attachment 49857 [details] new reclocking trace
Created attachment 49858 [details] c040 woes My poor man's attempts to understand what that thing really does
Created attachment 49894 [details] trace with reclock culled It is actually simpler than I expected. I took my last trace and removed all reclocking scripts. (So the difference between this and the non-reclocking trace is fiddling with GPIO ports and PPCI writes.) The result, guess what, is 16.0W, just 0.5W higher than with the full reclocking trace and 0.7W higher than with the blob (either I missed something in the trace or some commands are uploaded via the ring buffer - can't trace these).... Also the system is now stable and as fast as pm level 1 allows it to be (fine with me). Yay!
Of course that is power usage while idle and with many things (like wireless) turned off, but I have measured power usage under these conditions for a very long time (and to be sure, I have a boot script triggered by a grub parameter that sets these for me). While nouveau is loaded, power usage is 16.5W on the framebuffer console (and I check and print it every 1 sec, so the screen redraw load isn't that low). While in a full KDE session with compiz running, but with the system otherwise idle, power usage is 16.9W. Of course when I actually use the system, power usage is at 20-22W, but that's the same as with the blob. Full screen brightness (compared to lowest) of course adds ~3W, but I always set it to lowest while on battery, and in room conditions it's more than enough. So standard stuff, you know....
Created attachment 49897 [details] minimal trace of non reclock magic So this is a more or less minimal trace. It reduces power usage from 21W to 16W. If, in addition to that, I use Martin's reclocking code, I can reduce the power usage further by 0.5W (of course nouveau adds 0.5W on its own, so the end power usage on level 0 is ~16W in the fb console).
Created attachment 49933 [details] minimal trace of non reclock magic - v2 New trace with comments on how each register setting affects power usage and default values of registers.
Created attachment 49934 [details] minimal trace of non reclock magic - v3 some cosmetic updates
Maxim, could you try this?

nvapoke 1588 30 # disable VPE
nvapoke 1590 3c00 # disable PBSP and PCRYPT

Please tell me if it helps with power consumption :)

Also, you can have fun by disabling PGRAPH: nvapoke 1588 3 (or 33 if you also want to disable VPE)

Looking forward to your answer
Created attachment 50622 [details] minimal trace of non reclock magic - v4 Another version, with GPIO bits commented out as they are probably wrong on other systems. Also some cosmetic cleanup
1098/1588/1590 information as requested:

register 1098:
0x21ca0004 - default
0x21ca0034 - set by blob
0xf9fe007e - writeable bits
0x00000020 - enable automatic clock gating for register 0x1588
0x00000040 - enable automatic clock gating for register 0x1590

registers 1588/1590:
* the registers consist of 2-bit subregisters (or maybe 4-bit subregisters???):
0x00 - nothing
0x01 - automatic clock gating, doesn't affect engine usability (tested on PGRAPH only). See register 0x1098
0x02 - nothing/reserved - tested this in all combinations
0x03 - disable clock - stops the engine - slightly larger power saving than 0x01, but the difference is small

1588 subregisters:
0x00000003 - PGRAPH (or part of it)
0x00000030 - VPE - no power savings at all

1590 subregisters:
0x00003000 - some VP2 engine
0x00000300 - another VP2 engine
0x00000c00 - supposed to be VP2 related as well - no reaction

Also note that writing 0x8000 to this register while 1098 bit 0x40 is set harms power usage.
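As a rough illustration of the field layout described above, this sketch splits a 0x1588 or 0x1590 value into its 2-bit subregisters. It assumes the 2-bit interpretation is the right one (the notes leave 4-bit open), and the field labels for 0x1590 are placeholders, since the notes do not name those engines.

```python
# Field masks copied from the notes above. The 0x1590 labels are
# placeholders: the notes only say "some VP2 engine" and so on.
FIELDS_1588 = {"PGRAPH": 0x00000003, "VPE": 0x00000030}
FIELDS_1590 = {
    "VP2 engine (0x3000)": 0x00003000,
    "VP2 engine (0x0300)": 0x00000300,
    "VP2 related? (0x0c00)": 0x00000C00,
}

# Meaning of each 2-bit subregister value, per the notes above.
MODES = {
    0: "nothing",
    1: "automatic clock gating",
    2: "nothing/reserved",
    3: "clock disabled",
}


def decode(value, fields):
    """Map each named field to the mode encoded in its two bits."""
    result = {}
    for name, mask in fields.items():
        # Position of the lowest set bit of the mask gives the shift.
        shift = (mask & -mask).bit_length() - 1
        result[name] = MODES[(value & mask) >> shift]
    return result


if __name__ == "__main__":
    for value in (0x01, 0x30, 0x33):
        print(f"0x1588 = 0x{value:02x}: {decode(value, FIELDS_1588)}")
    print(f"0x1590 = 0x3c00: {decode(0x3C00, FIELDS_1590)}")
```

Running it on 0x3c00 shows that this value touches the 0x3000 and 0x0c00 fields of 0x1590 and leaves the 0x0300 field alone.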
Created attachment 50637 [details] information about DAC clock regs I gathered This is information on DAC registers I gathered recently
After applying the power down magic, the power usage looks different:

---------------- after whole magic --------------------
DAC0 - should be for LVDS
0x00000000 - 16.400W
0x00000001 - 16.400W
0x00000002 - 16.400W
0x00000003 - 16.400W
0x80000000 - 16.480W
0x80000001 - 16.200W
0x80000002 - 16.085W
0x80000003 - 16.200W

DAC1 - VGA/...
0x00000000 - 16.400W
0x00000001 - 16.400W
0x00000002 - 16.100W
0x00000003 - 16.100W
0x80000000 - 16.400W
0x80000001 - 16.200W
0x80000002 - 16.085W
0x80000003 - 16.175W
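A quick sketch for picking the lowest-power register value per DAC out of the readings above. The numbers are copied from this comment; the dictionary and function names are just for this example.

```python
# Register value -> idle power in watts, copied from the readings above.
DAC0 = {
    0x00000000: 16.400, 0x00000001: 16.400,
    0x00000002: 16.400, 0x00000003: 16.400,
    0x80000000: 16.480, 0x80000001: 16.200,
    0x80000002: 16.085, 0x80000003: 16.200,
}
DAC1 = {
    0x00000000: 16.400, 0x00000001: 16.400,
    0x00000002: 16.100, 0x00000003: 16.100,
    0x80000000: 16.400, 0x80000001: 16.200,
    0x80000002: 16.085, 0x80000003: 16.175,
}


def best_setting(readings):
    """Return (register value, watts) for the lowest measured power."""
    value = min(readings, key=readings.get)
    return value, readings[value]


if __name__ == "__main__":
    for name, table in (("DAC0", DAC0), ("DAC1", DAC1)):
        value, watts = best_setting(table)
        print(f"{name}: 0x{value:08x} at {watts:.3f} W")
```

Given how close some of the readings are, differences of a few tens of milliwatts are probably within measurement noise.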
Why would LVDS use DAC registers?
Dunno, these writes only affect VGA visually. On LVDS and DVI the only effect is lower power usage.
Once again, stupid me. I somehow assumed that DAC == CRTC, thus all this nonsense about hardwiring, LVDS, etc... It's actually much simpler. My system has 2 DACs: the first one is assigned to TV-OUT, which isn't supported yet (and I might help add that support, although it is somewhat pointless in our digital age - these days it's even hard to buy an S-Video cable without resorting to Ebay, etc). The second DAC is then for the VGA output. So the bits that confused me, and that are set only for the first DAC, probably just enable some TV-oriented knobs.
A patch referencing this bug report has been merged in Linux v3.2-rc1:

commit fbba036a56fe0e5c5e8c91daf3fa211f88d94a03
Author: Martin Peres <martin.peres@ensi-bourges.fr>
Date: Sat Jul 30 23:08:45 2011 +0200

drm/nv50/gr: enable ctxprog xfer only when we need it to save power
Is this still an issue with latest kernel version (3.18, 3.19-rc4)?
(In reply to Pierre Moreau from comment #31)
> Is this still an issue with latest kernel version (3.18, 3.19-rc4)?

Yes, it is :) There is still a lot of work to be done in clock and power gating.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/17.