Summary: | [945GME] poor 3d performance in deep c-states | ||
---|---|---|---|
Product: | DRI | Reporter: | Antonio Orefice <kokoko3k> |
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
Severity: | normal | ||
Priority: | low | CC: | anarsoul, b.buschinski, jbarnes, jeramy.smith, lambchop468, linux, maxijac, mcepl, rodrigo.vivi, sergio.callegari |
Version: | unspecified | ||
Hardware: | x86 (IA32) | ||
OS: | Linux (All) | ||
Attachments: |
Created attachment 38938 [details]
dmesg log
Created attachment 38939 [details]
xorg.conf
Sorry for the typo: "(in my case it went from 14~125[..]" should read: "(in my case it went from 14~15[..]"

Swapbuffer vs interrupts. Is the teapot fullscreen? Are page-flips enabled? Is something disabling the interrupts on your system?

Looks like the processor c-state issue we have on some other 945 machines. If you boot with processor.max_cstate=1 does the problem go away?

The issue is that we rely on vblank interrupts arriving at the correct frequency, and on some platforms when the CPU is in a deep sleep state, it won't wake up when a vblank interrupt arrives, but it will wake up when other device interrupts arrive. That's why you see the performance increase when you move the mouse.

I still don't know the root cause, but if the above works for you, then it's a duplicate of a known bug at least.

(In reply to comment #5)
> Is the teapot fullscreen? Are page-flips enabled? Is something disabling the
> interrupts on your system?

I don't know how to make teapot run in fullscreen mode; the best I've done was to launch it in an empty X screen from xterm without any WM, and the problem persists. With compiz enabled (and unredirect fullscreen windows) I was able to make it fullscreen too with a shortcut; the problem persists. Page flips were disabled on my system because of stability issues (enabled in kernel, disabled in X). I recompiled the driver to enable them again for a while and tried again, no success. I can't say if something is blocking interrupts on my system, sorry. Anyway, listening to an mp3 in the background helped a bit (fps went from 15 to 20 in teapot).

(In reply to comment #6)
> Looks like the processor c-state issue we have on some other 945 machines. If
> you boot with processor.max_cstate=1 does the problem go away?
>
> The issue is that we rely on vblank interrupts arriving at the correct
> frequency, and on some platforms when the CPU is in a deep sleep state, it
> won't wake up when a vblank interrupt arrives, but it will wake up when other
> device interrupts arrive. That's why you see the performance increase when you
> move the mouse.
>
> I still don't know the root cause, but if the above works for you, then it's a
> duplicate of a known bug at least.

I tried that boot option and the problem disappeared. Unfortunately, and as expected, it makes my netbook power hungry: using powertop I noticed that it went from ~6.5..7W to ~8W+ just idling, and the expected battery life dropped from about ~10hrs to ~8.

Out of curiosity, is a vblank interrupt still needed when one doesn't need (or doesn't care about) vsync?

At least, thank you very much for answering and clarifying things. At this point it is clear that this is a duplicate bug; could you mark it as a duplicate of the right one?

(In reply to comment #8)
> I tried that boot option and the problem disappeared.
> Unfortunately, and as expected, it makes my netbook power hungry: using
> powertop I noticed that it went from ~6.5..7W to ~8W+ just idling, and the
> expected battery life dropped from about ~10hrs to ~8.

Yeah, it's unfortunate. I don't think they see this problem on Windows because they probably can't reach a deep enough sleep state to be affected (Windows and its applications tend to have lots of timers running that keep the CPU awake).

> Out of curiosity, is a vblank interrupt still needed when one doesn't need (or
> doesn't care about) vsync?

Yes, if you don't have apps waiting for vsync or doing buffer swaps, you shouldn't need the vblank interrupt (the kernel will shut it off). But anything using GL will do buffer swaps and thus need the vsync interrupt, unless you disable it entirely using vblank_mode=0 in your dri configuration file (.drirc or /etc/drirc iirc).

> At least, thank you very much for answering and clarifying things. At this
> point it is clear that this is a duplicate bug; could you mark it as a
> duplicate of the right one?

Actually I don't think we have a bug open on this, so we'll use this one. :) All the discussion of this so far has just been on the mailing lists.

Jesse, what about the pm_qos stuff mentioned on the mailing list?

I don't have a tool to set that from userspace, and I didn't see a good way of doing it from within the kernel, but I expect it just limits the processor max c state, just like the boot param. Another thing to try, that worked on my aspireone, is to boot with maxcpus=1.

(In reply to comment #9)
Anyway, the same driver on kernel 2.6.33 performs just fine for me (low power consumption and correct vblank interrupts), so I think this problem definitely has a solution lying around.

2.6.33 doesn't support vblank events, so you wouldn't be able to run the code that exposes this problem. I'm sure the interrupt issue still exists on 2.6.33 though, you just don't see it because you're not running code that's sensitive to interrupt latency.

I tried using ShadowFB as a workaround, and found that it works _much_ better with KDE 4.5 and the latest intel driver :) (at least konsole is not jerky)

(In reply to comment #13)
> 2.6.33 doesn't support vblank events, so you wouldn't be able to run the code
> that exposes this problem. I'm sure the interrupt issue still exists on 2.6.33
> though, you just don't see it because you're not running code that's sensitive
> to interrupt latency.

Please excuse in advance my ignorance and the probably stupid question, but what are the advantages (if any) of running that code? I'm asking because I didn't notice any performance or tearing difference with 2.6.35+processor.max_cstate=1 compared to 2.6.33.

The new code has some potential performance benefits (it allows page flipping and won't waste GPU time on frames that won't be displayed), and adds back several missing GL features.
You can get the same behavior with current code as in 2.6.33 by disabling the new features. You can do this by setting vblank_mode=0 in your environment or drirc config file.

(In reply to comment #16)
> The new code has some potential performance benefits (it allows page flipping
> and won't waste GPU time on frames that won't be displayed), and adds back
> several missing GL features.
>
> You can get the same behavior with current code as in 2.6.33 by disabling the
> new features. You can do this by setting vblank_mode=0 in your environment or
> drirc config file.

I just read this answer by Vasily Khoruzhick on the mailing list:
"That doesn't help, glxgears shows ~1000fps, but it's output is jerky"

Anyway, thank you for the suggestion, I'll try it myself as soon as possible.

(In reply to comment #17)
> I just read this answer by Vasily Khoruzhick on the mailing list:
> "That doesn't help, glxgears shows ~1000fps, but it's output is jerky"
>
> Anyway, thank you for the suggestion, I'll try it myself as soon as possible.

If I've got it right, that should be fixed on -next with the per-process throttling.

(In reply to comment #18)
> (In reply to comment #17)
> > I just read this answer by Vasily Khoruzhick on the mailing list:
> > "That doesn't help, glxgears shows ~1000fps, but it's output is jerky"
> >
> > Anyway, thank you for the suggestion, I'll try it myself as soon as possible.
>
> If I've got it right, that should be fixed on -next with the per-process
> throttling.

I can't fully understand what you said, but let's wait for the next release then.

(In reply to comment #18)
> (In reply to comment #17)
> > I just read this answer by Vasily Khoruzhick on the mailing list:
> > "That doesn't help, glxgears shows ~1000fps, but it's output is jerky"
> >
> > Anyway, thank you for the suggestion, I'll try it myself as soon as possible.
>
> If I've got it right, that should be fixed on -next with the per-process
> throttling.
Please give a link to the commit/patch when it's ready. Thanks

(In reply to comment #18)
> If I've got it right, that should be fixed on -next with the per-process
> throttling.

Tried drm-intel-next from today, bug still remains.

Created attachment 39347 [details] [review]
ICH7 LPC debug driver

Can you load this driver and tell me what it outputs? I wonder if BM_BREAK_EN is 0 on your machine as well...

This patch on top of the last attachment should let the CPU wake up much more frequently, assuming the break reg is 0, give it a try and see if it helps your performance problem.

diff --git a/drivers/platform/x86/intel_lpc.c b/drivers/platform/x86/intel_lpc.c
index d3c5ef5..3be93c1 100644
--- a/drivers/platform/x86/intel_lpc.c
+++ b/drivers/platform/x86/intel_lpc.c
@@ -50,6 +50,8 @@ static int lpc_probe(struct pci_dev *dev, const struct pci_dev
 	dev_err(&dev->dev, "ACPI_CX_STATE_CONF: 0x%02x\n", cxstate);
 	dev_err(&dev->dev, "ACPI_BM_BREAK_EN: 0x%02x\n", break_en);
 
+	pci_write_config_byte(dev, ACPI_BM_BREAK_EN, 0xf3);
+
 out:
 	return ret;
 }

[ 565.573458] intel lpc 0000:00:1f.0: ACPI_CX_STATE_CONF: 0x1c
[ 565.573464] intel lpc 0000:00:1f.0: ACPI_BM_BREAK_EN: 0x00

(In reply to comment #23)
> This patch on top of the last attachment should let the CPU wake up much more
> frequently, assuming the break reg is 0, give it a try and see if it helps your
> performance problem.

I haven't tried the patch yet because I'm not so familiar with kernel patching and we need this netbook daily. But I was wondering if it is possible (and how) to use setpci to try different configurations of the BM_BREAK_EN register at runtime.

Thank you very much for your efforts.
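For anyone wanting to try the vblank_mode=0 workaround suggested earlier in this thread without editing a drirc file, here is a minimal Python sketch that sets the variable for a single program only. The helper names (`env_with_vblank_mode`, `run_without_vsync`) are made up for illustration and are not part of Mesa or of any tool mentioned here; `vblank_mode` itself is the environment variable named above.

```python
import os
import subprocess


def env_with_vblank_mode(mode, base_env=None):
    """Return a copy of the environment with vblank_mode set to `mode`.

    `base_env` defaults to the current process environment; it is a
    parameter only so the helper can be exercised without touching it.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["vblank_mode"] = str(mode)
    return env


def run_without_vsync(command):
    """Run `command` (a list such as ["glxgears"]) with vblank_mode=0.

    Only the child process sees the variable; the caller's environment
    is left unchanged.
    """
    return subprocess.run(command, env=env_with_vblank_mode(0), check=False)
```

Calling `run_without_vsync(["glxgears"])` should behave like typing `vblank_mode=0 glxgears` in a shell. As noted in the thread, this sidesteps the wait for vblank rather than fixing the missed interrupts.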
Bug is reproducible on the following machines:

Lenovo 3000 N100 laptop, Core 2 Duo T5500 CPU,
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03), pciid: 8086:27a2

Acer Aspire AOA110 netbook, Atom N270 CPU,
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GME Express Integrated Graphics Controller (rev 03), pciid: 8086:27ae

Also reproducible on Acer Extensa 5513 laptop, with C2D T5500 CPU,
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03), pciid: 8086:27a2

Just to add my two cents: I have the same issue on Atom + 945GM. If I add more load on the CPU, the frame rate grows too. The processor.max_cstate option didn't change anything; powertop shows there is still C4 (maybe some other kernel bug). maxcpus=1 solves the problem: it works with C4, power saving and better performance. So how about a problem with the scheduler or IRQ balancing on SMP? I will test the patch from Jesse ASAP.

The patch from comment 23 does not make any difference for me. Disabling SMP is the best configuration for me.

This bug probably affects Intel HD Graphics too:

glxgears with idle CPU:
4925 frames in 5.0 seconds = 984.918 FPS
4941 frames in 5.0 seconds = 988.052 FPS
4996 frames in 5.0 seconds = 999.137 FPS
4973 frames in 5.0 seconds = 994.512 FPS

glxgears with 100% loaded CPU (one thread only):
7544 frames in 5.0 seconds = 1508.685 FPS
7458 frames in 5.0 seconds = 1491.536 FPS
7378 frames in 5.0 seconds = 1475.574 FPS
7415 frames in 5.0 seconds = 1482.973 FPS

roughly 50%(!) faster.

Today I tried with 2.6.36, and obviously the results are the same, so I'm still using 2.6.33. For me, disabling a core or hyperthreading is not an option due to the higher power consumption and the shorter battery life. If I understood properly, the issue still appears to be unresolved and the hypotheses made so far don't seem to have led to anything really useful.
I understand that the new code aims to provide a performance "gain", but my proposal now is to add some kind of workaround for the specific chipsets that expose the problem, so that at least their users will be able to upgrade to newer kernels without suffering any performance "loss". Could such a thing be done in the video driver itself, or does it require patches or special config options for the kernel?

Created attachment 40923 [details] [review]
Use PM QoS latency to prevent dropping below C2 on Atom

Proof-of-principle?

(In reply to comment #32)
> Created an attachment (id=40923) [details]
> Use PM QoS latency to prevent dropping below C2 on Atom
>
> Proof-of-principle?

This patch helped only marginally (10% better than without it in idle mode):

$ glxgears (power savings on, CPU running @ 1.2GHz)
5601 frames in 5.0 seconds = 1120.104 FPS
5612 frames in 5.0 seconds = 1122.275 FPS
5603 frames in 5.0 seconds = 1120.483 FPS
5606 frames in 5.0 seconds = 1121.091 FPS
5587 frames in 5.0 seconds = 1117.238 FPS

$ glxgears (power savings off, CPU running @ 3.2GHz)
7089 frames in 5.0 seconds = 1417.741 FPS
7068 frames in 5.0 seconds = 1413.511 FPS
7082 frames in 5.0 seconds = 1416.285 FPS
7079 frames in 5.0 seconds = 1415.792 FPS
7057 frames in 5.0 seconds = 1411.390 FPS

P.S. I have Intel HD 1st generation graphics.

As the .drirc configuration file is finally honoured in the latest intel-dri/mesa (I have 7.9.0.git20101207), setting vblank_mode=0 (as explicitly suggested by Jesse Barnes) now works and the issue is gone for me. Strangely enough, I can't see any tearing in glxgears. I know this is a workaround, but on such poor hardware enabling vsync would be a bad idea anyway.

(In reply to comment #32)
> Created an attachment (id=40923) [details]
> Use PM QoS latency to prevent dropping below C2 on Atom
>
> Proof-of-principle?

As I stated on IRC, it does not help in my case - glxgears still shows 30-40fps instead of 60.
I want to note that it's not only a tearing/vblank issue: response to user actions in KDE with effects enabled is not good (it was much better earlier).

Created attachment 41720 [details] [review]
Use PM QoS latency to keep CPU from dropping below C1 when vblanks enabled

(In reply to comment #32)
> Created an attachment (id=40923) [details]
> Use PM QoS latency to prevent dropping below C2 on Atom
>
> Proof-of-principle?

Here is a variant of that patch I tried that does fix the issue on my hardware:
Acer Aspire One 9" Netbook AOA150, 945GSE and Intel N270 Processor

It does produce a few WARNs because I am calling pm_qos_add_request from an interrupt-disabled context (also attached).

The test case used is: vblank_mode=2 glxgears

Created attachment 41721 [details]
WARNs from using "Use PM QoS latency to keep CPU from dropping below C1 when vblanks enabled"
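On the earlier question of setting PM QoS from userspace: the kernel's PM QoS interface exposes /dev/cpu_dma_latency, and a process that writes a 32-bit latency value (in microseconds) to it holds a CPU latency request for as long as it keeps the file open. The Python sketch below is only a userspace experiment built on that interface, not what the attached patches do inside i915. The names `encode_latency` and `LatencyRequest` are hypothetical, and the `path` parameter exists only so the class can be tried against another file.

```python
import os
import struct

CPU_DMA_LATENCY = "/dev/cpu_dma_latency"


def encode_latency(max_latency_us):
    """Pack the latency bound as a native-endian signed 32-bit integer."""
    return struct.pack("=i", max_latency_us)


class LatencyRequest:
    """Context manager that holds a CPU latency request while it is open.

    The request is dropped by the kernel when the file descriptor is
    closed, so the bound only applies inside the `with` block.
    Opening the real device normally requires root.
    """

    def __init__(self, max_latency_us, path=CPU_DMA_LATENCY):
        self.max_latency_us = max_latency_us
        self.path = path
        self._fd = None

    def __enter__(self):
        self._fd = os.open(self.path, os.O_WRONLY)
        os.write(self._fd, encode_latency(self.max_latency_us))
        return self

    def __exit__(self, exc_type, exc, tb):
        os.close(self._fd)
        self._fd = None
        return False
```

A test run could wrap the glxgears test case in `with LatencyRequest(1): ...` and compare the frame rate. The value 1 is just an example; which C-states a given bound excludes depends on the exit latencies the machine reports, and holding a low bound is expected to cost power in the same way processor.max_cstate=1 did.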
*** Bug 32916 has been marked as a duplicate of this bug. ***

Created attachment 41796 [details] [review]
Use PM QoS to prevent C-State starvation of gen3 GPU

Raise you a work function.

(In reply to comment #39)
> Created an attachment (id=41796) [details]
> Use PM QoS to prevent C-State starvation of gen3 GPU
>
> Raise you a work function.

It does not apply on top of 2.6.37; could you please prepare a version for the stable kernel?

Created attachment 41814 [details] [review]
Use PM QoS to prevent C-State starvation of gen3 GPU for 2.6.37

(In reply to comment #40)
> (In reply to comment #39)
> > Created an attachment (id=41796) [details]
> > Use PM QoS to prevent C-State starvation of gen3 GPU
> >
> > Raise you a work function.
>
> It does not apply on top of 2.6.37; could you please prepare a version for
> the stable kernel?

Chris's patch mangled to work with 2.6.37 (two changes, s/irq_lock/user_irq_lock/ in two places)

(In reply to comment #39)
> Created an attachment (id=41796) [details]
> Use PM QoS to prevent C-State starvation of gen3 GPU
>
> Raise you a work function.

Confirming that this works on 2.6.37 on:
Acer Aspire One 9" Netbook AOA150, 945GSE and Intel N270 Processor
testcase: vblank_mode=2 glxgears
(I probably should test with -next but don't have time at the moment)

(In reply to comment #41)
> Created an attachment (id=41814) [details]
> Use PM QoS to prevent C-State starvation of gen3 GPU for 2.6.37
>
> Chris's patch mangled to work with 2.6.37 (two changes,
> s/irq_lock/user_irq_lock/ in two places)

Thanks, looks like it works.

(In reply to comment #43)
> Thanks, looks like it works.

But it does not work after updating to xf86-video-intel-2.14.0 :( 20-30 fps in glxgears instead of 60.

(In reply to comment #44)
> (In reply to comment #43)
> > Thanks, looks like it works.
>
> But it does not work after updating to xf86-video-intel-2.14.0 :( 20-30 fps in
> glxgears instead of 60.

I'm not seeing this with xf86-video-intel-2.14.0

Hmm...
libdrm-git version: bad5242a
xf86-video-intel version: 2.14.0
mesa version: 7.10
xorg-server: 1.9.3.901-1
kernel: (not vanilla) 2.6.37 + patch in attachment 41814 [details] [review]

Reassigning back to Chris; doesn't look like we'll be able to find a hardware solution to this one.

I've applied Alexander's patch to drm-intel-next, so please give that branch a thorough testing!

Tentatively closing with the patch landing in -next. Things to look out for:
1. fps stuttering (i.e. the reoccurrence of the original bug);
2. obscene power consumption;
3. aliens.

Created attachment 42959 [details] [review]
Twiddle INSTPM bit11

New patch time!

I tested the last patch (replace vblank PM QoS with "Interrupt-Based AGPBUSY#"); it brings back the first issue, fps stuttering. Power usage is OK.

Created attachment 44065 [details] [review]
Move INSTPM bit twiddling to intel_mark_busy

How about with this patch?

No noticeable difference.

(In reply to comment #51)
> Created an attachment (id=44065) [details]
> Move INSTPM bit twiddling to intel_mark_busy
>
> How about with this patch?

Plain drm-intel-next (47ae63e) with and without this patch resulted in missing vblanks & stuttery glxgears. As discussed on IRC, my BIOS doesn't set INSTPM_AGPBUSY_DIS (INSTPM bit 11), so this won't fix it anyway.

*** Bug 37966 has been marked as a duplicate of this bug. ***

This might be interesting: http://cgit.freedesktop.org/~danvet/drm/log/?h=better-gpu_cpufreq

Is this issue still there with newer kernels? What is the latest kernel this issue was seen on? Has anyone tested this better-gpu_cpufreq branch?

Still here on 3.6, will test on 3.7 as soon as it gets into the Arch Linux repos.

No need, it's a known design feature of the power management hardware. The only question is whether we can find an acceptable workaround.

*** Bug 59895 has been marked as a duplicate of this bug. ***

Thanks for pointing out so quickly the status of Bug 59895 as a duplicate of this one! This thread was an interesting read.
I guess it's time to give up - the only approach, restricting the deep sleep states, resulted in horrid power consumption figures ...

Just wiggle your mouse a bit :(

I don't know if anybody is still watching this bug, but these patches need testing:
http://lists.freedesktop.org/archives/intel-gfx/2014-February/039493.html
http://lists.freedesktop.org/archives/intel-gfx/2014-February/039494.html
http://lists.freedesktop.org/archives/intel-gfx/2014-February/039492.html

I'll be able to test them in 2-3 weeks.

As a reminder to myself, my only surviving non-pnv machine (915gm) has a processor that does not support C-states (only speedstep). I tried the patches and only keeping the CPU at maximum is sufficient to hit glxgears vrefresh.

So, I can test it. Is there any place where I can pull all the patches together? On top of which branch should I test?

(In reply to comment #65)
> So, I can test it.
> Is there any place where I can pull all the patches together? On top of which
> branch should I test?

I pushed the patches here:
git://gitorious.org/vsyrjala/linux.git agpbusy

I also reorganized them so it's easy to revert the top commit, which is something you might as well try in case there's no improvement with the branch as is.

Hmm... I do not see any noticeable changes. I tested these patches on Ubuntu 13.10 with the Unity/Compiz desktop. glxgears shows the same performance before and after the patches - about 58fps. C4ATM usage seems to be identical too. Do you have any suggestions about what I should test?

It would be easier to reproduce on a bare X. If you do from a vt:

sudo service lightdm stop
sudo Xorg -ac -noreset & sleep 3; DISPLAY=:0 xterm

then launch glxgears from the xterm, does it show the behaviour we need to fix? i.e. runs at below refresh rate unless there is another source of interrupts (e.g. wiggling the mouse)? If you can reproduce that, we can begin to test the patches.

No, I can't reproduce the initial bug. After powertop optimisation I get about 20 wk/s.
Just to make sure the system is idle. On plain Xorg I get 125fps, without any glitches. With and without the patches I get the same results.

(In reply to comment #69)
> On plain Xorg I get 125fps, without any glitches.
> With and without the patches I get the same results.

Ah, that's broken - we are not using vsync. Presumably it failed to get permission to open /dev/dri/card0 and so is using indirect rendering (which does not respect vsync). Try "LIBGL_DEBUG=1 glxinfo" and see (a) whether it reports indirect rendering and (b) why.

You were right, there was no access to dri. Now I tested it with sudo glxgears. The results are absolutely unusable: with moving mouse fps will drop to 5fps. With moving mouse - 60fps. Results are the same before and after this patch set.

Typo in previous comment:
without mouse - 5fps
with mouse - 60fps

If it will somehow help, I can give ssh access to this machine.

(In reply to comment #71)
> You were right, there was no access to dri.
> Now I tested it with sudo glxgears.
> The results are absolutely unusable: with moving mouse fps will drop to 5fps.
> With moving mouse - 60fps. Results are the same before and after this patch set.

Hmm. Was it running fullscreen or under a GL compositor that page flips?

Something like: 'vblank_mode=3 glxgears -fullscreen' should force it to do what we want, assuming your wm isn't totally crap.

(In reply to comment #74)
> (In reply to comment #71)
> > You were right, there was no access to dri.
> > Now I tested it with sudo glxgears.
> > The results are absolutely unusable: with moving mouse fps will drop to 5fps.
> > With moving mouse - 60fps. Results are the same before and after this patch set.
>
> Hmm. Was it running fullscreen or under a GL compositor that page flips?

Windowed under bare X.

> Something like: 'vblank_mode=3 glxgears -fullscreen' should force it to do
> what we want, assuming your wm isn't totally crap.
We don't need to force fullscreen to cause us to lose vblank interrupts whilst the processor is asleep (and so render very slowly).

(In reply to comment #75)
> We don't need to force fullscreen to cause us to lose vblank interrupts
> whilst the processor is asleep (and so render very slowly).

Oh right. Not sure where I got the idea that we wouldn't use vblank irqs unless fullscreen.

After thinking about this for a while I started to question why we're frobbing the AGPBUSY bit all the time. It won't force an exit from C3 unless there's a pending interrupt, so we should just be able to leave it on all the time. I pushed that idea here:
git://gitorious.org/vsyrjala/linux.git agpbusy2

I guess the chances of it working are slim, but might as well try.

kernel 3.13.0-00966-gec441a0, same result. 5-10fps on idle system, and 60fps with moving mouse.

Yeah I guess that's it, time to give up on this one. Wiggling the mouse or running with wayland should fix this. Thanks for reporting this bug and testing ideas, sorry that we couldn't make this work :(

I got fed up with my 945gm not being capable of 60fps glxgears.

commit d938da6b132a2d6addeba4c57a67ec3c07824843
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Fri Mar 22 20:08:03 2019 +0200

    drm/i915: Disable C3 when enabling vblank interrupts on i945gm

The main difference compared to the older pm_qos attempts is that I found a way to dig out the exact c3 disable latency, so we should have a reasonable guarantee that we do disable c3 but not c2. The power cost of not using c3 seems to be about 0.7W on my machine (with the display on), so this isn't exactly cheap :(

I did spend quite a bit of time at some point digging through the chipset docs (such as they are). It's been a while since I did that but I'll try to summarize what I recall:

Gen3 introduced some kind of new mechanism by which the gmch can wake up the CPU.
The old AGPBUSY/PM_BUSY involved the ICH as well IIRC, whereas the new mechanism supposedly does not. IIRC the new mechanism already appears in the i915gm docs, but my theory is that i945gm is where it actually got into use and either it is broken or we're missing some magic undocumented bit somewhere. I did try (blindly if necessary) poking at various registers that seemed relevant. Alas, I was unable to find a magic bit to make C3+vblank interrupts cooperate. |
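To make the "c3 disable latency" idea above concrete from userspace: the cpuidle sysfs directory lists each idle state with its name and exit latency, and a latency bound one microsecond below a state's exit latency should exclude that state while still allowing shallower ones. The Python sketch below assumes that reading of the interface. It is not the code from the commit, the function names are hypothetical, the state name "C3" depends on the cpuidle driver in use, and the latencies in the usage note are made-up numbers.

```python
from pathlib import Path

CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def read_idle_states(cpuidle_dir=CPUIDLE_DIR):
    """Return [(name, exit_latency_us), ...] ordered by state index."""
    state_dirs = sorted(
        Path(cpuidle_dir).glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),
    )
    states = []
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        latency = int((state_dir / "latency").read_text())
        states.append((name, latency))
    return states


def disable_latency(states, state_name):
    """Largest latency bound (in us) that still excludes `state_name`.

    Returns None when no state with that name is listed.
    """
    for name, latency in states:
        if name == state_name:
            return max(latency - 1, 0)
    return None
```

With illustrative values such as `[("C1", 1), ("C2", 20), ("C3", 57)]`, `disable_latency(states, "C3")` gives 56, a bound that a state with a 20 us exit latency still satisfies. On a real machine the list would come from `read_idle_states()`.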
Created attachment 38937 [details]
xorg log

Chipset: 945GM
Kernel version: 2.6.35 (same problem in 2.6.34 but not in 2.6.33)
Arch: i686
xorg-server: 1.8.1.902
mesa / intel-dri: 7.8.2
xf86-video-intel: 2.12.0
libdrm version: 2.4.21
Linux distribution: Arch linux (similar issues reported for fedora too)
Machine model: Asus 1005HA
Display Connector: LVDS (happens on VGA too)

Reproducible: Always

Steps to reproduce:
-------------------
Compile mesa demos
Launch teapot
Observe the framerate
Move the mouse around
Observe the framerate jumping (in my case it went from 14~125 to 30 just by putting my finger on the touchpad)
Roll back to kernel 2.6.33
Launch teapot
Observe that there is no difference in framerate if you move the mouse around; notice the framerate is "high" (30fps for me) in any case.

Even if just changing the kernel version makes the bug disappear, I think it is more logical to file a bug report here. If I made a mistake, I apologize.
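The before/after comparison in the steps above can be made less subjective by averaging the figures glxgears prints. The Python sketch below parses lines in the "N frames in S seconds = F FPS" form quoted in this thread and computes the relative change between two runs. The function names are hypothetical; the sample numbers in the usage note are the idle and loaded glxgears figures posted in the comments.

```python
import re

# Matches glxgears lines such as "4925 frames in 5.0 seconds = 984.918 FPS".
FPS_LINE = re.compile(r"(\d+) frames in ([\d.]+) seconds = ([\d.]+) FPS")


def mean_fps(glxgears_output):
    """Average the FPS figures found in the text; None if there are none."""
    values = [float(m.group(3)) for m in FPS_LINE.finditer(glxgears_output)]
    if not values:
        return None
    return sum(values) / len(values)


def speedup_percent(baseline_fps, other_fps):
    """Relative change of `other_fps` over `baseline_fps`, in percent."""
    return (other_fps - baseline_fps) / baseline_fps * 100.0
```

Feeding it the four idle-CPU lines (984.918, 988.052, 999.137, 994.512 FPS) and the four loaded-CPU lines (1508.685, 1491.536, 1475.574, 1482.973 FPS) gives a speedup of a little over 50%, matching the "roughly 50% faster" remark.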