Summary: | [SKL dmc] Headless mode media transcoding is 20-30% slower comparing to connected monitor use case
---|---|---|---
Product: | DRI | Reporter: | Dmitry Rogozhkin <dmitry.v.rogozhkin>
Component: | DRM/Intel | Assignee: | Imre Deak <imre.deak>
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | major
Priority: | medium | CC: | armando.antoniox.mora.reos, eero.t.tamminen, hector.franciscox.velazquez.suriano, imre.deak, intel-gfx-bugs, tvrtko.ursulin
Version: | unspecified
Hardware: | Other
OS: | Linux (All)
URL: | https://patchwork.freedesktop.org/patch/150314/
Whiteboard: | ReadyForDev
i915 platform: | BXT, KBL, SKL | i915 features: | display/HDMI, firmware/dmc
Description
Dmitry Rogozhkin
2017-04-05 00:06:26 UTC
I am attaching 3 dmesg shots corresponding to the following command sequence:

# dmesg > dmesg.0.after_boot
#
# ./run_libyami.sh
3596 frame decoded, fps = 143.44. fps after 5 frames = 143.55.
transcode done
# ./run_libyami.sh
3596 frame decoded, fps = 160.02. fps after 5 frames = 160.18.
transcode done
#
# dmesg > dmesg.1.headless
#
# cat /sys/class/drm/card0-HDMI-A-1/status
disconnected
#
# echo on > /sys/class/drm/card0-HDMI-A-1/status
# cat /sys/class/drm/card0-HDMI-A-1/status
connected
#
# ./run_libyami.sh
3596 frame decoded, fps = 197.47. fps after 5 frames = 197.67.
transcode done
# ./run_wrap.libyami.sh
3596 frame decoded, fps = 209.68. fps after 5 frames = 209.91.
transcode done
#
# dmesg > dmesg.1.hdmi-a-ON

Created attachment 130680
dmesg.0.after_boot

Created attachment 130681
dmesg.1.headless

Created attachment 130682
dmesg.2.hdmi-a-ON
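The fps figures in the description can be turned into a relative slowdown with a little arithmetic. The snippet below is only a sketch: `slowdown_pct` is a hypothetical helper name (it is not part of libyami or IGT), and it assumes the connected-display run is the baseline.

```shell
#!/bin/sh
# Relative slowdown of a headless run versus a run with the connector forced on.
# slowdown_pct is a hypothetical helper, not part of libyami or IGT.
slowdown_pct() {
    # $1 = fps measured headless, $2 = fps measured with the display forced on
    awk -v h="$1" -v c="$2" 'BEGIN { printf "%.1f\n", (c - h) / c * 100 }'
}

# fps pairs taken from the runs quoted in the description
slowdown_pct 143.44 197.47   # first headless run vs. first connected run
slowdown_pct 160.02 209.68   # second headless run vs. second connected run
```

With the numbers above this prints 27.4 and 23.7, which is consistent with the "20-30% slower" wording in the summary.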
Theory for the bug is that it is related to DMC FW. Need to check whether bug will disappear if DMC will not be loaded. I did not try that myself, sorry.

I tried not loading the DMC firmware and can confirm that the issue is not present in that case.

Also, it is possible to reproduce this in the default kernel config (no pinning is required) simply with igt/benchmarks/gem_latency -n 0 in which case the perf difference between the two setups was ~8x in my testing.

(In reply to Tvrtko Ursulin from comment #6)
> I tried not loading the DMC firmware and can confirm that the issue is not
> present in that case.
>
> Also, it is possible to reproduce this in the default kernel config (no
> pinning is required) simply with igt/benchmarks/gem_latency -n 0 in which
> case the perf difference between the two setups was ~8x in my testing.

One possibility is that DC6 enables deeper system level power states and this causes latency elsewhere. What are the PC state residencies shown by powertop or the kernel's tools/power/x86/turbostat when DMC is loaded and not?

What's the effect of limiting max_cstates to 0 (and having DMC loaded)?

An other problem could be that the GPU is trying to access the display, (maybe checking scan line counts or something?).

Does /sys/kernel/debug/dri/0/i915_dmc_info show any transitions during the test when DMC is loaded?

(In reply to Imre Deak from comment #7)
> What are the PC state residencies shown by
> powertop or the kernel's tools/power/x86/turbostat when DMC is loaded and
> not?

Also the actual RC6 residency reported by powertop/turbostat would be interesting. (even though it's disabled)

(In reply to Imre Deak from comment #7)
> One possibility is that DC6 enables deeper system level power states and
> this causes latency elsewhere. What are the PC state residencies shown by
> powertop or the kernel's tools/power/x86/turbostat when DMC is loaded and
> not?

1. With DMC, idle system, no displays:

PKG is in PC2, CPU is in C7, GPU is in RC6.

When looking in i915_dmc_info I can see that the "DC3 -> DC5" transition counter increases exactly by one each second. "DC5 -> DC6" counter is zero.

If I now run gem_latency -n 0:

"DC3 -> DC5" counter starts increasing by ~2k per second.

PKG is not any deeper states now.
CPU split between C2/C3/C6/C7 is approx. 42/2/10/40%.
GPU is 0% RC6.

Benchmark goes slow.

2. Now I force turn on a display (echo on | tee /sys/class/drm/card0-HDMI-A-1/status).

"DC3 -> DC5" transition counter stops increasing.

PKG is still in PC2, CPU in C7 and GPU in RC6.

Benchmark is not normal speed and while it is running PKG is not in any low power states, RC6 is 0% and CPU C2/C3/C6/C7 is approx 52/0/0/25%.

3. DMC not loaded, idle system, no displays

PKG is now in PC7 (not PC2 as above!), CPU is C7, GPU is RC6.

gem_latency is now normal speed with power states like above.

Out of curiosity I tried forcing the display on in this config. That makes the PKG go to ~3% PC2, rest in PC7. Turning it off again brings it back to <0.5% PC2 and the rest in PC7.

> What's the effect of limiting max_cstates to 0 (and having DMC loaded)?

No effect on benchmark speed or reported "DC3 -> DC5" transitions.

> An other problem could be that the GPU is trying to access the display,
> (maybe checking scan line counts or something?).

You mean something behind the covers or explicitly by i915?

> Does /sys/kernel/debug/dri/0/i915_dmc_info show any transitions during the
> test when DMC is loaded?

Yes, see above. :)

(In reply to Tvrtko Ursulin from comment #9)
> PKG is not any deeper states now.

"not in" !

> Benchmark is not normal speed and while it is running PKG is not in any low

s/not/now/ :( So it is normal speed now!

(In reply to Tvrtko Ursulin from comment #9)
> 1. With DMC, idle system, no displays:
>
> PKG is in PC2,

PC2 vs. PC7 without DMC is weird, no idea for the reason. Normally you should reach PC8+ with display off, but for that you'd also need to enable power saving for other devices too.

> CPU is in C7, GPU is in RC6.

Was this also by booting with 'intel_idle.max_cstate=1 i915.enable_rc6=0'? Those should prevent C7 and RC6.. Dmitry saw the problem even with these settings, but would be good to double check on your side too, since RC6 would be the most logical root cause. Did you check the CPU cstate also when you ran with max_cstate=0?

> When looking in i915_dmc_info I can see that the "DC3 -> DC5" transition
> counter increases exactly by one each second. "DC5 -> DC6" counter is zero.

Err, forgot to say that reading that file itself increases the counter (if DC states are enabled, so display is off) :/ So you should sample only at the beginning and end of the test and deduct the increment caused by the sampling.

> If I now run gem_latency -n 0:
>
> "DC3 -> DC5" counter starts increasing by ~2k per second.

Same here as above, in case you now sampled with higher freq.

> PKG is not any deeper states now.
> CPU split between C2/C3/C6/C7 is approx. 42/2/10/40%.
> GPU is 0% RC6.
>
> Benchmark goes slow.
>
> 2. Now I force turn on a display (echo on |
> tee /sys/class/drm/card0-HDMI-A-1/status).
>
> "DC3 -> DC5" transition counter stops increasing.

Right, display-on keeps it in DC0.

> PKG is still in PC2, CPU in C7 and GPU in RC6.
>
> Benchmark is not normal speed and while it is running PKG is not in any low
> power states, RC6 is 0% and CPU C2/C3/C6/C7 is approx 52/0/0/25%.

Hm, so now we are constantly in DC0 and so DMC should be completely inactive (it only ever activates when either entering DC5 or DC6). Yet there is a slow-down, seemingly caused by it.

> > What's the effect of limiting max_cstates to 0 (and having DMC loaded)?
>
> No effect on benchmark speed or reported "DC3 -> DC5" transitions.

As above, did you double check if the cstate limit is really in effect?

> > An other problem could be that the GPU is trying to access the display,
> > (maybe checking scan line counts or something?).
>
> You mean something behind the covers or explicitly by i915?

It was just a wild guess, not sure at all if it's possible. The kernel shouldn't do anything while the display is off, unless you have runtime PM enabled (if /sys/bus/pci/devices/0000\:00\:02.0/power/control contains 'auto'). Ville said that X does the scan line readout when rendering to the front buffer, but that shouldn't be the case here. Yea, could be still something under the hood by the HW itself, DC transitions would be an indication for that.

> > Does /sys/kernel/debug/dri/0/i915_dmc_info show any transitions during the
> > test when DMC is loaded?
>
> Yes, see above. :)

So no good idea still. One other thing to try would be to limit the package state to PC2 in BIOS if there is an option for that and boot with DMC; would show if somehow the PC7 vs. PC2 difference itself would be the cause.

(In reply to Imre Deak from comment #11)
> Ville said that X does the scan line readout when rendering to the
> front buffer, but that shouldn't be the case here.

Only if root and gen <= 8 && !(vlv || chv), fwiw.

(In reply to Chris Wilson from comment #12)
> Only if root and gen <= 8 && !(vlv || chv), fwiw.

Ah ok, so that's ruled out then.

(In reply to Imre Deak from comment #11)
> Was this also by booting with 'intel_idle.max_cstate=1 i915.enable_rc6=0'?
> Those should prevent C7 and RC6.. Dmitry saw the problem even with these
> settings, but would be good to double check on your side too, since RC6

I did not bother running with disabled rc6 since that does not seem to have any effect to all this.

> would be the most logical root cause. Did you check the CPU cstate also when
> you ran with max_cstate=0?

Yeah I did check, PKG and CPU were both in top states then.

> > "DC3 -> DC5" counter starts increasing by ~2k per second.
>
> Same here as above, in case you now sampled with higher freq.

I was sampling once per second so the ~2k per second increase still sounds valid.

> As above, did you double check if the cstate limit is really in effect?

Yep.

> It was just a wild guess, not sure at all if it's possible. The kernel
> shouldn't do anything while the display is off, unless you have runtime PM
> enabled (if /sys/bus/pci/devices/0000\:00\:02.0/power/control contains
> 'auto') Ville said that X does the scan line readout when rendering to the
> front buffer, but that shouldn't be the case here. Yea, could be still
> something under the hood by the HW itself, DC transitions would be an
> indication for that.

I got 'on' in /sys/bus/pci/devices/0000\:00\:02.0/power/control. And no X running or anything. Just fbcon but no displays connected. Should I try without fbcon perhaps?

> So no good idea still. One other thing to try would be to limit the package
> state to PC2 in BIOS if there is an option for that and boot with DMC; would
> show if somehow the PC7 vs. PC2 difference itself would be the cause.

Will try.

More datapoints but no idea if they will provide any clues..

I tried compiling out fbcon and it does change things:

1. DMC loaded, idle system, no display, fbcon: PKG is in PC2
2. DMC loaded, idle system, no display, no fbcon: PKG is in PC7

Continuing with no fbcon and trying the HDMI force on - this now does not restore the performance to normal. And the i915_dmc_info "DC3 -> DC5" counter is still increasing rapidly as in the case with DMC, fbcon and no displays.

Without DMC and no fbcon performance is good. (PKG PC7)

i915.disable_power_wells=0 also fixes the performance.
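The advice earlier in the thread, to sample i915_dmc_info only at the beginning and end of a test and to deduct the increment caused by the read itself, could be scripted roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, the `DC3 -> DC5 count: N` line format is an assumption about what the debugfs file prints on the kernel in use, and deducting exactly one count per read is inferred from the one-per-second idle increase reported above.

```shell
#!/bin/sh
# Hypothetical helpers; the "DC3 -> DC5 count: N" line format is an assumption
# about the i915_dmc_info output and should be checked on the target kernel.
dc3_dc5_count() {
    # $1 = path to the info file (normally /sys/kernel/debug/dri/0/i915_dmc_info)
    awk -F': *' '/^DC3 -> DC5 count/ { print $2; exit }' "$1"
}

net_transitions() {
    # $1 = count before the test, $2 = count after, $3 = number of reads to deduct
    echo $(( $2 - $1 - $3 ))
}

# Usage sketch (needs root and a mounted debugfs):
#   f=/sys/kernel/debug/dri/0/i915_dmc_info
#   before=$(dc3_dc5_count "$f")
#   gem_latency -n 0                  # workload under test
#   after=$(dc3_dc5_count "$f")
#   net_transitions "$before" "$after" 1
```

Taking the file path as an argument keeps the parsing testable against a saved copy of the debugfs output, which avoids perturbing the counter while experimenting with the script.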
Workaround:

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 33fb11cc5acc..b5c262f629f7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3203,6 +3203,10 @@ i915_gem_idle_work_handler(struct work_struct *work)
 
 	if (INTEL_GEN(dev_priv) >= 6)
 		gen6_rps_idle(dev_priv);
+
+	if (IS_SKYLAKE(dev_priv) && dev_priv->csr.dmc_payload)
+		intel_display_power_put(dev_priv, POWER_DOMAIN_MODESET);
+
 	intel_runtime_pm_put(dev_priv);
 out_unlock:
 	mutex_unlock(&dev->struct_mutex);
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index 313cdff7c6dd..d5ea4ce47306 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -838,6 +838,9 @@ static void i915_gem_mark_busy(const struct intel_engine_cs *engine)
 	if (INTEL_GEN(dev_priv) >= 6)
 		gen6_rps_busy(dev_priv);
 
+	if (IS_SKYLAKE(dev_priv) && dev_priv->csr.dmc_payload)
+		intel_display_power_get(dev_priv, POWER_DOMAIN_MODESET);
+
 	queue_delayed_work(dev_priv->wq,
 			   &dev_priv->gt.retire_work,
 			   round_jiffies_up_relative(HZ));

And a first draft of an IGT: https://patchwork.freedesktop.org/patch/150314/

It looks to me that the workaround:

echo on | tee /sys/class/drm/card0-HDMI-A-1/status

works only for HDMI-A-1. I have tried to fake DP-1 and performance did not go up. And in automated validation on our side someone faked HDMI-A-2 and performance also did not go up. Maybe this will give some clue...

Some details and further tests to be tried are:

Though by avoiding DMC fw load, issue is going away, issue is not limited just to DMC fw. (proved by other tests results)

Real way to address this is to gracefully handle power infra change for a headless system.

“i915.disable_power_wells=0” fixes performance issue on DP too?
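When comparing the effect of forcing different connectors (HDMI-A-1 versus DP-1 or HDMI-A-2, as reported above), it may help to dump the state of every connector before and after each attempt. The sketch below assumes the usual /sys/class/drm/cardN-CONNECTOR/status layout seen in the description; `connector_status` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "<connector> <status>" for every connector
# directory found under a DRM sysfs class directory.
connector_status() {
    # $1 = class directory, normally /sys/class/drm
    for status_file in "$1"/card*-*/status; do
        # Skip the unexpanded glob pattern when nothing matches
        [ -r "$status_file" ] || continue
        name=$(basename "$(dirname "$status_file")")
        printf '%s %s\n' "$name" "$(cat "$status_file")"
    done
}

# Usage sketch: connector_status /sys/class/drm
```

This only reports what sysfs says about each connector. As the later comments point out, a connector reading as connected does not by itself show whether the power well was actually turned on.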
(In reply to Sunil Kamath from comment #20)
> Some details and further tests to be tried are:
>
> Though by avoiding DMC fw load, issue is going away, issue is not limited just
> to DMC fw. (proved by other tests results)

Which results are you referring to? So far I haven't been able to reproduce the performance regression without the DMC loaded. But yeah, it is possible it is a driver - firmware interaction of some sorts.

> Real way to address this is to gracefully handle power infra change for a
> headless system.
>
> “i915.disable_power_wells=0” fixes performance issue on DP too?

It fixes the issue in headless mode and headless mode cannot have DP connected so not sure what you mean by this?

What Dmitry observed in #19 is that forcing the DP connector on hasn't got the same workaround effect as forcing the HDMI on does. As far as I could see, there is some difference in the display code paths, where forcing the DP on does not actually turn on the power well(s). Contrary to forcing the HDMI connector to on which does this. There is also a difference in system behaviour depending on presence or absence of fbcon, but this is just a secondary effect of modeset happening or not when the connector is forced.

>> Some details and further tests to be tried are:
>>
>> Though by avoiding DMC fw load, issue is going away, issue is not limited just
>> to DMC fw. (proved by other tests results)
>
> Which results are you referring to? So far I haven't been able to reproduce the performance regression without the DMC loaded. But yeah, it is possible it
> is a driver - firmware interaction of some sorts.

I am referring to results with “i915.disable_power_wells=0”.

>> Real way to address this is to gracefully handle power infra change for a
>> headless system.
>>
>> “i915.disable_power_wells=0” fixes performance issue on DP too?
>
> It fixes the issue in headless mode and headless mode cannot have DP connected so not sure what you mean by this?

Here I'm referring to comment #19, if the same scenario is tested with i915.disable_power_wells=0. But I got further clarity in the comments below from you.

> What Dmitry observed in #19 is that forcing the DP connector on hasn't got the same workaround effect as forcing the HDMI on does. As far as I could see,
> there is some difference in the display code paths, where forcing the DP on does not actually turn on the power well(s). Contrary to forcing the HDMI
> connector to on which does this. There is also a difference in system behaviour depending on presence or absence of fbcon, but this is just a secondary
> effect of modeset happening or not when the connector is forced.

This is the area where I was seeking efforts from Imre - to clarify further on power infrastructure handling for headless-system.

WA from comment 17 works for me performance-wise: performance restored to the expected level.

Summarizing various experiments:
- Animesh/Tvrtko tried various experiments to root cause the problem.
- As discussed/mentioned before major issue seems like:
1. Handling power infrastructure for headless system.
2. How to make headless system really headless in real way.

For 2nd test, below option was tried: i915.disable_display=1 and with this issue goes away.

In addition to the above various experiments were done.

Imre to confirm back if already done experiments are sufficient to get answer to both 1 and 2.

One of the experiments Animesh asked me to do was to look at the state of the DC_STATE_EN register at runtime.

It looked that it had reverted to a value other than what was programmed by i915. On top of that, it looked impossible to manually modify it at runtime.

The value it is reverting to was DC_STATE_EN_UPTO_DC6, regardless of whether the driver has programmed DC_STATE_EN_UPTO_DC5 or DC_STATE_EN_UPTO_DC5 | DC_STATE_EN_UPTO_DC6.

This coupled with a comment in gen9_write_dc_state makes me suspicious whether DMC is not doing things behind the driver's back?
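For anyone repeating the DC_STATE_EN read-back experiment, a small decoder can make raw register values easier to compare against what the driver programmed. This is a sketch only: `decode_dc_state` is a hypothetical name, and the bit values used (1 for DC_STATE_EN_UPTO_DC5, 2 for DC_STATE_EN_UPTO_DC6, in the low two bits) are an assumption that should be verified against i915_reg.h for the kernel in question.

```shell
#!/bin/sh
# Decode the low two bits of a DC_STATE_EN value.
# ASSUMPTION: 1 = UPTO_DC5 and 2 = UPTO_DC6; verify against i915_reg.h.
# decode_dc_state is a hypothetical helper name.
decode_dc_state() {
    # $1 = register value, decimal or 0x-prefixed hex
    case $(( $1 & 3 )) in
        0) echo "DC states disabled" ;;
        1) echo "up to DC5" ;;
        2) echo "up to DC6" ;;
        3) echo "DC5 and DC6 bits both set" ;;
    esac
}

# Usage sketch: decode_dc_state 0x2
```

Under that assumption, the behaviour described above would show up as a value that decodes as "up to DC6" at runtime even after "up to DC5" was written.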
(In reply to Tvrtko Ursulin from comment #25)
> One of the experiments Animesh asked me to do was to look at the state of
> the DC_STATE_EN register at runtime.
>
> It looked that it had reverted to a value other than what was programmed by
> i915. On top of that, it looked impossible to manually modify it at runtime.
>
> The value it is reverting to was DC_STATE_EN_UPTO_DC6, regardless of whether
> the driver has programmed DC_STATE_EN_UPTO_DC5 or DC_STATE_EN_UPTO_DC5 |
> DC_STATE_EN_UPTO_DC6.
>
> This coupled with a comment in gen9_write_dc_state makes me suspicious
> whether DMC is not doing things behind the drivers back?

We don't use DC5 on SKL normally only DC6. Did you boot with i915.enable_dc set to something non-default?

(In reply to Imre Deak from comment #26)
> We don't use DC5 on SKL normally only DC6. Did you boot with i915.enable_dc
> set to something non-default?

Not via the param but on Animesh's suggestion I had the gen9_write_dc_state modified to only ever program DC_STATE_EN_UPTO_DC5, if non zero state was passed in. Even after that the read back from DC_STATE_EN at runtime was DC_STATE_EN_UPTO_DC6. But during programming the read-back was getting the programmed value. So I guess some time after the initial programming it gets modified by someone. And I couldn't find any other place in i915 which would do it. Which is why I thought it could be DMC.

(In reply to Tvrtko Ursulin from comment #27)
> Not via the param but on Animesh's suggestion I had the gen9_write_dc_state
> modified to only ever program DC_STATE_EN_UPTO_DC5, if non zero state was
> passed in. Even after that the read back from DC_STATE_EN at runtime was
> DC_STATE_EN_UPTO_DC6. But during programming the read-back was getting the
> programmed value. So I guess some time after the initial programming it gets
> modified by someone. And I couldn't find any other place in i915 which would
> do it. Which is why I thought it could be DMC.

Yes, as I understand there is a firmware bug that prevents using DC5 on SKL. After exiting a low-power DC state it will always "restore" DC_STATE_EN_UPTO_DC6 to DC_STATE_EN regardless of what was programmed there.

quick query:
when we do not have any display connected, isn't it expected that display goes to lowest possible power state? that's DC6?
(In reply to Sunil Kamath from comment #29)
> quick query:
> when we do not have any display connected, isn't it expected that display
> goes to lowest possible power state? that's DC6?

If all the conditions allow this to happen. That is DC6 is allowed in DC_STATE_EN, PW2 is disabled (and all display outputs are disabled). Note that DMC will signal an actual DC6 transition (via its DC6 transition debug counter) only if other peripherals on the system also allow this. That is you need to do powertop --auto-tune, and make sure no other devices would block deeper PC states (it's PC9+ on SKL AFAIR). Examples for these are SSD without ALPM enabled, active network link, USB device.

We also seem to keep DPLL0 enabled at all times since according to Ville if we turn it off, DMC turns it back on again. It is possible DPLL0 being on prevents DC6?

(In reply to Tvrtko Ursulin from comment #31)
> We also seem to keep DPLL0 enabled at all times since according to Ville if
> we turn it off, DMC turns it back on again. It is possible DPLL0 being on
> prevents DC6?

No, DPLL0 being on all the time from the driver's POV is normal. DMC will turn it off when transitioning to DC5/6 and turn it back on when exiting these states.

It seems that the number of DC state transitions done by the DMC (0x80030) is correlated to the activity on the GT IIR.

Initially I was trying to correlate with the number of GT interrupts, which had a correlation, but also strangely the number of DC transitions could be higher than the number of interrupts (up to exactly double!).

But if we consider that one GT interrupt can have multiple GT IIR accesses (one write from GT, one write from the CPU to clear it), then it starts making more sense.

And in fact nosing around the DMC fw I can spot MMIO addresses of the GT IIR registers (among other things).

So the question I think is - why is DMC fw looking at GT IIR registers, and even if it has to for some reason, why it is triggering DC state transitions in case of pure GT command submission with no display activity whatsoever?

Is there any update in this case? If so, could you share the information. Thank you.

(In reply to Elizabeth from comment #34)
> Is there any update in this case? If so, could you share the information.
> Thank you.

This is currently blocked on a DMC register context corruption issue during DC state transitions, which is tracked in the internal bugtracker.

(In reply to Imre Deak from comment #35)
> This is currently blocked on a DMC register context corruption issue during
> DC state transitions, which is tracked in the internal bugtracker.

Changing to REOPEN. Thanks.

*** Bug 102589 has been marked as a duplicate of this bug. ***

*** Bug 102563 has been marked as a duplicate of this bug. ***

See also https://patchwork.freedesktop.org/series/30196/ for the same patch after seeing CI hit this problem on bxt.

Chris,

Do you know which other platforms are affected? For now we are aware of SKL and BXT, what about KBL and CFL?

And another question, the patch we consider to merge, it disables DMC entirely or it is doing something else? Why do we not want to disable it fully until the DMC FW is fixed?

Honestly, the presumption is that the dmc must be good for something. Completely blacklisting the firmware until fixed is one of the options. Just will anyone notice if we do use the nuclear option?

This issue is reproducible also on KBL. On CFL this issue did not occur.

Correcting last comment. Also happens on CFL.

Now SKL DMC 1.27 merged so we need patch on top of it?

Yes, and we have this patch in the mailing list already: https://patchwork.freedesktop.org/series/24017/#rev6. This one fixes the perf.
issue for all platforms where we are aware of it (previous revision excluded SKL).

commit b68763741aa29f2541c7ca58bcb0c2bb6cb5f449
Author:     Tvrtko Ursulin <tvrtko.ursulin@intel.com>
AuthorDate: Tue Dec 5 13:28:54 2017 +0000
Commit:     Imre Deak <imre.deak@intel.com>
CommitDate: Fri Dec 8 12:23:07 2017 +0200

    drm/i915: Restore GT performance in headless mode with DMC loaded

    It seems that the DMC likes to transition between the DC states a lot when
    there are no connected displays (no active power domains) during command
    submission.

    This activity on DC states has a negative impact on the performance of the
    chip with huge latencies observed in the interrupt handlers and elsewhere.
    Simple tests like igt/gem_latency -n 0 are slowed down by a factor of
    eight.

    Work around it by introducing a new power domain named,
    POWER_DOMAIN_GT_IRQ, associtated with the "DC off" power well, which is
    held for the duration of command submission activity.

    CNL has the same problem which will be addressed as a follow-up. Doing
    that requires a fix for a DC6 context corruption problem in the CNL DMC
    firmware which is yet to be released.

    v2:
     * Add commit text as comment in i915_gem_mark_busy. (Chris Wilson)
     * Protect macro body with braces. (Jani Nikula)

    v3:
     * Add dedicated power domain for clarity. (Chris, Imre)
     * Commit message and comment text updates.
     * Apply to all big-core GEN9 parts apart for Skylake which is pending DMC
       firmware release.

    v4:
     * Power domain should be inner to device runtime pm. (Chris)
     * Simplify NEEDS_CSR_GT_PERF_WA macro. (Chris)
     * Handle async DMC loading by moving the GT_IRQ power domain logic into
       intel_runtime_pm. (Daniel, Chris)
     * Include small core GEN9 as well. (Imre)

    v5
     * Special handling for async DMC load is not needed since on failure the
       power domain reference is kept permanently taken. (Imre)

    v6:
     * Drop the NEEDS_CSR_GT_PERF_WA macro since all firmwares have now been
       deployed. (Imre, Chris)

    Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100572
    Testcase: igt/gem_exec_nop/headless
    Cc: Imre Deak <imre.deak@intel.com>
    Acked-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
    Cc: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
    Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v5)
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    [Imre: Add note about applying the WA on CNL as a follow-up]
    Signed-off-by: Imre Deak <imre.deak@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20171205132854.26380-1-tvrtko.ursulin@linux.intel.com

Unless we really want to keep this open for cnl?

I suggest to open another one for CNL if we will spot the issue there and link it to this one.

There is a need to have the issue fixed for 4.14 LTS kernel. How do you track fixes backport? Should there be this/separate bug or anything else?

Closing, please re-open if still occurs.