Summary: | [BYT]igt/gem_concurrent_blit/gtt-gpu-read-after-write-forked causes *ERROR* timed out waiting for Punit | |
---|---|---|---
Product: | DRI | Reporter: | lu hua <huax.lu>
Component: | DRM/Intel | Assignee: | Deepak S <deepak.s>
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | normal | |
Priority: | low | CC: | chris, intel-gfx-bugs, mark.a.janes, yi.sun
Version: | unspecified | |
Hardware: | All | |
OS: | Linux (All) | |
Whiteboard: | | |
i915 platform: | | i915 features: |
Attachments: | | |
We dropped these waits from other paths, so it's probably not necessary to query punit status when going back to idle either (and I think it can take up to 50ms or maybe even more before the update occurs). At any rate I don't think this is a serious bug on our side...

Presumably this fixes it?

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index a6b877a..ed44b04 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3084,10 +3084,6 @@ static void vlv_set_rps_idle(struct drm_i915_private *dev
         vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ,
                                         dev_priv->rps.min_delay);
 
-        if (wait_for(((vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS))
-                                & GENFREQSTATUS) == 0, 5))
-                DRM_ERROR("timed out waiting for Punit\n");
-
         /* Release the Gfx clock */
         I915_WRITE(VLV_GTLC_SURVIVABILITY_REG,
                 I915_READ(VLV_GTLC_SURVIVABILITY_REG) &

Daniel, any opinion?

This is purely platform specific code, so whatever floats the boat is fine with me ... I'd like to have Deepak's review on the patch though, and some positive test result from Lu, too.

Lu, can you please test the patch from Jesse in comment #2?

Fixed by the patch.

Created attachment 100521 [details] [review]
increase punit timeout

Can you try this one instead?

Created attachment 100700 [details]
dmesg

(In reply to comment #5)
> Created attachment 100521 [details] [review]
> increase punit timeout
>
> Can you try this one instead?

Tested this patch; it still has this error.

output:
IGT-Version: 1.6-g18d2130 (x86_64) (Linux: 3.15.0-rc8_kcloud_d26d81_20140609+ x86_64)
using 2x512 buffers, each 1MiB
Subtest gtt-gpu-read-after-write-forked: SUCCESS
# dmesg -r | egrep "<[1-3]>" |grep drm
<3>[ 415.785361] [drm:vlv_set_rps_idle] *ERROR* timed out waiting for Punit

Should be fixed by:
drm/i915: Drop WA to fix Voltage not getting dropped to Vmin when Gfx is power gated.
on the list. Just asked Jani or Rodrigo to pick it up.
Deepak has a fix for this, but needs to repost with the stepping check.

I have submitted a new patch. Please try

http://lists.freedesktop.org/archives/intel-gfx/2014-June/048095.html

Created attachment 101842 [details]
dmesg(patch)

(In reply to comment #9)
> I have submitted new patch. Please try
>
> http://lists.freedesktop.org/archives/intel-gfx/2014-June/048095.html

It still happens with this patch.

(In reply to comment #10)
> Created attachment 101842 [details]
> dmesg(patch)
>
> (In reply to comment #9)
> > I have submitted new patch. Please try
> >
> > http://lists.freedesktop.org/archives/intel-gfx/2014-June/048095.html
>
> It still happens with this patch.

Deepak, Jesse, new ideas?

Hi Jani,

I tried this on my local VLV system and I am not seeing the issue.
Can you please give me your BIOS info? I will try with your BIOS once. Then I can talk to the HW team and get more info on why the Gfx clock is failing.

Thanks
Deepak

Can we try with "wait_for_atomic"?

Thanks
Deepak

(In reply to comment #12)
> Hi Jani,
>
> I tried this on my local VLV system and I am not seeing the issue.
> Can you please give me your BIOS info? I will try with your bios once. Then
> I can talk to HW team and get more info on why Gfx clock is failing

Hua, please fill in the info Deepak asks for.

BIOS: v93.R25

With "wait_for_atomic" added, output:
IGT-Version: 1.7-g67e29a3 (x86_64) (Linux: 3.16.0-rc2_drm-intel-nightly_a7665f_20140701+ x86_64)
using 2x512 buffers, each 1MiB
Subtest gtt-gpu-read-after-write-forked: SUCCESS

Created attachment 102042 [details]
dmesg(wait_for_atomic)
Can you try the same test case with RC6 disabled?

Created attachment 102414 [details]
dmesg(disable RC6)

(In reply to comment #17)
> Can you try the same test case with RC6 disabled?

With RC6 disabled, this error goes away.
It reports the error "<3>[ 218.281683] [drm:vlv_force_gfx_clock] *ERROR* timeout waiting for GFX clock force-off (00000008)"

Since disabling RC6 helps, shall we try doing a forcewake before requesting the frequency? If this does not help, we might need help from the HW team.

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 0115689..1fea122 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3254,18 +3254,18 @@ static void vlv_set_rps_idle(struct drm_i915_private *dev_priv)
         /* Mask turbo interrupt so that they will not come in between */
         I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);
 
-        vlv_force_gfx_clock(dev_priv, true);
-
         dev_priv->rps.cur_freq = dev_priv->rps.min_freq_softlimit;
 
+        gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
+
         vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ,
                                         dev_priv->rps.min_freq_softlimit);
 
         if (wait_for(((vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS))
-                                & GENFREQSTATUS) == 0, 5))
+                                & GENFREQSTATUS) == 0, 10))
                 DRM_ERROR("timed out waiting for Punit\n");
 
-        vlv_force_gfx_clock(dev_priv, false);
+        gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
 
         I915_WRITE(GEN6_PMINTRMSK,
                 gen6_rps_pm_mask(dev_priv, dev_priv->rps.cur_freq));

Created attachment 102457 [details]
dmesg

>
>
> diff --git a/drivers/gpu/drm/i915/intel_pm.c
> b/drivers/gpu/drm/i915/intel_pm.c
> index 0115689..1fea122 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3254,18 +3254,18 @@ static void vlv_set_rps_idle(struct drm_i915_private
> *dev_priv)
>         /* Mask turbo interrupt so that they will not come in between */
>         I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);
>
> -        vlv_force_gfx_clock(dev_priv, true);
> -
>         dev_priv->rps.cur_freq = dev_priv->rps.min_freq_softlimit;
>
> +        gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
> +
>         vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ,
>                                         dev_priv->rps.min_freq_softlimit);
>
>         if (wait_for(((vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS))
> -                                & GENFREQSTATUS) == 0, 5))
> +                                & GENFREQSTATUS) == 0, 10))
>                 DRM_ERROR("timed out waiting for Punit\n");
>
> -        vlv_force_gfx_clock(dev_priv, false);
> +        gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
>
>         I915_WRITE(GEN6_PMINTRMSK,
>                 gen6_rps_pm_mask(dev_priv, dev_priv->rps.cur_freq));

Tested this patch; the error still exists.

Thanks Lu. Let me try to reproduce it on my system again.

This bug is still reproducible on the latest -nightly (186631131a9289dad22f51315d78b9b6ac5b425f).

root@x-byt06:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# ./gem_concurrent_blit --run-subtest gtt-gpu-read-after-write-forked
IGT-Version: 1.7-g5c7bcb1 (x86_64) (Linux: 3.16.0_drm-intel-nightly_186631_20140818+ x86_64)
using 2x512 buffers, each 1MiB
Subtest gtt-gpu-read-after-write-forked: SUCCESS
root@x-byt06:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# dmesg -r|egrep "<[1-4]>"|grep drm
<3>[10491.334009] [drm:vlv_set_rps_idle] *ERROR* timed out waiting for Punit

Have you tried with the latest BIOS? Please share the BIOS you're using.
The last time I tried on my system, I did not see the issue :(

(In reply to comment #23)
> Have tried with latest BIOS? Please share the BIOS your using.

Indeed, the last time I saw those messages was on a VLV A0. Is this reproducible on a current BYT system with recent BIOS?

Tested on the latest -nightly kernel. It works well.

Verified.

*** Bug 81101 has been marked as a duplicate of this bug. ***

Is the bug reported here related?
https://bugzilla.kernel.org/show_bug.cgi?id=92421

Thanks.

I am seeing "*ERROR* timed out waiting for Punit" on production Baytrail systems, and it seems related to an eventual system hang.
Environment:
- Debian sid
- 3.18 kernel
- Lenovo 500s
- mesa, drm, piglit, and waffle from master

Running the piglit "quick" test suite will generate hundreds of these messages. The machine hangs after one of these messages, every day or so. Two separate systems display the same behavior.

Systems are available in JF1 if needed to reproduce the message.

Did we try disabling the wait for punit? Can you try the patch below?

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index f7c9938..c40296e 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3841,7 +3841,7 @@ static void vlv_set_rps_idle(struct drm_i915_private *dev_priv)
         struct drm_device *dev = dev_priv->dev;
 
         /* CHV and latest VLV don't need to force the gfx clock */
-        if (IS_CHERRYVIEW(dev) || dev->pdev->revision >= 0xd) {
+        if (IS_CHERRYVIEW(dev) || IS_VALLEYVIEW(dev)) {
                 valleyview_set_rps(dev_priv->dev, dev_priv->rps.min_freq_softlimit);
                 return;
         }

With the patch disabling the wait for punit, the messages are no longer produced. I'll let the system run and report if any hangs occur.

Unfortunately, one of my bay trail systems hung in the same manner as before, when it was under load. There are no PUnit messages in syslog, or any other message that looks out of the ordinary.

I think that this information indicates that the PUnit messages are unrelated to the system hangs that I have been experiencing.

Sounds like you can push your patch Deepak, and make the gfx clock force apply to all VLV.

Sure Jesse. I will push the patch.

http://lists.freedesktop.org/archives/intel-gfx/2015-March/061322.html

Submitted patch for review.

Fixed in

commit a7f6e231150c93c4e15f258f0d4b1ffe97da3971
Author: Deepak S <deepak.s@linux.intel.com>
Date: Sat May 9 18:04:44 2015 +0530

    drm/i915/vlv: Remove wait for for punit to updates freq.

in -nightly.

Mark Janes, please open a new bug report if you are still experiencing system hangs, since it seems unrelated to the Punit messages.

Verified.

Closing old verified.
Created attachment 94408 [details]
dmesg

System Environment:
--------------------------
Platform: Baytrail
Kernel(drm-intel-nightly)1be8f2b4dd6d3db00af24d4891c82d2650bd282d

Bug detailed description:
-------------------------
Running ./gem_concurrent_blit --run-subtest gtt-gpu-read-after-write-forked reports <3>[ 570.041279] [drm:vlv_set_rps_idle] *ERROR* timed out waiting for Punit. It happens on Baytrail with the -nightly and -queued kernels.
igt/gem_concurrent_blit/gtt-overwrite-source-forked also has this issue.

The latest known good commit: 87adfb03493141c7c61df06440d07f6c6c9fd24c
The latest known bad commit: 4c0e552882114d1edb588242d45035246ab078a0

output:
IGT-Version: 1.5-g06189c6 (x86_64) (Linux: 3.13.0_drm-intel-next-queued_4c0e55_20140219+ x86_64)
using 2x512 buffers, each 1MiB
Subtest gtt-gpu-read-after-write-forked: SUCCESS
# dmesg -r | egrep "<[1-3]>" |grep drm
<3>[ 570.041279] [drm:vlv_set_rps_idle] *ERROR* timed out waiting for Punit
<3>[ 821.361236] [drm:vlv_set_rps_idle] *ERROR* timed out waiting for Punit

Reproduce steps:
-------------------------
1. ./gem_concurrent_blit --run-subtest gtt-gpu-read-after-write-forked
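
The dmesg check used throughout this report can be wrapped in a small helper so that repeated test runs are easier to compare. The following is only a minimal sketch, assuming raw `dmesg -r` output in which every line begins with its `<N>` priority tag. The function names `drm_errors` and `count_punit_timeouts` are hypothetical and are not part of intel-gpu-tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of intel-gpu-tools): reads raw kernel log
# lines on stdin, as printed by `dmesg -r`, and keeps only the drm messages
# logged at priority 1-3, the same range the report filters on.
drm_errors() {
    grep -E '^<[1-3]>' | grep 'drm'
}

# Hypothetical helper: prints how many of those drm messages are the
# Punit timeout discussed in this bug.
count_punit_timeouts() {
    drm_errors | grep -c 'timed out waiting for Punit'
}
```

For example, `dmesg -r | count_punit_timeouts` after running the subtest prints the number of Punit timeout messages currently in the kernel log; it prints 0 when none are present.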