Bug 75244 - [BYT]igt/gem_concurrent_blit/gtt-gpu-read-after-write-forked causes *ERROR* timed out waiting for Punit
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: All Linux (All)
Importance: low normal
Assignee: Deepak S
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 81101
Depends on:
Blocks:
 
Reported: 2014-02-20 06:04 UTC by lu hua
Modified: 2017-10-06 14:39 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg (112.87 KB, text/plain)
2014-02-20 06:04 UTC, lu hua
increase punit timeout (535 bytes, patch)
2014-06-06 15:05 UTC, Jesse Barnes
dmesg (93.50 KB, text/plain)
2014-06-09 06:36 UTC, lu hua
dmesg(patch) (102.63 KB, text/plain)
2014-06-27 08:08 UTC, lu hua
dmesg(wait_for_atomic) (89.74 KB, text/plain)
2014-07-01 03:20 UTC, lu hua
dmesg(disable RC6) (125.03 KB, text/plain)
2014-07-08 07:36 UTC, lu hua
dmesg (94.72 KB, text/plain)
2014-07-09 06:07 UTC, lu hua

Description lu hua 2014-02-20 06:04:52 UTC
Created attachment 94408 [details]
dmesg

System Environment:
--------------------------
Platform: Baytrail
Kernel(drm-intel-nightly)1be8f2b4dd6d3db00af24d4891c82d2650bd282d

Bug detailed description:
------------------------- 
Running ./gem_concurrent_blit --run-subtest gtt-gpu-read-after-write-forked reports:
<3>[  570.041279] [drm:vlv_set_rps_idle] *ERROR* timed out waiting for Punit.
It happens on Baytrail with both the -nightly and -queued kernels.
igt/gem_concurrent_blit/gtt-overwrite-source-forked also has this issue.

The latest known good commit: 87adfb03493141c7c61df06440d07f6c6c9fd24c
The latest known bad commit: 4c0e552882114d1edb588242d45035246ab078a0

output:
IGT-Version: 1.5-g06189c6 (x86_64) (Linux: 3.13.0_drm-intel-next-queued_4c0e55_20140219+ x86_64)
using 2x512 buffers, each 1MiB
Subtest gtt-gpu-read-after-write-forked: SUCCESS

# dmesg -r | egrep "<[1-3]>" |grep drm
<3>[  570.041279] [drm:vlv_set_rps_idle] *ERROR* timed out waiting for Punit
<3>[  821.361236] [drm:vlv_set_rps_idle] *ERROR* timed out waiting for Punit
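
For reference, the filter above keeps only raw dmesg lines that carry a priority tag of 1 to 3 (alert, critical, error) and that also mention drm. A minimal C sketch of the same check follows. The function names are hypothetical and are not part of igt or dmesg.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if `line` contains a raw priority tag "<1>", "<2>" or "<3>",
 * mirroring the pattern passed to egrep. */
static bool has_priority_1_to_3(const char *line)
{
    for (const char *p = strchr(line, '<'); p != NULL; p = strchr(p + 1, '<')) {
        if (p[1] >= '1' && p[1] <= '3' && p[2] == '>')
            return true;
    }
    return false;
}

/* True if the line would survive both the egrep and the grep stage. */
static bool is_drm_error_line(const char *line)
{
    assert(line != NULL);
    return has_priority_1_to_3(line) && strstr(line, "drm") != NULL;
}
```

Passing the Punit line from this report to is_drm_error_line() returns true, while an informational <6> drm line or a <3> line from another subsystem returns false.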

Reproduce steps:
-------------------------
1. ./gem_concurrent_blit --run-subtest gtt-gpu-read-after-write-forked
Comment 1 Jesse Barnes 2014-02-25 01:14:05 UTC
We dropped these waits from other paths, so it's probably not necessary to query punit status when going back to idle either (and I think it can take up to 50ms or maybe even more before the update occurs).

At any rate I don't think this is a serious bug on our side...
Comment 2 Jesse Barnes 2014-03-04 01:13:01 UTC
Presumably this fixes it?

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index a6b877a..ed44b04 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3084,10 +3084,6 @@ static void vlv_set_rps_idle(struct drm_i915_private *dev
        vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ,
                                        dev_priv->rps.min_delay);
 
-       if (wait_for(((vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS))
-                               & GENFREQSTATUS) == 0, 5))
-               DRM_ERROR("timed out waiting for Punit\n");
-
        /* Release the Gfx clock */
        I915_WRITE(VLV_GTLC_SURVIVABILITY_REG,
                I915_READ(VLV_GTLC_SURVIVABILITY_REG) &

Daniel, any opinion?
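
To illustrate why the message can appear even when the request eventually succeeds: the wait being removed polls the status bit for at most 5 ms, while comment 1 estimates the update can take around 50 ms. The following is a minimal userspace sketch of a poll-with-timeout loop, not the i915 wait_for() macro itself. fake_punit, punit_busy, and poll_until_idle are hypothetical names, the clock is simulated, and the 50 ms latency is only the rough figure from comment 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the Punit: the busy bit clears
 * `latency_ms` after the frequency request was written. */
struct fake_punit {
    uint32_t now_ms;      /* simulated clock */
    uint32_t request_ms;  /* time at which the request was written */
    uint32_t latency_ms;  /* how long the unit takes to react */
};

/* True while the simulated frequency change is still pending. */
static bool punit_busy(const struct fake_punit *p)
{
    return p->now_ms - p->request_ms < p->latency_ms;
}

/* Poll once per simulated millisecond until the unit is idle or the
 * time budget is used up. Returns 0 on success, -1 on timeout. */
static int poll_until_idle(struct fake_punit *p, uint32_t timeout_ms)
{
    uint32_t deadline = p->now_ms + timeout_ms;

    assert(timeout_ms > 0);
    for (;;) {
        if (!punit_busy(p))
            return 0;
        if (p->now_ms >= deadline)
            return -1;
        p->now_ms++;  /* stands in for sleeping 1 ms */
    }
}
```

With latency_ms set to 50, a 5 ms budget times out even though a 100 ms budget would have succeeded, which matches the suspicion that the error is mostly noise.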
Comment 3 Daniel Vetter 2014-03-04 18:51:27 UTC
This is purely platform specific code, so whatever floats the boat is fine with me ... I'd like to have Deepak's review on the patch though, and some positive test result from Lu, too.

Lu, can you please test the patch from Jesse in comment #2?
Comment 4 lu hua 2014-03-05 06:59:40 UTC
Fixed by the patch.
Comment 5 Jesse Barnes 2014-06-06 15:05:10 UTC
Created attachment 100521 [details] [review]
increase punit timeout

Can you try this one instead?
Comment 6 lu hua 2014-06-09 06:36:36 UTC
Created attachment 100700 [details]
dmesg

(In reply to comment #5)
> Created attachment 100521 [details] [review] [review]
> increase punit timeout
> 
> Can you try this one instead?

I tested this patch; the error still occurs.

output:
IGT-Version: 1.6-g18d2130 (x86_64) (Linux: 3.15.0-rc8_kcloud_d26d81_20140609+ x86_64)
using 2x512 buffers, each 1MiB
Subtest gtt-gpu-read-after-write-forked: SUCCESS

# dmesg -r | egrep "<[1-3]>" |grep drm
<3>[  415.785361] [drm:vlv_set_rps_idle] *ERROR* timed out waiting for Punit
Comment 7 Jesse Barnes 2014-06-25 19:44:19 UTC
Should be fixed by:

drm/i915: Drop WA to fix Voltage not getting dropped to Vmin when Gfx is power gated.

on the list.  Just asked Jani or Rodrigo to pick it up.
Comment 8 Jesse Barnes 2014-06-25 19:47:34 UTC
Deepak has a fix for this, but needs to repost with the stepping check.
Comment 9 Deepak S 2014-06-27 06:09:06 UTC
I have submitted a new patch. Please try:

http://lists.freedesktop.org/archives/intel-gfx/2014-June/048095.html
Comment 10 lu hua 2014-06-27 08:08:36 UTC
Created attachment 101842 [details]
dmesg(patch)

(In reply to comment #9)
> I have submitted new patch. Please try
> 
> http://lists.freedesktop.org/archives/intel-gfx/2014-June/048095.html

It still happens with this patch.
Comment 11 Jani Nikula 2014-06-30 10:20:03 UTC
(In reply to comment #10)
> Created attachment 101842 [details]
> dmesg(patch)
> 
> (In reply to comment #9)
> > I have submitted new patch. Please try
> > 
> > http://lists.freedesktop.org/archives/intel-gfx/2014-June/048095.html
> 
> It still happens with this patch.

Deepak, Jesse, new ideas?
Comment 12 Deepak S 2014-06-30 10:26:03 UTC
Hi Jani,

I tried this on my local VLV system and I am not seeing the issue. 
Can you please give me your BIOS info? I will try once with your BIOS. Then I can talk to the HW team and get more info on why the Gfx clock is failing.

Thanks
Deepak
Comment 13 Deepak S 2014-06-30 10:29:22 UTC
Can we try with "wait_for_atomic"?

Thanks
Deepak
Comment 14 Jani Nikula 2014-06-30 10:42:20 UTC
(In reply to comment #12)
> Hi Jani,
> 
> I tried this on my local VLV system and I am not seeing the issue. 
> Can you please give me your BIOS info? I will try with your bios once. Then
> I can talk to HW team and get more info on why Gfx clock is failing

Hua, please fill in the info Deepak asked for.
Comment 15 lu hua 2014-07-01 03:19:31 UTC
BIOS: v93.R25

With "wait_for_atomic" added:
output:
IGT-Version: 1.7-g67e29a3 (x86_64) (Linux: 3.16.0-rc2_drm-intel-nightly_a7665f_20140701+ x86_64)
using 2x512 buffers, each 1MiB
Subtest gtt-gpu-read-after-write-forked: SUCCESS
Comment 16 lu hua 2014-07-01 03:20:04 UTC
Created attachment 102042 [details]
dmesg(wait_for_atomic)
Comment 17 Deepak S 2014-07-08 04:32:30 UTC
Can you try the same test case with RC6 disabled?
Comment 18 lu hua 2014-07-08 07:36:12 UTC
Created attachment 102414 [details]
dmesg(disable RC6)

(In reply to comment #17)
> Can you try the same test case with RC6 disabled?

With RC6 disabled, this error goes away.
However, it reports the error "<3>[  218.281683] [drm:vlv_force_gfx_clock] *ERROR* timeout waiting for GFX clock force-off (00000008)"
Comment 19 Deepak S 2014-07-09 03:41:07 UTC
Since disabling RC6 helps, shall we try taking forcewake before requesting the frequency? If this does not help, we might need help from the HW team.


diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 0115689..1fea122 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3254,18 +3254,18 @@ static void vlv_set_rps_idle(struct drm_i915_private *dev_priv)
        /* Mask turbo interrupt so that they will not come in between */
        I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);

-       vlv_force_gfx_clock(dev_priv, true);
-
        dev_priv->rps.cur_freq = dev_priv->rps.min_freq_softlimit;

+       gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
+
        vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ,
                                        dev_priv->rps.min_freq_softlimit);

        if (wait_for(((vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS))
-                               & GENFREQSTATUS) == 0, 5))
+                               & GENFREQSTATUS) == 0, 10))
                DRM_ERROR("timed out waiting for Punit\n");

-       vlv_force_gfx_clock(dev_priv, false);
+       gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);

        I915_WRITE(GEN6_PMINTRMSK,
                   gen6_rps_pm_mask(dev_priv, dev_priv->rps.cur_freq));
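
The idea in the patch above is to hold forcewake across the Punit request so that the GT cannot drop into RC6 in the middle of it. The get/put bracket can be sketched with a simple reference count, assuming a heavily simplified model. fake_gt, wake_get, wake_put, and request_min_freq are hypothetical names and do not reproduce the i915 forcewake code.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_gt {
    int wake_count;  /* number of outstanding wake_get() calls */
    bool awake;      /* false models the GT sitting in RC6 */
};

/* The first holder wakes the GT; later holders only bump the count. */
static void wake_get(struct fake_gt *gt)
{
    if (gt->wake_count++ == 0)
        gt->awake = true;
}

/* The last holder lets the GT go back to sleep. */
static void wake_put(struct fake_gt *gt)
{
    assert(gt->wake_count > 0);  /* every put must pair with a get */
    if (--gt->wake_count == 0)
        gt->awake = false;
}

/* Brackets the (simulated) frequency request with get/put and reports
 * whether the GT was awake while the request was issued. */
static bool request_min_freq(struct fake_gt *gt)
{
    bool was_awake;

    wake_get(gt);
    was_awake = gt->awake;
    wake_put(gt);
    return was_awake;
}
```

Because the count is balanced, the bracket leaves the GT in whatever state it was in before the call, whether or not another holder was already keeping it awake.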
Comment 20 lu hua 2014-07-09 06:07:58 UTC
Created attachment 102457 [details]
dmesg

> 
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c
> b/drivers/gpu/drm/i915/intel_pm.c
> index 0115689..1fea122 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3254,18 +3254,18 @@ static void vlv_set_rps_idle(struct drm_i915_private
> *dev_priv)
>         /* Mask turbo interrupt so that they will not come in between */
>         I915_WRITE(GEN6_PMINTRMSK, 0xffffffff);
> 
> -       vlv_force_gfx_clock(dev_priv, true);
> -
>         dev_priv->rps.cur_freq = dev_priv->rps.min_freq_softlimit;
> 
> +       gen6_gt_force_wake_get(dev_priv, FORCEWAKE_ALL);
> +
>         vlv_punit_write(dev_priv, PUNIT_REG_GPU_FREQ_REQ,
>                                         dev_priv->rps.min_freq_softlimit);
> 
>         if (wait_for(((vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS))
> -                               & GENFREQSTATUS) == 0, 5))
> +                               & GENFREQSTATUS) == 0, 10))
>                 DRM_ERROR("timed out waiting for Punit\n");
> 
> -       vlv_force_gfx_clock(dev_priv, false);
> +       gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
> 
>         I915_WRITE(GEN6_PMINTRMSK,
>                    gen6_rps_pm_mask(dev_priv, dev_priv->rps.cur_freq));


I tested this patch; the error still occurs.
Comment 21 Deepak S 2014-07-09 06:10:38 UTC
Thanks, Lu. Let me try to reproduce it on my system again.
Comment 22 Guo Jinxian 2014-08-18 05:49:43 UTC
This bug is still reproducible on the latest -nightly (186631131a9289dad22f51315d78b9b6ac5b425f).

root@x-byt06:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# ./gem_concurrent_blit --run-subtest gtt-gpu-read-after-write-forked
IGT-Version: 1.7-g5c7bcb1 (x86_64) (Linux: 3.16.0_drm-intel-nightly_186631_20140818+ x86_64)
using 2x512 buffers, each 1MiB
Subtest gtt-gpu-read-after-write-forked: SUCCESS
root@x-byt06:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# dmesg -r|egrep "<[1-4]>"|grep drm
<3>[10491.334009] [drm:vlv_set_rps_idle] *ERROR* timed out waiting for Punit
Comment 23 Deepak S 2014-08-18 05:52:27 UTC
Have you tried with the latest BIOS? Please share the BIOS you're using.

The last time I tried on my system, I did not see the issue :(
Comment 24 Jani Nikula 2014-09-02 13:44:44 UTC
(In reply to comment #23)
> Have tried with latest BIOS? Please share the BIOS your using.

Indeed, the last time I saw those messages was on a VLV A0.

Is this reproducible on a current BYT system with recent BIOS?
Comment 25 lu hua 2014-09-03 08:54:01 UTC
Tested on the latest -nightly kernel. It works well.
Comment 26 lu hua 2014-09-03 08:54:15 UTC
Verified.
Comment 27 Rodrigo Vivi 2014-10-08 21:21:26 UTC
*** Bug 81101 has been marked as a duplicate of this bug. ***
Comment 28 Zouhair 2015-02-09 08:38:38 UTC
Is the bug reported at https://bugzilla.kernel.org/show_bug.cgi?id=92421 related?
Thanks.
Comment 29 Mark Janes 2015-02-20 00:26:49 UTC
I am seeing "*ERROR* timed out waiting for Punit" on production Baytrail systems, and it seems related to an eventual system hang.

Environment:
 - Debian sid
 - 3.18 kernel
 - Lenovo 500s
 - mesa, drm, piglit, and waffle from master

Running the piglit "quick" test suite will generate hundreds of these messages.

The machine hangs after one of these messages, every day or so.

Two separate systems display the same behavior.

Systems are available in JF1 if needed to reproduce the message.
Comment 30 Deepak S 2015-02-20 09:13:48 UTC
Did we try disabling the wait for Punit? Can you try the patch below?

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index f7c9938..c40296e 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3841,7 +3841,7 @@ static void vlv_set_rps_idle(struct drm_i915_private *dev_priv)
        struct drm_device *dev = dev_priv->dev;

        /* CHV and latest VLV don't need to force the gfx clock */
-       if (IS_CHERRYVIEW(dev) || dev->pdev->revision >= 0xd) {
+       if (IS_CHERRYVIEW(dev) || IS_VALLEYVIEW(dev)) {
                valleyview_set_rps(dev_priv->dev, dev_priv->rps.min_freq_softlimit);
                return;
        }
Comment 31 Mark Janes 2015-02-20 20:04:37 UTC
With the patch disabling the wait for Punit, the messages are no longer produced.  I'll let the system run and report if any hangs occur.
Comment 32 Mark Janes 2015-03-02 17:39:57 UTC
Unfortunately, one of my bay trail systems hung in the same manner as before, when it was under load.  There are no PUnit messages in syslog, or any other message that looks out of the ordinary.

I think that this information indicates that the PUnit messages are unrelated to the system hangs that I have been experiencing.
Comment 33 Jesse Barnes 2015-03-03 20:46:52 UTC
Sounds like you can push your patch Deepak, and make the gfx clock force apply to all VLV.
Comment 34 Deepak S 2015-03-04 16:35:34 UTC
Sure Jesse. I will push the patch
Comment 35 Deepak S 2015-03-05 04:12:59 UTC
http://lists.freedesktop.org/archives/intel-gfx/2015-March/061322.html

Submitted patch for review.
Comment 36 Ander Conselvan de Oliveira 2015-06-05 12:52:56 UTC
Fixed in

commit a7f6e231150c93c4e15f258f0d4b1ffe97da3971
Author: Deepak S <deepak.s@linux.intel.com>
Date:   Sat May 9 18:04:44 2015 +0530

    drm/i915/vlv: Remove wait for for punit to updates freq.

in -nightly.

Mark Janes, please open a new bug report if you are still experiencing system hangs, since it seems unrelated to the Punit messages.
Comment 37 lu hua 2015-06-10 05:40:05 UTC
Verified.
Comment 38 Elizabeth 2017-10-06 14:39:44 UTC
Closing old verified.

