Bug 99938

Summary: [BAT][GVT-d] timeout waiting for SBI to complete read transaction
Product: DRI Reporter: Martin Peres <martin.peres>
Component: DRM/IntelAssignee: XiongZhang <xiong.y.zhang>
Status: RESOLVED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical    
Priority: high CC: intel-gfx-bugs
Version: DRI git   
Hardware: Other   
OS: All   
Whiteboard: ReadyForDev
i915 platform: BDW i915 features: power/suspend-resume
Attachments:
Description Flags
Setting pch_id for Gen7.5+ in virtual environment none

Description Martin Peres 2017-02-24 10:34:25 UTC
Seems to be a known bug since there is already a patch for it (https://patchwork.freedesktop.org/series/20193/).

However, we do not want to report failures to other developers, so let's show that it is a known problem.
Comment 1 Martin Peres 2017-03-02 09:19:57 UTC
*** Bug 100018 has been marked as a duplicate of this bug. ***
Comment 2 Weinan Li 2017-03-07 00:41:33 UTC
do you still meet this issue?
Comment 3 Martin Peres 2017-03-07 09:03:23 UTC
Yes, very consistently: every time we run IGT, which is multiple times per day.

But this is expected, since https://patchwork.freedesktop.org/series/20193/ has not landed yet.
Comment 4 Terrence Xu 2017-03-07 13:39:27 UTC
(In reply to Martin Peres from comment #3)
> Yes, very consistently: every time we run IGT, which is multiple times per
> day.
> 
> But this is expected, since https://patchwork.freedesktop.org/series/20193/
> has not landed yet.

Hello Peres, can you confirm your environment is gvt-d or gvt-g, I guess this bug is for gvt-d. :)
Comment 5 Martin Peres 2017-03-07 13:50:34 UTC
Yes, you are right, the bug is in gvt-d.
Comment 6 Martin Peres 2017-03-07 13:51:40 UTC
Moving the bug to drm/intel since there is no separate component for GVT-d.
Comment 7 Terrence Xu 2017-03-07 13:56:22 UTC
(In reply to Martin Peres from comment #6)
> Moving the bug to drm/intel since there is no separate component for GVT-d.

Thanks, reassign this issue to Zhang, Xiong who is working at GVT-d project.
Comment 8 Martin Peres 2017-03-20 15:03:27 UTC
What's the status of this bug?

Until it is fixed, the following tests are ignored for our CI runs:
igt@drv_module_reload@basic-reload
igt@drv_module_reload@basic-reload-final
igt@drv_module_reload@basic-reload-inject
igt@gem_exec_suspend@basic-s3
igt@gem_exec_suspend@basic-s4-devices
igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a
igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b
igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c

This is suboptimal as it means that these tests go mostly untested and some code may get introduced and break them and our CI system will not do anything to prevent the merge. So, if you want to avoid doing more work later on to bisect the regressions, I would suggest fixing this bug as soon as possible so as we can lift these tests from the blacklist for the gvt-d platforms!
Comment 9 Jani Saarinen 2017-03-22 13:42:05 UTC
With revert done:
https://patchwork.freedesktop.org/series/21682/

Still warnings seen: 
fi-bdw-gvtdvm    total:278  pass:256  dwarn:8   dfail:0   fail:0   skip:14  time: 462s
Comment 10 Chris Wilson 2017-03-22 13:49:44 UTC
(In reply to Jani Saarinen from comment #9)
> With revert done:
> https://patchwork.freedesktop.org/series/21682/
> 
> Still warnings seen: 
> fi-bdw-gvtdvm    total:278  pass:256  dwarn:8   dfail:0   fail:0   skip:14 
> time: 462s

Of course. The reverted patch was only to identify the source of the warnings by adding more information to them.
Comment 11 Martin Peres 2017-03-23 09:57:57 UTC
Raising the priority, because it reduces our code coverage
Comment 12 XiongZhang 2017-03-28 10:26:55 UTC
The issue is: "HAS_PCH_LPT_LP(dev_priv) ? SBI_GEN0 : SBI_DBUFF0" exist in lpt_enable_clkout_dp() and lpt_disable_clkout_dp() function. 
  In GVT-d, PCH ISA bridge isn't passed through to guest, I915 could only assume it is a LPT pch and couldn't identify whether it is a LPT_LP, LPT or LPT_H PCH. So the above function will default read/write SBI_DBUFF0 register, this is wrong if it is running on LPT_LP PCH.  

The following experiment could remove this error message, but it also break LPT and LPT-H pch.
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 8b807a9..0c295f1 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -135,6 +135,7 @@ static enum intel_pch intel_virt_detect_pch(struct drm_i915_private *dev_priv)
                DRM_DEBUG_KMS("Assuming CouarPoint PCH\n");
        } else if (IS_HASWELL(dev_priv) || IS_BROADWELL(dev_priv)) {
                ret = PCH_LPT;
+               dev_priv->pch_id = INTEL_PCH_LPT_LP_DEVICE_ID_TYPE;
                DRM_DEBUG_KMS("Assuming LynxPoint PCH\n");
        } else if (IS_SKYLAKE(dev_priv) || IS_KABYLAKE(dev_priv)) {
                ret = PCH_SPT;

Is there a method to identify LPT, LPT_LP or LPT_H from IGD itself, not from PCH ISA bridge ?
Comment 13 XiongZhang 2017-03-29 03:21:00 UTC
Created attachment 130520 [details]
Setting pch_id for Gen7.5+ in virtual environment

we could get dev_priv->pch_id from IGD type in virtual environment. please help try this patch in your CI system and see whether it could fix this bug or not.
Comment 14 XiongZhang 2017-03-29 08:57:33 UTC
CI report the attachment patch could fix this bug. So I sent the patch to intel-gfx mail list.
Comment 15 Jani Saarinen 2017-03-29 10:52:06 UTC
Documenting trybot data: https://intel-gfx-ci.01.org/CI/Trybot_685/
Comment 16 Jani Saarinen 2017-04-03 11:49:36 UTC
Documenting pw data same: https://patchwork.freedesktop.org/series/22064/
Comment 17 Ricardo 2017-05-09 16:41:49 UTC
Adding tag into "Whiteboard" field - ReadyForDev
The bug still active
*Status is correct
*Platform is included
*Feature is included
*Priority and Severity correctly set
*Logs included
Comment 18 Jani Saarinen 2017-06-14 11:13:28 UTC
Patches again in the list: https://patchwork.freedesktop.org/series/25772/
Comment 19 Martin Peres 2017-07-06 11:56:59 UTC
(In reply to Jani Saarinen from comment #18)
> Patches again in the list: https://patchwork.freedesktop.org/series/25772/

The workaround is now applied, but the bug shall remain open until it is actually fixed properly.
Comment 20 Elizabeth 2017-09-13 19:25:28 UTC
Hello everybody, is there any update with this case? Thank you.
Comment 21 Martin Peres 2017-09-14 11:02:39 UTC
(In reply to Elizabeth from comment #20)
> Hello everybody, is there any update with this case? Thank you.

XiongZhang, any updates on that? I guess this will take a long time to fix this properly given that you need to land changes in many components.
Comment 22 Martin Peres 2017-09-14 11:03:05 UTC
(In reply to Elizabeth from comment #20)
> Hello everybody, is there any update with this case? Thank you.

XiongZhang, any updates on that? I guess this will take a long time to fix this properly given that you need to land changes in many components.
Comment 23 XiongZhang 2018-04-11 09:22:00 UTC
LPT pch and LPT_LP pch have different SBI registers, but real pch isn't passed through to guest in GVT-d, then guest i915 couldn't correctly identify LPT and LPT_LP pch. Currently guest i915 guess the pch type according to processor type. This guess may be wrong if ULT/ULX processor with LPT PCH or normal processor with LPT_LP pch.
In order to fix these corner case, real pch id should be passed from host to guest, but this isn't a standard, hacker method is needed in QEMU and i915, and kinds of hacker are rejected by QEMU.
Comment 24 Jani Nikula 2018-05-30 08:08:28 UTC
I suggest filing a new bug about the ISA bridge passthrough or some mechanism for passing the host PCH id to the guest, and then closing this bug. That's the real issue anyway, and this bug is just the symptom of it.
Comment 25 Francesco Balestrieri 2019-03-25 07:50:30 UTC
XiongZhang can you create a bug for the remaining corner case so that we can close this as Jani suggested?
Comment 26 XiongZhang 2019-03-25 08:45:56 UTC
Done it.
https://bugs.freedesktop.org/show_bug.cgi?id=110234

thanks
(In reply to Francesco Balestrieri from comment #25)
> XiongZhang can you create a bug for the remaining corner case so that we can
> close this as Jani suggested?
Comment 27 Francesco Balestrieri 2019-03-25 08:48:12 UTC
Thanks, closing.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.