Bug 94126

Summary: [SKL/KBL] Vblank counter issues when DMC takes goes to DC5/DC6 impacting frontbuffer tracking and PSR with X modesetting(0) driver.
Product: DRI Reporter: Rodrigo Vivi <rodrigo.vivi>
Component: DRM/IntelAssignee: Dhinakaran Pandiyan <dhinakaran.pandiyan>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major    
Priority: high CC: dhinakaran.pandiyan, intel-gfx-bugs, rodrigo.vivi, tjaalton
Version: DRI git   
Hardware: Other   
OS: Linux (All)   
Whiteboard: ReadyForDev
i915 platform: KBL, SKL i915 features: display/PSR, power/Other

Description Rodrigo Vivi 2016-02-12 20:05:41 UTC
Note: PSR on SKL and KBL will be disabled by default while this case is not properly handled because with X modesetting driver we can have a big slowness between a type and the PSR getting Exit and screen consequently updated.

Issue: Hardware frame counters are reset to zero when DC5/DC6 are entered, but cannot be restored because they are read-only registers. With this drm vblank handlers get confused on the number and can take forever to return a wait for vblank ioctl what can slow down some user space drawings. And much worst when PSR is enabled and depend on frontbuffer tracking to exit and get screen updated.

The working log on HSW:
[ 12.026748] [drm:drm_ioctl] pid=1154, dev=0xe200, auth=1, I915_GEM_SET_DOMAIN 
[ 12.026757] [drm:drm_ioctl] pid=1154, dev=0xe200, auth=1, I915_GEM_MADVISE 
[ 12.026769] [drm:drm_ioctl] pid=1154, dev=0xe200, auth=1, I915_GEM_MADVISE 
[ 12.026776] [drm:drm_ioctl] pid=1154, dev=0xe200, auth=1, I915_GEM_BUSY 
[ 12.026781] [drm:drm_ioctl] pid=1154, dev=0xe200, auth=1, I915_GEM_MADVISE 
[ 12.026787] [drm:drm_ioctl] pid=1154, dev=0xe200, auth=1, I915_GEM_SET_DOMAIN 
[ 12.026928] [drm:drm_ioctl] pid=1048, dev=0xe200, auth=1, DRM_IOCTL_WAIT_VBLANK 
[ 12.026936] [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0 
[ 12.026946] [drm:drm_wait_vblank] waiting on vblank count 529, crtc 0 
[ 12.026949] [drm:drm_wait_vblank] returning 529 to client 
[ 12.026952] [drm:vblank_disable_fn] disabling vblank on crtc 0 
[ 12.026964] [drm:drm_ioctl] pid=1048, dev=0xe200, auth=1, DRM_IOCTL_WAIT_VBLANK 
[ 12.026968] [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0 
[ 12.026972] [drm:drm_wait_vblank] waiting on vblank count 529, crtc 0 
[ 12.026975] [drm:drm_wait_vblank] returning 529 to client 
[ 12.026978] [drm:vblank_disable_fn] disabling vblank on crtc 0
[ 12.026984] [drm:drm_ioctl] pid=1048, dev=0xe200, auth=1, DRM_IOCTL_WAIT_VBLANK 
[ 12.026988] [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0 
[ 12.026993] [drm:drm_queue_vblank_event] event on vblank count 530, current 529, crtc 0
[ 12.044837] [drm:drm_handle_vblank_events] vblank event on 530, current 531
[ 12.044841] [drm:vblank_disable_fn] disabling vblank on crtc 0 
[ 12.044964] [drm:drm_ioctl] pid=1048, dev=0xe200, auth=1, I915_GEM_SW_FINISH 
[ 12.044971] [drm:drm_ioctl] pid=1048, dev=0xe200, auth=1, I915_GEM_EXECBUFFER2
[ 12.045053] [drm:i915_gem_execbuffer_move_to_active [i915]] ERROR I915-DEBUG: i915_gem_execbuffer_move_to_active 1118 
[ 12.045080] [drm:intel_fb_obj_invalidate [i915]] ERROR I915-DEBUG: intel_fb_obj_invalidate 90 fb_bits 1 origin 2 
– PSR EXIT –


The Bad case on on SKL:
[ 12.079666] [drm:drm_ioctl] pid=1101, dev=0xe200, auth=1, I915_GEM_SET_DOMAIN 
[ 12.079700] [drm:drm_ioctl] pid=1101, dev=0xe200, auth=1, I915_GEM_MADVISE 
[ 12.079720] [drm:drm_ioctl] pid=1101, dev=0xe200, auth=1, I915_GEM_BUSY 
[ 12.079732] [drm:drm_ioctl] pid=1101, dev=0xe200, auth=1, I915_GEM_MADVISE 
[ 12.079746] [drm:drm_ioctl] pid=1101, dev=0xe200, auth=1, I915_GEM_SET_DOMAIN 
[ 12.080020] [drm:drm_ioctl] pid=991, dev=0xe200, auth=1, DRM_IOCTL_WAIT_VBLANK 
[ 12.080051] [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0 
[ 12.080797] [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0: Noisy timestamp 69 us > 20 us [3 reps]. 
[ 12.080833] [drm:drm_wait_vblank] waiting on vblank count 36, crtc 0 
[ 12.080844] [drm:drm_wait_vblank] returning 36 to client 
[ 12.080931] [drm:vblank_disable_fn] disabling vblank on crtc 0 
[ 12.080991] [drm:drm_ioctl] pid=991, dev=0xe200, auth=1, DRM_IOCTL_WAIT_VBLANK 
[ 12.081011] [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0 
[ 12.081025] [drm:drm_wait_vblank] waiting on vblank count 36, crtc 0 
[ 12.081036] [drm:drm_wait_vblank] returning 36 to client 
[ 12.081046] [drm:vblank_disable_fn] disabling vblank on crtc 0
[ 12.081068] [drm:drm_ioctl] pid=991, dev=0xe200, auth=1, DRM_IOCTL_WAIT_VBLANK 
[ 12.081080] [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0 
[ 12.081100] [drm:drm_queue_vblank_event] event on vblank count 541, current 36, crtc 0
[ 20.497782] [drm:drm_handle_vblank_events] vblank event on 541, current 541
[ 20.497796] [drm:vblank_disable_fn] disabling vblank on crtc 0 
[ 20.498061] [drm:drm_ioctl] pid=991, dev=0xe200, auth=1, I915_GEM_SW_FINISH 
[ 20.498079] [drm:drm_ioctl] pid=991, dev=0xe200, auth=1, I915_GEM_EXECBUFFER2
[ 20.498296] [drm:i915_gem_execbuffer_move_to_active [i915]] ERROR I915-DEBUG: i915_gem_execbuffer_move_to_active 1118 
[ 20.498390] [drm:intel_fb_obj_invalidate [i915]] ERROR I915-DEBUG: intel_fb_obj_invalidate 90 fb_bits 1 origin 2 
– PSR EXIT –

So the slowness was caused by the vblank couter waiting to go from 36 to 541!
Comment 1 Timo Aaltonen 2016-03-15 14:32:52 UTC
Any news on this? We'd like to migrate gen9+ to use modesetting instead..
Comment 2 yann 2016-04-29 12:13:38 UTC
Increasing priority due to current platform experience impact
Comment 3 Timo Aaltonen 2016-07-06 17:28:03 UTC
this seems to be working fine for me now, tested on KBL first with 4.6 (bad) and then 4.7 (good)
Comment 4 Timo Aaltonen 2016-09-28 06:59:12 UTC
and at least SKL broken again with PSR enabled on 4.8
Comment 5 Jari Tahvanainen 2017-03-13 12:16:49 UTC
Timo, Rodrigo - any new news on this?
Comment 6 Timo Aaltonen 2017-03-13 14:39:11 UTC
Tested 4.10 on SKL & KBL, both still busted.
Comment 7 Jim Bride 2017-03-14 18:30:48 UTC
I want to do a run with vblank debug messages turned on to ensure that my understanding of what's going on here is correct, but from looking at the code I think we should be able to handle this case (where we lose the HW vblank count due to a power management event) similarly to what would happen if the counter wrapped around.  We'll likely want to modify drm_update_vblank_count() to ensure that DRM's software counter gets adjusted
properly when we lose our HW counter.
Comment 8 Ricardo 2017-05-09 17:40:41 UTC
Adding tag into "Whiteboard" field - ReadyForDev
The bug still active
*Status is correct
*Platform is included
*Feature is included
*Priority and Severity correctly set
Comment 9 Elizabeth 2017-07-27 20:43:40 UTC
(In reply to Jim Bride from comment #7)
> I want to do a run with vblank debug messages turned on to ensure that my
> understanding of what's going on here is correct, but from looking at the
> code I think we should be able to handle this case (where we lose the HW
> vblank count due to a power management event) similarly to what would happen
> if the counter wrapped around.  We'll likely want to modify
> drm_update_vblank_count() to ensure that DRM's software counter gets adjusted
> properly when we lose our HW counter.
Good afternoon,
Is there any update on this case? 
Thank you.
Comment 10 Jim Bride 2017-07-31 16:39:36 UTC
I'm in the process of writing an IGT test to verify a set of patches that Rodrigo wrote last year that should address this issue.  During patch reviews an IGT test was requested for this case.
Comment 11 Elizabeth 2017-10-23 19:22:41 UTC
(In reply to Jim Bride from comment #10)
> I'm in the process of writing an IGT test to verify a set of patches that
> Rodrigo wrote last year that should address this issue.  During patch
> reviews an IGT test was requested for this case.
Then shouldn't we move this case to IGT component? Thank you.
Comment 12 Jani Saarinen 2018-03-29 07:10:36 UTC
First of all. Sorry about spam.
This is mass update for our bugs. 

Sorry if you feel this annoying but with this trying to understand if bug still valid or not.
If bug investigation still in progress, please ignore this and I apologize!

If you think this is not anymore valid, please comment to the bug that can be closed.
If you haven't tested with our latest pre-upstream tree(drm-tip), can you do that also to see if issue is valid there still and if you cannot see issue there, please comment to the bug.
Comment 13 Jani Saarinen 2018-04-22 15:50:01 UTC
Rodrigo, DK, is this still valid?
Comment 14 Dhinakaran Pandiyan 2018-04-23 17:29:23 UTC
This was fixed with 

2e8bf22 drm/i915: Estimate and update missed vblanks.
d0bb96b drm/vblank: Restoring vblank counts after device PM events.
68036b0 drm/vblank: Do not update vblank count if interrupts are already disabled.
f4c0468 drm/atomic: Handle 64-bit return from drm_crtc_vblank_count()
3abe241 drm/tegra: Handle 64-bit return from drm_crtc_vblank_count()
9038aa4 drm/radeon: Handle 64-bit return from drm_crtc_vblank_count()
23effc1 drm/amdgpu: Handle 64-bit return from drm_crtc_vblank_count()
1b29b7c drm/i915: Handle 64-bit return from drm_crtc_vblank_count()
734cbbf drm/i915/vblank: Make the vblank counter u64 -> u32 typecast explicit
3b765c0 drm/vblank: Data type fixes for 64-bit vblank sequences.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.