Bug 103844 - [bsw] "*ERROR* cpu pipe A|B|C fifo underrun" - can not recovery
Summary: [bsw] "*ERROR* cpu pipe A|B|C fifo underrun" - can not recovery
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-22 07:53 UTC by kait wang
Modified: 2018-04-20 15:12 UTC
CC List: 3 users

See Also:
i915 platform: BSW/CHT
i915 features: display/watermark


Attachments
demsg info (42.94 KB, text/plain)
2017-11-22 08:30 UTC, kait wang

Description kait wang 2017-11-22 07:53:30 UTC
Our device, which uses a [Braswell] N3160 to decode and display video, occasionally reports "pipe underrun" and cannot recover from it. The device has two display modes: one drives three HDMI outputs, the other drives HDMI and VGA from the same source. The pipe-to-connector mapping is pipe A - DP0, pipe B - HDMI_A3, pipe C - HDMI_A1. When the bug occurs, the HDMI (or VGA) output of the affected pipe is incorrect. The conditions for reproducing the bug are as follows:
1) Kernel version: linux-4.2.x
2) DRM version: libdrm-2.4.64
3) Application: the layout of the HDMI (or VGA) output changes every 10 s, and each layout has several "windows", which may overlap, for displaying decoded pictures. So we also run decoding workloads on Braswell.
4) The bug now always occurs after about 2 days of running our workload.
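Since the failure only shows up after about 2 days, it may help to count underrun messages per pipe in a captured kernel log. The following is a minimal sketch, not part of the original report: the function name `count_underruns` is hypothetical, and the pattern is based only on the message text quoted in the bug title, matched case-insensitively because the exact capitalisation in the log may differ.

```python
import re
from collections import Counter

# Pattern derived from the message quoted in the bug title,
# e.g. "*ERROR* cpu pipe A fifo underrun". Case-insensitive on purpose.
UNDERRUN_RE = re.compile(
    r"\*ERROR\*\s+cpu pipe ([ABC]) fifo underrun", re.IGNORECASE
)


def count_underruns(log_text):
    """Return a Counter mapping pipe letter ('A', 'B', 'C') to underrun count."""
    counts = Counter()
    for line in log_text.splitlines():
        match = UNDERRUN_RE.search(line)
        if match:
            counts[match.group(1).upper()] += 1
    return counts
```

The log text could come from a saved `dmesg` capture such as the attachment on this bug; the function only counts lines and does not interpret timestamps.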
Comment 1 kait wang 2017-11-22 08:30:01 UTC
Created attachment 135661
demsg info
Comment 2 Elizabeth 2017-11-22 16:23:16 UTC
Hello Kait, is there any reason to use 4.2 specifically? Is it possible for you to try the latest stable kernel from https://www.kernel.org?
Comment 3 ycui6 2017-11-23 09:30:13 UTC
Hi Elizabeth,
This is Yao, a PAE on the Intel IOTG PRC ICE team.
The issue Kai reported comes from DAHUA, a big customer in the worldwide top 10 of the DSS domain. I have been assisting Kai with this unrecoverable underrun issue for more than a week. It is truly urgent for the customer. Could we raise the priority of this ticket?

BTW:
1. Kernel update:
1-1 The project Kai supports has passed production on 4.2.x, which was the suggested kernel for Braswell at that time and had https://www.kernel.org long-term support. It is not easy to update until a newer version has been proven workable. BTW, what version do you suggest for BSW? 4.4 or 4.09/4.10?...

2. About the root cause:

The issue occurs under the following conditions. First, three independent displays refresh their monitors at the highest timing, almost always 1080p at 60 fps. Second, GFX media runs full-time, decoding 16 channels of 1080p, or switching to 3x4K, or 64xD1..., and VPP combines them into the 3 display channels; the flipped windows may overlap each other. Third, pipes A and C may show the same video source, with the display refreshed using set_plane driven only by pipe A's VBL (a potential sync problem? But libdrm should handle and protect against this).
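As a rough sanity check on the memory-bandwidth hypothesis, the scanout load of the three displays can be estimated with simple arithmetic. The sketch below rests on assumptions that are not stated in this thread: one full-screen plane per display, 4 bytes per pixel (a 32-bit format such as XRGB8888), and active pixels only (blanking ignored). It does not account for decode, VPP, or overlapping-plane traffic, which come on top.

```python
def scanout_bytes_per_second(width, height, refresh_hz, bytes_per_pixel=4):
    """Bytes per second fetched to scan out one full-screen plane.

    Counts active pixels only; bytes_per_pixel=4 is an assumption
    (32-bit pixel format), not a value taken from the bug report.
    """
    return width * height * refresh_hz * bytes_per_pixel


# Three independent 1080p60 displays, one full-screen plane each (assumption).
per_display = scanout_bytes_per_second(1920, 1080, 60)
total = 3 * per_display

print(f"per display:    {per_display / 1e6:.1f} MB/s")  # 497.7 MB/s
print(f"three displays: {total / 1e9:.2f} GB/s")        # 1.49 GB/s
```

Halving the refresh rate in this model halves the figure, which is the effect the "slow down the FPS" workaround would rely on.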

On the other hand, we ran a small test and found that if an FB is being displayed on one pipe, and we assume it has finished on the other, then hard-switching the FB through the second pipe's registers can also cause an underrun.

In another test, with 3 independent displays running normally, we modified FIFODISARB to some level, and an underrun could also occur.


So here are our questions; we look forward to hearing your opinions.
1. What do you think is the true root cause of the underrun?
  A. Memory bandwidth?
     Then we would try to work around it by lowering the FPS or cutting some GFX tasks.
  B. FB switching by setplane, with insufficient kernel sync protection?
     Then we would try upgrading the kernel to see whether it is already fixed there.
  C. The HW FIFO, set by the FIFODISARB register, not tuned to a suitable size?
     That would still need tuning.

We really need your help and comments, and look forward to your reply. Thanks!
Comment 4 Elizabeth 2017-11-23 17:43:02 UTC
Thanks for the information, Ycui6. Please allow us some time to look into the issue.
Setting the ticket to High since the bug is being looked into.
Comment 5 Clinton Taylor 2017-12-05 18:06:38 UTC
Many pipe underrun issues are caused by plane watermark programming in the i915 driver. Kernel 4.2 lacks many of the i915 CHV watermark fixes found in later kernels.

I suggest backporting the CHV watermark fixes. Kernel 4.6 appears to have most of the needed changes.
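A quick way to tell whether a running kernel is older than the 4.6 mentioned above is to compare its release string numerically. This is only a sketch: the helper name `kernel_at_least` is hypothetical, and a version check alone cannot tell whether a vendor kernel carries backported watermark fixes.

```python
import re


def kernel_at_least(release, minimum=(4, 6)):
    """Return True if a release string such as '4.2.8-generic' is >= minimum.

    Only the leading 'major.minor' is compared. The default of (4, 6)
    reflects the suggestion in this thread, not a verified threshold.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    return version >= minimum
```

On a live system the release string can be obtained with `platform.release()` from the Python standard library. Comparing tuples of integers avoids the string-comparison trap where "4.10" would sort before "4.6".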
Comment 6 kait wang 2017-12-11 03:17:35 UTC
Hi Elizabeth,
We have now made some progress on this problem.
First, we increased the Pre-allocated/total GFX memory settings in the BIOS to their maximum, and the problem disappeared.
Second, hrtimer is enabled in our kernel config: CONFIG_HIGH_RES_TIMERS and CONFIG_SCHED_HRTICK are both set to yes. However, once we change the config to disable hrtimer, the underrun problem disappears.
So we think the underrun problem is related to GFX memory access, which could be affected by hrtimer. We are now confused about the relationship between hrtimer and the GPU.
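To compare the two kernel builds reliably, the state of the two options named above can be read from each build's config file. The following is a minimal sketch with a hypothetical helper name, `read_config_options`; it assumes the usual kernel config text format, where enabled options appear as `CONFIG_FOO=y` and disabled ones as `# CONFIG_FOO is not set`.

```python
def read_config_options(config_text, names):
    """Return {name: value} for the requested kernel config options.

    Options that are commented out ('# CONFIG_FOO is not set') or absent
    are reported as 'n'.
    """
    values = {name: "n" for name in names}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments, including "is not set" lines, stay 'n'
        key, sep, value = line.partition("=")
        if sep and key in values:
            values[key] = value
    return values


HRTIMER_OPTIONS = ("CONFIG_HIGH_RES_TIMERS", "CONFIG_SCHED_HRTICK")
```

The config text is typically available as `/boot/config-<release>` or, when the kernel was built with `CONFIG_IKCONFIG_PROC`, as the compressed `/proc/config.gz`.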
We really need your help and comments, and look forward to your reply again. Thanks!
Comment 7 Elizabeth 2017-12-18 20:17:01 UTC
Hello again,
I asked around a little, and to follow up I would like to know whether you have had the opportunity to test with the backported watermark fixes or with kernel 4.6.
Comment 8 Jani Nikula 2018-01-22 12:11:21 UTC
Please use a newer kernel or backport the fixes as suggested in comment #5. v4.2 is old and unsupported (see kernel.org).

If you need to bump priority, please escalate using your usual internal channels. Thanks.
Comment 9 Jani Saarinen 2018-03-29 07:10:38 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still present there? If you cannot reproduce it there, please comment on the bug.
Comment 10 Jani Saarinen 2018-04-20 15:12:34 UTC
Closing; please re-open if this still occurs.

