Summary: | [SNB bisected] *ERROR* uncleared fifo underrun on pipe A | ||
---|---|---|---|
Product: | DRI | Reporter: | Chris Bainbridge <chris.bainbridge> |
Component: | DRM/Intel | Assignee: | Matt Roper <matthew.d.roper> |
Status: | CLOSED WORKSFORME | QA Contact: | Elio <elio.martinez.monroy> |
Severity: | critical | ||
Priority: | high | CC: | caravena, dorota.czaplejewicz, flyser42, gary.c.wang, intel-gfx-bugs, matthew.d.roper, philippe, spiderx, tomeu, udo |
Version: | DRI git | Keywords: | bisected |
Hardware: | x86-64 (AMD64) | ||
OS: | All | ||
Whiteboard: | ReadyForDev | ||
i915 platform: | SNB | i915 features: | display/watermark |
Attachments: |
Description
Chris Bainbridge
2016-05-24 13:22:56 UTC
I have reported this issue separately and I still see this on Linux v4.7-rc5 on my Sandybridge GPU.

[ dmesg ]
...
[ 18.524752] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[ 18.527260] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
[ 18.527294] [drm:intel_pch_fifo_underrun_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun

$ grep -i sandy /var/log/Xorg.0.log
[ 25.674] (II) intel(0): SNA initialized with Sandybridge (gen6, gt2) backend

- Sedat Dilek -

Do these messages only appear one time when the driver first loads, or do they continue to get generated during regular system operation (e.g., moving mouse, changing modes, etc.)? If you only see them when the driver is loaded, then there's probably something not quite right during our initial watermark sanitization step where we inherit the BIOS configuration and try to clean it up. If they continue to get generated later while the system is in use, that will indicate that we're still doing something wrong at runtime, which is more serious.

Is there any corruption/flickering that goes along with these messages? If you could provide a dmesg log of a boot with debugging turned on (drm.debug=0xf), that might help us debug where the problem is.

The messages appear once at boot. (The similar messages mentioned in bug #93802 appear every time Xorg is quit or switched to Linux terminal.) There is no obvious corruption or flickering.

Created attachment 124779 [details]
dmesg-drm
boot log attached
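The boot-only versus runtime distinction asked about above can be checked mechanically against a saved dmesg. The following is a minimal sketch, not part of the original report: the function names and the 60-second `boot_window_s` threshold are hypothetical choices for illustration, and the regular expression only targets the message format quoted in this bug.

```python
import re

# Matches lines such as:
# [ 18.524752] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
UNDERRUN_RE = re.compile(r"^\[\s*(\d+\.\d+)\].*\*ERROR\*.*fifo underrun", re.IGNORECASE)


def underrun_times(dmesg_text):
    """Return the kernel timestamps (seconds) of all FIFO underrun errors."""
    times = []
    for line in dmesg_text.splitlines():
        match = UNDERRUN_RE.match(line.strip())
        if match:
            times.append(float(match.group(1)))
    return times


def classify(times, boot_window_s=60.0):
    """Classify underruns as 'none', 'boot-only' or 'runtime'.

    boot_window_s is an arbitrary, hypothetical cut-off for "during boot".
    """
    if not times:
        return "none"
    return "boot-only" if max(times) <= boot_window_s else "runtime"


SAMPLE = """\
[ 18.524752] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[ 18.527260] [drm:intel_set_pch_fifo_underrun_reporting [i915]] *ERROR* uncleared pch fifo underrun on pch transcoder A
[ 18.527294] [drm:intel_pch_fifo_underrun_irq_handler [i915]] *ERROR* PCH transcoder A FIFO underrun
[ 25.674] unrelated line
"""

if __name__ == "__main__":
    print(classify(underrun_times(SAMPLE)))
```

To run it on a real log, the output of `dmesg` can be saved to a file and its contents passed to `underrun_times()`.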
Is this issue still valid?

(In reply to Jani Saarinen from comment #5)
> Is this issue still valid?

Yes still appear at boot on 4.9.0

This bug is still present on 4.10-rc2 on my Thinkpad W520 (sandy bridge).

I tried to reproduce on IVB-3770 in 2 scenarios:
- boot with no display manager
- turn on display manager after boot
but saw no underruns in dmesg with drm.debug=0x1f.

Does it happen if X/display manager is not enabled on boot?
Does this issue require special kernel config?

Created attachment 129242 [details]
a kernel configuration which is affected by the bug
(In reply to Dorota Czaplejewicz from comment #8)
> Does it happen if X/display manager is not enabled on boot?
> Does this issue require special kernel config?

It also happens when just my initrd with the normal KMS console is booted.

I don't know if a special kernel configuration is required, but I attached mine just in case.

(In reply to Fabian Henze from comment #10)
> (In reply to Dorota Czaplejewicz from comment #8)
> > Does it happen if X/display manager is not enabled on boot?
> > Does this issue require special kernel config?
>
> It also happens when just my initrd with the normal KMS console is booted.
>
> I don't know if a special kernel configuration is required, but I attached
> mine just in case.

Chris, Fabian, Sedat, would you be able to find out what are the programmed latencies at boot (the ones inherited from the BIOS), and if a tweak similar to the one below helps avoid it?

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e95a2f7509f5219177d6821a0a8754f93892ca56

(In reply to Tomeu Vizoso from comment #11)
> Chris, Fabian, Sedat, would you be able to find out what are the programmed
> latencies at boot (the ones inherited from the BIOS),

If you give me some hints how to do that, I would.

> and if a tweak similar to the one below helps avoid it?
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e95a2f7509f5219177d6821a0a8754f93892ca56

Since I am using a sandy bridge notebook (1080p resolution if that matters), this tweak should already be in place, right?

(In reply to Fabian Henze from comment #12)
> (In reply to Tomeu Vizoso from comment #11)
> > Chris, Fabian, Sedat, would you be able to find out what are the programmed
> > latencies at boot (the ones inherited from the BIOS),
>
> If you give me some hints how to do that, I would.

It should be already printed, at least in recent kernels. intel_print_wm_latency will tell. Could you please attach your dmesg with drm.debug=0x1f?

> > and if a tweak similar to the one below helps avoid it?
> > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e95a2f7509f5219177d6821a0a8754f93892ca56
>
> Since I am using a sandy bridge notebook (1080p resolution if that matters),
> this tweak should already be in place, right?

You are right, could you please find out what's the smaller value that when passed to ilk_increase_wm_latency() will avoid the messages?

A fix has come for another similar bug that could also solve these issues:
https://patchwork.freedesktop.org/series/19953/

Could someone test it and see if it also fixes the issue described in this ticket? Thanks!

Created attachment 129876 [details]
dmesg on sandy bridge with drm.debug=0x1f
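The thread asks for logs with both drm.debug=0xf and drm.debug=0x1f. As a rough illustration of what the extra bit adds, here is a small decoder sketch. The bit-to-category table is written from memory of the DRM debug categories in the kernel's drm headers and should be treated as an assumption to check against the kernel version in use; the function name is hypothetical.

```python
# Assumed mapping of drm.debug bits to debug categories (verify against
# the drm headers of the kernel you are running).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    for mask in (0xF, 0x1F):
        print(hex(mask), decode_drm_debug(mask))
```

Under this assumed table, 0x1f differs from 0xf only by the ATOMIC category.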
(In reply to Tomeu Vizoso from comment #14)
> A fix has come for another similar bug that could also solve these issues:
> https://patchwork.freedesktop.org/series/19953/

Unfortunately it didn't fix this bug, but I attached my dmesg output with drm.debug=0x1f. This was on a 4.10 kernel WITH the mentioned patch applied.

(In reply to Tomeu Vizoso from comment #13)
> You are right, could you please find out what's the smaller value that when
> passed to ilk_increase_wm_latency() will avoid the messages?

I was not sure how many and which values I should try, so I set it to 10 and the message didn't go away. Then I tried removing ilk_increase_wm_latency() altogether, which didn't help either. Anything else I should try?

(In reply to Fabian Henze from comment #17)
> (In reply to Tomeu Vizoso from comment #13)
> > You are right, could you please find out what's the smaller value that when
> > passed to ilk_increase_wm_latency() will avoid the messages?
> I was not sure how many and which values I should try, so I set it to 10 and
> the message didn't go away. Then I tried removing ilk_increase_wm_latency()
> altogether, which didn't help either. Anything else I should try?

Hi, I think 10 would be a bit too little. I would expect the value to be between 12 and 20, but maybe more could be needed.

(In reply to Tomeu Vizoso from comment #18)
> Hi, I think 10 would be a bit too little. I would expect the value to be
> between 12 and 20, but maybe more could be needed.

I tried a few values between 13 and 100 (13, 20, 50 and 100 iirc) and all had the fifo underrun. What now ...?

(In reply to Fabian Henze from comment #19)
> (In reply to Tomeu Vizoso from comment #18)
> > Hi, I think 10 would be a bit too little. I would expect the value to be
> > between 12 and 20, but maybe more could be needed.
>
> I tried a few values between 13 and 100 (13, 20, 50 and 100 iirc) and all
> had the fifo underrun. What now ...?
So, your last log showed this:

[ 1.034165] [drm:gen6_check_mch_setup] Wrong MCH_SSKPD value: 0x16040307 This can cause underruns.

This usually means that the BIOS programmed too low a priority (0x7 instead of 0xc) for memory requests from the display, but that register is locked by the time the kernel starts, so we cannot fix it.

I see your machine currently has fw 1.42 and there's a 1.43 available from Lenovo's support site, but the BIOS changelog doesn't mention such a fix. It may be worth upgrading the BIOS to 1.43 anyway.

The kernel sources mention that when the BIOS has such a bug and cannot be updated to a fixed version, the issue can only be worked around by increasing the latencies.

It may still be worth checking that the watermarks are programmed at the right values. But you report that raising the latencies above 12 didn't help, so we might not be able to do anything about it other than hiding the message.

(In reply to Chris Bainbridge from comment #6)
> (In reply to Jani Saarinen from comment #5)
> > Is this issue still valid?
>
> Yes still appear at boot on 4.9.0

Hi, could you please try again with v4.11-rc1 while booting with ignore_loglevel drm.debug=0x1f and attach the whole dmesg? Thanks!

Fabian - can you please help out Tomeu? See comment 21, and use the newest kernel if possible.

Created attachment 130647 [details]
kernel 4.11-rc4: dmesg on sandy bridge with drm.debug=0x1f
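To make the "0x7 instead of 0xc" reading of the MCH_SSKPD value quoted earlier concrete, here is a minimal sketch of the comparison. The mask 0x3f and expected value 0xc are assumptions based on how the i915 register definitions are usually quoted, not taken from this report, and `check_mch_sskpd` is a hypothetical helper name.

```python
# Assumed constants (check i915_reg.h in your kernel tree before relying on them).
MCH_SSKPD_WM0_MASK = 0x3F
MCH_SSKPD_WM0_VAL = 0xC


def check_mch_sskpd(value):
    """Return (wm0_field, is_expected) for a raw MCH_SSKPD register value."""
    wm0 = value & MCH_SSKPD_WM0_MASK
    return wm0, wm0 == MCH_SSKPD_WM0_VAL


if __name__ == "__main__":
    # Value reported in the dmesg from this bug.
    field, ok = check_mch_sskpd(0x16040307)
    print(hex(field), ok)
```

With the value from the log, the low field evaluates to 0x7, which is consistent with the figure given in the comment above.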
(In reply to Tomeu Vizoso from comment #21)
> Hi, could you please try again with v4.11-rc1 while booting with
> ignore_loglevel drm.debug=0x1f and attach the whole dmesg? Thanks!

Sorry for the long delay. I attached the logfile and will try to respond faster next time ;-)

Can you re-test with the latest development kernel?

git://anongit.freedesktop.org/drm-tip drm-tip

I'm thinking that commit a5509abda48e ("drm/i915: Fix legacy cursor vs. watermarks for ILK-BDW") in particular may be helpful.

Created attachment 130929 [details]
drm-tip: dmesg on sandy bridge with drm.debug=0x1f
still present in drm-tip 1b757084743a.
Adding tag to the "Whiteboard" field: ReadyForDev

The bug is still active.
* Status is correct
* Platform is included
* Feature is included
* Priority and Severity are correctly set
* Logs are included

Reporters, can you verify again whether the latest drm-tip helps?

Reporters, ping. Closing soon if no response.

Is this still valid on latest?

Created attachment 133471 [details]
dmesg on ivy bridge with drm.debug=0x1f drm-intel-fixes-2017-08-09-1-801-gd063a48456d2
Bug still exists in latest drm nightly (drm-intel-fixes-2017-08-09-1-801-gd063a48456d2) on IVB Macbook.
It was reported that some fifo underrun problems can be caused by enabling CONFIG_INTEL_IOMMU_DEFAULT_ON=y (see https://bugs.archlinux.org/task/55629#comment161154).
I notice that Fabian's configuration has this flag set...
Could that really be the problem?

(In reply to François Guerraz from comment #31)
> It was reported that some fifo underrun problems can be caused by enabling
> CONFIG_INTEL_IOMMU_DEFAULT_ON=y (see
> https://bugs.archlinux.org/task/55629#comment161154).
> I notice that Fabian's configuration has this flag set...
> Could that really be the problem?

Setting intel_iommu=igfx_off indeed fixed it.

*** Bug 100219 has been marked as a duplicate of this bug. ***

*** Bug 105060 has been marked as a duplicate of this bug. ***

First of all, sorry about the spam.

This is a mass update for our bugs. Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.

If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug so that it can be closed. If you haven't tested with our latest pre-upstream tree (drm-tip), could you also do that to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.

Closing; please re-open if it still occurs. |
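For anyone landing on this report with the same symptom, the IOMMU observation in the comments can be checked with a small script. This is a simplified sketch under stated assumptions: it only looks for the CONFIG_INTEL_IOMMU_DEFAULT_ON=y line and the intel_iommu= boot parameter mentioned in this thread, it does not model every rule the kernel applies, and the function name is hypothetical.

```python
def igfx_iommu_likely_enabled(kernel_config, cmdline):
    """Guess whether the Intel IOMMU is active for the integrated GPU.

    kernel_config: text of the kernel .config
    cmdline: kernel command line string (e.g. contents of /proc/cmdline)
    """
    config_lines = {line.strip() for line in kernel_config.splitlines()}
    enabled = "CONFIG_INTEL_IOMMU_DEFAULT_ON=y" in config_lines

    options = []
    for token in cmdline.split():
        if token.startswith("intel_iommu="):
            options.extend(token[len("intel_iommu="):].split(","))

    for option in options:
        if option == "on":
            enabled = True
        elif option == "off":
            enabled = False

    # igfx_off keeps the IOMMU in general but excludes the integrated GPU.
    if "igfx_off" in options:
        return False
    return enabled


if __name__ == "__main__":
    config = "CONFIG_INTEL_IOMMU=y\nCONFIG_INTEL_IOMMU_DEFAULT_ON=y\n"
    print(igfx_iommu_likely_enabled(config, "root=/dev/sda1 quiet"))
    print(igfx_iommu_likely_enabled(config, "root=/dev/sda1 intel_iommu=igfx_off"))
```

On a running system the command line can be read from /proc/cmdline; where the kernel configuration is stored depends on the distribution.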