Bug 101261 - [G45] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:32:pipe A] flip_done timed out
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-31 23:03 UTC by Diego Viola
Modified: 2017-11-10 21:14 UTC

See Also:
i915 platform: G45
i915 features: display/atomic


Attachments
dmesg showing flip_done timed out errors (67.94 KB, text/plain)
2017-05-31 23:04 UTC, Diego Viola
dmesg showing [i915]] *ERROR* CPU pipe A FIFO underrun (40.96 KB, text/plain)
2017-05-31 23:05 UTC, Diego Viola
lspci (1.40 KB, text/plain)
2017-05-31 23:10 UTC, Diego Viola
dmesg (more info about the flip_done timed out trace) (87.35 KB, text/plain)
2017-05-31 23:41 UTC, Diego Viola
dmesg with drm-tip (65.14 KB, text/plain)
2017-06-03 02:34 UTC, Diego Viola
pipe A vblank wait timed out with drm-tip (95.56 KB, text/plain)
2017-06-13 22:26 UTC, Diego Viola
dmesg with drm.debug=0x3f (drm-tip kernel) (5.37 MB, application/x-xz)
2017-06-13 22:46 UTC, Diego Viola
dmesg with drm.debug=0x3e (vsyrjala/linux.git gmch_irq_redo) (1015.19 KB, text/plain)
2017-06-22 20:49 UTC, Diego Viola
[PATCH] drm/i915: Disable MSI for all pre-gen5 (1.16 KB, patch)
2017-06-23 10:51 UTC, Ville Syrjala
[PATCH] drm/i915: Drop the IER tricks from pre-gen5 irq handlers (3.30 KB, patch)
2017-06-23 17:18 UTC, Ville Syrjala

Description Diego Viola 2017-05-31 23:03:55 UTC
I've noticed that while playing the NFSIISE port on my desktop PC, I get errors such as this one in dmesg:

[  544.874379] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun

Previously, my system also hung while playing this same game on this PC, with a different error:

[ 3641.648551] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:29:pipe A] flip_done timed out

This is the game I'm playing:

https://github.com/zaps166/NFSIISE

I'm on Arch Linux (x86-64) and kernel 4.11.3-1-ARCH.
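For anyone who wants to count these errors across a long dmesg, the following is a minimal sketch in Python. It assumes the default dmesg timestamp format shown above (it will not match journalctl-style lines), and the function name `find_flip_timeouts` is made up for illustration.

```python
import re

# Matches lines such as the "flip_done timed out" error quoted in this report.
# Assumes the default "[  seconds.micros]" dmesg timestamp prefix.
FLIP_DONE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"\[drm:drm_atomic_helper_commit_cleanup_done \[drm_kms_helper\]\] "
    r"\*ERROR\* \[CRTC:(?P<crtc>\d+):pipe (?P<pipe>[A-Z])\] flip_done timed out"
)


def find_flip_timeouts(dmesg_text):
    """Return (timestamp, crtc_id, pipe) for every flip_done timeout found."""
    return [
        (float(m.group("ts")), int(m.group("crtc")), m.group("pipe"))
        for m in FLIP_DONE_RE.finditer(dmesg_text)
    ]


# Sample line copied from the description above.
sample = (
    "[ 3641.648551] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] "
    "*ERROR* [CRTC:29:pipe A] flip_done timed out"
)
print(find_flip_timeouts(sample))
```

The FIFO underrun message has a different layout and would need its own pattern.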
Comment 1 Diego Viola 2017-05-31 23:04:55 UTC
Created attachment 131623 [details]
dmesg showing flip_done timed out errors
Comment 2 Diego Viola 2017-05-31 23:05:59 UTC
Created attachment 131624 [details]
dmesg showing [i915]] *ERROR* CPU pipe A FIFO underrun
Comment 3 Diego Viola 2017-05-31 23:10:17 UTC
Created attachment 131625 [details]
lspci
Comment 4 Diego Viola 2017-05-31 23:41:02 UTC
Created attachment 131626 [details]
dmesg (more info about the flip_done timed out trace)
Comment 5 Diego Viola 2017-06-02 20:06:16 UTC
I get the error below after running other games too (Limbo, Distance, etc.)

[ 9982.362655] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
Comment 6 Diego Viola 2017-06-03 02:34:26 UTC
Created attachment 131689 [details]
dmesg with drm-tip

I compiled a kernel from drm-tip and I no longer get this issue:

[ 9982.362655] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun

However, after trying a game and using my system for other things, the system began to stall and the graphics stopped updating. I noticed this in dmesg again:

[  888.582323] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:32:pipe A] flip_done timed out

Please see the attachment for the full dmesg with the drm-tip kernel.
Comment 7 Diego Viola 2017-06-03 02:38:36 UTC
This is the drm-tip commit I'm currently testing with:

commit d919ad0d077110d2d48a4b7503ddc02c3864667d
Comment 8 Maarten Lankhorst 2017-06-08 10:04:24 UTC
I'm glad the FIFO underruns are fixed. g45 had some changes to watermarks that probably fixed that.

I'm less certain about the pipe A vblank wait timeouts. Could you boot with drm.debug=0x3f and reproduce the error? It's a bit spammy, though.
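As a rough guide to what the drm.debug value selects, here is a small Python sketch that decodes the bitmask. The bit names follow the DRM_UT_* debug categories as I understand them for kernels of this era (around v4.12); later kernels define additional bits, so treat the table as an assumption rather than a reference.

```python
# Debug category bits, assumed from the kernel's DRM_UT_* definitions
# around v4.12. Newer kernels add more categories above 0x20.
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
    0x10: "atomic",
    0x20: "vblank",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by `mask`."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


print(decode_drm_debug(0x3F))  # all six categories
print(decode_drm_debug(0x3E))  # everything except "core"
```

Under that assumption, 0x3f turns on every category listed, which is why the log gets so large, while 0x3e drops only the core messages.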
Comment 9 Diego Viola 2017-06-13 04:33:38 UTC
(In reply to Maarten Lankhorst from comment #8)
> I'm glad the FIFO underruns are fixed. g45 had some changes to watermarks
> that probably fixed that.
> 
> I'm less certain about pipe A vblank wait timeouts, could you boot with
> drm.debug=0x3f and reproduce the error? it's a bit spammy though.

I couldn't reproduce the flip_done timed out error with drm.debug=0x3f. Do you have a test suite I could test with?
Comment 10 Diego Viola 2017-06-13 22:26:27 UTC
Created attachment 131937 [details]
pipe A vblank wait timed out with drm-tip

The "pipe A vblank wait timed out" hang is still happening with drm-tip. That said, I'm unable to reproduce it when booting the kernel with drm.debug=0x3f.
Comment 11 Diego Viola 2017-06-13 22:46:15 UTC
Created attachment 131938 [details]
dmesg with drm.debug=0x3f (drm-tip kernel)

OK, I was finally able to reproduce the hang with drm.debug=0x3f.

I was able to reproduce the problem by switching between the game and pavucontrol. I'm using i3 as my window manager.

Please see the attachment.
Comment 12 Diego Viola 2017-06-20 20:37:31 UTC
[ 2113.973548] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:30:pipe A] flip_done timed out

This is still happening with linux 4.12-rc6.
Comment 13 Diego Viola 2017-06-20 20:46:57 UTC
I'm running the games inside a systemd-nspawn container; I'm not sure if this matters at all.
Comment 14 Diego Viola 2017-06-21 03:29:24 UTC
I also get the same problem when running the game on my main system (outside of systemd-nspawn).
Comment 15 Ville Syrjala 2017-06-22 12:35:05 UTC
Can you test with this?
git://github.com/vsyrjala/linux.git gmch_irq_redo
Comment 16 Diego Viola 2017-06-22 19:31:44 UTC
(In reply to Ville Syrjala from comment #15)
> Can you test with this?
> git://github.com/vsyrjala/linux.git gmch_irq_redo

It freezes with this branch too.

Jun 22 16:26:42 myhost kernel: [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:32:pipe A] flip_done timed out
Comment 17 Diego Viola 2017-06-22 20:49:37 UTC
Created attachment 132149 [details]
dmesg with drm.debug=0x3e (vsyrjala/linux.git gmch_irq_redo)
Comment 18 Diego Viola 2017-06-23 05:39:47 UTC
I tried some older kernels in order to see if this ever worked before, and to see if I could bisect the problem.

The kernels I tried:

4.10.17
4.9.33
4.8.17
4.7.10
4.4.73

Those are all broken, which means the bug is present in them too.

I couldn't even get 4.8.17 and 4.7.10 to compile due to this error:

kernel/built-in.o: In function `update_wall_time':
(.text+0x7afd7): undefined reference to `____ilog2_NaN'
make: *** [Makefile:969: vmlinux] Error 1

I'm sure 4.6 and 4.5 would have been the same.

4.4.73 is also broken: the game ran but then hung. I didn't notice any 'flip_done timed out' errors in dmesg; the game window just froze, and after I closed it (Ctrl-C on the process) and restarted the game, the whole system froze and the display turned black. I had to hard reset at this point.

So maybe it never worked and there's no way for me to bisect.
Comment 19 Ville Syrjala 2017-06-23 10:51:15 UTC
Created attachment 132155 [details] [review]
[PATCH] drm/i915: Disable MSI for all pre-gen5

Let's try disabling MSI. I suspect it's just still broken on g4x. It is even documented to be broken on 965gm, but we still try to use it for some reason.
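One way to check whether the patch took effect is to look at how the i915 interrupt is routed in /proc/interrupts. The Python sketch below is only an illustration: the function name is hypothetical, and it assumes the usual layout where an MSI interrupt line contains "PCI-MSI" and the last column lists the driver names.

```python
def i915_interrupt_mode(proc_interrupts_text):
    """Classify the i915 interrupt line as 'MSI' or 'non-MSI'.

    Returns None if no line lists i915 as a handler.
    Assumes the interrupt-controller column contains "PCI-MSI" for MSI.
    """
    for line in proc_interrupts_text.splitlines():
        # Shared IRQs list several drivers separated by ", ".
        names = [field.strip(",") for field in line.split()]
        if "i915" not in names:
            continue
        return "MSI" if "PCI-MSI" in line else "non-MSI"
    return None


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        print(i915_interrupt_mode(f.read()))
```

With the patch applied on a pre-gen5 machine, the expectation would be "non-MSI".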
Comment 20 Ville Syrjala 2017-06-23 17:18:00 UTC
Created attachment 132208 [details] [review]
[PATCH] drm/i915: Drop the IER tricks from pre-gen5 irq handlers

We might not need these IER tricks when we're not using MSI, so it would be nice if you can try this in addition to the previous patch.
Comment 21 Diego Viola 2017-06-23 20:17:27 UTC
(In reply to Ville Syrjala from comment #19)
> Created attachment 132155 [details] [review]
> [PATCH] drm/i915: Disable MSI for all pre-gen5
> 
> Let's try disabling MSI. I suspect it's just still broken on g4x. It is
> even documented to be broken on 965gm but we still try to use it for some
> reason.

By applying this patch to the gmch_irq_redo branch the problem was solved, thanks.
Comment 22 Diego Viola 2017-06-23 20:27:08 UTC
(In reply to Ville Syrjala from comment #20)
> Created attachment 132208 [details] [review]
> [PATCH] drm/i915: Drop the IER tricks from pre-gen5 irq handlers
> 
> We might not need these IER tricks when we're not using MSI, so it would be
> nice if you can try this in addition to the previous patch.

The combination of the gmch_irq_redo branch and your two latest patches also solves the problem.
Comment 23 Diego Viola 2017-06-24 01:28:20 UTC
drm-tip + 0001-drm-i915-Disable-MSI-for-all-pre-gen5.patch is also working perfectly.

I've spent another 2 hours playing (I've basically spent all day playing games), and it didn't freeze anymore.

Thank you.
Comment 24 Ville Syrjala 2017-06-27 13:18:09 UTC
commit e38c2da01f76cca82b59ca612529b81df82a7cc7
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Mon Jun 26 23:30:51 2017 +0300

    drm/i915: Disable MSI for all pre-gen5
Comment 25 Ricardo 2017-06-27 14:05:50 UTC
Closing verified
Comment 26 fin4478 2017-09-11 06:38:19 UTC
If there are any patches for this bug, they are not in kernel 4.13.1 from kernel.org. Disabling MSI in the kernel config does not help.
Dmesg at boot:
[   19.424100] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:30:pipe A] flip_done timed out
[   19.525043] vblank wait timed out on crtc 0
[   19.525096] ------------[ cut here ]------------
[   19.525113] WARNING: CPU: 1 PID: 106 at drivers/gpu/drm/drm_vblank.c:1090 drm_wait_one_vblank+0x187/0x190 [drm]

Dmesg when playing OpenArena:
[  477.152249] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:35:pipe B] flip_done timed out
[  477.253242] vblank wait timed out on crtc 1
[  477.253303] ------------[ cut here ]------------
[  477.253325] WARNING: CPU: 0 PID: 363 at drivers/gpu/drm/drm_vblank.c:1090 drm_wait_one_vblank+0x187/0x190 [drm]

My 10-year-old dual-core pentium64 Intel laptop:
[    0.000000] DMI: FUJITSU SIEMENS ESPRIMO Mobile V5505           /ESPRIMO Mobile V5505           , BIOS R01-A1D    01/21/2009
Comment 27 Diego Viola 2017-09-25 20:56:00 UTC
(In reply to fin4478 from comment #26)
> If there are any patches for this bug, they are not in kernel 4.13.1 from
> kernel.org. Disabling MSI in the kernel config does not help.

I don't think this is true; see:

commit ce3f7163e4ce8fd583dcb36b6ee6b81fd1b419ae
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Mon Jun 26 23:30:51 2017 +0300

    drm/i915: Disable MSI for all pre-gen5

$ git describe --contains ce3f7163e4ce8fd583dcb36b6ee6b81fd1b419ae
v4.13-rc1~45^2~2^2~9

The patch is already included in 4.13 and up.
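To read the `git describe --contains` output: the part before the first `~` or `^` is a tag from which the commit can be reached by walking back through its ancestors, so the commit is already part of that tag. A minimal Python sketch of pulling the tag out (the helper name `base_tag` is made up here):

```python
import re


def base_tag(describe_output):
    """Return the tag portion of `git describe --contains` output.

    Everything after the first '~' or '^' is the ancestry path from the
    tag back to the commit, so it is dropped.
    """
    return re.split(r"[~^]", describe_output.strip(), maxsplit=1)[0]


print(base_tag("v4.13-rc1~45^2~2^2~9"))  # v4.13-rc1
```

Since v4.13-rc1 already contains the commit, every later release in the 4.13 series does as well.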

> Dmesg at boot:
> [   19.424100] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]]
> *ERROR* [CRTC:30:pipe A] flip_done timed out
> [   19.525043] vblank wait timed out on crtc 0
> [   19.525096] ------------[ cut here ]------------
> [   19.525113] WARNING: CPU: 1 PID: 106 at drivers/gpu/drm/drm_vblank.c:1090
> drm_wait_one_vblank+0x187/0x190 [drm]
> 
> Dmesg when playing OpenArena:
> [  477.152249] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]]
> *ERROR* [CRTC:35:pipe B] flip_done timed out
> [  477.253242] vblank wait timed out on crtc 1
> [  477.253303] ------------[ cut here ]------------
> [  477.253325] WARNING: CPU: 0 PID: 363 at drivers/gpu/drm/drm_vblank.c:1090
> drm_wait_one_vblank+0x187/0x190 [drm]
> 
> My 10 year old dual core pentium64 intel laptop:
> [    0.000000] DMI: FUJITSU SIEMENS ESPRIMO Mobile V5505           /ESPRIMO
> Mobile V5505           , BIOS R01-A1D    01/21/2009

I think it's possible that your "flip_done timed out" is being caused by a different issue. Please open a separate bug report.

