Bug 109762 - [AMDGPU] flip_done timed out when playing Xonotic
Summary: [AMDGPU] flip_done timed out when playing Xonotic
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-02-24 19:34 UTC by Amadeusz
Modified: 2019-11-19 09:15 UTC

Attachments
dmesg (75.36 KB, text/plain)
2019-02-24 19:34 UTC, Amadeusz

Description Amadeusz 2019-02-24 19:34:30 UTC
Created attachment 143453
dmesg

Hi,

I get frequent GPU hangs when playing Xonotic with the amdgpu video driver.

[ 9330.297589] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:47:crtc-0] flip_done timed out
[ 9340.537609] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:45:plane-5] flip_done timed out
[ 9340.537682] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* amdgpu_dm_commit_planes: acrtc 0, already busy
[ 9340.537762] WARNING: CPU: 1 PID: 3733 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:4860 amdgpu_dm_atomic_commit_tail+0x1349/0x14f0 [amdgpu]

The full dmesg is attached.
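
For anyone comparing logs across the similar reports, here is a minimal Python sketch that pulls the "flip_done timed out" events out of a dmesg dump. The regular expression is only an assumption based on the line format quoted above, and the names FlipTimeout and find_flip_timeouts are made up for illustration; they are not part of any kernel or DRM tool.

```python
import re
from typing import List, NamedTuple


class FlipTimeout(NamedTuple):
    """One 'flip_done timed out' event (hypothetical helper type)."""
    timestamp: float  # seconds since boot, as printed by dmesg
    kind: str         # "CRTC" or "PLANE"
    obj_id: int       # DRM object id, e.g. 47
    name: str         # DRM object name, e.g. "crtc-0"


# Assumed format, taken from the lines quoted in this report:
# [ 9330.297589] [drm:...] *ERROR* [CRTC:47:crtc-0] flip_done timed out
_PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+.*\*ERROR\*\s+"
    r"\[(?P<kind>CRTC|PLANE):(?P<id>\d+):(?P<name>[^\]]+)\]\s+"
    r"flip_done timed out"
)


def find_flip_timeouts(dmesg_text: str) -> List[FlipTimeout]:
    """Return every flip_done timeout found in the given dmesg text."""
    events = []
    for line in dmesg_text.splitlines():
        match = _PATTERN.search(line)
        if match:
            events.append(
                FlipTimeout(
                    timestamp=float(match["ts"]),
                    kind=match["kind"],
                    obj_id=int(match["id"]),
                    name=match["name"],
                )
            )
    return events


# The first three log lines from this report, used as sample input.
SAMPLE = """\
[ 9330.297589] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:47:crtc-0] flip_done timed out
[ 9340.537609] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:45:plane-5] flip_done timed out
[ 9340.537682] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* amdgpu_dm_commit_planes: acrtc 0, already busy
"""

for event in find_flip_timeouts(SAMPLE):
    print(event)
```

In the excerpt above, the CRTC and PLANE timeouts are roughly 10 seconds apart, and the "already busy" message follows right after the second one.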

# uname -r
5.0.0-rc7+

# emerge mesa -pv

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild   R    ] media-libs/mesa-19.0.0_rc4::gentoo  USE="classic dri3 egl gallium gbm gles2 llvm pic vaapi vdpau wayland -d3d9 -debug -gles1 -lm_sensors -opencl -osmesa -pax_kernel (-selinux) -test -unwind -valgrind -vulkan -xa -xvmc" ABI_X86="32 (64) (-x32)" VIDEO_CARDS="i965 intel radeon radeonsi (-freedreno) -i915 (-imx) -nouveau -r100 -r200 -r300 -r600 (-vc4) -virgl (-vivante) -vmware" 0 KiB

Total: 1 package (1 reinstall), Size of downloads: 0 KiB

# cat /etc/X11/xorg.conf.d/video.conf 
#Section "Device"
#       Identifier "Intel Graphics"
#       Driver "modesetting"
#       #Option "GLXVBlank" "off"
#       Option "AccelMethod" "glamor"
#       Option "DRI" "3"
#EndSection

Section "Device"
        Identifier "AMD"
        Driver "amdgpu"
        Option "GLXVBlank" "off"
        Option "AccelMethod" "glamor"
        Option "DRI" "3"
EndSection

# lspci -vvv -s 01:00.0
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Tonga PRO [Radeon R9 285/380] (rev f1) (prog-if 00 [VGA controller])
        Subsystem: Gigabyte Technology Co., Ltd Tonga PRO [Radeon R9 285/380]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 46
        Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Region 2: Memory at d0000000 (64-bit, prefetchable) [size=2M]
        Region 4: I/O ports at e000 [size=256]
        Region 5: Memory at dfd00000 (32-bit, non-prefetchable) [size=256K]
        Expansion ROM at dfd40000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x16 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00558  Data: 0000
        Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [200 v1] Resizable BAR <?>
        Capabilities: [270 v1] Secondary PCI Express <?>
        Capabilities: [2b0 v1] Address Translation Service (ATS)
                ATSCap: Invalidate Queue Depth: 00
                ATSCtl: Enable-, Smallest Translation Unit: 00
        Capabilities: [2c0 v1] Page Request Interface (PRI)
                PRICtl: Enable- Reset-
                PRISta: RF- UPRGI- Stopped+
                Page Request Capacity: 00000020, Page Request Allocation: 00000000
        Capabilities: [2d0 v1] Process Address Space ID (PASID)
                PASIDCap: Exec+ Priv+, Max PASID Width: 10
                PASIDCtl: Enable- Exec- Priv-
        Capabilities: [328 v1] Alternative Routing-ID Interpretation (ARI)
                ARICap: MFVC- ACS-, Next Function: 1
                ARICtl: MFVC- ACS-, Function Group: 0
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu
Comment 1 Amadeusz 2019-02-27 21:50:06 UTC
As I half remembered, this did not happen in the past.

I did some internet searches and there are a few similar bugs on this Bugzilla:
https://bugzilla.freedesktop.org/show_bug.cgi?id=109461
https://bugzilla.freedesktop.org/show_bug.cgi?id=104624
https://bugzilla.freedesktop.org/show_bug.cgi?id=108309

I also found:
https://bbs.archlinux.org/viewtopic.php?id=239670
which allowed me to narrow down when it broke.

I looked at the changes between 4.14 and 4.15 and chose to try reverting one of the commits with "flip" in its commit message. (Getting a bisect running on 4.14 with a too-new gcc is a pain...)

It seems that reverting 320a127437e5d3cbb7fc444f8769eb510d11d3b9 helps with the random freezes for me (although I have tested it for only one day).

However, from what I can see, reverting this commit is just a workaround...

If anyone wants to try to reproduce it, you can install Xonotic from xonotic.org (or your distribution's repositories) and either have some fun playing, or create a match with no time limit against bots, click the left mouse button once after it starts to select the bot view, and leave it running.
Note that I start Xonotic from the command line with vblank_mode=0 prepended, like "vblank_mode=0 xonotic".
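
For unattended reproduction runs, the launch step can be scripted. This is a rough Python sketch that only sets vblank_mode=0 in the environment and starts the game; the function names are invented for illustration, and it assumes the executable is called "xonotic" and is on PATH. Setting up the bot match still has to be done by hand as described above.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def build_repro_env(base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with vblank_mode=0 set.

    Equivalent to prefixing the command with "vblank_mode=0" in a shell.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["vblank_mode"] = "0"
    return env


def launch_xonotic(
    binary: str = "xonotic",  # assumption: executable name on PATH
    extra_args: Optional[List[str]] = None,
) -> subprocess.Popen:
    """Start Xonotic with vblank_mode=0 and return the process handle."""
    command = [binary, *(extra_args or [])]
    return subprocess.Popen(command, env=build_repro_env())


# Usage (not run here, since it needs Xonotic installed):
#   proc = launch_xonotic()
#   proc.wait()
```

Keeping the environment setup in its own function makes it easy to check that vblank_mode is really set before leaving a long run unattended.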
Comment 2 Martin Peres 2019-11-19 09:15:22 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/710.