Bug 102358 - WarThunder freezes at start, with activated vsync (vblank_mode=2)
Summary: WarThunder freezes at start, with activated vsync (vblank_mode=2)
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
Reported: 2017-08-22 11:40 UTC by haro41
Modified: 2017-11-14 14:58 UTC


Attachments
gdb all thread backtrace (69.71 KB, text/plain)
2017-08-23 17:51 UTC, haro41
Patch to see if there might be a race causing this (4.34 KB, patch)
2017-08-25 11:23 UTC, Thomas Hellström
Replacement patch to see if there is a race causing this. (4.00 KB, patch)
2017-08-25 14:36 UTC, Thomas Hellström
debug log: concurrent waiting in xcb_wait_for_special_event() (19.05 KB, text/plain)
2017-09-17 18:27 UTC, haro41
Patch to protect the loader_dri3_drawable struct (7.95 KB, patch)
2017-09-19 18:00 UTC, Thomas Hellström
protection in action, longer debug log (811.55 KB, application/x-xz)
2017-09-20 18:23 UTC, haro41

Description haro41 2017-08-22 11:40:36 UTC
With the latest Mesa git (17.3-dev), WarThunder freezes when vsync is activated.

The main problem:
the system consumes significantly more power (+90 W in my case) with vsync deactivated.

Switching back to Mesa 17.2-rc5 or disabling vsync (vblank_mode=0) are the workarounds at the moment.

Here are my system specs:
(glxinfo | grep OpenGL)

OpenGL vendor string: X.Org
OpenGL renderer string: AMD Radeon (TM) R9 380 Series (TONGA / DRM 3.19.0 / 4.13.0-rc5+, LLVM 6.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.3.0-devel (git-46a8c4ef81)
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 17.3.0-devel (git-46a8c4ef81)
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 17.3.0-devel (git-46a8c4ef81)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
OpenGL ES profile extensions:
Comment 1 Michel Dänzer 2017-08-23 02:04:04 UTC
Can you bisect which Mesa Git commit introduced the issue?
Comment 2 haro41 2017-08-23 15:00:02 UTC
Yes, i did it via 'git bisect'.

Here is the first related commit:

d5ba75f8881f0869dc16f71f7395514c0a35b6e2 is the first bad commit
commit d5ba75f8881f0869dc16f71f7395514c0a35b6e2
Author: Thomas Hellstrom <thellstrom@vmware.com>
Date:   Tue Jun 20 19:24:34 2017 +0200

    st/dri2 Plumb the flush_swapbuffer functionality through to dri3
    
    Implement the state tracker manager drawable interface flush_swapbuffer
    method by plumbing it through to dri3 if available.
    
    Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
    Reviewed-by: Marek Olšák <marek.olsak@amd.com>
    Reviewed-by: Brian Paul <brianp@vmware.com>
    Reviewed-by: Sinclair Yeh <syeh@vmware.com>

:040000 040000 8df730d2ac95b42435c96043da0eb6fba5f6861c 4179b3bb9a075169627eb00de5780bbbe8abea02 M	src


I hope it makes sense and can help you.
Comment 3 Michel Dänzer 2017-08-23 15:32:34 UTC
Thomas, any ideas?

(In reply to haro41 from comment #2)
> Yes, i did it via 'git bisect'.

Thanks. Any chance you can get a backtrace[0] of the hanging process?

[0] Ideally of all threads, something like "thread apply all bt full" in gdb.
Comment 4 haro41 2017-08-23 17:51:02 UTC
Created attachment 133719
gdb all thread backtrace
Comment 5 Thomas Hellström 2017-08-23 18:02:31 UTC
This looks odd. 

That commit actually only adds a wait for all swaps to be scheduled at glFinish(), so it shouldn't really be causing any grief unless the server somehow forgets to send the right events or the dri3 wait_for_sbc is broken...
Comment 6 haro41 2017-08-23 19:18:48 UTC
BTW:

Setting the environment variable LIBGL_DRI3_DISABLE=1 (to switch back to DRI2)
fixes the freeze too.
Comment 7 Thomas Hellström 2017-08-25 11:23:10 UTC
Created attachment 133771
Patch to see if there might be a race causing this

@haro41: Could you test the attached dri3_mutex.diff and see if there is a change in behaviour?
Comment 8 haro41 2017-08-25 12:58:55 UTC
@Thomas,

I got two rejects when trying to apply the patch.

Let me sync to your base version first to avoid additional diffs.
Where/when did you branch exactly?
Comment 9 Thomas Hellström 2017-08-25 14:36:26 UTC
Created attachment 133776
Replacement patch to see if there is a race causing this.
Comment 10 Thomas Hellström 2017-08-25 14:36:59 UTC
(In reply to haro41 from comment #8)
> @Thomas,
> 
> I got two rejects when trying to apply the patch.
> 
> Let me sync to your base version first to avoid additional diffs.
> Where/when did you branch exactly?

My mistake. Added a new patch based on 0cc4c7e3.
Comment 11 haro41 2017-08-25 16:18:09 UTC
I applied your patch successfully, but it still freezes, maybe a bit later on average now.

The behavior changed a bit:

before patch:

vblank_mode=2 (default) -> always freezes within 0..2 minutes of runtime,
                           frame rate fixed/clamped at 50 (as expected)
vblank_mode=0           -> no freezes at all, dynamic, high frame rates
LIBGL_DRI3_DISABLE=1    -> no freezes at all, frame rate fixed at 50


after patch:

vblank_mode=2 (default) -> always freezes within 0..2 minutes of runtime,
                           frame rate fixed/clamped at 100 (!!)
vblank_mode=0           -> no freezes at all, dynamic, high frame rates
LIBGL_DRI3_DISABLE=1    -> no freezes at all, frame rate fixed at 50


To be honest, I am not familiar enough with DRM internals to understand exactly what happens here, but it looks like something is broken with respect to DRI3 usage.

I suspect I could be the only one with these freezes, so to make sure I am not wasting your time:
can you give me a hint where I should look first to rule out something specific to my system/setup?
Comment 12 Thomas Hellström 2017-08-25 17:01:12 UTC
(In reply to haro41 from comment #11)
> I applied your patch successfully, but it still freezes, maybe a bit later
> on average now.
> 
> The behavior changed a bit:
> 
> before patch:
> 
> vblank_mode=2 (default) -> always freezes within 0..2 minutes of runtime,
>                            frame rate fixed/clamped at 50 (as expected)
> vblank_mode=0           -> no freezes at all, dynamic, high frame rates
> LIBGL_DRI3_DISABLE=1    -> no freezes at all, frame rate fixed at 50
> 
> 
> after patch:
> 
> vblank_mode=2 (default) -> always freezes within 0..2 minutes of runtime,
>                            frame rate fixed/clamped at 100 (!!)
> vblank_mode=0           -> no freezes at all, dynamic, high frame rates
> LIBGL_DRI3_DISABLE=1    -> no freezes at all, frame rate fixed at 50
> 
> 
> To be honest, I am not familiar enough with DRM internals to understand
> exactly what happens here, but it looks like something is broken with
> respect to DRI3 usage.
> 
> I suspect I could be the only one with these freezes, so to make sure I am
> not wasting your time:
> can you give me a hint where I should look first to rule out something
> specific to my system/setup?


That's really weird :).

Actually I don't think anything's wrong with your setup, but rather that there's a multithreading bug in dri3 or the app. There's no concurrency protection at all in the dri3 client and I'm not sure that's correct. I think you're the only one seeing this, possibly because you're the first to try it with a heavily multithreaded application.

Anyway, I'm OK with commenting out the glFinish() wait for swapbuffers until someone has the possibility to debug this thoroughly. Unfortunately WarThunder doesn't run on vmware's svga driver (yet) due to bugs...

It would also be good to try to rule out server side radeon dri3 problems. Perhaps by running it on nouveau or intel...
Comment 13 haro41 2017-08-25 17:35:25 UTC
OK, that makes sense to me, thank you :)
Comment 14 Michel Dänzer 2017-08-26 05:09:05 UTC
(In reply to Thomas Hellström from comment #12)
> It would also be good to try to rule out server side radeon dri3 problems.
> Perhaps by running it on nouveau or intel...

Or simply the modesetting Xorg driver. A server-side issue could be in the xserver Present code used by all drivers though.
Comment 15 haro41 2017-08-26 16:03:08 UTC
I found this related and interesting blog post:

https://keithp.com/blogs/DRM-lease-4/

It seems there is work in progress on DRM synchronisation that relates to this very bug.
Comment 16 Michel Dänzer 2017-08-28 01:37:36 UTC
DRM leases have nothing to do with this issue.

Have you had a chance to test if this also happens with the Xorg modesetting driver?
Comment 17 haro41 2017-08-28 12:08:42 UTC
@Michel,

I did just now, but WarThunder's freeze behavior didn't really change.


xorg.conf:
Section "Device"
    Identifier "AMD"
    Driver "modesetting"
EndSection


DRI3 is used by default here too (X.Org X Server 1.19.3).


BTW:
I have also tested with my older Pitcairn card (HD 7870), trying both the amdgpu and the radeon kernel driver. The behavior is the same as with Tonga in both cases.
Comment 18 Thomas Hellström 2017-09-07 10:17:23 UTC
FWIW, I got it running under dri3/vsync with the svga driver with no apparent issue.

It also runs fine with modesetting/svga although there is no true vsync since the kernel module flips pages instantly.
Comment 19 Thomas Hellström 2017-09-07 14:34:35 UTC
What happens if you run in windowed mode + vsync?
Comment 20 haro41 2017-09-08 14:53:20 UTC
@Thomas,

I get freezes in windowed mode with vsync activated too (tried with the latest git).
Comment 21 haro41 2017-09-17 17:43:07 UTC
It looks like the reason for the freeze is concurrent waiting in xcb_wait_for_special_event().

While the main thread is waiting for Present-related events, another thread consumes these events (because it was the first one to enter the wait), and the main thread waits forever (freeze).

I will attach the debug log for some frames before the freeze.



@Thomas, 
if my frame rate is lower (FPS below the monitor refresh rate, because of too much debug output), I don't get any freezes. Could this be the reason why you can't reproduce the freezes with the svga stack?
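
The event stealing described above can be illustrated with a small, self-contained sketch. It does not use XCB or Mesa: the queue, the counters and all function names are hypothetical stand-ins chosen for the example. A helper thread and the main thread both depend on the single completion event for one swap; the helper consumes it, so nothing is left that could wake a second waiter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared special-event queue of one drawable. */
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cond = PTHREAD_COND_INITIALIZER;
static unsigned queued_events; /* delivered but not yet consumed */
static uint64_t recv_sbc;      /* shared "swaps completed" counter */

/* The server delivers one completion event. */
static void post_event(void)
{
    pthread_mutex_lock(&q_lock);
    queued_events++;
    pthread_cond_signal(&q_cond);
    pthread_mutex_unlock(&q_lock);
}

/* Blocking wait: whichever thread gets the event consumes it for everyone. */
static void wait_for_event(void)
{
    pthread_mutex_lock(&q_lock);
    while (queued_events == 0)
        pthread_cond_wait(&q_cond, &q_lock);
    queued_events--;
    recv_sbc++;
    pthread_mutex_unlock(&q_lock);
}

static void *helper_thread(void *arg)
{
    (void)arg;
    wait_for_event();
    return NULL;
}

/* Returns true if a second waiter would be left without any event to wake it. */
static bool second_waiter_starves(void)
{
    const uint64_t send_sbc = 1; /* one swap submitted, so one event expected */
    pthread_t helper;

    queued_events = 0;
    recv_sbc = 0;
    int rc = pthread_create(&helper, NULL, helper_thread, NULL);
    assert(rc == 0);
    (void)rc;
    post_event();
    pthread_join(helper, NULL); /* the helper has consumed the only event */

    /* In the real race the main thread is already blocked in its own wait.
     * The sketch only inspects the queue so that it can terminate. */
    pthread_mutex_lock(&q_lock);
    bool starved = (recv_sbc == send_sbc) && (queued_events == 0);
    pthread_mutex_unlock(&q_lock);
    return starved;
}
```

The swap counter is up to date, yet the queue is empty: a thread that went to sleep in a blocking wait before the helper consumed the event would never be woken.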
Comment 22 haro41 2017-09-17 18:27:42 UTC
Created attachment 134297
debug log: concurrent waiting in xcb_wait_for_special_event()

These statements are used for logging (all in 'src/loader/loader_dri3_helper.c'):

printf("%4x =>dri3_handle_present_event: XCB_PRESENT_COMPLETE_NOTIFY: serial:%u \n", (uint16_t)pthread_self(), ce->serial);

printf("%4x =>dri3_handle_present_event: XCB_PRESENT_EVENT_IDLE_NOTIFY: pixmap:%u \n", (uint16_t)pthread_self(), ie->pixmap);

printf("%4x =>xcb_wait_for_special_event in dri3_wait_for_event: send_sbc:%lu recv_sbc:%lu\n", (uint16_t)pthread_self(), draw->send_sbc, draw->recv_sbc);

printf("%4x =>xcb_wait_for_special_event in dri3_find_back:      send_sbc:%lu recv_sbc:%lu\n", (uint16_t)pthread_self(), draw->send_sbc, draw->recv_sbc);

printf("%4x =>loader_dri3_swapbuffer_barrier:                    send_sbc:%lu recv_sbc:%lu\n", (uint16_t)pthread_self(), draw->send_sbc, draw->recv_sbc);



'9240' is obviously the main thread.
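
The statements above cast pthread_self() to uint16_t and print the counters with %lu, which fits a 64-bit Linux build. As a hedged sketch of a more portable variant, the helper below formats the same kind of trace line with the <inttypes.h> macros. The function name and its parameters are hypothetical and not part of Mesa; the thread tag is passed in by the caller because pthread_t is not guaranteed to be an integer type.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Formats one trace line in the style of the log statements above.
 * `tag` is a caller-chosen short thread tag (an assumption of this sketch),
 * `where` names the call site. Returns the snprintf() result. */
static int format_sbc_trace(char *buf, size_t len, uint16_t tag,
                            const char *where,
                            uint64_t send_sbc, uint64_t recv_sbc)
{
    assert(buf != NULL && where != NULL);
    return snprintf(buf, len,
                    "%4x =>%s: send_sbc:%" PRIu64 " recv_sbc:%" PRIu64,
                    (unsigned)tag, where, send_sbc, recv_sbc);
}
```

Calling it with the tag 0x9240, the call site "dri3_wait_for_event" and the counters 5 and 4 yields a line in the same layout as the attached log.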
Comment 23 Thomas Hellström 2017-09-19 18:00:28 UTC
Created attachment 134344
Patch to protect the loader_dri3_drawable struct

So here is a patch that doesn't fully make dri3 drawables thread-safe, but it should at least make sure threads don't steal events from each other.

Please try,
Thomas
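
One common way to keep threads from stealing events from each other is to let only one thread at a time block on the event queue while the others sleep on a condition variable and retest the shared state afterwards. The sketch below shows that pattern with plain pthreads. It is not the attached patch: the structs, field names and functions are assumptions made for illustration, and the event queue is a simple stand-in for the X connection.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_THREADS 4
#define NUM_EVENTS  3

/* Hypothetical stand-in for the per-drawable event queue. */
struct event_queue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    uint64_t        posted;   /* serial of the newest posted event */
    uint64_t        consumed; /* serial of the newest consumed event */
};

/* Hypothetical, heavily reduced drawable: only what the sketch needs. */
struct drawable {
    pthread_mutex_t     mtx;              /* protects the fields below */
    pthread_cond_t      event_cnd;        /* signalled after each handled event */
    bool                has_event_waiter; /* one thread is blocked on the queue */
    uint64_t            recv_sbc;         /* last completed swap seen */
    struct event_queue *queue;
};

static void post_event(struct event_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->posted++;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Blocks until an event is available, consumes it and returns its serial. */
static uint64_t wait_for_event(struct event_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->consumed == q->posted)
        pthread_cond_wait(&q->nonempty, &q->lock);
    uint64_t serial = ++q->consumed;
    pthread_mutex_unlock(&q->lock);
    return serial;
}

/* Waits until recv_sbc reaches target; only one thread reads the queue. */
static void wait_for_sbc(struct drawable *draw, uint64_t target)
{
    pthread_mutex_lock(&draw->mtx);
    while (draw->recv_sbc < target) {
        if (draw->has_event_waiter) {
            /* Another thread reads events: sleep until it reports progress. */
            pthread_cond_wait(&draw->event_cnd, &draw->mtx);
            continue;
        }
        draw->has_event_waiter = true;
        pthread_mutex_unlock(&draw->mtx); /* do not hold the lock while blocked */
        uint64_t serial = wait_for_event(draw->queue);
        pthread_mutex_lock(&draw->mtx);
        draw->has_event_waiter = false;
        if (serial > draw->recv_sbc)
            draw->recv_sbc = serial;
        pthread_cond_broadcast(&draw->event_cnd); /* let the others retest */
    }
    pthread_mutex_unlock(&draw->mtx);
}

static void *waiter_thread(void *arg)
{
    wait_for_sbc(arg, NUM_EVENTS);
    return NULL;
}

/* Returns how many threads finished waiting (NUM_THREADS if none starved). */
static int run_demo(void)
{
    struct event_queue q = { .lock = PTHREAD_MUTEX_INITIALIZER,
                             .nonempty = PTHREAD_COND_INITIALIZER };
    struct drawable draw = { .mtx = PTHREAD_MUTEX_INITIALIZER,
                             .event_cnd = PTHREAD_COND_INITIALIZER,
                             .queue = &q };
    pthread_t threads[NUM_THREADS];
    int finished = 0;

    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, waiter_thread, &draw);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_EVENTS; i++)
        post_event(&q);
    for (int i = 0; i < NUM_THREADS; i++)
        if (pthread_join(threads[i], NULL) == 0)
            finished++;
    return finished;
}
```

Four threads wait for the same three completion events. Because every event is folded into the shared recv_sbc under the mutex and then broadcast, a thread that did not read the event itself still observes the progress and returns instead of blocking forever.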
Comment 24 haro41 2017-09-19 21:15:09 UTC
I tested your patch (~20 minutes): 

No freezes at all, good work!

I will continue testing later; meanwhile I am trying to understand what all those different xx_swap_buffers() functions/callbacks are for :)

Thanks,
Jens
Comment 25 Thomas Hellström 2017-09-19 21:32:16 UTC
Comment on attachment 134344
Patch to protect the loader_dri3_drawable struct

OK, thanks, that's good to know.

Note the patch isn't complete yet. Just enough to verify what the problem was.
Comment 26 haro41 2017-09-20 18:23:49 UTC
Created attachment 134383
protection in action, longer debug log

adapted debug log (longer test), showing current protection at work ...

No freezes and no other visible issues currently.
Comment 27 haro41 2017-10-31 09:39:37 UTC
@Thomas,

any chance of getting this fixed for the upcoming Mesa 17.3 release?
Comment 28 Thomas Hellström 2017-11-02 10:50:59 UTC
Hi!

We can probably pave over this specific problem for the release, but making dri3 fully thread-safe is a much larger task, which I will not have time for before the release.

BTW are you running with mesa glthread? In that case, could you test with master mesa and

export mesa_glthread=false

/Thomas
Comment 29 haro41 2017-11-02 16:35:51 UTC
I tried both mesa_glthread=false and mesa_glthread=true; it doesn't make a difference with respect to this issue.

I think other applications/games could be affected by this problem too, so maybe temporarily reverting the changes in dri2_flush_swapbuffers() would make sense?

(this is currently my approach to avoid the freezes)
Comment 30 Thomas Hellström 2017-11-02 16:41:47 UTC
Thanks for testing. 

But if I understand you correctly the "patch to protect the loader_dri3_drawable struct" fixes the issue on your side, right? If so, I'd rather push a somewhat polished version of that patch...
Comment 31 haro41 2017-11-02 20:40:42 UTC
Yes, your last patch worked flawlessly here. If you can provide a polished version, just let me know; I am ready to test it.
Comment 32 Thomas Hellström 2017-11-03 14:02:53 UTC
Slightly polished patch available here...


https://lists.freedesktop.org/archives/mesa-dev/2017-November/175373.html
Comment 33 haro41 2017-11-03 19:56:59 UTC
No freezes, works great for me.
Comment 34 Thomas Hellström 2017-11-03 20:40:31 UTC
(In reply to haro41 from comment #33)
> No freezes, works great for me.

Want to add a Tested-by: tag?

/Thomas
Comment 35 haro41 2017-11-05 16:33:29 UTC
(In reply to Thomas Hellström from comment #34)
> (In reply to haro41 from comment #33)
> > No freezes, works great for me.
> 
> Want to add a Tested-by: tag?
> 
> /Thomas

If it helps, sure, but where and how do I add this tag?
Comment 36 Thomas Hellström 2017-11-05 17:30:01 UTC
(In reply to haro41 from comment #35)
> (In reply to Thomas Hellström from comment #34)
> > (In reply to haro41 from comment #33)
> > > No freezes, works great for me.
> > 
> > Want to add a Tested-by: tag?
> > 
> > /Thomas
> 
> If it helps, sure, but where and how do I add this tag?

It's added by me to the commit message before pushing, to indicate that you've tested the patch. A Tested-by tag typically looks like

Tested-by: Firstname Lastname <haro41@gmx.de>

So if you want me to do that I'll need your first and last name.

/Thomas
Comment 37 haro41 2017-11-06 16:08:37 UTC
OK, thanks for the clarification.
I prefer not to add such a tag, because this is my anonymous email address, dedicated to things like games.

/Jens
Comment 38 Thomas Hellström 2017-11-13 13:14:23 UTC
Fix has now been pushed to mesa master.
Comment 39 haro41 2017-11-14 14:58:20 UTC
Thank you, problem fully solved for me.

