Bug 102204

Summary: GL_ARB_buffer_storage crippled extension on r600, radeonsi and amdgpu Mesa drivers
Product: Mesa
Component: Mesa core
Version: 18.3
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: RESOLVED MOVED
Severity: normal
Priority: medium
Reporter: Aaron Paden <aaronbpaden>
Assignee: mesa-dev
QA Contact: mesa-dev
CC: awesie, dark_sylinc, farmboy0+freedesktop, fdsfgs, john.ettedgui, mirh
URL: https://github.com/gonetz/GLideN64/issues/1561
Attachments: glxinfo, glxinfo2, glxinfo3, glxinfo4

Description Aaron Paden 2017-08-14 01:40:49 UTC
I'm using:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland PRO [Radeon R7 240/340]
Mesa: 17.1.6
Linux: 4.12.6

This is somewhat related to #98724.

I decided to give GLideN64 another try since it now works with Rogue Squadron, but I've found that while it does render (I wouldn't say that #98724 has been fixed; I just think that GLideN64/GLupeN64 no longer use glDrawElementsIndirect), it's still unplayably slow. The Windows drivers work fine on the same hardware.

For this test, I was using the prebuilt and configured version found here: https://m64p.github.io/ (the executable is mupen64plus-gui). Because of #97226, it has to be launched from the terminal.

The download link doesn't work with HTTPS Everywhere for some reason. Source is here: https://github.com/m64p/mupen64plus-GLideN64
Comment 1 H4nN1baL 2018-01-22 07:41:06 UTC
Created attachment 136888 [details]
glxinfo
Comment 2 H4nN1baL 2018-01-22 07:58:46 UTC
Same problem here, but on r600.

Distro: Xubuntu 16.04.3 LTS

Hardware:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Juniper PRO [Radeon HD 5750] (prog-if 00 [VGA controller])
	Subsystem: Micro-Star International Co., Ltd. [MSI] Juniper PRO [Radeon HD 5750]

cat /usr/share/X11/xorg.conf.d/xorg.conf
Section "Device"
   Identifier "Radeon"
   Driver "radeon"
   Option "Accel" "on"
   Option "ColorTiling" "on"
   Option "ColorTiling2D" "on"
   Option "DRI" "3"
   Option "TearFree" "on"
   Option "AccelMethod" "glamor"
EndSection

Current glxinfo: https://bugs.freedesktop.org/attachment.cgi?id=136888

Previous glxinfo: https://0x0.st/CHU.txt

This problem is related to buffer storage usage.
More info: https://github.com/gonetz/GLideN64/issues/1561#issuecomment-353778289
Comment 3 H4nN1baL 2018-01-24 10:28:41 UTC
Created attachment 136936 [details]
glxinfo2

Upgraded Kernel, xserver-xorg, Mesa, etc.

The problem disappears with "LIBGL_ALWAYS_SOFTWARE=1", and in older versions of GLideN64 it disappears with "MESA_EXTENSION_OVERRIDE=-GL_ARB_buffer_storage".
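For example (command lines are illustrative; mupen64plus-gui is the frontend from the m64p build mentioned in the description):

    LIBGL_ALWAYS_SOFTWARE=1 ./mupen64plus-gui
    MESA_EXTENSION_OVERRIDE=-GL_ARB_buffer_storage ./mupen64plus-gui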
Comment 4 H4nN1baL 2018-03-11 07:16:20 UTC
This problem is currently "solved" on the GLideN64 side. But it reveals a serious problem with the GL_ARB_buffer_storage extension.
GL_UNSIGNED_BYTE with the GL_ARB_buffer_storage extension is extremely slow. GL_UNSIGNED_SHORT with the extension is a little better, but still slow. GL_UNSIGNED_BYTE without GL_ARB_buffer_storage is faster, but does not run at full speed. GL_UNSIGNED_SHORT without GL_ARB_buffer_storage works perfectly.

The dangerous combination is GL_UNSIGNED_SHORT with the GL_ARB_buffer_storage extension, which may end up destroying the hardware under the right conditions.

 I quote myself:

Sorry for the delay, but I had a "technical mishap" ...my video card is DEAD, stone-dead.
I am responsible for my unwise BIOS configuration (GART error reporting = Enabled + PCIE Spread Spectrum = Auto) and for "pushing stress" on the GPU to render at non-standard resolutions (1080p monitor at 120Hz (max 144Hz), mupen64plus windowed at 1400x1050), using a buggy extension as some kind of "steeplechase generator" for benchmarking. Well, I reaped what I sowed.

 Original post there: https://github.com/gonetz/GLideN64/pull/1738#issuecomment-369848827

Full history:
https://github.com/gonetz/GLideN64/issues/1561
https://bugs.freedesktop.org/show_bug.cgi?id=105256
https://github.com/gonetz/GLideN64/pull/1735
https://github.com/gonetz/GLideN64/pull/1738

cya
Comment 5 H4nN1baL 2018-03-31 14:34:44 UTC
Created attachment 138463 [details]
glxinfo3

This problem persists even on amdgpu (MSI Radeon RX 560 AERO ITX 4G OC). The problem does not exist using nouveau (Gigabyte NVIDIA GeForce 210 GPU - GV-N210D3-1GI).
Comment 6 H4nN1baL 2018-04-02 08:00:57 UTC
Created attachment 138484 [details]
glxinfo4

Upgraded to Mesa 18.1 (https://launchpad.net/~paulo-miguel-dias/+archive/ubuntu/mesa/).
I can see that the bugs with 'auto' TearFree have been fixed. Unfortunately, the buffer storage issue remains untouched.

The problem is very clear when setting the window resolution to 1280x960 or above and comparing the performance difference with 'MESA_EXTENSION_OVERRIDE=-GL_ARB_buffer_storage' on mupen64plus or m64p. There is no need to play ROMs to see the problem; it happens even with 'm64p_test_rom.v64' (https://github.com/mupen64plus/mupen64plus-rom).

Another workaround is to use this "patch" on GLideN64: https://github.com/Jj0YzL5nvJ/GLideN64/commit/cc5d063c841fd53c26df4a1f14fa7ffdf10f1c18
Comment 7 mirh 2018-09-24 14:26:01 UTC
Could the comments here be of any help?
https://github.com/acomminos/wine-pba/blob/master/patches/0009-wined3d-Add-quirk-to-use-GL_CLIENT_STORAGE_BIT-for-m.patch

I.e. did anybody try to check for GL_CLIENT_STORAGE_BIT flag?
Comment 8 H4nN1baL 2018-11-03 13:26:59 UTC
The addition of that flag makes the difference in this part:
https://github.com/gonetz/GLideN64/blob/7aa360c9007d5b5f8c020d68341585e1f5b24b03/src/Graphics/OpenGLContext/opengl_ColorBufferReaderWithBufferStorage.cpp#L34-L35
With it the problem disappears completely.
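For reference, a minimal sketch of the kind of allocation being discussed (the target, size, and surrounding code are illustrative assumptions, not the exact GLideN64 code):

    GLuint pbo;
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    /* GL_CLIENT_STORAGE_BIT hints that the buffer should live in
     * CPU-visible memory; per the discussion above, adding it to the
     * readback buffer's storage flags is what makes the difference. */
    glBufferStorage(GL_PIXEL_PACK_BUFFER, size, NULL,
                    GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT |
                    GL_MAP_COHERENT_BIT | GL_CLIENT_STORAGE_BIT);
    void *ptr = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, size,
                                 GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT |
                                 GL_MAP_COHERENT_BIT);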

I can also add it here:
https://github.com/gonetz/GLideN64/blob/7aa360c9007d5b5f8c020d68341585e1f5b24b03/src/Graphics/OpenGLContext/opengl_BufferedDrawer.cpp#L15-L16
I don't see any difference... Is it only necessary to add it to the first part?
Comment 9 Matias N. Goldberg 2018-11-03 16:51:03 UTC
Disclaimer: I'm not a Mesa dev.

I saw this ticket by accident and since I'm a heavy user of GL_ARB_buffer_storage on AMD+Mesa, I took a look in case there's something that could affect me.

After glancing through the problems, it appears that the problem is "user error" and the real bug here is the lack of performance warnings reported via KHR_debug.

GL_ARB_buffer_storage can backfire if you do not use it well.
It would appear that you're performing several operations that are not supported in HW (such as the use of GL_UNSIGNED_BYTE indices, which requires converting all indices to GL_UNSIGNED_SHORT in SW) and must therefore be processed on the CPU to work correctly.
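For illustration (hypothetical call sites, not GLideN64's actual code), the difference is just the index type handed to glDrawElements:

    glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_BYTE, 0);   /* slow path: indices converted to 16-bit in SW */
    glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_SHORT, 0);  /* handled natively by the HW */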

Or do stuff that is a minefield, like issuing a loop of glMemoryBarrier+glDrawElementsBaseVertex PER TRIANGLE while reading from a persistent mapped buffer. That is just not gonna work fast.

The reason GL_CLIENT_STORAGE_BIT improves things is that the buffer is stored in a CPU buffer (i.e. a good ol' malloc), and when it's time to render, Mesa does the SW conversions for you, then uploads the data to the GPU.
When that flag is not present, Mesa must constantly download and upload data to and from the GPU to accommodate whatever you're doing that is not natively supported.

The rule of thumb is that persistently mapped memory should be treated as write-only, read-once memory.
If I'm writing view matrices and then rendering them once, I write directly to the persistently mapped buffer and use it directly in the shader.
If I'm writing material/pass data that is reused across multiple draws, I write it once to the persistently mapped buffer and then perform a glCopyBufferSubData to another buffer that is not CPU visible at all, as sketched below.
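A minimal sketch of that staging pattern (buffer names, the UBO use, and the size/passData variables are my assumptions, not code from any project in this thread):

    /* Write once into a persistently mapped staging buffer, then copy into
     * a device-local buffer that the GPU reuses across draws. */
    GLuint staging, deviceLocal;
    glGenBuffers(1, &staging);
    glBindBuffer(GL_COPY_READ_BUFFER, staging);
    glBufferStorage(GL_COPY_READ_BUFFER, size, NULL,
                    GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT);
    void *cpu = glMapBufferRange(GL_COPY_READ_BUFFER, 0, size,
                                 GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
                                 GL_MAP_FLUSH_EXPLICIT_BIT);

    glGenBuffers(1, &deviceLocal);
    glBindBuffer(GL_COPY_WRITE_BUFFER, deviceLocal);
    glBufferStorage(GL_COPY_WRITE_BUFFER, size, NULL, 0);   /* no CPU access */

    memcpy(cpu, passData, size);                            /* write exactly once */
    glFlushMappedBufferRange(GL_COPY_READ_BUFFER, 0, size); /* non-coherent map */
    glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER, 0, 0, size);
    /* ...then bind deviceLocal (e.g. as a UBO) for every draw that reuses it */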

Btw, avoid using GL_MAP_COHERENT_BIT. Your calls to glReadPixels that store directly into a PBO backed by coherent memory could be hurting you a lot. glReadPixels must do deswizzling; it's not a raw memcpy on the GPU side. Keeping both the CPU & GPU caches in sync is foolishness here.
If you insist on using GL_MAP_COHERENT_BIT for your glReadPixels, perform the glReadPixels onto a PBO that is not CPU visible (so the deswizzling can happen GPU-side), then perform a glCopyBufferSubData from that PBO into another PBO that is CPU visible.
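A sketch of that two-PBO readback (same assumptions as above; not GLideN64's actual code):

    /* glReadPixels lands in a PBO with no CPU access, so deswizzling stays
     * on the GPU; the result is then copied into a CPU-visible PBO. */
    GLuint gpuPbo, cpuPbo;
    glGenBuffers(1, &gpuPbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, gpuPbo);
    glBufferStorage(GL_PIXEL_PACK_BUFFER, size, NULL, 0);   /* no CPU access */

    glGenBuffers(1, &cpuPbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, cpuPbo);
    glBufferStorage(GL_PIXEL_PACK_BUFFER, size, NULL,
                    GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);

    glBindBuffer(GL_PIXEL_PACK_BUFFER, gpuPbo);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);

    glBindBuffer(GL_COPY_READ_BUFFER, gpuPbo);
    glBindBuffer(GL_COPY_WRITE_BUFFER, cpuPbo);
    glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER, 0, 0, size);
    /* ...fence, then map/read cpuPbo once the copy has completed */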
Comment 10 Matias N. Goldberg 2018-11-03 18:25:08 UTC
OK this left me wondering.

I looked further into what happens when you call glBufferStorage, and it seems that buffer_usage in st_cb_bufferobjects.c (https://github.com/anholt/mesa/blob/master/src/mesa/state_tracker/st_cb_bufferobjects.c#L226) stores all buffers in VRAM unless GL_CLIENT_STORAGE_BIT is used, because it always returns PIPE_USAGE_DEFAULT otherwise.
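In rough pseudocode, the behavior described there boils down to this (a paraphrase of the immutable-buffer path, not the verbatim Mesa source):

    /* Paraphrase: glBufferStorage buffers land in VRAM (PIPE_USAGE_DEFAULT)
     * unless GL_CLIENT_STORAGE_BIT asks for CPU-side storage. */
    static enum pipe_resource_usage
    buffer_usage_sketch(GLbitfield storageFlags)
    {
       if (storageFlags & GL_CLIENT_STORAGE_BIT) {
          if (storageFlags & GL_MAP_READ_BIT)
             return PIPE_USAGE_STAGING;  /* CPU-readable memory */
          return PIPE_USAGE_STREAM;      /* CPU-writable upload memory */
       }
       return PIPE_USAGE_DEFAULT;        /* device (VRAM) memory */
    }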

The chosen pipe_resource_usage ends up in si_init_resource_fields (https://github.com/anholt/mesa/blob/master/src/gallium/drivers/radeonsi/si_buffer.c#L103), which ends up putting all buffers in VRAM with Write Combining (unless GL_CLIENT_STORAGE_BIT is set).

That is... an odd choice. This is especially bad for buffers requested with the GL_MAP_READ_BIT flag, which clearly should not be stored in VRAM with WC bits set.

This indeed looks like a bug to me. Unfortunately, these hint flags don't map really well to HW.
Comment 11 GitLab Migration User 2019-09-18 20:26:33 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1015.
