Created attachment 139997
Apitrace dump of citra showing the performance issue
Calling glReadPixels on a GL_DEPTH24_STENCIL8 attachment with the GL_DEPTH_STENCIL / GL_UNSIGNED_INT_24_8 format seems to stall the GPU, even if a PBO is bound. 92% of my CPU time is spent in intel_miptree_map.
I guess i965 lacks a copy shader from the tiled Z24S8 attachment to the linear PBO.
Tests were done on: Mesa DRI Intel(R) UHD Graphics 620 (Kabylake GT2)
and Mesa commit: 4affeba1e9eb426a1ba13a3e8ced4673c4bb9b34
An apitrace dump which highlights this issue in the last frames is attached.
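For reference, a readback in the GL_DEPTH_STENCIL / GL_UNSIGNED_INT_24_8 format yields one 32-bit word per pixel, with depth in the 24 most significant bits and stencil in the 8 least significant bits. A minimal C sketch of handling such words once they are in the PBO or client memory follows. The helper names are hypothetical and are not taken from citra or Mesa.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for words read back as GL_UNSIGNED_INT_24_8. */

/* Depth occupies the 24 most significant bits of each 32-bit word. */
static inline uint32_t d24s8_depth(uint32_t packed)
{
    return packed >> 8;
}

/* Stencil occupies the 8 least significant bits. */
static inline uint8_t d24s8_stencil(uint32_t packed)
{
    return (uint8_t)(packed & 0xFFu);
}

/* Depth as a normalized value in [0, 1]. */
static inline double d24s8_depth_unorm(uint32_t packed)
{
    return (double)d24s8_depth(packed) / (double)0xFFFFFFu;
}

/* Inverse operation: build a packed word from its two components. */
static inline uint32_t d24s8_pack(uint32_t depth24, uint8_t stencil)
{
    assert(depth24 <= 0xFFFFFFu); /* depth must fit in 24 bits */
    return (depth24 << 8) | stencil;
}
```

This only illustrates the data layout being read back. It says nothing about where the stall comes from.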
The blorp glReadPixels path (and the tiled memcpy path) has been enabled for color buffers only, so this case falls back to the Mesa frontend.