Bug 34674

Summary: [ilk, wine] FEAR / depth-stencil rendering triggers SW fallback
Product: Mesa Reporter: Tobias Jakobi <liquid.acid>
Component: Drivers/DRI/i965Assignee: Paul Berry <stereotype441>
Status: RESOLVED WONTFIX QA Contact:
Severity: normal    
Priority: medium CC: hverbeet, imamdxl8805, jbarnes, kenneth, mattst88
Version: git   
Hardware: Other   
OS: All   
URL: http://www.gamershell.com/download_10167.shtml
Whiteboard:
i915 platform: i915 features:

Description Tobias Jakobi 2011-02-24 10:18:18 UTC
System:
Intel Corporation Arrandale Integrated Graphics Controller

vanilla-kernel 2.6.36.2
libdrm, mesa and xf86-video-intel git master
xorg-server-1.9.4

mesa git master = f19439940c43aa9d937716c6f1ee70cc26799e08

Another problem on my i965, this time triggered by wine / FEAR.

Link to game demo:
http://www.gamershell.com/download_10167.shtml

I already partially described the issue here on the wine bugtracker:
http://bugs.winehq.org/show_bug.cgi?id=26229

In the demo, after exiting the first underground passage, the performance collapses. My guess still is the rendering of the waterplane that exists in that area.

Checking with INTEL_DEBUG=fall shows that the software fallback in intelCopyTexImage2D is triggered. Internal format is GL_DEPTH24_STENCIL8, target is GL_TEXTURE_2D in intelCopyTexImage2D at this point.

I followed the calls into intel_copy_texsubimage and there it returns GL_FALSE because of the Y-tiling blitter issue. I then switched tiling off with driconf and continued testing. After that the GL_FALSE comes from intelEmitCopyBlit.

In intelEmitCopyBlit I slowly gave up and only tried to disable the tiling and offset checks at the beginning of the function. Which kinda works. No fallback anymore and I can't spot any obvious render corruptions. Pretty hackish of course.

So we've got this callstack:
intelCopyTexImage2D
  intel_copy_texsubimage
    intelEmitCopyBlit <- bails out because of the tiling/offset checks

My questions are now:
- Is this fallback justified? Why does this FBO get Y-tiling?
- What is doing wine wrong?
- What could wine do better to avoid the fallback?

Greets,
Tobias
Comment 1 Tobias Jakobi 2011-02-24 11:00:00 UTC
Still testing with texture tiling disabled (driconf).
intelEmitCopyBlit fails because the source is Y-tiled. Not idea what the source is here though.
Comment 2 Chris Wilson 2011-02-24 13:12:01 UTC
The BLT unit simply cannot handle Y-tiling, which is being used for the depth buffer. We need a 3D transfer function that can handle the blit and any potential swizzling (useful for handling format conversion as well). [This is reminiscent of gallium's pipe transfer functions.]
Comment 3 Eric Anholt 2011-02-24 16:28:18 UTC
There's not much we can do about a request to CopyTexSubImage DEPTH24_STENCIL8.  Depth buffers have to be tiled Y if tiled, and they have to be tiled for performance in general (and always tiled on newer chipsets, since anything else would be insane).  If Wine only needed the depth component, then I think existing CopyTexSubImage support would work.

Without app changes, the only thing I could think of doing would be to check if the tiling in src and dst happened to line up exactly along tile boundaries and so do an "X" tiled blit even though the data transferred is Y.

On snb, the blitter docs say it can do Y-tiled, but doesn't say where to set a bit saying it's Y-tiled.
Comment 4 Tobias Jakobi 2011-02-24 17:19:31 UTC
See line 4206 of
http://source.winehq.org/git/wine.git/?a=blob;f=dlls/wined3d/surface.c;h=b7f4b1c4e0840ade3a7951c05f8ea6de45f3756d;hb=HEAD
for how wined3d calls glCopyTexImage2D.

I have no idea if there is "room for optimization" in this case...
Comment 5 Chris Wilson 2011-02-25 00:39:17 UTC
(In reply to comment #3)
> On snb, the blitter docs say it can do Y-tiled, but doesn't say where to set a
> bit saying it's Y-tiled.

It's a register bit set for the duration of the batch that converts the BLT_TILED flags from meaning X-tiling to Y-tiling. How I rejoiced!
Comment 6 Henri Verbeet 2011-02-28 07:04:29 UTC
(In reply to comment #3)
> would be insane).  If Wine only needed the depth component, then I think
> existing CopyTexSubImage support would work.
> 
We currently only do something with the depth component, but that's actually a bit of a bug. In the long term I expect the code in question to go away for implementations with FBOs though.

> Without app changes, the only thing I could think of doing would be to check if
> the tiling in src and dst happened to line up exactly along tile boundaries and
> so do an "X" tiled blit even though the data transferred is Y.
> 
> On snb, the blitter docs say it can do Y-tiled, but doesn't say where to set a
> bit saying it's Y-tiled.
I'm not very familiar with i965, but is there no way to detile / texture from a tiled buffer?
Comment 7 Tobias Jakobi 2011-04-19 16:03:20 UTC
Reconfirming with latest mesa git master (08d1c91e6c185a186e49189b7ed48629f35a4659)
Comment 8 Tobias Jakobi 2011-05-19 09:23:29 UTC
Reconfirming with latest mesa git master
(339544f4bbef9be1b4b3465f28482b9699a99692)
Comment 9 Kenneth Graunke 2011-05-20 14:27:15 UTC
Reassigning to Chad, as he's been looking into depth and stencil related stuff lately.
Comment 10 Eric Anholt 2011-06-01 12:26:07 UTC
Henri, the problem is getting the stencil value moved.  If it was just depth wine was asking for, then we could texture and store that value (the existing path).  However, without a way to texture-sample stencil values, it's tough.  One could imagine reinterpreting the existing D24S8 texture as ARGB8888, then doing multiple passes of stencil updates to hit a bit at a time, like the drawpixels code.

If we get separate stencil working on Ironlake, that could make things better, as then at least the stencil values are separate from the depth values, and more tractable to move around. (or at least there's just less to be moved about in a stencil fallback).
Comment 11 Henri Verbeet 2011-06-04 12:22:25 UTC
(In reply to comment #10)
> Henri, the problem is getting the stencil value moved.  If it was just depth
> wine was asking for, then we could texture and store that value (the existing
> path).  However, without a way to texture-sample stencil values, it's tough. 
> One could imagine reinterpreting the existing D24S8 texture as ARGB8888, then
> doing multiple passes of stencil updates to hit a bit at a time, like the
> drawpixels code.
> 
Perhaps we could just always pass glCopyTexImage2D() GL_DEPTH_COMPONENT for internal format in wined3d. As long as that ends up creating something with an appropriate depth format that should work out ok.
Comment 12 Henri Verbeet 2011-06-08 10:29:51 UTC
Could you give this a try with Wine from current git? We use GL_DEPTH_COMPONENT now. If you prefer released versions, 1.3.22 should be released at the end of this week.
Comment 13 Tobias Jakobi 2011-06-09 12:17:58 UTC
Did you mean me?

Well, yeah, I just tried it with a fresh git master checkout of wine (7cd20a413e1ab212f032269e4025aba18d0292af) and a maybe 3 days old mesa git checkout. Still got the fallback at the exact same position in the game. Also got some massive blueish artifacts around what I guess are the shadow volumes.

I then updated mesa git to 6861a701772eac3a6a7d3136d03efa7ac7e5c026. Wanted to test if I could at least remove the artifacts. Wasn't such a good idea. The driver now crashes early in emit_depthbuffer when launching the game.

Backtrace:
=>0 0x7d6a92c5 emit_depthbuffer+0x455() in i965_dri.so (0x00f9e338)
  1 0x7d6b1dd7 brw_upload_state+0x266() in i965_dri.so (0x00f9e388)
  2 0x7d69fd2c brw_draw_prims+0x4cb() in i965_dri.so (0x00f9e438)
  3 0x7d785afa vbo_draw_arrays+0xc9() in i965_dri.so (0x00f9e4b8)
  4 0x7d803225 _mesa_meta_Clear+0x254() in i965_dri.so (0x00f9e568)
  5 0x7d67643e intelClear+0x16d() in i965_dri.so (0x00f9e5a8)
  6 0x7d86be68 _mesa_Clear+0x177() in i965_dri.so (0x00f9e5f8)
  7 0x7da09837 glClear+0x26() in libgl.so.1 (0x00f9e618)
  8 0x7ea2116d device_clear_render_targets+0x9dc() in wined3d (0x00f9e6e8)
  9 0x7ea21961 wined3d_device_clear+0x110() in wined3d (0x00f9e768)
  10 0x7ea231d0 wined3d_device_init_3d+0x45f() in wined3d (0x00f9e818)
  11 0x7eb20bb0 device_init+0x24f() in d3d9 (0x00f9e9f8)
  12 0x7eb22351 IDirect3D9Impl_CreateDevice+0x90() in d3d9 (0x00f9ea48)
Comment 14 Tobias Jakobi 2011-06-11 09:25:08 UTC
New test with 4176025d463e7733dac19788b45b6472b65d62d4 (97d230b0bcf8ed001f685ebac314fbd8e1955718 fixed the segfault problem I mentioned in comment #13), but the fallback is still there.
Comment 15 Tobias Jakobi 2011-07-03 06:03:24 UTC
Entering the 'critical' area now produces a segmentation fault in the DRI driver.

Backtrace:
=>0 0x7d9b78f2 instruction_scheduler::add_dep+0x42(this=0xf8e248, before=0x6e074f78, after=0x6e07adc0, latency=0x6e074e30) [/usr/local/git-trees/mesa-git/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp:189] in i965_dri.so (0x00f8e0fc)
  1 0x7d9b7a8a instruction_scheduler::add_dep+0x29(this=0xf8e248, before=0x6e074f78, after=0x6e07adc0) [/usr/local/git-trees/mesa-git/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp:220] in i965_dri.so (0x00f8e11c)
  2 0x7d9b7daa instruction_scheduler::calculate_deps+0x119(this=0xf8e248) [/usr/local/git-trees/mesa-git/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp:311] in i965_dri.so (0x00f8e20c)
  3 0x7d9b83e7 fs_visitor::schedule_instructions+0xa6(this=0xf8e2e4) [/usr/local/git-trees/mesa-git/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp:529] in i965_dri.so (0x00f8e27c)
  4 0x7d99afb7 fs_visitor::run+0x1b6(this=0xf8e2e4) [/usr/local/git-trees/mesa-git/src/mesa/drivers/dri/i965/brw_fs.cpp:1558] in i965_dri.so (0x00f8e2bc)
  5 0x7d99b9ec brw_wm_fs_emit+0x79b(brw=0x7c7c15b0, c=0x79ba5020, prog=0x6e06c688) [/usr/local/git-trees/mesa-git/src/mesa/drivers/dri/i965/brw_fs.cpp:1637] in i965_dri.so (0x00f9e76c)
  6 0x7d983c41 do_wm_prog+0xf0() in i965_dri.so (0x00f9e7dc)
  7 0x7d99a362 brw_fs_precompile+0x141(ctx=0x7c7c15b0, prog=0x6e06c688) [/usr/local/git-trees/mesa-git/src/mesa/drivers/dri/i965/brw_fs.cpp:1708] in i965_dri.so (0x00f9e87c)
  8 0x7d9b8ea4 brw_shader_precompile+0x23(ctx=0x7c7c15b0, prog=0x6e06c688) [/usr/local/git-trees/mesa-git/src/mesa/drivers/dri/i965/brw_shader.cpp:67] in i965_dri.so (0x00f9e89c)
  9 0x7dad49d9 _mesa_glsl_link_shader+0x228(ctx=0x7c7c15b0, prog=0x6e06c688) [/usr/local/git-trees/mesa-git/src/mesa/program/ir_to_mesa.cpp:3268] in i965_dri.so (0x00f9e8dc)
  10 0x7da218fa link_program+0x79() in i965_dri.so (0x00f9e91c)
  11 0x7dd40b87 glLinkProgramARB+0x26() in libgl.so.1 (0x00f9e93c)
  12 0x7eafaad6 in wined3d (+0x6aad5) (0x00f9e98c)
  13 0x7eb58783 in wined3d (+0xc8782) (0x00f9ea3c)
  14 0x7eb58ddb in wined3d (+0xc8dda) (0x00f9eacc)
  15 0x7eaf4f6b in wined3d (+0x64f6a) (0x00f9ed4c)
  16 0x7eaca368 wined3d_device_draw_indexed_primitive+0xd7() in wined3d (0x00f9edac)
  17 0x7ebcfbb5 in d3d9 (+0xfbb4) (0x00f9edfc)
  18 0x0050f688 in fearspdemo (+0x10f687) (0x0ab2cb08)
  19 0x0ab2cb08 (0x0ab364f0)
  20 0x5c737265 (0x64616873)
Comment 16 Kenneth Graunke 2011-07-09 00:51:30 UTC
The crashes in the instruction scheduler sound familiar, and I think ought to be fixed by now.  With commit 86e62b2357447b7c (intel: Mark MESA_FORMAT_X8_Z24 as always supported.) and the Wine changes (to use GL_DEPTH_COMPONENT), we've eliminated the stencil problem.

I suspect the fallback will still be there, as even an X8_Z24 depth buffer will still be Y-tiled, so we can't use Ironlake's blitter to do a CopyTexSubImage.  PlaneShift also hits this software fallback path.

There ought to be a way to do this on the GPU, though, even on Ironlake.  If nothing else, it should be possible to create shaders that sample from the depth buffer at the appropriate coordinates and just draw a quad.
Comment 17 Tobias Jakobi 2011-07-09 01:23:36 UTC
Thanks Kenneth for the update, I'm going to check this again on Sunday. :)
Comment 18 Kenneth Graunke 2011-07-09 03:05:47 UTC
Come to think of it, Mesa's metaops already implement something to that effect (using glReadPixels, I believe).  It may just work with master.
Comment 19 Tobias Jakobi 2011-07-10 08:09:07 UTC
Nope, just tested. Still hits the fallback with up-to-date mesa git master and wine git master.
Comment 20 Chad Versace 2011-07-25 16:54:34 UTC
Ken said:
> There ought to be a way to do this on the GPU, though, even on Ironlake.  If
> nothing else, it should be possible to create shaders that sample from the
> depth buffer at the appropriate coordinates and just draw a quad.
> 
> Come to think of it, Mesa's metaops already implement something to that effect
> (using glReadPixels, I believe).  It may just work with master.

_mesa_meta_CopyTexSubImage2D() has no clever magic. It accomplishes the copy with _swrast_ReadPixels() followed by memcpy(). And _swrast_ReadPixels uses the slow span functions in intel_span.c.

Eric and ickle, have you had any success with toggling the Y-tile blit register?
Comment 21 Tobias Jakobi 2011-08-31 15:20:56 UTC
Issue is still present with latest mesa git master (82fff5f3edfd2f6396a872a12d753b2ab90edd7b).
Comment 22 Md Imam Hossain 2011-09-01 21:48:08 UTC
(In reply to comment #1)
> Still testing with texture tiling disabled (driconf).
> intelEmitCopyBlit fails because the source is Y-tiled. Not idea what the source
> is here though.

(In reply to comment #0)
> System:
> Intel Corporation Arrandale Integrated Graphics Controller
> 
> vanilla-kernel 2.6.36.2
> libdrm, mesa and xf86-video-intel git master
> xorg-server-1.9.4
> 
> mesa git master = f19439940c43aa9d937716c6f1ee70cc26799e08
> 
> Another problem on my i965, this time triggered by wine / FEAR.
> 
> Link to game demo:
> http://www.gamershell.com/download_10167.shtml
> 
> I already partially described the issue here on the wine bugtracker:
> http://bugs.winehq.org/show_bug.cgi?id=26229
> 
> In the demo, after exiting the first underground passage, the performance
> collapses. My guess still is the rendering of the waterplane that exists in
> that area.
> 
> Checking with INTEL_DEBUG=fall shows that the software fallback in
> intelCopyTexImage2D is triggered. Internal format is GL_DEPTH24_STENCIL8,
> target is GL_TEXTURE_2D in intelCopyTexImage2D at this point.
> 
> I followed the calls into intel_copy_texsubimage and there it returns GL_FALSE
> because of the Y-tiling blitter issue. I then switched tiling off with driconf
> and continued testing. After that the GL_FALSE comes from intelEmitCopyBlit.
> 
> In intelEmitCopyBlit I slowly gave up and only tried to disable the tiling and
> offset checks at the beginning of the function. Which kinda works. No fallback
> anymore and I can't spot any obvious render corruptions. Pretty hackish of
> course.
> 
> So we've got this callstack:
> intelCopyTexImage2D
>   intel_copy_texsubimage
>     intelEmitCopyBlit <- bails out because of the tiling/offset checks
> 
> My questions are now:
> - Is this fallback justified? Why does this FBO get Y-tiling?
> - What is doing wine wrong?
> - What could wine do better to avoid the fallback?
> 
> Greets,
> Tobias

Hi Tobias Jakobi,

Can you direct me to the page, from where I can learn how to check software fallback or which Extensions of OpenGL causing performance problems in OpenGL, Thanks in advance.

I am getting severe fps drop in Call of Duty 4 in some areas of particular levels. From System Monitor tool, I can see that CPU is not being utilised in those problematic areas. So this is an CPU problem rather than GPU driver problem.

Everything works just like in Windows except performance.

My system spec is:

Intel GMA 4500
Intel Dual Core 2.20 GHz
RAM 3 GB
Linux 3, Xserver 1.10.3, xf86-Intel 2.16, Mesa 7.12-devel git from 1 Sep, 11, WINE 1.3.27
Comment 23 Tobias Jakobi 2011-09-02 02:01:47 UTC
Would you please stop spamming other people's bug with your unrelated questions?! Ask these questions on the mailing list, where they belong.
Comment 24 Tobias Jakobi 2011-09-29 16:14:51 UTC
Still happens with recent mesa git master (e4394fb19f735da3fad9340653637bbe54778069), but now gives different info (INTEL_DEBUG=fall):

intel_copy_texsubimage mismatched formats MESA_FORMAT_X8_Z24, MESA_FORMAT_S8_Z24
intelCopyTexSubImage2D - fallback to swrast
intelReadPixels: fallback to swrast
Comment 25 Chad Versace 2011-09-30 09:59:53 UTC
Tobias,

To avoid the sw falback, we must blit the depthbuffer, which is Y-tiled. But to blit a Y-tiled buffer, we must toggle a register bit that toggles X- or Y-tiled blitting.

This sw fallback does need to be eliminated, but I have not yet had the time to experiment with manipulating that bit. I'll inform you when we have patches.
Comment 26 Hai 2011-09-30 10:00:15 UTC
I am OOP on vacation from WW40.6 to WW41.5. E-mail response might be late.

Hai Lan
Comment 27 Tobias Jakobi 2011-09-30 10:28:57 UTC
@Chad: Sure, take your time! Wasn't trying to put the screws on you here :)
Comment 28 Md Imam Hossain 2011-11-06 00:38:21 UTC
getting similar fallback in the game 'Prince of Persia The Sands of Time' via WINE.

The interface just messed up like texts, menu but 3d is okay.

This used to be working fine from Mesa 7.7.1 up to 7.9 series, introduced by Mesa version 7.10 and up.

terminal log

intel_copy_texsubimage mismatched formats MESA_FORMAT_X8_Z24, MESA_FORMAT_S8_Z24
intelCopyTexSubImage2D - fallback to swrast
intelReadPixels: fallback to swrast

Thank you
Comment 29 Kenneth Graunke 2011-11-06 08:22:03 UTC
Please file a separate bug report for that.  While it may be the same cause, we don't know that for sure, and for clarity I'd rather keep one bug per application and platform.
Comment 30 Tobias Jakobi 2012-01-22 07:13:27 UTC
Just a quick bump, so that the bug won't be forgotten :)
Comment 31 Kenneth Graunke 2012-07-31 18:29:04 UTC
We can probably actually blit these on the GPU now using Paul's blorp path.  Reassigning to Paul - it might already be fixed on master - or if not I imagine Paul can fix it...
Comment 32 Paul Berry 2012-07-31 18:48:31 UTC
Yeah, the blorp path would probably be the best way to fix this bug.  Unfortunately, the blorp path is at present only implemented for Gen6 and above, so it would not be a trivial fix for Gen5.  If anyone wants to work on back-porting blorp to Gen5, I'd be happy to assist, but I'm unlikely to have time to work on it myself any time soon.
Comment 33 Henri Verbeet 2012-08-01 09:09:22 UTC
For what it's worth, there have been some changes to Wine recently (1.5.10) that are likely to avoid this path.
Comment 34 Tobias Jakobi 2012-08-01 09:33:33 UTC
(In reply to comment #33)
> For what it's worth, there have been some changes to Wine recently (1.5.10)
> that are likely to avoid this path.

You mean the change of the default state of AlwaysOffscreen?
Comment 35 Henri Verbeet 2012-08-01 09:39:12 UTC
Yes.
Comment 36 Md Imam Hossain 2012-08-01 11:05:03 UTC
But what about native games, like the one I filed

https://bugs.freedesktop.org/show_bug.cgi?id=51600

with Apitrace I have found similar evidence in Psychonauts game
Comment 37 Tobias Jakobi 2012-08-01 11:32:47 UTC
(In reply to comment #35)
> Yes.

I think I tested this issue with AlwaysOffscreen enabled, when you (or was it Stefan?) introduced it in August. It didn't change anything related to the fallback IIRC.
Comment 38 Henri Verbeet 2012-08-01 11:49:28 UTC
(In reply to comment #37)
> I think I tested this issue with AlwaysOffscreen enabled, when you (or was it
> Stefan?) introduced it in August. It didn't change anything related to the
> fallback IIRC.

There may be other sources of depth blits, in particular when the application explicitly does them itself using something like d3d9 IDirect3DDevice9::StretchRect(), but those should all be FBO blits, and fairly uncommon. The only depth/stencil blits using CopyTexImage2D() should come from the call in surface_load_ds_location() you found in comment 4, and that one should never happen with "AlwaysOffscreen" enabled.
Comment 39 Tobias Jakobi 2012-08-01 12:05:49 UTC
(In reply to comment #38)
> There may be other sources of depth blits, in particular when the application
> explicitly does them itself using something like d3d9
> IDirect3DDevice9::StretchRect(), but those should all be FBO blits, and fairly
> uncommon. The only depth/stencil blits using CopyTexImage2D() should come from
> the call in surface_load_ds_location() you found in comment 4, and that one
> should never happen with "AlwaysOffscreen" enabled.

Then I must have made a mistake while testing. I'm going to report back once I checked this again.
Comment 40 Tobias Jakobi 2012-09-21 12:08:17 UTC
With AlwaysOffscreen enabled the fallback isn't triggered. I was wondering if I should leave this bug open, since the fallback is still there (and maybe to serve as a reminder to backport blorp also to ILK)
Comment 41 Eric Anholt 2013-06-08 00:28:16 UTC
OK, I'm going to go ahead and close it, since with improved wine it apparently isn't an issue any more.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.