Bug 97231 - GL_DEPTH_CLAMP doesn't clamp to the far plane
Summary: GL_DEPTH_CLAMP doesn't clamp to the far plane
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Mesa core (show other bugs)
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
: medium major
Assignee: mesa-dev
QA Contact: mesa-dev
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-08-06 21:15 UTC by Jules Blok
Modified: 2016-08-20 04:19 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
apitrace file to reproduce the problem (989.40 KB, application/x-7z-compressed)
2016-08-06 21:15 UTC, Jules Blok
Details
apitrace file version 2 (1.04 MB, application/x-7z-compressed)
2016-08-08 10:57 UTC, Jules Blok
Details
FIFO log for Dolphin Emulator (606.26 KB, application/octet-stream)
2016-08-08 18:21 UTC, Jules Blok
Details
apitrace file version 3 (1.17 MB, application/x-7z-compressed)
2016-08-09 07:07 UTC, Jules Blok
Details
api trace file version 4 (5.96 MB, application/x-7z-compressed)
2016-08-09 19:39 UTC, Jules Blok
Details
apitrace file llvmpipe (2.44 MB, application/x-7z-compressed)
2016-08-10 15:25 UTC, Jules Blok
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jules Blok 2016-08-06 21:15:27 UTC
Created attachment 125579 [details]
apitrace file to reproduce the problem

On both the Intel drivers and the software rasterizer (llvm) GL_DEPTH_CLAMP doesn't properly clamp to the far plane as specified. It seems that it clamps the depth value written to the depth buffer to 1.0 instead.

According to the specs: https://www.opengl.org/registry/specs/ARB/depth_clamp.txt

    The one major issue is that fragments of a  primitive may extend
    beyond the conventional window space depth range for depth values
    (typically the range [0,1]).  Rather than discarding fragments that
    defy the window space depth range (effectively what near and far
    plane clipping accomplish), the depth values can be clamped to the
    current depth range.

I've attached an apitrace to reproduce the problem.

Expected output: https://fifoci.dolphin-emu.org/media/results/a09836d8de0911b9c6bdceffe64561a54cd988e9.png

Actual output: https://fifoci.dolphin-emu.org/media/results/565273f42d550e19b9b83921f177e4a898bbd5e6.png
Comment 1 Jules Blok 2016-08-06 21:20:09 UTC
You can also take a look at this issue on our automated testing system: https://fifoci.dolphin-emu.org/compare/1931541-1924723/

And here is the relevant PR where we are running into this issue: https://github.com/dolphin-emu/dolphin/pull/4085
Comment 2 Roland Scheidegger 2016-08-08 10:45:19 UTC
Some very quick look at the trace shows all glDepthRangef calls set near to 0.0 and far to 1.0. So where's the bug?
Comment 3 Jules Blok 2016-08-08 10:57:17 UTC
Created attachment 125589 [details]
apitrace file version 2

That's a rounding error, 16777215 / 16777216 is being rounded to 1 in the apitrace GUI.

I've attached another apitrace that's more representative of the current PR. The difference being that we've reversed the depth range to gain more depth precision.
Comment 4 Roland Scheidegger 2016-08-08 11:15:08 UTC
(In reply to Jules Blok from comment #3)
> Created attachment 125589 [details]
> apitrace file version 2
> 
> That's a rounding error, 16777215 / 16777216 is being rounded to 1 in the
> apitrace GUI.
> 
> I've attached another apitrace that's more representative of the current PR.
> The difference being that we've reversed the depth range to gain more depth
> precision.

Oh I could see why it might not work with llvmpipe if you do reverse depth range.
lp_setup_set_viewports() will set the min/max depth (they actually get "reverse-calculated" from viewport), but it will not take the potential reversing into account, so assuming near <= far. These values are then later used for clamping (in generate_fs_loop). Thus with min = 1.0, max = 0.0 it will first do the min against max (so the value will end up negative or 0.0) and then the max against min from the result (therefore, any value would end up as 1.0).
Just a theory though...
Comment 5 Jules Blok 2016-08-08 12:42:04 UTC
(In reply to Roland Scheidegger from comment #4)
> Oh I could see why it might not work with llvmpipe if you do reverse depth
> range.

We already had this problem before we reversed the depth range, refer to the obsolete apitrace to see the same sample without the reversed depth range.
Comment 6 Roland Scheidegger 2016-08-08 15:44:22 UTC
(In reply to Jules Blok from comment #5)
> (In reply to Roland Scheidegger from comment #4)
> > Oh I could see why it might not work with llvmpipe if you do reverse depth
> > range.
> 
> We already had this problem before we reversed the depth range, refer to the
> obsolete apitrace to see the same sample without the reversed depth range.

The replays don't work with llvmpipe since they require glsl 400 and I couldn't make them work with version overrides neither.
Not sure though why it wouldn't work without reversed depth, I know this feature generally works (with d3d10 in particular, which forbids reversing depth hence the bug there going unnoticed).
Comment 7 Jules Blok 2016-08-08 18:21:04 UTC
Created attachment 125603 [details]
FIFO log for Dolphin Emulator

(In reply to Roland Scheidegger from comment #6)
> The replays don't work with llvmpipe since they require glsl 400 and I
> couldn't make them work with version overrides neither.

It probably uses some extensions llvmpipe doesn't support, I'll try and get you an apitrace from llvmpipe. You can also try and compile the Dolphin Emulator, I've attached a replay file (FIFO log) you can drag into the emulator.

(In reply to Roland Scheidegger from comment #6)
> Not sure though why it wouldn't work without reversed depth, I know this
> feature generally works (with d3d10 in particular, which forbids reversing
> depth hence the bug there going unnoticed).

It seems you misunderstood me, it doesn't work with either reversed depth or normal depth on Mesa. So reverse depth has nothing to do with the issue, it was just a small thing I changed for reasons unrelated to this issue. I can reproduce the problem either way.
Comment 8 Roland Scheidegger 2016-08-08 20:46:07 UTC
(In reply to Jules Blok from comment #7)
> It seems you misunderstood me, it doesn't work with either reversed depth or
> normal depth on Mesa. So reverse depth has nothing to do with the issue, it
> was just a small thing I changed for reasons unrelated to this issue. I can
> reproduce the problem either way.

No I understood that just fine. What I'm saying is I'm pretty sure it is totally busted if you do reverse depth range.
But if you don't do that, this feature is generally known to work with llvmpipe. There's tests for this both on the gl (piglit) and d3d10 side (albeit piglit doesn't seem to catch it won't work with reversed depth even though the test does that...).
So I think you're doing something unusual. With my earlier quick look, it seemed like maybe you're relying on getting the clamped values already as fragcoord.z in the shader? That's not going to happen (and I can't see why it should, with d3d10 the language is explicit this happens when the depth values arrive at the output merger stage, gl language is way more confusing but still says it happens at depth test time, and you cannot force early depth tests with llvmpipe).
If you're using (2^24 - 1) / (2^24) maybe there could also be precision issues, albeit I would think it should survive the viewport scaling bits (should be right at lsb of the float mantissa).
Comment 9 Jules Blok 2016-08-08 21:04:08 UTC
(In reply to Roland Scheidegger from comment #8)
> So I think you're doing something unusual. With my earlier quick look, it
> seemed like maybe you're relying on getting the clamped values already as
> fragcoord.z in the shader?

We don't rely on that, we explicitly do our own clamping on gl_FragCoord.z in the pixel shader.

(In reply to Roland Scheidegger from comment #8)
> If you're using (2^24 - 1) / (2^24) maybe there could also be precision
> issues, albeit I would think it should survive the viewport scaling bits
> (should be right at lsb of the float mantissa).

The reason we're using that specific depth range is so we can prevent round trip errors by dividing/multiplying all our depth values by a power-of-two factor. However, we still need to make sure we don't go outside of the 24-bit integer range hence the strange far value.

What's happening in this specific example is that the game renders a mesh to clear the screen at depth 2^24 - 1 and then it renders the background at 2^24. Now, on the system we're trying to emulate the background depth would be clamped to 2^24 - 1 and thus it will pass an less-equals depth test.

The nvidia blob driver handles this correctly: https://fifoci.dolphin-emu.org/result/1931730/

The llvmpipe does not: https://fifoci.dolphin-emu.org/compare/1931541-1924723/

And the Intel driver has never been able to pass this test: https://fifoci.dolphin-emu.org/result/1931392/

On Windows both Nvidia and Intel drivers pass this test.
Comment 10 Roland Scheidegger 2016-08-09 03:10:22 UTC
(In reply to Jules Blok from comment #9)
> (In reply to Roland Scheidegger from comment #8)
> > So I think you're doing something unusual. With my earlier quick look, it
> > seemed like maybe you're relying on getting the clamped values already as
> > fragcoord.z in the shader?
> 
> We don't rely on that, we explicitly do our own clamping on gl_FragCoord.z
> in the pixel shader.
Ok. Albeit I'm actually missing that in the spec, if you do use early depth test, does that still not change the value you see in the fs?

I actually looked again at what llvmpipe really does, and now honestly I've no idea how this passes the piglit test. The problem is that we're actually only doing depth clamp when we write depth in the fs - otherwise that code is skipped (and the piglit test doesn't do that). Looks incorrect to me (you still get another clamp against a max value of a fixed 1.0 elsewhere).
I might be able to look into fixing it, though an apitrace which actually works would be helpful.
Comment 11 Jules Blok 2016-08-09 06:41:20 UTC
(In reply to Roland Scheidegger from comment #10)
> Ok. Albeit I'm actually missing that in the spec, if you do use early depth
> test, does that still not change the value you see in the fs?

No idea, I can't actually see the fs inputs from our automated testing system. I think we should focus on getting a sample to run for you.

(In reply to Roland Scheidegger from comment #10)
> I might be able to look into fixing it, though an apitrace which actually
> works would be helpful.

The most reliable method is to capture one yourself, then you can be sure that Dolphin won't use any feature not available on your GPU. You can just compile Dolphin from the PR and run it with the FIFO log that has been attached.

If you'd rather have me supply an apitrace that runs on your system, please let me know which GPU you are using so I can try and get a capture from a similar system.
Comment 12 Jules Blok 2016-08-09 07:07:04 UTC
Created attachment 125624 [details]
apitrace file version 3

I've attached an apitrace where I've forced Dolphin to use GLSL version 1.50. Hopefully that will be enough to be able to run it on your system. But it's possible that I'm still using extensions not available on your system.
Comment 13 Jules Blok 2016-08-09 19:39:00 UTC
Created attachment 125650 [details]
api trace file version 4

I've added an apitrace that was captured on Linux, perhaps you will have less problems running that trace.
Comment 14 Roland Scheidegger 2016-08-10 03:42:21 UTC
(In reply to Jules Blok from comment #13)
> Created attachment 125650 [details]
> api trace file version 4
> 
> I've added an apitrace that was captured on Linux, perhaps you will have
> less problems running that trace.

This still uses plenty of unsupported 4.0 stuff not supported on llvmpipe unfortunately.
Comment 15 Jules Blok 2016-08-10 15:25:51 UTC
Created attachment 125671 [details]
apitrace file llvmpipe

Finally got an apitrace running on llvmpipe by using a VM. Sorry for the delay.
Comment 16 Roland Scheidegger 2016-08-11 02:16:31 UTC
(In reply to Jules Blok from comment #15)
> Created attachment 125671 [details]
> apitrace file llvmpipe
> 
> Finally got an apitrace running on llvmpipe by using a VM. Sorry for the
> delay.

Thanks, works much better (although in turn it doesn't work on nvidia blob here - looks like it doesn't like the offsets used for glBindBufferRange() there...).

I might be able to look into in some days...

Although I really believe this to be a llvmpipe problem - if other mesa drivers are affected by it too they probably have their own, different bugs with it.
Comment 17 Jules Blok 2016-08-11 08:04:30 UTC
(In reply to Roland Scheidegger from comment #16)
> Although I really believe this to be a llvmpipe problem - if other mesa
> drivers are affected by it too they probably have their own, different bugs
> with it.

I think you're right, it looks like the mesa Intel drivers were never able to run these tests correctly. Even when we didn't rely on GL_DEPTH_CLAMP.
Comment 18 Ilia Mirkin 2016-08-13 01:13:31 UTC
FWIW this also fails on nouveau right now... this is a part of the patch I'm going to have to apply (still working out the full details, but this should have the general idea):

@@ -329,8 +329,21 @@ nvc0_validate_viewport(struct nvc0_context *nvc0)
       PUSH_DATA (push, (w << 16) | x);
       PUSH_DATA (push, (h << 16) | y);
 
-      zmin = vp->translate[2] - fabsf(vp->scale[2]);
-      zmax = vp->translate[2] + fabsf(vp->scale[2]);
+      if (nvc0->rast->pipe.clip_halfz) {
+         zmin = vp->translate[2];
+         zmax = vp->translate[2] + vp->scale[2];
+         if (zmax < zmin) {
+            float t = zmax;
+            zmax = zmin;
+            zmin = t;
+         }
+      } else {
+         zmin = vp->translate[2] - fabsf(vp->scale[2]);
+         zmax = vp->translate[2] + fabsf(vp->scale[2]);
+      }

Note that the near/far planes have to be calculated differently based on whether clip_halfz is enabled.
Comment 19 Jules Blok 2016-08-13 01:15:25 UTC
Ah, so this is an issue when combining GL_DEPTH_CLAMP and GL_CLIP_CONTROL?
Comment 20 Roland Scheidegger 2016-08-13 01:36:28 UTC
(In reply to Ilia Mirkin from comment #18)
> FWIW this also fails on nouveau right now... this is a part of the patch I'm
> going to have to apply (still working out the full details, but this should
> have the general idea):
> 
> @@ -329,8 +329,21 @@ nvc0_validate_viewport(struct nvc0_context *nvc0)
>        PUSH_DATA (push, (w << 16) | x);
>        PUSH_DATA (push, (h << 16) | y);
>  
> -      zmin = vp->translate[2] - fabsf(vp->scale[2]);
> -      zmax = vp->translate[2] + fabsf(vp->scale[2]);
> +      if (nvc0->rast->pipe.clip_halfz) {
> +         zmin = vp->translate[2];
> +         zmax = vp->translate[2] + vp->scale[2];
> +         if (zmax < zmin) {
> +            float t = zmax;
> +            zmax = zmin;
> +            zmin = t;
> +         }
> +      } else {
> +         zmin = vp->translate[2] - fabsf(vp->scale[2]);
> +         zmax = vp->translate[2] + fabsf(vp->scale[2]);
> +      }
> 
> Note that the near/far planes have to be calculated differently based on
> whether clip_halfz is enabled.

Yes, that's pretty much what I had in mind for fixing llvmpipe. Albeit llvmpipe already does different calculation depending on halfz.
It does:
      if (lp->rasterizer->clip_halfz == 0) {
         float half_depth = viewports[i].scale[2];
         min_depth = viewports[i].translate[2] - half_depth;
         max_depth = min_depth + half_depth * 2.0f;
      } else {
         min_depth = viewports[i].translate[2];
         max_depth = min_depth + viewports[i].scale[2];
      }
That said, it doesn't do the abs part. Is that really required?
And don't you need the zmin/zmax swap part too without clip_halfz?
(And as said, llvmpipe has problems with depth clamp as it does only actually clamp the values if they are output by fs.)
Comment 21 Ilia Mirkin 2016-08-13 01:48:26 UTC
(In reply to Roland Scheidegger from comment #20)
> That said, it doesn't do the abs part. Is that really required?

I think the abs was an attempt to avoid the swap

> And don't you need the zmin/zmax swap part too without clip_halfz?

I arrived at a similar conclusion -- seems easier to just avoid cleverness. See my patch at

https://patchwork.freedesktop.org/patch/104977/
Comment 22 Ilia Mirkin 2016-08-14 21:46:19 UTC
Like Roland said, if multiple drivers are affected, it's by independent bugs. However for nv50 and nvc0, this is fixed by:

commit 5c1ccd8053412b6a42098481d2fde3d483208c33
Author: Ilia Mirkin <imirkin@alum.mit.edu>
Date:   Fri Aug 12 21:40:52 2016 -0400

    nv50,nvc0: fix depth range when halfz is enabled

I'll leave this bug open for now though.
Comment 23 Roland Scheidegger 2016-08-20 04:19:47 UTC
For llvmpipe, this is fixed by 0849621891041498c7438080338ccea562440a9a.


bug/show.html.tmpl processed on Mar 25, 2017 at 17:28:03.
(provided by the Example extension).