Bug 101910

Summary: [BYT] ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f
Product: Mesa Reporter: Mark Janes <mark.a.janes>
Component: Drivers/DRI/i965Assignee: Topi Pohjolainen <topi.pohjolainen>
Status: RESOLVED FIXED QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: medium CC: jason, mirlan.ax.tokonbekov
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 101911    
Attachments: byt test failure
Patch simplifying test to include problematic blit only

Description Mark Janes 2017-07-24 22:50:46 UTC
Created attachment 132937 [details]
byt test failure

Many tests began failing on byt at:

f5859b45b1686e8116380d870f48432495fb19c7
Author:     Topi Pohjolainen <topi.pohjolainen@intel.com>
i965/miptree: Switch remaining surfaces to isl

Full list of tests:
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.cubemap_to_cubemap
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.cubemap_to_texture2d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.cubemap_to_texture2d_array
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.cubemap_to_texture3d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.texture2d_array_to_cubemap
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.texture2d_array_to_texture2d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.texture2d_array_to_texture2d_array
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.texture2d_array_to_texture3d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.texture2d_to_cubemap
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.texture2d_to_texture2d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.texture2d_to_texture2d_array
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.texture2d_to_texture3d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.texture3d_to_cubemap
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.texture3d_to_texture2d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.texture3d_to_texture2d_array
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.texture3d_to_texture3d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32i_rgb32f.cubemap_to_cubemap
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32i_rgb32f.cubemap_to_texture2d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32i_rgb32f.cubemap_to_texture2d_array
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32i_rgb32f.cubemap_to_texture3d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32i_rgb32f.texture2d_array_to_cubemap
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32i_rgb32f.texture2d_array_to_texture2d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32i_rgb32f.texture2d_array_to_texture2d_array
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32i_rgb32f.texture2d_array_to_texture3d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32i_rgb32f.texture2d_to_cubemap
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32i_rgb32f.texture2d_to_texture2d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32i_rgb32f.texture2d_to_texture2d_array
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32i_rgb32f.texture2d_to_texture3d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32i_rgb32f.texture3d_to_cubemap
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32i_rgb32f.texture3d_to_texture2d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32i_rgb32f.texture3d_to_texture2d_array
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32i_rgb32f.texture3d_to_texture3d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32ui_rgb32f.cubemap_to_cubemap
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32ui_rgb32f.cubemap_to_texture2d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32ui_rgb32f.cubemap_to_texture2d_array
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32ui_rgb32f.cubemap_to_texture3d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32ui_rgb32f.texture2d_array_to_cubemap
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32ui_rgb32f.texture2d_array_to_texture2d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32ui_rgb32f.texture2d_array_to_texture2d_array
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32ui_rgb32f.texture2d_array_to_texture3d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32ui_rgb32f.texture2d_to_cubemap
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32ui_rgb32f.texture2d_to_texture2d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32ui_rgb32f.texture2d_to_texture2d_array
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32ui_rgb32f.texture2d_to_texture3d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32ui_rgb32f.texture3d_to_cubemap
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32ui_rgb32f.texture3d_to_texture2d
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32ui_rgb32f.texture3d_to_texture2d_array
ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32ui_rgb32f.texture3d_to_texture3d

Sample test output attached.


Additionally, the same commit regressed a single test on SNB:
ES3-CTS.functional.fbo.msaa.4_samples.rgba32f
Standard Output

glGenRenderbuffers(1, 0x00007ffea89bb154);
// renderbuffers = { 421 }
glBindRenderbuffer(GL_RENDERBUFFER, 421);
glRenderbufferStorageMultisample(GL_RENDERBUFFER, 4, GL_RGBA32F, 119, 131);
glGenFramebuffers(1, 0x00007ffea89bb15c);
// framebuffers = { 482 }
glBindFramebuffer(GL_FRAMEBUFFER, 482);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, 421);
glGetError();
// GL_NO_ERROR returned
glCheckFramebufferStatus(GL_FRAMEBUFFER);
// GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT returned
Framebuffer is not complete: 'GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT' at es3fFboTestCase.cpp:141
Comment 1 Topi Pohjolainen 2017-07-25 05:46:24 UTC
The regression with SNB is actually expected in a way. Old i965 miptree creation logic didn't restrict multi-sampling with over 64-bit formats (here we have GL_RGBA32F which is 128-bits). But ISL does and prevents such surfaces from being created - hardware docs say so.

Baytrail cases need more looking into, I'll see what I can find out.
Comment 2 Topi Pohjolainen 2017-07-25 13:11:29 UTC
Well, the difference is valign 2 vs 4. Forcing valign 4 even for linear with RGB32F:

diff --git a/src/intel/isl/isl_gen7.c b/src/intel/isl/isl_gen7.c
index 24d411f..57751ea 100644
--- a/src/intel/isl/isl_gen7.c
+++ b/src/intel/isl/isl_gen7.c
@@ -383,7 +383,13 @@ isl_gen7_choose_image_alignment_el(const struct isl_device *dev,
    assert(!(require_valign4 && gen7_format_needs_valign2(dev, info->format)));
 
    /* We default to VALIGN_2 because it uses the least memory. */
-   const uint32_t valign = require_valign4 ? 4 : 2;
+   uint32_t valign = require_valign4 ? 4 : 2;
+
+   /* In practise it looks that even as linear tiled this has to be programmed
+    * to 4 on BayTrail.
+    */
+   if (info->format == ISL_FORMAT_R32G32B32_FLOAT && ISL_DEV_IS_BAYTRAIL(dev))
+      valign = 4;
 
    *image_align_el = isl_extent3d(halign, valign, 1);
 }


fixes the tests.
Comment 3 Topi Pohjolainen 2017-07-25 13:22:26 UTC
Actually it is not just that, skipping the valign2 altogether for RGB32F fixes them all:

diff --git a/src/intel/isl/isl_gen7.c b/src/intel/isl/isl_gen7.c
index 24d411f..ed18494 100644
--- a/src/intel/isl/isl_gen7.c
+++ b/src/intel/isl/isl_gen7.c
@@ -30,6 +30,9 @@ gen7_format_needs_valign2(const struct isl_device *dev,
 {
    assert(ISL_DEV_GEN(dev) == 7);
 
+   if (format == ISL_FORMAT_R32G32B32_FLOAT && ISL_DEV_IS_BAYTRAIL(dev))
+      return false;
+
    /* From the Ivybridge PRM (2012-05-31), Volume 4, Part 1, Section 2.12.1,
     * RENDER_SURFACE_STATE Surface Vertical Alignment:
     *
Comment 4 Topi Pohjolainen 2017-07-25 13:28:21 UTC
Just in case both patches seem to be doing the same thing, the latter also affects tiling filter allowing Y-tiling for RGB32F.
Comment 5 Topi Pohjolainen 2017-07-25 17:48:37 UTC
Well, I'm puzzled, PRM for Bay Trail says the same as what is said for IVB:

Vol 2, RENDER_SURFACE_STATE, Surface Vertical Alignment:

Restriction: VALIGN_4 is not supported for surface format R32G32B32_FLOAT
Comment 6 Topi Pohjolainen 2017-07-26 06:36:36 UTC
Looking further. I also noticed that this might not show on IVB in jenkins just because it isn't run. At least I need to set:

MESA_EXTENSION_OVERRIDE=GL_EXT_copy_image MESA_GLES_VERSION_OVERRIDE=3.1

With that I can see the same error that Bay Trail gets.
Comment 7 Topi Pohjolainen 2017-07-26 07:24:56 UTC
I made originally a mistake in comparing batch buffer dumps. There is a valign 2 vs.4 difference in one of the surfaces but this is actually insignificant to the problem in hand. Whereas ISL opts to use valign of 2 for X-tiled RGBA_UNORM old miptree logic chose 4. Both seem to work, ISL just uses a little less space.

The difference in case of RGB32F is tiling - ISL falls back to X-tiling while old i965 uses Y-tiling. The last hack of mine changed this allowing ISL to keep on using Y-tiling. The fact that it changed valign from 2 to 4 is actually wrong but for some reason didn't upset the hardware.
Comment 8 Topi Pohjolainen 2017-07-26 07:40:16 UTC
There is similar fallback in old miptree, in brw_tex_layout.c::brw_miptree_choose_tiling(). Only that it never fired for RGB32F:

   /* From the Ivy Bridge PRM, Vol4 Part1 2.12.2.1 (SURFACE_STATE for most
    * messages), on p64, under the heading "Surface Vertical Alignment":
    *
    *     This field must be set to VALIGN_4 for all tiled Y Render Target
    *     surfaces.
    *
    * So if the surface is renderable and uses a vertical alignment of 2,
    * force it to be X tiled.  This is somewhat conservative (it's possible
    * that the client won't ever render to this surface), but it's difficult
    * to know that ahead of time.  And besides, since we use a vertical
    * alignment of 4 as often as we can, this shouldn't happen very often.
    */
   if (brw->gen == 7 && mt->valign == 2 &&
       brw->mesa_format_supports_render[mt->format]) {
      return ISL_TILING_X;
   }

There the last bit: brw->mesa_format_supports_render[mt->format] yields false for MESA_FORMAT_RGB_FLOAT32.

Therefore it looks that this used to work by luck, old logic used Y-tiling with valign == 2 which should be illegal.
Comment 9 Topi Pohjolainen 2017-07-26 15:52:29 UTC
While the failing tests themself don't create RGB32F surfaces to back renderbuffers (in which case they would probably fail as i965 marks RGB32F as non-renderable), the surfaces nonetheless end up as render targets. As the titles of all the failing tests say, tests copy two images. Image copies in turn are implemented in i965 using blorp (_mesa_CopyImageSubData() -> brw_blorp_copy_miptrees()) which ends up using RGB32F as GPU render target.

With respect to PRM, isl does the right thing and chooses X-tiling with valign of 2. This for some reason, however, fails. Old miptree used contrary to the spec Y-tiling with valign of 2 which for some reason works (at least in these tests).
Also as mentioned a few comments earlier, for some reason at least rgb32f_rgb32f.cubemap_to_cubemap works if one uses X-tiling and valign of 4 (which also is against the spec).
Comment 10 Jason Ekstrand 2017-07-26 19:08:55 UTC
(In reply to Topi Pohjolainen from comment #9)
> While the failing tests themself don't create RGB32F surfaces to back
> renderbuffers (in which case they would probably fail as i965 marks RGB32F
> as non-renderable), the surfaces nonetheless end up as render targets. As
> the titles of all the failing tests say, tests copy two images. Image copies
> in turn are implemented in i965 using blorp (_mesa_CopyImageSubData() ->
> brw_blorp_copy_miptrees()) which ends up using RGB32F as GPU render target.

No, it doesn't.  It should be using R32_UINT and manually offsetting to the slice to be copied.  There may be bugs there though.

> With respect to PRM, isl does the right thing and chooses X-tiling with
> valign of 2. This for some reason, however, fails. Old miptree used contrary
> to the spec Y-tiling with valign of 2 which for some reason works (at least
> in these tests).
> Also as mentioned a few comments earlier, for some reason at least
> rgb32f_rgb32f.cubemap_to_cubemap works if one uses X-tiling and valign of 4
> (which also is against the spec).

My gut suspects something in our code to calculate offsets though I'm not sure.
Comment 11 Topi Pohjolainen 2017-07-26 19:54:00 UTC
(In reply to Jason Ekstrand from comment #10)
> (In reply to Topi Pohjolainen from comment #9)
> > While the failing tests themself don't create RGB32F surfaces to back
> > renderbuffers (in which case they would probably fail as i965 marks RGB32F
> > as non-renderable), the surfaces nonetheless end up as render targets. As
> > the titles of all the failing tests say, tests copy two images. Image copies
> > in turn are implemented in i965 using blorp (_mesa_CopyImageSubData() ->
> > brw_blorp_copy_miptrees()) which ends up using RGB32F as GPU render target.
> 
> No, it doesn't.  It should be using R32_UINT and manually offsetting to the
> slice to be copied.  There may be bugs there though.

You are correct, it does change the format. But the tiling vs align remains. Old uses Y-tiling with valign 2 and new uses X-tiling with valign 2. Latter should be correct while the former shouldn't.

> 
> > With respect to PRM, isl does the right thing and chooses X-tiling with
> > valign of 2. This for some reason, however, fails. Old miptree used contrary
> > to the spec Y-tiling with valign of 2 which for some reason works (at least
> > in these tests).
> > Also as mentioned a few comments earlier, for some reason at least
> > rgb32f_rgb32f.cubemap_to_cubemap works if one uses X-tiling and valign of 4
> > (which also is against the spec).
> 
> My gut suspects something in our code to calculate offsets though I'm not
> sure.

That would explain it. I'll take a look tomorrow.
Comment 12 Topi Pohjolainen 2017-07-26 19:57:43 UTC
(In reply to Topi Pohjolainen from comment #11)
> (In reply to Jason Ekstrand from comment #10)
> > (In reply to Topi Pohjolainen from comment #9)
> > > While the failing tests themself don't create RGB32F surfaces to back
> > > renderbuffers (in which case they would probably fail as i965 marks RGB32F
> > > as non-renderable), the surfaces nonetheless end up as render targets. As
> > > the titles of all the failing tests say, tests copy two images. Image copies
> > > in turn are implemented in i965 using blorp (_mesa_CopyImageSubData() ->
> > > brw_blorp_copy_miptrees()) which ends up using RGB32F as GPU render target.
> > 
> > No, it doesn't.  It should be using R32_UINT and manually offsetting to the
> > slice to be copied.  There may be bugs there though.
> 
> You are correct, it does change the format. But the tiling vs align remains.
> Old uses Y-tiling with valign 2 and new uses X-tiling with valign 2. Latter
> should be correct while the former shouldn't.
> 

Can we actually use valign2 with R32_UINT in general?

> > 
> > > With respect to PRM, isl does the right thing and chooses X-tiling with
> > > valign of 2. This for some reason, however, fails. Old miptree used contrary
> > > to the spec Y-tiling with valign of 2 which for some reason works (at least
> > > in these tests).
> > > Also as mentioned a few comments earlier, for some reason at least
> > > rgb32f_rgb32f.cubemap_to_cubemap works if one uses X-tiling and valign of 4
> > > (which also is against the spec).
> > 
> > My gut suspects something in our code to calculate offsets though I'm not
> > sure.
> 
> That would explain it. I'll take a look tomorrow.
Comment 13 Jason Ekstrand 2017-07-26 20:10:38 UTC
(In reply to Topi Pohjolainen from comment #12)
> (In reply to Topi Pohjolainen from comment #11)
> > (In reply to Jason Ekstrand from comment #10)
> > > (In reply to Topi Pohjolainen from comment #9)
> > > > While the failing tests themself don't create RGB32F surfaces to back
> > > > renderbuffers (in which case they would probably fail as i965 marks RGB32F
> > > > as non-renderable), the surfaces nonetheless end up as render targets. As
> > > > the titles of all the failing tests say, tests copy two images. Image copies
> > > > in turn are implemented in i965 using blorp (_mesa_CopyImageSubData() ->
> > > > brw_blorp_copy_miptrees()) which ends up using RGB32F as GPU render target.
> > > 
> > > No, it doesn't.  It should be using R32_UINT and manually offsetting to the
> > > slice to be copied.  There may be bugs there though.
> > 
> > You are correct, it does change the format. But the tiling vs align remains.
> > Old uses Y-tiling with valign 2 and new uses X-tiling with valign 2. Latter
> > should be correct while the former shouldn't.
> > 
> 
> Can we actually use valign2 with R32_UINT in general?

We probably have to whack VALIGN as part of adjusting to single-slice.  It also shouldn't matter at that point since there are no mip-levels or array slices to align.  Though I could see the hardware just being picky.
Comment 14 Topi Pohjolainen 2017-07-27 06:13:45 UTC
Using:

if (brw->gen == 7 && mt->valign == 2)
      return ISL_TILING_X;

for old miptree and hence forcing it to use X-tiling fails the same way as the new. I'm now about to take closer look of the offsets.
Comment 15 Topi Pohjolainen 2017-07-27 12:25:08 UTC
Created attachment 133072 [details] [review]
Patch simplifying test to include problematic blit only

It looks that a blorp_copy() targeting level 6, layer 0 interferes with level 0. If that copy is omitted level 0 contents are as expected.

With valign 2, one gets image offset 32,96 corresponding to tile offset = 98304 and intra tile x = 32, y = 0.

With valign 4, one gets the same tile offset = 98304 but intra tile x = 0, y = 4. That doesn't interfere with level 0.
Comment 16 Topi Pohjolainen 2017-07-27 12:26:50 UTC
(In reply to Topi Pohjolainen from comment #15)
> Created attachment 133072 [details] [review] [review]
> Patch simplifying test to include problematic blit only
> 
> It looks that a blorp_copy() targeting level 6, layer 0 interferes with
> level 0. If that copy is omitted level 0 contents are as expected.
> 
> With valign 2, one gets image offset 32,96 corresponding to tile offset =
> 98304 and intra tile x = 32, y = 0.
> 
> With valign 4, one gets the same tile offset = 98304 but intra tile x = 0, y
> = 4. That doesn't interfere with level 0.

This is for examining:

ES31-CTS.functional.copy_image.non_compressed.viewclass_96_bits.rgb32f_rgb32f.cubemap_to_cubemap
Comment 17 Topi Pohjolainen 2017-08-21 06:54:21 UTC
After realizing that writing level 6 actually interferes with level 1 (instead of level 0 that I stated earlier), the bug started make more sense as levels 1 and 6 reside in the same tile. I reminded myself how Y-tiling is laid out and how the individual levels reside respect to one and other. Then the difference between valign == 2 and valign == 4 started to make more sense (initially the intra tile offset difference (32, 0) vs. (0, 4) looked odd). After some more digging I realized that the alignment difference itself wasn't significant - valign == 4 case was just lucky to place levels on tile boundary with respect to x-coordinate. And furthermore that our image copy logic failed to adjust intra tile x-coordinate when it treated RGB surfaces as three times wide R-only.
Comment 18 Topi Pohjolainen 2017-08-22 06:43:31 UTC
This is fixed in master by:

commit 393ec1a5071263d300e91f43058ed3b594d41418
Author: Topi Pohjolainen <topi.pohjolainen@intel.com>
Date:   Sat Aug 19 09:22:22 2017 +0300

    intel/blorp: Adjust intra-tile x when faking rgb with red-only
    
    v2 (Jason): Adjust directly in surf_fake_rgb_with_red()
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101910
    
    CC: mesa-stable@lists.freedesktop.org
    Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
    Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.