Bug 105224

Summary: Webgl Pointclouds flickers
Product: Mesa Reporter: Augustin Trancart <augustin.trancart>
Component: Drivers/DRI/i965Assignee: Iago Toral <itoral>
Status: RESOLVED FIXED QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: blocker    
Priority: highest CC: kenneth, mark.a.janes
Version: 17.3Keywords: bisected, regression
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 104757    
Attachments: My glxinfo
screencast of flicker on 17.2.8
itowns screencast 17.2.8 (no flicker)
itowns screencast 17.3.3 (flickers)
potree screencast 17.2.8 (no flickers)
potree screencast 17.3.3 (flickers)
Fix

Description Augustin Trancart 2018-02-23 09:58:33 UTC
Created attachment 137555 [details]
My glxinfo

Description
===========
Since version 17.3.3, pointclouds with webgl flickers.


Step to reproduce
=================
- visit http://potree.entwine.io/data/sncf.html or http://www.itowns-project.org/itowns/examples/pointcloud.html
- move the camera and observe the poincloud


Expected
========
They display correctly


Actual
======
They flicker at each rendering (the example at itowns-project.org stops flickering after a while, because this application stops the rendering loop when there is nothing to do).

When zooming very near the points, only those at the same position flickers (maybe a depth issue?).



I'm pretty sure it happens after the update to 17.2.8, but I'm going to downgrade now and double-check.

I hope I selected the right component. Thanks!
Comment 1 Augustin Trancart 2018-02-23 10:20:56 UTC
I confirm downgrading to 17.2.8 solves the issue.

I should also mention that I'm on ubuntu using x-swat ppa, so the real version numbers are 17.2.8-0ubuntu0~16.04.1 and 17.3.3-0ubuntu1~16.04.1
Comment 2 Augustin Trancart 2018-02-23 11:11:27 UTC
Thanks Michel for the component correction. 

Just a precision, I didn't downgrade the packages that seems to contain my video drivers. Are you sure it is a driver issue then?
Comment 3 Mark Janes 2018-02-23 18:27:17 UTC
Created attachment 137561 [details]
screencast of flicker on 17.2.8
Comment 4 Mark Janes 2018-02-23 18:28:35 UTC
the points flicker for me on Mesa 17.2.8 (KBL / Linux 4.15).  Can you describe your hardware/kernel for the working configuration?  Please also attach a screencast of what it is supposed to look like.
Comment 5 Augustin Trancart 2018-02-23 21:41:38 UTC
~ ᐅ uname -a
Linux 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

My hardware config is in the glxinfo (I guess). In short: it's a Lenovo X1 Carbon with a Intel HD Graphics 520 (Skylake GT2).

I'm attaching screencasts of both version now.
Comment 6 Augustin Trancart 2018-02-23 21:42:50 UTC
Created attachment 137565 [details]
itowns screencast 17.2.8 (no flicker)
Comment 7 Augustin Trancart 2018-02-23 21:46:00 UTC
Created attachment 137566 [details]
itowns screencast 17.3.3 (flickers)
Comment 8 Augustin Trancart 2018-02-23 21:47:24 UTC
Created attachment 137567 [details]
potree screencast 17.2.8 (no flickers)
Comment 9 Augustin Trancart 2018-02-23 21:49:18 UTC
Created attachment 137568 [details]
potree screencast 17.3.3 (flickers)
Comment 10 Augustin Trancart 2018-02-23 21:57:17 UTC
On Itowns 17.3.3 attachment (https://bugs.freedesktop.org/attachment.cgi?id=137566) you can see that part of the floor behind the walls are actually hiding the building once they are displayed (data is downloaded by patch).
Comment 11 Mark Janes 2018-02-23 22:00:22 UTC
what version of firefox are you running?  I'm puzzled as to why I can't get a correct render on my system.
Comment 12 Augustin Trancart 2018-02-24 13:34:23 UTC
I reproduce on:
- firefox 59 (current dev edition, so beta)
- firefox 58 (current stable)
- chromium 64

I do *not* reproduce on:
- midori 0.5.11
- epiphany 3.18.11

Looking at webglreport, *maybe* the bug is only on browsers that have the  	EXT_frag_depth extension. When I comment out the use of gl_FragDepthEXT in itowns, it makes the problem go away.
Comment 13 Mark Janes 2018-02-25 17:33:40 UTC
I reproduced correct behavior on ubuntu 16.10:
  linux 4.8
  mesa 12.0.6
Comment 14 Mark Janes 2018-02-25 20:58:31 UTC
Bisected to:

commit 16631ca30ea6d6eec8101f07d97a55b2210026e8
Author: Iago Toral Quiroga <itoral@igalia.com>
Date:   Fri Oct 13 09:22:54 2017 +0200
    i965/sbe: fix active components for SSO programs with over 16 inputs
    
    When we have up to 16 FS inputs, the SF unit will reorder our inputs
    to be consecutive, however, when we have more than 16 we need to
    to read our inputs from the URB exactly as they have been
    output from the previous stage. This means that for SSO we have to
    consider if we have URB padding due to unused input locations.
    
    Specifically, this affects gen9 active components programming, since
    for things to work in scenarios with over 16 inputs that have padded
    regions we need to ensure that we program active components for the
    padded regions too. If we don't do this the hardware won't read
    the URB properly for inputs located after padded regions.
    
    Found empirically.
    
    Fixes (these also require a patch in CTS):
    KHR-GL45.enhanced_layouts.varying_locations
    KHR-GL45.enhanced_layouts.varying_array_locations
    
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Comment 15 Iago Toral 2018-02-26 09:48:16 UTC
Created attachment 137603 [details] [review]
Fix

This patch seems to fix the problem: we were not accounting for the fact that some special FS inputs are not stored in the URB, such as VARYING_SLOT_PNTC and we still need to program gen9+ active components for them too.

Mark: I am currently testing this in Jenkins, will send for review unless I see issues with it.
Comment 16 Iago Toral 2018-02-26 10:24:28 UTC
I was surprised that this was never caught by Jenkins before, because in theory, it should affect all situations that involve point sprite functionality and we have tests for that in Piglit.

I did some investigation and it turns out that the tests are passing because the point sprite varying is the only input to the FS in these tests and the hardware has a requirement by which our URB must have at least one entry, so we were padding the URB size to have exactly one offset. This means that when we took the number of inputs from the URB size alone we would count 2, the minimum we can have (each URB offset packs 2 slots), and since the test only required one, it would work.

I'll try to add another test to piglit that adds some user varyings so we can replicate the problem and add regression testing for it.
Comment 17 Iago Toral 2018-02-26 10:40:34 UTC
(In reply to Iago Toral from comment #15)
> Mark: I am currently testing this in Jenkins, will send for review unless I
> see issues with it.

Jenkin's results are good. Sent for review here:
https://lists.freedesktop.org/archives/mesa-dev/2018-February/186822.html
Comment 18 Iago Toral 2018-02-26 11:53:38 UTC
(In reply to Iago Toral from comment #17)
> (In reply to Iago Toral from comment #15)
> > Mark: I am currently testing this in Jenkins, will send for review unless I
> > see issues with it.
> 
> Jenkin's results are good. Sent for review here:
> https://lists.freedesktop.org/archives/mesa-dev/2018-February/186822.html

And this is a patch for piglit that extends the existing gl_PointCoord test to trigger the problem:

https://lists.freedesktop.org/archives/piglit/2018-February/023894.html
Comment 19 Iago Toral 2018-03-01 10:03:04 UTC
Fixed with:

commit bc73016703f8f2815e000f1c100532cf6e13cd3c
Author: Iago Toral Quiroga <itoral@igalia.com>
Date:   Thu Mar 1 07:59:42 2018 +0100

    i965/sbe: fix number of inputs for active components
    
    In 16631ca30ea6 we fixed gen9 active components to account for padded
    inputs in the URB, which we can have with SSO programs. To do that,
    instead of going through the bitfield of inputs (which doesn't include
    padding information), we compute the number of inputs from the size
    of the URB entry.
    
    Unfortunately, there are some special inputs that are not stored in
    the URB and that we also need to account for. These special inputs
    are identified and handled during calculate_attr_overrides().
    
    Instead of keeping track of the exact number of inputs, we just
    program active components for all possible inputs like we do in
    anvil.
    
    This fixes a regression in a WebGL program that uses Point Sprite
    functionality (specifically, VARYING_SLOT_PNTC).
    
    v2:
     - Add 'Fixes' tag (Mark Janes)
     - make no_vue_inputs int instead of uint32_t, and add const qualifier
       to num_inputs variable (Ian)
    
    v3:
     - Do not try to count inputs correctly, just program all input
       slots like we do in anvil (Ken)
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105224
    Fixes: 16631ca30ea6 (i965/sbe: fix active components for SSO programs with over 16 inputs)
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.