Bug 109357 - [SKL] [REGRESSION] [BISECTED] [OpenGL CTS] Many flaky tests after adding workarounds for object preemption in gen9
Status: RESOLVED WORKSFORME
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: git
Hardware: Other, OS: All
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
Reported: 2019-01-14 17:04 UTC by Andrés Gómez García
Modified: 2019-04-15 15:08 UTC
CC List: 2 users



Attachments
gl45 cts-runner results with Linux 4.19.0 (7.21 MB, application/x-xz)
2019-01-15 15:26 UTC, Andrés Gómez García
gl46 cts-runner results with Linux 4.18.0 (7.14 MB, application/x-xz)
2019-01-15 15:27 UTC, Andrés Gómez García

Description Andrés Gómez García 2019-01-14 17:04:36 UTC
After the following commit landed:

--

commit 5c454661c66fa2624cf4bba1071175070724869a
Author: Rafael Antognolli <rafael.antognolli@intel.com>
Date:   Mon Oct 29 10:19:54 2018 -0700

    i965/gen9: Add workarounds for object preemption.
    
    Gen9 hardware requires some workarounds to disable preemption depending
    on the type of primitive being emitted.
    
    We implement this by adding a function that checks the primitive type
    and number of instances right before the 3DPRIMITIVE.
    
    For now, we just ignore blorp.  The only primitive it emits is
    3DPRIM_RECTLIST, and since it's not listed in the workarounds, we can
    safely leave preemption enabled when it happens. Or it will be disabled
    by a previous 3DPRIMITIVE, which should be fine too.
    
    v3:
     - Apply missing workarounds for instanced rendering and line loop (Ken)
     - Move workaround code to brw_draw_single_prim()
    
    Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
    Cc: Kenneth Graunke <kenneth@whitecape.org>
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

--
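To make the mechanism described in the commit message more concrete, here is a minimal C sketch of a per-draw check that runs right before each 3DPRIMITIVE. It is not the Mesa code. `struct ctx`, `enum prim_type`, `set_obj_preemption()` and `emit_preempt_wa()` are hypothetical names, and only the two cases named in the v3 notes (line loops and instanced rendering) are modeled. The real workaround list may contain more entries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical primitive topologies; the names echo the 3DPRIM_* types
 * mentioned in the commit message, the values are arbitrary. */
enum prim_type {
   PRIM_TRILIST,
   PRIM_LINELOOP,
   PRIM_RECTLIST,
};

/* Hypothetical driver context: only tracks the current preemption state. */
struct ctx {
   bool obj_preemption;
};

/* Stand-in for emitting the state that toggles object-level preemption.
 * Skips the update when the requested state is already current. */
void set_obj_preemption(struct ctx *ctx, bool enable)
{
   assert(ctx);
   if (ctx->obj_preemption == enable)
      return;
   ctx->obj_preemption = enable;
}

/* Run right before each 3DPRIMITIVE: keep preemption enabled unless the
 * draw matches one of the modeled workaround cases (a line loop, or more
 * than one instance). A RECTLIST draw passed through here is not in the
 * list, so it leaves preemption enabled. */
void emit_preempt_wa(struct ctx *ctx, enum prim_type prim,
                     unsigned num_instances)
{
   bool allow = true;

   if (prim == PRIM_LINELOOP)
      allow = false;
   if (num_instances > 1)
      allow = false;

   set_obj_preemption(ctx, allow);
}
```

For example, a line-loop draw followed by a single-instance triangle draw turns preemption off and then back on. A blorp RECTLIST draw that skips this check entirely would inherit whatever state the previous 3DPRIMITIVE left, which matches the behavior the commit message describes for blorp.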

At least the following tests are failing when running cts-runner for gl45 on SKL with the x11_egl target:

--

KHR-GL45.enhanced_layouts.xfb_global_buffer
KHR-GL45.geometry_shader.rendering.rendering.lines_input_points_output_line_strip_drawcall
KHR-GL45.gpu_shader5.uniform_blocks_array_indexing
KHR-GL45.gpu_shader_fp64.builtin.*
KHR-GL45.sepshaderobjs.InterfacePrecisionMatchingInt
KHR-GL45.shader_storage_buffer_object.basic-operations-case2-vs
KHR-GL45.shader_subroutine.control_flow_and_returned_subroutine_values_used_as_subroutine_input
KHR-GL45.shader_subroutine.eight_subroutines_four_uniforms
KHR-GL45.shader_subroutine.four_subroutines_with_two_uniforms
KHR-GL45.shader_subroutine.subroutines_with_separate_shader_objects
KHR-GL45.shader_subroutine.two_subroutines_single_subroutine_uniform
KHR-GL45.shading_language_420pack.binding_uniform_single_block
KHR-GL45.shading_language_420pack.implicit_conversions
KHR-GL45.texture_view.view_sampling

--

The hardware is an Intel NUC:

--

$ glxinfo | grep Skylake
    Device: Mesa DRI Intel(R) Iris Graphics 540 (Skylake GT3e)  (0x1926)
OpenGL renderer string: Mesa DRI Intel(R) Iris Graphics 540 (Skylake GT3e) 

--

Note that I *cannot* reproduce this regression on KBL.

Note also that I cannot reproduce this regression by running the same tests individually with the corresponding profiles. It only shows up when running the full cts-runner.

The set of failing tests varies from run to run, but many failures in KHR-GL45.gpu_shader_fp64.builtin.* show up consistently.

Here is a quick command to replay the CTS run with the same codebase using docker:

--

$ RESULTS_DIRECTORY=<my_results_directory_path>
$ docker run --privileged --rm -t \
    -v "$RESULTS_DIRECTORY":/results:Z \
    -e DISPLAY=unix:0.0 -v /tmp/.X11-unix:/tmp/.X11-unix \
    registry.gitlab.com/igalia/graphics/vk-gl-cts:wip-agomez-gen9-workarounds-for-object-preemption_wip-agomez-gen9-workarounds-for-object-preemption \
    /bin/bash -c "cd ~/vk-gl-cts/build/external/openglcts/modules; \
      TIMESTAMP=\$(date +%Y%m%d%H%M%S); \
      RCR_CTS_RUNNER_TYPE=gl45; \
      mkdir -p /results/cts-runner/\$RCR_CTS_RUNNER_TYPE-\$TIMESTAMP; \
      ./cts-runner --type=\$RCR_CTS_RUNNER_TYPE --logdir=/results/cts-runner/\$RCR_CTS_RUNNER_TYPE-\$TIMESTAMP"

--

The local versions are:

--

$ uname -a
Linux panix 4.19.0-1-amd64 #1 SMP Debian 4.19.12-1 (2018-12-22) x86_64 GNU/Linux
$ cat /var/log/Xorg.0.log
[  2171.328]
X.Org X Server 1.16.4
Release Date: 2014-12-20
[  2171.328] X Protocol Version 11, Revision 0
[  2171.328] Build Operating System: Linux 3.16.0-4-amd64 x86_64 Debian
[  2171.328] Current Operating System: Linux nucbot1 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64                                                  

...

[  2171.328] Build Date: 11 February 2015  12:32:02AM
[  2171.328] xorg-server 2:1.16.4-1 (http://www.debian.org/support)

...

[  2171.332] (II) LoadModule: "intel"
[  2171.332] (II) Loading /usr/lib/xorg/modules/drivers/intel_drv.so
[  2171.332] (II) Module intel: vendor="X.Org Foundation"
[  2171.332]    compiled for 1.15.99.904, module version = 2.21.15
[  2171.332]    Module class: X.Org Video Driver
[  2171.332]    ABI class: X.Org Video Driver, version 18.0

...

--
Comment 1 Andrés Gómez García 2019-01-14 17:17:30 UTC
(In reply to Andrés Gómez García from comment #0)
...
> $ cat /var/log/Xorg.0.log
> [  2171.328]
> X.Org X Server 1.16.4
> Release Date: 2014-12-20
> [  2171.328] X Protocol Version 11, Revision 0
> [  2171.328] Build Operating System: Linux 3.16.0-4-amd64 x86_64 Debian
> [  2171.328] Current Operating System: Linux nucbot1 3.16.0-4-amd64 #1 SMP
> Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64                                 
> 
> 
> ...
> 
> [  2171.328] Build Date: 11 February 2015  12:32:02AM
> [  2171.328] xorg-server 2:1.16.4-1 (http://www.debian.org/support)
> 
> ...
> 
> [  2171.332] (II) LoadModule: "intel"
> [  2171.332] (II) Loading /usr/lib/xorg/modules/drivers/intel_drv.so
> [  2171.332] (II) Module intel: vendor="X.Org Foundation"
> [  2171.332]    compiled for 1.15.99.904, module version = 2.21.15
> [  2171.332]    Module class: X.Org Video Driver
> [  2171.332]    ABI class: X.Org Video Driver, version 18.0
> 
> ...

This is obviously wrong :(

Correct info:

$ cat $HOME/.local/share/xorg/Xorg.0.log
[    17.287] (--) Log file renamed from "/home/igalia/igalia/.local/share/xorg/Xorg.pid-1165.log" to "/home/igalia/igalia/.local/share/xorg/Xorg.0.log"
[    17.290] 
X.Org X Server 1.20.3
X Protocol Version 11, Revision 0
[    17.290] Build Operating System: Linux 4.9.0-8-amd64 x86_64 Debian
[    17.290] Current Operating System: Linux panix 4.19.0-1-amd64 #1 SMP Debian 4.19.12-1 (2018-12-22) x86_64

...

[    17.290] Build Date: 25 October 2018  06:15:23PM
[    17.290] xorg-server 2:1.20.3-1 (https://www.debian.org/support)

...

[    17.312] (II) LoadModule: "glx"
[    17.315] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[    17.326] (II) Module glx: vendor="X.Org Foundation"
[    17.326]    compiled for 1.20.3, module version = 1.0.0
[    17.326]    ABI class: X.Org Server Extension, version 10.0
[    17.326] (II) LoadModule: "intel"
[    17.326] (II) Loading /usr/lib/xorg/modules/drivers/intel_drv.so
[    17.330] (II) Module intel: vendor="X.Org Foundation"
[    17.330]    compiled for 1.20.1, module version = 2.99.917
[    17.330]    Module class: X.Org Video Driver
[    17.330]    ABI class: X.Org Video Driver, version 24.0
[    17.331] (II) intel: Driver for Intel(R) Integrated Graphics Chipsets:
        i810, i810-dc100, i810e, i815, i830M, 845G, 854, 852GM/855GM, 865G,
        915G, E7221 (i915), 915GM, 945G, 945GM, 945GME, Pineview GM,
        Pineview G, 965G, G35, 965Q, 946GZ, 965GM, 965GME/GLE, G33, Q35, Q33,
        GM45, 4 Series, G45/G43, Q45/Q43, G41, B43
[    17.331] (II) intel: Driver for Intel(R) HD Graphics
[    17.331] (II) intel: Driver for Intel(R) Iris(TM) Graphics
[    17.331] (II) intel: Driver for Intel(R) Iris(TM) Pro Graphics
[    17.331] xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
[    17.332] (II) intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20180719
[    17.332] (II) intel(0): SNA compiled: xserver-xorg-video-intel 2:2.99.917+git20180925-2 (Andreas Boll <aboll@debian.org>)
[    17.332] (II) intel(0): SNA compiled for use with valgrind
[    17.333] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
[    17.334] (--) intel(0): Integrated Graphics Chipset: Intel(R) Iris Graphics 540
[    17.334] (--) intel(0): CPU: x86-64, sse2, sse3, ssse3, sse4.1, sse4.2, avx, avx2; using a maximum of 2 threads
[    17.334] (==) intel(0): Depth 24, (--) framebuffer bpp 32
[    17.334] (==) intel(0): RGB weight 888
[    17.334] (==) intel(0): Default visual is TrueColor
[    17.334] (**) intel(0): Option "DRI" "3"
[    17.335] (II) intel(0): Output HDMI1 using monitor section Monitor0
[    17.335] (II) intel(0): Enabled output HDMI1
[    17.335] (II) intel(0): Output DP1 has no monitor section
[    17.335] (II) intel(0): Enabled output DP1
[    17.335] (II) intel(0): Output HDMI2 has no monitor section
[    17.335] (II) intel(0): Enabled output HDMI2
[    17.335] (--) intel(0): Using a maximum size of 256x256 for hardware cursors
[    17.335] (II) intel(0): Output VIRTUAL1 has no monitor section
[    17.335] (II) intel(0): Enabled output VIRTUAL1
[    17.336] (--) intel(0): Output HDMI1 using initial mode 1280x720 on pipe 0
[    17.336] (==) intel(0): TearFree disabled
[    17.336] (==) intel(0): Using gamma correction (1.0, 1.0, 1.0)
[    17.336] (==) intel(0): DPI set to (96, 96)
[    17.336] (II) Loading sub module "dri3"
[    17.336] (II) LoadModule: "dri3"
[    17.336] (II) Module "dri3" already built-in
[    17.336] (II) Loading sub module "dri2"
[    17.336] (II) LoadModule: "dri2"
[    17.336] (II) Module "dri2" already built-in
[    17.336] (II) Loading sub module "present"
[    17.336] (II) LoadModule: "present"
[    17.336] (II) Module "present" already built-in
[    17.339] (II) intel(0): SNA initialized with Skylake (gen9) backend
[    17.339] (==) intel(0): Backing store enabled
[    17.339] (==) intel(0): Silken mouse enabled
[    17.339] (II) intel(0): HW Cursor enabled
[    17.340] (==) intel(0): DPMS enabled
[    17.340] (==) intel(0): Display hotplug detection enabled
[    17.341] (II) intel(0): [DRI2] Setup complete
[    17.341] (II) intel(0): [DRI2]   DRI driver: i965
[    17.341] (II) intel(0): [DRI2]   VDPAU driver: va_gl
[    17.341] (II) intel(0): direct rendering: DRI2 DRI3 enabled
[    17.341] (II) intel(0): hardware support for Present enabled
[    17.341] (II) Initializing extension Generic Event Extension
[    17.341] (II) Initializing extension SHAPE
[    17.341] (II) Initializing extension MIT-SHM
[    17.341] (II) Initializing extension XInputExtension
[    17.342] (II) Initializing extension XTEST
[    17.342] (II) Initializing extension BIG-REQUESTS
[    17.343] (II) Initializing extension SYNC
[    17.343] (II) Initializing extension XKEYBOARD
[    17.343] (II) Initializing extension XC-MISC
[    17.343] (II) Initializing extension SECURITY
[    17.343] (II) Initializing extension XFIXES
[    17.344] (II) Initializing extension RENDER
[    17.344] (II) Initializing extension RANDR
[    17.344] (II) Initializing extension COMPOSITE
[    17.344] (II) Initializing extension DAMAGE
[    17.345] (II) Initializing extension MIT-SCREEN-SAVER
[    17.345] (II) Initializing extension DOUBLE-BUFFER
[    17.345] (II) Initializing extension RECORD
[    17.345] (II) Initializing extension DPMS
[    17.345] (II) Initializing extension Present
[    17.345] (II) Initializing extension DRI3
[    17.346] (II) Initializing extension X-Resource
[    17.346] (II) Initializing extension XVideo
[    17.346] (II) Initializing extension XVideo-MotionCompensation
[    17.346] (II) Initializing extension SELinux
[    17.346] (II) SELinux: Disabled on system
[    17.346] (II) Initializing extension GLX
[    17.384] (II) AIGLX: Loaded and initialized i965
[    17.384] (II) GLX: Initialized DRI2 GL provider for screen 0
[    17.384] (II) Initializing extension XFree86-VidModeExtension
[    17.385] (II) Initializing extension XFree86-DGA
[    17.385] (II) Initializing extension XFree86-DRI
[    17.385] (II) Initializing extension DRI2
[    17.389] (II) intel(0): switch to mode 1280x720@60.0 on HDMI1 using pipe 0, position (0, 0), rotation normal, reflection none
[    17.396] (II) intel(0): Setting screen physical size to 338 x 190

...
Comment 2 Rafael Antognolli 2019-01-14 17:33:21 UTC
Were you able to reproduce it on your own SKL, or does it only happen on the CI ones?
Comment 3 Andrés Gómez García 2019-01-14 20:11:31 UTC
On our own SKL.
Comment 4 Mark Janes 2019-01-14 20:17:46 UTC
I haven't seen any flaky tests in this category in Mesa i965 CI.  We run the GL46 variants of the tests, on Linux 4.18.

Usually SKL and KBL have identical regression patterns, so it is surprising that you can't reproduce the regression on KBL.  Are there differences between those systems (eg kernel or other sw configuration)?
Comment 5 Andrés Gómez García 2019-01-15 10:44:18 UTC
(In reply to Mark Janes from comment #4)
> I haven't seen any flaky tests in this category in Mesa i965 CI.  We run the
> GL46 variants of the tests, on Linux 4.18.

I can confirm that I can reproduce with the GL46 variants of the tests.

> Usually SKL and KBL have identical regression patterns, so it is surprising
> that you can't reproduce the regression on KBL.  Are there differences
> between those systems (eg kernel or other sw configuration)?

Yeah, weird.

Since we use docker, the software stack is the same, except for what the host system provides, namely the kernel and the X server versions.

Additionally, all our host systems are running Debian Buster, but I've just realized that our KBL box is also using Linux 4.18.

I'm just now running a pass in the SKL box with the same 4.18 kernel. Let's see what comes out.
Comment 6 Andrés Gómez García 2019-01-15 15:26:41 UTC
Created attachment 143129 [details]
gl45 cts-runner results with Linux 4.19.0
Comment 7 Andrés Gómez García 2019-01-15 15:27:52 UTC
Created attachment 143130 [details]
gl46 cts-runner results with Linux 4.18.0
Comment 8 Andrés Gómez García 2019-01-15 15:28:17 UTC
(In reply to Andrés Gómez García from comment #5)

...

> I'm just now running a pass in the SKL box with the same 4.18 kernel. Let's
> see what comes out.

Similar results. See the attached tarballs.
Comment 9 Andrés Gómez García 2019-01-21 15:45:38 UTC
Checked with 2 other similar NUCs with SKL and I cannot reproduce the problem.

They are running with the same Debian distro and kernel version.

Closing as RESOLVED WORKSFORME while I keep trying to figure out what the difference could be. I'll reopen or report back if I find anything else.
Comment 10 Andrés Gómez García 2019-01-21 15:57:26 UTC
(In reply to Andrés Gómez García from comment #9)
> Checked with 2 other similar NUCs with SKL and I cannot reproduce the
> problem.

FTR, I checked:

 * BIOS: same on all of them.
 * CPU: same on all of them.
 * Motherboard: same on all of them.
 * RAM: 2 with 16 GB, 1 with 8 GB (all from the same maker). The failing NUC has 16 GB; the other one with 16 GB, however, works fine.
Comment 11 Andrés Gómez García 2019-01-29 10:20:22 UTC
I ran cts-runner on 2 identical NUCs with their disks swapped. Both pass, and I cannot reproduce the problem. This really looks like a ghost.

No clue what happened.
Comment 12 Mark Janes 2019-01-29 16:53:56 UTC
We just found out yesterday that persistent Mesa i965 CI issues were caused by a bug in kernel 4.18 and older:

    https://patchwork.freedesktop.org/patch/252573/

I wouldn't run any graphics workloads on Intel systems with a kernel older than 4.19.
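Following that advice, a test harness could warn about, or refuse to start on, older kernels. The C sketch below is hypothetical; the helper names do not come from any existing tool. It only compares the MAJOR.MINOR prefix of the release string reported by uname(2), so it cannot tell whether a distribution kernel has the fix from the linked patch backported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: extract MAJOR and MINOR from a kernel release
 * string such as "4.19.0-1-amd64". Returns false if the string does not
 * start with "<int>.<int>". */
bool parse_kernel_release(const char *release, int *major, int *minor)
{
   assert(release != NULL);
   return sscanf(release, "%d.%d", major, minor) == 2;
}

/* True when the release is 4.19 or newer; unparsable strings are treated
 * as too old so the caller errs on the side of warning. */
bool kernel_at_least_4_19(const char *release)
{
   int major, minor;

   if (!parse_kernel_release(release, &major, &minor))
      return false;
   return major > 4 || (major == 4 && minor >= 19);
}

/* Apply the check to the running kernel, as reported by uname(2). */
bool running_kernel_at_least_4_19(void)
{
   struct utsname u;

   if (uname(&u) != 0)
      return false;
   return kernel_at_least_4_19(u.release);
}
```

For example, `kernel_at_least_4_19("4.18.0-3-amd64")` returns false, while the 4.19.0-1-amd64 kernel from the original report passes. Note that the SKL failures in this report were also seen on 4.19, so passing this check does not rule the problem out.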

