Bug 88521 - GLBenchmark 2.7 TRex renders with artifacts on Gen8 with !UXA
Summary: GLBenchmark 2.7 TRex renders with artifacts on Gen8 with !UXA
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: git
Hardware: x86-64 (AMD64) Linux (All)
: medium normal
Assignee: Kenneth Graunke
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-01-17 01:07 UTC by Mark Janes
Modified: 2015-04-23 21:20 UTC (History)
4 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Use PTE cache settings for winsys buffers. (3.44 KB, patch)
2015-01-17 09:31 UTC, Chris Wilson
Details | Splinter Review
Fix the blitter code; fixes the bug. (2.65 KB, patch)
2015-04-15 10:19 UTC, Kenneth Graunke
Details | Splinter Review

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Janes 2015-01-17 01:07:31 UTC
This bug affects Gen8 platforms Braswell and Broadwell.  I've been testing on a Braswell C0 part.

I'm running with the oibaf ppa on ubuntu 14.10.  The SNA version from xorg log is:
xserver-xorg-video-intel 2:2.99.917+git1501151931.bb279d~gd~u (Oibaf <fmrummey@gmail.com>)

If you run the GLBenchmark Onscreen TRex benchmark, you can see occasional artifacts where leaves draw lines forward into the screen.  It seems like an off-by-one error, where one of the vertices is squashed when it is rendered.

Changing to UXA eliminates the behavior.
Comment 1 Chris Wilson 2015-01-17 09:31:24 UTC
Created attachment 112377 [details] [review]
Use PTE cache settings for winsys buffers.

Sounds like a mesa bug, try this.
Comment 2 Chris Wilson 2015-01-17 09:31:58 UTC
(Either that or it is a bug in the rendercopy routine that somehow is only seen within a mesa surface.)
Comment 3 Mark Janes 2015-01-19 19:59:47 UTC
I applied the patch, which had no effect on the behavior.

It should be easy to reproduce, please let me know if you have trouble making it happen.
Comment 4 Chris Wilson 2015-01-20 10:16:39 UTC
I have glbenchmark2.7.0 available, what is your full commandline to run the test?
Comment 5 Mark Janes 2015-01-20 17:19:34 UTC
./GLBenchmark -w 1920 -h 1080 -ow 1920 -oh 1080 -t GLB27_TRex_C24Z16_FixedTimeStep
Comment 6 Chris Wilson 2015-01-21 09:59:29 UTC
Ok, I spotted the corruption - it seems easiest to spot watching just below halfway down the screen as a green/brown rope extending across the scene.

That is not the corruption I was thinking of and is internal to mesa. It might be the same GEN8_3DSTATE_VF_INSTANCING pollution but in reverse?

i.e. commit 0532a3313ad9c76a6e1d28e8a1c2ea495583fead
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Nov 5 20:11:54 2014 +0000

    sna/gen8: Clear instancing enabled bit between batches
    
    gen8 sets the instancing bit relative to the vertex element, but we were
    clearing it for the vertex buffer. As the maximum number of vertex
    elements is fixed, just clear them all when emitting our header. Note
    that VF_SGVS is not sufficient by itself to disable all side-effects of
    instancing.
    
    Thanks to Kenneth Graunke for pointing out the change from vertex buffer
    to vertex element of the instancing enable bit.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84958
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 7 Chris Wilson 2015-01-21 14:29:52 UTC
Eliminating the use of render copy (by always doing pageflips instead, note that glbenchmark is buggy - the x11_es path sets a borderWidth so that the window never actually fully covers the screen, and only the x11_es path) the artifacts are still visible.
Comment 8 Eero Tamminen 2015-02-12 16:33:05 UTC
Note: I've seen wrongly drawn vertexes also on Windows, on BSW B0, especially with slightly older Windows driver.  They've been visible also in SynMark (batch tests).  -> Could be a missing HW WA.  Make sure you have latest kernel.
Comment 9 Mark Janes 2015-02-12 16:41:15 UTC
Eero,

The kernel we used for reproducing this included braswell mods from bwidask:

http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=braswell

I think his branch point was mid-january.  If you think there have been relevant kernel changes since then, can you reproduce with a more recent kernel?

-Mark
Comment 10 Kenneth Graunke 2015-02-12 20:07:21 UTC
It's easily reproducible on Broadwell using a stock drm-intel-nightly kernel, though.
Comment 11 Eero Tamminen 2015-02-26 17:10:27 UTC
(In reply to Mark Janes from comment #9)
> The kernel we used for reproducing this included braswell mods from bwidask:
> 
> http://cgit.freedesktop.org/~bwidawsk/drm-intel/log/?h=braswell
> 
> I think his branch point was mid-january.  If you think there have been
> relevant kernel changes since then, can you reproduce with a more recent
> kernel?

Haven't tried, but according to Valtteri, latest drm-intel-nightly kernels hang BSW machine with Egypt & T-Rex tests (not with others).
Comment 12 Kenneth Graunke 2015-04-15 08:55:59 UTC
I ran TRex on my Lenovo X250 (Broadwell GT2) using xserver master and xf86-video-modesetting with Glamor for acceleration, and I see the exact same corruption we saw with SNA.

So I think that completely rules out SNA as the source of the problem.  It's probably a Mesa bug of sorts...
Comment 13 Kenneth Graunke 2015-04-15 08:57:35 UTC
INTEL_DEBUG=sync makes the corruption go away (correct rendering).
always_flush_batch=true has no effect (still broken).
Comment 14 Kenneth Graunke 2015-04-15 10:19:45 UTC
Created attachment 115082 [details] [review]
Fix the blitter code; fixes the bug.

When trying to figure out why synchronization mattered, I started playing around with stalling vs. blitting in brw_buffer_subdata().  It turned out stalling always worked, but blitting failed.

INTEL_DEBUG=sync means that the buffer is never busy, so we can simply map it and edit the data, rather than having to do a stall-avoidance BLT.

Broadwell changed the blitter.  Looking at the XY_SRC_COPY_BLT command, it now requires that the source and destination addresses have to be cacheline aligned.  Our BufferSubData calls were performing linear blits with unaligned addresses.  I suspect the unaligned portion was just...not copied...leading to entirely bunk vertex data.

Fixing this in intel_emit_linear_blit() is pretty easy.  We can just use offset % 64 as the X coordinate, and round the address down.  This fixes the bug.

We almost certainly need to alter intelEmitCopyBlit() as well, as untiled images will likely suffer from a similar bug.  Maybe the other intel_blit.c functions, too.  Given that intelEmitCopyBlit() is allowed to fail, we can always just do that when not cacheline aligned.  We may also want to actually adjust the parameters to make it work in some cases.  Not sure.
Comment 15 Chris Wilson 2015-04-15 10:28:21 UTC
Oh, you missed the joy of:

Issue: if the 1st pixel in XY_SRC_COPY  is not CL aligned when SRC or
DST are linear that will cause failure.

https://vthsd.fm.intel.com/hsd/bdwgfx/bug_de/default.aspx?bug_de_id=1912704
Comment 16 Kenneth Graunke 2015-04-23 21:20:11 UTC
Fixed in master with:

commit 8c17d53823c77ac1c56b0548e4e54f69a33285f1
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Wed Apr 15 03:04:33 2015 -0700

    i965: Make intel_emit_linear_blit handle Gen8+ alignment restrictions.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct.