Bug 82050 - R9270X pyrit benchmark perf regressions with latest kernel/llvm
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2014-08-02 12:04 UTC by Andy Furniss
Modified: 2014-10-10 15:23 UTC

Attachments
good (47.35 KB, application/octet-stream), 2014-08-05 17:21 UTC, Andy Furniss
bad (252.78 KB, application/octet-stream), 2014-08-05 17:22 UTC, Andy Furniss
Flush HDP cache via the ring on SI (3.17 KB, patch), 2014-08-12 09:03 UTC, Michel Dänzer
Only flush HDP cache for indirect buffers from userspace (26.49 KB, text/plain), 2014-08-13 09:11 UTC, Michel Dänzer
drm/ttm: move fpfn and lpfn into each placement (45.61 KB, patch), 2014-08-27 07:16 UTC, Michel Dänzer
drm/radeon: Add RADEON_GEM_CPU_ACCESS BO creation flag (1.74 KB, patch), 2014-08-27 07:16 UTC, Michel Dänzer
r600g,radeonsi: Inform the kernel if a BO will likely be accessed by the CPU (3.50 KB, patch), 2014-08-27 07:18 UTC, Michel Dänzer
valley worse pausing with stream buffer change (2.35 MB, image/png), 2014-09-01 09:58 UTC, Andy Furniss
valley better with stream buffer change reverted (2.38 MB, image/png), 2014-09-01 09:59 UTC, Andy Furniss
valley vanilla mesa bad num bytes moved (24.95 KB, image/png), 2014-09-01 14:36 UTC, Andy Furniss
valley better with revert num bytes moved (2.17 MB, image/png), 2014-09-01 14:37 UTC, Andy Furniss
Elemental screen showing vram usage (1.01 MB, image/png), 2014-10-01 11:55 UTC, Andy Furniss

Description Andy Furniss 2014-08-02 12:04:50 UTC
Don't really have time to bisect this but posting FYI.

Maybe pyrit benchmark is the glxgears of benchmarks - if so please say so and I'll ignore it :-)

The status quo for pyrit 0.3.0 on my R9 270X, since I got it in May, has been around -

Computed 75761.96 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 73256.9 PMKs/s (RTT 0.8)
#2: 'CPU-Core (SSE2)': 743.3 PMKs/s (RTT 2.9)
#3: 'CPU-Core (SSE2)': 741.3 PMKs/s (RTT 2.8)
#4: 'CPU-Core (SSE2)': 740.2 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

This was kernel agd5f drm-fixes-3.16-wip

I noticed yesterday after trying agd5f drm-next-3.17-rebased-on-fixes that I lost around 10,000 PMKs/s with the newer kernel.

Both were with yesterday's mesa, but llvm was older, say a couple of weeks.

Updated llvm and mesa today and it's regressed more on both kernels - 

drm-fixes-3.16-wip

Computed 58233.89 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 56076.6 PMKs/s (RTT 1.1)
#2: 'CPU-Core (SSE2)': 753.6 PMKs/s (RTT 2.9)
#3: 'CPU-Core (SSE2)': 754.5 PMKs/s (RTT 2.9)
#4: 'CPU-Core (SSE2)': 753.0 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

drm-next-3.17-rebased-on-fixes

Computed 52379.38 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 48822.1 PMKs/s (RTT 1.2)
#2: 'CPU-Core (SSE2)': 753.8 PMKs/s (RTT 2.9)
#3: 'CPU-Core (SSE2)': 754.9 PMKs/s (RTT 3.0)
#4: 'CPU-Core (SSE2)': 754.8 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)
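
To put the three runs above in proportion, the GPU figures can be compared against the earlier status quo. This is only a sketch: the helper name `percent_drop` is made up for illustration, and the inputs are the 'OpenCL-Device' lines quoted in the description.

```python
# GPU-only throughput (PMKs/s) copied from the 'OpenCL-Device' lines above.
BASELINE = 73256.9  # status quo on drm-fixes-3.16-wip
RUNS = {
    "drm-fixes-3.16-wip, updated llvm/mesa": 56076.6,
    "drm-next-3.17-rebased-on-fixes, updated llvm/mesa": 48822.1,
}

def percent_drop(baseline, value):
    """Relative loss versus the baseline, in percent."""
    return (baseline - value) / baseline * 100.0

for label, value in RUNS.items():
    print(f"{label}: {percent_drop(BASELINE, value):.1f}% below baseline")
```

On these numbers the GPU loses roughly a quarter of its throughput on the older kernel and about a third on the newer one.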
Comment 1 Alex Deucher 2014-08-04 15:57:50 UTC
Can you bisect?
Comment 2 Michel Dänzer 2014-08-05 09:40:18 UTC
There was a recent change to LLVM which increased conformance with OpenCL floating point semantics at some performance cost. That might explain at least some of the difference.
Comment 3 Tom Stellard 2014-08-05 15:21:08 UTC
(In reply to comment #2)
> There was a recent change to LLVM which increased conformance with OpenCL
> floating point semantics at some performance cost. That might explain at
> least some of the difference.

Pyrit doesn't use any floating-point operations, so this shouldn't be an issue.
Comment 4 Andy Furniss 2014-08-05 16:02:55 UTC
I bisected LLVM and it came up with -

ph4[llvm]$ git bisect good
ee17bf3fd4189d1981a6e908b4519e600ec7b002 is the first bad commit
commit ee17bf3fd4189d1981a6e908b4519e600ec7b002
Author: Matt Arsenault <Matthew.Arsenault@amd.com>
Date:   Fri Jul 25 23:02:42 2014 +0000

    R600/SI: Allow partial unrolling and increase thresholds.
    
    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@213985 91177308-0d34-0410-b5e6-96231b3b80d8

I don't know yet when I'll get to bisect the kernel.
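
For readers unfamiliar with it, `git bisect` is essentially a binary search over an ordered range of commits. A minimal sketch in Python, assuming a hypothetical `is_bad(commit)` predicate that stands in for "build LLVM at this commit and run the benchmark"; the commit names are placeholders, not real LLVM revisions.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the oldest one is good,
    the newest one is bad, and every commit after the first bad one is
    also bad.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] known good, commits[hi] known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]

# Placeholder history in which 'c5' introduces the regression.
history = [f"c{i}" for i in range(10)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 5)
print(culprit)
```

The search only gives a meaningful answer if the benchmark result is stable enough to classify each step as good or bad.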
Comment 5 Tom Stellard 2014-08-05 16:22:48 UTC
Can you post the output of R600_DEBUG=cs from both the "good" and "bad" commits?
Comment 6 Andy Furniss 2014-08-05 17:21:17 UTC
Created attachment 104082
good
Comment 7 Andy Furniss 2014-08-05 17:22:18 UTC
Created attachment 104083
bad
Comment 8 Andy Furniss 2014-08-05 23:23:26 UTC
kernel -

fb240a2534802a86742db51b7334138675bc435e is the first bad commit
commit fb240a2534802a86742db51b7334138675bc435e
Author: Michel Dänzer <michel.daenzer@amd.com>
Date:   Thu Jul 31 18:43:49 2014 +0900

    drm/radeon: Always flush the HDP cache before submitting a CS to the GPU
    
    This ensures the GPU sees all previous CPU writes to VRAM, which makes it
    safe:
    
    * For userspace to stream data from CPU to GPU via VRAM instead of GTT
    * For IBs to be stored in VRAM instead of GTT
    * For ring buffers to be stored in VRAM instead of GTT, if the HPD flush
      is performed via MMIO
    
    Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Comment 9 Michel Dänzer 2014-08-12 09:03:39 UTC
Created attachment 104475
Flush HDP cache via the ring on SI

Does this patch help for the kernel regression?

Though this seems to make some piglit test results unstable...
Comment 10 Andy Furniss 2014-08-12 11:47:47 UTC
(In reply to comment #9)
> Created attachment 104475
> Flush HDP cache via the ring on SI
> 
> Does this patch help for the kernel regression?
> 
> Though this seems to make some piglit test results unstable...

There is no difference with this.

FWIW I found another regression with this kernel that is caused by the same commit.

Maybe regression is the wrong word, as there was already an issue, just it's worse now.

I will file a separate bug in time (was planning to do new xorg and retest first) but in summary -

Unigine Valley always did have some 1/2 to 1 sec pauses ever since I could run it, first on HD4890 and now radeonsi R9 270X.

Since this kernel commit they are 2 to 4 times longer - also unchanged by patch.

Strangely, if I use ffmpeg's x11grab to make a recording @ 30fps they become short again.
Comment 11 Michel Dänzer 2014-08-13 09:11:52 UTC
Created attachment 104549
Only flush HDP cache for indirect buffers from userspace

Does this patch help?
Comment 12 Andy Furniss 2014-08-13 15:08:16 UTC
(In reply to comment #11)
> Created attachment 104549
> Only flush HDP cache for indirect buffers from userspace
> 
> Does this patch help?

No, I'm afraid that doesn't help either.

Valley is the same - pyrit only slightly different, probably within random variation. 

I am testing with "bad" llvm so the numbers are all low.
Since I recorded them, here's a paste of the pyrit results for good (kernel), head, patch 1 and patch 2.

On good -

Running benchmark (57982.3 PMKs/s)... / 

Computed 58917.21 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 55101.4 PMKs/s (RTT 1.1)
#2: 'CPU-Core (SSE2)': 757.2 PMKs/s (RTT 2.9)
#3: 'CPU-Core (SSE2)': 756.0 PMKs/s (RTT 3.0)
#4: 'CPU-Core (SSE2)': 755.5 PMKs/s (RTT 2.8)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)


On head

Running benchmark (50267.7 PMKs/s)... \ 

Computed 50096.30 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 48501.1 PMKs/s (RTT 1.2)
#2: 'CPU-Core (SSE2)': 757.5 PMKs/s (RTT 2.9)
#3: 'CPU-Core (SSE2)': 757.1 PMKs/s (RTT 2.9)
#4: 'CPU-Core (SSE2)': 757.0 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

Head + patch one

Running benchmark (50883.7 PMKs/s)... - 

Computed 51220.59 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 48583.5 PMKs/s (RTT 1.2)
#2: 'CPU-Core (SSE2)': 756.5 PMKs/s (RTT 3.0)
#3: 'CPU-Core (SSE2)': 756.0 PMKs/s (RTT 2.9)
#4: 'CPU-Core (SSE2)': 754.2 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

Head + patch two

Running benchmark (51348.9 PMKs/s)... | 

Computed 50781.53 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 48676.9 PMKs/s (RTT 1.2)
#2: 'CPU-Core (SSE2)': 752.4 PMKs/s (RTT 2.9)
#3: 'CPU-Core (SSE2)': 755.4 PMKs/s (RTT 2.9)
#4: 'CPU-Core (SSE2)': 752.8 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)
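
A quick way to check the "within random variation" reading is to compare how far each patched run moves from head with how far head sits below the good kernel. A sketch using only the 'OpenCL-Device' figures from the four runs above; the helper name `relative_change` is made up for illustration.

```python
# 'OpenCL-Device' PMKs/s from the four runs above (all with the "bad" llvm).
gpu = {"good": 55101.4, "head": 48501.1, "patch 1": 48583.5, "patch 2": 48676.9}

def relative_change(reference, value):
    """Change of value relative to reference, in percent."""
    return (value - reference) / reference * 100.0

gap_to_good = relative_change(gpu["good"], gpu["head"])
for name in ("patch 1", "patch 2"):
    gain = relative_change(gpu["head"], gpu[name])
    print(f"{name}: {gain:+.2f}% vs head (head is {gap_to_good:.1f}% vs good)")
```

Both patches move the GPU figure by well under one percent, while head is about twelve percent below the good kernel.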
Comment 13 Michel Dänzer 2014-08-14 01:22:01 UTC
Does reverting Mesa commit 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 help for Valley or pyrit with the latest kernel?
Comment 14 Andy Furniss 2014-08-14 11:51:22 UTC
(In reply to comment #13)
> Does reverting Mesa commit 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 help for
> Valley or pyrit with the latest kernel?

Yes, with that reverted perf is roughly back to "good" kernel for both.
Comment 15 Niels Ole Salscheider 2014-08-14 18:24:48 UTC
Does pyrit transfer much data from the GPU to the CPU? If so, my patch "gallium/radeon: Do not use u_upload_mgr for buffer downloads" that I have just sent to the mesa-dev mailing list might help...
Comment 16 Andy Furniss 2014-08-14 20:28:43 UTC
(In reply to comment #15)
> Does pyrit transfer much data from the GPU to the CPU? If so, my patch
> "gallium/radeon: Do not use u_upload_mgr for buffer downloads" that I have
> just sent to the mesa-dev mailing list might help...

It does help pyrit, but as expected I guess, not valley.
Comment 17 Michel Dänzer 2014-08-19 07:15:12 UTC
(In reply to comment #14)
> Yes, with that reverted perf is roughly back to "good" kernel for both.

Can you try restoring the old behaviour for only PIPE_USAGE_DYNAMIC or PIPE_USAGE_STREAM respectively, to see if one of them alone fixes the problem in Valley?
Comment 18 Tom Stellard 2014-08-19 18:32:49 UTC
Does the pyrit benchmark include compile time when calculating PMKs/s ? The patch you've bisected unrolls a loop that makes the pyrit kernel really big, so it will take longer to compile.

Is it possible to run the benchmark for longer?  If so, does the gap between good and bad shrink?
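
The reasoning behind this question can be sketched with a toy model: if a fixed compile time is counted as part of the run, its share of the total shrinks as the run gets longer. All numbers below are hypothetical and chosen only for illustration; `measured_rate` is not part of pyrit.

```python
def measured_rate(steady_rate, compile_seconds, run_seconds):
    """Average rate over a run whose first compile_seconds produce no work."""
    if run_seconds <= compile_seconds:
        return 0.0
    return steady_rate * (run_seconds - compile_seconds) / run_seconds

# Hypothetical numbers, not measurements: the same steady-state rate,
# but the "bad" build spends longer compiling the unrolled kernel.
STEADY = 70000.0
gaps = {}
for run in (60, 600):
    good = measured_rate(STEADY, compile_seconds=2, run_seconds=run)
    bad = measured_rate(STEADY, compile_seconds=12, run_seconds=run)
    gaps[run] = 100.0 * (good - bad) / good
    print(f"{run:4d} s run: gap {gaps[run]:.1f}%")
```

If the real gap stays the same on a longer run, compile time alone cannot explain the regression.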
Comment 19 Andy Furniss 2014-08-19 22:06:57 UTC
(In reply to comment #17)
> (In reply to comment #14)
> > Yes, with that reverted perf is roughly back to "good" kernel for both.
> 
> Can you try restoring the old behaviour for only PIPE_USAGE_DYNAMIC or
> PIPE_USAGE_STREAM respectively, to see if one of them alone fixes the
> problem in Valley?

Handling STREAM as below restores the old behavior; doing the same with DYNAMIC makes no difference AFAICT.

Testing this is subjective, as the pauses stop the clock in benchmark mode and so don't show up in the score, and of course "working" is still somewhat broken :-).

diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c
index 22bc97e..9262823 100644
--- a/src/gallium/drivers/radeon/r600_buffer_common.c
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c
@@ -110,11 +110,12 @@ bool r600_init_resource(struct r600_common_screen *rscreen,
        enum radeon_bo_flag flags = 0;
 
        switch (res->b.b.usage) {
+       case PIPE_USAGE_STREAM:
+               flags = RADEON_FLAG_GTT_WC;
        case PIPE_USAGE_STAGING:
                /* Transfers are likely to occur more often with these resources. */
                res->domains = RADEON_DOMAIN_GTT;
                break;
-       case PIPE_USAGE_STREAM:
        case PIPE_USAGE_DYNAMIC:
                /* Older kernels didn't always flush the HDP cache before
                 * CS execution
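
The patch relies on C case fall-through: PIPE_USAGE_STREAM sets the write-combined flag and then drops into the PIPE_USAGE_STAGING case, which selects the GTT domain. A small Python model of only the lines visible in the diff; the function name is made up for illustration, the strings mirror the identifiers in the diff, and the PIPE_USAGE_DYNAMIC branch is cut off in the diff context, so it is deliberately not modelled.

```python
def placement_with_patch(usage):
    """Model of the visible part of the patched switch statement.

    Returns (domain, flags) for the cases shown in the diff, or None for
    cases whose handling is not visible in the diff context.
    """
    flags = set()
    if usage == "PIPE_USAGE_STREAM":
        flags.add("RADEON_FLAG_GTT_WC")
        usage = "PIPE_USAGE_STAGING"  # stands in for the C fall-through
    if usage == "PIPE_USAGE_STAGING":
        return ("RADEON_DOMAIN_GTT", flags)
    return None  # PIPE_USAGE_DYNAMIC and others: not shown in the diff

for u in ("PIPE_USAGE_STREAM", "PIPE_USAGE_STAGING", "PIPE_USAGE_DYNAMIC"):
    print(u, placement_with_patch(u))
```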
Comment 20 Andy Furniss 2014-08-19 22:22:42 UTC
(In reply to comment #18)
> Does the pyrit benchmark include compile time when calculating PMKs/s ? The
> patch you've bisected unrolls a loop that makes the pyrit kernel really big,
> so it will take longer to compile.
> 
> Is it possible to run the benchmark for longer?  If so, does the gap between
> good and bad shrink?

It runs for 1min 16s as is. TBH I don't use or know pyrit - the only reason I have it is that when I got my radeonsi card and was reading up on how to get opencl working, the wiki linked to it as an example of a working app.

It does seem to build up speed as time progresses, but now that it's slower it seems to plateau for slightly longer before the end than it used to.

Maybe it's just not a representative test, hence my query about glxgears. I haven't found any "real" opencl use to benchmark yet - x264 would be nice, but it seems it needs things not yet implemented.
Comment 21 Michel Dänzer 2014-08-27 07:16:12 UTC
Created attachment 105316
drm/ttm: move fpfn and lpfn into each placement
Comment 22 Michel Dänzer 2014-08-27 07:16:55 UTC
Created attachment 105317
drm/radeon: Add RADEON_GEM_CPU_ACCESS BO creation flag
Comment 23 Michel Dänzer 2014-08-27 07:18:32 UTC
Created attachment 105318
r600g,radeonsi: Inform the kernel if a BO will likely be accessed by the CPU

Does this Mesa patch (instead of the PIPE_USAGE_STREAM change) together with the previous two kernel patches I attached help Valley?
Comment 24 Christian König 2014-08-27 08:07:21 UTC
(In reply to comment #22)
> Created attachment 105317
> drm/radeon: Add RADEON_GEM_CPU_ACCESS BO creation flag

Just a general note: We need to define that flag negated for compatibility reasons. E.g. RADEON_GEM_NO_CPU_ACCESS because code must assume with an old client that the buffer is always CPU accessed.
Comment 25 Michel Dänzer 2014-08-27 08:45:44 UTC
(In reply to comment #24)
> Just a general note: We need to define that flag negated for compatibility
> reasons. E.g. RADEON_GEM_NO_CPU_ACCESS because code must assume with an old
> client that the buffer is always CPU accessed.

No, CPU access works fine even with old clients which don't set the flag.

The flag is just an optimization, preventing BOs which are expected to be accessed by the CPU from being stored in the CPU-inaccessible part of VRAM.
Comment 26 Andy Furniss 2014-08-27 18:42:52 UTC
(In reply to comment #23)
> Created attachment 105318
> r600g,radeonsi: Inform the kernel if a BO will likely be accessed by the CPU
> 
> Does this Mesa patch (instead of the PIPE_USAGE_STREAM change) together with
> the previous two kernel patches I attached help Valley?

No difference with those.

The big kernel patch didn't apply on drm-fixes-3.17-wip, but it only failed in nouveau, so I deleted that part from it.
Comment 27 Michel Dänzer 2014-08-28 06:59:19 UTC
(In reply to comment #26)
> No difference with those.

Bummer, thanks for testing anyway.

I submitted the change reverting the behaviour of PIPE_USAGE_STREAM for review, but it's strange: I couldn't notice any significant difference in stutter in Valley regardless of any of these changes.

BTW, what CPU are you using?
Comment 28 Andy Furniss 2014-08-28 09:01:47 UTC
(In reply to comment #27)
> (In reply to comment #26)
> > No difference with those.
> 
> Bummer, thanks for testing anyway.
> 
> I submitted the change reverting the behaviour of PIPE_USAGE_STREAM for
> review, but it's strange: I couldn't notice any significant difference in
> stutter in Valley regardless of any of these changes.
> 
> BTW, what CPU are you using?

It's an AMD Phenom II x4 965be.

I always switch the cpufreq governor from ondemand to performance when testing, so it's forced to 3.4GHz.

When I noticed that ffmpeg x11grab made the pauses "normal" length I did try a different test with cpus loaded by compiling, but this didn't do it.
Comment 29 smoki 2014-08-28 12:19:25 UTC
 I also reported Valley stutter on Kabini on irc, but now I am somehow against reverting, because performance suffers in other games with the revert.

 Another reason is simply that I tested it for the first time on Windows today, and there I have stutter even worse than in any case we have here :D. And I didn't know that :D DX11/DX9/OpenGL, any mode, all stutter a lot. Our worst combination is a lot smoother than with Catalyst on Windows :).

 So the question is, are there other stutter examples than Unigine Valley?
Comment 30 Mathieu Belanger 2014-08-28 17:50:11 UTC
Yes.

Minecraft is unplayable with latest Kernel+latest Mesa.

In the beginning it's smooth... after 30 sec or so it starts to stutter a little... By the five-minute mark, if you move in the game, it pauses for like 5 to 7 seconds, moves for 2 seconds, pauses for 5 seconds...

By pause I mean the whole system pauses: mouse, other terminals... Everything.

I think it's related to this bug; it started with kernel 3.17 (like the OP, I did update mesa, llvm, libdrm, glamor...)

Going back to 3.16 did not fix it; I had to downgrade Mesa and LLVM too (to the first release of the first of this month to be sure) and after that, I played for like 4 hours without any issue.

I tried to revert 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 but I got massive gfx corruptions.
Comment 31 smoki 2014-08-29 01:01:00 UTC
(In reply to comment #30)
> Yes.
> 
> Minecraft is unplayable with latest Kernel+latest Mesa.
> 
> In the beginning, it's smooth.. after 30 sec or so it start to stuter a
> little... By the five minutes mark. if you move in the game, it pause for
> like 5 to 7 seconde, move for 2 secondes, pause for 5 secondes...
> 
> By pause I mean the whole system pause, mouse, other terminals.. Everything.
> 
> I think it's related to this bug, it started with the kernel 3.17 (Like the
> OP, I did update mesa, llvm, libdrm, glamor...)
> 
> going back to 3.16 did not fix it, I got to downgrade Mesa and LLVM too (to
> the first relase of the first of this month to be sure) and after that, I
> played for like 4 hours without any issue.
> 
> I tried to revert 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 but I got massive
> gfx corruptions.

 I only meant games that stutter and where reverting 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938 helps. If a game stutters both with and without the revert, then I think that is another issue :).
Comment 32 Michel Dänzer 2014-09-01 06:23:57 UTC
(In reply to comment #30)

Mathieu, it sounds like your problem isn't related to this report. Please file your own report, and it would be great if you could bisect Mesa or the kernel.
Comment 33 Michel Dänzer 2014-09-01 07:11:16 UTC
(In reply to comment #28)
> > I submitted the change reverting the behaviour of PIPE_USAGE_STREAM for
> > review, but it's strange: I couldn't notice any significant difference in
> > stutter in Valley regardless of any of these changes.

Also, according to GALLIUM_HUD=requested-VRAM+VRAM-usage,requested-GTT+GTT-usage, Valley only seems to allocate about 10-20 MB for streaming BOs, so I'm not sure why putting them in VRAM or not makes such a big difference for you.


> > BTW, what CPU are you using?
> 
> It's an AMD Phenom II x4 965be.

I assume the chipset for that doesn't support PCIe 3.0, does it? I wonder if maybe streaming BOs should be in VRAM with PCIe 3.0 but not with PCIe 2.0.
Comment 34 Andy Furniss 2014-09-01 09:16:26 UTC
(In reply to comment #29)
>  I also reported on irc Valley stutter on Kabini, but now i am somhow
> against reverting because performance suffer with reverting in other games.
> 
>  One other reason simply because i tested it first time on Windows today and
> there i have stutter even worse then then any case we have here :D. And i
> did't know that :D DX11/DX9/OpenGL any mode all stutter a lot. Our worst
> combination is a lot smoother than with Catalyst on Windows :).

I tried on Windows with the same settings and you are right that there are stutters. For me they are about 10x shorter than my best Linux case, which means that some effectively don't exist and the ones that do are more like a frame or two. It does I guess illustrate that Valley may be doing something stupid - some are in the same places I see on Linux.

>  So question is, is there other stuter examples than Unigine Valley?

Will have to test more - there are some initially with Unreal Reflections.

I am using a pure 64bit setup, which means I don't get to test steam or etqw - you may have a point that if only Valley is really bad and other things gain, Valley may be an exception that could be sacrificed.
Comment 35 Andy Furniss 2014-09-01 09:54:38 UTC
(In reply to comment #33)
> (In reply to comment #28)
> > > I submitted the change reverting the behaviour of PIPE_USAGE_STREAM for
> > > review, but it's strange: I couldn't notice any significant difference in
> > > stutter in Valley regardless of any of these changes.
> 
> Also, according to
> GALLIUM_HUD=requested-VRAM+VRAM-usage,requested-GTT+GTT-usage, Valley only
> seems to allocate about 10-20 MB for streaming BOs, so I'm not sure why
> putting them in VRAM or not makes such a big difference for you.
> 
> 
> > > BTW, what CPU are you using?
> > 
> > It's an AMD Phenom II x4 965be.
> 
> I assume the chipset for that doesn't support PCIe 3.0, does it? I wonder if
> maybe streaming BOs should be in VRAM with PCIe 3.0 but not with PCIe 2.0.

Yea I am PCIE 2.0.

Other settings which may or may not be relevant -

vblank_mode=0, swapbufferswait off, 1920x1080 fullscreen, quality high, antialiasing off.

I tried with hud and see 10-20MB requested with the stream change reverted and 8kb with it.

The fps counter on hud does show the pauses - though even the good case looks bad on that - but the biggest pauses it shows are between scenes when the screen has faded to black, I guess you kind of expect something to be loading then.

I'll upload a couple of screens.
Comment 36 Andy Furniss 2014-09-01 09:58:48 UTC
Created attachment 105543
valley worse pausing with stream buffer change
Comment 37 Andy Furniss 2014-09-01 09:59:55 UTC
Created attachment 105544
valley better with stream buffer change reverted
Comment 38 Andy Furniss 2014-09-01 10:09:11 UTC
(In reply to comment #36)
> Created attachment 105543
> valley worse pausing with stream buffer change

I notice that I seem to be pegged more to a single core on this one.
Comment 39 smoki 2014-09-01 11:59:37 UTC
(In reply to comment #34)
> I tried on Windows with the same settings and you are right that there are
> stutters. For me they are about 10x shorter than my best Linux case, which
> means that some effectively don't exist and the ones that do are more like a
> frame or two. It does I guess illustrate that Valley may be doing something
> stupid - some are in the same places I see on Linux.

 Actually there is a workaround on Windows: don't use Aero, use a Basic theme instead. So either the driver has problems with Aero, or Aero has problems with the driver and this app. I don't know much about Windows; I don't use it much of the time.

 If you let it run with the Basic theme for a few rounds you will notice, I guess, that only in the first round there are maybe 2-3 unusual stutters of 1-2 sec; from the second round on it is stutter free.

 All in all, people must not use Aero when playing Valley, so the app is not 100% trouble free even on Windows :)
Comment 40 smoki 2014-09-01 12:59:19 UTC
(In reply to comment #39) 
>  If you let it run with Basic theme and few rounds you will spot i guess
> that only first round there is unusual maybe 2-3 times 1-2 sec stutters,
> then second time and later it is stutter free.

 Andy, you have behavior like that (more or less those seconds of stutter) if PIPE_USAGE_STREAM is reverted, right?

 Then maybe that is the right way to go. Global performance will suffer a little, but if nothing better can be done then reverting PIPE_USAGE_STREAM is OK :)
Comment 41 Andy Furniss 2014-09-01 13:20:15 UTC
(In reply to comment #40)
> (In reply to comment #39) 
> >  If you let it run with Basic theme and few rounds you will spot i guess
> > that only first round there is unusual maybe 2-3 times 1-2 sec stutters,
> > then second time and later it is stutter free.
> 
>  Andy, you have behavior like that (more or less those seconds for stuter)
> if PIPE_USAGE_STREAM is reverted, right?

The Bad was with vanilla mesa (couple of days old)

The good was that + the patch in Comment 19

I looked again at Unreal Reflections - there is a difference, but it's only right at the start: both have a couple of stutters and they are longer with vanilla; after that the rest is OK in both cases.
Comment 42 Andy Furniss 2014-09-01 14:33:15 UTC
Playing more with hud I can see that there is a 1 to 1 correlation between the pauses and spikes in num-bytes-moved. The scale on the graphs did get squashed a bit by outliers, which seemed a bit random sometimes - I saw 330 MB on one run - but anyway here's a couple of screens  - I got these using sleep 120 && xwd -root ... the first one landed on a scene change so is black.
Comment 43 Andy Furniss 2014-09-01 14:36:03 UTC
Created attachment 105563
valley vanilla mesa bad num bytes moved
Comment 44 Andy Furniss 2014-09-01 14:37:05 UTC
Created attachment 105564
valley better with revert num bytes moved
Comment 45 smoki 2014-09-01 14:59:02 UTC
(In reply to comment #41)
> (In reply to comment #40)
> > (In reply to comment #39) 
> > >  If you let it run with Basic theme and few rounds you will spot i guess
> > > that only first round there is unusual maybe 2-3 times 1-2 sec stutters,
> > > then second time and later it is stutter free.
> > 
> >  Andy, you have behavior like that (more or less those seconds for stuter)
> > if PIPE_USAGE_STREAM is reverted, right?
> 
> The Bad was with vanilla mesa (couple of days old)
> 
> The good was that + the patch in Comment 19

 I was asking whether Valley plays the same in your good case here as it does with the Basic theme on Windows :). That is the case for me, and average fps is around 80% in comparison.
 
> I looked again at Unreal Reflections - there is a difference but it's only
> right at the start, both have a couple of stutters and they are longer with
> vanilla then the rest is OK in both cases.

 Those Unreal Engine 4 linux demos are a slide show fest on Kabini, so I can't tell if there is stutter between two frames :D. I guess those simply need a GPU at least 10x more powerful than the one I have.
Comment 46 Michel Dänzer 2014-09-02 09:20:24 UTC
I've seen some stutters without any corresponding buffer moves though. Still not sure why it's stuttering so badly sometimes.

BTW, Andy, does the stuttering also seem to get better for you if you run Valley repeatedly?
Comment 47 Andy Furniss 2014-09-02 09:28:22 UTC
(In reply to comment #45)
> (In reply to comment #41)
> > (In reply to comment #40)
> > > (In reply to comment #39) 
> > > >  If you let it run with Basic theme and few rounds you will spot i guess
> > > > that only first round there is unusual maybe 2-3 times 1-2 sec stutters,
> > > > then second time and later it is stutter free.
> > > 
> > >  Andy, you have behavior like that (more or less those seconds for stuter)
> > > if PIPE_USAGE_STREAM is reverted, right?
> > 
> > The Bad was with vanilla mesa (couple of days old)
> > 
> > The good was that + the patch in Comment 19
> 
>  I asked is Valley play the same with your good case here and with using
> Basic theme in Windows :). That is the case for me, and average fps is
> around 80% in comparasion.

Next time I'm in Windows I'll try changing desktop - but as I said, with default desktop valley is 10x better than my best Linux case and that was the one and only run I did.
Comment 48 Andy Furniss 2014-09-02 10:44:20 UTC
(In reply to comment #46)
> I've seen some stutters without any corresponding buffer moves though. Still
> not sure why it's stuttering so bad sometimes.
> 
> BTW, Andy, does the stuttering also seem to get better for you if you run
> Valley repeatedly?

No, it's quite consistent if I quit and re-run.

The amount moved doesn't seem to correlate with the length of pause - and sometimes there are small moves without stutter, so maybe it's not totally this.

Looking at Heaven 4.0 there are no moves at all after load, but there are a few very brief stutters on the night scenes - these are the same with or without patch though.

What does num-bytes-moved measure - from where to where?
Comment 49 Marek Olšák 2014-09-02 11:06:29 UTC
(In reply to comment #48)
> (In reply to comment #46)
> > I've seen some stutters without any corresponding buffer moves though. Still
> > not sure why it's stuttering so bad sometimes.
> > 
> > BTW, Andy, does the stuttering also seem to get better for you if you run
> > Valley repeatedly?
> 
> No, it's quite consistent if I quit and re-run.
> 
> The amount moved doesn't seem to correlate with the length of pause - and
> sometimes there are small moves without stutter, so maybe it's not totally
> this.
> 
> Looking at Heaven 4.0 there are no moves at all after load, but there are a
> few very brief stutters on the night scenes - these are the same with or
> without patch though.
> 
> What does num-bytes-moved measure - from where to where?

The HUD always displays an average value per frame. It's the average of all values between the current and the last update of the HUD.
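
One way to read this, as a sketch only: if the displayed value is the mean of per-frame samples collected since the last HUD update, then the same amount of moved data shows up as a much taller spike when fewer frames were rendered in that interval. The sampling model and the numbers below are assumptions for illustration, not the Gallium HUD's actual code.

```python
def hud_value(per_frame_values):
    """Average of the per-frame samples collected since the last HUD update."""
    return sum(per_frame_values) / len(per_frame_values)

# Hypothetical samples in MB: the same 300 MB moved, but spread over a
# different number of frames rendered during one HUD update interval.
smooth_interval = [0] * 29 + [300]  # 30 frames rendered
stalled_interval = [0] * 2 + [300]  # only 3 frames rendered because of a pause

print(hud_value(smooth_interval), hud_value(stalled_interval))
```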
Comment 50 Andy Furniss 2014-09-02 11:55:15 UTC
(In reply to comment #49)
> (In reply to comment #48)
> > (In reply to comment #46)

> > What does num-bytes-moved measure - from where to where?
> 
> The HUD always displays an average value per frame. It's the average of all
> values between the current and the last update of the HUD.

Ahh, so the fact that HUD stops rendering during the pauses means that spikes are likely anyway.

Though my question wasn't really about the HUD as such, I was wondering where they were moving to/from - I guess the answer may be too obvious, but just to confirm.

I assume it's across PCIE to the card (or maybe from/both) - is it DMA or CPU transfer? Is it dependent on app behavior or driver - eg. running Unigine Reflections I saw a blip in the graph first run, but not again.
Comment 51 Marek Olšák 2014-09-02 12:43:30 UTC
num-bytes-moved comes from TTM. It's the size of all buffer moves done by TTM. This usually happens during command submission if VRAM is full.
Comment 52 Andy Furniss 2014-09-02 22:25:31 UTC
Just updated llvm and my perf on pyrit is back to normal -

Computed 77586.36 PMKs/s total.
#1: 'OpenCL-Device 'AMD PITCAIRN'': 73865.3 PMKs/s (RTT 0.8)
#2: 'CPU-Core (SSE2)': 744.3 PMKs/s (RTT 2.9)
#3: 'CPU-Core (SSE2)': 746.4 PMKs/s (RTT 3.0)
#4: 'CPU-Core (SSE2)': 745.7 PMKs/s (RTT 2.9)
#5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)
Comment 53 Andy Furniss 2014-09-02 23:39:06 UTC
(In reply to comment #52)
> Just updated llvm and my perf on pyrit is back to normal -
> 
> Computed 77586.36 PMKs/s total.
> #1: 'OpenCL-Device 'AMD PITCAIRN'': 73865.3 PMKs/s (RTT 0.8)
> #2: 'CPU-Core (SSE2)': 744.3 PMKs/s (RTT 2.9)
> #3: 'CPU-Core (SSE2)': 746.4 PMKs/s (RTT 3.0)
> #4: 'CPU-Core (SSE2)': 745.7 PMKs/s (RTT 2.9)
> #5: 'Network-Clients': 0.0 PMKs/s (RTT 0.0)

Not llvm it's mesa -

radeonsi: Compile dummy pixel shader on demand
Comment 54 Michel Dänzer 2014-09-03 10:08:21 UTC
(In reply to comment #53)
> > Just updated llvm and my perf on pyrit is back to normal -
[...]
> Not llvm it's mesa -
> 
> radeonsi: Compile dummy pixel shader on demand

Sounds like pyrit ends up creating a lot of Gallium contexts. You might get even better performance with the LLVM regression fixed.
Comment 55 Michel Dänzer 2014-09-03 14:04:38 UTC
BTW, this could also mean that the pyrit performance regression was simply due to LLVM now taking slightly longer to compile a shader.
Comment 56 Andy Furniss 2014-09-03 15:26:13 UTC
(In reply to comment #55)
> BTW, this could also mean that the pyrit performance regression was simply
> due to LLVM now taking slightly longer to compile a shader.

The llvm commit still reverts cleanly, so I tested and didn't gain anything significant.
Comment 57 Christoph Haag 2014-09-30 08:14:18 UTC
So almost a month has gone by... I'm trying drm-next-3.18 and mesa git and many unreal engine demos are still broken like this:
https://www.youtube.com/watch?v=NvgA9_B0dMo
(ignore the excessive jumpy frames that come from dri3 offloading)

R600_DEBUG=nodma does not help by the way.

Has there been any progress?
Comment 58 Michel Dänzer 2014-09-30 08:38:49 UTC
(In reply to comment #57)
> I'm trying drm-next-3.18 and mesa git and many unreal engine demos are still
> broken like this:
> https://www.youtube.com/watch?v=NvgA9_B0dMo

Are you sure that's directly related to the Unigine Heaven stuttering discussed in this report? E.g., does reverting the Mesa commit in question help, or do you see similar symptoms in the Gallium HUD?
Comment 59 Christoph Haag 2014-09-30 10:52:52 UTC
(In reply to comment #58)
> (In reply to comment #57)
> > I'm trying drm-next-3.18 and mesa git and many unreal engine demos are still
> > broken like this:
> > https://www.youtube.com/watch?v=NvgA9_B0dMo
> 
> Are you sure that's directly related to the Unigine Heaven stuttering
> discussed in this report? E.g., does reverting the Mesa commit in question
> help, or do you see similar symptoms in the Gallium HUD?

It does look like the same symptoms. Only rare and short hangs in Unigine Heaven, but frequent hangs of ~1-2 seconds in Unigine Valley. The HUD shows that these hangs mostly correlate with jumps in vram/gtt usage.

Is the mesa commit in question 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938? If so, it doesn't revert cleanly anymore, but I can have a look at whether I can do the revert manually.
Comment 60 Andy Furniss 2014-09-30 13:32:25 UTC
(In reply to comment #59)
> (In reply to comment #58)
> > (In reply to comment #57)
> > > I'm trying drm-next-3.18 and mesa git and many unreal engine demos are still
> > > broken like this:
> > > https://www.youtube.com/watch?v=NvgA9_B0dMo
> > 
> > Are you sure that's directly related to the Unigine Heaven stuttering
> > discussed in this report? E.g., does reverting the Mesa commit in question
> > help, or do you see similar symptoms in the Gallium HUD?
> 
> It does look like the same symptoms. Only rare and short hangs in unigine
> heaven, but frequent hangs of ~1-2 seconds in unigine valley. The HUD shows
> that these hangs mostly correlate with jumps in vram/gtt usage.
> 
> Is the mesa commit in question 150ac07b855b5c5f879bf6ce9ca421ccd1a6c938? If
> so, it doesn't revert cleanly anymore, but I can have a look if I can
> manually see how to do it.

Some Unreal demos are OK for me after a glitchy start.

I can reproduce what you see with Scifi hallway and Elemental - the latter is very bad, the former did come good after a while.

I hadn't "seen" these two before today, as in the past they just bailed with an llvm error.

I'll try, when I have time, to see if they are better with the revert.
Comment 61 Michel Dänzer 2014-10-01 01:36:49 UTC
Note that some of the Unreal Engine 4 demos want to use more than 1G of graphics memory (as shown by the GALLIUM_HUD queries requested-VRAM and requested-GTT), so if the GPU has 'only' 1G of VRAM or less, that's a very difficult situation for the graphics memory management code.
Comment 62 Andy Furniss 2014-10-01 11:54:57 UTC
(In reply to comment #61)
> Note that some of the Unreal Engine 4 demos want to use more than 1G of
> graphics memory (as shown by the GALLIUM_HUD queries requested-VRAM and
> requested-GTT), so if the GPU has 'only' 1G of VRAM or less, that's a very
> difficult situation for the graphics memory management code.

I do have 2 gig, but looking at the screenshot of Elemental to be attached I see that used and requested differ.

This shot doesn't really show how long the pauses are. They are really bad: it takes about 2 minutes to render the first few frames, with pauses of many seconds after that.

It's so bad it's hard to tell whether the revert helps - probably not, I guess it's something different. Have you tried Elemental?
Comment 63 Andy Furniss 2014-10-01 11:55:54 UTC
Created attachment 107183 [details]
Elemental screen showing vram usage
Comment 64 Christoph Haag 2014-10-01 13:37:55 UTC
(In reply to comment #61)
> Note that some of the Unreal Engine 4 demos want to use more than 1G of
> graphics memory (as shown by the GALLIUM_HUD queries requested-VRAM and
> requested-GTT), so if the GPU has 'only' 1G of VRAM or less, that's a very
> difficult situation for the graphics memory management code.

I too have 2 Gigabyte VRAM. Here is a short clip with some HUD graphs:
https://www.youtube.com/watch?v=vvqbAFV06pA

It's pretty clear that the stutters correlate with activity in "num bytes moved"...

I have also tried the native Borderlands 2 for a few minutes today and I'm seeing similar stuttering. It doesn't happen quite so often, but it's still often enough to be an issue.
Comment 65 smoki 2014-10-01 14:12:58 UTC
 
So does that revert help? http://lists.freedesktop.org/archives/mesa-dev/2014-August/066746.html Keep in mind that the revert broke 32-bit completely, with a lot of corruption :)

About performance of the UE4 demos I can't say a lot, as this is on Kabini :) Or, if not reverting, just try how it goes with kernel 3.16, which should not hit this stuttering. And watch requested-VRAM at the beginning of these demos; for me that is actually higher than with kernel 3.17 or 3.18-next for the same app :). I am not trying other demos, but it seems like newer kernels request more VRAM from the apps :)
Comment 66 smoki 2014-10-01 14:14:43 UTC
I mean requested-VRAM is lower with kernel 3.16 than with 3.17+ kernels :D
Comment 67 Andy Furniss 2014-10-01 14:24:13 UTC
(In reply to comment #64)

> It's pretty clear that the stutters correlate with activity in "num bytes
> moved"...

I brought this up earlier and, as was explained, the way the graphing/counting works may mean this is not related.

In summary, AIUI, the pause itself causes a spike because the count is from the last frame rendered, which is way longer than normal due to the pause.
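
A minimal sketch of that reasoning, assuming the HUD plots the bytes moved since the previous frame. The function name, the constant background move rate, and the frame times are made-up numbers for illustration only.

```python
def per_frame_samples(bytes_per_ms, frame_times_ms):
    """Bytes moved during each frame, given a constant background move rate."""
    return [bytes_per_ms * t for t in frame_times_ms]


# Three normal 16 ms frames and one 2 s pause (hypothetical values).
frame_times_ms = [16, 16, 2000, 16]
samples = per_frame_samples(1000, frame_times_ms)
```

Even though the underlying move rate never changes here, the sample for the long frame is over a hundred times larger than its neighbours, so the graph shows a spike wherever there is a pause.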
Comment 68 smoki 2014-10-01 14:27:10 UTC
Offtopic... but if someone has sound crackling in those UE4 demos (at least Elemental and Vehicle, the demos I tried), that is probably because of the openal 1.15 they shipped; 1.14 and 1.16 work fine for me... Sorry for the offtopic, but there are bugs all over the place and they might be related, one never knows :)
Comment 69 smoki 2014-10-01 15:36:44 UTC
 @Andy 

Oops, didn't notice... The Elemental demo causes GPU faults for me. Is it the same for you? Or, if you have an assertion-enabled llvm, there is bug 82544, which Michel filed.
Comment 70 Andy Furniss 2014-10-01 15:58:22 UTC
(In reply to comment #69)
>  @Andy 
> 
>  Oops didn't notice... Elemental demo makes GPU faults for me, is it the
> same for you or if you have assertion enabled llvm... there is a bug 82544
> Michel filed.

No gpu faults for me, llvm used to assert but not now, sound seems OK using alsa (I don't have pulse).
Comment 71 smoki 2014-10-01 16:04:44 UTC
(In reply to comment #70)
> (In reply to comment #69)
> >  @Andy 
> > 
> >  Oops didn't notice... Elemental demo makes GPU faults for me, is it the
> > same for you or if you have assertion enabled llvm... there is a bug 82544
> > Michel filed.
> 
> No gpu faults for me, llvm used to assert but not now, sound seems OK using
> alsa (I don't have pulse).

Yeah, you are right, I tested that with llvm 3.5; I normally run 3.6svn... Michel should close that one, I guess.

About openal, yes, I also use plain alsa, no pulse, but I get sound crackling with openal 1.15 in any game which ships it, and also with the one in Debian sid... It does not happen with 1.14 or 1.16, so I basically replace it with my own 1.16... but OK, that does not matter, maybe that is only for me :)
Comment 72 Michel Dänzer 2014-10-02 07:11:55 UTC
(In reply to comment #62)
> I do have 2 gig, but looking at the screenshot of Elemental to be attached I
> see that used and requested differ.

That's probably because of VRAM fragmentation. (BTW, I find it easier to keep track of this with requested-VRAM+VRAM-usage,requested-GTT+GTT-usage instead of requested-VRAM+requested-GTT,VRAM-usage+GTT-usage)
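
For reference, a small sketch of how the two groupings are spelled. In a GALLIUM_HUD value, `+` puts queries into the same graph and `,` starts a new graph; the helper name `hud_spec` is made up for this example.

```python
def hud_spec(panes):
    """Join queries with '+' inside one graph and ',' between graphs."""
    return ",".join("+".join(queries) for queries in panes)


# One graph per memory domain: requested vs. actually used.
by_domain = hud_spec([
    ["requested-VRAM", "VRAM-usage"],
    ["requested-GTT", "GTT-usage"],
])

# One graph per kind of counter: requested vs. used across domains.
by_kind = hud_spec([
    ["requested-VRAM", "requested-GTT"],
    ["VRAM-usage", "GTT-usage"],
])
```

Either string can be put into the GALLIUM_HUD environment variable before starting the application; the first form makes a gap between requested and used memory visible within a single graph.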


> This shot doesn't really show how long the pauses are - they are really bad,
> it takes about 2 minutes to render the first few frames with pauses of many
> seconds after that.
> 
> It's so bad it's hard to tell whether the revert helps - probably not, I
> guess it's something different. Have you tried Elemental?

Yes, but even on Kaveri with only 1G of VRAM, it doesn't take two minutes for it to get going, and I don't notice such long pauses either.

So I think it's better if we track the UE4 issues in a separate report, and it would be great if you guys could bisect the kernel or Mesa for that.


(In reply to comment #65)
> Keep in mind that revert broke 32bit completely, lot of corruption :)

I haven't been able to reproduce that. If you still can, please file a bug for it, as there's nothing preventing the kernel from using GTT instead of VRAM when the latter is full.


> I am not trying other demos, but it seems like newer kernels request more
> VRAM from the apps :)

The Mesa commit in question makes the r600g and radeonsi drivers try to use VRAM for more things, but only with newer kernels, because older kernels didn't guarantee reliability when using VRAM for those things.
Comment 73 Michel Dänzer 2014-10-02 07:17:24 UTC
(In reply to Andy Furniss from comment #67)
> In summary AIUI the fact there is a pause causes a spike because the count
> is from the last frame rendered - which is way longer than normal due to the
> pause.

Still, it means that *some* BOs were moved during the pause, so it's not impossible that the pause is somehow related to the BO moves.

BTW, make sure CONFIG_CMA isn't enabled in your kernels, in particular those using Ubuntu.


(In reply to smoki from comment #68)
>  Offtopic...

Please don't clutter up bug reports with off-topic comments.
Comment 74 Christoph Haag 2014-10-02 11:14:15 UTC
Well.

I have said that I used drm-next-3.18 and had these hangs.
When I applied http://lists.freedesktop.org/archives/mesa-dev/2014-August/066746.html it did not help.

Now I am using 3.17-rc7 with that mesa patch and I do not see these hangs anymore. Or maybe they are these very short stutters.

Sorry if drm-next-3.18 behavior is not relevant here.

As for the num bytes moved: Does the HUD graph only accumulate everything that happened in the hang? If so, then the hundreds of megabytes still seem more than normal and the used graphs definitely show change before and after the hangs. Whatever you make of that...


CONFIG_CMA is not enabled on either kernel.

Indeed, there's less moving of data with the rc kernel I think.
For comparison: https://www.youtube.com/watch?v=mFaqHGle9Hg
Comment 75 Andy Furniss 2014-10-02 18:48:31 UTC
(In reply to Christoph Haag from comment #74)
> Well.
> 
> I have said that I used drm-next-3.18 and had these hangs.
> When I applied
> http://lists.freedesktop.org/archives/mesa-dev/2014-August/066746.html it
> did not help.
> 
> Now I am using 3.17-rc7 with that mesa patch and I do not see these hangs
> anymore. Or maybe they are these very short stutters.
> 
> Sorry if drm-next-3.18 behavior is not relevant here.
> 
> As for the num bytes moved: Does the HUD graph only accumulate everything
> that happened in the hang? If so, then the hundreds of megabytes still seem
> more than normal and the used graphs definitely show change before and after
> the hangs. Whatever you make of that...
> 
> 
> CONFIG_CMA is not enabled on either kernel.
> 
> Indeed, there's less moving of data with the rc kernel I think.
> For comparison: https://www.youtube.com/watch?v=mFaqHGle9Hg

It would be useful to know if Elemental also worked with 3.17-rc7.
Comment 76 Christoph Haag 2014-10-02 18:54:39 UTC
(In reply to Andy Furniss from comment #75)

> It would be useful to know if Elemental also worked with 3.17-rc7.

It's stuttering quite severely, but it feels more like "normal" performance drops and I don't think it completely hangs like in the videos I made before.

I actually tried it for the first time in months because in the past it hung the gpu and the operating system completely with gpu faults I think. Today I ran it for the first time without any severe problems, so radeonsi is definitely making good progress!
Comment 77 Andy Furniss 2014-10-02 21:28:29 UTC
(In reply to Christoph Haag from comment #76)
> (In reply to Andy Furniss from comment #75)
> 
> > It would be useful to know if Elemental also worked with 3.17-rc7.
> 
> It's stuttering quite severely, but it feels more like "normal" performance
> drops and I don't think it completely hangs like in the videos I made before.
> 
> I actually tried it for the first time in months because in the past it hung
> the gpu and the operating system completely with gpu faults I think. Today I
> ran it for the first time without any severe problems, so radeonsi is
> definitely making good progress!

Ok, I'm going to open a new bug for this one when I have time to test more.

I can get the behavior you see, but only on the last kernel with the old firmware I have installed; anything more recent, including current agd5f 3.17 fixes, gets long pauses for me.

What is your card?
Comment 78 Andy Furniss 2014-10-04 15:00:39 UTC
(In reply to Andy Furniss from comment #77)

> Ok, I'm going to open a new bug for this one when I have time to test more.

Bisected to the same kernel commit as this one, but filed a new bug -

https://bugs.freedesktop.org/show_bug.cgi?id=84662
Comment 79 Michel Dänzer 2014-10-10 01:29:16 UTC
(In reply to Andy Furniss from comment #78)
> https://bugs.freedesktop.org/show_bug.cgi?id=84662

I think that should cover Unigine as well.

Is there still an issue with pyrit?
Comment 80 Andy Furniss 2014-10-10 15:23:49 UTC
(In reply to Michel Dänzer from comment #79)
> (In reply to Andy Furniss from comment #78)
> > https://bugs.freedesktop.org/show_bug.cgi?id=84662
> 
> I think that should cover Unigine as well.

Yea.

> Is there still an issue with pyrit?

Pyrit is OK, so closing this one.

