Bug 42117 - R200 driver performance, UMS, all mesa versions from 7.6
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/r200
Version: unspecified
Hardware: x86 (IA32) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
Reported: 2011-10-22 16:51 UTC by Michal
Modified: 2017-11-03 17:28 UTC

Description Michal 2011-10-22 16:51:34 UTC
With kernel mode setting disabled (UMS), performance in some 3D applications drops almost to zero.

glxgears runs flawlessly,
a few maps in openarena are unplayable,
extremetuxracer is unplayable.

oprofile with tuxracer:
CPU: Athlon, speed 1399.45 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
146220   16.6991  sample_lambda_2d
123695   14.1267  fetch_texel_2d_f_argb8888
102183   11.6699  sample_linear_2d
86150     9.8388  linear_texel_locations.clone.23
79573     9.0877  interpolate_texcoords
69939     7.9874  _swrast_fog_rgba_span
55210     6.3053  _swrast_texture_span
54151     6.1843  _swrast_compute_lambda
50989     5.8232  radeonReadDepthSpan_z24_s8
21567     2.4631  radeonReadRGBASpan_ARGB8888
11802     1.3479  radeonWriteRGBASpan_ARGB8888

I've tested almost all Mesa versions from 7.6 to 7.11 without success. With Mesa 7.5 everything works great. Mesa 7.6.1 compiled with the radeon and r200 directories copied from 7.5.2 also works great.
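
To read the profile above at a glance, the rows can be totalled. The following is a minimal sketch, not part of the original report: it parses the listed rows and sums the percent column. The grouping into "span read/write" helpers is an assumption based only on the symbol names.

```python
import re

# Symbol rows copied verbatim from the oprofile listing above.
PROFILE = """\
146220   16.6991  sample_lambda_2d
123695   14.1267  fetch_texel_2d_f_argb8888
102183   11.6699  sample_linear_2d
86150     9.8388  linear_texel_locations.clone.23
79573     9.0877  interpolate_texcoords
69939     7.9874  _swrast_fog_rgba_span
55210     6.3053  _swrast_texture_span
54151     6.1843  _swrast_compute_lambda
50989     5.8232  radeonReadDepthSpan_z24_s8
21567     2.4631  radeonReadRGBASpan_ARGB8888
11802     1.3479  radeonWriteRGBASpan_ARGB8888
"""

# One data row: sample count, percentage, symbol name.
ROW = re.compile(r"^\s*(\d+)\s+(\d+\.\d+)\s+(\S+)\s*$")


def parse_rows(text):
    """Return (samples, percent, symbol) tuples for every data row."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((int(match.group(1)), float(match.group(2)), match.group(3)))
    return rows


def percent_where(rows, predicate):
    """Sum the percent column over rows whose symbol satisfies predicate."""
    return sum(pct for _, pct, symbol in rows if predicate(symbol))


def is_span_access(symbol):
    # Assumed grouping (by name only): driver span read/write helpers.
    return symbol.startswith("radeon") and "Span" in symbol


rows = parse_rows(PROFILE)
listed = percent_where(rows, lambda symbol: True)
span_access = percent_where(rows, is_span_access)

print(f"listed symbols:     {listed:.2f}% of samples")
print(f"span read/write:    {span_access:.2f}% of samples")
print(f"remaining symbols:  {listed - span_access:.2f}% of samples")
```

The listed symbols account for roughly nine tenths of all samples, and every one of them is a CPU-side texture sampling, fog, or span function, so nearly all of the time goes to rendering on the CPU.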
Comment 1 Michal 2011-10-22 17:02:42 UTC
With KMS enabled, the problem does not exist.
Comment 2 Alex Deucher 2011-10-24 06:22:51 UTC
Is there some reason why you want to use UMS?  It's not really supported any more.
Comment 3 Michal 2011-10-24 08:56:25 UTC
Well, yes, UMS is about 2x faster than KMS.

I think the problem is somewhere in texture handling. After lowering texture quality in openarena, fps jumps from 5 to 60. The same happens with etracer: with low-res textures copied from version 3.5, fps jumps from 3 to 30.
Comment 4 Roland Scheidegger 2011-10-24 11:30:14 UTC
Some performance difference is expected due to kms not supporting tiling on r200, though I would expect a 2x difference only if you also enabled hyperz manually.
That said, if performance jumps a lot higher with lower texture settings, this looks like a problem with bo placement/migration: the old code was simple and terrible in some cases (it could easily get texture thrashing), whereas the new code is different (but still not smart enough). In any case, if you don't have libtxc installed, try that, as openarena can use it and hence textures will use much less vram. If you have a compositing manager running, try disabling it, as it will also use more memory. Though if you have some terribly memory-constrained chip, nothing might help much (you should really have at least 64MB vram).

As for the ums problem, it looks like the driver is hitting a software fallback. RADEON_DEBUG=fall might tell you why - not that there's any chance it will get fixed...
Comment 5 Marek Olšák 2011-10-24 11:46:25 UTC
I am inclined to believe that tiling can make such a difference.
Comment 6 Michal 2011-10-24 15:42:28 UTC
In my tests I'm using fluxbox without any compositing. My card is a Radeon 9100 (rebranded 8500) with 128MB vram. The libtxc library didn't change anything.

RADEON_DEBUG=fall prints this in a loop:
R200 begin tcl fallback Rasterization fallback
R200 begin rasterization fallback: 0x1 Texture mode
R200 end tcl fallback Rasterization fallback
R200 end tcl fallback
R200 end rasterization fallback: 0x1 Texture mode
R200 begin tcl fallback Rasterization fallback
R200 begin rasterization fallback: 0x1 Texture mode
R200 end tcl fallback Rasterization fallback
R200 end tcl fallback
R200 end rasterization fallback: 0x1 Texture mode

I forgot to add: in openarena I get about 2-4 fps in normal play, but when I look down at the ground, fps immediately jumps to about 80.
Comment 7 Roland Scheidegger 2011-10-24 17:03:22 UTC
Yes that's a fallback. Not sure why it would trigger texture mode fallback.
You could try attaching a debugger to see where r200Fallback gets that true mode bit and work from there; it could come from several functions.
But these ums pieces are going to go away very soon, so the chances that someone is going to fix it are slim.
Comment 8 Michal 2011-10-25 06:50:55 UTC
Breakpoint 1, r200Fallback (ctx=0x8381c10, bit=1, mode=1 '\001')
    at r200_swtcl.c:678
678        r200ContextPtr rmesa = R200_CONTEXT(ctx);
(gdb) where
#0  r200Fallback (ctx=0x8381c10, bit=1, mode=1 '\001') at r200_swtcl.c:678
#1  0xb63b2dbd in r200WrapRunPipeline (ctx=0x8381c10) at r200_state.c:2449
#2  0xb648f1ee in _tnl_draw_prims (ctx=0x8381c10, arrays=0x83c5804, 
    prim=0x83c42d8, nr_prims=1, ib=0x0, min_index=0, max_index=3)
    at tnl/t_draw.c:478
#3  0xb648f4f9 in _tnl_vbo_draw_prims (ctx=0x8381c10, arrays=0x83c5804, 
    prim=0x83c42d8, nr_prims=1, ib=0x0, index_bounds_valid=1 '\001', 
    min_index=0, max_index=3) at tnl/t_draw.c:384
#4  0xb648709f in vbo_exec_vtx_flush (exec=0x83c41a8, unmap=1 '\001')
    at vbo/vbo_exec_draw.c:384
#5  0xb6484d50 in vbo_exec_FlushVertices_internal (ctx=0x8381c10, 
    unmap=1 '\001') at vbo/vbo_exec_api.c:872
#6  0xb6484f3b in vbo_exec_FlushVertices (ctx=0x8381c10, flags=1)
    at vbo/vbo_exec_api.c:906
#7  0xb646995c in _mesa_BindTexture (target=3553, texName=92)
    at main/texobj.c:1058
#8  0x080612bb in draw_sky (pos=...) at course_render.cpp:377
#9  0x0808ca8b in Racing::loop (this=0x929dde0, timeStep=9.99999982e-14)
    at racing.cpp:375
#10 0x0807592b in main_loop () at loop.cpp:178
#11 0x08095f75 in winsys_process_events () at winsys.cpp:304
#12 0x08076444 in main (argc=1, argv=0xbf845c24) at main.cpp:307
Comment 9 Alex Deucher 2011-10-25 06:57:01 UTC
It should be much easier to add tiled support to radeon and r200 with KMS after we drop DRI1 support since we can just blit to a linear buffer if the CPU needs to access a tiled buffer.
Comment 10 Roland Scheidegger 2011-10-25 08:58:41 UTC
So the fallback must come from r200ValidateState() - which in turn means r200ValidateBuffers() and that means radeon_cs_space_check_with_bo() is failing.
So it seems validation fails because it's over either the vram or the gart limit. I'm not sure the calculation there is fully correct for non-kms, but you could check the limit values used for these (radeonScreen->gartTextures.size and radeonScreen->texSize[0], which should be the same as csm->gart_limit and csm->vram_limit in radeon_cs_do_space_check()). Both values actually come from the ddx. Maybe your gart setting is too low.

As Alex said though we should fix tiling for the kms case.
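
The kind of check described above can be pictured with a deliberately simplified model. This is a sketch under stated assumptions, not the libdrm implementation: `Buffer`, `fits`, and the example sizes and limits are all hypothetical, and the real radeon_cs_space_check_with_bo() tracks more state than a per-domain sum.

```python
from dataclasses import dataclass

MIB = 1024 * 1024
VRAM = "vram"
GTT = "gtt"


@dataclass
class Buffer:
    """Hypothetical stand-in for a buffer object referenced by a draw call."""
    name: str
    size: int    # bytes
    domain: str  # VRAM or GTT


def fits(buffers, vram_limit, gart_limit):
    """Return True when the summed sizes per domain stay within both limits."""
    used = {VRAM: 0, GTT: 0}
    for bo in buffers:
        used[bo.domain] += bo.size
    return used[VRAM] <= vram_limit and used[GTT] <= gart_limit


# A 1024x1024 texture with 4 bytes per texel occupies 4 MiB.
texture_bytes = 1024 * 1024 * 4

one = [Buffer("sky", texture_bytes, GTT)]
two = one + [Buffer("terrain", texture_bytes, GTT)]

# Illustrative limits only: 128 MiB of vram and a 5 MiB gart budget.
print(fits(one, vram_limit=128 * MIB, gart_limit=5 * MIB))
print(fits(two, vram_limit=128 * MIB, gart_limit=5 * MIB))
```

In this model a small gart limit is exceeded as soon as two large textures land in the GTT domain. That would match a validation failure followed by a software fallback, even with plenty of vram free.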
Comment 11 Michal 2011-10-25 12:54:08 UTC
Breakpoint 1, 0xb7e07f93 in radeon_cs_set_limit () from /usr/lib/libdrm_radeon.so.1
(gdb) s
Single stepping until exit from function radeon_cs_set_limit,
which has no line number information.
rcommonInitCmdBuf (rmesa=0x837c8d8) at radeon_common.c:1297
1297                    radeon_cs_set_limit(rmesa->cmdbuf.cs, RADEON_GEM_DOMAIN_GTT, rmesa->radeonScreen->gartTextures.size);
(gdb) print rmesa->radeonScreen->texSize[0]
$7 = 109051904
(gdb) print rmesa->radeonScreen->gartTextures.size
$8 = 5111808

These values match the ones in the X log:

bash-4.1$ cat /var/log/Xorg.0.log |grep textures
[    42.691] (II) RADEON(0): Using 5 MB for GART textures
[    42.691] (II) RADEON(0): Will use 106496 kb for textures at offset 0x1800000
bash-4.1$ cat /var/log/Xorg.0.log |grep GART    
[    42.691] (II) RADEON(0): Using 8 MB GART aperture
[    42.691] (II) RADEON(0): Using 5 MB for GART textures
[    42.716] (II) RADEON(0): [agp] GART texture map handle = 0xe0302000
[    42.717] (II) RADEON(0): [agp] GART Texture map mapped at 0xae7fd000
[    42.964] (II) RADEON(0): [drm] Initialized kernel GART heap manager, 5111808




radeon_cs_do_space_check(): gdb with a breakpoint set on it never stops.
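
As a cross-check of the numbers in this comment, the gdb values can be converted to the units the X log uses. This is a small sketch, assuming the log's "kb" means 1024 bytes and that its "MB" figures are rounded.

```python
KIB = 1024
MIB = 1024 * KIB

# Values printed by gdb and Xorg.0.log in this comment, all in bytes.
tex_size = 109051904      # radeonScreen->texSize[0]
gart_tex_size = 5111808   # radeonScreen->gartTextures.size
tex_offset = 0x1800000    # "textures at offset" from the log

print(f"vram textures: {tex_size // KIB} KiB = {tex_size / MIB} MiB")
print(f"gart textures: {gart_tex_size // KIB} KiB = {gart_tex_size / MIB} MiB")
print(f"texture region ends at {(tex_offset + tex_size) / MIB} MiB")
```

Under these assumptions, texSize[0] is exactly the 106496 kb reported in the log, and the gart texture size is 4.875 MiB, which the log rounds to "5 MB". The texture region ends at 128 MiB, the amount of vram reported for this card earlier in the thread. So only about 5 MB of gart space is available for textures.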
Comment 12 Michal 2011-10-27 13:39:59 UTC
csm->gart_limit and csm->vram_limit are correct.

With GARTSize "64", openarena works great. ETRacer does not (but there are no fallbacks).
In ETRacer, when I disable the fps display in the options, the framerate returns to normal after a few seconds on the map.
Comment 13 Roland Scheidegger 2011-10-28 06:49:59 UTC
(In reply to comment #12)
> csm->gart_limit and csm->vram_limit are correct.
> 
> With GARTSize "64", openarena works great. ETRacer does not (but no fallbacks)
> In ETRacer, when I disabled show fps in options, after few seconds on the map,
> framerate back to normal.

I'd have thought 128MB VRAM should be enough for openarena so it wouldn't need gart for textures, especially with texture compression. So I don't know why that's really needed; it looks like something is wrong. You could also try the FBTexPercent option, though given that there should be plenty of vram I don't think it's going to make a difference either.
Comment 14 Michal 2011-10-28 09:44:56 UTC
The minimum GARTSize that doesn't trigger fallbacks is 16MB.
Now the problem is somewhere on the kernel side.

  samples|      %|
------------------
  1295082 93.6410 vmlinux
    30154  2.1803 r200_dri.so
    19269  1.3932 etracer
    15854  1.1463 radeon

samples  %        symbol name
1261238  97.3867  __copy_from_user_ll
1495      0.1154  delay_tsc
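
The two tables above can be combined to estimate how much of the whole profile one kernel symbol accounts for. This is a minimal sketch, assuming the symbol percentages are relative to vmlinux samples only; the sample counts are consistent with that reading.

```python
# Figures copied from the two oprofile tables above.
vmlinux_samples = 1295082
vmlinux_pct_of_total = 93.6410
copy_from_user_samples = 1261238   # __copy_from_user_ll


def share_within_image(symbol_samples, image_samples):
    """Percentage of an image's samples that fall in one symbol."""
    return symbol_samples / image_samples * 100.0


def share_of_total(symbol_samples, image_samples, image_pct_of_total):
    """Percentage of all samples that fall in one symbol of one image."""
    return symbol_samples / image_samples * image_pct_of_total


within = share_within_image(copy_from_user_samples, vmlinux_samples)
overall = share_of_total(copy_from_user_samples, vmlinux_samples, vmlinux_pct_of_total)

print(f"__copy_from_user_ll within vmlinux: {within:.2f}%")
print(f"__copy_from_user_ll of all samples: {overall:.2f}%")
```

On this reading, copying data from user space into the kernel accounts for about nine tenths of all samples, which supports the conclusion that the remaining bottleneck is on the kernel side.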
Comment 15 Michal 2011-10-28 10:13:15 UTC
With FBTexPercent set to 97:
bash-4.1$ cat /var/log/Xorg.0.log|grep text
[  8077.238] (II) RADEON(0): Will use 114688 kb for textures at offset 0x00c08000

etracer runs smoothly at 40 fps (with mesa 7.5.2, 60 fps).
Comment 16 smoki 2012-01-15 11:34:36 UTC
I also tried etracer with 7.5.2 UMS and had problems with 1024x1024 textures; it ran at only a few fps.

But current KMS with 8.0-devel mesa doesn't have that problem, and etracer (with default textures) runs smoothly there.

I have ColorTiling enabled, vblank_mode=3 in the dri2 drirc, EXAVSync on, SwapbuffersWait on, and vsynced aiglx/composite in use.
Comment 17 mirh 2017-11-03 17:28:22 UTC
xf86-video-ati 6.14.4 supposedly added KMS tiling
https://wiki.freedesktop.org/xorg/radeon/

Could this bring performance back to a satisfactory enough level that we can nonetheless consider this fixed?

Or, if not, why is the feature still crossed out here?
https://www.x.org/wiki/RadeonFeature/

