Bug 101592 - Very slow performance when rendering scenes with transparency, probably caused by excessive copying (intel_miptree_map())
Status: NEW
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 17.1
Hardware: Other All
Importance: medium normal
Assignee: Intel 3D Bugs Mailing List
QA Contact: Intel 3D Bugs Mailing List
 
Reported: 2017-06-26 03:25 UTC by Steve Holland
Modified: 2017-09-28 05:24 UTC
0 users

Description Steve Holland 2017-06-26 03:25:51 UTC
I am seeing very poor rendering performance when drawing scenes with transparency. 

This is with Mesa 17.1.3 as distributed with Fedora 26 beta (mesa-dri-drivers-17.1.3-2.fc26). Testing on Intel HD Graphics 620; lspci shows it as "Device 5916".

Rendering has slowed by 10-100x and is significantly (about 10x) slower than pure software rendering (forced via LIBGL_ALWAYS_SOFTWARE).

Specifically, I see very high CPU usage during rendering in the intel_miptree_map(), intel_offset_S8(), and intel_miptree_unmap() routines.
Via "perf record" and "perf report", the locus of the CPU usage is clear:

  91.34%  dg_scope_sa  i965_dri.so               [.] intel_miptree_map
   4.53%  dg_scope_sa  i965_dri.so               [.] intel_offset_S8
   1.25%  dg_scope_sa  i965_dri.so               [.] intel_miptree_unmap

Typical stack backtraces (such as the one pasted below) show most of the time going to mapping (via intel_miptree_map_s8() in intel_mipmap_tree.c) or unmapping a rather large miptree (width=815, height=534), which corresponds to the size of the entire FreeGLUT window being rendered into.

Notice that all of this is inside a call to glutBitmapCharacter(), i.e. it appears that every character being drawn causes creation and destruction of a full-window miptree mapping.

... Hence the terrible performance. 
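
For a rough sense of scale, here is a small sketch of the arithmetic. It assumes, as the backtrace suggests but nothing in this report confirms, that each map and each unmap of the S8 stencil buffer touches the whole surface at one byte per pixel. The function names are made up for illustration and are not part of Mesa.

```c
#include <stddef.h>

/* Bytes in an 8-bit stencil surface: one byte per pixel. */
size_t stencil_bytes(size_t width, size_t height)
{
    return width * height;
}

/*
 * Estimated bytes touched when drawing a string one character at a time.
 * ASSUMPTION: every character triggers one full-surface map and one
 * full-surface unmap (hence the factor of 2). This is an upper bound
 * under that assumption, not a measured figure.
 */
size_t bytes_touched_per_string(size_t width, size_t height, size_t num_chars)
{
    return 2 * num_chars * stencil_bytes(width, height);
}
```

Under that assumption, bytes_touched_per_string(815, 534, 6) gives the estimate for a six-character label such as "GOWJOT" in an 815x534 window, and the total grows linearly with the number of characters drawn per frame.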

Stack trace follows

#0  0x00007f1dfabe4ae1 in intel_miptree_unmap_s8 (slice=<optimized out>, 
    level=0, map=0x12028d0, mt=0x1202dc0, brw=0x7f1e02fd1040)
    at intel_mipmap_tree.c:2728
#1  intel_miptree_unmap (brw=brw@entry=0x7f1e02fd1040, mt=0x1202dc0, level=0, 
    slice=<optimized out>) at intel_mipmap_tree.c:3101
#2  0x00007f1dfabe23b7 in intel_unmap_renderbuffer (ctx=0x7f1e02fd1040, 
    rb=0x12023d0) at intel_fbo.c:217
#3  0x00007f1dfa9758de in unmap_attachment (ctx=ctx@entry=0x7f1e02fd1040, 
    fb=fb@entry=0x1201d30, buffer=buffer@entry=BUFFER_STENCIL)
    at swrast/s_renderbuffer.c:609
#4  0x00007f1dfa9760f0 in _swrast_unmap_renderbuffers (ctx=0x7f1e02fd1040)
    at swrast/s_renderbuffer.c:689
#5  0x00007f1dfa965318 in swrast_render_finish (ctx=0x7f1e02fd1040)
    at swrast/s_context.h:379
#6  _swrast_Bitmap (ctx=ctx@entry=0x7f1e02fd1040, px=426, py=py@entry=263, 
    width=width@entry=9, height=height@entry=16, unpack=0x7f1e02fecf30, 
    bitmap=0x7f1e012e4ac1 "") at swrast/s_bitmap.c:135
#7  0x00007f1dfa9a686d in _mesa_meta_Bitmap (ctx=ctx@entry=0x7f1e02fd1040, 
    x=x@entry=426, y=y@entry=263, width=width@entry=9, height=height@entry=16, 
    unpack=unpack@entry=0x7f1e02fecf30, bitmap1=0x7f1e012e4ac1 "")
    at drivers/common/meta.c:2355
#8  0x00007f1dfabe7fac in intelBitmap (ctx=<optimized out>, x=426, y=263, 
    width=<optimized out>, height=16, unpack=0x7f1e02fecf30, 
    pixels=0x7f1e012e4ac1 "") at intel_pixel_bitmap.c:358
#9  0x00007f1dfa820b5f in _mesa_Bitmap (width=9, height=16, 
    xorig=<optimized out>, yorig=<optimized out>, xmove=<optimized out>, 
    ymove=<optimized out>, bitmap=0x7f1e012e4ac1 "") at main/drawpix.c:347
#10 0x00007f1e012d149c in glutBitmapCharacter () from /lib64/libglut.so.3
#11 0x00000000004111df in DrawString (String=0x1202620 "GOWJOT")
    at scope_draw.c:79
#12 0x0000000000418260 in DrawWaveformImage (c=0xecbf40, pixelsperdiv=66, 
    usex=2, usey=3, usewidth=792, useheight=528, centerline=267)
    at scope_drawwfm.c:985
#13 0x0000000000418f5e in DrawWaveform (c=0xecbf40, pixelsperdiv=66, usex=2, 
    usey=3, usewidth=792, useheight=528, centerline=267, isSelected=1)
    at scope_drawwfm.c:1161
#14 0x000000000040df0f in ScopeDisplayFunc () at scope_callbacks.c:1361
#15 0x00007f1e012cd3c1 in fghRedrawWindow () from /lib64/libglut.so.3
#16 0x00007f1e012cd84c in fghcbProcessWork () from /lib64/libglut.so.3
#17 0x00007f1e012ceaa9 in fgEnumSubWindows () from /lib64/libglut.so.3
#18 0x00007f1e012ce9b9 in fgEnumWindows () from /lib64/libglut.so.3
#19 0x00007f1e012cd931 in glutMainLoopEvent () from /lib64/libglut.so.3
#20 0x00007f1e012cd9d4 in glutMainLoop () from /lib64/libglut.so.3
#21 0x00000000004085f8 in main (argc=2, argv=0x7ffcf2509128) at scope.c:633
Comment 1 Kenneth Graunke 2017-06-26 04:49:51 UTC
1. What application is this?
2. "has slowed by 10-100x" compared to what?  Some earlier version of Mesa?
Comment 2 Steve Holland 2017-06-26 14:33:18 UTC
Application is dgscope, a newer, not yet published version of http://thermal.cnde.iastate.edu/dataguzzler/download/dgscope-export-2.0.0-beta21.tar.gz

This cropped up when we discovered we had failed to set GLUT_ALPHA in glutInitDisplayMode() (oops!). Other drivers had drawn transparency just fine without GLUT_ALPHA, but this one was drawing everything as fully opaque. 
 
The 10-100x slowdown is compared to the previous version of dgscope (without GLUT_ALPHA) running on 3-year-old hardware under Fedora 25. Will get you more clarity shortly.

Curiously, the slowdown doesn't seem to occur if the entire scene has 100% opacity.
Comment 3 Steve Holland 2017-06-26 19:06:37 UTC
The comparison system runs exactly the same application on Fedora 25 (mesa-dri-drivers-13.0.4-3.fc25) with older Intel graphics ("Haswell-ULT Integrated Graphics Controller (rev 0b)"). It is a bit sluggish (say ~3 fps) at a particular rendering operation (transparent text over a transparent image). The newer system (Mesa 17.1.3, Kaby Lake HD Graphics 620, Fedora 26 beta) manages 0.3 fps on the same operation.

The same operations with software rendering forced (by LIBGL_ALWAYS_SOFTWARE=1) run at 10+ fps on both platforms.

Just rendering the image is fast in all cases, transparent or not. Overlaying the text character-by-character with glutBitmapCharacter() seems to be what is slowing things down.
Comment 4 Tapani Pälli 2017-09-27 13:14:46 UTC
(In reply to Steve Holland from comment #3)
> Comparison system running exactly the same application, but running Fedora 25
> mesa-dri-drivers-13.0.4-3.fc25 and older Intel graphics "Haswell-ULT
> Integrated Graphics Controller (rev 0b)" is a bit sluggish (say ~3 fps) at a
> particular rendering operation (transparent text over transparent image).
> Newer system (Mesa 17.1.3, Kaby Lake HD Graphics 620, Fedora 26 beta) on the
> same operation is 0.3 fps. 
> 
> Same operations with software rendering forced (by LIBGL_ALWAYS_SOFTWARE=1)
> are 10+fps on both platforms
> 
> Just rendering the image is fast in all cases, transparent or not.
> Overlaying the text character-by-character with glutBitmapCharacter() seems
> to be what is slowing things down.

My recommendation is to implement text rendering using other methods. It is quite unlikely that there is interest in optimizing the legacy glBitmap path. From the trace it also seems that the implementation of glutBitmapCharacter() is far from optimal: it might not even cache the font, and it has to be called once for each character.

You could speed this phase up significantly, for example by using the traditional method of rendering textured quads (that use a texture atlas which contains the font glyphs). This pushes the abstraction down a bit but will definitely be worth it.
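
A minimal sketch of the atlas idea, CPU side only: it turns a string into one batch of quad vertices with texture coordinates, so the whole string can be submitted in a single draw instead of one glBitmap() call per character. The atlas layout (printable ASCII in a 16x6 grid of 9x16 cells, row 0 at v=0), the TextVertex struct and the build_text_quads() name are all assumptions made for illustration, not anything dgscope, freeglut or Mesa provides.

```c
#include <stddef.h>

/* Hypothetical atlas layout: printable ASCII in a grid of fixed-size cells. */
enum {
    GLYPH_W = 9, GLYPH_H = 16,    /* cell size, matching the 9x16 bitmaps in the trace */
    ATLAS_COLS = 16, ATLAS_ROWS = 6,
    FIRST_CHAR = 32, LAST_CHAR = 126
};

typedef struct { float x, y, u, v; } TextVertex;

/*
 * Write 4 vertices (bottom-left, bottom-right, top-right, top-left) per
 * drawable character of `s`, starting with the pen at (x, y).
 * Returns the number of vertices written, never more than max_vertices.
 */
size_t build_text_quads(const char *s, float x, float y,
                        TextVertex *out, size_t max_vertices)
{
    const float du = 1.0f / ATLAS_COLS;   /* width of one cell in texture space */
    const float dv = 1.0f / ATLAS_ROWS;   /* height of one cell in texture space */
    size_t n = 0;

    for (; *s != '\0'; ++s, x += GLYPH_W) {
        unsigned char c = (unsigned char)*s;
        if (c < FIRST_CHAR || c > LAST_CHAR)
            continue;                     /* no glyph; the pen still advances */
        if (n + 4 > max_vertices)
            break;                        /* output buffer is full */

        int cell = c - FIRST_CHAR;
        float u0 = (float)(cell % ATLAS_COLS) * du;
        float v0 = (float)(cell / ATLAS_COLS) * dv;

        out[n++] = (TextVertex){ x,           y,           u0,      v0      };
        out[n++] = (TextVertex){ x + GLYPH_W, y,           u0 + du, v0      };
        out[n++] = (TextVertex){ x + GLYPH_W, y + GLYPH_H, u0 + du, v0 + dv };
        out[n++] = (TextVertex){ x,           y + GLYPH_H, u0,      v0 + dv };
    }
    return n;
}
```

With a legacy GL context, the resulting array could then be drawn once per string, for instance through glVertexPointer(), glTexCoordPointer() and glDrawArrays(GL_QUADS, ...) with the atlas texture bound and blending enabled. Creating and uploading the atlas texture itself is left out here.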
Comment 5 Steve Holland 2017-09-27 18:01:10 UTC
It seems that for every character rendered, the driver is copying the entire window image back and forth, mapping and unmapping the entire window. Not at all sure why that is necessary. But I understand the API is rather obsolete.

Really, the sensible destination for improvement effort would be freeglut, no? I think GLUT is still a very widely used API.
Comment 6 Tapani Pälli 2017-09-28 05:24:35 UTC
(In reply to Steve Holland from comment #5)
> It seems that for every character rendered, the driver is copying the entire
> window image back and forth, mapping and unmapping the entire window. Not
> at all sure why that is necessary. But I understand the API is rather
> obsolete.

FWIW I don't think the whole content is copied, but since glBitmap() is called once per character (which the API forces), it causes a lot of buffer map/unmap operations in each frame, because every bitmap has to be copied in.
 
> Really, the sensible destination for improvement effort would be
> freeglut, no? I think GLUT is still a very widely used API.

GLUT could still be used, but IMO text rendering should be implemented by the app itself and not rely on glutBitmapCharacter(). There are many tutorials/examples (and even ready-made helper libraries) on the web showing how to get started.

