Summary: | i915_drv OpenGL very CPU intensive under Qt5/QtQuick 2 | ||
---|---|---|---|
Product: | Mesa | Reporter: | jpsinthemix |
Component: | Drivers/DRI/i915 | Assignee: | Ian Romanick <idr> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | blocker | ||
Priority: | medium | CC: | chgena, jpsinthemix, v_2e |
Version: | 10.4 | ||
Hardware: | x86 (IA32) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
log given by INTEL_DEBUG=all
Correct (portion) of an INTEL_DEBUG=all log |
Description
jpsinthemix
2014-12-19 06:54:33 UTC
It sounds like the application requires more resources than the aging i915 has to offer. Seeing _swrast_exec_fragment_program in the profile tells me that the driver is falling back to software rasterization. This usually occurs when the fragment program is too big. The i965 can only handle shaders with up to 64 instructions.

Unless the shader is "just barely" too big, it is unlikely that we'll be able to do anything about it. What output does running with 'INTEL_DEBUG=perf' give?

(In reply to Ian Romanick from comment #1)
> It sounds like the application requires more resources than the aging i915
> has to offer. Seeing _swrast_exec_fragment_program in the profile tells me
> that the driver is falling back to software rasterization. This usually
> occurs when the fragment program is too big. The i965 can only handle
> shaders with up to 64 instructions.
>
> Unless the shader is "just barely" too big, it is unlikely that we'll be
> able to do anything about it. What output does running with
> 'INTEL_DEBUG=perf' give?

For some reason I am unable to get any output via INTEL_DEBUG=perf. This is for the i915_drv, not the i965_drv driver. I think the Qt program I'm using for testing is re-directing both stdin/out to /dev/null. The Qt program I'm running is the display manager 'sddm', which basically runs a QtQuick2 Main.qml script.

I've started intel_gpu_top and then run sddm, and it shows 2-3% GPU usage. That is not surprising, I suppose, because as you noted the fragment shader is being run on the CPU via the swrast fallback.

I have, however, found the problematic shaders; they are hiqsubpixeldistancefieldtext.{frag,vert}, shown below. This shader pair handles the text rendering for sddm using 'distance-field' anti-aliasing. Note that the 'highp/lowp/mediump' qualifiers (for GLES) are not relevant (and are unset) here.
There are also simpler (lower-quality) versions, loqsubpixeldistancefieldtext.{frag,vert}, which are not employed by QtQuick2 here. If I re-build the QtQuick library and force their use instead of the hiqsubpixeldistancefieldtext.* versions, the program runs as expected: intel_gpu_top usage remains at about 2-3% and CPU usage fluctuates in the 0-1% range. Listings of these lo* shaders are also shown below.

My guess here is that the 5 texture2D/texture2DProj() calls in hiqsubpixeldistancefieldtext.frag are triggering the swrast fallback; there are only 2 texture2DProj() calls in loqsubpixeldistancefieldtext.frag. If so, is this related to the i915 limits (i915_context.h) for maximum tex instructions and/or maximum tex indirections (I915_MAX_TEX_INSN=32 and I915_MAX_TEX_INDIRECT=4)? If this is so, is there any wiggle room here in the driver whereby, say, up to 6 texture2D/texture2DProj() calls could be handled w/o a swrast fallback?

Thanks again for your time,
John

// ==== hiqsubpixeldistancefieldtext.vert
// ===============================================
uniform highp mat4 matrix;
uniform highp vec2 textureScale;
uniform highp float fontScale;
uniform highp vec4 vecDelta;

attribute highp vec4 vCoord;
attribute highp vec2 tCoord;

varying highp vec2 sampleCoord;
varying highp vec3 sampleFarLeft;
varying highp vec3 sampleNearLeft;
varying highp vec3 sampleNearRight;
varying highp vec3 sampleFarRight;

void main()
{
    sampleCoord = tCoord * textureScale;
    gl_Position = matrix * vCoord;

    // Calculate neighbor pixel position in item space.
    highp vec3 wDelta = gl_Position.w * vecDelta.xyw;
    highp vec3 farLeft = vCoord.xyw - 0.667 * wDelta;
    highp vec3 nearLeft = vCoord.xyw - 0.333 * wDelta;
    highp vec3 nearRight = vCoord.xyw + 0.333 * wDelta;
    highp vec3 farRight = vCoord.xyw + 0.667 * wDelta;

    // Calculate neighbor texture coordinate.
    highp vec2 scale = textureScale / fontScale;
    highp vec2 base = sampleCoord - scale * vCoord.xy;
    sampleFarLeft = vec3(base * farLeft.z + scale * farLeft.xy, farLeft.z);
    sampleNearLeft = vec3(base * nearLeft.z + scale * nearLeft.xy, nearLeft.z);
    sampleNearRight = vec3(base * nearRight.z + scale * nearRight.xy, nearRight.z);
    sampleFarRight = vec3(base * farRight.z + scale * farRight.xy, farRight.z);
}

// ==== hiqsubpixeldistancefieldtext.frag
// ===============================================
varying highp vec2 sampleCoord;
varying highp vec3 sampleFarLeft;
varying highp vec3 sampleNearLeft;
varying highp vec3 sampleNearRight;
varying highp vec3 sampleFarRight;

uniform sampler2D _qt_texture;
uniform lowp vec4 color;
uniform mediump float alphaMin;
uniform mediump float alphaMax;

void main()
{
    highp vec4 n;
    n.x = texture2DProj(_qt_texture, sampleFarLeft).a;
    n.y = texture2DProj(_qt_texture, sampleNearLeft).a;
    highp float c = texture2D(_qt_texture, sampleCoord).a;
    n.z = texture2DProj(_qt_texture, sampleNearRight).a;
    n.w = texture2DProj(_qt_texture, sampleFarRight).a;
#if 0
    // Blurrier, faster.
    n = smoothstep(alphaMin, alphaMax, n);
    c = smoothstep(alphaMin, alphaMax, c);
#else
    // Sharper, slower.
    highp vec2 d = min(abs(n.yw - n.xz) * 2., 0.67);
    highp vec2 lo = mix(vec2(alphaMin), vec2(0.5), d);
    highp vec2 hi = mix(vec2(alphaMax), vec2(0.5), d);
    n = smoothstep(lo.xxyy, hi.xxyy, n);
    c = smoothstep(lo.x + lo.y, hi.x + hi.y, 2. * c);
#endif
    gl_FragColor = vec4(0.333 * (n.xyz + n.yzw + c), c) * color.w;
}

// ==== loqsubpixeldistancefieldtext.vert
// ===============================================
uniform highp mat4 matrix;
uniform highp vec2 textureScale;
uniform highp float fontScale;
uniform highp vec4 vecDelta;

attribute highp vec4 vCoord;
attribute highp vec2 tCoord;

varying highp vec3 sampleNearLeft;
varying highp vec3 sampleNearRight;

void main()
{
    highp vec2 sampleCoord = tCoord * textureScale;
    gl_Position = matrix * vCoord;

    // Calculate neighbor pixel position in item space.
    highp vec3 wDelta = gl_Position.w * vecDelta.xyw;
    highp vec3 nearLeft = vCoord.xyw - 0.25 * wDelta;
    highp vec3 nearRight = vCoord.xyw + 0.25 * wDelta;

    // Calculate neighbor texture coordinate.
    highp vec2 scale = textureScale / fontScale;
    highp vec2 base = sampleCoord - scale * vCoord.xy;
    sampleNearLeft = vec3(base * nearLeft.z + scale * nearLeft.xy, nearLeft.z);
    sampleNearRight = vec3(base * nearRight.z + scale * nearRight.xy, nearRight.z);
}

// ==== loqsubpixeldistancefieldtext.frag
// ===============================================
varying highp vec3 sampleNearLeft;
varying highp vec3 sampleNearRight;

uniform sampler2D _qt_texture;
uniform lowp vec4 color;
uniform mediump float alphaMin;
uniform mediump float alphaMax;

void main()
{
    highp vec2 n;
    n.x = texture2DProj(_qt_texture, sampleNearLeft).a;
    n.y = texture2DProj(_qt_texture, sampleNearRight).a;
    n = smoothstep(alphaMin, alphaMax, n);
    highp float c = 0.5 * (n.x + n.y);
    gl_FragColor = vec4(n.x, c, n.y, c) * color.w;
}

Created attachment 112165 [details]
log given by INTEL_DEBUG=all
Attached is a log generated using INTEL_DEBUG=all
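The guess above (that the number of texture2D/texture2DProj() calls is what pushes the hiq fragment shader over the limit) can be checked roughly with a small script. This is only a sketch: it assumes, as the reporter does, that each texture call costs one indirection, which is not necessarily how the i915 driver actually counts them. The helper names and the trimmed shader snippets are made up for illustration; only the two limit values come from the report.

```python
import re

# Limits quoted in the report from i915_context.h.
I915_MAX_TEX_INSN = 32
I915_MAX_TEX_INDIRECT = 4

# Matches texture2D(...) and texture2DProj(...) calls in GLSL source.
_TEX_CALL = re.compile(r"\btexture2D(?:Proj)?\s*\(")


def count_texture_calls(glsl_source: str) -> int:
    """Count texture2D/texture2DProj calls, ignoring // comments."""
    without_comments = re.sub(r"//[^\n]*", "", glsl_source)
    return len(_TEX_CALL.findall(without_comments))


def would_exceed_indirect_limit(glsl_source: str) -> bool:
    """ASSUMPTION: one indirection per texture call (the reporter's guess)."""
    return count_texture_calls(glsl_source) > I915_MAX_TEX_INDIRECT


# Sampling lines only, trimmed from the fragment shader listings above.
HIQ_FRAG = """
n.x = texture2DProj(_qt_texture, sampleFarLeft).a;
n.y = texture2DProj(_qt_texture, sampleNearLeft).a;
highp float c = texture2D(_qt_texture, sampleCoord).a;
n.z = texture2DProj(_qt_texture, sampleNearRight).a;
n.w = texture2DProj(_qt_texture, sampleFarRight).a;
"""

LOQ_FRAG = """
n.x = texture2DProj(_qt_texture, sampleNearLeft).a;
n.y = texture2DProj(_qt_texture, sampleNearRight).a;
"""

if __name__ == "__main__":
    for name, src in (("hiq", HIQ_FRAG), ("loq", LOQ_FRAG)):
        print(name, count_texture_calls(src), would_exceed_indirect_limit(src))
```

Under that assumption the hiq shader (5 calls) lands above I915_MAX_TEX_INDIRECT while the loq shader (2 calls) stays below it, which is consistent with the observed behaviour but does not prove the driver counts this way.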
I finally got the INTEL_DEBUG=perf output:

QML debugging is enabled. Only use this in a safe environment.
i915_program_error: Exceeded max nr indirect texture lookups (6 out of 4)
ENTER FALLBACK 10000: Program
LEAVE FALLBACK Program
ENTER FALLBACK 10000: Program  <--- These simply repeat
LEAVE FALLBACK Program         <--------------|

This is what I expected from looking at the fragment shader: too many texture lookups.

Created attachment 112334 [details]
Correct (portion) of an INTEL_DEBUG=all log

The previous INTEL_DEBUG=all log (attachment # ) was for a modified version of the fragment shader hiqsubpixeldistancefieldtext.frag; my apologies for posting the wrong log. This attachment is a portion of an INTEL_DEBUG=all log showing the ARB assembly of the (unmodified) fragment shader hiqsubpixeldistancefieldtext.frag.

Based on Issue (24) of https://www.opengl.org/registry/specs/ARB/fragment_program.txt I'm confused by the 6 indirections: there is the base indirection, 4 texture2DProj() calls, and 1 texture2D() call. So if there is an indirection per texture2D*() call, we would indeed get 1+5=6 indirections. But why can't the texture coordinates for all texture lookups be computed in one phase/node and the texture2D*() calls done in a second phase/node? Or is it that the texture coordinate TEMPs have to be set up on a per-texture2D*() basis?

This is due to bug 89062. The temporary solution is to define the environment variable QML_DISABLE_DISTANCEFIELD for Qt5 apps.

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/745.
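For anyone hitting the same fallback, the diagnosis and the workaround from this thread can be scripted. The sketch below is not part of Mesa or Qt: the function names are hypothetical, the regular expression is written against the single log line quoted in this report, and the value "1" for QML_DISABLE_DISTANCEFIELD is an assumption (the thread only says the variable should be defined).

```python
import os
import re

# Pattern written against the one message quoted in this report.
_INDIRECT_OVERFLOW = re.compile(
    r"i915_program_error: Exceeded max nr indirect texture lookups "
    r"\((\d+) out of (\d+)\)"
)


def find_indirection_overflow(log_text: str):
    """Return (used, limit) if the log reports the overflow, else None."""
    match = _INDIRECT_OVERFLOW.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def workaround_env(base_env=None) -> dict:
    """Copy an environment and define QML_DISABLE_DISTANCEFIELD in it.

    ASSUMPTION: the value "1"; the thread only says to define the variable.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["QML_DISABLE_DISTANCEFIELD"] = "1"
    return env


SAMPLE_LOG = """\
QML debugging is enabled. Only use this in a safe environment.
i915_program_error: Exceeded max nr indirect texture lookups (6 out of 4)
ENTER FALLBACK 10000: Program
LEAVE FALLBACK Program
"""

if __name__ == "__main__":
    print(find_indirection_overflow(SAMPLE_LOG))
    print(workaround_env({"DISPLAY": ":0"}))
```

The dictionary returned by workaround_env() can be passed as the env argument of subprocess.run() when launching the Qt5 application.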