Bug 36295

Summary: Incorrect rendering in SPECviewperf benchmark
Product: Mesa Reporter: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer>
Component: Mesa coreAssignee: mesa-dev
Status: RESOLVED WONTFIX QA Contact:
Severity: normal    
Priority: medium CC: brianp, eero.t.tamminen, idr
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: rendering comparison intel vs r600g
output log
Patch to handle NV_fragment_program2's IF/ELSE/ENDIF
new logs
renering result using debian mesa version

Description Pierre-Eric Pelloux-Prayer 2011-04-16 02:30:21 UTC
Created attachment 45701 [details]
rendering comparison intel vs r600g

Using mesa from GIT + r600g, 7 first tests of SPECviewperf 11 show big rendering errors (see attached image).
The same tests run well using an Intel based laptop (with Ubuntu 9.10).
Comment 1 Pierre-Eric Pelloux-Prayer 2011-04-16 02:35:29 UTC
Forgot to mention that tests 3/7 and 4/7 print some Mesa errors when running with ST_DEUG=fallback,mesa :

test3 :
    Mesa: User error: GL_INVALID_OPERATION in glBegin(fragment program not valid)
    Mesa: User error: GL_INVALID_OPERATION in glEnd
    Mesa: User error: GL_INVALID_OPERATION in glProgramStringARB(invalid ARB fragment program option)

test4 :
    Mesa: User error: GL_INVALID_OPERATION in glProgramStringARB(invalid ARB vertex program option)
    Mesa: User error: GL_INVALID_OPERATION in glProgramString(bad program)
    Mesa: User error: GL_INVALID_OPERATION in glProgramStringARB(invalid ARB fragment program option)
    Mesa: User error: GL_INVALID_OPERATION in glDrawElements(vertex program not valid)
Comment 2 Jose Fonseca 2011-04-16 02:50:43 UTC
This is a bug in SPECviewperf11, which relies on NV_fragment_program2 extension, even when it is not advertised.

Me and Brian have a patch that workarounds this by tolerating NV_fragment_program2 's IF/ELSE/ENDIF opcodes.

This is exactly the same stratergy that AMD's fglrx drivers took: not advertise NVIDIA's extension, but silently handle it.
Comment 3 Ian Romanick 2011-04-16 18:49:14 UTC
(In reply to comment #2)
> This is a bug in SPECviewperf11, which relies on NV_fragment_program2
> extension, even when it is not advertised.
> 
> Me and Brian have a patch that workarounds this by tolerating
> NV_fragment_program2 's IF/ELSE/ENDIF opcodes.
> 
> This is exactly the same stratergy that AMD's fglrx drivers took: not advertise
> NVIDIA's extension, but silently handle it.

Oh hell no.  Report this bug to spec.  Seriously.

Issues of differing names of instance ID is one thing, but giant piles of functionality is completely different.  This is a MAJOR bug in viewperf.  It needs to be reported to them, and they need to fix it.  Period.
Comment 4 Ian Romanick 2011-04-16 18:52:21 UTC
Wait... None of the Intel drivers expose that NVIDIA extension either.  Why is rendering correct there but not on r600g?  It seems unlikely that it's because of GL_NV_fragment_program2.
Comment 5 Brian Paul 2011-04-17 11:32:29 UTC
> Oh hell no.  Report this bug to spec.  Seriously.
> 
> Issues of differing names of instance ID is one thing, but giant piles of
> functionality is completely different.  This is a MAJOR bug in viewperf.  It
> needs to be reported to them, and they need to fix it.  Period.

I reported the bugs to SPEC a few weeks ago.  Their website's "contact us" person forwarded my email which detailed the problems (and an offer to write a patch to correct things) to a technical person (supposedly) but I haven't received a reply yet.  I could ping them again.

If you'd take a look at Viewperf 11's checkExtensions() function you'd probably crap your pants.  It looks like someone made a half-hearted attempt at querying some OpenGL features then just gave up.  It's pretty bad.

A few viewperf tests use Cg-generated shaders that apparently were simply captured on an NVIDIA host and baked into the test.  Of course, you can't expect an arbitrary Cg-generated shader to work on a different host.  For example, the catia test uses GL_NV_fragment_program2 in one place (an IF/ELSE/ENDIF) and GL_NV_vertex_program3 in a few others (but the shaders only have some "label:" lines with no actual branches to them).  Then, there's some use of GL_NV_primitive_restart which is predicated on whether glXGetProcAddress() returns non-null.  Ugh.

As Jose mentioned, we have a fairly small patch that lets the ARB vp/fp compiler accept these shaders.

In general, I don't like adding work-arounds like this into Mesa any more than anyone else.  But it's always a trade-off.  The upside of doing a "hack" is that people can run their apps/games/benchmarks/etc and get on with things.  The alternative is to get error reports like the above and general hear-say of "things don't run with Mesa".  I'd bet all the commercial OpenGL vendors have some number of hacks added just so that important applications run to satisfy their users.
Comment 6 Brian Paul 2011-04-17 11:36:11 UTC
(In reply to comment #4)
> Wait... None of the Intel drivers expose that NVIDIA extension either.  Why is
> rendering correct there but not on r600g?  It seems unlikely that it's because
> of GL_NV_fragment_program2.

I'm not sure which sub-test of Viewperf pelloux is referring to.  I don't know which viewperf test is which by looking at the pictures.  The particular issue reported could be a run-of-the-mill r600 bug if it's not related to the Cg shader issue.  Maybe it's primitive restart?
Comment 7 Pierre-Eric Pelloux-Prayer 2011-04-18 00:24:43 UTC
Created attachment 45755 [details]
output log

The screenshot were taken (by SPECviewperf) during the "catia-03" profile.
This profile contains 8 different tests. 
I included only the first 7 screenshots, as the last test only shows a black screen on both Intel and r600 (and a lot of GL_INVALID_OPERATION).
(the image above is simply test1/test2/..../test7 from top to bottom).

The attached file contains the full log resulting from : "ST_DEBUG=fallback,mesa  ./viewperf-gui", using r600g.
Comment 8 Jose Fonseca 2011-04-18 06:25:00 UTC
Created attachment 45763 [details] [review]
Patch to handle NV_fragment_program2's IF/ELSE/ENDIF

Please retry with the attached patch.
Comment 9 Pierre-Eric Pelloux-Prayer 2011-04-18 06:45:30 UTC
Created attachment 45767 [details]
new logs

same results as before. I've attached the logs - there is a little change here : no more "glProgramString" in test 3.
Comment 10 Pierre-Eric Pelloux-Prayer 2011-04-18 06:51:20 UTC
Created attachment 45768 [details]
renering result using debian mesa  version

I've tested the same app with mesa installed on my system Debian unstable/experimental, which is : "VersionĀ : 7.10-4"

The rendering is wrong too, but in a different way (see attached image).
Comment 11 Pierre-Eric Pelloux-Prayer 2011-04-18 07:36:16 UTC
After bisecting, it seems that the new behavior was introduced by commit 8c631cfeae29b5236928f759e222aa35e6e4984c "r600g: rework vertex buffer uploads"
Comment 12 Jose Fonseca 2011-04-18 07:50:52 UTC
> (In reply to comment #2)
> Oh hell no.  Report this bug to spec.  Seriously.
>
> Issues of differing names of instance ID is one thing, but giant piles of
> functionality is completely different.  This is a MAJOR bug in viewperf.  It
> needs to be reported to them, and they need to fix it.  Period.

Even if viewperf fixes the extension detection, I don't think it will do much for us or our users: Viewperf consists mostly of recorded GL calls. There is likely no easy way to record the original Cg shaders and compile on demand for the current hardware. So I think the inevitable outcome of fixing the extension checks in viewperf is a error message saying "your graphics hardware isn't capable enough" for all viewsets. This is, *Ii* they fix it, because I suspect they probably won't bother given that AMD also silently handles it.

So, unless we provide the necessary functionality somehow, the last viewperf Mesa based drivers will ever run will be SPECviewperf10.

Of course, instead of the simple hack to allow some NV_fragment_program2 opcodes, we could actually implement the whole NV_fragment_program2 and NV_vertex_program2 extensions to the spec too. But I'm not very keen on spending a bunch of time working on a non-ARB extension.

Personally I'd like to be able to test/benchmark Mesa with SPECviewperf11 somehow.

That said, as Brian said, it seems most of the corruption is not caused by NV_fragment_program2 shader, but actually by NV_primtive_restart.
Comment 13 Ian Romanick 2011-04-18 12:11:10 UTC
(In reply to comment #5)
> In general, I don't like adding work-arounds like this into Mesa any more than
> anyone else.  But it's always a trade-off.  The upside of doing a "hack" is
> that people can run their apps/games/benchmarks/etc and get on with things. 

The downside is that app developers never fix *THEIR* bugs.

> The alternative is to get error reports like the above and general hear-say of
> "things don't run with Mesa".  I'd bet all the commercial OpenGL vendors have
> some number of hacks added just so that important applications run to satisfy
> their users.

And this is the disaster that we're trying to fix via conformance testing.  This crap has to STOP.  This is why people think OpenGL is joke.
Comment 14 Jose Fonseca 2011-04-19 00:19:54 UTC
(In reply to comment #13)
> The downside is that app developers never fix *THEIR* bugs.
> 
[...]
>
> And this is the disaster that we're trying to fix via conformance testing. 
> This crap has to STOP.  This is why people think OpenGL is joke.

(This is slightly OT, but I'd argue that the OpenGL problem is not that spec conformance per se, but the proliferation of vendor specific extensions, and extensions in general; and the fix is not conformance testing but the ARB ratifying the extensions people care and lumping extensions in core versions. It looks things are moving on the right direction. That said, latest core version is 4.0 and we're still in 2.1, so I'm thankful for extensions that get us half way there.)

Anyway, Ian, I see you strongly feel against the proposed patch, but I still am not sure exactly what you oppose: diverging the spec, or adding the extensions to meet an application's requirement? That is, would fully implementing NV_fragment_program2 to the spec and advertising it for SPECviewperf11's sake be OK with you or not, and why?
Comment 15 Ian Romanick 2011-04-19 17:21:46 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > The downside is that app developers never fix *THEIR* bugs.
> > 
> [...]
> >
> > And this is the disaster that we're trying to fix via conformance testing. 
> > This crap has to STOP.  This is why people think OpenGL is joke.
> 
> (This is slightly OT, but I'd argue that the OpenGL problem is not that spec
> conformance per se, but the proliferation of vendor specific extensions, and
> extensions in general; and the fix is not conformance testing but the ARB
> ratifying the extensions people care and lumping extensions in core versions.
> It looks things are moving on the right direction. That said, latest core
> version is 4.0 and we're still in 2.1, so I'm thankful for extensions that get
> us half way there.)
> 
> Anyway, Ian, I see you strongly feel against the proposed patch, but I still am
> not sure exactly what you oppose: diverging the spec, or adding the extensions
> to meet an application's requirement? That is, would fully implementing
> NV_fragment_program2 to the spec and advertising it for SPECviewperf11's sake
> be OK with you or not, and why?

I don't care too much what extensions we decide to support.  We already support most of NV_fragment_program2.  The primary missing bits are support for clip distance, and we need to add that for GLSL 1.30 anyway.  I don't recall if the rest of the support is in master or on an old, dusty branch.  It might be one one of my asm-shader-rework branches...

I am strongly opposed to allowing an application to use functionality that is not advertised or, in the case of GLSL extensions, not enabled because it prevents us from removing those features in the future.  Over the last 18 months we've removed a lot of old extensions that we did not want the on going burden of supporting.  I fully expect that we will do more of this in the future.  When applications either gracefully don't work or fallback to different rendering paths, this is fine.  When apps start to explode, it's not fine.  When we let applications use undocumented features or use documented features in illegal ways, we paint ourselves into a corner.  We effectively trade short-term gain for long-term pain.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.