When running dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.5 on Brasswell Chrome OS devices, using the Android CTS runner, this test fails due to a timeout. Ilja says the test needs to complete in less than 30 seconds.
This is the last remaining dEQP failure, to my knowledger, that blocks GLES 3.1 on Brasswell Chromebooks. It passed on Skylake and Apollo Lake Chromebooks.
Armchair diagnosis: The failures seems unrelated to Brasswell itself. I think the problem is that CPU speed on Brasswell Chromebooks is slower than other Chromebooks.
FWIW, this tests completes in 3m3s on the BSW systems that Intel runs in CI. It takes 52 seconds on KBL/SKL, so I'm not sure how BSW could possibly meet a 30s deadline.
Are you sure that there is a 30s requirement?
After reading the bug that Chad linked to this, I re-ran with binaries compiled with release flags. My i7 KBL system completes the test in 34s, not the 52s reported by CI.
9355116bdad6ee9914554de8e48ba271bd36a8eb is the first bad commit
Author: Francisco Jerez <email@example.com>
Date: Mon Oct 23 13:47:10 2017 -0700
intel/fs: Don't let undefined values prevent copy propagation.
Test time increases from 10s to 35s on KBL i7.
Oh, interesting. Basically the entire time is spent in fs_copy_prop_dataflow::run(). There are 2600 basic blocks and 5000 values. About 40% of them are constants, which could probably be handled separately to avoid the dataflow costs (who cares what block it came from, it's an immediate, you can just reconstruct it). Not sure about undefined values.
IMHO the ultimate reason for the regression is that the dataflow propagation logic of the copy propagation pass has a serious run-time complexity problem in acyclic CFGs. The patch this was bisected to is merely increasing the amount of work done in each iteration which substantially exacerbates the problem. A proper fix would involve reworking the copy propagation dataflow logic to reduce the polynomial degree of its run-time complexity in the acyclic case. Working on it -- It will likely give you better performance than before my undef handling patch.
Updating importance to high-priority enhancement since this is purely a (largely pre-existing) compile-time performance issue, spec conformance is unaffected by it.
On BSW, the run-time of this test goes from 39s to 175s, on the bisected commit. We need to improve performance by >25% beyond just fixing Curro's regression.
As per Chad's comment, this performance is bad enough that it will prevent us from passing the android CTS.
It may not be a *Khronos* conformance issue, but it is a conformance issue nonetheless, for all of our platforms.
(In reply to Mark Janes from comment #6)
> On BSW, the run-time of this test goes from 39s to 175s, on the bisected
> commit. We need to improve performance by >25% beyond just fixing Curro's
There's a good chance that the fix I'm working on will bring us below the 30s bar, but it's hard to predict with certainty since there may be other passes behaving inefficiently in this test-case.
> As per Chad's comment, this performance is bad enough that it will prevent
> us from passing the android CTS.
> It may not be a *Khronos* conformance issue, but it is a conformance issue
> nonetheless, for all of our platforms.
Except that to the extent that the CTS doesn't match the specification we are more likely to be talking about a conformance problem of the CTS rather than of our implementation...
Patch sent to the mailing list . It reduces the time spent in dataflow propagation by 30x on my CHV system. Test run-time is now around 13s well below the time-out even on CHV.
Shouldn't timeout anymore with current master. Closing.
Thanks Curro. FYI, I've cherry-picked your fix, plus what I believed to be prerequisite commits too, to the ARC++ repo.