While Francisco's constant cache patch series helped perf on the other platforms, *on KBL* (i7-7500U / GT2) it caused a huge perf regression in (GfxBench 4.0) Manhattan 3.1.
The "i965/fs: Switch to the constant cache for uniform pull constants" commit drops performance by 60%, and the "i965/fs: Fetch one cacheline of pull constants at a time" commit improves on that regressed level by 50%, so the combined drop is ~40%.
GfxBench CarChase (gl_4 test) perf also dropped a bit, by 10-15%.
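For anyone double-checking the ~40% figure, here is a minimal sketch of the arithmetic. It assumes the 50% improvement is measured relative to the already-regressed level rather than the original baseline; the helper name is made up for illustration.

```python
def combined_change(*relative_changes):
    """Compose successive relative changes (e.g. -0.60, +0.50) into one."""
    factor = 1.0
    for change in relative_changes:
        # Each change scales whatever level the previous changes left behind.
        factor *= 1.0 + change
    return factor - 1.0


# Figures quoted in this report: -60% from the first commit, then +50%
# (assumed to be relative to the regressed level) from the second one.
total = combined_change(-0.60, +0.50)
print(f"combined change: {total:+.0%}")
```

That is, 0.40 of the baseline after the first commit, times 1.5 after the second, leaves 0.60 of the baseline.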
Note: SynMark ShMapPcf test still improved on KBL, by ~30% (same as on SKL GT2 & GT4e).
Curro, are we botching L3 / MOCS settings on KBL perhaps?
Comparing perf on our SKL & KBL GT2 machines, KBL isn't too far off from what one would expect given their LLC size and GPU & memory speed differences. So something being seriously wrong with the basic settings doesn't seem likely. I think that would then be visible in other tests too, not just the ones affected by this patch series.
Don't know whether it's related at all, but I didn't see any clear impact in our trends from this patch series on BXT, although it clearly helped SKL/HSW/BDW/BSW/BYT (and ShMapPcf on KBL).
Curro mentioned at the office yesterday that he's figured this out.
Tracked this down to a kernel bug and sent a fix to the intel-gfx mailing list. Reassigning to the DRM/Intel component.
Author: Francisco Jerez <firstname.lastname@example.org>
Date: Thu Jan 12 12:44:54 2017 +0200
drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.