Bug 88587 - [HSW][GT2] Incorrect code generation with long in loop
Summary: [HSW][GT2] Incorrect code generation with long in loop
Alias: None
Product: Beignet
Classification: Unclassified
Component: Beignet
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Zhigang Gong
QA Contact:
Depends on:
Reported: 2015-01-19 14:39 UTC by Philip Taylor
Modified: 2015-01-21 10:19 UTC
CC List: 0 users

See Also:
i915 platform:
i915 features:

Test case (3.04 KB, text/plain)
2015-01-19 14:39 UTC, Philip Taylor

Description Philip Taylor 2015-01-19 14:39:06 UTC
Created attachment 112474
Test case

Running on "Intel(R) HD Graphics Haswell GT2 Mobile" (Gen7.5). Tested with Release_v1.0.0 and with latest master (786da41).

The attached patch adds a test which essentially computes the dot-product of two 16-element arrays, slightly unrolled so it does two multiplications per loop iteration. The inputs are 16-bit and the sum is 64-bit.

Every work item does exactly the same computation. It runs with global and local work size 16.

The output shows the first 8 work items get the correct result, but the next 8 get the wrong result.

If I un-unroll the loop (change "i += 2" to "i += 1", and remove the "sum += b0 * b1") then it gives the correct output.

If I change "long sum = 0" to "int sum = 0", then it gives the correct output.
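
For reference, the computation described above can be modeled on the host. This is a minimal sketch, not the attached test case: the input values, the function names, and the exact form of the unrolled loop body are assumptions made for illustration. It shows that the unrolled and non-unrolled loops must agree, and that all 16 work items should report the same sum.

```python
# Host-side reference model of the computation described in this report.
# Assumption: input values and function names are illustrative only;
# the real kernel lives in attachment 112474 and is not reproduced here.

def dot_unrolled(a, b):
    """Dot product with two multiply-accumulates per loop iteration."""
    total = 0  # stands in for the kernel's 64-bit `long sum`
    for i in range(0, len(a), 2):
        total += a[i] * b[i]
        total += a[i + 1] * b[i + 1]
    return total


def dot_simple(a, b):
    """Same dot product with one multiply-accumulate per iteration."""
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total


def expected_output(a, b, work_items=16):
    """Every work item runs the same computation, so all results match."""
    return [dot_unrolled(a, b) for _ in range(work_items)]


# Two 16-element inputs whose values fit comfortably in 16 bits.
a = list(range(1, 17))
b = list(range(16, 0, -1))

results = expected_output(a, b)
print(results)
print("all work items agree:", len(set(results)) == 1)
print("unrolled matches simple:", results[0] == dot_simple(a, b))
```

A correct implementation prints 16 identical values. The behaviour reported here corresponds to the last 8 entries differing from the first 8.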
Comment 1 Zhigang Gong 2015-01-20 02:38:23 UTC
This looks like a post register allocation bug. You can disable the post register allocation by setting the following environment variable:

Then try again. Note that this will cause a performance regression of about 8%.

I will fix it soon. Thanks for reporting this.
Comment 2 Zhigang Gong 2015-01-20 07:40:16 UTC
I just submitted a patch to the mailing list. The patch is at:

Could you try it at your side?
Comment 3 Philip Taylor 2015-01-20 11:48:39 UTC
That seems to fix it for me - thanks!
Comment 4 Zhigang Gong 2015-01-21 10:18:59 UTC
The following patch has been pushed to the master and Release_v1.0 branches.

commit 1511fe52ef956634075eae4d5a5d58c9b5ffdde1
Author: Zhigang Gong <zhigang.gong@intel.com>
Date:   Tue Jan 20 14:40:39 2015 +0800

    GBE: fix an ACC register related instruction scheduling bug

    Some instructions modify the ACC register in the gen_context
    stage, which is not recognized by the current instruction scheduling
    algorithm. This patch fixes this bug by checking all the possible
    SEL_OPs which may change the ACC implicitly.

    The corresponding bugzilla link is as below:

    Signed-off-by: Zhigang Gong <zhigang.gong@intel.com>
    Reviewed-by: "Yang, Rong R" <rong.r.yang@intel.com>
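
The class of bug described in the commit message can be illustrated with a toy dependency check. This is a hedged sketch and not Beignet's code: the opcode names, the `Insn` type, and the `IMPLICIT_ACC_WRITERS` set are all hypothetical. It shows how a scheduler that only looks at explicit operands can miss a hazard on the accumulator, and how treating implicit accumulator writes as real writes exposes it.

```python
from dataclasses import dataclass

# Hypothetical opcodes assumed to overwrite the accumulator as a side
# effect when they are lowered. These names are invented for this sketch.
IMPLICIT_ACC_WRITERS = {"OP_WIDE_MUL"}


@dataclass(frozen=True)
class Insn:
    op: str
    dst: str
    srcs: tuple = ()


def reads(insn):
    """Registers explicitly read by the instruction."""
    return set(insn.srcs)


def writes(insn, track_acc):
    """Registers written, optionally including the implicit ACC write."""
    regs = {insn.dst}
    if track_acc and insn.op in IMPLICIT_ACC_WRITERS:
        regs.add("acc")
    return regs


def depends(first, second, track_acc=True):
    """True if `second` must stay after `first` (RAW, WAR or WAW hazard)."""
    w1, w2 = writes(first, track_acc), writes(second, track_acc)
    r1, r2 = reads(first), reads(second)
    return bool((w1 & r2) or (r1 & w2) or (w1 & w2))


# i1 writes the accumulator explicitly; i2 clobbers it only implicitly.
i1 = Insn("OP_MUL", "acc", ("r1", "r2"))
i2 = Insn("OP_WIDE_MUL", "r5", ("r3", "r4"))

print("hazard seen without ACC tracking:", depends(i1, i2, track_acc=False))
print("hazard seen with ACC tracking:", depends(i1, i2, track_acc=True))
```

Without the implicit write in the model, the two instructions look independent and a scheduler is free to reorder them, which can corrupt a value that a later instruction expects to find in the accumulator.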
