| Summary: | GLSL cos/sin functions broken on Mesa R600 driver | ||
|---|---|---|---|
| Product: | Mesa | Reporter: | Alain Perrot <alain.perrot> |
| Component: | Drivers/DRI/R600 | Assignee: | Default DRI bug account <dri-devel> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | ||
| Priority: | medium | CC: | amaasikas |
| Version: | git | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| i915 platform: | i915 features: | ||
| Attachments: |
r700_assembler.c Trig function patch
r700_assembler.c Trig function & sincos patch Alternative assemble_TRIG fix Alternative assemble_TRIG and assemble_SCS fix patch to try |
||
|
Description
Alain Perrot
2010-04-29 12:09:06 UTC
A proper fix would be to fix the r700_assembler.c code so that it generates code to translate the radians input to a corresponding value between PI and -PI before passing it to the SIN or COS instructions.
I *think* that this could be achieved through the following r600 instructions
inst count unit opcode DEST SRC0, SRC1, SRC3
0 x: MUL tmp.x, ang, 1/(2.0 * pi)
1 x: FRACT tmp.x, tmp.x
2 x: ADD tmp.x, tmp.x, -0.5F
y: MULADD tmp.y, tmp.x, (2.0 * pi), -(2.0 * pi)
z: MUL tmp.z, tmp.x, (2.0 * pi)
3 x: CMOVEGT tmp.x, tmp.x, tmp.y, tmp.z
4 t: SIN or COS dest, tmp.x
Unfortunately I can't quite figure out how to do it in the code yet to try and see.
Created attachment 35743 [details] [review] r700_assembler.c Trig function patch Alain, Okay, The patch I just posted might fix this bug. It doesn't cause any additional errors in piglit either. I think its working right with http://github.com/jckarter/hello-gl-ch3 too. Of course I have only tested it on my RadeonHD 3100 and its my first attempt at r600 assembly so let me know if it works for you. Note: it could probably be done so its faster but I'm still not to sure on how everything works yet in the software Conn (In reply to comment #3) > Alain, > > Okay, The patch I just posted might fix this bug. It doesn't cause any > additional errors in piglit either. I think its working right with > http://github.com/jckarter/hello-gl-ch3 too. Of course I have only tested it on > my RadeonHD 3100 and its my first attempt at r600 assembly so let me know if it > works for you. > > Note: it could probably be done so its faster but I'm still not to sure on how > everything works yet in the software > > Conn Thanks for your help. Unfortunately, your patch does not fix the cos/sin functions (at least on my Radeon HD 3870 / RV670). The hello-gl-ch3 example works better but still not as expected, you can compare with Mesa software rendering. I tried to play with your patch to make it work without success for now. A note is that there may be a mistake on the last operand to the CNDGT instruction : + setaddrmode_PVSSRC(&(pAsm->S[2].src), ADDR_ABSOLUTE); + pAsm->S[1].src.rtype = DST_REG_TEMPORARY; + pAsm->S[1].src.reg = tmp2; + setswizzle_PVSSRC(&(pAsm->S[2].src), SQ_SEL_X); should probably be : + setaddrmode_PVSSRC(&(pAsm->S[2].src), ADDR_ABSOLUTE); + pAsm->S[2].src.rtype = DST_REG_TEMPORARY; + pAsm->S[2].src.reg = tmp2; + setswizzle_PVSSRC(&(pAsm->S[2].src), SQ_SEL_X); But this update does not make the cos/sin functions work. I will try again tomorrow. On Wed, May 19, 2010 at 3:58 PM, <bugzilla-daemon@freedesktop.org> wrote: > https://bugs.freedesktop.org/show_bug.cgi?id=27901 > > --- Comment #4 from Alain Perrot <alain.perrot@gmail.com> 2010-05-19 15:58:12 PDT --- > (In reply to comment #3) >> Alain, >> >> Okay, The patch I just posted might fix this bug. It doesn't cause any >> additional errors in piglit either. I think its working right with >> http://github.com/jckarter/hello-gl-ch3 too. Of course I have only tested it on >> my RadeonHD 3100 and its my first attempt at r600 assembly so let me know if it >> works for you. >> >> Note: it could probably be done so its faster but I'm still not to sure on how >> everything works yet in the software >> >> Conn > > Thanks for your help. > > Unfortunately, your patch does not fix the cos/sin functions (at least on my > Radeon HD 3870 / RV670). The hello-gl-ch3 example works better but still not as > expected, you can compare with Mesa software rendering. > > I tried to play with your patch to make it work without success for now. A note > is that there may be a mistake on the last operand to the CNDGT instruction : > > + setaddrmode_PVSSRC(&(pAsm->S[2].src), ADDR_ABSOLUTE); > + pAsm->S[1].src.rtype = DST_REG_TEMPORARY; > + pAsm->S[1].src.reg = tmp2; > + setswizzle_PVSSRC(&(pAsm->S[2].src), SQ_SEL_X); > > should probably be : > > + setaddrmode_PVSSRC(&(pAsm->S[2].src), ADDR_ABSOLUTE); > + pAsm->S[2].src.rtype = DST_REG_TEMPORARY; > + pAsm->S[2].src.reg = tmp2; > + setswizzle_PVSSRC(&(pAsm->S[2].src), SQ_SEL_X); > > But this update does not make the cos/sin functions work. > > I will try again tomorrow. > > -- > Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug. > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel > Alain, Okay I'll look at it some more tonight. At least I know I'm on the right track. Thanks for testing. Conn Created attachment 35776 [details] [review] r700_assembler.c Trig function & sincos patch This patch fixes the assemble_TRIG function and eliminates two errors in piglit when running the glean/glsl1 test case. I took the liberty of applying the same change to the assemble_SCS function as well since it to would suffer from the same problem. Alain, The patch I just posted fixes the problem on my machine completely. The problem with the previous version of the patch (other than the source mistake you found ;) )has something to do me coding in the 0.5 special constant. Replacing it with a literal 0.5 constant works. I also extended the definition of PI 3 places to avoid drift of the values when it gets an input significantly greater or less than PI and -PI. Let me know how it works for you. Thanks. Conn Created attachment 35777 [details] [review] Alternative assemble_TRIG fix I can confirm that your patch seems to work for me too. By the way, you beat me at posting a working patch here :-) I also figured out that the 0.5 special constant was an issue in your patch, and I managed to get a working assemble_TRIG function which implements the following instruction sequence (lightly different of yours) to normalize the angle: MULADD tmp.x, angle, 1/(2*PI), 0.5 FRACT tmp.x, tmp.x ADD tmp.y, tmp.x, 1 CNDGE tmp.x, tmp.x, tmp.x, tmp.y MULADD tmp.x, tmp.x, 2*PI, -PI I don't known if it is better or worse than yours beside the fact that it use only one helper variable. I attached my patch (updated to use the same extended value of PI than yours) which fix the assemble_TRIG function, but not the assemble_SCS one. On Thu, May 20, 2010 at 5:40 PM, <bugzilla-daemon@freedesktop.org> wrote: > https://bugs.freedesktop.org/show_bug.cgi?id=27901 > > --- Comment #8 from Alain Perrot <alain.perrot@gmail.com> 2010-05-20 17:40:21 PDT --- > Created an attachment (id=35777) > View: https://bugs.freedesktop.org/attachment.cgi?id=35777 > Review: https://bugs.freedesktop.org/review?bug=27901&attachment=35777 > > Alternative assemble_TRIG fix > > I can confirm that your patch seems to work for me too. > > By the way, you beat me at posting a working patch here :-) > > I also figured out that the 0.5 special constant was an issue in your patch, > and I managed to get a working assemble_TRIG function which implements the > following instruction sequence (lightly different of yours) to normalize the > angle: > > MULADD tmp.x, angle, 1/(2*PI), 0.5 > FRACT tmp.x, tmp.x > ADD tmp.y, tmp.x, 1 > CNDGE tmp.x, tmp.x, tmp.x, tmp.y > MULADD tmp.x, tmp.x, 2*PI, -PI > > I don't known if it is better or worse than yours beside the fact that it use > only one helper variable. > > I attached my patch (updated to use the same extended value of PI than yours) > which fix the assemble_TRIG function, but not the assemble_SCS one. > > -- > Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug. > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel > Alain, Its a tough call on who's is the better solution. Yours uses one less temp reg and mine will allow for a couple of operations to be done in parallel in the future. I guess we both deserve a pat on the back and leave it to someone more experienced to make the call on which one to choose. Good job Conn > Alain,
>
> Its a tough call on who's is the better solution. Yours uses one less
> temp reg and mine will allow for a couple of operations to be done in
> parallel in the future. I guess we both deserve a pat on the back and
> leave it to someone more experienced to make the call on which one to
> choose.
>
> Good job
>
> Conn
You're probably right (and I known nothing about parallelization on GPUs).
In any way, many thanks for your help.
On Fri, May 21, 2010 at 12:48 AM, <bugzilla-daemon@freedesktop.org> wrote: > https://bugs.freedesktop.org/show_bug.cgi?id=27901 > > --- Comment #10 from Alain Perrot <alain.perrot@gmail.com> 2010-05-21 00:48:34 PDT --- >> Alain, >> >> Its a tough call on who's is the better solution. Yours uses one less >> temp reg and mine will allow for a couple of operations to be done in >> parallel in the future. I guess we both deserve a pat on the back and >> leave it to someone more experienced to make the call on which one to >> choose. >> >> Good job >> >> Conn > > You're probably right (and I known nothing about parallelization on GPUs). > > In any way, many thanks for your help. > > -- > Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug. > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel > Alain, For the current code I think your patch a tad better. It uses one less instruction. Short of benchmarking the two solutions I think yours is the one that should go in. Would you please submit a patch that includes the assemble_SCS function as well. After you submit it with that change and a notice that you have singed off on it. I'll nominate it over my own to go in. Conn Created attachment 35787 [details] [review] Alternative assemble_TRIG and assemble_SCS fix Conn, Attached is the updated patch which includes the assemble_SCS function. If it is ok for you, I will submit it (I guess that it should be sent to the dri-devel mailing list ?) On Fri, May 21, 2010 at 11:13 AM, <bugzilla-daemon@freedesktop.org> wrote: > https://bugs.freedesktop.org/show_bug.cgi?id=27901 > > Alain Perrot <alain.perrot@gmail.com> changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > Attachment #35777 [details]|0 |1 > is obsolete| | > > --- Comment #12 from Alain Perrot <alain.perrot@gmail.com> 2010-05-21 11:13:05 PDT --- > Created an attachment (id=35787) > View: https://bugs.freedesktop.org/attachment.cgi?id=35787 > Review: https://bugs.freedesktop.org/review?bug=27901&attachment=35787 > > Alternative assemble_TRIG and assemble_SCS fix > > Conn, > > Attached is the updated patch which includes the assemble_SCS function. > If it is ok for you, I will submit it (I guess that it should be sent to the > dri-devel mailing list ?) > > -- > Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug. > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel > Alain, Its on the mailing list. I'll inform them to merge it after I run piglit and verify it works. Conn Alain, Your patch passes piglit. I'm going to change the status of this bug now to FIXED Conn (In reply to comment #14) > Alain, > > Your patch passes piglit. > > I'm going to change the status of this bug now to FIXED > > > Conn Ok, thanks. Reopened because patch has not yet been applied Andre wanted to review this before applying. dont' have much net this week to review/test:(
but i'm ok with it if you make last mul conditional on r700 as
it has -1..1 range it seems, also amd shader analyzer gives this difference:
RV610 hd2400
; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(8)
0 y: MULADD R123.y, R0.x, (0x3E22F983, 0.1591549367f).x,
0.5
z: MOV R0.z, 0.0f
w: MOV R0.w, 1.0f
1 x: FRACT ____, PV0.y
2 z: MULADD R123.z, PV1.x, (0x40C90FDB, 6.283185482f).y,
-(0x40490FDB, 3.141592741f).x
3 t: SIN R0.x, PV2.z
01 EXP_DONE: PIX0, R0.xzzw
END_OF_PROGRAM
4870 RV770
; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(10)
0 y: MOV R0.y, 0.0f
z: MOV R0.z, 1.0f
w: MULADD R123.w, R0.x, (0x3E22F983, 0.1591549367f).x,
0.5
1 y: FRACT ____, PV0.w
2 x: MULADD R123.x, PV1.y, (0x40C90FDB, 6.283185482f).y,
-(0x40490FDB, 3.141592741f).x
3 z: MUL ____, PV2.x, (0x3E22F983, 0.1591549367f).x
4 t: SIN R0.x, PV3.z
01 EXP_DONE: PIX0, R0.xyyz
END_OF_PROGRAM
On Tue, Jun 8, 2010 at 5:53 AM, <bugzilla-daemon@freedesktop.org> wrote: > https://bugs.freedesktop.org/show_bug.cgi?id=27901 > > --- Comment #18 from Andre Maasikas <amaasikas@gmail.com> 2010-06-08 05:53:36 PDT --- > dont' have much net this week to review/test:( > but i'm ok with it if you make last mul conditional on r700 as > it has -1..1 range it seems, also amd shader analyzer gives this difference: > > RV610 hd2400 > > ; -------- Disassembly -------------------- > 00 ALU: ADDR(32) CNT(8) > 0 y: MULADD R123.y, R0.x, (0x3E22F983, 0.1591549367f).x, > 0.5 > z: MOV R0.z, 0.0f > w: MOV R0.w, 1.0f > 1 x: FRACT ____, PV0.y > 2 z: MULADD R123.z, PV1.x, (0x40C90FDB, 6.283185482f).y, > -(0x40490FDB, 3.141592741f).x > 3 t: SIN R0.x, PV2.z > 01 EXP_DONE: PIX0, R0.xzzw > END_OF_PROGRAM > > 4870 RV770 > ; -------- Disassembly -------------------- > 00 ALU: ADDR(32) CNT(10) > 0 y: MOV R0.y, 0.0f > z: MOV R0.z, 1.0f > w: MULADD R123.w, R0.x, (0x3E22F983, 0.1591549367f).x, > 0.5 > 1 y: FRACT ____, PV0.w > 2 x: MULADD R123.x, PV1.y, (0x40C90FDB, 6.283185482f).y, > -(0x40490FDB, 3.141592741f).x > 3 z: MUL ____, PV2.x, (0x3E22F983, 0.1591549367f).x > 4 t: SIN R0.x, PV3.z > 01 EXP_DONE: PIX0, R0.xyyz > END_OF_PROGRAM > > -- > Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug. > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel > This is very very strange that amd would change the instruction. I wonder if it is a bug in their code.... Perhaps we need someone with an r700 to run the sin and cos tests in piglit . The proposed patch passes on my rs780 (rv610) . On Tue, Jun 8, 2010 at 9:09 AM, <bugzilla-daemon@freedesktop.org> wrote: > https://bugs.freedesktop.org/show_bug.cgi?id=27901 > > --- Comment #19 from Conn Clark <conn.o.clark@gmail.com> 2010-06-08 09:09:49 PDT --- > On Tue, Jun 8, 2010 at 5:53 AM, <bugzilla-daemon@freedesktop.org> wrote: >> https://bugs.freedesktop.org/show_bug.cgi?id=27901 >> >> --- Comment #18 from Andre Maasikas <amaasikas@gmail.com> 2010-06-08 05:53:36 PDT --- >> dont' have much net this week to review/test:( >> but i'm ok with it if you make last mul conditional on r700 as >> it has -1..1 range it seems, also amd shader analyzer gives this difference: >> >> RV610 hd2400 >> >> ; -------- Disassembly -------------------- >> 00 ALU: ADDR(32) CNT(8) >> 0 y: MULADD R123.y, R0.x, (0x3E22F983, 0.1591549367f).x, >> 0.5 >> z: MOV R0.z, 0.0f >> w: MOV R0.w, 1.0f >> 1 x: FRACT ____, PV0.y >> 2 z: MULADD R123.z, PV1.x, (0x40C90FDB, 6.283185482f).y, >> -(0x40490FDB, 3.141592741f).x >> 3 t: SIN R0.x, PV2.z >> 01 EXP_DONE: PIX0, R0.xzzw >> END_OF_PROGRAM >> >> 4870 RV770 >> ; -------- Disassembly -------------------- >> 00 ALU: ADDR(32) CNT(10) >> 0 y: MOV R0.y, 0.0f >> z: MOV R0.z, 1.0f >> w: MULADD R123.w, R0.x, (0x3E22F983, 0.1591549367f).x, >> 0.5 >> 1 y: FRACT ____, PV0.w >> 2 x: MULADD R123.x, PV1.y, (0x40C90FDB, 6.283185482f).y, >> -(0x40490FDB, 3.141592741f).x >> 3 z: MUL ____, PV2.x, (0x3E22F983, 0.1591549367f).x >> 4 t: SIN R0.x, PV3.z >> 01 EXP_DONE: PIX0, R0.xyyz >> END_OF_PROGRAM >> >> -- >> Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email >> ------- You are receiving this mail because: ------- >> You are the assignee for the bug. >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/dri-devel >> > > This is very very strange that amd would change the instruction. I > wonder if it is a bug in their code.... Perhaps we need someone with > an r700 to run the sin and cos tests in piglit . The proposed patch > passes on my rs780 (rv610) . > > -- > Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are the assignee for the bug. > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel > Nope Andre, is right. the evergreen docs describe the sine function as follows Scalar sine. Input must be normalized from radians by dividing by 2*PI. The valid input domain is [-256, +256], which corresponds to an un-normalized input domain [-512*PI, +512*PI]. Out-of-range input results in float 0. while the r600 describes the sine function as follows Scalar sine. Valid input domain [-PI, +PI]. On Tue, Jun 8, 2010 at 9:25 AM, Conn Clark <conn.o.clark@gmail.com> wrote: > On Tue, Jun 8, 2010 at 9:09 AM, <bugzilla-daemon@freedesktop.org> wrote: >> https://bugs.freedesktop.org/show_bug.cgi?id=27901 >> >> --- Comment #19 from Conn Clark <conn.o.clark@gmail.com> 2010-06-08 09:09:49 PDT --- >> On Tue, Jun 8, 2010 at 5:53 AM, <bugzilla-daemon@freedesktop.org> wrote: >>> https://bugs.freedesktop.org/show_bug.cgi?id=27901 >>> >>> --- Comment #18 from Andre Maasikas <amaasikas@gmail.com> 2010-06-08 05:53:36 PDT --- >>> dont' have much net this week to review/test:( >>> but i'm ok with it if you make last mul conditional on r700 as >>> it has -1..1 range it seems, also amd shader analyzer gives this difference: >>> >>> RV610 hd2400 >>> >>> ; -------- Disassembly -------------------- >>> 00 ALU: ADDR(32) CNT(8) >>> 0 y: MULADD R123.y, R0.x, (0x3E22F983, 0.1591549367f).x, >>> 0.5 >>> z: MOV R0.z, 0.0f >>> w: MOV R0.w, 1.0f >>> 1 x: FRACT ____, PV0.y >>> 2 z: MULADD R123.z, PV1.x, (0x40C90FDB, 6.283185482f).y, >>> -(0x40490FDB, 3.141592741f).x >>> 3 t: SIN R0.x, PV2.z >>> 01 EXP_DONE: PIX0, R0.xzzw >>> END_OF_PROGRAM >>> >>> 4870 RV770 >>> ; -------- Disassembly -------------------- >>> 00 ALU: ADDR(32) CNT(10) >>> 0 y: MOV R0.y, 0.0f >>> z: MOV R0.z, 1.0f >>> w: MULADD R123.w, R0.x, (0x3E22F983, 0.1591549367f).x, >>> 0.5 >>> 1 y: FRACT ____, PV0.w >>> 2 x: MULADD R123.x, PV1.y, (0x40C90FDB, 6.283185482f).y, >>> -(0x40490FDB, 3.141592741f).x >>> 3 z: MUL ____, PV2.x, (0x3E22F983, 0.1591549367f).x >>> 4 t: SIN R0.x, PV3.z >>> 01 EXP_DONE: PIX0, R0.xyyz >>> END_OF_PROGRAM >>> >>> -- >>> Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email >>> ------- You are receiving this mail because: ------- >>> You are the assignee for the bug. >>> _______________________________________________ >>> dri-devel mailing list >>> dri-devel@lists.freedesktop.org >>> http://lists.freedesktop.org/mailman/listinfo/dri-devel >>> >> >> This is very very strange that amd would change the instruction. I >> wonder if it is a bug in their code.... Perhaps we need someone with >> an r700 to run the sin and cos tests in piglit . The proposed patch >> passes on my rs780 (rv610) . >> >> -- >> Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email >> ------- You are receiving this mail because: ------- >> You are the assignee for the bug. >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/dri-devel >> > > > Nope Andre, is right. > > > the evergreen docs describe the sine function as follows > > Scalar sine. Input must be normalized from radians by dividing by > 2*PI. The valid input > domain is [-256, +256], which corresponds to an un-normalized input > domain [-512*PI, > +512*PI]. Out-of-range input results in float 0. > > > while the r600 describes the sine function as follows > > Scalar sine. Valid input domain [-PI, +PI]. > > > -- > > Conn O. Clark > > Observation: In formal computer science advances are made > by standing on the shoulders of giants. Linux has proved > that if there are enough of you, you can advance just as > far by stepping on each others toes. > I guess I'll modify Alain's patch tonight. If r700 has a valid range -256 to 256 we can get away with just using a FRACT instruction with 2*PI. Created attachment 37437 [details] [review] patch to try If you could try the attached patch on r600, it passes fine for me on r700. I have just tested your patch with my RV670 GPU, and the hello-gl-ch3 example program works fine here. The piglit glean/glsl1-cos(vec4) and glean/glsl1-sin(vec4) tests are also fixed with this patch. Thanks! I've seen that Andre's patch has been committed to Mesa repository (commit d6a5f94ea4d03b05c434fcad125d1f9c50c638e8), so I'm closing this bug as fixed. Thanks again! |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.