Summary: | [r128] Updated EXA patches | ||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | xorg | Reporter: | Connor Behan <connor.behan> | ||||||||||||||||||||||||||||||
Component: | Driver/rage128 | Assignee: | Xorg Project Team <xorg-team> | ||||||||||||||||||||||||||||||
Status: | RESOLVED FIXED | QA Contact: | Xorg Project Team <xorg-team> | ||||||||||||||||||||||||||||||
Severity: | major | ||||||||||||||||||||||||||||||||
Priority: | medium | CC: | alexdeucher | ||||||||||||||||||||||||||||||
Version: | git | ||||||||||||||||||||||||||||||||
Hardware: | All | ||||||||||||||||||||||||||||||||
OS: | All | ||||||||||||||||||||||||||||||||
Whiteboard: | |||||||||||||||||||||||||||||||||
i915 platform: | i915 features: | ||||||||||||||||||||||||||||||||
Attachments: |
|
Description
Connor Behan
2012-03-25 13:40:24 UTC
Created attachment 59106 [details] [review] Patch based on work by Garvin, Anholt and Xv. This adds basic Composite RENDER support. This still needs work as I'm not sure how to deal with BGR pixmaps on a card that expects RGB. If you are using cairo 1.12, please revert to an older version otherwise you may hit bug 47266. Also, you may want to cross compare your rage128 code with the r100 EXA code in the radeon driver. The hardware is pretty similar and the current radeon code is more up to date with the latest versions of EXA. Also, FWIW, I ported rage128 support to radeon UMS a while back. Might be a useful reference: http://cgit.freedesktop.org/~agd5f/xf86-video-ati/log/?h=r128-support Created attachment 59877 [details] [review] More mature patch Thanks for the advice. In the new patch, the Solid and Copy hooks are heavily based on Radeon and fix the issues I was seeing with cairo 1.10. I tried out your r128-radeon merge a few years ago but I didn't realize it was relevant to this project. The patch right now has hooks for Solid, Copy and Composite (I had to get rid of support for PICT_a8 and PICT_a8r8g8b8) that all seem to work. It also works with HWCursor, PageFlip and Xv. I tested it on all color depths, and with DRI on and off and with the Composite extension on and off. On all those configurations, the "quick" rendercheck tests (fill, dcoords, scoords, mcoords, tscoords, tmcoords, triangles, bug7366) get a perfect score. Additionally on the configuration with a 16bpp window, DRI on and Composite on (I believe this is by far the most widely used configuration among people that still have an r128), repeat scores 1292/1292, blend scores 8284950/8284950 and gradient scores 19249/51681. I doubt this is a problem with my patch since I also get 19249/51681 from the software renderer. I haven't had time yet to run the composite and cacomposite tests. I ran "rendercheck -t composite -o Clear" on XAA which took 3 hours and got a perfect score. Then I ran it on EXA which took 4 hours and it passed 34808500 / 34816000 tests. All the failed tests had to do with r5g6b5 but if I specify "-f r5g6b5" I don't get any failures. It must have something to do with combining different src, dest and mask formats. Should I remove composite hooks from my patch since they seem to be slower than software on my card? Or if not, would an EXA patch need to score 100% to be accepted? (In reply to comment #4) > Should I remove composite hooks from my patch since they seem to be slower than > software on my card? Slower doing what? The main benefit is generally for text rendering, as measured e.g. by x11perf -aa10text. > Or if not, would an EXA patch need to score 100% to be accepted? You seem to be the one caring the most about the r128 driver right now, so I think that's up to you. :) Created attachment 60471 [details] [review] Git format patch v1 My inability to pinpoint the failed test was due to bug 48801. Now that I've added a quirks function, things look much better. It has passed 24 hours worth of composite and cacomposite tests. This is still not all of them, but I have every expectation that the others will pass as well. Therefore I am attaching the patch that I propose for merging. Should I also submit it using "git send-email"? (If a week goes by without a reply I will just go ahead and do this). Created attachment 60748 [details] [review] Git format patch v2 You should probably try to handle solid pictures better, or you'll run into nastiness with Cairo 1.12, see bug 47266. I tried to adapt your radeon patch http://pastebin.com/rAAxz2EK but I'm having trouble because all of the cairo solid pictures are PICT_a8r8g8b8. This was never supported on my card. If I add it to the list of supported formats in R128GetDatatypePict2, every line of text on my screen is blacked out. In the paste, I have added a variable "trying_solid" to allow PICT_a8r8g8b8 only when a fallback to software would expose the X server bug. This is slightly better but I still see at least one blacked out line whenever I look at a page of text. Any ideas? (In reply to comment #9) > [...] PICT_a8r8g8b8 [...] was never supported on my card. BTW, I think that will cause a lot of software rendering fallbacks, as it's the 32 bit format returned by XRenderFindStandardFormat(). Do problems occur when this format is used for the source picture, or only for the mask? If the latter, at least for solid masks you should be able to use the A8 hardware format. > If I add it to the list of supported formats in R128GetDatatypePict2, BTW, R128CheckCompositeTexture uses R128GetDatatypePict2 for the destination picture format, whereas otherwise it seems to be used only for source/mask picture formats, and R128GetDatatypePict1 for the destination. Is that intended? > every line of text on my screen is blacked out. Not sure it's the cause of the black lines, but I don't think you can just assign pPix->devPrivate.ptr like that for several reasons. See xf86-video-ati commit 6bda7ceda645e838723883d133d614def1511d16 for a possible way to do it. (Thinking about this problem made me realize the previous code there couldn't work with UMS, thanks :) (In reply to comment #10) > BTW, R128CheckCompositeTexture uses R128GetDatatypePict2 for the destination > picture format, whereas otherwise it seems to be used only for source/mask > picture formats, and R128GetDatatypePict1 for the destination. Is that > intended? No, thank-you for catching it... and for committing the UMS code :). My latest attempt is http://pastebin.com/GwdrVZJE. > Do problems occur when this format is used for the source picture, or only for > the mask? The problems are source related because I see the same black lines even if I tell the driver to ignore the mask for the solid pictures. In fact the masks are already in the PICT_a8 format. Could it be that some important registers are simply not big enough for 32 bits? I noticed that the driver is already set up to die on a 32 bit screen: http://cgit.freedesktop.org/xorg/driver/xf86-video-r128/tree/src/r128_driver.c#n775 (In reply to comment #11) > The problems are source related because I see the same black lines even if I > tell the driver to ignore the mask for the solid pictures. Meaning if you fall back when the mask is a solid picture, or what exactly? > In fact the masks are already in the PICT_a8 format. Solid pictures always have PICT_a8r8g8b8 as the format, and they can be used as the mask as well. My point was that it might be possible to treat solid masks as PICT_a8, at least without per-component alpha. > Could it be that some important registers are simply not big enough for 32 > bits? I noticed that the driver is already set > up to die on a 32 bit screen: > http://cgit.freedesktop.org/xorg/driver/xf86-video-r128/tree/src/r128_driver.c#n775 There's no point in supporting depth 32 in addition to depth 24 for the screen, as the alpha channel wouldn't have any effect. Some more oddities I noticed: > if (trying_solid && pPict->format == PICT_a8r8g8b8) { > *tex_cntl_c = R128_DATATYPE_RGB8; R128_DATATYPE_RGB8 sounds like the alpha channel is ignored. > } else if (trying_solid && pPict->format == PICT_a8) { > *tex_cntl_c = R128_DATATYPE_Y8; This is dead code, as pPict->format is always PICT_a8r8g8b8 for solid pictures. Did you mean something like (trying_solid && unit == 1)? > case PICT_x8r8g8b8: > *type = R128_DATATYPE_ARGB8888; PICT_x8r8g8b8 means that the 8 bits of the x channel should not be touched, but R128_DATATYPE_ARGB8888 might. (In reply to comment #12) > Meaning if you fall back when the mask is a solid picture, or what exactly? Every time the PrepareComposite hook checks for pMask != NULL, I made it instead check for pMask != NULL && !trying_solid. By ignoring the mask this way but not falling back, I was hoping to see some image get rendered (possibly with the wrong transparency) in place of the black box. This is what I think would've happened if the only problem were the mask. > > if (trying_solid && pPict->format == PICT_a8r8g8b8) { > > *tex_cntl_c = R128_DATATYPE_RGB8; > > R128_DATATYPE_RGB8 sounds like the alpha channel is ignored. Oops, it used to be R128_DATATYPE_ARGB8888 but I tried changing it to other types to see if any of them worked. > > } else if (trying_solid && pPict->format == PICT_a8) { > > *tex_cntl_c = R128_DATATYPE_Y8; > > This is dead code, as pPict->format is always PICT_a8r8g8b8 for solid pictures. > Did you mean something like (trying_solid && unit == 1)? I don't think it's dead code. When I insert a line that says R128TRACE(("%d, 0x%x\n", unit, (int)pPict->format)); whenever trying_solid is TRUE, the output I see in my log file is always "0, 0x20028888" and "1, 0x8018000". That's why I think the source is always PICT_a8r8g8b8 while the mask is always PICT_a8. So checking for unit == 1 and checking for the format being PICT_a8 are the same thing. > > case PICT_x8r8g8b8: > > *type = R128_DATATYPE_ARGB8888; > > PICT_x8r8g8b8 means that the 8 bits of the x channel should not be touched, but > R128_DATATYPE_ARGB8888 might. Right, the radeon driver does it differently but it does it by using registers that aren't in my r128_reg.h. I did it this way because the old Xserver did: http://cgit.freedesktop.org/xorg/xserver/tree/hw/kdrive/ati/r128_composite.c?id=0cd662ea80579c317d706ebe04971bb29d0f9b4f#n75 (In reply to comment #13) > > > } else if (trying_solid && pPict->format == PICT_a8) { > > > *tex_cntl_c = R128_DATATYPE_Y8; > > > > This is dead code, as pPict->format is always PICT_a8r8g8b8 for solid pictures. > > Did you mean something like (trying_solid && unit == 1)? > > I don't think it's dead code. When I insert a line that says R128TRACE(("%d, > 0x%x\n", unit, (int)pPict->format)); whenever trying_solid is TRUE, the output > I see in my log file is always "0, 0x20028888" and "1, 0x8018000". That's why I > think the source is always PICT_a8r8g8b8 while the mask is always PICT_a8. So > checking for unit == 1 and checking for the format being PICT_a8 are the same > thing. Ah, now I see the way trying_solid is used can't work in all cases: The source picture and/or the mask picture (or neither) can be a solid picture. That either of them is a solid picture does not directly affect the other one. > Right, the radeon driver does it differently but it does it by using registers > that aren't in my r128_reg.h. Looking at the r128 kernel and Mesa drivers, I think it should be possible to mask out writes to the alpha channel via the register at offset 0x1d44, which seems to be called something like R128_PLANE_3D_MASK_C. I went back to the basics and found a couple errors in my logic. There was never actually a problem supporting 32 bit pictures. Before, I thought that falling back to software for PICT_a8r8g8b8 made my screen look much better but I could've achieved the same effect by falling back to software for PictOpOver. When the source / mask formats are PICT_r5g6b5, I can enable 9 of the 13 blend ops and still get rendercheck to pass. When the source / mask formats are PICT_a8r8g8b8, I can enable 7 of the 13 ops. Unfortunately PictOpOver is not on either list and that is what I need to support. All the new stuff in cairo is (a8r8g8b8 IN a8) OVER screen. Lying to the card about what the operation is doesn't seem to have any effect. I made R128CCEComposite print out its arguments for the solid pictures. Sometimes srcX and srcY take on strange values like 5157 or -11233. Is this supposed to happen when there is no transform? I don't see how one of my mistakes could affect the arguments being sent to it. (In reply to comment #15) > When the source / mask formats are PICT_r5g6b5, I can enable 9 of the 13 blend > ops and still get rendercheck to pass. When the source / mask formats are > PICT_a8r8g8b8, I can enable 7 of the 13 ops. I assume you've verified the enabled ops are actually hit? :) One poential gotcha about the RENDER extension is that it uses pre-multiplied colour values. Have you e.g. played with the *PREMULT* bits of the R128_REG_PRIM_TEXTURE_COMBINE_CNTL_C register? > I made R128CCEComposite print out its arguments for the solid pictures. > Sometimes srcX and srcY take on strange values like 5157 or -11233. Is this > supposed to happen when there is no transform? The coordinates don't matter for solid pictures, as they always sample the same solid colour. (In reply to comment #16) > I assume you've verified the enabled ops are actually hit? :) Everything except Dst is hit. The sheer amount of output from my driver when running rendercheck made me think that all ops were hit. However, rendercheck performs Src in addition to the op specified on the command line so this is what was confusing me. > One poential gotcha about the RENDER extension is that it uses pre-multiplied > colour values. Have you e.g. played with the *PREMULT* bits of the > R128_REG_PRIM_TEXTURE_COMBINE_CNTL_C register? Thanks, I just tried that. All four combinations of the two PREMULT bits yield the same result - same blank text with cairo 1.10, same blank text with cairo 1.12 and same "rendercheck -t composite -o Over" errors. I also tried sending them to R128_SEC_TEX_COMBINE_CNTL_C. I just took another look. The two successful operations where the composite finishes are InReverse and AtopReverse. The destination is not transparent so these are essentially the same operation. A curious thing I noticed is that they still pass rendercheck when I grossly tamper with the floating point numbers in VTX_OUT. But what concerns me more is a way for me to artificially satisfy rendercheck. If I tell rendercheck to judge ops that fail (like Src and Over) against Fa = 0 and Fb = Aa instead of the coefficients that are supposed to be used by Src and Over, rendercheck no longer complains about them. It seems that the Fa and Fb values used by my card are "stuck". The card thinks it is performing an InReverse or an AtopReverse no matter what I specify in the blend_cntl. This would explain why cairo text is blank when I try to accelerate it - it is being multiplied by 0. Finally got vertex bundles to work. The problem was actually in r128_accel.c where the initial setup commands are issued. It is too early to use the CCE at that point so I just wrote the commands to the mmio. I used to think "a register value is a register value" but this is wrong - it was important that they be written using the CCE. I have much to learn. This basically doubled my aa10text speed, but it's far from passing rendercheck - the triangle primitives look interesting. I'll post an update soon since there is a lot to play with. If I had just figured this out two days ago, I wouldn't have had to annoy the readers of the mailing list. Created attachment 63093 [details] [review] Not quite ready This week has been very productive. This time, rendercheck passes 100% and the ops really are being composited by the hardware. Supporting solid pictures indeed fixes the corruption with cairo 1.12. There are three bugs I've noticed. 1. The only rendering glitch I've noticed which doesn't disappear after a split second happens when I click and drag on the xfce desktop. This draws a translucent box (over with no mask). If I start dragging from the top left or top right corner, everything is fine. If I start dragging from the bottom left corner however, the upper left triangle of the box will be filled with translucent rectangles of a random color. Similarly, if I start dragging from the bottom right, the upper right triangle will be filled with them. Sometimes, the boxes will draw correctly for a few minutes and then go back to showing the rectangles. Presumably, the memory that was incorrectly being read happened to be free at that time. I would think this is not a show stopper. 2. In order to speed up by UploadToScreen hook (and bypass the "big hammer" for UMS solid pictures), I made it call R128DMA. This is a DRM using function that was already in r128_video.c. I had it working when I first tested it out, but now if I try to test it again, it usually overflows the event queue shortly after X starts. I never had it working for DownloadFromScreen. Is there any problem with this function or the way I am calling it? (In the past, I've avoided using DMAForXv because it tends overflow the queue if I watch a video that lasts more than a minute, so I realize that I may be asking about a long standing bug in another part of the driver.) I would think this is not a show stopper either. 3. This one might be a show stopper. When compositing is enabled, 3D apps don't stay anchored to the window correctly. When I start e.g. glxgears, it will look okay for about a second, then the gears will suddenly move so that they are anchored in the top left corner of the screen. If the window is not near the top left corner of the screen, it will just show a black background. If I keep moving the window, the gears keep getting repositioned so that I see them in the right place, but when I leave the window still for a second, they will go back to the top left corner. I tried sending dimensions of the dst rectangle to R128_SC_TOP_LEFT and R128_SC_BOTTOM_RIGHT. I also tried choosing one texture offset register instead of writing to all of them. Mach64 also has 11 texture offset registers and the mach64 driver has a method for determining which one will be used. When I tried using this, composited glyphs were still fine, but the driver created solid pixmaps were butchered and none of the above problems were solved. An interesting line might be line 311 of r128_exa_render.c (line 1954 of the patch) where I increment l2h. Before I did this, the only pixmaps that didn't looked like "two pixmaps stacked on top of eachother" were the pixmaps having POT height. Incrementing l2h made some pixmaps look better and some worse. By trial and error I came up with a reliable condition for when I should increment it. Thanks for all your help. (In reply to comment #20) > 2. In order to speed up by UploadToScreen hook (and bypass the "big hammer" for > UMS solid pictures), [...] Not sure how any change in the UploadToScreen hook could achieve the latter. The purpose of the big hammer is to prevent the CPU from touching VRAM while it's still being used by the GPU, which could be due to any other rendering hook. As for the problem, I suspect that's why DMAForXv defaults to off. > 3. This one might be a show stopper. When compositing is enabled, 3D apps don't > stay anchored to the window correctly. I suspect that's because you're clobbering the R128_WINDOW_XY_OFFSET register and possibly other hardware state also used by the 3D driver. A simple solution might be to save the register contents before clobbering them (though that might only work reliably while the CCE is idle) and restoring them in R128LeaveServer(). Created attachment 63236 [details] [review] Not quite ready 2 Yep, that was the problem. The patch now takes the R128_WINDOW_XY_OFFSET value into account so it doesn't have to be clobbered. Created attachment 63274 [details] [review] Not quite ready 3 I fixed the translucent rectangle bug by using exaOffscreenAlloc properly (my back and depth buffers were too small). This had the upshot of making page flipping work. I plan on sending this to the list soon. (In reply to comment #22) > The patch now takes the R128_WINDOW_XY_OFFSET value into account so it doesn't > have to be clobbered. I'm afraid that can't work reliably either though: A command stream writing a different value could be in flight while you're reading the register, e.g. with several 3D apps running. Good point. That means I have to call exaWaitSync before the INREG right? (In reply to comment #25) > Good point. That means I have to call exaWaitSync before the INREG right? That won't necessarily do the trick when exaMarkSync hasn't been called since the last LeaveServer call. Calling the info->ExaDriver->WaitMarker hook directly should do the trick (but only while it waits for idle regardless of the marker value). BTW, I really don't think it makes sense to have UploadTo/DownloadFromScreen hooks which just use memcpy, as EXA also uses memcpy when the hooks aren't there. Created attachment 63362 [details] [review] Not quite ready 4 (In reply to comment #26) > Calling the info->ExaDriver->WaitMarker hook > directly should do the trick (but only while it waits for idle regardless of > the marker value). Ok, and this is the same as calling exaMarkSync right before exaWaitSync I think. Now the phrase "big hammer" makes more sense because calling the WaitMarker hook directly actually does slow down solid pixmaps. But it's still much faster than XAA, and I'd rather have this than incorrect rendering. > BTW, I really don't think it makes sense to have UploadTo/DownloadFromScreen > hooks which just use memcpy, as EXA also uses memcpy when the hooks aren't > there. They are gone now. I'll bring them back in a future patch if and when I solve the DMAForXv problem. I guess by bug 51137 report should really say "Remove UTS/DFS". Created attachment 63462 [details] [review] Not quite ready 5A Created attachment 63463 [details] [review] Not quite ready 5B Of the six registers in COMPOSITE_SETUP(), only R128_WINDOW_XY_OFFSET caused problems if I clobbered it while a 3D app was running. I have gone back to clobbering the other five with patch A. This is because of a bug I just found in the previous patch. Firefox and thunderbird were fine until I opened a 3D program. After that, they showed no text until I did a VT switch. I don't know why it was just those two (they were compiled with --disable-system-cairo maybe?) but patch A fixes this without adversely affecting any 3D program I try. Patch B also fixes it but not as well. This is your original suggestion of saving the register contents and restoring them in LeaveServer. It seems like this would be more proper but the additional reading and writing allows me to mess up 3D apps if I open many of them at once and move / resize the windows very quickly. This is nearly impossible to do with patch A. Assuming that you don't find major problems, my plan is to send patch A to the list. Comment on attachment 63463 [details] [review] Not quite ready 5B Review of attachment 63463 [details] [review]: ----------------------------------------------------------------- ::: src/r128_dri.c @@ +340,5 @@ > + exaMarkSync(pScreen); > +#ifdef RENDER > + if (info->state_2d.composite_setup && info->have3DWindows) { > + R128CCE_REFRESH(pScrn, info); > + RESTORE_COMPOSITE_STATE(); RESTORE_COMPOSITE_STATE() is called after 128CCEReleaseIndirect(), so the register writes aren't actually submitted to the hardware until after the next EnterServer call. I suggest moving the state restore into the info->CCEInUse block before the R128CCEReleaseIndirect() call. Created attachment 63619 [details] [review] Git format patch v3A Created attachment 63620 [details] [review] Git format patch v3B That did not help noticably. When I use patch B, I can still cause the glxgears to be cut off by black rectangles if I drag the window quickly enough. I also tried calling the WaitMarker before releasing the indirect buffer. Created attachment 64051 [details] [review] Git format patch v4A Fixed by xf86-video-r128 commit c16d4e8bc068cb8127fa95347d26c7b80258f816. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.