Bug 32511

Summary: glDrawPixels broken on savage
Product: Mesa Reporter: Tormod Volden <bugzi11.fdo.tormod>
Component: Drivers/DRI/SavageAssignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium    
Version: git   
Hardware: x86 (IA32)   
OS: Linux (All)   
See Also: https://launchpad.net/bugs/579071
Whiteboard:
i915 platform: i915 features:
Attachments: full backtrace in savageWriteRGBASpan_565()
several clusters ("lines") of 194 pixels each
one "line" magnified, with my 1 and 194 markers
Xorg.0.log
debug dmesg output including drawpix run
map only the apertures, and use the same handle for framebuffer

Description Tormod Volden 2010-12-19 13:11:57 UTC
Created attachment 41267 [details]
full backtrace in savageWriteRGBASpan_565()

The drawpix demo draws pixels spread around the screen instead of inside the window. Each line drawn (194 pixels) get displayed in a cluster, but not as a line. Only every 6th "line" get displayed, starting from the lower right of the screen, going left, then up (see drawpix-many.png).

The cluster is divided in 4 squares of 8 times 8 pixels, where the first and fourth square is not complete. Each square is drawn from left to right, then from top to bottom. Square are drawn from left to right within a cluster. (See drawpix-line.png).

I have tracked the drawing in mesa down to where it writes each pixel into the renderbuffer in savageWriteRGBASpan_565(), see the full backtrace in write_rgba-backtrace.gdb.

This is with latest git. I think I have seen similar corruption using the antspotlight screensaver since a long time ago, and I never tried drawpix before so this can be an old bug. The bug reporter of the downstream Ubuntu bug says drawpix worked fine in Ubuntu 8.04 which had mesa 7.0.3.
Comment 1 Tormod Volden 2010-12-19 13:13:45 UTC
Created attachment 41268 [details]
several clusters ("lines") of 194 pixels each
Comment 2 Tormod Volden 2010-12-19 13:14:43 UTC
Created attachment 41269 [details]
one "line" magnified, with my 1 and 194 markers
Comment 3 Tormod Volden 2010-12-20 10:34:19 UTC
This is tested with latest 2.6.37 kernel, and libdrm 2.4.21. LIBGL_ALWAYS_INDIRECT does not change anything, other than the xserver crashing when drawpix exits. LIBGL_ALWAYS_SOFTWARE works fine. I have a Savage TwisterK card, Dewey in the Ubuntu report has a SuperSavage IX/C.
Comment 4 Tormod Volden 2010-12-20 10:53:44 UTC
Just in case someone looks at it, in the above backtrace I had disabled the fast_draw_rgba_pixels so the drawing is done by savageWriteRGBASpan_565() through the "slow" path. Normally, with the fast path, savageWriteRGBSpan_565() is used, and the result is the same.
Comment 5 Tormod Volden 2010-12-21 16:39:55 UTC
Created attachment 41353 [details]
Xorg.0.log
Comment 6 Tormod Volden 2011-01-06 15:01:20 UTC
It seems to me that the frame buffer written to is tiled, and that the functions writing into it are unaware of this. I discussed shortly with Alex on IRC, and he said the savages have multiple apertures to the same framebuffer, some providing linear translation.

BTW, tiling is a prerequisite for DRI on savage, so I could not check if turning off tiling would fix it.

Would be nice if someone could explain how this is meant to work in mesa. I would like to continue debugging this, but I am stuck at this point without knowing how it should work.
Comment 7 Tormod Volden 2011-01-06 15:15:01 UTC
I googled up a discussion I had 9 months ago and have totally forgotten about (scary), which points to format reworks causing the regression:

(dri-devel-2010-04-15.log)
11:50 #dri-devel: < tormod> when a GL-application splats blue spots into other windows (savage, no compiz), is that likely a drm bug?
11:52 #dri-devel: <+ajax> savage! retro.
11:54 #dri-devel: <+ajax> it's likely to be a bug in the dri driver
12:28 #dri-devel: < eosie> how large are these blue spots?
12:29 #dri-devel: < tormod> eosie, I'll try to get a screenshot
12:34 #dri-devel: < tormod> eosie, http://imagebin.ca/view/ZNojCc4.html
12:38 #dri-devel: < eosie> cool
12:40 #dri-devel: < agd5f> tormod: tiling issue
12:41 #dri-devel: < agd5f> I suspect somthing isn't using the right aperture
12:42 #dri-devel: < agd5f> IIRC, savage had several apertures exposed via the PCI BARs
12:42 #dri-devel: < agd5f> the first was the linear view, the others were tiled
12:45 #dri-devel: < tormod> I am not using that savage laptop much any longer, but I think this is new since mesa 7.6
12:45 #dri-devel: < agd5f> tormod: I'd guess a problem is the savage span code
12:46 #dri-devel: < agd5f> *in
12:46 #dri-devel: < tormod> where is the span code?
12:46 #dri-devel: < agd5f> savage_span.c
12:46 #dri-devel: < agd5f> IIRC
12:47 #dri-devel: < agd5f> but it's been ages since I looked at savage
12:47 #dri-devel: < tormod> it's been ages since anyone looked at it :)
12:48 #dri-devel: < agd5f> probably one of the format reworks broke something
12:54 #dri-devel: < tormod> yes I remember format reworks broke some other stuff, that got fixed
Comment 8 Tormod Volden 2011-01-06 15:31:26 UTC
Is it possible that the frame buffer address gets wrongly calculated (because of wrong formats or non-initialized stuff) so that it points beyond the linear aperture and reaches inadvertently into a tiled aperture?

Can I verify values in gdb against addresses listed in Xorg.0.log? I have tried writing to addresses in gdb to see if I get something on the screen but I got only errors. I don't know if DRI buffers are writable from gdb.
Comment 9 Tormod Volden 2011-01-08 04:09:53 UTC
Created attachment 41768 [details]
debug dmesg output including drawpix run

The kernel drm savage_drv.h lists this information:
#define SAVAGE_FB_SIZE_S3       0x01000000      /*  16MB */
#define SAVAGE_FB_SIZE_S4       0x02000000      /*  32MB */
#define SAVAGE_MMIO_SIZE        0x00080000      /* 512kB */
#define SAVAGE_APERTURE_OFFSET  0x02000000      /*  32MB */
#define SAVAGE_APERTURE_SIZE    0x05000000      /* 5 tiled surfaces, 16MB each */

I also noticed in drm debug information that drmMapBufs() at savage_xmesa.c:252 returns 0. Is this an issue?

[  336.272629] [drm:drm_ioctl], pid=1559, cmd=0xc00c6419, nr=0x19, dev 0xe200, auth=1
[  336.272647] [drm:drm_mapbufs], 0 buffers, retcode = -22
[  336.272654] [drm:drm_ioctl], ret = ffffffea
Comment 10 Tormod Volden 2011-01-08 14:40:18 UTC
It seems like offsetting the aperture address fixes this issue. I am not sure what is the proper place to do this, but this patch at least works for me. It fixes drawpix and a lot of xscreensaver hacks: glblur works again, molecule/polyhedra/juggler3d and many others are now free of corruption. antspotlight works better, but sometimes hangs or dies, probably for unrelated reasons.

diff --git a/src/mesa/drivers/dri/savage/savagecontext.h b/src/mesa/drivers/dri/savage/savagecontext.h
index 75bec62..2557605 100644
--- a/src/mesa/drivers/dri/savage/savagecontext.h
+++ b/src/mesa/drivers/dri/savage/savagecontext.h
@@ -311,9 +311,9 @@ extern int SAVAGE_DEBUG;
 #define DEBUG_DMA            0x010
 #define DEBUG_STATE          0x020
 
-#define TARGET_FRONT    0x0
-#define TARGET_BACK     0x1
-#define TARGET_DEPTH    0x2
+#define TARGET_FRONT    0x2
+#define TARGET_BACK     0x3
+#define TARGET_DEPTH    0x4
 
 #define SUBPIXEL_X -0.5
 #define SUBPIXEL_Y -0.375
Comment 11 Tormod Volden 2011-01-09 04:42:31 UTC
For reference a 7 year old thread from when this driver was written: http://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg16255.html
Comment 12 Tormod Volden 2011-01-17 13:57:23 UTC
While debugging https://bugzilla.kernel.org/show_bug.cgi?id=4607 I noticed the back buffer was wrong after sleep/resume while the front buffer was fine (this with my patch applied). So is the there something missing in the DDX that should set up the apertures?
Comment 13 Tormod Volden 2011-01-25 11:23:42 UTC
The above resume problem was fixed in the DDX by http://cgit.freedesktop.org/xorg/driver/xf86-video-savage/commit/?id=b018d343e6a6810afdaf1a73091dd9bc8c1c95bd
Comment 14 Tormod Volden 2011-02-12 04:13:17 UTC
I have tracked this down to kernel commit 41c2e75e60200a860a74b7c84a6375c105e7437f:
    ...
    But also, because userspace isn't capable of passing such offsets,
    I had to modify drm_find_matching_map() to ignore the offset passed
    in for maps of type _DRM_FRAMEBUFFER or _DRM_REGISTERS.
    
    If we ever support multiple _DRM_FRAMEBUFFER or _DRM_REGISTERS maps
    for a given device, we might have to change that trick, but I don't
    think that happens on any current driver.

The savage driver uses one drm map for the framebuffer and a second for the apertures (which for most cards are in the same BAR, but offset by 0x2000000). Since above commit (in 2.6.30) the drm will confuse the two maps, and return the handle of the first when DRI tries to map the second. This explains why the workaround in comment 10 worked (on most cards).

I will try to modify the DDX to create only one DRM map which covers both framebuffer and apertures (kind of what happens already). Corresponding changes will be needed in mesa to deal with the offset of the apertures in the framebuffer map.
 
For Paramount and Savage 2000 chipsets it is more complicated, since here the frontbuffer and apertures are in separate PCI BARs. Maybe one drm map can work here also, but the offset may be different. Not sure how to report the offset to mesa without breaking the savage ABI.

Or should really the kernel drm be fixed to support multiple framebuffer maps?
Comment 15 Tormod Volden 2011-04-03 14:43:44 UTC
Created attachment 45201 [details] [review]
map only the apertures, and use the same handle for framebuffer

This patch works on my TwisterK but apparently not on some other cards. Since the first apertures should provide a linear mapping of the framebuffer there is no need for a separate framebuffer DRM map, and only the apertures need to be mapped. The aperture DRM map handle can then be used as framebuffer handle as well.

Probably not the final patch, but I will just attach it here for reference.
Comment 16 Brian Paul 2011-04-05 07:53:09 UTC
Another approach to fixing glDrawPixels might be to use the _mesa_meta_DrawPixels() function from meta.c.  Just plug it into the driver dispatch table in savageDDInitSpanFuncs() instead of savageDrawPixels().

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.