Bug 20212 - Incorrect Pixmap pointer causing prepareaccess / finishaccess inbalance
Summary: Incorrect Pixmap pointer causing prepareaccess / finishaccess inbalance
Status: RESOLVED WORKSFORME
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/Acceleration/EXA (show other bugs)
Version: git
Hardware: Other All
: medium blocker
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords: patch
Depends on:
Blocks: xserver-1.6.1
  Show dependency treegraph
 
Reported: 2009-02-19 01:36 UTC by Thomas Hellström
Modified: 2011-10-11 01:11 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Make exaPixmapIsOffscreen fix up the pixmap ptr (434 bytes, patch)
2009-02-19 01:39 UTC, Thomas Hellström
no flags Details | Splinter Review
Possible fix (1.92 KB, patch)
2009-02-23 01:16 UTC, Michel Dänzer
no flags Details | Splinter Review
Possible fix, take two (1.94 KB, patch)
2009-02-23 01:46 UTC, Michel Dänzer
no flags Details | Splinter Review
Possible fix, take three (2.30 KB, patch)
2009-02-23 02:40 UTC, Michel Dänzer
no flags Details | Splinter Review
Possible fix, take four (2.89 KB, patch)
2009-02-23 04:02 UTC, Michel Dänzer
no flags Details | Splinter Review

Description Thomas Hellström 2009-02-19 01:36:56 UTC
In exaCopyDirty, if the transfer function fails or is not present, 
ExaDoPrepareAccess will be called before copying pixmap data.
However, this call may fail, as exaPixmapIsOffscreen may return false, see the section below. Still exaFinishAccess will be called at the function exit causing an inbalance. This will wreak havoc in drivers relying on exaPrepareAccess and exaFinishAccess being balanced.

So how do exaPixmapIs Offscreen return FALSE? Apparently under some circumstances pPixmap->devPrivate.ptr is set to a non-NULL non-fb address, so exaPixmaIsOffscreen will not fix it up to the fb address. Drivers implementing PixmapIsOffscreen may use this pointer to determine whether the pixmap is offscreen or not, and since it will contain a system address, it will return false.
Comment 1 Thomas Hellström 2009-02-19 01:39:38 UTC
Created attachment 23112 [details] [review]
Make exaPixmapIsOffscreen fix up the pixmap ptr

This patch makes exaPixmapIsOffscreen fix up the pixmap pointer regardless whether it's NULL or not. This Fixes the problem for me, but I'm not sure this is the correct fix.
Comment 2 Michel Dänzer 2009-02-19 02:26:47 UTC
> Apparently under some circumstances pPixmap->devPrivate.ptr is set to a
> non-NULL non-fb address

That sounds like the real bug, which should be tracked down.

I'm not sure we can use your patch even as a workaround, I think nouveau relies on its own ptr value being preserved.

Is this not broken yet in 1.5?
Comment 3 Thomas Hellström 2009-02-19 02:43:22 UTC
(In reply to comment #2)
> > Apparently under some circumstances pPixmap->devPrivate.ptr is set to a
> > non-NULL non-fb address
> 
> That sounds like the real bug, which should be tracked down.
> 
> I'm not sure we can use your patch even as a workaround, I think nouveau relies
> on its own ptr value being preserved.

Understood.

What are the rules for the pPixmap->devPrivate.ptr value? When is it supposed to be NULL, and when is it supposed to be populated?

I have some concerns, though, about ExaDoPrepareAccess being called in the migration code, as if the corresponding driver call fails, it will try to move out the pixmap recursively? 

> 
> Is this not broken yet in 1.5?
> 

It's possible. We've had very limited testing on that version. 

Although it's possible to work around this issue in the driver it's pretty important that this get fixed before the 1.6 release.




Comment 4 Michel Dänzer 2009-02-19 02:55:22 UTC
(In reply to comment #3)
> What are the rules for the pPixmap->devPrivate.ptr value? When is it supposed
> to be NULL, and when is it supposed to be populated?

The basic idea is for it to be non-NULL only between PrepareAccess and FinishAccess, to catch paths which try to access the data behind our back.


> I have some concerns, though, about ExaDoPrepareAccess being called in the
> migration code, as if the corresponding driver call fails, it will try to move
> out the pixmap recursively? 

The driver PrepareAccess hook may only fail if there's a DownloadFromScreen hook which can't fail.


> > Is this not broken yet in 1.5?
> 
> It's possible. We've had very limited testing on that version.

It would be good to verify that, and if 1.5 isn't broken maybe try to bisect what broke it.
Comment 5 Thomas Hellström 2009-02-19 03:13:05 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > What are the rules for the pPixmap->devPrivate.ptr value? When is it supposed
> > to be NULL, and when is it supposed to be populated?
> 
> The basic idea is for it to be non-NULL only between PrepareAccess and
> FinishAccess, to catch paths which try to access the data behind our back.
> 


So In the Nouveau case, where the driver populates this pointer, when we're migrating a pixmap from system to VRAM, and the do_migrate code is relying on exaPixmapIsOffscreen() to return true, How is the driver tricked into returning true?

Comment 6 Michel Dänzer 2009-02-19 03:38:14 UTC
(In reply to comment #5)
> So In the Nouveau case, where the driver populates this pointer, when we're
> migrating a pixmap from system to VRAM, and the do_migrate code is relying on
> exaPixmapIsOffscreen() to return true, How is the driver tricked into returning
> true?

Actually, while I'm not sure what Maarten's change (2056c10d12f6b4facf628b861230f9b0a13f3a73 on server-1.6-branch) was needed for or if it couldn't be achieved in a better way, it looks like it only affects drivers which provide a CreatePixmap hook anyway, so it probably isn't relevant for openchrome.
Comment 7 Thomas Hellström 2009-02-19 04:03:05 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > So In the Nouveau case, where the driver populates this pointer, when we're
> > migrating a pixmap from system to VRAM, and the do_migrate code is relying on
> > exaPixmapIsOffscreen() to return true, How is the driver tricked into returning
> > true?
> 
> Actually, while I'm not sure what Maarten's change
> (2056c10d12f6b4facf628b861230f9b0a13f3a73 on server-1.6-branch) was needed for
> or if it couldn't be achieved in a better way, it looks like it only affects
> drivers which provide a CreatePixmap hook anyway, so it probably isn't relevant
> for openchrome.
> 

So this means that the patch would work OK?

This bug more importantly impacts GMA500 as well.

/Thomas
Comment 8 Michel Dänzer 2009-02-19 04:13:59 UTC
(In reply to comment #7)
> So this means that the patch would work OK?

Not sure, but possibly not, as exaPixmapIsOffscreen() will be called for nouveau as well. Maarten?

> This bug more importantly impacts GMA500 as well.

Better find the root cause then. :)
Comment 9 Maarten Maathuis 2009-02-19 09:52:45 UTC
2056c10d12f6b4facf628b861230f9b0a13f3a73 only affects driver allocated pixmap drivers, the change was made to allow non-offscreen pixmaps, which are guaranteed to have a valid pointer at all times. So the pointer has to be NULL'ed before the driver gets to it. It's completely irrelevant to "classic" exa.

The patch should cause no issues for nouveau, there is still the issue who or what is messing with devPrivate.ptr, or forgetting to init it.
Comment 10 Michel Dänzer 2009-02-23 01:16:01 UTC
Created attachment 23201 [details] [review]
Possible fix

I suspect devPrivate.ptr is non-NULL due to ModifyPixmapHeader(). Thomas, does this patch fix the problem? Maarten, would this work for nouveau as well?
Comment 11 Maarten Maathuis 2009-02-23 01:45:29 UTC
This patch has a problem, in the sense that it now forbids non-offscreen pixmaps. So at the very least add a if (exaPixmapIsOffscreen(pPixmap)) before NULL'ing devprivate-ptr.
Comment 12 Michel Dänzer 2009-02-23 01:46:49 UTC
Created attachment 23205 [details] [review]
Possible fix, take two

Better patch which makes sure the ret variable isn't used uninitialized.
Comment 13 Michel Dänzer 2009-02-23 01:48:21 UTC
(In reply to comment #11)
> This patch has a problem, in the sense that it now forbids non-offscreen
> pixmaps.

What exactly do you mean with 'non-offscreen' pixmaps? I don't see how this can break so long as exaPrepareAccess is always called before dereferencing devPrivate.ptr.
Comment 14 Maarten Maathuis 2009-02-23 02:06:29 UTC
Check ExaDoPrepareAccess() please and see what happens for non-offscreen pixmaps.
Comment 15 Michel Dänzer 2009-02-23 02:11:16 UTC
(In reply to comment #14)
> Check ExaDoPrepareAccess() please and see what happens for non-offscreen
> pixmaps.

Still not sure what you mean... if you mean that the driver PrepareAccess hook doesn't get called, then that is by design, you'll have to make sure exaPixmapIsOffscreen() returns TRUE.

Otherwise please be more specific.
Comment 16 Maarten Maathuis 2009-02-23 02:16:37 UTC
The driver has to manage all pixmaps (in driver_alloc case), including those that make no sense whatsoever in an acceleration path. So you create non-offscreen pixmaps, which guarantees you it never ever touches acceleration code. But at the same time you need to have valid devPrivate.ptr. That is all i'm trying to do.
Comment 17 Michel Dänzer 2009-02-23 02:32:29 UTC
(In reply to comment #16)
> But at the same time you need to have valid devPrivate.ptr.

ExaDoPrepareAccess gives you that.
Comment 18 Maarten Maathuis 2009-02-23 02:34:31 UTC
Except when the pixmap isn't considered offscreen. Because you NULL'ed it after ModifyPixmapHeader.
Comment 19 Michel Dänzer 2009-02-23 02:40:54 UTC
Created attachment 23206 [details] [review]
Possible fix, take three

Okay, I think I see the problem now - why didn't you just mention the  !(pExaScr->info->flags & EXA_HANDLES_PIXMAPS) test rather than having me second-guess?

So how about this?
Comment 20 Maarten Maathuis 2009-02-23 03:31:07 UTC
How about also NULL'ing it again on FinishAccess? Perhaps only when exaPixmapIsOffscreen fails.
Comment 21 Michel Dänzer 2009-02-23 04:02:09 UTC
Created attachment 23207 [details] [review]
Possible fix, take four

> How about also NULL'ing it again on FinishAccess? Perhaps only when
> exaPixmapIsOffscreen fails.

No, the whole idea is to always keep it NULL except between PrepareAcess and FinishAccess.

This patch takes care of exaFinishAccess as well as exaPixmapIsOffscreen.
Comment 22 Maarten Maathuis 2009-02-23 07:19:26 UTC
This comment could be removed (imo):

/* Rehide pixmap pointer if we're doing that. */

Also, maybe add a note to exa.h that under no circumstance the driver can rely on the contents of devPrivate.ptr when FinishAccess is called. While i don't expect anyone to do this, it was still possible in the past (for EXA_HANDLES_PIXMAPS).

Codewise the patch looks ok.
Comment 23 Thomas Hellström 2009-02-23 08:29:49 UTC
Unfortunately it doesn't fix the problem. I still get a couple of these pixmap. Typically when I start mozilla.

/Thomas
Comment 24 Michel Dänzer 2009-02-24 00:15:52 UTC
(In reply to comment #23)
> Unfortunately it doesn't fix the problem. I still get a couple of these pixmap.

Bummer. Were you able to find out anything about how the problem comes to be? Maybe a Prepare/FinishAccess asymmetry somewhere?

> Typically when I start mozilla.

As in seamonkey? If so, I wonder if bug 16416 could be related...
Comment 25 Thomas Hellström 2009-02-24 00:49:33 UTC
(In reply to comment #24)
> (In reply to comment #23)
> > Unfortunately it doesn't fix the problem. I still get a couple of these pixmap.
> 
> Bummer. Were you able to find out anything about how the problem comes to be?
> Maybe a Prepare/FinishAccess asymmetry somewhere?
> 

No, I don't think, so, I've debugged the prepareaccess read / write refcount and it seems to be OK, except the read refcount drops below zero when this bug is hit.
The driver should probably hang on an inbalance in the other direction since it uses TTM syncing in the prepare / finishaccess hooks.

It's a bit odd, though, that it's the read refcount indicating that we want to read from offscreen memory, given that the stale pointer value points to a system area. This would perhaps indicate that the pointer is not nulled after migration from system to fb. 

Anyway what I've found out so far, is that this seems to be ARGB 8888 pixmaps used for "over" compositing of glyphs without a mask pixmap.

> > Typically when I start mozilla.
> 
> As in seamonkey? If so, I wonder if bug 16416 could be related...
> 

The outcome looks very similar to what I'm seeing. 
This is because prepareAccess is used for synchronization instead of the normal exa Sync methods. When the prepareAccess refcount is nonzero for an offscreen buffer backing a pixmap, no synchronization takes place, and the client will write into the pixmap before the GPU is done with it. So the problem becomes much more obvious if the GPU fifo is populated (for example by running gears) when the prepareaccess refcount is incorrect.

If the normal exa sync methods are used in the ati driver, the bugs are probably not related. 
Comment 26 Michel Dänzer 2009-02-24 03:00:08 UTC
(In reply to comment #25)
> It's a bit odd, though, that it's the read refcount indicating that we want to
> read from offscreen memory, given that the stale pointer value points to a
> system area. This would perhaps indicate that the pointer is not nulled after
> migration from system to fb. 

Hmm, I don't see how that could happen...

> Anyway what I've found out so far, is that this seems to be ARGB 8888 pixmaps
> used for "over" compositing of glyphs without a mask pixmap.

From exaGlyphsToDst() then? If so, I wonder if http://lists.x.org/archives/xorg-devel/2009-February/000276.html could indirectly avoid the problem...

Is it a glyph cache pixmap (1024xyyy) or an actual glyph pixmap?


> If the normal exa sync methods are used in the ati driver, the bugs are
> probably not related. 

They are.


BTW, are you getting per-pixmap driver private data via a CreatePixmap hook / exaGetPixmapDriverPrivate or via some other means?
Comment 27 Maarten Maathuis 2009-03-04 09:58:31 UTC
Thomas: give xserver git a try, if it works without spewing errors, then i suggest you bisect over the last series of exa patches.
Comment 28 Thomas Hellström 2009-03-06 06:34:49 UTC
Sorry for being unresponsive. 
I'll hopfully get some time early next week for this.

/Thomas
Comment 29 Matt Turner 2010-12-02 19:15:55 UTC
(In reply to comment #28)
> Sorry for being unresponsive. 
> I'll hopfully get some time early next week for this.

How time flys. :)

Where do we stand on this? Is this still a bug?
Comment 30 Jeremy Huddleston Sequoia 2011-10-11 01:11:36 UTC
No response in over 2 years.  Closing.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.