Bug 31538

Summary: xserver segfaults in glxext.c:DrawableGone() (NULL pointer)
Product: xorg Reporter: Thierry Vignaud <thierry.vignaud>
Component: Server/GeneralAssignee: Xorg Project Team <xorg-team>
Status: CLOSED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: critical    
Priority: medium CC: mattst88
Version: 7.4 (2008.09)   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
GDB trace
none
the patch fix
none
the actual patch fix none

Description Thierry Vignaud 2010-11-10 19:23:59 UTC
Created attachment 40190 [details]
GDB trace

like bug bug #31537, my girlfriend found out another bug in xserver that triggered a segfault wit firefox on some sites. See attached GDB trace.

The attached patch fixed it.
Even if the crash didn't trigger once we updated to xserver-1.9, I think it still make sense
Comment 1 Thierry Vignaud 2010-11-10 19:24:28 UTC
Created attachment 40191 [details] [review]
the patch fix
Comment 2 Thierry Vignaud 2010-11-10 19:25:51 UTC
Created attachment 40192 [details] [review]
the actual patch fix
Comment 3 Alan Coopersmith 2010-11-19 22:18:38 UTC
Thanks for the patch - if you add a "Signed-off-by" and then 
git send-email it to xorg-devel as described at
http://www.x.org/wiki/Development/Documentation/SubmittingPatches
then the developers will review it & apply it quicker.
Comment 4 Matt Turner 2010-12-03 17:00:16 UTC
Unless I'm an idiot or the code is perl and I didn't know, that patch isn't even valid C. "return if !glxPriv;"?

What did you do to cause it to crash? If you can't trigger the crash in xserver-1.9, it probably means this isn't the real fix and that between whatever version you got the crash on and 1.9, the real fix was made.

As such, I'm going to close as Invalid.

In the future, mail patches to xorg-devel@lists.x.org. Otherwise they just languish in bugzilla for all eternity.
Comment 5 Julien Cristau 2010-12-04 04:25:10 UTC
On Fri, Dec  3, 2010 at 17:00:17 -0800, bugzilla-daemon@freedesktop.org wrote:

> In the future, mail patches to xorg-devel@lists.x.org. Otherwise they just
> languish in bugzilla for all eternity.

he did send the patch to the list.
Comment 6 Thierry Vignaud 2010-12-06 10:40:25 UTC
That's just a very bad joke.
How can you close a bug that has a valid backtrace as INVALID ???
The fact it was very hard to reproduce with xserver-1.7 and the fact it's harder to reproduce with xserver-1.9 doesn't mean it's magically fixed.
Programers that care should fix the code when provided with a valid trace.

Of course I should have compile test the suggested fix and simply move the if () does the trick.

But that doesn't change anything to the fact that you got a valid backtrace. Even if it's a transcient bug hard to reproduce b/c of timing issue, maybe depending on the proper driver and the proper software, doesn't mean the bug doesn't exist.

You have a damn backtrace for god sake!

And now I remember how I hesitated reporting a hard to reproduce bug...
If I should have known...
Comment 7 Matt Turner 2010-12-08 17:41:42 UTC
(In reply to comment #6)

You've provided a patch for a bug you haven't described how to reproduce against an X server from two releases ago. So, from my previous message:

> What did you do to cause it to crash? If you can't trigger the crash in
> xserver-1.9, it probably means this isn't the real fix and that between
> whatever version you got the crash on and 1.9, the real fix was made.

So, I don't think closing as invalid was an unreasonable thing to do, especially when the patch fixing the issue wasn't compile tested (part of the problem being that you called it the _actual_ patch fix...)

Comparing comments on the mailing list and in this bug, it's unclear whether you think it's still applicable to X server 1.9. In comment 6 you say it's harder to reproduce with xserver-1.9 but on the list you say 1.9 doesn't trigger it anymore but you think it's relevant.

http://patchwork.freedesktop.org/patch/2719/
http://patchwork.freedesktop.org/patch/2721/

So again,
   (1) what steps do you take to trigger this crash?
   (2) have you ever triggered the crash in X server versions 1.8 or later?
   (3) if you haven't triggered it in 1.8 or later, why do you think the patch is still relevant?

If it's not happening in 1.8 or 1.9 then, again, that probably means that something up the call chain that had been passing a NULL pointer into DrawableGone has now been fixed, and as such, checking for NULL in DrawableGone may not be necessary.
Comment 8 Matt Turner 2010-12-21 09:13:20 UTC
No response after two weeks?
Comment 9 Thierry Vignaud 2011-01-03 15:19:44 UTC
(In reply to comment #6)

> So again,
>   (1) what steps do you take to trigger this crash?
>   (2) have you ever triggered the crash in X server versions 1.8 or later?
>   (3) if you haven't triggered it in 1.8 or later, why do you think the patch
> is still relevant?

1) my wife was browsing on some heavy image sites which sometimes make xserver segfaulted
this needed quite a loaded firefox or chromium but I've no precise URL to provide
at the time we were using an old P4 with a sis card (using free sis driver)
We no more have this machine

2) no but like I said it was already hard to reproduce with xserver-1.7 and I
don't have anymore the machine that exposed it.
That doesn't mean it cannot happen anymore

3) quantic bugs/Heisenbugs may be hard to reproduce but that doesn't mean they don't exist.
indeed it was easier to reproduce with xserver-1.7 (got several time the same backtrace after several hours of browsing) but that doesn't mean it cannot happen anymore with newer servers

I think put a protection in place is a safe move against such a bug to rehappen

Anyway it's your call if you want to have a more robust code base or not...

> If it's not happening in 1.8 or 1.9 then, again, that probably means that
> something up the call chain that had been passing a NULL pointer into
> DrawableGone has now been fixed, and as such, checking for NULL in 
> DrawableGone may not be necessary.

I would prefer to see some audit of the possible call chains then...
Comment 10 Michel Dänzer 2011-01-04 09:33:38 UTC
Does http://lists.x.org/archives/xorg-devel/2010-December/016969.html instead of your patch fix the problem?
Comment 11 Thierry Vignaud 2011-01-04 12:18:31 UTC
I don't know since I do not have the hardware anymore but I don't think so since:
- the changes are after the deference of glxPriv which was NULL in my backtraces
- in the backtraces of the other bugs, glxPriv wasn't NULL

The refcounting may help though.
That may be the same family of bugs.
Comment 12 Jeremy Huddleston Sequoia 2011-09-20 09:37:41 UTC
The real fix would be to figure out why glxPriv was NULL to begin with.  Since you reported that this doesn't occur when you move up to the 1.9 server, my guess is that the underlying issue (passing a NULL glxPriv) was properly fixed.

Closing as fixed.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.