Bug 31501

Summary: crash accessing font info with xfs in fontpath
Product: xorg Reporter: Jon Turney <jon.turney>
Component: Server/GeneralAssignee: Xorg Project Team <xorg-team>
Status: RESOLVED FIXED QA Contact: Xorg Project Team <xorg-team>
Severity: critical    
Priority: high CC: acelists, andrewbass, jeremyhu, keithp, kibi, Kornel.Benko, shashankthebest, yselkowi
Version: gitKeywords: regression
Hardware: Other   
OS: All   
Whiteboard: 2012BRB_Reviewed
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 40982, 44202    
Attachments:
Description Flags
Possible fix
none
Allow font library to return Suspended more than once per request none

Description Jon Turney 2010-11-09 07:23:48 UTC
Doing some investigation of this reported crash whilst using xserver 1.9.2 with xfs: http://cygwin.com/ml/cygwin-xfree/2010-11/msg00008.html, I'm able to reproduce this crash with the following sequence of actions:

X &
xterm &
xset fp+ tcp/myfontserver:7100
xlsfonts

Program received signal SIGSEGV, Segmentation fault.
[Switching to thread 4712.0x1220]
0x005b2e29 in doListFontsAndAliases (client=0x125fdc0, c=0x120fce8) at dixfonts.c:611
611             fpe = c->fpe_list[c->current.current_fpe];
(gdb) p c
$1 = (LFclosurePtr) 0x120fce8
(gdb) p *c
$2 = {client = 0x120fce0, num_fpes = 18939104, fpe_list = 0x0, names = 0x0, current = {
    pattern = "▒W a\002\000\000\000ed-medium-r-normal--24-230-75-75-c-240-jisx0208.1983-0\va▒\207\va\000\000\000\000▒▒ \0019\017\000\000\b\000\000\000\000\000\000\000|▒%a\000\000\000\000\001\000\000\000\001\000\000\000▒\207\va\220▒\va@▒\va\000\000\000\000\030▒ \001\t\017\000\000\t\000\000\000\024\004\000\000▒▒%a\000\000\000\000\000\000\000\000\001\000\000\000\020\231\va▒\227\va▒\207\va\000\000\000\000H▒ \001▒\016\000\000\v\000\000\000\024\004\000\000▒▒%a\000\000\000\000\000\000\000\000\001\000\000\000"..., patlen = 62, current_fpe = 5, max_names = 3, list_started = 0, private = 0x12df550},
  saved = {
    pattern = "*\000\000\000\001\000\000\000\020\231\va▒\227\va▒\207\va\000\000\000\000▒&\001i\017\000\000\005\000\000\000\024\004\000\000|▒#a\000\000\000\000\000\000\000\000\001\000\000\000\020\231\va▒\227\va▒\207\va\000\000\000\000▒▒ \0019\017\000\000\b\000\000\000\000\000\000\000|▒%a\000\000\000\000\001\000\000\000\001\000\000\000▒\207\va\220▒\va@▒\va\000\000\000\000\030▒ \001\t\017\000\000\t\000\000\000\024\004\000\000▒▒%a\000\000\000\000\000\000\000\000\001\000\000\000\020\231\va▒\227\va▒\207\va\000\000\000\000H▒ \001▒\016\000\000\v\000\000\000\024\004\000\000▒▒%a\000\000\000\000\000\000\000\000\001\000\000\000"..., patlen = 1, current_fpe = 0, max_names = 65535, list_started = 1, private = 0x125fb10},
  haveSaved = 1, savedName = 0x12df338 "-jis-fixed-medium-r-normal--24-230-75-75-c-240-jisx0208.1983-0",
  savedNameLen = 64}
(gdb)

On further investigation, the reason for the closure c being corrupt seems to be related to the changes added in commit 3ab6cd31 to fix bug #3040.

In doListFontsAndAliases(), if we get a Suspended result when the client is already sleeping with the same closure (I don't really understand enough what the code is doing to know if that's expected or not), then the xinerama_sleep code frees() the closure c, so when it next wakes, the closure c is being used after being freed.

Being a use after free bug possibly explains why the original reporter is able to avoid the crash by changing the sequence of actions slightly.

The crash is not observed after reverting the noted commit, or with xserver 1.8.2
Comment 1 Jon Turney 2010-12-09 14:52:05 UTC
See also https://bugzilla.redhat.com/show_bug.cgi?id=658587
Comment 2 James Jones 2010-12-09 16:27:42 UTC
Created attachment 40973 [details] [review]
Possible fix

Haven't tested this thoroughly, as I'm not convinced early-out is really the right way to handle Xinerama here, but I think this should fix the crashes at least.  Feel free to try it out.
Comment 3 Colin Harrison 2010-12-15 04:44:50 UTC
Hi,

It's not pretty, but James' patch fixes the crash for me.

Thanks,
Colin Harrison
Comment 4 Julien Cristau 2011-03-07 04:12:14 UTC
*** Bug 31675 has been marked as a duplicate of this bug. ***
Comment 5 Jeremy Huddleston Sequoia 2011-04-11 14:25:14 UTC
James, have you sent this to xorg-devel for review / comment?  I didn't see it in a quick check of emails from you in December.
Comment 6 Julien Cristau 2011-04-27 01:09:38 UTC
*** Bug 36060 has been marked as a duplicate of this bug. ***
Comment 7 aceman 2011-04-30 05:03:34 UTC
*** Bug 34007 has been marked as a duplicate of this bug. ***
Comment 8 Cyril Brulebois 2011-06-03 09:26:27 UTC
Here's another ping for this bad crasher. :)
Comment 9 Jeremy Huddleston Sequoia 2011-07-29 11:23:36 UTC
I've sent it to xorg-devel.  Please review and comment in that thread
Comment 10 James Jones 2011-07-29 13:13:18 UTC
Discussion regarding why my patch ("Possible fix"), and a proposed patch from Adam Jackson, aren't functionally correct from a long time ago:

< jamesjones> ajax: Did that patch I attached to the font loading crash bug look at all correct?
< ajax> oh man, i seriously just finished writing a patch for that
< ajax> if you beat me to it i'm going to feel very lame
< jamesjones> sorry
< jamesjones> I'm not at all sure it's correct.
< ajax> oh wow, i see what you did.
< jamesjones> It seems like the whole idea of bailing out if you need to sleep ruins the point of Xinerama looping
< ajax> yeah, i don't think that's right
< ajax> one second...
< jamesjones> but I was too lazy to dig deeper since that patch was enough to unblock my testing.
< ajax> jamesjones: give this one a try? http://fpaste.org/BqJf/raw/
< ajax> i'm a bit ill at how that whole thing works, honestly.
< jamesjones> Give me a second, rebuilding
< jamesjones> But yeah, it doesn't make sense to me.
< ajax> at any rate it fixes my testcase of xset fp+ unix/:7100 && xlsfonts
< jturney> heh, it removes code, so it must be right :)
< jamesjones> Hehe
< ajax> nothing like having a library where your app has to implement functions with particular names and signatures to work
< jamesjones> but why the looping, if it's OK to return Success when the operation isn't done on screen[n] where n>0?
< ajax> jamesjones: it's not quite that.
< ajax> most of the font ops don't have any screen state, so you only have to call them once.
< ajax> PolyText and ImageTexto sleep do have to hit every screen, so you want to make sure that they only bump the client's sleep count once if they d
< ajax> so the minimal thing would have been to only change the closures for {Poly,Image}Text to not keep a sleep count
< ajax> but if you're doing that you may as well do them all
< jamesjones> right, I figured it was something like that, but then 1) Why loop the others 2) Why is it OK to early-out in PolyText/ImageText?  Does the looping get restarted somewhere I'm not seeing when the client wakes up?
< jamesjones> I suppose I should just load up xinerama and step through it
< ajax> the others aren't actually looped by xinerama
< ajax> (which i had missed the first time around, i just assumed they were)
< jamesjones> ah, k
< jamesjones> that makes more sense.
< ajax> the text rendering routines push themselves onto the work queue if they don't have a font they need because the font server hasn't gotten back to them
< ajax> check out the context around ClientSleep in doImageText, it's quite horrific
< ajax> the way they wake up is that the font server's fd selects readable, and the handler for that notices which request tat reply was for and pokes the client back into the work queue
< ajax> i think that's right anyway?  i kind of want to not know.
< jamesjones> it sounds right if there's only one screen
< ajax> well, if there's multiple screens, you might run through the sleep pattern multiple times
< jamesjones> Yeah, but it looks like each screen would loop in and block, only sleeping once
< jamesjones> then when it woke up, only the first screen would be serviced.
< jamesjones> *loop in and detect the need to sleep rather
< ajax> ew.  yeah.
< ajax> okay, so
< jamesjones> Also, it seems like with your patch, those other screens would leak their closure structs
< ajax> i think the way to fix that is to fix PanoramiX*Text* check for sleep too
< jamesjones> which is what I was attempting to avoid in mine.
< jamesjones> Yeah, that sounds like it has a better chance at correctness.
< jamesjones> Or I was thinking something like looping at a lower level, so there would be xinerama data in teh closure, but that sounded like it would violate all kinds of design stuff
< jamesjones> Your patch does avoid the crash I was seeing, FWIW
< ajax> progress
< ajax> thanks for testing.  i'll fix the xinerama thing and send the patches off to the list
< jamesjones> np, sounds good
< ajax> every once in a while i dream about working in a language where closures aren't something you have to hand roll every time
< jamesjones> If there weren't a bunch entrenched old languages I happen to know, I'd have to do something creative with my time.  Couldn't have that.
< ajax> aaaaaargh
< ajax> if you change PanoramiX*Text* to be restartable then there's a race between each screen where some other client on the same drawable or gc could modify your state
< ajax> so there goes the atomicity invariant
Comment 11 Julien Cristau 2012-06-14 12:39:54 UTC
*** Bug 48900 has been marked as a duplicate of this bug. ***
Comment 12 Andy Valencia 2012-08-15 16:26:06 UTC
Please note that those of us who live with Ubuntu's LTS world will be arriving at 12.04 over the coming months.  This bug will jump right out at you if you run an environment with many stations and non-standard collections of fonts.  It sure blew *me* right out of the water...
Comment 13 Jon Turney 2013-12-30 21:45:50 UTC
See also http://patchwork.freedesktop.org/patch/14507/
Comment 14 Keith Packard 2015-09-11 05:18:47 UTC
Created attachment 118203 [details] [review]
Allow font library to return Suspended more than once per request

fonts: Continue when font calls return Suspended more than once
    
    Patch 3ab6cd31cbdf8095b2948034fce5fb645422d8da fixed Xinerama
    interactions with font servers by not putting clients to sleep
    multiple times. However, it introduced additional changes dealing with
    libXfont routine returning Suspended more than once for the same
    request. This additional change was to abandon processing of the
    current request and free the closure data by jumping to
    'xinerama_sleep' in each of the functions.
    
    Font library functions shouldn't return Suspended more than once,
    except for ListFontsWithInfo, which produces multiple replies, and
    thus ends up returning Suspended many times during processing.
    
    With the jump to xinerama_sleep occurring after the first reply was
    processed, the closure for the request was freed and future calls into
    the ListFontsWithInfo callback resulted in dereferencing freed
    memory.
    
    This patch removes the added branches, reverting the code to its
    previous behaviour, which permitted multiple Suspended returns and
    simply waited for the client to be signaled again so that the callback
    could continue processing the request.
    
    Signed-off-by: Keith Packard <keithp@keithp.com>
    Cc: Adam Jackson <ajax@redhat.com>
Comment 15 Jon Turney 2015-09-14 13:05:44 UTC
Thanks for taking a look at this.

This appears to fix my testcase above, so Tested-by:

I don't know if this is likely to regress bug #3040, or how to test that.
Comment 16 Jeremy Huddleston Sequoia 2016-05-30 08:04:28 UTC
That patch was committed:

commit f9a04d19aef77bf787b8d322305a6971d24a6ba1
Author: Keith Packard <keithp@keithp.com>
Date:   Mon Sep 21 07:16:12 2015 +0100

As such, closing this ticket.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.