Summary: | _cairo_image_surface_composite_trapezoids attempts to calloc huge amount of memory | ||
---|---|---|---|
Product: | cairo | Reporter: | Padraig O'Briain <padraig.obriain> |
Component: | general | Assignee: | M Joonas Pihlaja <jpihlaja> |
Status: | RESOLVED FIXED | QA Contact: | cairo-bugs mailing list <cairo-bugs> |
Severity: | normal | ||
Priority: | high | ||
Version: | 1.2.4 | ||
Hardware: | SPARC | ||
OS: | Solaris | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Some debugging functions to dump information about a stroke.
possibly a minimal test case? Extra logging inside the stroker small test case to trigger the bug debug log for x86 debug log for sparc Patch to fix pen, miter join, and stroke face computation. Exercise confusion between dashing and mitering. |
Description
Padraig O'Briain
2006-12-07 05:20:19 UTC
Hi, Those extents look like something that the tessellator would give when it clamps the input polygon's edges to fit within 31 bits. Can you easily check what the coordinates are that are passed to _cairo_bentley_ottmann_tessellate_polygon()? The bounding box over all edges of the polygon should have a width and height < 2^31, or else it will start clamping values. Thanks, Joonas I cannot find _cairo_bentley_ottmann_tessellate_polygon(). Do you mean _cairo_traps_tessellate_polygon? Ah, sorry, I didn't even look at the version of cairo this was reported against. I've had my head inside one file of the 1.3 series for a while, and very recently made a change that might produce this kind of situation. Fearing the worst, I assumed the change had bitten you already. :} Ok, so about your problem. The trapezoids themselves are generated by the stroker: _cairo_surface_fallback_stroke() calls _cairo_path_fixed_stroke_to_traps() just before the call to _clip_and_composite_trapezoids(); and the extent of the trapezoids is computed from each trapezoid in cairo_traps_add_trap() as it is added by the stroker. I wonder if the 0x7FFFFFFF is telling us that some call to _cairo_fixed_from_double() is being given a wonderfully large value which is subsequently clamped by the double -> int cast in that function. There are quite a number of invocations of that function under the stroker code, though. I suppose the thing that most would help is to know the parameters to _cairo_path_stroke_to_traps(). (If you want, I could make a patch that would let you dump the info in gdb, so you don't have to chase pointers around manually.) (If you want, I could make a patch that would let you dump the info in gdb, so you don't have to chase pointers around manually.) I would really appreciate that. Created attachment 8016 [details] Some debugging functions to dump information about a stroke. Here's the patch to add some debug dumping functions to cairo-surface-fallback.c. I tested this by grabbing http://cairographics.org/releases/cairo-1.2.4.tar.gz and doing this: $ tar xvzf cairo-1.2.4.tar.gz [...] $ cd cairo-1.2.4 $ patch -p1 </tmp/dump-debug.patch patching file src/cairo-surface-fallback.c $ CFLAGS='-O0 -g' ./configure [...] $ make [...] $ LD_LIBRARY_PATH=`pwd`/src/.libs/ $ export LD_LIBRARY_PATH $ gdb some-test-program [...] (gdb) run [program args] [...interrupt the program...] Program received signal SIGINT, Interrupt. 0xb7f8b1ac in fbRasterizeEdges8 (buf=0xb7615008, width=766, stride=192, l=0xbfa9ad10, r=0xbfa9ace8, t=86155809, b=92250657) at fbedge.c:154 154 ap[lxi] = clip255 (ap[lxi] + rxs - lxs); (gdb) break _cairo_surface_fallback_stroke Breakpoint 1 at 0xb7f568d4: file cairo-surface-fallback.c, line 990. (gdb) continue Continuing. Breakpoint 1, _cairo_surface_fallback_stroke (surface=0x80e50d0, op=CAIRO_OPERATOR_OVER, source=0xbfa9b1d4, path=0x80e51c0, stroke_style=0x80e5200, ctm=0xbfa9b180, ctm_inverse=0xbfa9b150, tolerance=0.10000000000000001, antialias=CAIRO_ANTIALIAS_DEFAULT) at cairo-surface-fallback.c:990 990 _cairo_traps_init (&traps); (gdb) p dump_debug_stroke ("stroke", surface, op, source, path, stroke_style, ctm, ctm_inverse, tolerance, antialias) stroke= op=2 ]...] (gdb) Thanks for the patch. Sorry for the delay in responding. I was experiencing the delights of Heathrow Airport yesterday. I made some slight changes to the patch. I dump the output the output to a file instead of stdout as gnome-theme-manager to /dev/null. I changed if (1) in _cairo_surface_fallback_stroke to if (1). The output from dump_debug_stroke just before I see (0x7fffffff, 0x7fffffff) in the trap_extents box is below. Is there enough information there to indicate what is going wrong. stroke= op=2 source= type=0 ref_count=1 status=0 matrix= xx=1 {3ff0000000000000} yx=0 {0000000000000000} xy=0 {0000000000000000} yy=1 {3ff0000000000000} x0=0 {0000000000000000} y0=0 {0000000000000000} filter=2 extend=3 stroke_style= line_width=2 {4000000000000000} line_cap=0 line_join=0 miter_limit=10 {4024000000000000} dash=[2] 8 {4020000000000000} 2 {4000000000000000} dash_offset=9 {4022000000000000} ctm= xx=1 {3ff0000000000000} yx=0 {0000000000000000} xy=0 {0000000000000000} yy=1 {3ff0000000000000} x0=0 {0000000000000000} y0=0 {0000000000000000} ctm_inverse= xx=1 {3ff0000000000000} yx=0 {0000000000000000} xy=0 {0000000000000000} yy=1 {3ff0000000000000} x0=0 {0000000000000000} y0=0 {0000000000000000} tolerance=0.1 {3fb999999999999a} antialias=0 path= has_current_point=1 last_move_point= x=1 (0x10000) y=1 (0x10000) current_point= x=1 (0x10000) y=1 (0x10000) ops= moveto= x=1 (0x10000) y=1 (0x10000) lineto= x=393 (0x1890000) y=393 (0x1890000) lineto= x=393 (0x1890000) y=393 (0x1890000) lineto= x=1 (0x10000) y=1 (0x10000) closepath= moveto= x=1 (0x10000) y=1 (0x10000) Created attachment 8093 [details]
possibly a minimal test case?
Thanks for the debugging output. I finally got a good cairo build on a
headless sparc solaris 9 box and could try the test. Could you see if the
attached program produces the same bad trapezoids which are causing you
trouble? What I'm seeing here is: 338 produced trapezoids with
extents=(0x4afb,0x4afb)-(0x189b505,0x189b505).
I've tried to get this bug to reproduce using the stroke data, but just can't
trigger it. What compiler do you use to build cairo, btw? I had a fair bit of
trouble using gcc 4.0.3 from the SUNW0gccfss package, but the sun studio 11
compiler produced a fine library that passed all[1] of the test suite.
[1] disregarding the ones that require a font backend, but those were my fault
for having a badly configured fontconfig.
I tried your test program and I am seeing the same extents you are seeing. I cannot reproduce this bug on my desktop system but one of our QA guys has two systems on which the problem occurs. We are building with Sun Studio 11 and are running on tghe latest Solaris Nevada build, currently that it 54. I will try to work backwards to see if I can figure out why the bad extents occur. Any guidance would be much appreciated. When _clip_and_composite_trapezoids is called and generates the bad extents, op is 2 and traps->num_traps is 101. Is this value unexpected? (In reply to comment #8) > I cannot reproduce this bug on my desktop system but one of our QA guys has two > systems on which the problem occurs. Sweet. That's for the standalone test case I hope? To get a handle on where the bad traps are generated, you could add the following assertions in cairo-traps.c:_cairo_traps_add_trap(): if (traps->num_traps >= traps->traps_size) { int inc = traps->traps_size ? traps->traps_size : 32; status = _cairo_traps_grow_by (traps, inc); if (status) return status; } + assert (top < 0x2FFFFFFF); + assert (bottom < 0x2FFFFFFF); + assert (left->p1.x < 0x2FFFFFFF); + assert (left->p1.y < 0x2FFFFFFF); + assert (left->p2.x < 0x2FFFFFFF); + assert (left->p2.y < 0x2FFFFFFF); + assert (right->p1.x < 0x2FFFFFFF); + assert (right->p1.y < 0x2FFFFFFF); + assert (right->p2.x < 0x2FFFFFFF); + assert (right->p2.y < 0x2FFFFFFF); trap = &traps->traps[traps->num_traps]; trap->top = top; trap->bottom = bottom; (Some of those asserts are redundant, but it can't hurt.) In the mean time, I'll get cozy with cairo-path-fixed-stroke.c. sprinkle it with lots of debug output, and later send a patch along with the expected output. Before calling _clip_and_composite_trapezoids() in _cairo_surface_fallback_stroke() the traps struct should have 338 in traps.num_traps. The test program does not fail for me on any machine. I see that you have just added some comments. I have not have a chance to study them yet. I have tracked it as far as _cairo_stroker_join and case CAIRO_LINE_JOIN_MITER. x1, y1, dx1, dy1 are all zero which causes mx and my to be enormous. I put in the assert calls and they sem to confirm my previous comment. Stack trace of crash is below core 'core' of 9838: gnome-theme-manager ff33f730 _lwp_kill (6, 0, 5, 6, ffffffff, 6) + 8 ff2cfb48 abort (ffbfd530, 1, 6, ff38d5c0, babac, 0) + 108 ff2cfd8c _assert (febe1d8c, febe1d38, 95, 14000, ba8c0, ff38a5f0) + 60 febaabd0 _cairo_traps_add_trap (14000, ff0000, fffb25, 399fd0, 399fe8, ffbfddf4) + 29c febabd0c _cairo_traps_tessellate_polygon (ffbfddf4, 399fe8, 0, 1, 399fd0, febf5d68) + 2f8 feba0f2c _cairo_stroker_join (ffbfdca0, ffbfdcf0, 14400, fffebb80, 388da0, fffebb58) + 5f4 feba25e0 _cairo_stroker_close_path (ffbfdca0, 0, 14400, 18a0000, 1570000, febf5d68) + 160 feb9f814 _cairo_path_fixed_interpret (6, 4, 388ee0, feba19d0, feba230c, feba2480) + 1e4 feba285c _cairo_path_fixed_stroke_to_traps (385b68, fffabc68, fffac5a4, 54000, 54400, fffac718) + 14c feba9f50 _cairo_surface_fallback_stroke (3582a0, 2, ffbfdeb0, 385b68, 388da0, ffbfdf50) + 88 feba73e4 _cairo_surface_stroke (3582a0, 0, ffbfdf20, 385b68, 388da0, 388e40) + 140 feb9b020 _cairo_gstate_stroke (0, 0, ff0000, 5addc, 388e70, 388d88) + b8 feb96138 cairo_stroke_preserve (385b60, c0788000, 0, 0, 0, 385b60) + 18 feb96110 cairo_stroke (385b60, 0, 0, 406fe000, 0, 385b60) + 4 ff08c714 gtk_default_draw_focus (374c50, 59, 2, 188, fff6c8f4, 18a) + 3fc Created attachment 8162 [details]
Extra logging inside the stroker
Sorry about going quiet for so long. I've been studying the stroker
code in detail and preparing the mother of all logging patches. :) So
far I've found a number of interesting cases in the stroker code and
elsewhere that can lead to all sorts of havoc when given atypical
input[1], and especially, I've managed to reproduce the symptoms of
this bug thanks to your detective work with the miter joins!
Now, the way the symptoms of the bug are provoked is by setting the
transformation matrix to one with very large coefficients *after* path
construction using cairo_move_to() and friends, but *before* the
actual stroking occurs. The goal here is to make the inverse
computation produce a near zero matrix and thus cause numeric
underflow when computing directions in user space in _compute_face().
When this happens _compute_face() fails[2] and leaves the face set to
random garbage on the stack (what I'm seeing in the test case is 0.0
as in your case.) The garbage usr_vector then messes up the miter
length test in _cairo_stroker_add_join ().
Apart from the fact that the user vector is potentially random garbage
off the stack, one reason it is likely behaving differently on the x86
is that the result of converting out of bounds float values all map to
the same number 0x80000000. I was shocked this was happening and had
assumed it would have worked like the sparc does -- mapping out of
bounds positive values of 0x7FFFFFFF and negative ones to 0x80000000.
Another cause is likely the difference in memory management by the OS
and malloc(), where malloc() starts returning NULL at different
points[3].
So far, the above is merely speculation as to what is really happening
inside gnome-theme-manager, but I'm hoping it is an accurate guess, To
find out, the attached patch adds reams of logging from inside the
stroker and should let us catch the bug in action. It should be
applied to a clean cairo-1.2.4 without the previous patch, since the
entire logging thing has been redone. Now the logs go to
/tmp/debug-cairo.<pid>.log, and are continuously flushed to make sure
there's nothing in stdio buffers left when it crashes or asserts, so
it's much slower than before. Also, more verbose logs are produced.
I'm fairly optimistic that with this new logging we're finally going
to get to the bottom of this one. From my point of view, the best
case is that gnome-theme-manager itself is setting an invalid
transformation matrix before stroking. ;) The worst case involves
tracking down unknown memory corruptions inside cairo. In any case,
I'd really appreciate it if you could run the gnome-theme-manager test
case again and make the complete logs available somewhere.
For now, I'll prepare an immediate fix to the miter and pen code and
post it here. This will hopefully land into the next release as well.
As for the bigger problem of making sure that a sudden influx of
garbage from float-to-int casts doesn't cause madness in cairo is
something we're going to have to look at more closely for future
releases of cairo.
Joonas
[1] It starts breaking down when given either near degenerate inputs,
or huge inputs, and a couple from really tiny inputs.
[2] The output from the attached logging patch contains the message
"WARNING=_compute_face() bailed: magnitudeless direction vector?"
when this happens.
[3] The bug can likely be provoked in a way that leads to really large
allocations and lots of swapping on a x86 as well.
(In reply to comment #12) > Sorry about going quiet for so long. I've been studying the stroker > code in detail and preparing the mother of all logging patches. :) So > far I've found a number of interesting cases in the stroker code and > elsewhere that can lead to all sorts of havoc when given atypical > input[1], and especially, I've managed to reproduce the symptoms of > this bug thanks to your detective work with the miter joins! Yes, thank you! And thank you, Joonas! You've done some excellent detective work here as well. The writeup you give here is fascinating and I would love for a wider audience to read it. Could you re-send it to the cairo mailing list please? -Carl Created attachment 8163 [details]
small test case to trigger the bug
A small test case to trigger the symptoms of the bug. It should stroke a small
one pixel by one pixel rectangle around (0,0) with a line width of one pixel,
but fails due to numerical underflow in the stroker and matrix inversion
routines. In the produced log, watch out for entries like this:
_compute_face()=
slope=
x=1 (0x10000)
y=0 (0x0)
device space=
line_dx=1
line_dy=0
user space=
line_dx=0
line_dy=0
mag=0
WARNING=_compute_face() bailed: magnitudeless direction vector?
leading later to failure of the miter code:
case CAIRO_LINE_JOIN_MITER:=
-in.out=0
test qty (must be >= 2)=100
miter limit=10
accept miter limit test=true
user space=
x1=-217
y1=-203.358
dx1=5.28219e-315
dy1=-2.35344e-185
device space=
dx1=5.28219e-65
dy1=-2.35344e+65
user space=
x2=0
y2=-26214.4
dx2=2.11536e-314
dy2=0
device space=
dx2=2.11536e-64
dy2=0
my=-26214.4
mx=-217
Created attachment 8164 [details] debug log for x86 For future reference, here's the debug log of the test case in attachment 8163 [details] when run on x86. Created attachment 8165 [details] debug log for sparc Debug log when running attachment 8163 [details] on a sparc v9, 32 bit code. The most interesting thing compared to x86, not directly related to this bug, is that the pen's vertex computation fails silently and produces a bad number of vertices: pen= radius=5e-251 tolerance=0.1 vertices=[-2147483648] BTW, in relation to the debug logs, I forgot to mention that the 0xabababab values in the log are from uninitialised cairo_stroker_t values that I've set to that value. Thanks for the update. I have taken off for Christmas so it will be the new year before I can get back to this. Created attachment 8174 [details] [review] Patch to fix pen, miter join, and stroke face computation. Here's a patch that should cover up the failing miter computations in the stroker. It also fixes the needed_vertices computation in cairo-pen.c. I'll be taking off too and be back next year. Merry Christmas! Joonas I applied the patch from comment #18 and it seems to have fixed my problem/ Do you want me to apply the exytra logging patch from comment #12 instead to see what is going on? If you're happy with the patch of comment #18 and think that chasing this bug further isn't worth the effort, we can leave things as they stand. However, the patch only covers up the symptoms of the bug, but sheds no light on its root cause. I'm definitely interested in the results of the extra logging patch, so if it's not too much trouble I'd really appreciate it if you could make the complete log available up to the point where the error occurs. If nothing else, the debug log should tell us if there's memory trashing inside cairo, or if cairo is being passed bad matrices by gnome-theme-manager. I applied the extra logging and I now get a crash. Assertion failed: left->p2.x < 0x2FFFFFFF, file cairo-traps.c, line 149 I have a log file which is 25Mb. When I bzip it is it still 600K. Is it OK to attach to the bug? Thanks! Looks like the assertions from comment #9 caught it. Now let's hope the log is properly flushed :) If bugzilla lets you upload the compressed log here, that would be best. If not, you could email it directly to me at jpihlaja@cc.helsinki.fi. Joonas I received the debug log in my email. It was indeed very helpful and turns out that no mysterious matrices are involved anywhere at all. Instead, it's a combination of two bugs in the stroker. Firstly, the miter point solver in the joining code doesn't deal with degenerate miters very well. I realise you pointed as much out already, but I was too stubborn to listen, and since I got it into my head that it must have been the matrices, I totally missed the solver. Secondly, there actually is code in the join computation that is supposed to check for degenerate miters, so the problem with the solver just shouldn't be happening at all in this case, because no join should be happening at all. I think the solution might just be to make the check for when to apply joins stricter, but I'm not 100% sure yet. In any case, what's happening is that the stroker->first_face is not at the start of the path when the path's dash offset makes it start on an OFF dash segment, yet it's trying to apply a join when closing the path. Is the patch in comment #18 the correct fix? Does this also need to be applied to 1.3.x? Hi, Please accept my apologies for going AWOL on this bug -- I totally forgot about it since the patches and thought it was closed, but really had just dropped the ball on it. Happily, the bug itself has been fixed in cairo mean while. I'm attaching the test case extracted from the debug log mentioned in comment #23, where there's a dashed rectangular path that causes cairo 1.2.4 to go insane in its miter computation. The basic problem was that in your problem case cairo was trying to make a miter between a zero length subpath and a new subpath, but wasn't very good at dealing with the degeneracy. The zero length subpath comes about because the dash pattern has a new subpath start at just the same time as it's supposed to be crossing a corner. In the mean time, cairo has learnt to compute the outer point of a miter more robustly so that this case doesn't provoke the NaNs in _cairo_stroker_join() that it used to and which were causing the large trapezoids. So I'm closing this bug as fixed (no more NaNs) and opening a new bug to deal specifically with the artifacts from the degenerate miter. Joonas Created attachment 19489 [details]
Exercise confusion between dashing and mitering.
The lower left corner of the dashed stroke shows a miter where it's not supposed to be.
|
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.