Yjis problem has been seen on one Solaris sparc box only. Run gnome-theme-manager and moving from HighContrastLargePrintInverse theme to LargePrint theme the program hangs for a couple of minutes. This seems to coincide with calloc call in _cairo_image_surface_composite_trapezoids. The amount of memory being allocated is 1066237952 bytes; that is 0x8000 * 0x7f34. Stack trace is below. I have worked back to _clip_and_composite_trapezoids. There, the following code is executed: cairo_box_t trap_extents; _cairo_traps_extents (traps, &trap_extents); _cairo_box_round_to_rectangle (&trap_extents, &extents); and the trap_extents box is (0, 0xcc0000) (0x7fffffff, 0x7fffffff) Are these valid values? If not, where are these values generated? libc.so.1`malloc libc.so.1`calloc+0x58 libcairo.so.2.9.2`_cairo_image_surface_composite_trapezoids+0x170 libcairo.so.2.9.2`_cairo_surface_composite_trapezoids+0xb8 libcairo.so.2.9.2`_cairo_surface_fallback_composite_trapezoids+0x150 libcairo.so.2.9.2`_cairo_surface_composite_trapezoids+0xfc libcairo.so.2.9.2`_composite_traps_draw_func+0x8c libcairo.so.2.9.2`_clip_and_composite+0x134 libcairo.so.2.9.2`_clip_and_composite_trapezoids+0x3b4 libcairo.so.2.9.2`_cairo_surface_fallback_stroke+0x5c libcairo.so.2.9.2`_cairo_surface_stroke+0x140 libcairo.so.2.9.2`_cairo_gstate_stroke+0xb8 libcairo.so.2.9.2`cairo_stroke_preserve+0x18 libcairo.so.2.9.2`cairo_stroke+0x4 libgtk-x11-2.0.so.0.1000.6`gtk_default_draw_focus+0x3fc libgtk-x11-2.0.so.0.1000.6`gtk_paint_focus+0x160 libgtk-x11-2.0.so.0.1000.6`gtk_tree_view_bin_expose+0x18ac libgtk-x11-2.0.so.0.1000.6`gtk_tree_view_expose+0x94 libgtk-x11-2.0.so.0.1000.6`_gtk_marshal_BOOLEAN__BOXED+0xf8 libgobject-2.0.so.0.1200.4`g_closure_invoke+0x1a8
Hi, Those extents look like something that the tessellator would give when it clamps the input polygon's edges to fit within 31 bits. Can you easily check what the coordinates are that are passed to _cairo_bentley_ottmann_tessellate_polygon()? The bounding box over all edges of the polygon should have a width and height < 2^31, or else it will start clamping values. Thanks, Joonas
I cannot find _cairo_bentley_ottmann_tessellate_polygon(). Do you mean _cairo_traps_tessellate_polygon?
Ah, sorry, I didn't even look at the version of cairo this was reported against. I've had my head inside one file of the 1.3 series for a while, and very recently made a change that might produce this kind of situation. Fearing the worst, I assumed the change had bitten you already. :} Ok, so about your problem. The trapezoids themselves are generated by the stroker: _cairo_surface_fallback_stroke() calls _cairo_path_fixed_stroke_to_traps() just before the call to _clip_and_composite_trapezoids(); and the extent of the trapezoids is computed from each trapezoid in cairo_traps_add_trap() as it is added by the stroker. I wonder if the 0x7FFFFFFF is telling us that some call to _cairo_fixed_from_double() is being given a wonderfully large value which is subsequently clamped by the double -> int cast in that function. There are quite a number of invocations of that function under the stroker code, though. I suppose the thing that most would help is to know the parameters to _cairo_path_stroke_to_traps(). (If you want, I could make a patch that would let you dump the info in gdb, so you don't have to chase pointers around manually.)
(If you want, I could make a patch that would let you dump the info in gdb, so you don't have to chase pointers around manually.) I would really appreciate that.
Created attachment 8016 [details] Some debugging functions to dump information about a stroke. Here's the patch to add some debug dumping functions to cairo-surface-fallback.c. I tested this by grabbing http://cairographics.org/releases/cairo-1.2.4.tar.gz and doing this: $ tar xvzf cairo-1.2.4.tar.gz [...] $ cd cairo-1.2.4 $ patch -p1 </tmp/dump-debug.patch patching file src/cairo-surface-fallback.c $ CFLAGS='-O0 -g' ./configure [...] $ make [...] $ LD_LIBRARY_PATH=`pwd`/src/.libs/ $ export LD_LIBRARY_PATH $ gdb some-test-program [...] (gdb) run [program args] [...interrupt the program...] Program received signal SIGINT, Interrupt. 0xb7f8b1ac in fbRasterizeEdges8 (buf=0xb7615008, width=766, stride=192, l=0xbfa9ad10, r=0xbfa9ace8, t=86155809, b=92250657) at fbedge.c:154 154 ap[lxi] = clip255 (ap[lxi] + rxs - lxs); (gdb) break _cairo_surface_fallback_stroke Breakpoint 1 at 0xb7f568d4: file cairo-surface-fallback.c, line 990. (gdb) continue Continuing. Breakpoint 1, _cairo_surface_fallback_stroke (surface=0x80e50d0, op=CAIRO_OPERATOR_OVER, source=0xbfa9b1d4, path=0x80e51c0, stroke_style=0x80e5200, ctm=0xbfa9b180, ctm_inverse=0xbfa9b150, tolerance=0.10000000000000001, antialias=CAIRO_ANTIALIAS_DEFAULT) at cairo-surface-fallback.c:990 990 _cairo_traps_init (&traps); (gdb) p dump_debug_stroke ("stroke", surface, op, source, path, stroke_style, ctm, ctm_inverse, tolerance, antialias) stroke= op=2 ]...] (gdb)
Thanks for the patch. Sorry for the delay in responding. I was experiencing the delights of Heathrow Airport yesterday. I made some slight changes to the patch. I dump the output the output to a file instead of stdout as gnome-theme-manager to /dev/null. I changed if (1) in _cairo_surface_fallback_stroke to if (1). The output from dump_debug_stroke just before I see (0x7fffffff, 0x7fffffff) in the trap_extents box is below. Is there enough information there to indicate what is going wrong. stroke= op=2 source= type=0 ref_count=1 status=0 matrix= xx=1 {3ff0000000000000} yx=0 {0000000000000000} xy=0 {0000000000000000} yy=1 {3ff0000000000000} x0=0 {0000000000000000} y0=0 {0000000000000000} filter=2 extend=3 stroke_style= line_width=2 {4000000000000000} line_cap=0 line_join=0 miter_limit=10 {4024000000000000} dash=[2] 8 {4020000000000000} 2 {4000000000000000} dash_offset=9 {4022000000000000} ctm= xx=1 {3ff0000000000000} yx=0 {0000000000000000} xy=0 {0000000000000000} yy=1 {3ff0000000000000} x0=0 {0000000000000000} y0=0 {0000000000000000} ctm_inverse= xx=1 {3ff0000000000000} yx=0 {0000000000000000} xy=0 {0000000000000000} yy=1 {3ff0000000000000} x0=0 {0000000000000000} y0=0 {0000000000000000} tolerance=0.1 {3fb999999999999a} antialias=0 path= has_current_point=1 last_move_point= x=1 (0x10000) y=1 (0x10000) current_point= x=1 (0x10000) y=1 (0x10000) ops= moveto= x=1 (0x10000) y=1 (0x10000) lineto= x=393 (0x1890000) y=393 (0x1890000) lineto= x=393 (0x1890000) y=393 (0x1890000) lineto= x=1 (0x10000) y=1 (0x10000) closepath= moveto= x=1 (0x10000) y=1 (0x10000)
Created attachment 8093 [details] possibly a minimal test case? Thanks for the debugging output. I finally got a good cairo build on a headless sparc solaris 9 box and could try the test. Could you see if the attached program produces the same bad trapezoids which are causing you trouble? What I'm seeing here is: 338 produced trapezoids with extents=(0x4afb,0x4afb)-(0x189b505,0x189b505). I've tried to get this bug to reproduce using the stroke data, but just can't trigger it. What compiler do you use to build cairo, btw? I had a fair bit of trouble using gcc 4.0.3 from the SUNW0gccfss package, but the sun studio 11 compiler produced a fine library that passed all[1] of the test suite. [1] disregarding the ones that require a font backend, but those were my fault for having a badly configured fontconfig.
I tried your test program and I am seeing the same extents you are seeing. I cannot reproduce this bug on my desktop system but one of our QA guys has two systems on which the problem occurs. We are building with Sun Studio 11 and are running on tghe latest Solaris Nevada build, currently that it 54. I will try to work backwards to see if I can figure out why the bad extents occur. Any guidance would be much appreciated. When _clip_and_composite_trapezoids is called and generates the bad extents, op is 2 and traps->num_traps is 101. Is this value unexpected?
(In reply to comment #8) > I cannot reproduce this bug on my desktop system but one of our QA guys has two > systems on which the problem occurs. Sweet. That's for the standalone test case I hope? To get a handle on where the bad traps are generated, you could add the following assertions in cairo-traps.c:_cairo_traps_add_trap(): if (traps->num_traps >= traps->traps_size) { int inc = traps->traps_size ? traps->traps_size : 32; status = _cairo_traps_grow_by (traps, inc); if (status) return status; } + assert (top < 0x2FFFFFFF); + assert (bottom < 0x2FFFFFFF); + assert (left->p1.x < 0x2FFFFFFF); + assert (left->p1.y < 0x2FFFFFFF); + assert (left->p2.x < 0x2FFFFFFF); + assert (left->p2.y < 0x2FFFFFFF); + assert (right->p1.x < 0x2FFFFFFF); + assert (right->p1.y < 0x2FFFFFFF); + assert (right->p2.x < 0x2FFFFFFF); + assert (right->p2.y < 0x2FFFFFFF); trap = &traps->traps[traps->num_traps]; trap->top = top; trap->bottom = bottom; (Some of those asserts are redundant, but it can't hurt.) In the mean time, I'll get cozy with cairo-path-fixed-stroke.c. sprinkle it with lots of debug output, and later send a patch along with the expected output. Before calling _clip_and_composite_trapezoids() in _cairo_surface_fallback_stroke() the traps struct should have 338 in traps.num_traps.
The test program does not fail for me on any machine. I see that you have just added some comments. I have not have a chance to study them yet. I have tracked it as far as _cairo_stroker_join and case CAIRO_LINE_JOIN_MITER. x1, y1, dx1, dy1 are all zero which causes mx and my to be enormous.
I put in the assert calls and they sem to confirm my previous comment. Stack trace of crash is below core 'core' of 9838: gnome-theme-manager ff33f730 _lwp_kill (6, 0, 5, 6, ffffffff, 6) + 8 ff2cfb48 abort (ffbfd530, 1, 6, ff38d5c0, babac, 0) + 108 ff2cfd8c _assert (febe1d8c, febe1d38, 95, 14000, ba8c0, ff38a5f0) + 60 febaabd0 _cairo_traps_add_trap (14000, ff0000, fffb25, 399fd0, 399fe8, ffbfddf4) + 29c febabd0c _cairo_traps_tessellate_polygon (ffbfddf4, 399fe8, 0, 1, 399fd0, febf5d68) + 2f8 feba0f2c _cairo_stroker_join (ffbfdca0, ffbfdcf0, 14400, fffebb80, 388da0, fffebb58) + 5f4 feba25e0 _cairo_stroker_close_path (ffbfdca0, 0, 14400, 18a0000, 1570000, febf5d68) + 160 feb9f814 _cairo_path_fixed_interpret (6, 4, 388ee0, feba19d0, feba230c, feba2480) + 1e4 feba285c _cairo_path_fixed_stroke_to_traps (385b68, fffabc68, fffac5a4, 54000, 54400, fffac718) + 14c feba9f50 _cairo_surface_fallback_stroke (3582a0, 2, ffbfdeb0, 385b68, 388da0, ffbfdf50) + 88 feba73e4 _cairo_surface_stroke (3582a0, 0, ffbfdf20, 385b68, 388da0, 388e40) + 140 feb9b020 _cairo_gstate_stroke (0, 0, ff0000, 5addc, 388e70, 388d88) + b8 feb96138 cairo_stroke_preserve (385b60, c0788000, 0, 0, 0, 385b60) + 18 feb96110 cairo_stroke (385b60, 0, 0, 406fe000, 0, 385b60) + 4 ff08c714 gtk_default_draw_focus (374c50, 59, 2, 188, fff6c8f4, 18a) + 3fc
Created attachment 8162 [details] Extra logging inside the stroker Sorry about going quiet for so long. I've been studying the stroker code in detail and preparing the mother of all logging patches. :) So far I've found a number of interesting cases in the stroker code and elsewhere that can lead to all sorts of havoc when given atypical input[1], and especially, I've managed to reproduce the symptoms of this bug thanks to your detective work with the miter joins! Now, the way the symptoms of the bug are provoked is by setting the transformation matrix to one with very large coefficients *after* path construction using cairo_move_to() and friends, but *before* the actual stroking occurs. The goal here is to make the inverse computation produce a near zero matrix and thus cause numeric underflow when computing directions in user space in _compute_face(). When this happens _compute_face() fails[2] and leaves the face set to random garbage on the stack (what I'm seeing in the test case is 0.0 as in your case.) The garbage usr_vector then messes up the miter length test in _cairo_stroker_add_join (). Apart from the fact that the user vector is potentially random garbage off the stack, one reason it is likely behaving differently on the x86 is that the result of converting out of bounds float values all map to the same number 0x80000000. I was shocked this was happening and had assumed it would have worked like the sparc does -- mapping out of bounds positive values of 0x7FFFFFFF and negative ones to 0x80000000. Another cause is likely the difference in memory management by the OS and malloc(), where malloc() starts returning NULL at different points[3]. So far, the above is merely speculation as to what is really happening inside gnome-theme-manager, but I'm hoping it is an accurate guess, To find out, the attached patch adds reams of logging from inside the stroker and should let us catch the bug in action. It should be applied to a clean cairo-1.2.4 without the previous patch, since the entire logging thing has been redone. Now the logs go to /tmp/debug-cairo.<pid>.log, and are continuously flushed to make sure there's nothing in stdio buffers left when it crashes or asserts, so it's much slower than before. Also, more verbose logs are produced. I'm fairly optimistic that with this new logging we're finally going to get to the bottom of this one. From my point of view, the best case is that gnome-theme-manager itself is setting an invalid transformation matrix before stroking. ;) The worst case involves tracking down unknown memory corruptions inside cairo. In any case, I'd really appreciate it if you could run the gnome-theme-manager test case again and make the complete logs available somewhere. For now, I'll prepare an immediate fix to the miter and pen code and post it here. This will hopefully land into the next release as well. As for the bigger problem of making sure that a sudden influx of garbage from float-to-int casts doesn't cause madness in cairo is something we're going to have to look at more closely for future releases of cairo. Joonas [1] It starts breaking down when given either near degenerate inputs, or huge inputs, and a couple from really tiny inputs. [2] The output from the attached logging patch contains the message "WARNING=_compute_face() bailed: magnitudeless direction vector?" when this happens. [3] The bug can likely be provoked in a way that leads to really large allocations and lots of swapping on a x86 as well.
(In reply to comment #12) > Sorry about going quiet for so long. I've been studying the stroker > code in detail and preparing the mother of all logging patches. :) So > far I've found a number of interesting cases in the stroker code and > elsewhere that can lead to all sorts of havoc when given atypical > input[1], and especially, I've managed to reproduce the symptoms of > this bug thanks to your detective work with the miter joins! Yes, thank you! And thank you, Joonas! You've done some excellent detective work here as well. The writeup you give here is fascinating and I would love for a wider audience to read it. Could you re-send it to the cairo mailing list please? -Carl
Created attachment 8163 [details] small test case to trigger the bug A small test case to trigger the symptoms of the bug. It should stroke a small one pixel by one pixel rectangle around (0,0) with a line width of one pixel, but fails due to numerical underflow in the stroker and matrix inversion routines. In the produced log, watch out for entries like this: _compute_face()= slope= x=1 (0x10000) y=0 (0x0) device space= line_dx=1 line_dy=0 user space= line_dx=0 line_dy=0 mag=0 WARNING=_compute_face() bailed: magnitudeless direction vector? leading later to failure of the miter code: case CAIRO_LINE_JOIN_MITER:= -in.out=0 test qty (must be >= 2)=100 miter limit=10 accept miter limit test=true user space= x1=-217 y1=-203.358 dx1=5.28219e-315 dy1=-2.35344e-185 device space= dx1=5.28219e-65 dy1=-2.35344e+65 user space= x2=0 y2=-26214.4 dx2=2.11536e-314 dy2=0 device space= dx2=2.11536e-64 dy2=0 my=-26214.4 mx=-217
Created attachment 8164 [details] debug log for x86 For future reference, here's the debug log of the test case in attachment 8163 [details] when run on x86.
Created attachment 8165 [details] debug log for sparc Debug log when running attachment 8163 [details] on a sparc v9, 32 bit code. The most interesting thing compared to x86, not directly related to this bug, is that the pen's vertex computation fails silently and produces a bad number of vertices: pen= radius=5e-251 tolerance=0.1 vertices=[-2147483648] BTW, in relation to the debug logs, I forgot to mention that the 0xabababab values in the log are from uninitialised cairo_stroker_t values that I've set to that value.
Thanks for the update. I have taken off for Christmas so it will be the new year before I can get back to this.
Created attachment 8174 [details] [review] Patch to fix pen, miter join, and stroke face computation. Here's a patch that should cover up the failing miter computations in the stroker. It also fixes the needed_vertices computation in cairo-pen.c. I'll be taking off too and be back next year. Merry Christmas! Joonas
I applied the patch from comment #18 and it seems to have fixed my problem/ Do you want me to apply the exytra logging patch from comment #12 instead to see what is going on?
If you're happy with the patch of comment #18 and think that chasing this bug further isn't worth the effort, we can leave things as they stand. However, the patch only covers up the symptoms of the bug, but sheds no light on its root cause. I'm definitely interested in the results of the extra logging patch, so if it's not too much trouble I'd really appreciate it if you could make the complete log available up to the point where the error occurs. If nothing else, the debug log should tell us if there's memory trashing inside cairo, or if cairo is being passed bad matrices by gnome-theme-manager.
I applied the extra logging and I now get a crash. Assertion failed: left->p2.x < 0x2FFFFFFF, file cairo-traps.c, line 149 I have a log file which is 25Mb. When I bzip it is it still 600K. Is it OK to attach to the bug?
Thanks! Looks like the assertions from comment #9 caught it. Now let's hope the log is properly flushed :) If bugzilla lets you upload the compressed log here, that would be best. If not, you could email it directly to me at jpihlaja@cc.helsinki.fi. Joonas
I received the debug log in my email. It was indeed very helpful and turns out that no mysterious matrices are involved anywhere at all. Instead, it's a combination of two bugs in the stroker. Firstly, the miter point solver in the joining code doesn't deal with degenerate miters very well. I realise you pointed as much out already, but I was too stubborn to listen, and since I got it into my head that it must have been the matrices, I totally missed the solver. Secondly, there actually is code in the join computation that is supposed to check for degenerate miters, so the problem with the solver just shouldn't be happening at all in this case, because no join should be happening at all. I think the solution might just be to make the check for when to apply joins stricter, but I'm not 100% sure yet. In any case, what's happening is that the stroker->first_face is not at the start of the path when the path's dash offset makes it start on an OFF dash segment, yet it's trying to apply a join when closing the path.
Is the patch in comment #18 the correct fix? Does this also need to be applied to 1.3.x?
Hi, Please accept my apologies for going AWOL on this bug -- I totally forgot about it since the patches and thought it was closed, but really had just dropped the ball on it. Happily, the bug itself has been fixed in cairo mean while. I'm attaching the test case extracted from the debug log mentioned in comment #23, where there's a dashed rectangular path that causes cairo 1.2.4 to go insane in its miter computation. The basic problem was that in your problem case cairo was trying to make a miter between a zero length subpath and a new subpath, but wasn't very good at dealing with the degeneracy. The zero length subpath comes about because the dash pattern has a new subpath start at just the same time as it's supposed to be crossing a corner. In the mean time, cairo has learnt to compute the outer point of a miter more robustly so that this case doesn't provoke the NaNs in _cairo_stroker_join() that it used to and which were causing the large trapezoids. So I'm closing this bug as fixed (no more NaNs) and opening a new bug to deal specifically with the artifacts from the degenerate miter. Joonas
Created attachment 19489 [details] Exercise confusion between dashing and mitering. The lower left corner of the dashed stroke shows a miter where it's not supposed to be.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.