Summary: | cairo-perf-trace failure | ||
---|---|---|---|
Product: | cairo | Reporter: | M Welinder <terra> |
Component: | general | Assignee: | Chris Wilson <chris> |
Status: | RESOLVED FIXED | QA Contact: | cairo-bugs mailing list <cairo-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | emanuele.aina, marco.barisione, ppaalanen |
Version: | 1.12.16 | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Compressed trace
Zenity trace testcase [PATCH] cairo-trace: Fix duplicated surface push on similar-image [PATCH] cairo-trace: Extract common helper for _surface_create_similar[_image]() [PATCH 1/3] cairo-trace: Fix duplicated surface push on similar-image [PATCH 2/3] cairo-trace: Sync _create_similar_image() with _create_similar() [PATCH 3/3] cairo-trace: Extract common helper for _surface_create_similar[_image]() |
Description
M Welinder
2014-01-14 00:34:45 UTC
Created attachment 92010 [details]
Compressed trace
That "invalid value" error is tormenting me too. All tracing attempts of complicated applications (Midori, netsurf-gtk, epiphany-browser) end up either in segfault during the tracing, or then the produced trace causes "invalid value" when trying to replay it. The only thing I managed to do, is to get a backtrace of when the error happens during replay: #0 0x00007ffff7bd0244 in _csi_error () from /home/pq/tmp/pixman-install/lib/libcairo-script-interpreter.so.2 #1 0x00007ffff7bc5744 in _csi_ostack_get_context.3734 () from /home/pq/tmp/pixman-install/lib/libcairo-script-interpreter.so.2 #2 0x00007ffff7bc56de in _do_cairo_op.3739 () from /home/pq/tmp/pixman-install/lib/libcairo-script-interpreter.so.2 #3 0x00007ffff7bc0c8d in _reset_clip.3755 () from /home/pq/tmp/pixman-install/lib/libcairo-script-interpreter.so.2 #4 0x00007ffff7bd1d58 in csi_object_execute () from /home/pq/tmp/pixman-install/lib/libcairo-script-interpreter.so.2 #5 0x00007ffff7bce59c in _scan_execute.5675 () from /home/pq/tmp/pixman-install/lib/libcairo-script-interpreter.so.2 #6 0x00007ffff7bc8482 in scan_execute.5474 () from /home/pq/tmp/pixman-install/lib/libcairo-script-interpreter.so.2 #7 0x00007ffff7bcac32 in _scan_file.5693 () from /home/pq/tmp/pixman-install/lib/libcairo-script-interpreter.so.2 #8 0x00007ffff7bd1eed in _csi_scan_file.2241 () from /home/pq/tmp/pixman-install/lib/libcairo-script-interpreter.so.2 #9 0x00007ffff7bd1ddd in _csi_file_execute.2233 () from /home/pq/tmp/pixman-install/lib/libcairo-script-interpreter.so.2 #10 0x00007ffff7bd1d8a in csi_object_execute () from /home/pq/tmp/pixman-install/lib/libcairo-script-interpreter.so.2 #11 0x00007ffff7bc517f in cairo_script_interpreter_run () from /home/pq/tmp/pixman-install/lib/libcairo-script-interpreter.so.2 #12 0x00000000004062a0 in cairo_perf_trace (perf=0x7fffffffdb50, target=0x611cc0 <builtin_targets>, trace=0x7fffffffe06a "./netsurf-load-madoka.trace") at cairo-perf-trace.c:757 #13 0x0000000000406fbc in main (argc=3, argv=0x7fffffffdcf8) at cairo-perf-trace.c:1032 And for whatever reason, libcairo-script-interpreter.so seems to never contain debug information while libcairo.so does, but that's off-topic. I do run the traces through csi-bind from the Cairo trace repositories, but that doesn't help. What can I do to investigate this? Is there some other tool, that would verbosely tell, what it parses from the trace, so that I might at least determine whether the trace itself is corrupted, or if the script interpreter is broken? I found DEBUG_SCAN in cairo-script-scanner.c, but it does not tell what the operators are, and I couldn't figure out how to print what they are. I have been recording and playing back on Raspberry Pi with Cairo 1.12.16. I have managed to create one working trace from Leafpad. All the traces work or do not work equally well on both RPi and x86_64. > And for whatever reason, libcairo-script-interpreter.so seems to never contain debug information while libcairo.so does, but that's off-topic.
As far as can tell that's because Link Time Optimization effectively disables debug symbols on libcairo-script-interpreter.so and libcairo-trace.so. My understandin is that LTO does not work well with debug symbols yet, but I wonder why it seems to work fine on libcairo.so.
After brutally stripping away the section related to -flto in build/configure.ac.warnings I got debug symbols working everywhere.
To get back to the main point of this bug, I seem to be affected too and here's the simple way to trigger it on my Debian installation (sid, libcairo2 1.14.0-2.1) using zenity:
$ cairo-trace cairo-trace zenity --info --text foo
cairo-trace: Recording cairo trace data to /home/em/zenity.25955.trace
$ cairo-perf-trace /home/em/zenity.25955.trace
[ # ] backend test min(s) median(s) stddev. count
[ 0] xcb zenity.25955 Error during replay, line 61: invalid value (typically too big) for the size of the input (surface, pattern, etc.)
[ 0] xlib zenity.25955 Error during replay, line 61: invalid value (typically too big) for the size of the input (surface, pattern, etc.)
[ # ] image: pixman 0.32.6
[ 0] image zenity.25955 Error during replay, line 61: invalid value (typically too big) for the size of the input (surface, pattern, etc.)
[ # ] image16: pixman 0.32.6
[ 0] image16 zenity.25955 Error during replay, line 61: invalid value (typically too big) for the size of the input (surface, pattern, etc.)
Testing latest master (4a225fca5f12) on my laptop (Debian sid amd64) I'm still able to reproduce the issue (but debug symbols work everywhere without hacks, yay!): $ CAIRO_TRACE_SO=util/cairo-trace/.libs/libcairo-trace.so util/cairo-trace/cairo-trace zenity --info --text foo cairo-trace: Recording cairo trace data to ./zenity.24699.trace $ perf/cairo-perf-trace ./zenity.24699.trace [ # ] backend test min(s) median(s) stddev. count [ 0] xcb zenity.24699 Error during replay, line 86: invalid value (typically too big) for the size of the input (surface, pattern, etc.) [ 0] xlib zenity.24699 Error during replay, line 86: invalid value (typically too big) for the size of the input (surface, pattern, etc.) [ # ] image: pixman 0.32.6 [ 0] image zenity.24699 Error during replay, line 86: invalid value (typically too big) for the size of the input (surface, pattern, etc.) [ # ] image16: pixman 0.32.6 [ 0] image16 zenity.24699 Error during replay, line 86: invalid value (typically too big) for the size of the input (surface, pattern, etc.) Breaking with gdb on _csi_error() shows that _set_source() is peeking at the stack, expecting a context object and getting a surface object instead. Indeed, the current top of the stack contains: 0 CSI_OBJECT_TYPE_PATTERN 1 CSI_OBJECT_TYPE_SURFACE 2 CSI_OBJECT_TYPE_CONTEXT This is in response to a cairo_set_source_surface() call, so I would suspect that the pattern object is out-of-place but this would mean that there are two bugs: the pattern object in the stack and _set_source() assuming a pattern even when replaying a cairo_set_source_surface() call. That said my understanding of how the cairo script stack management works is too limited to figure out if it's a problem with the trace or with the interpreter, so some help would be greatly appreciated. Created attachment 110850 [details]
Zenity trace
Created attachment 110900 [details]
testcase
Shortest program whose trace shows the problem,
probably in util/cairo-trace/trace.c
I think it is the same problem that led to this commit: http://cgit.freedesktop.org/cairo/commit/util/cairo-trace/trace.c?id=c1ac8db7e60bcec8235b09729b34154e95790534 which changed the structure of the code in cairo_surface_create_similar but not as much in cairo_surface_create_similar_image they were similar before, but now seem to follow a different stack logic. Using cairo_surface_create_similar in my testcase does not produce the error. With my limited understanding on how the interpreter stack is supposed to work, I suspect that cairo_surface_create_similar_image() is pushing the reference surface one time too many. I've attached a very simple patch that at least allows me to get working traces for every GTK3 program I've tested (zenity, gedit, epiphany). Indeed I was surprised that cairo_surface_create_similar() and cairo_surface_create_similar_image() weren't particularly similar (no pun intended). Should I try to update cairo_surface_create_similar_image() to look more like cairo_surface_create_similar() or even sharing code if possible? Created attachment 110950 [details] [review] [PATCH] cairo-trace: Fix duplicated surface push on similar-image Created attachment 110953 [details] [review] [PATCH] cairo-trace: Extract common helper for _surface_create_similar[_image]() I've reworked the patch taking in consideration what Massimo suggested, extracting a helper function from cairo_surface_create_similar() to be shared with cairo_surface_create_similar_image(). Comment on attachment 110950 [details] [review] [PATCH] cairo-trace: Fix duplicated surface push on similar-image Review of attachment 110950 [details] [review]: ----------------------------------------------------------------- LGTM. I confirm that it fixes the issue (at least on the testcase). Comment on attachment 110953 [details] [review] [PATCH] cairo-trace: Extract common helper for _surface_create_similar[_image]() Review of attachment 110953 [details] [review]: ----------------------------------------------------------------- I expressed some concerns about this refactoring on IRC, but I'll also post them here to keep track of them together with the bug. While removing some code duplication between similar and similar-image is ideally good, in practice the two operators have different signatures (they use different parameters in a different order), so I'm afraid that coalescing both in a single function makes them more complicated than needed. I agree that the changes from c1ac8db7e60bcec8235b09729b34154e95790534 should also apply to similar-image, but I believe that merging the emission of the two operator is not a good idea. Ideally I would merge the previous patch (which fixes the cairo_surface_create_similar_image() function so that it does not pushes more surfaces than expected on the stack) and (in a separate commit) resync the implementation of similar and similar-image, possibly by only sharing the code for _emit_other_obj(). If possible, I would also try to find out (and document in a comment) why _emit_surface_op() should not be used in similar/similar-image. Created attachment 111956 [details] [review] [PATCH 1/3] cairo-trace: Fix duplicated surface push on similar-image Reinstate the minimal patch from attachment 110950 [details] [review] as suggested by Andrea to fix the bug with the less intrusive change possible and move all the enhancements to the code in separate commits. Created attachment 111957 [details] [review] [PATCH 2/3] cairo-trace: Sync _create_similar_image() with _create_similar() After commit c1ac8db7e60b the cairo_surface_create_similar() and cairo_surface_create_similar_image() overrides diverged, with the latter not getting the bugfixes that have been applied to the first. This commit updates the cairo_surface_create_similar_image() override implementation to look like the one of cairo_surface_create_similar(), ensuring that the surface operand gets printed in the trace and always assigning an identifier for the newly created surface. Created attachment 111958 [details] [review] [PATCH 3/3] cairo-trace: Extract common helper for _surface_create_similar[_image]() To avoid any divergence between cairo_surface_create_similar_image() and cairo_surface_create_similar() in future, coalesce the code to emit the trace line in a common helper. Unfortunately the signature of the two trace lines are slightly different (in 'similar' the content type precedes width and height, while for 'similar-image' width and height come before the format identifier), but to be able to run new traces with older cairo tools it cannot be changed. This is optional, I still included it in the patchset since I feel that the two implementations may drift again after this resync, leading to subtle bugs. > If possible, I would also try to find out (and document in a comment) why _emit_surface_op() should not be used in similar/similar-image.
I honestly don't know, _emit_surface_op() calls into _emit_current() (by way of _emit_surface()) which seems to be willing to roll the stack up to the requested surface instead of using indexed addressing or even 'dup'.
I'm not sufficiently intimate with the cairo interpreter stack machine to understand the two behaviours, but they seem quite different. :/
(In reply to Emanuele Aina from comment #17) > > If possible, I would also try to find out (and document in a comment) why _emit_surface_op() should not be used in similar/similar-image. > > I honestly don't know, _emit_surface_op() calls into _emit_current() (by way > of _emit_surface()) which seems to be willing to roll the stack up to the > requested surface instead of using indexed addressing or even 'dup'. > > I'm not sufficiently intimate with the cairo interpreter stack machine to > understand the two behaviours, but they seem quite different. :/ TL;DR: Let's merge patches 1&2 and ask Chris if it is safe to use _emit_surface ;) Each patch in the patchset is: Tested-By: Andrea Canciani <ranma42@gmail.com> I confirm that all of the patches work in my environment and in particular that the first one fixes this bug. The first 2 patches are Reviewed-By: Andrea Canciani <ranma42@gmail.com> The first patch fixes this bug, it is quite straightforward, provides a complete explanation in the commit message and does not affect unrelated code The second one tries to reduce code complexity by restoring the similarity between two functions Before merging the third one, I would try to get a more complete understanding of the cairoscript generator... For example, inside cairo_surface_create_for_rectangle() there is a code fragment that is suspiciously similar to the one which was in the old version of cairo_surface_create_similar[_image](). Would this be the right fragment to abstract? (I think the concern that Bryce expressed on IRC might be related to this) Even better, should we just use _emit_surface[_op]()? In some functions, a comment explains why _emit_surface should not be used (example: cairo_surface_mark_dirty() in trace.c), but the one which are involved in this bug do not say anything on this. > For example, inside cairo_surface_create_for_rectangle() there is a code > fragment that is suspiciously similar to the one which was in the old > version of cairo_surface_create_similar[_image](). > Would this be the right fragment to abstract? > (I think the concern that Bryce expressed on IRC might be related to this) Thanks, I'll have a look to see if I can find the right level of abstraction. > Even better, should we just use _emit_surface[_op]()? > In some functions, a comment explains why _emit_surface should not be used > (example: cairo_surface_mark_dirty() in trace.c), but the one which are > involved in this bug do not say anything on this. A quite difficult to understand (for me) issue is that _emit_surface() is quite different from what the highlighted functions do to print the indexing information. In particular _emit_surface() seems to pop all the objects on the stack until it reaches the target surface, which seems quite destructive, while our functions all try different indexing strategies to pick the right surface without altering the stack. Unfortunately I haven't had the chance to investigate _emit_surface() further, so at the moment I can't do much more than what patch 3/3 does. Given that the original problem has been fixed I guess we can close this bug for now. Closing as per last comment, sounds like the follow-on work would be done separately. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.