Summary: | broken files in collection | ||
---|---|---|---|
Product: | openclipart.org | Reporter: | Daniel Stone <daniel> |
Component: | clipart | Assignee: | default user for a product <clipart> |
Status: | RESOLVED NOTOURBUG | QA Contact: | |
Severity: | normal | ||
Priority: | high | CC: | jwatt, sas00003 |
Version: | unspecified | ||
Hardware: | x86 (IA32) | ||
OS: | Windows (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Bug Depends on: | |||
Bug Blocks: | 8627 |
Description
FreeDesktop Bugzilla Database Corruption Fix User
2005-07-10 22:12:31 UTC
> * Unbound prefix.
I'm pretty sure every file I've ever created for OpenClipart.org has this problem.
I really do not want to see extra markup added to every file for a few crappy
renderers, but I would love to see an SVG Tidy script to allow people to
post-process their files to work with crappy software.
One of our ancillary goals is to promote the SVG standard, and the last thing we
want to do is allow it to descend into a mess of really bad markup like HTML did.
If we continue to use Batik and Adobe as our yardstick and are very conservative
about what mistakes we tolerate from other crappy renderers, I'd be fairly happy
about it.
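A tidy pass of the kind Alan asks for could start with the unbound-prefix case. The sketch below is hypothetical: the function name `tidy_unbound_prefixes` and the regex approach are assumptions, not an existing OCAL tool. It applies the two repairs suggested for unbound prefixes in this report: bind `xlink` on the root `<svg>` element when `xlink:*` is used but never declared, and drop `inkscape:*` attributes when that prefix is undeclared. It works on raw text because a namespace-aware parser refuses to load such a file at all.

```python
import re

# Namespace URI that the xlink prefix must be bound to.
XLINK_NS = "http://www.w3.org/1999/xlink"


def tidy_unbound_prefixes(svg_text: str) -> str:
    """Hypothetical 'SVG Tidy' pass for the two unbound-prefix cases.

    Works on raw text, since a namespace-aware XML parser rejects the
    file before any tree-based fix could run.
    """
    # Case 1: xlink:* is used but the prefix is never declared.
    # Bind it on the first unprefixed <svg> start tag (assumed to be the root).
    if "xlink:" in svg_text and "xmlns:xlink" not in svg_text:
        svg_text = re.sub(
            r"<svg(?=[\s/>])",
            f'<svg xmlns:xlink="{XLINK_NS}"',
            svg_text,
            count=1,
        )
    # Case 2: inkscape:* attributes without a declaration. Drop the
    # attributes (either quote style); inkscape:* element names are not handled.
    if "inkscape:" in svg_text and "xmlns:inkscape" not in svg_text:
        svg_text = re.sub(
            r"""\s+inkscape:[\w.-]+\s*=\s*("[^"]*"|'[^']*')""", "", svg_text
        )
    return svg_text


if __name__ == "__main__":
    broken = ('<svg xmlns="http://www.w3.org/2000/svg">'
              '<use xlink:href="#a" inkscape:label="x"/></svg>')
    print(tidy_unbound_prefixes(broken))
```

Because both checks are plain substring tests, running the pass again on an already-fixed file leaves it unchanged. It does assume an unprefixed `<svg>` root and will not catch a stray `inkscape:` string inside text content.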
I've fixed these errors for the 0.14 release, and I found that this error (unbound namespace prefix) is in fact an XML error, not an SVG error. These files must be considered not well-formed. They break not only SVG implementations (Batik, Mozilla, librsvg2) but also XML implementations (MSXML, Saxon).
Also, if there is no default namespace, there will be no error, since XML allows a document to be in the null namespace. But such files will not be SVG files, since SVG requires a default namespace. So I suggest we should care about this as well.

Alan Horkan writes:
> > * Unbound prefix.
>
> I'm pretty sure every file I've ever created
> for OpenClipart.org has this problem.

No, not at all. Of the 95 files in your pattern collection, 23 are broken due to containing the null character (U+0000), which is not allowed in XML, but none have any problem with unbound prefixes.

> I really do not want to see extra markup added
> to every file for a few crappy renderers

It's not every file that has this problem; in fact it's less than 1%. And some of the files with the unbound prefix problem cannot be displayed by Batik or Adobe, which I don't consider "crappy" renderers. Moreover, any problem that causes reasonable XML parsers to choke on the files needs to be fixed anyway, regardless of what SVG renderers do.

> One of our ancillary goals is to promote the
> SVG standard and the last thing we want to do
> is allow it to descend into a mess of really bad
> markup like HTML did.

Yes, indeed. So we should fix bad markup when we find it.

> If we continue to use Batik and Adobe as our
> yardstick and be very conservative about what
> mistakes we tolerate from other crappy renderers
> I'd be fairly happy about it.

I'm not suggesting that we should tolerate outright mistakes in renderers. But we certainly shouldn't present renderers with bad SVG.
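To make these failure classes concrete, here is a minimal checker sketch using Python's standard `xml.etree.ElementTree`, whose expat parser is namespace-aware. It is hypothetical (`check_svg` is an illustrative name; this is not svgscan). An unbound prefix and a U+0000 character both abort parsing outright; a file with no default namespace parses, but its root is not in the SVG namespace (holger's point); and a well-formed SVG file can still contain a `<path>` without a `d` attribute, which SVG 1.1 lists as required.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def check_svg(data: bytes) -> str:
    """Return a one-line verdict for an SVG file's bytes (a sketch, not svgscan)."""
    try:
        # expat, as used by ElementTree, is namespace-aware: an undeclared
        # prefix ("unbound prefix") or a U+0000 character aborts the parse.
        root = ET.fromstring(data)
    except ET.ParseError as err:
        return f"parse error ({err})"
    # Well-formed XML with no default xmlns is legal XML in the null
    # namespace, but its elements are not SVG elements.
    if root.tag != f"{{{SVG_NS}}}svg":
        return f"not SVG (root element is {root.tag!r})"
    # SVG 1.1 lists d as a required attribute of <path>.
    for path in root.iter(f"{{{SVG_NS}}}path"):
        if "d" not in path.attrib:
            return "path has no d attribute"
    return "ok"


if __name__ == "__main__":
    samples = {
        "unbound prefix": b'<svg xmlns="http://www.w3.org/2000/svg"><use xlink:href="#a"/></svg>',
        "null character": b'<svg xmlns="http://www.w3.org/2000/svg"><desc>a\x00b</desc></svg>',
        "no namespace": b"<svg><rect/></svg>",
        "path without d": b'<svg xmlns="http://www.w3.org/2000/svg"><path id="p"/></svg>',
    }
    for label, data in samples.items():
        print(f"{label}: {check_svg(data)}")
```

The first failing check wins, so a file is reported under a single headline problem even if it has several.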
Of the five types of problem that I found, four cause the XML parser to fail (with good reason), while the other one (path elements without a d attribute) is a violation of the SVG spec.

Created attachment 3235
list of problems found in release 0.16
Below is the list of problematic files updated for release 0.16.
One new type of problem is now detected: invalid path data (but due to a gross
inefficiency in my path parser, only short paths have been tested at present).
All invalid paths that have been found appear to be caused by an old
Sodipodi/Inkscape bug, where ellipses were incorrectly written. These can be fixed
by replacing the invalid <path> elements with <ellipse> elements. It's also
possible to fix the files by opening them in Inkscape 0.42 and saving as Plain
SVG (not Inkscape SVG, as this doesn't fix the paths), but you then need to
restore the metadata.
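The `<ellipse>` repair can also be scripted, which avoids the Inkscape round trip and the need to restore metadata afterwards. The sketch below assumes (this report does not confirm it) that each broken path still carries Sodipodi's own description of the shape, `sodipodi:type="arc"` with `sodipodi:cx`, `sodipodi:cy`, `sodipodi:rx` and `sodipodi:ry`, and that each arc is a full ellipse. It rebuilds the ellipse from those values instead of from the invalid `d` data. The function name `arcs_to_ellipses` is hypothetical.

```python
import xml.etree.ElementTree as ET

SVG = "{http://www.w3.org/2000/svg}"
SODIPODI = "{http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd}"

# Presentation/identity attributes carried over from the broken <path>.
KEEP = ("id", "class", "style", "transform")


def arcs_to_ellipses(root: ET.Element) -> int:
    """Replace <path sodipodi:type="arc"> elements with <ellipse> in place.

    Assumes the Sodipodi geometry attributes (cx, cy, rx, ry) survived even
    though d is invalid, and that each arc is a full ellipse.
    Returns the number of elements replaced.
    """
    replaced = 0
    # Snapshot the parents first so the tree can be edited while looping.
    for parent in list(root.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag != SVG + "path" or child.get(SODIPODI + "type") != "arc":
                continue
            ellipse = ET.Element(SVG + "ellipse")
            for name in ("cx", "cy", "rx", "ry"):
                ellipse.set(name, child.get(SODIPODI + name, "0"))
            for name in KEEP:
                if name in child.attrib:
                    ellipse.set(name, child.attrib[name])
            ellipse.tail = child.tail  # keep the surrounding whitespace
            parent[i] = ellipse  # same position, so sibling order is unchanged
            replaced += 1
    return replaced
```

When writing the result back, note that ElementTree renames prefixes that were not registered with `ET.register_namespace` and drops comments by default, so a real repair tool might prefer lxml.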
There are only 44 files in this list, but some of them have multiple problems
(usually all of the same type) - see the attached file problems.txt for
details.
animals/birds/jonathon_s_duck_01.svg
animals/fish/altum_angelfish_01.svg
animals/fish/brown_fish_01.svg
animals/fish/clown_loach_01.svg
animals/fish/giraffe_cichlid_01.svg
animals/horse_1_rotkevich_konsat_01.svg
animals/mammals/horse_1_rotkevich_konsat_01.svg
computer/etiquette_cd-rom_01.svg
computer/hardware/etiquette_printer_01.svg
computer/hardware/etiquette_scanner_01.svg
computer/icons/etiquette_printer_01.svg
computer/icons/etiquette_scanner_01.svg
computer/icons/lemon-theme/actions/reload.svg
computer/icons/lemon-theme/apps/browser.svg
computer/icons/lemon-theme/filesystems/home1.svg
computer/icons/lemon-theme/filesystems/home13.svg
computer/icons/lemon-theme/filesystems/home3.svg
computer/icons/lemon-theme/filesystems/home5.svg
computer/icons/lemon-theme/filesystems/home6.svg
computer/icons/otto_02.svg
decorations/sakura_01.svg
education/otto_02.svg
food/beverages/coffe_tea_01.svg
logos/OpenClipArtLibrary/open_clip_art_librarylogo_02.svg
logos/OpenClipArtLibrary/open_clip_art_librarylogo_03.svg
logos/OpenClipArtLibrary/open_clipart_library_proposal_02_global_01.svg
plants/flowers/sakura_01.svg
recreation/games/dice.svg
shapes/blokken_arjen_meijer_01.svg
signs_and_symbols/flags/africa/saint_helena.svg
signs_and_symbols/flags/america/argentina.svg
signs_and_symbols/flags/america/british_virgin_islands.svg
signs_and_symbols/flags/america/canada/canada_new_brunswick.svg
signs_and_symbols/flags/europe/denmark/denmark_jutland.svg
signs_and_symbols/flags/europe/france/france_st_pierre_and_miquelon.svg
signs_and_symbols/flags/europe/germany/germany_eastfrisia.svg
signs_and_symbols/flags/europe/united_kingdom/south_georgia_and_south_sandwich_islands.svg
signs_and_symbols/flags/oceania/polynesia/pitcairn_islands.svg
signs_and_symbols/map_symbols/aiga_currency_exchange1.svg
transportation/aiga-symbols/aiga_currency_exchange1.svg
unsorted/blokken_arjen_meijer_01.svg
unsorted/eye_01.svg
unsorted/woman_eye_01.svg
unsorted/world_in_eye_01.svg
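Several entries above are the same file name filed under two categories. Assuming that is what counts as a duplicate (the thread does not say how duplicates were identified), grouping the paths by file name is enough to find them. Applied to the 44 paths above, it yields seven pairs, leaving 37 distinct files. The helper name `duplicate_basenames` and the script name `dupes.py` are hypothetical.

```python
from pathlib import PurePosixPath


def duplicate_basenames(paths: list[str]) -> dict[str, list[str]]:
    """Map each file name that appears under more than one directory to
    the paths it appears at (hypothetical helper; 'duplicate' is assumed
    to mean 'same file name')."""
    groups: dict[str, list[str]] = {}
    for p in paths:
        groups.setdefault(PurePosixPath(p).name, []).append(p)
    return {name: ps for name, ps in groups.items() if len(ps) > 1}


if __name__ == "__main__":
    import sys
    # Usage: python dupes.py < list.txt   (one path per line)
    listed = sys.stdin.read().split()
    dups = duplicate_basenames(listed)
    distinct = len({PurePosixPath(p).name for p in listed})
    print(f"{len(listed)} paths, {len(dups)} duplicated names, {distinct} distinct files")
    for name, ps in sorted(dups.items()):
        print(f"  {name}: {', '.join(ps)}")
```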
I have fixed the problematic files. Of the 44 files, 7 are duplicates, so there are 37 fixed files. They are now in incoming/37_fixed_files.zip. The broken files still need to be deleted from the collection.

Wow, sharp work.

The path-related errors are probably not caused by bugs in our tools and process, which do not, I think, make any changes to the SVG at that level. So the best we can do for those is to repair them and roll the repaired versions into the release. I've got this on my radar now.

Most of the others at this point look like unbound-prefix problems, so naturally I wanted to check whether that could be caused by something our tools or process does. I was going to check the upload log to see whether the files as uploaded already had the problem, but at first glance the files with this particular problem have been in the collection since before the upload log was instituted. This is probably a good sign, as it likely indicates a past bug that has since been fixed. I'm not sure yet, though, and am still investigating whether any recent additions have this bug, but tentatively it looks like an old problem that may not bother us any more once we fix these extant files.

This bug used to have more info in it, before the RAID failure. I'm going to try to restore some of it from my mail archives...

******************
Summary: broken files in collection
Product: openclipart.org
Version: unspecified
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: normal
Priority: P2
Component: clipart
AssignedTo: clipart@lists.freedesktop.org
ReportedBy: sas00003@btinternet.com

There are a number of SVG files in the collection that have various problems. A list of those I have found in release 0.15 is given below (at the end). Five types of problem occur here:

* Unbound prefix. This means that a namespace prefix is used without having been declared in the file.
  If the prefix is 'xlink', adding the attribute xmlns:xlink="http://www.w3.org/1999/xlink" to the root 'svg' element will fix the problem. If the prefix is 'inkscape' (caused by an old Inkscape bug), the file can be fixed by removing all attributes that have the 'inkscape' prefix.
* Path has no d attribute. These files (most of which seem to be produced by Sodipodi) can be fixed by removing the offending path elements (of which there are often several), since these elements are presumably not intended to render anyway.
* Mismatched tag. These files were trashed by a bug in one of the scripts, and need to be recovered from an old version of the collection.
* No element found at line 1. These files are empty and should be deleted. (Corresponding good files already exist in the collection.)
* Undefined entity. This is usually caused by HTML entities (which XML does not predefine) being used instead of UTF-8 characters.

animals/birds/jonathon_s_duck_01.svg: parse error (unbound prefix at line 22 column 4)
animals/fish/altum_angelfish_01.svg: parse error (unbound prefix at line 24 column 4)
animals/fish/brown_fish_01.svg: parse error (unbound prefix at line 36 column 4)
animals/fish/clown_loach_01.svg: parse error (unbound prefix at line 35 column 4)
animals/fish/giraffe_cichlid_01.svg: parse error (unbound prefix at line 26 column 4)
animals/horse_1_rotkevich_konsat_01.svg: parse error (unbound prefix at line 11 column 1)
animals/mammals/horse_1_rotkevich_konsat_01.svg: parse error (unbound prefix at line 11 column 1)
computer/etiquette_cd-rom_01.svg: parse error (unbound prefix at line 6 column 4)
computer/hardware/etiquette_printer_01.svg: parse error (unbound prefix at line 6 column 4)
computer/hardware/etiquette_scanner_01.svg: parse error (unbound prefix at line 6 column 4)
computer/icons/etiquette_printer_01.svg: parse error (unbound prefix at line 6 column 4)
computer/icons/etiquette_scanner_01.svg: parse error (unbound prefix at line 6 column 4)
computer/icons/lemon-theme/actions/reload.svg: path has no d attribute
computer/icons/lemon-theme/apps/browser.svg: path has no d attribute
computer/icons/lemon-theme/filesystems/home1.svg: path has no d attribute
computer/icons/lemon-theme/filesystems/home13.svg: path has no d attribute
computer/icons/lemon-theme/filesystems/home3.svg: path has no d attribute
computer/icons/lemon-theme/filesystems/home5.svg: path has no d attribute
computer/icons/lemon-theme/filesystems/home6.svg: path has no d attribute
computer/icons/otto_02.svg: parse error (unbound prefix at line 13 column 6)
education/otto_02.svg: parse error (unbound prefix at line 13 column 6)
food/beverages/coffe_tea_01.svg: parse error (unbound prefix at line 35 column 4)
logos/OpenClipArtLibrary/open_clip_art_librarylogo_02.svg: parse error (unbound prefix at line 12 column 6)
logos/OpenClipArtLibrary/open_clip_art_librarylogo_03.svg: parse error (unbound prefix at line 9 column 6)
logos/OpenClipArtLibrary/open_clipart_library_proposal_02_global_01.svg: parse error (unbound prefix at line 6 column 4)
logos/linux/tux_bulgarian_licho_lich_01.svg: parse error (unbound prefix at line 17 column 2)
logos/linux/tux_is_chilean_01.svg: parse error (unbound prefix at line 11 column 4)
shapes/blokken_arjen_meijer_01.svg: parse error (unbound prefix at line 13 column 6)
signs_and_symbols/AIGA_Currency_Exchange_2.svg: path has no d attribute
signs_and_symbols/flags/america/canada/canada_new_brunswick.svg: path has no d attribute
signs_and_symbols/flags/europe/denmark/denmark_jutland.svg: path has no d attribute
signs_and_symbols/flags/europe/germany/germany_eastfrisia.svg: path has no d attribute
signs_and_symbols/flags/tux_bulgarian_licho_lich_01.svg: parse error (unbound prefix at line 17 column 2)
signs_and_symbols/map_symbols/AIGA_Currency_Exchange_2.svg: path has no d attribute
signs_and_symbols/usb_logo_philipp_e._imho_01.svg: parse error (unbound prefix at line 47 column 8)
unsorted/blokken_arjen_meijer_01.svg: parse error (unbound prefix at line 13 column 6)
unsorted/cake_etienne_bersac_01.svg-repaired.svg: parse error (no element found at line 1 column 0)
unsorted/dcplusplus_icon_gergely__01.svg: parse error (undefined entity at line 231 column 32)
unsorted/dcplusplus_icon_gergely__02.svg: parse error (undefined entity at line 233 column 32)
unsorted/eiffel_tower_michael_jas_01.svg-repaired.svg: parse error (no element found at line 1 column 0)
unsorted/eye_01.svg: parse error (mismatched tag at line 58 column 2)
unsorted/interlaced_ribbons_celt_01.svg: parse error (unbound prefix at line 84 column 4)
unsorted/kubuntu_logo_yogesh_kani_01.svg-repaired.svg: parse error (no element found at line 1 column 0)
unsorted/mr_lakshman_s_poonyth_02.svg: parse error (unbound prefix at line 29 column 2)
unsorted/sprint_cell_phone_joel_m_01.svg-repaired.svg: parse error (no element found at line 1 column 0)
unsorted/ubuntu_linux_logo_yogesh_01.svg-repaired.svg: parse error (no element found at line 1 column 0)
unsorted/usb_logo_philipp_e._imho_01.svg: parse error (unbound prefix at line 8 column 8)
unsorted/woman_eye_01.svg: parse error (mismatched tag at line 58 column 2)
unsorted/world_in_eye_01.svg: parse error (mismatched tag at line 66 column 2)
******************
------- Additional Comments From horkana@maths.tcd.ie 2005-07-11 03:06 -------
------- Additional Comments From holger@treebuilder.de 2005-07-11 04:00 -------
------- Additional Comments From sas00003@btinternet.com 2005-07-11 04:50 -------
******************
Here are two further comments that were lost. These come after the four that Jonadab has reposted, and before the one that survived the RAID failure.
------- Additional Comments From sas00003@btinternet.com 2005-08-04 00:56 -------
Created an attachment (id=3235) --> (https://bugs.freedesktop.org/attachment.cgi?id=3235&action=view)
list of problems found in release 0.16
------- Additional Comments From sas00003@btinternet.com 2005-08-05 02:27 -------
I have fixed the problematic files. Of the 44 files, 7 are duplicates, so there are 37 fixed files. They are now in incoming/37_fixed_files.zip. The broken files still need to be deleted from the collection.

For the 0.17 release, I ran svgscan over the whole thing, and put the log in the special directory. I also attempted to repair (in an automated fashion) the one problem that seemed most prominent, the missing prefix on the space attribute.

So how are we now, error-wise, with the 0.17 release?

Jonadab writes:
> For the 0.17 release, I ran svgscan over the whole thing, and put
> the log in the special directory.

OK, so it occurs to me that I should really have the log include a list of the warnings that were switched on; otherwise the absence of warnings of a particular type doesn't mean much. So the latest version of SVGscan

http://www.argentum.freeserve.co.uk/svgscan.zip

includes this. It also includes a completely rewritten test for invalid paths, which is now fast enough to be usable on the entire OCAL collection (but it still takes about 20 minutes for release 0.16 on my machine, so it's off by default). There are also some new tests, and some warning levels intended for use with OCAL (-ocalmild, -ocal, -ocalsevere).

By the way, it looks like you ran "svgscan.py -most *" instead of "svgscan.py -most .", because svgscan.log lists parse errors for the non-SVG files in the top directory, which wouldn't normally be tested.

> I also attempted to repair (in an automated fashion) the one problem
> that seemed most prominent, the missing prefix on the space attribute.

This isn't fixed in the copy of release 0.17 that I downloaded (openclipart-0.17-svgonly.tar.bz2).

> So how are we now, error-wise, with the 0.17 release?

We no longer have any files giving parse errors, or with paths lacking 'd' attributes, and we have only 3 files (all the same) with invalid path data. Overall, we have more warnings than before.
But this is mainly because of the 1375 star*step.svg files (suggested fix: rm star*step.svg), and the increase in the amount of xml:space corruption (which was expected, since the bug causing it hasn't been fixed).

I've been looking at the files in 0.18 and found some that are broken. Here's a list of those that I noticed:
contour_baboon.svg
contour_elephant.svg
contour_fox.svg
contour_kangaroo.svg
contour_orangutan.svg
contour_cheetah.svg
bread_and_wine_mark_near_.svg
carte_de_france_01.svg
world_map_saint_.svg
blueman_101_02.svg (and all the similarly named files there)
treble_clef_01.svg
trefoil_architectural_e_01.svg
coat_of_arms_of_anglica_01.svg
trefoil_architectural_e_01.svg
star_05pt02step.svg (and all the similarly named files there)
aids_ribbon_saint_.svg
recycle_water_saint_.svg
stop_sign_miguel_s_nchez_.svg
treble_clef_01.svg
pattern-arrows-reverse-4.svg
pattern-checkers-1.svg (actually most files under special/patterns)
nodding_donkey_kevin_cow_01.svg
Motorway_on.svg
lam_arn_01.svg
media_as_wmd_saint_.svg

Not sure what your definition of "broken" is; special/patterns (and gradients) are working as originally intended. If you put them in the right folder, Inkscape should detect them and generate previews as needed. There is a chance the markup I used is only valid but not pedantic enough to pass all tests. They include definitions (predefined objects, <defs>) and did not include an instance of the predefined object, which would have been needed for a preview. The program I created these with (Jasc Webdraw) is no longer available, and the added XMP metadata already breaks any compatibility I had hoped for, so there is no harm in adding more markup if people feel it is really necessary. If someone could write a script to add object instances and make it easier to preview files, that would be good, but it needs to be automated, as it would be far too tedious to do it any other way. (Some post-processing work might still be necessary.)

Jasc Webdraw is dead? :-(

When I say "broken" I mean that files such as

http://openclipart.org/clipart/special/patterns/pattern-checkers-2.svg

are not valid because they don't bind the required namespaces. The specification requires that they do, and some multi-namespace applications such as Mozilla won't render the SVG without them. Unfortunately some tools and viewers (including ASV) don't create or require the namespace bindings, so awareness of this issue is low. I've written up a document that explains this and other common problems in SVG files, and how to fix them, at:

http://jwatt.org/svg/authoring/

The section that's relevant here is:

http://jwatt.org/svg/authoring/#namespace-binding

The doc has been reviewed by members of the SVG WG and other leading SVG figures, so I'm not making this up. ;-)

Yes, we need to address this majorly bad!

Encompassed by the new feature request: https://bugs.freedesktop.org/show_bug.cgi?id=8627

Mass reopen. The "LATER" resolution is lame; I'm deleting it. Consider LATER to have arrived.

Closing all openclipart bugs, as openclipart is now on Launchpad, as per request from Jon Philips.