Bug 4743

Summary: Many SVGs are invalid XML, all should be validated on submission
Product: openclipart.org
Reporter: Eric Seidel <eseidel>
Component: clipart
Assignee: default user for a product <clipart>
Status: RESOLVED NOTOURBUG
Severity: normal
Priority: high
CC: esigra
Version: unspecified
Hardware: PowerPC
OS: Mac OS X (All)
Bug Blocks: 8627

Description Eric Seidel 2005-10-11 03:22:48 UTC

I have discovered while writing Safari+SVG, that many of the SVG files on OpenClipart.org are in fact 
invalid xml.  Two recent examples come to mind:

http://www.openclipart.org/incoming/jerusalem_cross_with_cr_01.svg

<?xml version="1" standalone="no"?> is invalid, only 1.0 and 1.1 are allowed.

As well as openclipart's own logo:

http://www.openclipart.org/logo/openclipartlibrary-logo-only-5colors.svg

the prefix "rdf" is never defined yet it's used.

My suggestion would be that the submission script should use either libTidy or validator.w3c.org to 
check every svg before allowing an author to submit.
Comment 1 Nicu Buculei 2005-10-11 04:07:57 UTC
Please have a look at this wiki page to read more about the usage of the "rdf"
prefix: http://openclipart.org/cgi-bin/wiki.pl?MetadataDiscussion
Comment 2 Stephen Silver 2005-10-11 10:09:08 UTC
Eric Seidel writes:

> I have discovered while writing Safari+SVG, that many of
> the SVG files on OpenClipart.org are in fact invalid xml.

If you mean valid in the sense of the XML spec, then probably all of them are
invalid. XML validity (conformance to a DTD) isn't really a useful concept for
SVG with embedded RDF, since the SVG DTD does not declare the RDF elements.

The SVG files ought to be conforming SVG, however, which in particular means
that they ought to be well-formed XML and ought to conform to the Namespaces in
XML spec. There are currently hundreds that are not conforming SVG.
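As a rough illustration of the first two requirements, here is a hypothetical Python sketch (not part of SVGscan or any openclipart script). It assumes the standard library's xml.etree.ElementTree parser, which rejects documents that are not well-formed and, because it parses with namespace processing enabled, reports an undeclared prefix such as the logo's "rdf" as an "unbound prefix" error. The root-element test is an added assumption about what a submission check might want. Nothing here validates against the SVG grammar.

```python
import xml.etree.ElementTree as ET
from typing import List

SVG_NS = "http://www.w3.org/2000/svg"


# Hypothetical helper: the name and the exact checks are illustrative only.
def conformance_problems(data: bytes) -> List[str]:
    """Return a list of problems found; an empty list means these checks passed.

    Checks only well-formedness, Namespaces in XML (undeclared prefixes), and
    that the root element is <svg> in the SVG namespace.
    """
    try:
        # ElementTree parses with namespace processing on, so an element or
        # attribute using an undeclared prefix raises ParseError.
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return [f"not well-formed or namespace error: {exc}"]

    problems = []
    # Namespaced tags appear in Clark notation: "{namespace-uri}localname".
    if root.tag != f"{{{SVG_NS}}}svg":
        problems.append(f"root element is {root.tag!r}, expected svg in {SVG_NS}")
    return problems
```

For untrusted uploads, the "XML vulnerabilities" section of the Python documentation is worth reading before relying on a standard-library parser.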

> <?xml version="1" standalone="no"?> is invalid, only 1.0 and 1.1 are allowed.

Yes, version="1" is wrong. This is a matter of well-formedness. In fact, I think
only XML version 1.0 should be allowed for SVG, as this is what the SVG 1.0 and
SVG 1.1 specs refer to.
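A check for the version number could be as small as the following Python sketch. The function name check_xml_version and the default of allowing only 1.0 (following the view that SVG should use XML 1.0 only) are assumptions; passing allowed=("1.0", "1.1") gives the looser rule from the original report.

```python
import re
from typing import Optional, Tuple

# A leading XML declaration (optionally preceded by a UTF-8 byte-order mark);
# group 2 captures the quoted value of its version pseudo-attribute.
_XML_DECL_VERSION = re.compile(rb'^(?:\xef\xbb\xbf)?<\?xml\s+version\s*=\s*(["\'])(.*?)\1')


# Hypothetical helper, not taken from SVGscan or any openclipart script.
def check_xml_version(data: bytes, allowed: Tuple[str, ...] = ("1.0",)) -> Optional[str]:
    """Return an error message if the XML declaration names a version outside
    `allowed`, else None.

    A file without an XML declaration passes: the declaration is optional, and
    such a document is an XML 1.0 document.
    """
    match = _XML_DECL_VERSION.match(data)
    if match is None:
        return None
    version = match.group(2).decode("ascii", errors="replace")
    if version not in allowed:
        return f"XML declaration has version={version!r}; allowed: {', '.join(allowed)}"
    return None
```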

All the SVG files in release 0.17 that have an XML declaration specify XML
version 1.0, so this problem does not appear to be widespread. (I haven't
downloaded release 0.18 yet.)

> http://www.openclipart.org/logo/openclipartlibrary-logo-only-5colors.svg
> 
> the prefix "rdf" is never defined yet it's used.

Yes, that's wrong too, because it doesn't conform to the Namespaces in XML spec.

None of the SVG files in release 0.17 use undefined prefixes. There were some in
a previous release, but they were fixed.

> My suggestion would be that the submission script should use
> either libTidy or validator.w3c.org to check every svg before
> allowing an author to submit.

I'm not familiar with libTidy, but validator.w3c.org checks for valid XML and so
would reject everything.

I don't know of any tool that checks for conforming SVG. My SVGscan script
checks for various problems, but it wouldn't have spotted that version="1"
(though I've added a test for that now).

Andrew Archibald has suggested validating incoming files against a RELAX NG
schema, but nobody has proposed a suitable schema yet.

Rather than just rejecting bad files, it would be better for the incoming script to
fix them whenever possible. For example, it could set the XML version to 1.0,
add xmlns="http://www.w3.org/2000/svg" to the root element if needed, change
'textpath' elements (an Inkscape 0.42 bug) to 'textPath', etc.
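The fix-ups listed above could be approximated at the byte level, before any parsing. This is a hypothetical Python sketch (the name fix_common_problems is made up). It uses regular expressions rather than an XML parser, so it can misfire on unusual input, for example an "<svg" string inside a comment that precedes the root element. Its output should therefore still go through a well-formedness check.

```python
import re

SVG_NS = b"http://www.w3.org/2000/svg"

# Version pseudo-attribute in a leading XML declaration (optional UTF-8 BOM).
_DECL_VERSION = re.compile(rb'^((?:\xef\xbb\xbf)?<\?xml\s+version\s*=\s*)(["\'])[^"\']*\2')
# First unprefixed <svg ...> start tag; group 1 holds its attributes.
_ROOT_SVG = re.compile(rb'<svg(?=[\s/>])([^>]*)>')


# Hypothetical helper: a text-level heuristic, not a real incoming-script API.
def fix_common_problems(data: bytes) -> bytes:
    # 1. Force the declared XML version to 1.0 (e.g. version="1" -> "1.0").
    data = _DECL_VERSION.sub(
        lambda m: m.group(1) + m.group(2) + b"1.0" + m.group(2), data, count=1
    )
    # 2. Add the default SVG namespace to an unprefixed <svg> root lacking one.
    m = _ROOT_SVG.search(data)
    if m and not re.search(rb'\sxmlns\s*=', m.group(1)):
        insert_at = m.start() + len(b"<svg")
        data = data[:insert_at] + b' xmlns="' + SVG_NS + b'"' + data[insert_at:]
    # 3. Rename Inkscape 0.42's <textpath> start/end tags to SVG's <textPath>.
    data = re.sub(rb'(</?)textpath(?=[\s/>])', rb'\1textPath', data)
    return data
```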
Comment 3 Eric Seidel 2005-10-11 14:06:01 UTC
Good to know.

The tidy I was referring to is HTML Tidy, which is available in various incarnations; some of them 
handle XML, though probably none of them do so well.

http://tidy.sourceforge.net/

It's good to hear that you have a special script for this. I guess the best thing to do with this bug is 
simply to fix the two SVGs I mentioned and close it. I'll check out svg_validate:

http://search.cpan.org/~bryce/SVG-Metadata-0.20/scripts/svg_validate

and file any further problems that I think that script should cover as separate bugs.
Thanks!
Comment 4 Eric Seidel 2005-10-11 14:13:32 UTC
Actually, I tried http://validator.w3.org/
with a couple of SVGs and had surprisingly good results. It seems to have trouble with xmlns: declarations,
but is still worth at least looking at.
Comment 5 ryanlerch 2006-10-12 22:03:45 UTC
Encompassed by the new feature request: https://bugs.freedesktop.org/show_bug.cgi?id=8627
Comment 6 Adam Jackson 2008-02-24 18:22:18 UTC
Mass reopen. The "LATER" resolution is lame; I'm deleting it. Consider LATER to have arrived.
Comment 7 Tollef Fog Heen 2010-08-18 03:24:11 UTC
Closing all openclipart bugs, as openclipart is now on Launchpad, per request from Jon Philips.
