Many SVGs are invalid XML; all should be validated on submission
I have discovered, while writing Safari+SVG, that many of the SVG files on OpenClipart.org are in fact
invalid XML. Two recent examples come to mind:
<?xml version="1" standalone="no"?> is invalid, only 1.0 and 1.1 are allowed.
As well as openclipart's own logo:
the prefix "rdf" is never defined yet it's used.
My suggestion would be that the submission script should use either libTidy or validator.w3.org to
check every SVG before allowing an author to submit.
Please have a look at this wiki page to read more about the usage of the "rdf"
Eric Seidel writes:
> I have discovered while writing Safari+SVG, that many of
> the SVG files on OpenClipart.org are in fact invalid xml.
If you mean valid in the sense of the XML spec, then probably all of them are
invalid. XML validity isn't really a useful concept for SVG with embedded RDF.
The SVG files ought to be conforming SVG, however, which in particular means
that they ought to be well-formed XML and ought to conform to the Namespaces in
XML spec. There are currently hundreds that are not conforming SVG.
> <?xml version="1" standalone="no"?> is invalid, only 1.0 and 1.1 are allowed.
Yes, version="1" is wrong. This is a matter of well-formedness. In fact, I think
only XML version 1.0 should be allowed for SVG, as this is what the SVG 1.0 and
SVG 1.1 specs refer to.
All the SVG files in release 0.17 that have an XML declaration specify XML
version 1.0, so this problem does not appear to be widespread. (I haven't
downloaded release 0.18 yet.)
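A check for the declared XML version could look roughly like the sketch below. It is only an illustration, not what SVGscan does: the function names are made up for this example, and it assumes the file has already been decoded to a string. It accepts only 1.0 by default, in line with the SVG 1.0 and 1.1 specs, and treats a missing declaration as acceptable because the declaration is optional in XML 1.0.

```python
import re

# Matches an XML declaration at the very start of the document and
# captures the quoted version value. Hypothetical helper, not SVGscan code.
XML_DECL = re.compile(r'^<\?xml\s+version\s*=\s*(["\'])(?P<version>[^"\']*)\1')


def declared_xml_version(text):
    """Return the version string from the XML declaration, or None if absent."""
    match = XML_DECL.match(text.lstrip("\ufeff"))  # ignore a leading BOM
    return match.group("version") if match else None


def has_acceptable_version(text, allowed=("1.0",)):
    """True if there is no declaration, or its version is in `allowed`."""
    version = declared_xml_version(text)
    return version is None or version in allowed
```

Passing allowed=("1.0", "1.1") would relax the check if XML 1.1 were ever to be accepted.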
> the prefix "rdf" is never defined yet it's used.
Yes, that's wrong too, because it doesn't conform to the Namespaces in XML spec.
None of the SVG files in release 0.17 use undefined prefixes. There were some in
a previous release, but they were fixed.
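Undefined prefixes are easy to catch with any namespace-aware parser. As a minimal sketch (the function name is hypothetical, and this is not how SVGscan works), Python's standard xml.etree.ElementTree refuses to parse a document that uses an undeclared prefix such as "rdf":

```python
import xml.etree.ElementTree as ET


def parse_error(svg_text):
    """Return the parser's error message, or None if the text parses cleanly.

    ElementTree parses with namespace processing enabled, so a prefix that
    is used but never declared is reported as an error, alongside ordinary
    well-formedness problems such as mismatched tags.
    """
    try:
        ET.fromstring(svg_text)
    except ET.ParseError as exc:
        return str(exc)
    return None
```

This only shows that a file is well-formed and namespace-conformant; it says nothing about whether it is conforming SVG.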
> My suggestion would be that the submission script should use
> either libTidy or validator.w3c.org to check every svg before
> allowing an author to submit.
I'm not familiar with libTidy, but validator.w3.org checks for valid XML and so
would reject everything.
I don't know of any tool that checks for conforming SVG. My SVGscan script
checks for various problems, but it wouldn't have spotted that version="1"
(though I've added a test for that now).
Andrew Archibald has suggested validating incoming files against a RELAX NG
schema, but nobody has proposed a suitable schema yet.
Rather than just rejecting bad files, it would be better for the incoming script to
fix them whenever possible. For example, it could set the XML version to 1.0,
add xmlns="http://www.w3.org/2000/svg" to the root element if needed, change
'textpath' elements (an Inkscape 0.42 bug) to 'textPath', etc.
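The three fixes listed above could be sketched along these lines. This is an illustration only, not the actual incoming script: all function names are hypothetical, and it works on the text with regular expressions, so it assumes an unprefixed root 'svg' element and no '>' characters inside attribute values of the root tag.

```python
import re

SVG_NS = "http://www.w3.org/2000/svg"


def fix_xml_version(text):
    """Rewrite the version in the XML declaration (if any) to 1.0."""
    return re.sub(
        r'^(<\?xml\s+version\s*=\s*)(["\'])[^"\']*\2',
        r'\g<1>\g<2>1.0\g<2>',
        text,
        count=1,
    )


def add_svg_namespace(text):
    """Add the default SVG namespace to the first <svg> tag if it lacks one."""
    match = re.search(r'<svg(?=[\s/>])[^>]*>', text)
    if match is None or re.search(r'\sxmlns\s*=', match.group(0)):
        return text  # no <svg> start tag found, or xmlns already present
    insert_at = match.start() + len("<svg")
    return text[:insert_at] + f' xmlns="{SVG_NS}"' + text[insert_at:]


def fix_textpath(text):
    """Rename lowercase 'textpath' tags to the correct 'textPath'."""
    return re.sub(r'(</?)textpath(?=[\s/>])', r'\1textPath', text)


def repair_svg(text):
    """Apply each fix in turn and return the repaired text."""
    for fix in (fix_xml_version, add_svg_namespace, fix_textpath):
        text = fix(text)
    return text
```

Each fix leaves an already-correct file unchanged, so running the repair on every incoming file should be harmless. A real script would probably want to parse the file properly rather than rely on regular expressions.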
Good to know.
The tidy that I was referring to is HTML Tidy, which is available in various incarnations, some of which
respect XML, though probably none of them do so well.
It's good to hear that you have a special script for this. I guess the best thing to do with this bug is
simply to fix the two SVGs I mentioned and close it. I'll check out svg_validate:
and post any further problems which I feel should be covered by that script as part of other bugs.
Actually, I tried using:
with a couple of SVGs, and had surprisingly good results. It seems to have trouble with xmlns: definitions,
but is still worth at least looking at.
encompassed by the new feature request - https://bugs.freedesktop.org/
Mass reopen. The "LATER" resolution is lame, I'm deleting it. Consider LATER to have arrived.
Closing all openclipart bugs as openclipart is now on Launchpad, as per request from Jon Phillips.