tatiyants.com XML Databases

In spite of some notable opposition, NoSQL has been all the rage lately. In particular, JSON-oriented document stores like Mongo and Couch have really become the darlings of the web application crowd.

Of course, JSON (or BSON) isn’t the only game in town. When it comes to document store formats, the other white meat, if you will, is XML.

As far as I could find, XML databases have actually been around longer than JSON-based ones (the first XML database, eXist, was introduced in 2000, whereas the first JSON-based one, CouchDB, came on the scene about five years later). Yet in spite of this head start, XML databases appear to be the red-headed stepchild. I was curious why this is the case, and here’s what I found.

XML vs. JSON

Let’s first consider the two formats in question. While it’s debatable which format is more commonly used for storage, it’s a lot less debatable which is considered to be the hipper of the two. Spoiler alert: it’s JSON.

XML is Worse

So, why is XML bad? Well, a big knock against XML is that it’s too “heavyweight” and “enterprisy”.

First, it’s obviously more verbose:

XML

<complaintsAgainstXML>
   <complaint>it's too verbose</complaint>
   <complaint>it's too complex</complaint>
   <complaint>XSLT sucks</complaint>
</complaintsAgainstXML>

JSON

{
    "complaintsAgainstXML": {
        "complaint": [
            "it's too verbose",
            "it's too complex",
            "XSLT sucks"
        ]
    }
}

Not counting white space, XML takes 171 characters whereas JSON takes 86. This amounts to almost 50% fewer characters for JSON, which makes it a much better format for transporting data over a distributed network (at least in uncompressed form).
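If you want to redo this kind of count yourself, here is a minimal Python sketch (an illustration, not how the numbers above were produced) that serializes the same data with the standard library and no whitespace between tokens. It counts the spaces inside the strings and the double quotes that valid JSON requires around keys, so its totals differ from the hand count above, but JSON still comes out clearly shorter.

```python
import json
import xml.etree.ElementTree as ET

complaints = ["it's too verbose", "it's too complex", "XSLT sucks"]

# Build the XML version of the example with ElementTree.
root = ET.Element("complaintsAgainstXML")
for text in complaints:
    ET.SubElement(root, "complaint").text = text
xml_text = ET.tostring(root, encoding="unicode")

# Serialize the JSON version with no spaces after ',' or ':'.
json_text = json.dumps(
    {"complaintsAgainstXML": {"complaint": complaints}},
    separators=(",", ":"),
)

print(len(xml_text), len(json_text))  # prints: 156 91
```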

XML is also more complex because it allows both attributes and elements, whereas JSON has only one way to attach data to an object: a key. Some say that JSON parsers are available in more languages than XML parsers. And of course JSON can be natively processed in the browser, which makes it a much better “X” in Ajax (Doug Crockford’s quote, not mine).
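To make the attributes-versus-elements point concrete, here is a small Python sketch using only the standard library. The severity field is a hypothetical addition to the complaints example: XML lets you put it in an attribute or in a child element, and the reading code differs for each, while JSON offers just one place for it, an object key.

```python
import json
import xml.etree.ElementTree as ET

# Two equally valid XML shapes for the same (hypothetical) severity field.
as_attribute = ET.fromstring('<complaint severity="high">XSLT sucks</complaint>')
as_element = ET.fromstring(
    "<complaint><severity>high</severity><text>XSLT sucks</text></complaint>"
)

# Each shape needs a different access call.
severity_from_attribute = as_attribute.get("severity")
severity_from_element = as_element.findtext("severity")

# JSON has only object keys, so there is one obvious shape.
as_json = json.loads('{"complaint": {"severity": "high", "text": "XSLT sucks"}}')
severity_from_json = as_json["complaint"]["severity"]

print(severity_from_attribute, severity_from_element, severity_from_json)
```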

XML is Better

On the other hand, XML has a bunch of useful supporting technologies around it:

validation against a predefined schema using XML Schema, Schematron, and DTDs
traversal using XPath
transformations with XSLT
searching with XQuery
referencing other XML with XLink or XInclude
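As a rough illustration of XPath-style traversal, Python’s standard-library ElementTree implements a limited subset of XPath (no functions such as contains(), for example), which is enough to walk the complaints document from the verbosity example. Full XPath and XSLT need a library such as lxml, and XQuery typically needs an XML database or a dedicated processor.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""<complaintsAgainstXML>
   <complaint>it's too verbose</complaint>
   <complaint>it's too complex</complaint>
   <complaint>XSLT sucks</complaint>
</complaintsAgainstXML>""")

# A simple relative path selects every <complaint> child.
all_complaints = [c.text for c in doc.findall("complaint")]

# Positional predicates are 1-based, as in XPath.
second = doc.find("complaint[2]").text

# A descendant search with a text-equality predicate (Python 3.7+).
xslt = doc.find(".//complaint[.='XSLT sucks']")

print(all_complaints, second, xslt.text)
```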

Now, there is no doubt that working with some of them can be painful (I’m looking at you, XSLT). The tooling isn’t great, debugging is awkward, testability is questionable, etc.

Moreover, similar versions of some of these also exist for JSON. For instance, there are JSONPath and JSON Schema. That said, I’m not sure how widely they are used.

XML Databases

Ok, let’s finally get back to the main point of this post: XML databases. Here’s a small sample of the capabilities you typically get with them:

XML CRUDS (create, retrieve, update, delete, and search via XQuery)
Document validation (using XML Schema)
Document references (via XInclude or XLink)
Library services (versioning, diffing, branching)
Storing non-XML but meta-tagged content (like images)

Of these, only CRUDS operations are well represented in JSON-based stores. MongoDB, for example, has pretty advanced querying capabilities supported by database indexes.

Other capabilities are much more common in (if not unique to) XML databases. Consider, for example, document references. In XML, you can reference one document from another using XInclude:

<menu xmlns:xi="http://www.w3.org/2001/XInclude">
   <menuItem>
      <xi:include href="menuItems/BeefStroganof.xml" />
   </menuItem>
   <menuItem>
      <xi:include href="menuItems/RasperryIceTea.xml" />
   </menuItem>
</menu>

XML databases that support XInclude (like Mark Logic) will automatically resolve the references and return a complete document to you with basically a single line of code.
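Mark Logic’s own API isn’t shown here. As a stand-in, Python’s standard library ships xml.etree.ElementInclude, which performs the same kind of XInclude resolution on an in-memory tree. In this sketch the contents of the two menu item files are made up and served from a dictionary by a custom loader, so the example runs without touching the file system. Note that the xi prefix has to be bound to the XInclude namespace for the include elements to be recognized.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# The menu from above, with the xi prefix bound to the XInclude namespace.
MENU = """<menu xmlns:xi="http://www.w3.org/2001/XInclude">
   <menuItem>
      <xi:include href="menuItems/BeefStroganof.xml" />
   </menuItem>
   <menuItem>
      <xi:include href="menuItems/RasperryIceTea.xml" />
   </menuItem>
</menu>"""

# Hypothetical contents of the referenced files, kept in memory.
DOCUMENTS = {
    "menuItems/BeefStroganof.xml": "<dish><name>Beef Stroganoff</name></dish>",
    "menuItems/RasperryIceTea.xml": "<drink><name>Raspberry Iced Tea</name></drink>",
}

def loader(href, parse, encoding=None):
    """Resolve an href like ElementInclude's default loader would,
    but from the DOCUMENTS dict instead of the file system."""
    if parse == "xml":
        return ET.fromstring(DOCUMENTS[href])
    return DOCUMENTS[href]  # parse="text" includes get the raw string

menu = ET.fromstring(MENU)
ElementInclude.include(menu, loader=loader)  # replaces each xi:include in place
print(ET.tostring(menu, encoding="unicode"))
```

The include call is the "single line" here; in a real XML database the loader’s job is done by the database itself.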

Final Thought

JSON-based document stores certainly have their appeal, especially for web applications built on a complete JavaScript stack (some browser library + Node.js + Mongo). The fact that your data can flow effortlessly through the entire stack is really, really nice.

That said, XML databases do have unique and useful capabilities that can save you a lot of effort if you need them. Hence, don’t dismiss XML databases offhand just because XML isn’t cool.

This post got 4 comments so far. Care to add yours?

Will Rubin says:

April 22, 2012 at 5:58 pm

Where are the links to the various XML DBs? I see lots of links to JSON stuff but only one link to an XML DB. Which XML DBs do you like and recommend, and for each one, what are the pros and cons?
- Alex Tatiyants says:
  
  April 22, 2012 at 6:20 pm
  
  Hi Will,
  
  I’ve looked at eXist, Mark Logic, Documentum’s xDB, IXIAsoft’s TextML, and a couple of others. Of the bunch, Mark Logic stands out as the most feature rich and most performant. It is, however, pretty expensive. If cost is a factor, eXist is a pretty solid open source alternative.
  
  Hope this helps.
David Lee says:

June 14, 2012 at 10:00 pm

One comment … your statement that “XML is Obviously more verbose” is entirely FUD. It’s a meme that has been spread around without serious thought, and everyone repeats it as if it were fact.
The fact is, one can choose an XML or a JSON representation that is larger or shorter than the other.
Simplistic translations of JSON to XML produce larger XML. But the fact most people don’t (choose to) understand is that the reverse is also true: simplistic translations of XML to JSON produce bigger JSON!
Just take a look at the translator at json.org and put your XML sample through it.
Their default converter produces this JSON:

{
  "childNodes": [
    {
      "childNodes": ["it's too verbose"],
      "tagName": "complaint"
    },
    {
      "childNodes": ["it's too complex"],
      "tagName": "complaint"
    },
    {
      "childNodes": ["XSLT sucks"],
      "tagName": "complaint"
    }
  ],
  "tagName": "complaintsAgainstXML"
}

which is larger than the XML. Of course, one would complain, “That’s not how I would write it!”
Of course! But neither is the XML sample how I would write XML for the corresponding JSON if conciseness were the main constraint. For example, this representation would work just fine (and so would a million others):

it’s too verbose
it’s too complex
XSLT sucks

Or perhaps using attributes

The conciseness argument is just baseless, period.
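The point about simplistic translations can be checked with a short Python sketch. The function below is a hypothetical generic converter written in the same spirit as the childNodes/tagName output above (it is not the json.org tool, and it ignores attributes); it maps every element to an object mechanically, and the resulting compact JSON ends up longer than the compact XML it came from.

```python
import json
import xml.etree.ElementTree as ET

XML = ("<complaintsAgainstXML><complaint>it's too verbose</complaint>"
       "<complaint>it's too complex</complaint>"
       "<complaint>XSLT sucks</complaint></complaintsAgainstXML>")

def to_generic_json(elem):
    """Map an element to {"childNodes": [...], "tagName": ...}, keeping
    non-blank text and tail strings as plain string children."""
    nodes = []
    if elem.text and elem.text.strip():
        nodes.append(elem.text)
    for child in elem:
        nodes.append(to_generic_json(child))
        if child.tail and child.tail.strip():
            nodes.append(child.tail)
    return {"childNodes": nodes, "tagName": elem.tag}

generic = json.dumps(to_generic_json(ET.fromstring(XML)), separators=(",", ":"))
print(len(XML), len(generic))
```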
But more importantly, it’s unimportant. Even for network transmission, modern HTTP servers and browsers support in-flight compression, which makes up for any extra verbosity that consists of repeated substrings. And then there is EXI (http://www.w3.org/XML/EXI/), which is a spec for efficient XML interchange. Furthermore, when is the text representation of JSON or XML really relevant? Most of the processing time is spent on the internal data model, not the text representation.
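The compression argument is also easy to try. This Python sketch gzips compact XML and JSON forms of the complaints example with the standard library. Because much of XML’s extra length is repeated tag names, which deflate encodes as back-references, the size gap should shrink considerably once compressed. Exact byte counts depend on the compressor and level, so treat them as illustrative.

```python
import gzip

xml_text = ("<complaintsAgainstXML><complaint>it's too verbose</complaint>"
            "<complaint>it's too complex</complaint>"
            "<complaint>XSLT sucks</complaint></complaintsAgainstXML>")
json_text = ('{"complaintsAgainstXML":{"complaint":'
             '["it\'s too verbose","it\'s too complex","XSLT sucks"]}}')

# Size difference before compression.
raw_gap = len(xml_text) - len(json_text)

# Size difference after gzip, as an HTTP server might send it.
gz_xml = gzip.compress(xml_text.encode("utf-8"))
gz_json = gzip.compress(json_text.encode("utf-8"))
gz_gap = len(gz_xml) - len(gz_json)

print("uncompressed:", len(xml_text), len(json_text), "gap", raw_gap)
print("gzipped:     ", len(gz_xml), len(gz_json), "gap", gz_gap)
```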

In the end … if you are going to discuss XML vs. JSON, please don’t pull in the conciseness BS. It’s simply untrue, biased, too easily misrepresented, and mostly irrelevant. There are good and bad reasons to use JSON or XML in different situations, but text representational conciseness is not one of them.

-David Lee
- Alex Tatiyants says:
  
  June 15, 2012 at 12:04 am
  
  Hi David, thank you very much for your comment.
  
  I wish I could see exactly what you intended to show with your XML example, because it didn’t come across well. Your point about creating more concise XML if conciseness is the goal is valid, but I think it misses a larger issue. In the example I gave, both representations are the most natural (at least in my opinion) way of representing a collection of elements in each format. In other words, if you aren’t explicitly worried about conciseness, JSON representations are naturally more concise.
  
  That said, I completely agree that the conciseness doesn’t matter much, especially these days.