I have been in technology long enough (since the early 1980s) to see a number of “Silver Bullets” that were going to change the world. The question I’d like to examine a bit more closely here is whether or not XML is the silver bullet for data exchange.
Early “Silver Bullets”
My first experience with silver bullets was the object oriented database! They were the new wave that were going to leave relational databases in the dust. Anyone remember or use any of those databases? For me the best thing that came out of object oriented databases was that Dale was on the same project and this is where we met and went on to found Safe Software! Thank you OODB!
I also remember very well when Java was introduced. C++ was said to become obsolete and so everyone was told to start learning Java or they would find themselves relegated to legacy software maintenance tasks. It goes without saying that Java was and is very successful; however, it didn’t by any means kill C++. C++ is still widely used and isn’t going away any time soon. Here at Safe we use C++ for the FME Engine, Workbench, and other desktop technology. Meanwhile, we use Java for many of the FME Server components. It goes without saying that we would not be able to get the same high performance levels out of our FME Engine, which powers both FME Desktop and FME Server, if it were written in Java.
In short, neither the object oriented database nor Java was a silver bullet. One remains a specialty product and the other, while widely successful, was not suitable for all programming tasks. Indeed, the number of languages continues to grow, with Python being just one example of another popular and growing language.
What about XML? Is it the Silver Bullet for Data Exchange?
With my love of XML, the question I’ve been pondering is whether or not XML is the silver bullet for data exchange. XML has a lot going for it, and many industries are using it for their data or are moving towards using it.
There are also lots of tools for working with XML. For exchanging data between disparate systems it seems like it is the perfect solution. So, what are the weaknesses of XML?
XML and Performance
At Safe when it comes to building product we are obsessed with three things: quality, usability and performance. If we can improve these with each release then we are doing a lot of things right. I have never met a user who said that they do not care about any of the above.
Aside from XML, I also spend a great deal of my time working alongside the FME Server team. As mentioned above, FME Server consists of a number of Java components which exchange data between them using XML messages. With FME Server, one of the performance measures we watch is the number of jobs per second the core passes off to FME Engines. We want this number as high as possible and to increase with each release of FME Server.
It turns out that, while beautiful (which XML always is!), using XML for this data exchange is not cheap. During FME Server stress-testing we found that much of the time was being spent on creating and parsing these XML messages! Doing more research, we discovered that if we replaced the XML messages with Google Protocol Buffer technology, we could process these messages 20 to 100 times faster.
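To get a rough feel for where that time goes, here is a minimal Python sketch that builds and parses a tiny job message as XML, then does the same with a fixed binary record. Everything in it is an assumption made for illustration: the message fields are invented and are not FME Server’s real schema, and the standard library’s struct module stands in for a compact binary encoding; it is not Protocol Buffers. The timings will vary by machine, so treat the output as indicative only.

```python
import struct
import timeit
import xml.etree.ElementTree as ET

# Hypothetical job message: (job id, priority, engine id).
# Invented for this sketch; not FME Server's actual message format.
JOB = (12345, 7, 42)
FIELD_NAMES = ("id", "priority", "engine")

# Fixed 8-byte little-endian record: one uint32 followed by two uint16s.
BINARY_LAYOUT = struct.Struct("<IHH")


def xml_round_trip(job):
    """Serialize the job to XML bytes, parse it back, return the values."""
    root = ET.Element("job")
    for name, value in zip(FIELD_NAMES, job):
        ET.SubElement(root, name).text = str(value)
    payload = ET.tostring(root)
    parsed = ET.fromstring(payload)
    return tuple(int(child.text) for child in parsed)


def binary_round_trip(job):
    """Pack the job into a fixed binary record and unpack it again."""
    payload = BINARY_LAYOUT.pack(*job)
    return BINARY_LAYOUT.unpack(payload)


# Time many round trips of each kind; absolute numbers depend on the machine.
RUNS = 5000
xml_seconds = timeit.timeit(lambda: xml_round_trip(JOB), number=RUNS)
binary_seconds = timeit.timeit(lambda: binary_round_trip(JOB), number=RUNS)
print(f"XML:    {xml_seconds:.4f} s for {RUNS} round trips")
print(f"binary: {binary_seconds:.4f} s for {RUNS} round trips")
```

Both functions return the same values they were given, so the comparison is like for like; only the wire format differs.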
Using Google Protocol Buffer technology also meant that the messages themselves are 3 to 10 times smaller (higher information density) meaning we could exchange more data in the same time. In comparison, XML has a very low information density. To get a feeling for this simply grab an XML/GML file and compress it.
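If you don’t have an XML/GML file handy, the experiment can be sketched in a few lines of Python. The document generated here is a made-up, GML-like structure (the element names are assumptions, not a real GML schema), and zlib from the standard library does the compression. How much it shrinks depends on the data, but repetitive markup like this typically compresses to a small fraction of its original size.

```python
import zlib


def build_sample_xml(n_features: int) -> bytes:
    """Build a GML-like document with invented element names as UTF-8 bytes."""
    parts = ["<?xml version='1.0' encoding='UTF-8'?>", "<FeatureCollection>"]
    for i in range(n_features):
        parts.append(
            "<featureMember><Parcel>"
            f"<id>{i}</id>"
            f"<position><x>{i * 1.5}</x><y>{i * 2.5}</y></position>"
            "</Parcel></featureMember>"
        )
    parts.append("</FeatureCollection>")
    return "\n".join(parts).encode("utf-8")


def compression_ratio(data: bytes) -> float:
    """Return original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, 9))


xml_bytes = build_sample_xml(1000)
ratio = compression_ratio(xml_bytes)
print(f"raw size: {len(xml_bytes)} bytes, compression ratio: {ratio:.1f}x")
```

A high ratio means that most of the bytes on the wire were repeated tags rather than data, which is another way of saying low information density.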
Doing this makes a huge difference in the number of jobs/second that our FME Server core can handle, increasing the maximum number of jobs that we can process from hundreds per second to thousands per second.
Anyone who has worked with XML knows that it is many things, but it wasn’t designed to be compact or quick to parse.
What Does this Mean?
Like anything, it means that you should really evaluate what you are trying to accomplish. If your primary concern is to share data in an easy to understand structure with many other loosely coupled systems, then XML is a great choice.
However, it should be recognized that nothing is free and that you are giving up performance, both in terms of cost to process XML data messages and the size of the XML data that is sent.
If your goal is to build distributed systems in which moderate to large amounts of data are to be passed around with tight time constraints, then the choice of XML really needs to be questioned. Personally, I get really nervous when folks talk about moving huge amounts of XML with very tight time constraints in interactive systems.
Even though XML continues to grow, and its future looks bright, you should always first define the goals of your system and only then select the best technology to support those goals. Selecting technology first and then defining the goals of the system always has been and always will be a very dangerous approach to building successful solutions.
As someone who loves XML, my feeling is that when it comes to the silver bullet question, XML is more of a Java than an object oriented database. Just as with Java, there are places where XML is not the right choice.
So while XML is not the silver bullet for all data exchange, that hasn’t dampened my enthusiasm for it. If you have any XML that you’re having problems working with, please do send it to me at firstname.lastname@example.org. I’d also be curious to hear about your experiences with any other silver bullets.
Don Murray
Don is the co-founder and President of Safe Software. Safe Software was founded originally doing work for the BC Government on a project sharing spatial data with the forestry industry. During that project Don and other co-founder, Dale Lutz, realized the need for a data integration platform like FME. When Don’s not raving about how much he loves XML, you can find Don working with the team at Safe to take the FME product to the next level. You will also find him on the road talking with customers and partners to learn more about what new FME features they’d like to see.
Thanks for this refreshing article. I fully agree with what you said.
My fav thing about XML is the performance of xpath and other query tools for reformatting/ETL from one doc to another on a local machine. I have used it to juggle data structures before/after using FME – for example plucking parts of a complex (ie nested elements) XML doc into a simple features GML doc that I can load to a target Geodatabase using FME. It’s not easy to set this kind of process up but it can be wicked fast. Looking forward to working with FME 2012 to learn how to do this kind of thing (when I get a chance).
But agreed – for transfer over networks and use server-server the transfer time is silly. Thanks for the protobuf tips.
Thanks for the comments. I am glad that it struck a chord.
Michael, I suspect that like me you have been on a project in your life where you were given requirements and also told what technology you were to use, regardless of whether it was actually appropriate! Then of course later folks wonder why the project wasn’t a runaway success!
Steve, you truly are what we call one of the “XML Men”. These XML tools you are talking about are amazingly powerful in the hands of an expert and, as you mention, “It’s not easy to set this kind of process up but it can be wicked fast”. The flip side of this is that, for a production environment, it is not easy to maintain these things.
What we are working to do with FME 2012 is to enable many of the operations such as “plucking parts out of a complex XML document” and make them easy to do.
I hope the protobuf reference is helpful. It definitely has helped us produce code that is both faster and of higher quality.
I think it’s too simplistic to expect a silver bullet; it is far more constructive to view XML as an evolutionary development in data exchange, and in my view its simplistic non-semantic approach will allow it to last for some time to come. The major limitation is that it can get bloated, but JSON is filling the gap here.
I agree wholeheartedly that XML is an evolutionary development in data exchange. To see this, a person has only to remember the old days in which many applications had their own binary format. Oh yeah, today I talked to a client who’s trying to move off an old system with a proprietary binary format. I would take XML over this any day!
You also used the word evolutionary here which is refreshing as it implies that we will learn from the XML experience and get smarter. Walter Perry makes a great point here (http://bit.ly/q8gtyC) where he describes the principle of “You are what you produce, not what you consume.” Definitely worth a read.
To expand a bit more on “data exchange”. I would say that XML is the best we have if I am to produce a document for unknown consumers who are using unknown systems. What else is there? JSON. That is about it.