Monday, March 30, 2009

So What Exactly is XML?

By Alan Spencer

Extensible Markup Language (XML) has very quickly established itself as a viable technology with a huge range of real-world applications. One of the main reasons for its importance and wide acceptance is that it offers a working solution to one of the key problems faced by software developers and computer users alike: the exchange of data between incompatible systems. Many applications store their data in a proprietary binary format which other programs cannot readily interpret. When data is exported in XML format, it becomes a known quantity, independent of the environment in which it originated.

Adobe's PDF format is another example of a platform-independent data format which has gained wide acceptance. When a document is saved as a PDF file, its format is set in stone: it can be viewed and printed with its layout and formatting intact, without the need for the software which created the original file. However, whereas the PDF format concerns itself primarily with presenting information, XML is used to describe and encapsulate the information itself.

Though XML itself is still fairly new, the idea behind it goes back a long way. Its roots lie in the 1970s, when work began on what became Standard Generalized Markup Language (SGML), published as an ISO standard in 1986, in an attempt to create an application-independent method of describing data. SGML is a text-based language which employs the concept of adding markup to data in order to describe the data itself. An SGML document contains both the original data and a set of rules defining the structure of that data. SGML is a fairly complex language and, unlike XML, has never become mainstream. In the early 1990s, SGML was used to develop and specify the rules of HTML, and in the late 1990s it was called upon again, this time as the basis for the development of XML. In many ways XML is really a restricted form of SGML.

XML has already proved itself to be an excellent medium for storing, describing and transporting data, particularly over the internet. It provides flexibility, clarity and simplicity. An XML document may look similar to an HTML document and is built from the same kind of human-readable tags. However, the tags used to mark up HTML documents are pre-defined: only a limited set of tags can legitimately be included. XML allows you to create your own markup language and define the tags which are legitimate for your data. It does this via a schema document, which can itself be an XML document. The schema document specifies the vocabulary and grammar which can be used within the XML document that contains your information.
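To make the idea of inventing your own tags concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The catalogue, book, title and price tags, the ISBN values and the read_titles helper are all made up for illustration; nothing about them is prescribed by XML itself.

```python
import xml.etree.ElementTree as ET

# A hypothetical document whose tags were invented for a book catalogue.
CATALOGUE_XML = """<catalogue>
  <book isbn="0000000001">
    <title>Example Title One</title>
    <price currency="GBP">12.50</price>
  </book>
  <book isbn="0000000002">
    <title>Example Title Two</title>
    <price currency="GBP">8.99</price>
  </book>
</catalogue>
"""


def read_titles(xml_text):
    """Return the text of every <title> found directly inside a <book>."""
    root = ET.fromstring(xml_text)
    return [book.findtext("title") for book in root.findall("book")]


if __name__ == "__main__":
    for title in read_titles(CATALOGUE_XML):
        print(title)
```

Because the tags name the data they hold, any program which can parse XML can pull out the titles without knowing anything about the software which produced the file.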

The fact that you can invent all the rules when creating or generating XML documents means that you never have to twist and force your data into a container which was not designed to hold it. You design tags which reflect the nature of your information; you create a schema document which defines the hierarchical structure of that information; and you specify the type of data each element within your document is permitted to contain. In short, if you end up with an XML document which is not suitable for holding your information, you have only yourself to blame!
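The three design steps above (tags, hierarchy, data types) can be sketched roughly as follows. This is not a real schema language such as XML Schema (XSD); it is a hand-rolled Python dictionary standing in for one, used only because Python's standard library does not validate schemas. The order, customer, quantity and unit_price names, the RULES table and the check function are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for a schema document. For each tag we
# record either the child tags it must contain, in order, or the Python
# type its text content must convert to.
RULES = {
    "order": ["customer", "quantity", "unit_price"],
    "customer": str,
    "quantity": int,
    "unit_price": float,
}


def check(element, rules=RULES):
    """Return a list of problems; an empty list means the element conforms."""
    rule = rules.get(element.tag)
    if rule is None:
        return [f"unexpected element <{element.tag}>"]
    if isinstance(rule, list):
        # Structural rule: compare the actual child tags with the expected ones.
        children = [child.tag for child in element]
        if children != rule:
            return [f"<{element.tag}> should contain {rule}, found {children}"]
        problems = []
        for child in element:
            problems.extend(check(child, rules))
        return problems
    # Type rule: the element's text must convert to the required type.
    try:
        rule((element.text or "").strip())
    except ValueError:
        return [f"<{element.tag}> should hold a value of type {rule.__name__}"]
    return []


good = ET.fromstring(
    "<order><customer>Acme</customer><quantity>3</quantity>"
    "<unit_price>9.99</unit_price></order>"
)
bad = ET.fromstring(
    "<order><customer>Acme</customer><quantity>three</quantity>"
    "<unit_price>9.99</unit_price></order>"
)

if __name__ == "__main__":
    print(check(good))
    print(check(bad))
```

In practice these rules would normally be written in a schema language and enforced by a validating parser rather than by code of your own; the sketch only shows what kind of constraints a schema expresses.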
