XML is everywhere. Computers, mobile phones, banking systems, the Internet, TVs and even microwaves use XML as an information wrapping and information exchange system. We will tell you all the basics in the simplest possible way.
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it with your friends and learn new things together.
A comment by Tim Bray of Sun Microsystems on the celebration of the 10th anniversary of XML in Feb 2008:
"There is essentially no computer in the world, desk-top, hand-held, or back-room, that doesn't process XML sometimes. This is a good thing, because it shows that information can be packaged and transmitted and used in a way that's independent of the kinds of computer and software that are involved. XML won't be the last neutral information wrapping system; but as the first, it's done very well."
Outline
XML Eye Opener. What is XML? HTML vs. XML. Basic XML Syntax. Constituents. Some XML Rules. Element vs. Attribute. Node Naming Principles. Advanced Concepts Related to XML. Future of XML.
XML Eye Opener
SIMPLE: So simple that you will wonder why you did not try to understand it until now.
SUCCESSFUL: One of the most successful data storage formats to date; even big brands that strongly believed in proprietary formats for commercial reasons have started using it.
SOLID: A solid, ageless concept that this generation will pass on to future generations, who will keep the baton moving.
What is XML-1
XML stands for eXtensible Markup Language.
XML evolved from the more general-purpose ISO standard SGML (Standard Generalized Markup Language).
All data needs a description to turn it into useful information. XML provides a neat solution.
XML looks like normal English, but it has been designed to be machine readable.
What is XML-2
XML can store data.
XML can help standardize the exchange of data.
User-defined markup tags name the data items.
Library functions are available in most programming languages to parse XML.
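As a minimal sketch of the library support mentioned above, Python's standard xml.etree.ElementTree module can parse a small document and read out its data items. The address document, its tag names and its values are made up for illustration and do not come from the original slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: user-defined tags name each data item.
XML_TEXT = """<address type="home">
  <name>A. Reader</name>
  <city>Pune</city>
  <pin>411001</pin>
</address>"""


def read_address(xml_text):
    """Parse the XML text and return its data items as a dictionary."""
    root = ET.fromstring(xml_text)
    # Each child element becomes one key/value pair.
    items = {child.tag: child.text for child in root}
    # The attribute on the root element is read separately.
    items["type"] = root.get("type")
    return items


address = read_address(XML_TEXT)
print(address)
```

Most other languages offer a comparable parser; only the function names differ.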
Five predefined entities allow special characters in the PCDATA area: &gt; for >, &lt; for <, &amp; for &, &apos; for ', &quot; for ".
CDATA section (character data that is not to be parsed): meant for holding large amounts of code or general-purpose data. Even HTML can be put here. <![CDATA[ ... ]]>
Processing Instructions (PI), or directives, are given between <? and ?>, e.g. <?xml-stylesheet type="text/css" href="mySheet.css"?>. The initial declaration uses the same syntax: <?xml version="1.0" encoding="UTF-8" standalone="no"?> (strictly, the XML specification classes this as the XML declaration rather than a PI).
Parsed character data (PCDATA) is the text between an element's start and end tags, e.g. between <address> and </address>.
An attribute has a name and a value in quotes.
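The constituents above can be tried out with Python's standard library. This is only a sketch: the note document and its contents are invented for the example. escape() and quoteattr() from xml.sax.saxutils produce the entity references, and the parser turns them back into plain characters, while the CDATA section passes through unescaped.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw_text = 'Tom & Jerry <"cartoon">'

# escape() replaces &, < and > in character data with entity references.
escaped = escape(raw_text)
# quoteattr() escapes a value and wraps it in quotes for use as an attribute.
attr = quoteattr("5 > 3")

# Hypothetical document combining an attribute, PCDATA and a CDATA section.
doc = (
    "<note priority=" + attr + ">"
    "<body>" + escaped + "</body>"
    "<code><![CDATA[if (a < b && b > c) { run(); }]]></code>"
    "</note>"
)

root = ET.fromstring(doc)
body_text = root.find("body").text   # entity references are decoded
code_text = root.find("code").text   # CDATA content is returned as written
priority = root.get("priority")      # attribute value without the quotes

print(escaped)
print(body_text)
print(code_text)
print(priority)
```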
Some XML Rules - 1
All elements must have closing tags.
</Book></Library>
This is because ISBN is a book-related property while ID may be related to a storage place.
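A sketch of how such a document might be built by a program, assuming that ID is written as an attribute of Book and ISBN as a child element. The Title element, the shelf IDs and the placeholder ISBN values are assumptions made for this example, not part of the original.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue data: (shelf ID, placeholder ISBN, title).
books = [
    ("A1", "ISBN-0001", "Learning XML"),
    ("B7", "ISBN-0002", "XML in Practice"),
]

library = ET.Element("Library")
for shelf_id, isbn, title in books:
    # ID says where the copy is stored, so here it is an attribute.
    book = ET.SubElement(library, "Book", ID=shelf_id)
    # ISBN and Title describe the book itself, so here they are elements.
    ET.SubElement(book, "ISBN").text = isbn
    ET.SubElement(book, "Title").text = title

xml_text = ET.tostring(library, encoding="unicode")
print(xml_text)
```

The Book element repeats once per book, which an attribute could not do: an attribute name may appear only once on a given element.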
XML Node Naming – Begins with
Node (element or attribute) names shall begin with a letter or _ (underscore).
<1STLINE></1STLINE> invalid element name
<LINE1></LINE1> valid element name
<BOOK 1Ver="1.00"></BOOK> invalid attribute name
<BOOK _Ver="1.00"></BOOK> valid attribute name
XML Node Naming – Consists of
A name can consist of letters, digits, hyphens, underscores and periods; the letters may be English or from any other language that the encoding given in the declaration can represent.
Tabs and spaces are not allowed in XML node names.
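One way to test a candidate name is to let a parser decide. The helper below, is_valid_element_name, is a hypothetical function written for this sketch: it wraps the name in an empty element and checks that Python's standard parser accepts it unchanged.

```python
import xml.etree.ElementTree as ET


def is_valid_element_name(name):
    """Return True if the parser accepts `name` as an element name."""
    try:
        root = ET.fromstring("<" + name + "/>")
    except ET.ParseError:
        return False
    # Guard against input such as 'a b="1"', which parses as element 'a'.
    return root.tag == name


for candidate in ["LINE1", "_line", "1STLINE", "MY LINE"]:
    print(candidate, is_valid_element_name(candidate))
```

Names that contain a colon are treated as namespace-prefixed by this parser, so the sketch is only meant for plain names.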
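XML Node Naming – Based on Namespace
A name can belong to a namespace.
The name table may mean an HTML table or a piece of furniture. This conflict can be resolved by using namespaces, as follows:
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
The prefix h must be bound to a namespace URI by an xmlns:h attribute on the element or on one of its ancestors.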
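The sketch below shows how a program can tell the two kinds of table apart once each prefix is bound to a namespace. Both namespace URIs, the root element and the furniture table are placeholders invented for this example.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs below are placeholders chosen for this sketch.
XML_TEXT = """<root xmlns:h="http://example.com/html"
      xmlns:f="http://example.com/furniture">
  <h:table>
    <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
  </h:table>
  <f:table>
    <f:name>Coffee Table</f:name>
  </f:table>
</root>"""

# Prefix-to-URI map used when searching; the prefixes are local shorthand.
NS = {"h": "http://example.com/html", "f": "http://example.com/furniture"}

root = ET.fromstring(XML_TEXT)
cells = [td.text for td in root.findall("h:table/h:tr/h:td", NS)]
furniture = root.find("f:table/f:name", NS).text
# Internally the parser stores the full URI in front of the local name.
html_table_tag = root.find("h:table", NS).tag

print(cells)
print(furniture)
print(html_table_tag)
```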
HTML Vs XML - 1
Similarities.
Both have nodes (elements and attributes), e.g. <H1>Heading1</H1> or <font face="Verdana"></font>.
Both use entities, e.g. &lt; &gt; etc.
Both are derived from SGML.
HTML Vs XML - 2
Differences.
HTML has predefined tags; XML tags are user defined.
HTML is for humans, and errors are ignored. XML is for computers, as a data storehouse or set of definitions, so errors cannot be ignored.
HTML is usually not updated by programs, while XML is meant to be written by programs.
HTML has a large number of entities. XML predefines just five.
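The difference in error handling can be seen by giving the same sloppy markup to an HTML parser and an XML parser. This is a sketch using Python's standard library; the TagCollector class and the sample text are made up for the illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Record every start tag the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# Unclosed tags plus &nbsp;, an HTML entity that XML does not predefine.
SLOPPY = "<p>fish&nbsp;chips <b>unclosed"

collector = TagCollector()
collector.feed(SLOPPY)
collector.close()
html_tags = collector.tags  # the HTML parser carries on without complaint

try:
    ET.fromstring(SLOPPY)
    xml_ok = True
except ET.ParseError:
    xml_ok = False  # the XML parser refuses the same text

print(html_tags)
print(xml_ok)
```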
XSL (Extensible Stylesheet Language)
HTML has predefined tags and is styled using CSS (Cascading Style Sheets). XML tags are user defined, so XSL is used to describe how they should be presented.
It has three parts:
XSLT (XSL Transformations): for transforming XML data, for example into XHTML for display on a web page.
XPath: a way to reach a particular data item in an XML file. This is very often useful when reading XML-based configuration files.
XSL-FO (XSL Formatting Objects): provides a display/print formatting mechanism for XML data.
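As a small sketch of the XPath use case, Python's xml.etree.ElementTree supports a subset of XPath that is enough to pick values out of a configuration file. The config document, host names and port numbers are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration file contents.
CONFIG = """<config>
  <database name="main">
    <host>db1.example.com</host>
    <port>5432</port>
  </database>
  <database name="reports">
    <host>db2.example.com</host>
    <port>5433</port>
  </database>
</config>"""

root = ET.fromstring(CONFIG)

# A path with a predicate that selects by attribute value.
reports_host = root.find("database[@name='reports']/host").text
# './/port' matches port elements at any depth below the root.
ports = [int(p.text) for p in root.findall(".//port")]

print(reports_host)
print(ports)
```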
DTD (Document Type Definition)
A DTD is referenced from a DOCTYPE declaration in an XML file, such as:
<!DOCTYPE note SYSTEM "Note.dtd">
The declarations can also be embedded in the XML file itself, in which case they take the following form (an external file such as Note.dtd holds only the <!ELEMENT ...> lines, without the DOCTYPE wrapper):
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
The XML file has a root node named note with four sub-elements.
Each sub-element contains PCDATA.
Parsing XML
The process of reading an XML file and extracting valid data from it is called parsing.
Parsers are of two types:
Non-validating parser: does not check the document against a DTD; it only checks that the document is well formed.
Validating parser: checks the document against its DTD.
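The distinction can be illustrated with Python's standard parser, which is non-validating: it reads a document that breaks its own DTD without complaint. The function matches_note_model below is a hand-written stand-in for one DTD rule, not a real validating parser, and the sample contents are invented.

```python
import xml.etree.ElementTree as ET

DTD = """<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>"""

VALID = DTD + (
    "<note><to>Reader</to><from>Author</from>"
    "<heading>Reminder</heading><body>Read the slides</body></note>"
)
# Well formed, but <body> is missing, so it does not satisfy the DTD.
INVALID = DTD + (
    "<note><to>Reader</to><from>Author</from>"
    "<heading>Reminder</heading></note>"
)

EXPECTED_CHILDREN = ["to", "from", "heading", "body"]


def matches_note_model(xml_text):
    """Hand-written check of the note content model (child order only)."""
    root = ET.fromstring(xml_text)  # non-validating parse, DTD not enforced
    return root.tag == "note" and [c.tag for c in root] == EXPECTED_CHILDREN


valid_result = matches_note_model(VALID)
invalid_result = matches_note_model(INVALID)
print(valid_result, invalid_result)
```

A validating parser would apply every declaration in the DTD automatically instead of relying on a check written by hand.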
Some Advanced Concepts Related to XML
XML Schema: defines validation rules in XSD (XML Schema Definition) files, which are themselves written in XML.
XQuery: a query language for searching XML data and returning the nodes that match given criteria.
Where to View/Edit
Browsers: Most browsers are good at viewing XML. Internet Explorer is particularly good at it.
Editors: Special editors are available that offer good XML viewing and editing facilities, such as Microsoft's XML Editor and Peter's XML Editor.
Office Tools: Tools like MS-Word and FrontPage provide good XML editing. Even MS-Excel supports opening XML files.
Visual Studio/WebDeveloper: They provide an excellent environment for XML editing and viewing, along with validation support.
Let's Quickly Revise
2 types of nodes: elements and attributes. Elements are repeatable. Anything held in an attribute can always be written as an element instead; the reverse may not be true.
Special syntax for non-parsed data: CDATA.
5 entities for special symbols (<, >, ', ", &).
HTML-style comments are allowed: <!-- comments -->
Names are case-sensitive. Closing tags are required.
Other Processing Instructions (PI) can be given, enclosed within <? ?>. The first line is usually the version declaration, which uses the same syntax.
Always have a single root node.
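Several of the rules revised above can be checked mechanically. The sketch below feeds a few deliberately broken documents to Python's standard parser; the helper name well_formed_error and the sample documents are made up for this example.

```python
import xml.etree.ElementTree as ET


def well_formed_error(xml_text):
    """Return None if the text is well formed, else the parser's message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)


# One sample per rule: comments, closing tags, case, single root.
CASES = {
    "well formed": "<Library><!-- a comment --><Book/></Library>",
    "missing closing tag": "<Library><Book></Library>",
    "case mismatch": "<Library><Book></book></Library>",
    "two root nodes": "<Library/><Library/>",
}

results = {label: well_formed_error(text) for label, text in CASES.items()}
for label, error in results.items():
    print(label, "->", error or "OK")
```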
Future of XML
All websites may one day be written in XML. HTML has already been re-standardised as XHTML, which provides better syntax checking and browser compatibility.
XML promises to be the most open system for storing information from all IT gadgets, from desktops and mobile phones to iPods, iPads, DVD players and microwave ovens. It is already in use and is expected to be used in more and more devices.
All office documents and e-books, offline and online, may ultimately be in XML, as it is a simple, non-proprietary format that meets these needs well.