
Session02 XML Syntax

May 30, 2018

Neeraj Singh
Transcript
  • 8/14/2019 Session02 XML Syntax

    2008 MindTree Consulting

    XML Syntax

    Rules of XML Language

    Sep-2009

    Slide 2

    Agenda

    Need for XML

    Quiz

    XML Syntax - Rules of XML language

    Need For XML

    Revision of previous session

    Quiz

    Slide 5

    XML

    How can you have both more tags and fewer tags in a single language?

    To resolve this dilemma, XML makes essentially two changes to HTML:

    It predefines no tags.

    It is stricter.

    Slide 6

    What is Markup

    In an electronic document, the markup consists of the codes, embedded with the document text, that store the information required for electronic processing, such as font name, boldness or, in the case of XML, the document structure. This is not specific to XML. Every electronic document standard uses some sort of markup.

    Slide 7

    Applications of XML

    Publishing

    XML is being used by an increasing number of publishers as the format for documents. For example, an XML document for a monthly newsletter uses elements for the title, abstract, paragraphs, and other concepts common in publishing.

    Business Document Exchange

    For example, an order can be placed in XML rather than on paper. The advantage is that software can process it: an application could read the order and automatically fulfill it.

    RSS / Atom

    E.g. Bloglines

    XML Introduction - Quiz

    Basic questions on XML Introduction

    Slide 9

    XML Introduction - Quiz

    XML stands for Extensible Markup Language.

    XML is about the description of data, and not its presentation.

    XML allows us to define our own tags, so we can create our own markup languages.

    The XML specification is owned by the W3C.

    XML is designed to be both machine readable and human readable.

    XML provides a platform-neutral, language-independent means of describing data.

    Obviously, it is the markup that differentiates the XML document from plain text.

    The XML Syntax

    Start & End Tags, Elements, Element nesting, XML Names, Attributes, XML Declaration, Entities, CDATA, Comments, Processing Instructions, Well formed XML

    Slide 11

    XML - Example

    Listing 2.1: An Address Book in XML

    John Doe

    34 Fountain Square Plaza

    OH

    45202

    Cincinnati

    US

    513-555-8889

    513-555-7098

    Jack

    Smith

    513-555-3465
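
    The markup of Listing 2.1 did not survive in this transcript, only its text values. As a rough sketch, the listing can be approximated and parsed with Python's standard xml.etree.ElementTree module. Only address-book, entry, tel and the preferred attribute are named in these slides; every other element name below (name, address, street, region, postal-code, locality, country, fname, lname) and the attribute value "true" are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the address book. Element names other than
# address-book, entry and tel are hypothetical.
ADDRESS_BOOK = """<?xml version="1.0"?>
<address-book>
  <entry>
    <name>John Doe</name>
    <address>
      <street>34 Fountain Square Plaza</street>
      <region>OH</region>
      <postal-code>45202</postal-code>
      <locality>Cincinnati</locality>
      <country>US</country>
    </address>
    <tel preferred="true">513-555-8889</tel>
    <tel>513-555-7098</tel>
  </entry>
  <entry>
    <name><fname>Jack</fname><lname>Smith</lname></name>
    <tel>513-555-3465</tel>
  </entry>
</address-book>"""

root = ET.fromstring(ADDRESS_BOOK)

# The two entry elements are direct children of the root element.
entries = root.findall("entry")

# iter() walks the whole tree, so it finds tel elements at any depth.
phones = [tel.text for tel in root.iter("tel")]
```

    Here findall("entry") looks only at direct children of the root, while iter("tel") searches the whole tree.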

    Slide 12

    Elements - Start and End Tags

    The building block of XML is the element, as that is what XML documents are made of. Each element has a name and content.

    <tel>513-555-7098</tel>

    The content of an element is delimited by special markup known as the start tag and the end tag.

    Unlike HTML, both start and end tags are required. The following is not correct in XML:

    <tel>513-555-7098
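
    A minimal sketch of the same point, assuming Python's standard xml.etree.ElementTree parser stands in for "the XML processor": the element's name and content are exposed as tag and text, and a missing end tag is rejected.

```python
import xml.etree.ElementTree as ET

tel = ET.fromstring("<tel>513-555-7098</tel>")
element_name = tel.tag      # the name given in the start and end tags
element_content = tel.text  # the text between the start and end tags

# Leaving out the end tag makes the parser raise ParseError.
try:
    ET.fromstring("<tel>513-555-7098")
    missing_end_tag_accepted = True
except ET.ParseError:
    missing_end_tag_accepted = False
```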

    Slide 13

    Names in XML

    Element names must follow certain rules. As we will see, there are other names in XML that follow the same rules.

    Names in XML must start with either a letter or the underscore character (_). The rest of the name consists of letters, digits, the underscore character, the dot (.), or the hyphen (-). Spaces are not allowed in names.

    Finally, names cannot start with the string xml (in any mix of upper and lower case), which is reserved for the XML specification itself.

    Unlike HTML, names are case sensitive in XML.

    By convention, XML element names are frequently written in lowercase. When a name consists of several words, the words are usually separated by a hyphen, as in address-book, or run together with capitals, as in AddressBook. Choose the convention that works best for you, but try to be consistent.
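
    The naming rules above can be sketched as a small checker. This is a simplification, not the full grammar of the XML specification: it only accepts ASCII letters (the specification also allows many non-ASCII letters and the colon), and the function name is_valid_name is made up for this example.

```python
import re

# First character: letter or underscore.
# Remaining characters: letters, digits, underscore, dot or hyphen.
# Simplified to ASCII; the real XML grammar is broader.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_name(name: str) -> bool:
    """Apply the naming rules from this slide to a candidate name."""
    if name.lower().startswith("xml"):
        return False  # reserved for the XML specification itself
    return NAME_PATTERN.fullmatch(name) is not None
```

    With this sketch, address-book and _entry pass, while names such as "first name" (contains a space) and "2nd-entry" (starts with a digit) are rejected.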

    Slide 14

    Names in XML - Quiz

    The following are examples of valid or invalid element names in XML:

    Slide 15

    Attributes

    It is possible to attach additional information to elements in the form of attributes.

    Attributes have a name and a value. The names follow the same rules as element names.

    The syntax is similar to HTML. Elements can have one or more attributes in the start tag, and the name is separated from the value by the equal character.

    The value of the attribute is enclosed in double or single quotation marks. For example, the tel element can have a preferred attribute:

    <tel preferred="true">513-555-8889</tel>

    Unlike HTML, XML insists on the quotation marks. The XML processor would reject the following:

    <tel preferred=true>513-555-8889</tel>
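
    As a hedged illustration with Python's standard xml.etree.ElementTree module, the snippet below reads the preferred attribute written with either kind of quotation mark and confirms that the unquoted form is rejected. The attribute value "true" and the helper name parses are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True when the parser accepts the text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Double and single quotation marks are equivalent.
double_quoted = ET.fromstring('<tel preferred="true">513-555-8889</tel>')
single_quoted = ET.fromstring("<tel preferred='true'>513-555-8889</tel>")
preferred = double_quoted.get("preferred")

# Without quotation marks the start tag is not valid XML.
unquoted_accepted = parses("<tel preferred=true>513-555-8889</tel>")
```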

    Slide 17

    Empty Element

    Elements that have no content are known as empty elements. Usually, they are included in the document for the value of their attributes.

    There is a shorthand notation for empty elements: the start and end tags merge, and the slash from the end tag is added at the end of the opening tag.

    For XML, the following two elements are identical:

    Quiz

    An empty element tag can have attributes. (Yes / No)
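
    The two notations can be compared with a short sketch. The element name email and its href value are hypothetical, since the original pair of elements is missing from this transcript; the parser used is Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical empty element, written in the long and the short notation.
long_form = ET.fromstring('<email href="mailto:someone@example.com"></email>')
short_form = ET.fromstring('<email href="mailto:someone@example.com"/>')

same_name = long_form.tag == short_form.tag
same_attributes = long_form.attrib == short_form.attrib

# Neither form has text content or child elements.
both_empty = (
    long_form.text is None
    and short_form.text is None
    and len(long_form) == 0
    and len(short_form) == 0
)
```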

    Slide 18

    Nesting of Elements

    Element content is not limited to text; elements can contain other elements that in turn can contain text or elements and so on.

    An XML document is a tree of elements. There is no limit to the depth of the tree, and elements can repeat. As you see in Listing 2.1, there are two entry elements in the address-book element. The entry for John Doe has two tel elements. Figure 2.1 is the tree of Listing 2.1. [Refer: XML Example slide]

    An element that is enclosed in another element is called a child. The element it is enclosed in is its parent.

    Jack

    Smith

    Start and end tags must always be balanced, and children are always completely enclosed in their parents. Is the following legal or illegal?

    JackSmith
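
    A small sketch of parent and child elements, using Python's standard xml.etree.ElementTree module. The element names fname and lname are assumptions for illustration, because the tags of the slide's own examples were lost; the second snippet shows overlapping tags being rejected.

```python
import xml.etree.ElementTree as ET

# name is the parent; fname and lname (hypothetical names) are its children.
name = ET.fromstring("<name><fname>Jack</fname><lname>Smith</lname></name>")
children = [(child.tag, child.text) for child in name]


def depth(element) -> int:
    """Number of levels in the tree rooted at element."""
    return 1 + max((depth(child) for child in element), default=0)


# Overlapping tags: fname is closed before the lname nested inside it.
try:
    ET.fromstring("<name><fname>Jack<lname>Smith</fname></lname></name>")
    overlap_accepted = True
except ET.ParseError:
    overlap_accepted = False
```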

    Slide 19

    Root

    At the root of the document there must be one and only one element. In other words, all the other elements in the document must be descendants of a single root element.

    Quiz: Is the following example legal or illegal?

    John Doe

    JackSmith

    Slide 20

    XML Declaration

    The XML declaration is the first line of the document. The declaration identifies the document as an XML document. The declaration also lists the version of XML used in the document.

    The declaration can contain other attributes to support other features such as character set encoding.

    The XML declaration is optional.

    If the declaration is included, however, it must start on the first character of the first line of the document. The XML recommendation suggests you include the declaration in every XML document.
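
    The placement rule can be checked with a short sketch based on Python's standard xml.etree.ElementTree module (the xml_declaration argument of tostring needs Python 3.8 or later). The helper name parses is made up for this example.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True when the parser accepts the text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Ask the serializer to write a declaration in front of the root element.
root = ET.Element("address-book")
serialized = ET.tostring(root, encoding="utf-8", xml_declaration=True)

at_start = parses('<?xml version="1.0"?><address-book/>')
after_space = parses(' <?xml version="1.0"?><address-book/>')  # leading space
no_declaration = parses("<address-book/>")  # the declaration is optional
```

    Even a single space in front of the declaration is enough for the parser to reject the document.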

    Slide 21

    XML Declaration - Stand-alone document

    If an XML document can be read with no reference to external sources, it is said to be a stand-alone document. Such documents can be annotated with a standalone attribute with a value of yes in the XML declaration. If an XML document requires external sources to be resolved to parse correctly and/or to construct the entire data tree (for example, a document with references to external general entities), then it is not a stand-alone document. Such documents may be marked standalone='no', but because this is the default, such an annotation rarely appears in XML documents.

    XML declarations

    Slide 22

    Comments

    To insert comments in a document, enclose them between <!-- and -->.

    Comments are used for notes, indication of ownership, and more. They are intended for the human reader and they are ignored by the XML processor.

    Comments cannot be inserted inside the markup, for example inside a tag. They must appear before or after the markup.
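
    A brief sketch of both statements, assuming Python's standard xml.etree.ElementTree parser with its default settings, which drops comments while building the tree. The comment texts are invented for the example.

```python
import xml.etree.ElementTree as ET

# A comment between elements is accepted and skipped by the default parser.
root = ET.fromstring(
    "<entry><!-- preferred number first --><tel>513-555-8889</tel></entry>"
)
child_tags = [child.tag for child in root]

# A comment placed inside a start tag is not allowed.
try:
    ET.fromstring("<entry <!-- not allowed here -->><tel>513-555-8889</tel></entry>")
    comment_in_tag_accepted = True
except ET.ParseError:
    comment_in_tag_accepted = False
```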

    Slide 23

    Unicode

    Characters in XML documents follow the Unicode standard.

    XML uses the Unicode (ISO/IEC 10646) character set. An XML processor must recognize the UTF-8 and UTF-16 encodings.

    Most processors support other encodings. In particular, for Western European languages, they support ISO 8859-1 (the official name for Latin-1).

    Documents that use an encoding other than UTF-8 or UTF-16 must start with an XML declaration. The declaration must have an encoding attribute to announce the encoding used. For example, a document written in Latin-1 (such as with Windows Notepad) could use the following declaration:

    <?xml version="1.0" encoding="ISO-8859-1"?>

    José Dupont
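
    To see why the encoding attribute matters, here is a minimal sketch with Python's standard xml.etree.ElementTree module. The same Latin-1 bytes are parsed once with and once without the declaration; the element name "name" is an assumption.

```python
import xml.etree.ElementTree as ET

# "\u00e9" is the letter e with an acute accent, a single byte in Latin-1.
declared = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    "<name>Jos\u00e9 Dupont</name>"
).encode("latin-1")
undeclared = "<name>Jos\u00e9 Dupont</name>".encode("latin-1")

# With the declaration, the parser decodes the bytes as Latin-1.
name_text = ET.fromstring(declared).text

# Without it, the parser assumes UTF-8 and the Latin-1 byte is invalid.
try:
    ET.fromstring(undeclared)
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False
```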

    Slide 24

    XML Declaration - Quiz

    How can the XML processor read the encoding parameter? Indeed, to reach the encoding parameter, the processor must read the declaration. However, to read the declaration, the processor needs to know which encoding is being used.

    What about documents that have no declaration (since the declaration is optional)?

    Slide 26

    Predefined Entities in XML

    XML predefines entities for the characters used in markup (angle brackets, quotes, and so on). The entities are used to escape these characters in element or attribute content. The entities are:

    < left angle bracket must be escaped with &lt;

    & ampersand must be escaped with &amp;

    > right angle bracket must be escaped with &gt; in the combination ]]> (see the CDATA sections that follow)

    ' single quote can be escaped with &apos;, essentially in attribute values

    " double quote can be escaped with &quot;, essentially in attribute values

    Quiz - Correct / Incorrect?

    Mark & Spencer

    Mark &amp; Spencer
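
    The quiz can be tried out with Python's standard library, shown here as a sketch: xml.sax.saxutils.escape replaces the markup characters with the predefined entities, and the parser turns them back into plain characters. The element name company is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "Mark & Spencer"
escaped = escape(raw)  # replaces &, < and > with their entities

# The escaped form parses, and the parser restores the original text.
company = ET.fromstring(f"<company>{escaped}</company>")

# The unescaped ampersand is rejected.
try:
    ET.fromstring(f"<company>{raw}</company>")
    unescaped_accepted = True
except ET.ParseError:
    unescaped_accepted = False

# Quotes are only escaped on request, which is useful for attribute values.
attribute_value = escape('5" < 6"', {'"': "&quot;"})
```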

    Slide 27

    Character references

    XML also supports character references, where a letter is replaced by its Unicode character code.

    &#DecimalUnicodeValue;

    Character references that start with &# provide a decimal representation of the character code.

    &#xHexadecimalUnicodeValue;

    Character references that start with &#x provide a hexadecimal representation of the character code.

    Example - Character references

    Martin

    Fran&#231;ais
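
    A short sketch of both notations, using Python's standard xml.etree.ElementTree module. The character c with cedilla has the code 231 in decimal and E7 in hexadecimal; the element name word is made up for the example.

```python
import xml.etree.ElementTree as ET

# The same character written as a decimal and as a hexadecimal reference.
decimal = ET.fromstring("<word>Fran&#231;ais</word>").text
hexadecimal = ET.fromstring("<word>Fran&#xE7;ais</word>").text

# Both references resolve to the character with code point 231.
code_point = ord(decimal[4])
```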

    Slide 28

    Processing Instructions

    Processing instructions (abbreviated PI) are a mechanism to insert non-XML statements, such as scripts, in the document.

    The processing instruction is enclosed in <? and ?>.

    The first name is the target. It identifies the application or the device to which the instructions are directed. The rest of the processing instruction is in a format specific to the target. It does not have to be XML.
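
    As an illustration, the sketch below parses a document that starts with an xml-stylesheet processing instruction and reads its target and data through Python's standard xml.dom.minidom module. The file name style.css is a placeholder, not something taken from the slides.

```python
from xml.dom import minidom

document = minidom.parseString(
    '<?xml-stylesheet href="style.css" type="text/css"?><address-book/>'
)

# The processing instruction is the first node of the document.
pi = document.firstChild
is_processing_instruction = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE

target = pi.target  # the first name: whom the instruction is for
data = pi.data      # the rest: passed through as an uninterpreted string
```

    The data part looks like attributes here, but the parser does not interpret it; making sense of it is left to the target application.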

    Slide 29

    CDATA Sections

    As you have seen, markup characters (left angle bracket and ampersand) that appear in the content of an element must be escaped with an entity. For some applications, it is difficult to escape markup characters, if only because there are too many of them. Also, it is difficult to include an XML document in an XML document.

    CDATA (Character DATA) sections are intended for these cases. CDATA sections are delimited by <![CDATA[ and ]]>. The XML processor ignores all markup except for ]]>.

    PCDATA stands for parsed character data and means the element can contain text. #PCDATA is often (but not always) used for leaf elements. The difference between CDATA and PCDATA is that PCDATA cannot contain unescaped markup characters.

    Slide 31

    CDATA Section - Example

    The following example uses a CDATA section to insert an XML example into an XML document:
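
    The slide's own example is missing from this transcript. As a stand-in sketch, the snippet below wraps a small piece of XML in a CDATA section and parses it with Python's standard xml.etree.ElementTree module; the element name example is an assumption.

```python
import xml.etree.ElementTree as ET

# Everything between <![CDATA[ and ]]> is treated as plain text.
document = "<example><![CDATA[<tel>513-555-7098</tel> & more]]></example>"
example = ET.fromstring(document)

content = example.text     # the embedded markup, kept as characters
child_count = len(example)  # no tel child element was created
```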

    Slide 32

    Well Formed XML

    The end tag matches the corresponding start tag, and there is:

    No overlapping in element definitions.

    No instances of multiple attributes with the same name for one element.

    Syntax conforms to the XML specification:

    Start-tags all have matching end-tags (or are empty-element tags).

    Element tags do not overlap.

    Attributes have unique names.

    Markup characters are properly escaped.

    Elements form a hierarchical tree, with a single root node.

    There are no references to external entities, except if a DTD is provided.
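
    Several of these rules can be checked mechanically. The sketch below treats a document as well formed when Python's standard xml.etree.ElementTree parser accepts it; the function name is_well_formed and the tiny test documents are invented for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True when the parser accepts the text as XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


checks = {
    "balanced tags": is_well_formed("<a><b>text</b></a>"),
    "empty-element tag": is_well_formed('<a><b id="1"/></a>'),
    "overlapping tags": is_well_formed("<a><b>text</a></b>"),
    "duplicate attribute": is_well_formed('<a id="1" id="2"/>'),
    "two root elements": is_well_formed("<a/><b/>"),
    "unescaped ampersand": is_well_formed("<a>R&D</a>"),
    "escaped ampersand": is_well_formed("<a>R&amp;D</a>"),
}
```

    Only the balanced, empty-element and escaped-ampersand cases come back as well formed.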

    Slide 33

    Well formed XML - example

    Suraj

    Kumar

    Verma

    IT Services

    C2

    , even these symbols don't bother it.]]>

    AbhiDhar

    R&D Services

    Slide 34

    Four Common Errors in XML Syntax

    Forget End Tags

    Forget That XML Is Case Sensitive

    Introduce Spaces in the Name of Element

    John Doe

    Forget the Quotes for Attribute Value

    513-555-8889

    Thank you

    XML Technology, Semester 4

    SICSR Executive MBA(IT) @ MindTree, Bangalore, India

    By Neeraj Singh (toneeraj(AT)gmail(DOT)com)
