Lecture Introduction to XML: What is XML

Feb 27, 2018

Duy
  • 7/25/2019 Lecture Introduction to XML Lecture Introduction to XML What is XML.pdfWhat is XML

    1/34

    Copyright IBM Corporation 2004

    Course materials may not be reproduced in whole or in part without the prior written permission of IBM.

    Welcome to:

    3.1

    What Is XML?


    Unit Objectives

    After completing this unit, you should be able to:

    Describe the basic rules of XML

    Describe what it means for an XML document to be well-formed

    List the components that make up an XML document

    Differentiate between XML and HTML

    Describe the internationalization support in XML

    Define some best practices for XML


    What Is XML?

    At its core XML is text formatted to follow a well-defined set of rules.

    XML documents consist primarily of tags and text.

    If you've ever seen the source to an HTML document, then the XML structure should look familiar.

    This text may be stored/represented in:

    A normal file stored on disk

    A message being sent over HTTP

    A character string in a programming language

    A CLOB (character large object) in a database

    Any other way textual data can be used

    XML documents do not need to exist as documents -- they may be:

    Byte streams sent between applications

    Fields in a database record

    Collections of XML Infoset information items

    For simplicity they will be referred to as though they are documents and files.
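
    To illustrate that XML need not live in a file, here is a minimal Python sketch that parses the same text once from a character string and once from a byte stream. The note, to, and body element names are invented for the illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical XML held as an ordinary character string.
xml_text = "<note><to>Ann</to><body>Hello</body></note>"
from_string = ET.fromstring(xml_text)

# The same content as a byte stream, as it might arrive over HTTP.
xml_bytes = xml_text.encode("utf-8")
from_stream = ET.parse(io.BytesIO(xml_bytes)).getroot()

print(from_string.tag, from_string.findtext("to"))
print(from_stream.tag, from_stream.findtext("body"))
```

    Either way the parser produces the same element tree; only the source of the text differs.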


    Example Tree Representation of XML

    XML documents should be thought of as a hierarchical tree structure.

    [Figure: an XML fragment shown next to its tree representation. ROOT is the top node; the leaf values are "Tom Wolfe", "The Right Stuff", and "$6.00".]
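
    The tree view can be sketched in Python by walking a parsed document. Only the three text values come from the slide; the element names (book, author, title, price) are assumptions, since the markup is not preserved.

```python
import xml.etree.ElementTree as ET

# Element names are assumed; only the text values come from the slide.
doc = """<book>
  <author>Tom Wolfe</author>
  <title>The Right Stuff</title>
  <price>$6.00</price>
</book>"""

def walk(element, depth=0):
    """Return (depth, tag, text) tuples in document order."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:
        rows.extend(walk(child, depth + 1))
    return rows

tree_rows = walk(ET.fromstring(doc))
for depth, tag, text in tree_rows:
    print("  " * depth + tag, text)
```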

    A Simple XML Document - Basic Structure

    [Annotated XML listing; the markup itself did not survive in this transcript. The annotations follow in document order, with the surviving element text.]

    "Optional" first line; only required if encoding IS NOT UTF-8 or UTF-16*
    Root element start tag
    First child element with data: Alphabet from A to Z
    Empty element (no data)
    Begin element tag
    Nested child elements: Boreng, Riter
    End element tag
    Element containing an attribute and parsed character data (PCDATA) [TBD]: The letter A is the first in the alphabet. It is also the first of five vowels.
    Comment
    Last element in document: The letter Z is the last letter in the alphabet.
    Root element end tag


    A Simple XML Document - Basic Nomenclature

    The XML instance on the previous page consists of:

    One main element: book

    Subelements: title, isbn, author, chapter, and comment

    author contains other subelements, firstName and lastName

    isbn and chapter contain attributes, number and title, respectively

    title, firstName, and lastName contain only strings:

    Elements that contain numbers, strings, dates, and so forth (TBD) but no subelements (or attributes) are said to have simple types

    isbn and chapter carry attributes; author has subelements:

    Elements that contain subelements or carry attributes are said to have complex types

    Attributes always have simple types (that is, they are numbers, strings, dates, and so forth).

    TBD -- In a later chapter we describe XML Schemas which have access to a collection of built-in simple types
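
    The simple/complex distinction can be expressed as a small check. The sketch below uses a hypothetical instance shaped like the one described; the isbn number is made up, and kind() is an illustrative helper, not part of any XML API.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance; the isbn number is a placeholder value.
doc = """<book>
  <title>Alphabet from A to Z</title>
  <isbn number="0000000000"/>
  <author><firstName>Boreng</firstName><lastName>Riter</lastName></author>
  <chapter title="A">The letter A is the first in the alphabet.</chapter>
</book>"""

def kind(element):
    """'complex' if the element has subelements or attributes, else 'simple'."""
    return "complex" if len(element) > 0 or element.attrib else "simple"

kinds = {el.tag: kind(el) for el in ET.fromstring(doc).iter()}
for tag, value in kinds.items():
    print(tag, value)
```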


    Basics of Well-formed XML (1 of 2)

    XML documents are considered to be well-formed when they adhere to a set of five rules that define basic XML syntax and structure, plus a sixth for worldwide conformity.

    1. There must be a single root element:

    All other elements are nested inside the root element

    2. Elements must be properly terminated:

    For every opening tag "<tag>" there must be a matching closing tag "</tag>"

    The exception is an empty (no content or body) tag "<tag/>"

    3. Elements must be properly nested underneath a parent tag (except for the single, root element):

    A nested tag-pair may not overlap another tag

    There is no limit to the nesting level of children elements


    Basics of Well-formed XML (2 of 2)

    4. Tag names are case sensitive:

    All tag and attribute names, attribute values, and data must comply with XML naming rules.

    5. Attributes, extra information that can be provided for elements, must be properly quoted:

    That is, all attribute values must be in quotes.

    6. The first line should contain the XML declaration, the special tag that identifies the version of the XML specification to apply:

    XML 1.0 is currently the most common.
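
    A parser will reject a document that breaks rules 1 through 5. The following Python sketch wraps the standard-library parser in a hypothetical helper, is_well_formed, and feeds it one sample per rule; the sample markup is invented for the illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False on a ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One illustrative sample per rule; none of these come from the slides.
samples = {
    "ok": '<colors><color name="red"/></colors>',
    "two roots": "<color>red</color><color>green</color>",
    "unterminated": "<colors><color>red</colors>",
    "overlapping": "<b><i>text</b></i>",
    "case mismatch": "<Color>red</color>",
    "unquoted attribute": "<color name=red/>",
}
results = {label: is_well_formed(xml) for label, xml in samples.items()}
for label, ok in results.items():
    print(label, ok)
```

    Rule 6 is not covered here because the XML declaration is optional in XML 1.0, so its absence is not an error.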


    Element Rules - Rule 1. Single Root Element

    All XML documents must have a single root element.

    Legal:

        <colors>
          <color>red</color>
          <color>green</color>
        </colors>

    colors is the root element for this XML.

    Not legal:

        <color>red</color>
        <color>green</color>

    color represents multiple root elements.


    Element Rules - Rule 2. Element Tag Rules

    Elements consist of start and end tags.

    End tag is identified by the /.

    Example: <color>red</color>

    Elements may contain attributes within the start tag.

    Example: Note: The attribute is isbn.

    Empty elements contain no child elements or data.

    These elements can be represented with a special shorthand notation.

    Example:

    Can be shortened to: (preferred)

    Or, if the element has no data as:
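
    The equivalence of the long and shorthand forms can be checked with a parser. In this Python sketch, note is a placeholder element name, not the one used on the slide.

```python
import xml.etree.ElementTree as ET

# "note" is a placeholder element name.
long_form = ET.fromstring("<note></note>")
short_form = ET.fromstring("<note/>")

for el in (long_form, short_form):
    # Both forms yield an element with no text and no children.
    print(el.tag, el.text, len(el))

# When writing, ElementTree emits an empty element in the shorthand form.
serialized = ET.tostring(long_form, encoding="unicode")
print(serialized)
```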


    Element Rules - Rule 3. Element Nesting

    Elements must be properly nested.

    The end tags of inner elements must occur before the end tags of outer elements.

    Any number of child elements or data may be nested within the start and end tags of an element.


    Element Nesting Example

    Legal: [markup not preserved; element content in order: Polo, red, large]

    All elements are properly nested.

    Not legal: [markup not preserved; element content in order: large, red, Polo]

    The element tags are mixed up and not ordered.

    Best Practice: Use indentation to represent the document's hierarchy.

    Important if your document will likely be read by humans.

    Computers and programs don't usually care.
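
    Indentation can be applied mechanically. The Python sketch below uses ET.indent, which is available from Python 3.9; the shirt, style, color, and size element names are assumptions, with values echoing the slide's example.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the values echo the slide's example.
compact = "<shirt><style>Polo</style><color>red</color><size>large</size></shirt>"

root = ET.fromstring(compact)
ET.indent(root, space="  ")  # modifies the tree in place (Python 3.9+)
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

    The indented and compact forms carry the same elements and text; only the whitespace between elements changes.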


    Element Rules - Rule 4. XML Naming Rules

    XML name construction:

    The first character must be A-Z, a-z, or _ (underscore)

    Any number of subsequent letters, numbers, hyphens, periods, colons, and underscore characters.

    XML names are case sensitive.

    Names cannot contain spaces.

    Names must not have a prefix of xml in any case combination (such names are reserved).

    Best Practice: Brevity in tag names is not necessary.

    Use descriptive names for elements and attributes.

    <...> or <...> is far better than <...>.

    Best Practice: Maintain standard naming conventions and quoting.

    Camelback, dot and underscore notation are all common (for example, camelBackNotation, dot.notation, and underscore_notation).
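
    The naming rules as stated on this slide can be approximated with a regular expression. This is a simplified, ASCII-only sketch; the XML specification itself permits a wider range of characters, and is_legal_name is a hypothetical helper.

```python
import re

# First character: a letter or underscore.
# Then any number of letters, digits, hyphens, periods, colons, underscores.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.:\-]*")

def is_legal_name(name):
    """Check a name against the simplified rules listed on the slide."""
    if name.lower().startswith("xml"):
        return False  # names starting with "xml" in any case are reserved
    return NAME_RE.fullmatch(name) is not None

for candidate in ["title", "_street", "name:first", "1name", "-street", "first name", "XmlData"]:
    print(candidate, is_legal_name(candidate))
```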


    Rule 4. Tag Naming - Samples

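    Legal: title, book.isdn, lastName, _street, addrLine1, name:first
    Not legal: 1name, -street, &name
    Comments: Examples of legal and illegal element names.

    Legal: [markup not preserved; element content: red, small]
    Not legal: [markup not preserved; element content: red, small]
    Comments: Element names are case sensitive, and start and end tags must match.

    Legal: [markup not preserved; element content: John]
    Not legal: [markup not preserved; element content: John]
    Comments: Element names must not contain spaces.

    Legal: [markup not preserved; element content: John]
    Not legal: [markup not preserved; element content: John]
    Comments: Elements must not contain any W3C reserved words.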


    Rule 4. Element Content (1 of 2): General

    An XML instance is composed of elements expressed in tag pairs (except for empty tags), plus optional attributes that always have quoted values, and optional data that appears between the element start tag and the element end tag.

    Mixed content - element content that contains data (PCDATA is shown) and other elements.

    Example (snippet; element tags not preserved):

    XMLExample
    Chapter information
    What is XML
    What is HTML
    More chapter information
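
    Mixed content shows up in a parsed tree as text interleaved with child elements. In this Python sketch the chapter and section tag names are assumptions; the text values follow the slide's snippet. ElementTree stores text before the first child in .text and text after a child in that child's .tail.

```python
import xml.etree.ElementTree as ET

# Tag names are assumed; the text values follow the slide's snippet.
chapter = ET.fromstring(
    "<chapter>Chapter information"
    "<section>What is XML</section>"
    "<section>What is HTML</section>"
    "More chapter information</chapter>"
)

pieces = [chapter.text]  # data before the first child element
for child in chapter:
    pieces.append((child.tag, child.text))
    if child.tail:  # data that follows this child's end tag
        pieces.append(child.tail)

for piece in pieces:
    print(piece)
```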


    Rule 4. Element Content (2 of 2): Data

    Element data content is handled in one of two ways:

    1. Parsed Character Data (PCDATA): is examined by the XML parser to discover XML content embedded within it.

    2. Character Data (CDATA): is delimited by the special syntax <![CDATA[ ... ]]> and is not processed by the parser.


    Rule 4. PCDATA - Parsed Character Data

    Predefined entities exist to address ambiguous syntax situations, situations where the literal would be interpreted as part of the XML document syntax rather than its content.

    Examples: &gt; 6 &amp; &lt; 20

    Entity    Description       Character
    &lt;      "less than"       <
    &gt;      "greater than"    >
    &amp;     "ampersand"       &
    &apos;    "apostrophe"      '
    &quot;    "quote"           "
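
    The predefined entities can be applied and reversed programmatically. This Python sketch uses the standard-library xml.sax.saxutils helpers; the expr element name and the sample strings are invented for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "5 < 6 & 10 > 2"
escaped = escape(raw)  # replaces &, < and > with entity references
print(escaped)

# The parser turns the references back into the literal characters.
element = ET.fromstring(f"<expr>{escaped}</expr>")
print(element.text)

# Quotes are only escaped when requested through the extra mapping.
quoted = escape('say "hi"', {'"': "&quot;", "'": "&apos;"})
print(quoted)
print(unescape(escaped))
```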


    Rule 4. CDATA - Character Data

    Syntax: <![CDATA[ text ]]>

    Note: The text may be anything except the literal string "]]>"; to embed "]]>", split it across two CDATA sections, because entity references such as &gt; are not expanded inside CDATA.

    CDATA is not parsed and is treated as-is.

    Useful for embedding other languages within the XML.

    HTML documents.

    XML documents.

    JavaScript source.

    Or any other text with a lot of special characters.

    Generally speaking the escaping rules inside a CDATA section are those of the embedded language.

    For example, an ampersand in JavaScript inside a CDATA section is written as &, not &amp;.
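
    A short Python sketch of both points: CDATA content reaches the application untouched, and an embedded "]]>" can be carried by splitting it across two sections. The script and raw element names, the JavaScript fragment, and the cdata helper are all assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical <script> element; the JavaScript is illustrative only.
doc = "<script><![CDATA[if (a < b && b > 0) { return 1 }]]></script>"
script = ET.fromstring(doc)
print(script.text)

def cdata(text):
    """Wrap text in CDATA, splitting any embedded "]]>" across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

wrapped = ET.fromstring("<raw>" + cdata("a[b[0]]>c") + "</raw>")
print(wrapped.text)
```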


    Rule 4. CDATA Examples

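    These script elements contain JavaScript:

    [listing truncated in this transcript; surviving fragments follow]

    { return 1 } else { return 0 }}
    ]]>

    { return 1 } else { return 0 }}]]>

    This nameXML element stores actual XML to be treated as text:

    [listing truncated in this transcript; surviving fragment follows]

    Sir Frederick of Ledyard's End
    ]]>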


    Element Rules - Rule 5. Element Attributes

    Attributes are used to attach information to elements.

    Attributes consist of a name="value" pair, where the name is a legal XML name. This is often referred to as a "key-value" pair.

    Attributes are placed in the start tag of the element to which they apply.

    An element may have several attributes, each uniquely named.

    Examples (markup not preserved; element content: XML overview, Yacht)

    Notice the different usage of the attribute "type" in the two elements; semantically they are not the same.

    Attributes must have a value.

    Values must be quoted with either double or single quotes.

    Convention is to stick with one or the other.
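
    The attribute rules can be exercised with a parser. In this Python sketch the element names follow the slide's example content, but the attribute values are assumptions, and parses is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Attribute values are invented; both quote styles are accepted.
chapter = ET.fromstring('<chapter type="intro" number=\'1\'>XML overview</chapter>')
print(chapter.attrib)
print(chapter.get("type"))

def parses(text):
    """Return True if the text is accepted by the parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Attribute names must be unique within an element, and must have a value.
duplicate_ok = parses('<boat type="sail" type="motor">Yacht</boat>')
no_value_ok = parses("<boat type>Yacht</boat>")
print(duplicate_ok, no_value_ok)
```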


    Element Rules - Rule 6. XML Declaration (1 of 2)

    The XML Declaration is an optional first line in all XML documents:

    If this declaration is used, the version attribute is mandatory.

    The encoding attribute indicates the character encoding used in the document; if UTF-8 or UTF-16 is used it may be omitted.

    ASCII is a subset of UTF-8 and need not be declared.

    Comments are not allowed before this statement.

    The XML Declaration follows the syntax of a Processing Instruction, or PI, which is described on a subsequent chart, but it is considered to be unique and is treated separately in the 1.0 XML specification.

    GENERAL NOTE OF CAUTION: You cannot always rely on a browser or tool to completely/correctly enforce the specifications. Nor are the specifications always written in language that, to a particular reader, is unambiguous. Still, the best advice is when in doubt, refer to the specification, which for XML is www.w3.org/XML.


    Element Rules - Rule 6. XML Declaration (2 of 2)

    The standalone attribute is included here for completeness: it is used to indicate if this XML document depends on information declared externally to this document (in a DTD or XSL file (TBD), for example); value may be yes or no.

    A value of "yes" indicates there are no external markup declarations; if there are no external markup declarations, the declaration has no meaning.

    A value of "no" indicates there are or may be such external markup declarations; if there are such declarations but there is no standalone declaration, "no" is assumed ... so it is typically not used.

    In any event, the inclusion in the XML instance of references to external entities, such as those in an embedded DTD, does not change its standalone status.

    A bigger issue associated with the standalone attribute is that of defining or setting values in any entity that may be external to the XML instance.

    Arguably, the principal reason for using XML is that it explicitly defines the elements it includes. If attribute values are overridden then the XML instance before us is no longer declarative.


    Comments

    <!-- comment text --> defines a comment.

    A space after the beginning and before the trailing hyphens is recommended but not required.

    A is the first letter
    Z is the last letter

    Improper usage:


    Internationalization and Encoding (1 of 2)

    Support for different character encodings is provided through the encoding attribute of the XML Declaration.

    The encoding attribute indicates the set of characters that are permitted in the document.

    In the absence of an encoding declaration, Unicode UTF-8 or UTF-16 characters may be used.

    Documents exchanged via network may be presented to the processor in an encoding format other than the specified encoding as long as the transport protocol (for example, HTTP) indicates the encoding used.


    Internationalization and Encoding (2 of 2)

    It is very important that the editor and operating system used to write and save an XML document support the encoding specified in the XML Declaration.

    Sample encoding declarations:

    ASCII (subset of UTF-8)

    16 bit UNICODE...

    Japanese

    ...

    Note: Encoding names are case-insensitive
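
    The link between the declared encoding and the bytes on disk can be seen by serializing and re-parsing a document. This Python sketch uses a hypothetical greeting element and ISO-8859-1 as the example encoding; neither comes from the slides.

```python
import xml.etree.ElementTree as ET

root = ET.Element("greeting")  # hypothetical element name
root.text = "café"

# Serializing to a non-UTF-8 encoding makes ElementTree write a declaration
# that names the encoding actually used for the bytes.
latin1 = ET.tostring(root, encoding="ISO-8859-1")
print(latin1)

# The parser reads the declaration and decodes the bytes accordingly.
decoded = ET.fromstring(latin1)
print(decoded.text)
```

    If the declared encoding and the real byte encoding disagree, the parser either fails or produces the wrong characters, which is why the editor must honor the declaration.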


    Processing Instruction

    Syntax: <?target instruction?>

    Processing Instruction is often abbreviated as PI in documentation.

    A feature inherited from SGML.

    Used to embed application-specific instructions in documents.

    The target name immediately follows "<?".
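
    A processing instruction can be read back from a parsed document. This Python sketch uses the standard-library minidom; xml-stylesheet is a widely used PI target, while the href value and the book element are made up for the illustration.

```python
from xml.dom import minidom

# The href value and the <book/> element are placeholders.
doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    "<book/>"
)

# The XML declaration is handled separately and is not reported as a PI node.
pis = [n for n in doc.childNodes if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]
for pi in pis:
    print(pi.target, "->", pi.data)
```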


    Well-formed versus Valid

    A well-formed XML document:

    Consists of XML elements that are nested within one another.

    Has a unique root element.

    Follows the XML naming conventions.

    Follows the XML rules for quoting attributes.

    Has tags that are properly terminated.

    All XML parsers check for well-formedness.

    A valid XML document has an associated vocabulary and obeys the structural rules specified by that vocabulary.

    Associated vocabulary is typically defined by either a DTD or an XML Schema.

    XML parsers may be validating or non-validating depending upon whether or not they can apply an associated grammar.

    Studio is an example of a tool whose XML capabilities include validation.

    HTML versus XML (1 of 2)


    XML is about structured information interchange

    HTML is about presentation and browsing

    [Example residue; markup not preserved: Java Programming, EECS, Paul Thompson, Ron Jones, Uma Abingdon, Lindsay Garmon]

    HTML versus XML (2 of 2)


    [Figure: a course roster shown as HTML and as XML.

    HTML view: Course Roster; XML Programming; Department: EECS; Teacher: Paul Thompson; Student List: Ron Jones, Uma Abingdon, Lindsay Garmon.

    XML view: Course Roster; Java Programming; EECS; Paul Thompson; Ron Jones; Uma Abingdon; Lindsay Garmon.]
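
    Because XML labels the data rather than its appearance, a program can pick out fields by name. In this Python sketch the element names (course, title, department, teacher, student) are assumptions, since the slide's markup is not preserved; the values come from the XML view of the roster.

```python
import xml.etree.ElementTree as ET

# Element names are assumed; the values come from the slide's roster.
roster = ET.fromstring("""
<course>
  <title>Java Programming</title>
  <department>EECS</department>
  <teacher>Paul Thompson</teacher>
  <student>Ron Jones</student>
  <student>Uma Abingdon</student>
  <student>Lindsay Garmon</student>
</course>""")

teacher = roster.findtext("teacher")
students = [s.text for s in roster.findall("student")]
print(teacher)
print(students)
```

    Extracting the same fields from presentation-oriented HTML would depend on layout details such as headings and list positions rather than on element names.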


    Checkpoint Questions (1 of 3)


    1. Basic XML can be described as:

    A. A hierarchical structure of tagged elements, attributes and text.

    B. All the HTML tags plus a set of new XML only tags.

    C. Object-oriented structure of rows and columns.

    D. Processing instructions (PIs) for text data.

    E. Textual data with tags for visual presentation.

    2. Which of these XML fragments is not well-formed?

    A. XML

    B. XML

    C.

    D. XMLXML

    E. XML

    Checkpoint Questions (2 of 3)


    3. XML Comments are allowed (Select all that apply):

    A. Before the XML Declaration

    B. Anywhere

    C. Between element tags

    D. Before the root element

    E. All of the Above

    4. Which of these XML elements with attributes is not well-formed?

    A.

    B.

    C.

    D.

    E.

    F. All of the Above

    Checkpoint Questions (3 of 3)


    5. Which of these comments regarding HTML and XML is not true?

    A. HTML markup is focused on presentation.

    B. XML markup is based on defining the data.

    C. XML is based on HTML.

    D. HTML tags are not case sensitive.

    E. XML tags are case sensitive.

    F. Both XML and HTML support attributes.

    Unit Summary


    Having completed this unit, you should be able to:

    Describe the basic rules of XML

    Describe what it means for an XML document to be well-formed

    List the components that make up an XML document

    Describe the differences between XML and HTML

    Describe the internationalization support in XML

    Describe some best practices in XML