ISOM Standards in Information Management: XML Arijit Sengupta

Dec 20, 2015

Standards in Information Management: XML

Arijit Sengupta

Learning Objectives

• Learn what XML is

• Learn the various ways in which XML is used

• Learn the key companion technologies

• See how XML is being used in industry as a meta-language

Agenda

• Overview

• Syntax and Structure

• The XML Alphabet Soup

• XML as a meta-language

Overview: What is XML?

• A tag-based meta-language
• Designed for structured data representation
• Represents data hierarchically (in a tree)
• Provides context to data (makes it meaningful)
    Self-describing data
• Separates presentation (HTML) from data (XML)
• An open W3C standard
• A subset of SGML
    vs. HTML, which is an application of SGML

Overview: What is XML?

• XML is a “use everywhere” data specification

[Diagram: XML as the exchange format between Documents, Configuration, Database, Application X, and Repository]

Overview: Documents vs. Data

• XML is used to represent two main types of things:
    Documents
        • Lots of text with tags to identify and annotate portions of the document
    Data
        • Hierarchical data structures

Overview: XML and Structured Data

• Pre-XML representation of data:

    "PO-1234","CUST001","X9876","5","14.98"

• XML representation of the same data:

    <PURCHASE_ORDER>
        <PO_NUM> PO-1234 </PO_NUM>
        <CUST_ID> CUST001 </CUST_ID>
        <ITEM_NUM> X9876 </ITEM_NUM>
        <QUANTITY> 5 </QUANTITY>
        <PRICE> 14.98 </PRICE>
    </PURCHASE_ORDER>
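
The difference between the two representations can be sketched in Python with the standard library. This is only an illustration, not part of the original deck: the helper names read_csv_record and read_xml_record are hypothetical. The comma-separated record yields values by position only, while the XML record yields each value together with its element name.

```python
import csv
import io
import xml.etree.ElementTree as ET

CSV_RECORD = '"PO-1234","CUST001","X9876","5","14.98"'

XML_RECORD = """
<PURCHASE_ORDER>
  <PO_NUM> PO-1234 </PO_NUM>
  <CUST_ID> CUST001 </CUST_ID>
  <ITEM_NUM> X9876 </ITEM_NUM>
  <QUANTITY> 5 </QUANTITY>
  <PRICE> 14.98 </PRICE>
</PURCHASE_ORDER>
"""


def read_csv_record(text):
    """Positional data: the reader must already know what each column means."""
    return next(csv.reader(io.StringIO(text)))


def read_xml_record(text):
    """Self-describing data: every value arrives with its element name."""
    root = ET.fromstring(text)
    return {child.tag: child.text.strip() for child in root}


csv_fields = read_csv_record(CSV_RECORD)
xml_fields = read_xml_record(XML_RECORD)

print(csv_fields)
print(xml_fields)
```

With the XML form, a consumer can ask for xml_fields["PRICE"] without knowing that the price happens to be the fifth column.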

Overview: Benefits of XML

• Open W3C standard
• Representation of data across heterogeneous environments
    Cross platform
    Allows for high degree of interoperability
• Strict rules
    Syntax
    Structure
    Case sensitive

Overview: Who Uses XML?

• Submissions by
    Microsoft
    IBM
    Hewlett-Packard
    Fujitsu Laboratories
    Sun Microsystems
    Netscape (AOL), and others…
• Technologies using XML
    SOAP, ebXML, BizTalk, WebSphere, many others…

Agenda

• Overview

• Syntax and Structure

• The XML Alphabet Soup

• XML as a meta-language

Syntax and Structure: Components of an XML Document

• Elements
    Each element has a beginning and ending tag
        • <TAG_NAME>...</TAG_NAME>
    Elements can be empty (<TAG_NAME />)
• Attributes
    Describe an element; e.g. data type, data range, etc.
    Can only appear on the beginning tag
• Processing instructions
    Encoding specification (Unicode by default)
    Namespace declaration
    Schema declaration

Syntax and Structure: Components of an XML Document

    <?xml version="1.0" ?>
    <?xml-stylesheet type="text/xsl" href="template.xsl"?>
    <ROOT>
        <ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
        <ELEMENT2> </ELEMENT2>
        <ELEMENT3 type='string'> </ELEMENT3>
        <ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
    </ROOT>

• Prologue (processing instructions): the first two lines
• Elements: ROOT, ELEMENT1 (with SUBELEMENT1 and SUBELEMENT2), ELEMENT2
• Elements with attributes: ELEMENT3, ELEMENT4

Syntax and Structure: Rules For Well-Formed XML

• There must be one, and only one, root element
• Sub-elements must be properly nested
    A tag must end within the tag in which it was started
• Attributes are optional
    Defined by an optional schema
• Attribute values must be enclosed in "" or ''
• Processing instructions are optional
• XML is case-sensitive
    <tag> and <TAG> are not the same type of element

Syntax and Structure: Well-Formed XML?

    <?xml version="1.0" ?>
    <PARENT>
        <CHILD1>This is element 1</CHILD1>
        <CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
    </PARENT>

• No, CHILD2 and CHILD3 do not nest properly

Syntax and Structure: Well-Formed XML?

    <?xml version="1.0" ?>
    <PARENT>
        <CHILD1>This is element 1</CHILD1>
    </PARENT>
    <PARENT>
        <CHILD1>This is another element 1</CHILD1>
    </PARENT>

• No, there are two root elements

Syntax and Structure: Well-Formed XML?

    <?xml version="1.0" ?>
    <PARENT>
        <CHILD1>This is element 1</CHILD1>
        <CHILD2/>
        <CHILD3></CHILD3>
    </PARENT>

• Yes
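
The three well-formedness checks above can be reproduced with any conforming XML parser. The sketch below uses Python's standard xml.etree.ElementTree, which raises ParseError on input that is not well-formed. The function name is_well_formed and the extra case-mismatch sample are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

DECLARATION = '<?xml version="1.0" ?>'

SAMPLES = {
    # CHILD2 and CHILD3 overlap instead of nesting
    "bad_nesting": DECLARATION + "<PARENT><CHILD1>This is element 1</CHILD1>"
                   "<CHILD2><CHILD3>Number 3</CHILD2></CHILD3></PARENT>",
    # two top-level PARENT elements
    "two_roots": DECLARATION + "<PARENT><CHILD1>This is element 1</CHILD1></PARENT>"
                 "<PARENT><CHILD1>This is another element 1</CHILD1></PARENT>",
    # single root, properly nested, includes an empty element
    "good": DECLARATION + "<PARENT><CHILD1>This is element 1</CHILD1>"
            "<CHILD2/><CHILD3></CHILD3></PARENT>",
    # illustrative extra case: names are case-sensitive, so </TAG> does not close <tag>
    "case_mismatch": DECLARATION + "<tag>text</TAG>",
}


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


results = {name: is_well_formed(text) for name, text in SAMPLES.items()}
print(results)
```

Note that this only checks well-formedness. It says nothing about validity against a DTD or schema.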

Syntax and Structure: An XML Document

    <?xml version='1.0'?>
    <bookstore>
        <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
            <title>The Autobiography of Benjamin Franklin</title>
            <author>
                <first-name>Benjamin</first-name>
                <last-name>Franklin</last-name>
            </author>
            <price>8.99</price>
        </book>
        <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
            <title>The Confidence Man</title>
            <author>
                <first-name>Herman</first-name>
                <last-name>Melville</last-name>
            </author>
            <price>11.99</price>
        </book>
    </bookstore>

Syntax and Structure Namespaces: Overview

• Part of XML’s extensibility
• Allow authors to differentiate between tags of the same name (using a prefix)
    Frees author to focus on the data and decide how to best describe it
    Allows multiple XML documents from multiple authors to be merged
• Identified by a URI (Uniform Resource Identifier)
    When a URL is used, it does NOT have to represent a live server

Syntax and Structure Namespaces: Declaration

Namespace declaration examples:

    xmlns:bk="http://www.example.com/bookinfo/"
    xmlns:bk="urn:mybookstuff.org:bookinfo"

Parts of the declaration xmlns:bk="http://www.example.com/bookinfo/":

    xmlns                                 Namespace declaration
    bk                                    Prefix
    "http://www.example.com/bookinfo/"    URI (URL)

Syntax and Structure Namespaces: Examples

    <BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
        <bk:TITLE>All About XML</bk:TITLE>
        <bk:AUTHOR>Joe Developer</bk:AUTHOR>
        <bk:PRICE currency='US Dollar'>19.99</bk:PRICE>
    </BOOK>

    <bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
             xmlns:money="urn:finance:money">
        <bk:TITLE>All About XML</bk:TITLE>
        <bk:AUTHOR>Joe Developer</bk:AUTHOR>
        <bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE>
    </bk:BOOK>

Syntax and Structure Namespaces: Default Namespace

• An XML namespace declared without a prefix becomes the default namespace for all sub-elements
• All elements without a prefix will belong to the default namespace:

    <BOOK xmlns="http://www.bookstuff.org/bookinfo">
        <TITLE>All About XML</TITLE>
        <AUTHOR>Joe Developer</AUTHOR>
    </BOOK>

Syntax and Structure Namespaces: Scope

• Unqualified elements belong to the inner-most default namespace.
    BOOK, TITLE, and AUTHOR belong to the default book namespace
    PUBLISHER and NAME belong to the default publisher namespace

    <BOOK xmlns="www.bookstuff.org/bookinfo">
        <TITLE>All About XML</TITLE>
        <AUTHOR>Joe Developer</AUTHOR>
        <PUBLISHER xmlns="urn:publishers:publinfo">
            <NAME>Microsoft Press</NAME>
        </PUBLISHER>
    </BOOK>

Syntax and Structure Namespaces: Attributes

• Unqualified attributes do NOT belong to any namespace
    Even if there is a default namespace
• This differs from elements, which belong to the default namespace
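
A namespace-aware parser makes the scoping rules visible. The sketch below, using Python's standard xml.etree.ElementTree, parses the scope example and prints each element's expanded name, which ElementTree writes as {namespace-URI}local-name. The edition attribute is a hypothetical addition for illustration only. It is not in the original example and is there to show that an unqualified attribute stays outside the default namespace.

```python
import xml.etree.ElementTree as ET

# Scope example from the slides; the edition attribute is an
# assumption added to demonstrate how unqualified attributes behave.
DOC = """
<BOOK xmlns="www.bookstuff.org/bookinfo" edition="2">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>
"""

root = ET.fromstring(DOC)

# Elements in document order, each tag expanded to {namespace}localname
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)

# Attribute names: an unqualified attribute carries no {namespace} part
attribute_names = list(root.attrib)
print(attribute_names)
```

BOOK, TITLE, and AUTHOR come out in the book namespace, PUBLISHER and NAME in the publisher namespace, and the attribute name is plain edition.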

Syntax and Structure Entities

• Entities provide a mechanism for textual substitution, e.g.

    Entity    Substitution
    &lt;      <
    &amp;     &

• You can define your own entities
• Parsed entities can contain text and markup
• Unparsed entities can contain any data
    JPEG photos, GIF files, movies, etc.
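
The substitution of the predefined entities can be observed by round-tripping a string through a parser. This is a minimal sketch with the Python standard library. The sample text and the NOTE element name are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "price < 20 & in stock"

# escape() replaces &, < and > with the predefined entity references
escaped = escape(raw)
print(escaped)

# The parser substitutes the entities back when it reads the element content
element = ET.fromstring("<NOTE>" + escaped + "</NOTE>")
parsed_text = element.text
print(parsed_text)

# unescape() performs the same substitution without parsing a document
round_trip = unescape(escaped)
```

Without the escaping step, the raw string would not be well-formed as element content, because < and & are markup characters.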

Agenda

• Overview

• Syntax and Structure

• The XML Alphabet Soup

• XML as a meta-language

The XML ‘Alphabet Soup’

• XML itself is fairly simple

• Most of the learning curve is knowing about all of the related technologies

The XML ‘Alphabet Soup’

• XML (Extensible Markup Language): Defines XML documents
• Infoset (Information Set): Abstract model of XML data; definition of terms
• DTD (Document Type Definition): Non-XML schema
• XSD (XML Schema): XML-based schema language
• XDR (XML-Data Reduced): An earlier XML schema
• CSS (Cascading Style Sheets): Allows you to specify styles
• XSL (Extensible Stylesheet Language): Language for expressing stylesheets; consists of XSLT and XSL-FO
• XSLT (XSL Transformations): Language for transforming XML documents
• XSL-FO (XSL Formatting Objects): Language to describe precise layout of text on a page

The XML ‘Alphabet Soup’

• XPath (XML Path Language): A language for addressing parts of an XML document, designed to be used by both XSLT and XPointer
• XPointer (XML Pointer Language): Supports addressing into the internal structures of XML documents
• XLink (XML Linking Language): Describes links between XML documents
• XQuery (XML Query Language, draft): Flexible mechanism for querying XML data as if it were a database
• DOM (Document Object Model): API to read, create and edit XML documents; creates in-memory object model
• SAX (Simple API for XML): API to parse XML documents; event-driven
• Data Island: XML data embedded in an HTML page
• Data Binding: Automatic population of HTML elements from XML data
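
The difference between the DOM and SAX styles can be sketched with Python's standard library, which ships both xml.dom.minidom and xml.sax. The sample document is a shortened form of the bookstore example, and the TitleHandler class is a hypothetical name. The DOM version loads the whole tree into memory and then queries it; the SAX version reacts to events as the parser streams through the input.

```python
import xml.sax
from xml.dom import minidom

DOC = (
    "<bookstore>"
    "<book><title>The Autobiography of Benjamin Franklin</title></book>"
    "<book><title>The Confidence Man</title></book>"
    "</bookstore>"
)

# DOM: build the in-memory object model, then navigate it
dom = minidom.parseString(DOC)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]


# SAX: no tree is built; the handler is called back for each parsing event
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_titles = handler.titles

print(dom_titles)
print(sax_titles)
```

DOM is convenient when the document must be edited or traversed repeatedly. SAX keeps memory use low for large documents, at the cost of tracking state by hand.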

The XML ‘Alphabet Soup’ Schemas: Overview

• DTD (Document Type Definitions)
    Not written in XML
    No support for data types or namespaces
• XSD (XML Schema Definition)
    Written in XML
    Supports data types
    Current standard recommended by W3C

The XML ‘Alphabet Soup’ Schemas: Purpose

• Define the “rules” (grammar) of the document
    Data types
    Value bounds
• An XML document that conforms to a schema is said to be valid
    More restrictive than well-formed XML
• Define which elements are present and in what order
• Define the structural relationships of elements

The XML ‘Alphabet Soup’ Schemas: DTD Example

• XML document:

    <BOOK>
        <TITLE>All About XML</TITLE>
        <AUTHOR>Joe Developer</AUTHOR>
    </BOOK>

• DTD schema:

    <!DOCTYPE BOOK [
        <!ELEMENT BOOK (TITLE+, AUTHOR) >
        <!ELEMENT TITLE (#PCDATA) >
        <!ELEMENT AUTHOR (#PCDATA) >
    ]>

The XML ‘Alphabet Soup’ Schemas: XSD Example

• XML document:

    <CATALOG>
        <BOOK>
            <TITLE>All About XML</TITLE>
            <AUTHOR>Joe Developer</AUTHOR>
        </BOOK>
        …
    </CATALOG>

The XML ‘Alphabet Soup’ Schemas: XSD Example

    <xsd:schema id="NewDataSet"
                targetNamespace="http://tempuri.org/schema1.xsd"
                xmlns="http://tempuri.org/schema1.xsd"
                xmlns:xsd="http://www.w3.org/1999/XMLSchema"
                xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
        <xsd:element name="book">
            <xsd:complexType content="elementOnly">
                <xsd:all>
                    <xsd:element name="title" minOccurs="0" type="xsd:string"/>
                    <xsd:element name="author" minOccurs="0" type="xsd:string"/>
                </xsd:all>
            </xsd:complexType>
        </xsd:element>
        <xsd:element name="Catalog" msdata:IsDataSet="True">
            <xsd:complexType>
                <xsd:choice maxOccurs="unbounded">
                    <xsd:element ref="book"/>
                </xsd:choice>
            </xsd:complexType>
        </xsd:element>
    </xsd:schema>

The XML ‘Alphabet Soup’ Schemas: Why You Should Use XSD

• Newest W3C Standard
• Broad support for data types
• Reusable “components”
    Simple data types
    Complex data types
• Extensible
• Inheritance support
• Namespace support
• Ability to map to relational database tables
• XSD support in Visual Studio.NET

The XML ‘Alphabet Soup’ Transformations: XSL

• Language for expressing document styles
• Specifies the presentation of XML
    More powerful than CSS
• Consists of:
    XSLT
    XPath
    XSL Formatting Objects (XSL-FO)

The XML ‘Alphabet Soup’ Transformations: Overview

• XSLT: a language used to transform XML data into a different form (commonly XML or HTML)

[Diagram: XML input → XSLT → XML, HTML, … output]

The XML ‘Alphabet Soup’ Transformations: XSLT

• The language used for converting XML documents into other forms
• Describes how the document is transformed
• Expressed as an XML document (.xsl)
• Template rules
    Patterns match nodes in source document
    Templates instantiated to form part of result document
• Uses XPath for querying, sorting, etc.

The XML ‘Alphabet Soup’ XPath (XML Path Language)

• General purpose query language for identifying nodes in an XML document

• Declarative (vs. procedural)

• Contextual – the results depend on current node

• Supports standard comparison, Boolean and mathematical operators (=, <, and, or, *, +, etc.)

The XML ‘Alphabet Soup’ XPath Operators

    Operator    Usage Description
    /           Child operator: selects only immediate children (when at the beginning of the pattern, context is root)
    //          Recursive descent: selects elements at any depth (when at the beginning of the pattern, context is root)
    .           Indicates current context
    ..          Selects the parent of the current node
    *           Wildcard
    @           Prefix to attribute name (when alone, it is an attribute wildcard)
    [ ]         Applies filter pattern

The XML ‘Alphabet Soup’ XPath Query Examples

    ./author
        (finds all author elements within current context)
    /bookstore
        (find the bookstore element at the root)
    /*
        (find the root element)
    //author
        (find all author elements anywhere in document)
    /bookstore[@specialty = "textbooks"]
        (find all bookstores where the specialty attribute = "textbooks")
    //book[@style = /bookstore/@specialty]
        (find all books where the style attribute = the specialty attribute of the bookstore element at the root)

More XPath Examples

Path Expression, followed by its Result:

    /bookstore/book[1]
        Selects the first book element that is the child of the bookstore element
    /bookstore/book[last()]
        Selects the last book element that is the child of the bookstore element
    /bookstore/book[last()-1]
        Selects the last but one book element that is the child of the bookstore element
    /bookstore/book[position()<3]
        Selects the first two book elements that are children of the bookstore element
    //title[@lang]
        Selects all the title elements that have an attribute named lang
    //title[@lang='eng']
        Selects all the title elements that have an attribute named lang with a value of 'eng'
    /bookstore/book[price>35.00]
        Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
    /bookstore/book[price>35.00]/title
        Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00
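
Several of these path expressions can be tried with Python's standard xml.etree.ElementTree, with two caveats. ElementTree implements only a subset of XPath: paths are written relative to the element they are called on (here the bookstore root), and numeric comparisons such as price>35.00 are not supported, so that filter is done in plain Python. The sketch runs against the bookstore document shown earlier, and the threshold of 10.00 is an assumption chosen because both sample prices are below 35.00.

```python
import xml.etree.ElementTree as ET

DOC = """
<bookstore>
  <book genre="autobiography" publicationdate="1981" ISBN="1-861003-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
    <price>8.99</price>
  </book>
  <book genre="novel" publicationdate="1967" ISBN="0-201-63361-2">
    <title>The Confidence Man</title>
    <author><first-name>Herman</first-name><last-name>Melville</last-name></author>
    <price>11.99</price>
  </book>
</bookstore>
"""

root = ET.fromstring(DOC)  # root is the bookstore element

# Positional predicates: counterparts of /bookstore/book[1] and book[last()]
first_title = root.find("book[1]/title").text
last_title = root.find("book[last()]/title").text

# Recursive descent: counterpart of //last-name
last_names = [element.text for element in root.findall(".//last-name")]

# Attribute filter: counterpart of /bookstore/book[@genre='novel']
novel_isbns = [book.get("ISBN") for book in root.findall("book[@genre='novel']")]

# Numeric comparison is outside ElementTree's XPath subset, so filter in Python
over_ten = [
    book.findtext("title")
    for book in root.findall("book")
    if float(book.findtext("price")) > 10.00
]

print(first_title, last_title, last_names, novel_isbns, over_ten, sep="\n")
```

A full XPath engine would accept the expressions exactly as written in the table, including the comparison predicates.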

XPath Functions

• Accessor functions:
    node-name, data, base-uri, document-uri
• Numeric value functions:
    abs, ceiling, floor, round, …
• String functions:
    compare, concat, substring, string-length, upper-case, lower-case, starts-with, ends-with, matches, replace, …
• Other functions include functions on boolean values, dates, nodes, etc.

The XML ‘Alphabet Soup’ Data Islands

• XML embedded in an HTML document
• Manipulated via client side script or data binding

    <XML id="XMLID">
        <BOOK>
            <TITLE>All About XML</TITLE>
            <AUTHOR>Joe Developer</AUTHOR>
        </BOOK>
    </XML>

    <XML id="XMLID" src="mydocument.xml">

The XML ‘Alphabet Soup’ Data Islands

• Can be embedded in an HTML SCRIPT element

    <SCRIPT language="xml" id="XMLID">
    <SCRIPT type="text/xml" id="XMLID">
    <SCRIPT language="xml" id="XMLID" src="mydocument.xml">

• XML is accessible via the DOM

The XML ‘Alphabet Soup’ XML-Based Applications

• Microsoft SQL Server
    Retrieve relational data as XML
    Query XML data
    Join XML data with existing database tables
    Update the database via XML Updategrams
    New XML data type in SQL Server 2005
• Microsoft Exchange Server
    XML is native representation of many types of data
    Used to enhance performance of UI scenarios (for example, Outlook Web Access (OWA))

Agenda

• Overview

• Syntax and Structure

• The XML Alphabet Soup

• XML as a meta-language

XML as a Meta-Language

A Language to create Languages

[Diagram labels: XML/DTD, CSS, XSL, DSSSL, XSLT, DOM, SAX, XLL, XSchema, XPath, XPointer, MathML, BeanML, CML, WML, XQL, GO]

Gene Ontology (GO)

• Describing and manipulating information about the molecular function, biological process and cellular component of gene products.

• Gene Ontology website: http://www.geneontology.org

• GO DTD: ftp://ftp.geneontology.org/pub/go/xml/dtd/go.dtd

• GO Browsers and tools: http://www.geneontology.org/#tools

• GO Resources and samples: http://www.geneontology.org/#annotations

Math ML

• Describing and manipulating mathematical notations
• MathML website
    www.w3.org/Math
• MathML DTD
    www.w3.org/Math/DTD
• MathML Browser
    www.w3.org/Amaya
• MathML Resources
    www.webeq.com/mathml
    see sample documents here

Chemical ML

• Representing molecular and chemical information
• CML website
    www.xml-cml.org
• CML DTD
    www.xml-cml.org/dtdschema/index.html
• CML Browser and Authoring Environment
    www.xml-cml.org/jumbo.html
• CML Resources
    www.xml-cml.org/chimeral/index.html
    see sample documents here
    some require plug-in downloads, can be slow

Wireless ML

• Allows web pages to be displayed on mobile devices
• WML works with WAP to deliver the content
• Underlying model: Deck of Cards that the User can sift through
• WAP/WML website
    www.wapforum.org
• WML DTD
    www.wapforum.org/DTD/wml_1.1.xml
• WAP/WML Resources
    www.oasis-open.org/cover/wap-wml.html
    www.w3scripts.com/wap
    Tutorial on WML, also see WAP Demo

Scalable Vector Graphics

• Describing vector graphics data for use over the web
• Rendering is done on the browser
• Bandwidth demands lower, scaling easier
• SVG website
    www.w3.org/Graphics/SVG
• SVG Plug-Ins
    www.adobe.com/svg
• SVG Resources
    www.irt.org/articles/js176
        1999 article and good, brief tutorial
    planet.svg
        An Example from Deitel

Bean ML

• Describing software components such as Java Beans
• Defines how the components are interconnected and can be used
• Bean ML Specs and Tools
    www.alphaworks.ibm.com/aw.nsf/techmain/bml
• Bean ML Resources
    www.oasis-open.org/cover/beanML.html
    With Bean ML
        • You can mark-up beans using Bean ML
        • And invoke different operations on Beans
        • Includes BML Scripting Framework

XBRL

• Extensible Business Reporting Language
• Capturing and representing financial and accounting information
• Variety of situations
    e.g. publishing reports, extracting data for analysis, regulatory forms etc.
• Initiated under the direction of AICPA
• XBRL website
    www.xbrl.org
• XBRL DTDs and Schemas
    http://www.xbrl.org/Core/2000-07-31/default.htm
• Demos and Tools
    http://www.xbrl.org/Demos/demos.htm
    http://www.xbrl.org/Tools.htm

News ML

• Designed to be media-independent
• Initiated by International Press Telecommunications Council
• Enables tracking of news stories over time
• NewsML website
    www.newsml.org
• NewsML DTD
    http://www.oasis-open.org/cover/newsML.html
• SportsML DTD (derived from NewsML DTD)
    http://xml.coverpages.org/sportsML.html

cXML

• CommerceXML from Ariba plus 40 other companies
• cXML website
    www.cxml.org
• Primary Set of Tools/Implementations to support cXML
    http://www.ariba.com/solutions/solutions_overview.cfm
    See also Whitepapers link explaining how these can be used for
        • E-procurement
        • E-fulfillment
        • And others ..

xCBL

• xCBL from Microsoft, SAP, Sun
• xCBL website
    www.xcbl.org
    Marketed as XML component library for B2B e-commerce
• Available Resources (see internal links)
    DTDs and Schemas
    XDK: SOX Parser and an XSLT Engine
    Example Documents

ebXML

• UN/CEFACT: the United Nations body whose mandate covers worldwide policy and technical development in the area of trade facilitation and electronic business.
    www.uncefact.org
• ebXML website
    www.ebxml.org
• Current Endorsements
    http://www.ebxml.org/endorsements.htm
    Still needs buy-in from the larger IS/IT vendors
• Related Effort: RosettaNet
    http://www.rosettanet.org/rosettanet/Rooms/DisplayPages/LayoutInitial
    Business Processes for IT, Component and Chip companies

Conclusion

• Overview

• Syntax and Structure

• The XML Alphabet Soup

• XML as a meta-language

Resources

• http://www.xml.com/
• http://www.w3.org/xml/
• http://www.w3schools.com/
• http://msdn.microsoft.com/xml/