Structuring XML Using DTD and Schema
Ching-Long Yeh, PhD, 葉 慶 隆
Department of Computer Science and Engineering
Tatung University
Taipei 104, Taiwan
Email: [email protected]
URL: http://www.cse.ttu.edu.tw/~chingyeh
Content
• XML Document Basics
• DTD Syntax Review
• An Introduction to XML Schema
• Types of Interaction with Document
• DTD in Electronic Business
• Conclusion
XML Document Basics
Structure, Content, and Format
• Central to XML is the concept that documents have structure, content, and format.
• These three ingredients combine to form a document.
• They interrelate in subtle ways, and you can easily confuse them as you work with your documents.
What is Structure?
• The structure defines how the document is laid out and in what order its elements are assembled.
• For example, a bicycle assembly manual might consist of the following sections, in this order:
– an introduction that describes the document and lists the manufacturer's address,
– assembly instructions,
– a parts list,
– instructions for ordering replacement parts,
– troubleshooting advice, and
– an index.
What is Content?
• Content is the actual data within a document.
• The words and illustrations that make up a bicycle assembly manual are its contents.
What is Format?
• Format consists of how the words, sentences, and paragraphs are visually presented and distinguished from one another within a document.
• Boldface for titles, italics for special terms, and blank lines between sections are examples of document formatting.
• People often confuse format with structure.
Why Are Structure, Content, and Format Important in XML?
• XML defines the structure and separates the content from the delivery-specific format.
• Through this approach, the actual document — its content and structure — becomes mobile.
Indicating Structure Through Visual Cues
Using Structures in XML
<?xml version="1.0"?>
<!DOCTYPE ADVISORY SYSTEM "advisory.dtd">
<ADVISORY>
  <IDINFO>
    <ADVNBR>Number: 146</ADVNBR>
    <TYPE>Type: 146</TYPE>
    <DATEISS>Date: 8/15/95</DATEISS>
    <DATEREV>Revised: 9/29/95</DATEREV>
    <PRODUCT>Model 501 Nebulation</PRODUCT>
  </IDINFO>
  <SUBJECT>Subject: Revised Replacement Parts List (AnyCorp Model 501)</SUBJECT>
  <SUBSEC>
    <TITLE>Model 501 User-Replaceable Parts</TITLE>
    <PARA>The parts list identified in the AnyCorp Model 501 User's Maintenance Guide has been superseded, effective immediately. User-Replaceable parts are identified in the revised part list below. Parts orders which reference items on the previous list (dated 2/5/94) will be honored up to 3/14/96. Customers are advised to order from this revised list in order that they may achieve higher reliability at a lower unit cost. Questions on this subject should be directed to the Central Spares Organization.</PARA>
  </SUBSEC>
Well-Formed and Valid Documents
• XML has two different notions of “correct.”
• Valid documents
– Declare conformance to a DTD in a document type declaration
– “Using the right words in the right place”
– Type-valid
• Well-formed documents
– Markup is intelligible.
– “Getting the pronunciation right”
– Non-type-valid
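The difference between the two notions can be seen with a non-validating parser, which checks well-formedness only. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed is an assumption made for illustration, and the check says nothing about validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise.

    This only tests well-formedness; it does not validate against a DTD.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Matching start and end tags: well-formed.
print(is_well_formed("<label><state>MD</state></label>"))
# Element names are case-sensitive, so <State>...</state> is not well-formed.
print(is_well_formed("<label><State>MD</state></label>"))
```

A document that passes this check may still be invalid, because validity additionally requires conformance to the declared document type.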
Example: Table
Example: Database Publishing
Example: A DTD for B2B EC
• RosettaNet PIP 3A2, Price and Availability Query, Version 1.2. Available at http://www.rosettanet.org
DTD Syntax Review
DTD Syntax
• Seven major headings:
– document type declarations
– element types
– attributes
– entities
– notations
– conditional sections
– processing instructions
Document Type Declaration
• A document type declaration defines constraints on the logical structure of a document and supports the use of predefined storage units.
• The XML document type declaration contains or points to markup declarations that provide a grammar for a class of documents.
Document Type Declaration
<?xml version="1.0"?>
<!DOCTYPE label [
  <!ELEMENT label (name,street,city,state,country,code)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
  <!ELEMENT code (#PCDATA)>
]>
<label>
  <name>Rock N. Robyn</name>
  <street>Jay Bird Street</street>
  <city>Baltimore</city>
  <state>MD</state>
  <country>USA</country>
  <code>43214</code>
</label>
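To make the role of the content model concrete, here is a small sketch that parses the label document with Python's standard library and compares the order of the child elements with the sequence declared for label. The standard-library parser is non-validating, so the comparison is written by hand; the names LABEL_DOC, EXPECTED_ORDER, and follows_label_model are assumptions for illustration, not part of any DTD tooling.

```python
import xml.etree.ElementTree as ET

# The label document with its internal DTD subset.
LABEL_DOC = """<?xml version="1.0"?>
<!DOCTYPE label [
  <!ELEMENT label (name,street,city,state,country,code)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
  <!ELEMENT code (#PCDATA)>
]>
<label>
  <name>Rock N. Robyn</name>
  <street>Jay Bird Street</street>
  <city>Baltimore</city>
  <state>MD</state>
  <country>USA</country>
  <code>43214</code>
</label>"""

# The sequence required by <!ELEMENT label (name,street,city,state,country,code)>.
EXPECTED_ORDER = ["name", "street", "city", "state", "country", "code"]


def follows_label_model(text):
    """Hand-written check that mimics the content model of <label>."""
    root = ET.fromstring(text)
    return root.tag == "label" and [child.tag for child in root] == EXPECTED_ORDER


print(follows_label_model(LABEL_DOC))
```

A validating parser would derive this check from the DTD itself instead of relying on a hard-coded list.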
Document Type Declaration
<?xml version="1.0"?>
<!DOCTYPE LABEL SYSTEM "http://www.sgmlsource.com/dtds/label.dtd">
<LABEL>. . .</LABEL>
Element Type Declaration
• Elements provide the basic logical structure for XML documents.
Element Type Declaration
[45] elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
[46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
Element Type Declaration
<!ELEMENT spec (front, body, back?)>
<!ELEMENT div1 (head, (p | list | note)*, div2*)>
<!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*>
<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
<!ELEMENT b (#PCDATA)>
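Content models such as (front, body, back?) behave like regular expressions over child element names. The sketch below makes that analogy explicit by translating a simple content model into a Python regular expression. The function names content_model_to_regex and children_match are assumptions for illustration; parameter-entity references (such as %div.mix;) and mixed content with #PCDATA are deliberately not handled.

```python
import re


def content_model_to_regex(model):
    """Compile a simple DTD children content model into a regular expression.

    The regex matches a string of child element names, each followed by one
    space. Parameter-entity references and #PCDATA are not handled.
    """
    tokens = re.findall(r"[A-Za-z_][\w.-]*|[(),|?*+]", model)
    pieces = []
    for token in tokens:
        if token == ",":
            continue  # a DTD sequence is plain concatenation in a regex
        if token == "(":
            pieces.append("(?:")
        elif token in (")", "|", "?", "*", "+"):
            pieces.append(token)  # same meaning in both notations
        else:
            pieces.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(pieces))


def children_match(model, child_names):
    """Check a list of child element names against a content model."""
    text = "".join(name + " " for name in child_names)
    return content_model_to_regex(model).fullmatch(text) is not None


print(children_match("(front, body, back?)", ["front", "body"]))
print(children_match("(front, body, back?)", ["body", "front"]))
print(children_match("(head, (p | list | note)*, div2*)", ["head", "p", "note", "div2"]))
```

The translation works because the DTD operators ?, *, + and | have the same meaning in regular expressions, while the comma simply denotes concatenation.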
Attributes
• Attributes provide meta-data for elements, such as a security level, a revision status, or a unique identifier.
• Use an attribute list declaration to declare attributes for an element:
<!ATTLIST sample
    id     ID            #IMPLIED
    n      CDATA         #REQUIRED
    status (draft|final) "final">
Each row gives the attribute name, the attribute type, and the default value.
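The effect of a default value can be observed with Python's standard xml.parsers.expat module, which reads the internal DTD subset and reports defaulted attributes together with the specified ones. This is a sketch under assumptions: the document SAMPLE_DOC and the helper reported_attributes are made up for illustration, and the parser is non-validating, so it does not complain if the #REQUIRED attribute n is missing.

```python
import xml.parsers.expat

# A hypothetical instance that specifies only the required attribute n.
SAMPLE_DOC = """<!DOCTYPE sample [
  <!ELEMENT sample EMPTY>
  <!ATTLIST sample
      id     ID            #IMPLIED
      n      CDATA         #REQUIRED
      status (draft|final) "final">
]>
<sample n="42"/>"""


def reported_attributes(text):
    """Return {element name: attributes} as reported by the parser."""
    reported = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attributes):
        reported[name] = dict(attributes)

    parser.StartElementHandler = on_start
    parser.Parse(text, True)
    return reported


# status is not written in the instance; it comes from the declared default.
print(reported_attributes(SAMPLE_DOC))
```

The #IMPLIED attribute id has no default, so it is simply absent when the instance does not specify it.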
Entities
• There are two types of entities:
– general entities: apply within the top-level element and its attribute values.
– parameter entities: apply within the internal and external DTD subsets.
Entities: General Entities
Declaration:
<!ENTITY xml "Extensible Markup Language">
Usage:
<para>The &xml; is derived from ISO 8879, an International Standard<index label="&xml;"/></para>
After expansion:
<para>The Extensible Markup Language is derived from ISO 8879, an International Standard<index label="Extensible Markup Language"/></para>
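The expansion can be reproduced with Python's standard xml.etree.ElementTree parser, which replaces references to general entities declared in the internal DTD subset. This is a minimal sketch; the DOCTYPE wrapper and the name ENTITY_DOC are assumptions added so that the fragment forms a complete document.

```python
import xml.etree.ElementTree as ET

# The entity declaration placed in an internal DTD subset (assumed wrapper).
ENTITY_DOC = """<!DOCTYPE para [
  <!ENTITY xml "Extensible Markup Language">
]>
<para>The &xml; is derived from ISO 8879, an International Standard<index label="&xml;"/></para>"""

para = ET.fromstring(ENTITY_DOC)

# The reference is expanded in element content ...
print(para.text)
# ... and in attribute values.
print(para.find("index").get("label"))
```

The application never sees &xml; itself, only the replacement text.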
Notations
• Notations are used to include non-XML content (such as graphics, sounds, video, or source-code listings) in XML documents.
• While the XML parser knows nothing about the specific notations, it can pass them on to the processing software to let it know what kinds of data to handle.
<!NOTATION TeX PUBLIC "+//ISBN 0-201-13448-9::Knuth//NOTATION The TeXbook//EN">
Conditional Sections
• In the external DTD subset and external parameter entities, XML allows conditional sections that the parser can include or ignore, depending on the value of the keyword at the start.
<![IGNORE[
  <!ELEMENT para (#PCDATA)>
]]>
<!ENTITY % include-para "IGNORE">
<![%include-para;[
  <!ELEMENT para (#PCDATA)>
]]>
Overriding a parameter entity:
<!DOCTYPE book SYSTEM "book.dtd" [
  <!ENTITY % include-para "INCLUDE">
]>
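The override works because the internal subset is read before the external subset and the first declaration of an entity is the binding one. The following sketch simulates that rule with regular expressions. It is not a DTD parser: the function name resolve_conditional_sections and both patterns are assumptions for illustration, and nested conditional sections are not handled.

```python
import re

# Matches declarations such as <!ENTITY % include-para "IGNORE">
ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>')
# Matches <![KEYWORD[ ... ]]> where KEYWORD may be a parameter-entity reference.
COND_SECTION = re.compile(
    r"<!\[\s*(%[\w.-]+;|INCLUDE|IGNORE)\s*\[(.*?)\]\]>", re.S
)


def resolve_conditional_sections(internal_subset, external_subset):
    """Return the external subset with conditional sections resolved."""
    bindings = {}
    # The internal subset comes first, and the first declaration wins.
    for name, value in ENTITY_DECL.findall(internal_subset + external_subset):
        bindings.setdefault(name, value)

    def expand(match):
        keyword, body = match.group(1), match.group(2)
        if keyword.startswith("%"):
            keyword = bindings[keyword[1:-1]]  # strip the leading % and trailing ;
        return body.strip() if keyword == "INCLUDE" else ""

    return COND_SECTION.sub(expand, external_subset)


book_dtd = """<!ENTITY % include-para "IGNORE">
<![%include-para;[ <!ELEMENT para (#PCDATA)> ]]>"""

# Without an override the section is ignored ...
print(resolve_conditional_sections("", book_dtd))
# ... and with the internal-subset override it is included.
print(resolve_conditional_sections('<!ENTITY % include-para "INCLUDE">', book_dtd))
```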
Processing Instructions
• The XML parser passes PIs on to your application, but it is up to you to do something useful with them.
<?IS10744:arch name="abc"?>
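As a sketch of how a parser hands processing instructions to the application, the example below registers a handler with Python's standard xml.parsers.expat module. The helper name collect_processing_instructions and the one-element document are assumptions for illustration; what to do with the collected target and data is left to the application.

```python
import xml.parsers.expat


def collect_processing_instructions(text):
    """Return a list of (target, data) pairs for every PI in the document."""
    found = []
    parser = xml.parsers.expat.ParserCreate()
    # The parser calls the handler with the PI target and its raw data string.
    parser.ProcessingInstructionHandler = lambda target, data: found.append((target, data))
    parser.Parse(text, True)
    return found


doc = '<?IS10744:arch name="abc"?><doc/>'
print(collect_processing_instructions(doc))
```

The parser does not interpret the data part; name="abc" arrives as a plain string.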
Introduction to XML Schema
Introduction
• The new XML Schema system aims at providing a rich grammatical structure for XML documents that overcomes the limitations of the DTD.
What is a Schema?
• A schema is a model for describing the structure of information.
• In the context of XML, a schema describes a model for a whole class of documents.
• A schema might also be viewed as an agreement on a common vocabulary for a particular application that involves exchanging documents.
What is a Schema?
• In schemas, models are described in terms of constraints.
• There are two kinds of constraints:
– content model constraints describe the order and sequence of elements, and
– datatype constraints describe valid units of data.
What is a Schema?
• For example, a schema might describe a valid <address> with the content model constraint that
– it consists of a <name> element, followed by
– one or more <street> elements, followed by
– exactly one <city>, <state>, and <zip> element.
– The content of a <zip> might have a further datatype constraint that it consist of either a sequence of exactly five digits or a sequence of five digits, followed by a hyphen, followed by a sequence of exactly four digits. No other text is a valid ZIP code.
<address>
  <name>Namron H. Slaw</name>
  <street>256 Eight Bit Lane</street>
  <city>East Yahoo</city>
  <state>MA</state>
  <state>CT</state>
  <zip>blue</zip>
</address>
Invalid: there are two <state> elements, and "blue" is not a valid ZIP code.
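Both kinds of constraint from the address example can be sketched by hand in Python. The code below is an illustration under assumptions, not a schema processor: the names CONTENT_MODEL, ZIP_CODE, and address_problems are made up, and the two regular expressions are a direct transcription of the constraints described above.

```python
import re
import xml.etree.ElementTree as ET

# Content model: name, one or more street, then exactly one city, state, zip.
CONTENT_MODEL = re.compile(r"name (street )+city state zip ")
# Datatype: five digits, optionally followed by a hyphen and four digits.
ZIP_CODE = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def address_problems(text):
    """Return the list of violated constraints for an <address> document."""
    address = ET.fromstring(text)
    problems = []
    child_names = "".join(child.tag + " " for child in address)
    if not CONTENT_MODEL.fullmatch(child_names):
        problems.append("content model")
    for zip_element in address.iter("zip"):
        if not ZIP_CODE.fullmatch(zip_element.text or ""):
            problems.append("zip datatype")
    return problems


slide_address = """<address>
  <name>Namron H. Slaw</name>
  <street>256 Eight Bit Lane</street>
  <city>East Yahoo</city>
  <state>MA</state>
  <state>CT</state>
  <zip>blue</zip>
</address>"""

print(address_problems(slide_address))
```

The sample address fails both checks: the second <state> breaks the content model, and the text blue breaks the datatype constraint.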
Limitations of DTD
• XML inherited DTDs from SGML.
• DTDs can be used to define content models and, to a limited extent, the datatypes of attributes, but they have a number of obvious limitations:
– different (non-XML) syntax
– no support for namespaces
– extremely limited datatyping
– a complex and fragile extension mechanism based on little more than string substitution (no explicit relationship)
Features of Schema
• Richer datatypes
– booleans, numbers, dates and times, URIs, integers, decimal numbers, real numbers, intervals of time, etc.
• User-defined types
• Attribute grouping
• Refinable archetypes
• Namespace support
Validity
• Reasons to validate documents:
– EC: confirm that what you received is exactly what you expect.
– B2B: validate data before inserting it into your database.
– XML documents used for control purposes
• Content model validity tests whether the order and nesting of tags are correct.
• Datatype validity is the ability to test whether specific units of information are of the correct type and fall within the specified legal values.
Illustrations of XML Schema
An XML document fragment:
<InvoiceNo>123456789</InvoiceNo>
<ProductID>J123456</ProductID>
DTD fragment describing the above elements:
<!ELEMENT InvoiceNo (#PCDATA)>
<!ELEMENT ProductID (#PCDATA)>
XML Schema fragment describing the above elements:
<element name='InvoiceNo' type='positive-integer'/>
<element name='ProductID' type='ProductCode'/>
<simpleType name='ProductCode' base='string'>
  <pattern value='[A-Z]{1}\d{6}'/>
</simpleType>
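The datatype checks that the schema fragment adds over the DTD can be imitated in a few lines of Python. This sketch reads the ProductCode pattern as one uppercase letter followed by six digits and treats positive-integer as a string of ASCII digits with a value above zero; both readings, and the function names, are assumptions for illustration. Schema patterns match the whole value, hence fullmatch.

```python
import re

# One uppercase letter followed by exactly six digits.
PRODUCT_CODE = re.compile(r"[A-Z]{1}[0-9]{6}")


def is_product_code(text):
    """Check a value against the ProductCode pattern."""
    return PRODUCT_CODE.fullmatch(text) is not None


def is_positive_integer(text):
    """Simplified positive-integer check: ASCII digits only, value above zero."""
    return text.isascii() and text.isdigit() and int(text) > 0


print(is_positive_integer("123456789"), is_product_code("J123456"))
print(is_positive_integer("12a"), is_product_code("j123456"))
```

Under the DTD fragment, all four values would be accepted, because #PCDATA places no constraint on the text.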
Using Namespaces in XML Schema
• One person may be processing documents from many other parties and the different parties may want to represent their data elements differently.
• Moreover, in a single document, they may need to separately refer to elements with the same name that are created by different parties.
• How can you distinguish between such different definitions with the same name?
• XML Schema allows the concept of namespaces to distinguish the definitions.
Using Namespaces in XML Schema
• A given XML Schema defines a set of new names. The names defined in a schema are said to belong to its target namespace.
• Definitions and declarations in a schema can refer to names that may belong to other namespaces. We refer to those namespaces as source namespaces.
• Each schema has one target namespace and possibly many source namespaces.
• In fact, every name in a given schema belongs to some namespace.
• The names of the namespaces can be fairly long, but they can be abbreviated with xmlns declarations in the XML Schema document.
Using Namespaces in XML Schema
• Target and source namespaces
<!--XML Schema fragment in file schema1.xsd-->
<xsd:schema targetNamespace='http://www.SampleStore.com/Account'
    xmlns:xsd='http://www.w3.org/1999/XMLSchema'
    xmlns:ACC='http://www.SampleStore.com/Account'>
  <xsd:element name='InvoiceNo' type='xsd:positive-integer'/>
  <xsd:element name='ProductID' type='ACC:ProductCode'/>
  <xsd:simpleType name='ProductCode' base='xsd:string'>
    <xsd:pattern value='[A-Z]{1}\d{6}'/>
  </xsd:simpleType>
</xsd:schema>
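To show how the prefixes in the type attributes map back to namespaces, the sketch below collects the xmlns declarations with Python's standard xml.etree.ElementTree.iterparse and resolves each prefixed type name. The function name resolve_type_names is an assumption for illustration, the fragment is closed with </xsd:schema> so that it parses, and prefix scoping is simplified to a single flat table.

```python
import io
import xml.etree.ElementTree as ET

SCHEMA = r"""<xsd:schema targetNamespace='http://www.SampleStore.com/Account'
    xmlns:xsd='http://www.w3.org/1999/XMLSchema'
    xmlns:ACC='http://www.SampleStore.com/Account'>
  <xsd:element name='InvoiceNo' type='xsd:positive-integer'/>
  <xsd:element name='ProductID' type='ACC:ProductCode'/>
  <xsd:simpleType name='ProductCode' base='xsd:string'>
    <xsd:pattern value='[A-Z]{1}\d{6}'/>
  </xsd:simpleType>
</xsd:schema>"""


def resolve_type_names(schema_text):
    """Map each declared element name to (namespace URI, local type name)."""
    prefixes = {}
    resolved = {}
    source = io.BytesIO(schema_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload  # one xmlns declaration
            prefixes[prefix] = uri
        elif "type" in payload.attrib:
            prefix, _, local = payload.attrib["type"].rpartition(":")
            resolved[payload.attrib["name"]] = (prefixes.get(prefix), local)
    return resolved


for name, (uri, local) in resolve_type_names(SCHEMA).items():
    print(name, "->", uri, local)
```

Here positive-integer resolves to a source namespace (the XML Schema namespace), while ProductCode resolves to the schema's own target namespace.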
Using Namespaces in XML Schema
• Expressing sophisticated constraints on elements
– XML Schema offers greater flexibility than DTD for expressing constraints on the content model of elements.
– At the simplest level, as in DTD, you can associate attributes with an element declaration and indicate that exactly one, zero or more (*), or one or more (+) elements from a given set of elements can occur in it.
– You can express additional constraints in XML Schema using, for example, the minOccurs and maxOccurs attributes of the element element and the choice, group, and all elements.
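The relationship between the DTD occurrence indicators and minOccurs/maxOccurs can be sketched as a lookup table plus a range check. The names DTD_INDICATORS and occurrence_ok are assumptions for illustration, and math.inf stands in for an unbounded maxOccurs.

```python
import math

# DTD occurrence indicators expressed as (minOccurs, maxOccurs) pairs.
DTD_INDICATORS = {
    "": (1, 1),          # exactly one
    "?": (0, 1),         # optional
    "*": (0, math.inf),  # zero or more
    "+": (1, math.inf),  # one or more
}


def occurrence_ok(count, min_occurs=1, max_occurs=1):
    """Check an element count against a minOccurs/maxOccurs range."""
    return min_occurs <= count <= max_occurs


# The DTD indicators are special cases of the range check.
print(occurrence_ok(0, *DTD_INDICATORS["+"]))
print(occurrence_ok(7, *DTD_INDICATORS["*"]))
# A range such as "between 2 and 5" has no single DTD indicator.
print(occurrence_ok(3, min_occurs=2, max_occurs=5))
```

In a DTD, a bounded range like the last one can only be written by repeating the element name in the content model.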
DTD in Electronic Business
RosettaNet: An EB Standard
• RosettaNet is a consortium of major information technology (IT), electronic components (EC) and semiconductor manufacturing (SM) companies working to create and implement industry-wide EB process standards.
– Perfect real-time information.
– Efficient e-business processes.
– Dynamic trading-partner relationships.
– New business opportunities.