Module 2:
XML Schema Definition
a.Univ.-Prof. Dr. Werner Retschitzegger
Lecture: IFS in Bioinformatics
Summer semester (SS) 2010
Johannes Kepler University Linz, www.jku.ac.at
Institute of Bioinformatics, www.bioinf.jku.at
IFS Information Systems Group, www.ifs.uni-linz.ac.at
www.univie.ac.at
XML Schema Definition: Introduction, XML 1.0, Namespaces, XML Schema
Introduction: Motivation for XML, Document Markup Languages, Application Areas for XML
The following slides are based (among others) on: Elliotte Rusty Harold, W. Scott Means, XML in a Nutshell: A Desktop Quick Reference, 3rd Edition, O'Reilly & Associates, 2005
Brian Kernighan: "The problem with HTML-WYSIWYG is that what you see is all you've got"
HTML (HyperText Markup Language) is the "lingua franca" for representing hypertext documents on the Web.
Originated in 1989 (Tim Berners-Lee, CERN); later standardized by the W3C (World Wide Web Consortium).
Basic concept: "markup" in terms of "tags".
Drawbacks: a restricted number of pre-defined tags, which leads to permanent extensions with proprietary tags; tags primarily describe layout aspects, which makes Web search harder.
Layout independence: separation of structure and semantics of the content from its layout.
Platform and vendor independence: endorsed by the W3C.
Internationality: based on the Unicode standard.
Extensibility: tags can be defined and named arbitrarily (XML is a meta language).
Structurability: tags can be nested arbitrarily.
Semi-structured: content can contain fully structured parts and fully unstructured parts.
Self-describing: tags describing structure and semantics of the content are relatively easy for humans to read and edit, and easy for machines to generate and parse.
X-technology infrastructure: the W3C provides a set of XML-based standards, the "XML Standards Family".
Correctness checking: optionally, XML documents can be checked for correctness.
Well-formedness: at least 1 tag per document; exactly 1 root tag; tags must not overlap; each tag must have an end tag; ...
XML processors parse XML documents and check either solely well-formedness (non-validating processors) or also validity (validating processors).
They can be called from within an application (e.g., a browser) and decompose an XML document into its parts, forming a tree that the application can then access.
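The well-formedness rules can be tried out with any non-validating processor. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which parses but does not validate; the helper is_well_formed and the sample documents are assumptions made for illustration and are not part of the slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)  # builds the element tree, raises on violations
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents, one per rule.
samples = {
    "single root, properly nested": "<catalog><pda>8210</pda></catalog>",
    "overlapping tags": "<catalog><pda><name></pda></name></catalog>",
    "two root tags": "<pda/><pda/>",
    "missing end tag": "<catalog><pda></catalog>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'NOT well-formed'}")
```

Only the first sample is accepted; checking validity against a DTD or schema would need a validating processor, which this sketch does not cover.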
Validity: an XML document is valid if it is well-formed and corresponds to a schema. The schema defines vocabulary and grammar. Alternatives: DTD or the XML Schema standard.
[Figure: XML processor (parser, entity manager) between the application and the XML document PDACatalog1.XML; further labels: Catalog.DTD, entities PDA and Features, document parts, errors.]
Motivation for XML 5/5: Properties of XML Documents and XML Processors
... Language - ISO 8879)
Tim Berners-Lee (CERN), 1989: HTML (Hypertext Markup Language)
Marc Andreessen (NCSA), 1993: HTML forms (XMosaic)
Netscape, Microsoft, 1994: HTML derivations
Jon Bosak, Tim Bray, James Clark et al. (W3C), 1996: XML Working Group
10 Feb. 1998: XML 1.0
29 Sep. 2006: XML 1.1, 2nd Edition
... SGML: XML vs. SGML (60 pages vs. 600 pages). XML has 20% of SGML's complexity, but 80% of its functionality. XML documents conform to an ISO revision of SGML, WebSGML (an annex to the SGML standard ISO 8879).
... HTML: XML is complementary to HTML (semantics and structure vs. layout). XML is not backward compatible with HTML. Conversion from HTML documents to XML is simple.
... XHTML: XHTML = Extensible HTML, W3C Recommendation Aug. 2002 (2nd edition). It is HTML 4.01 as an "XML application", i.e., HTML described by means of an XML DTD.
A DTD defines vocabulary and grammar for a set of XML documents.
An XML document is allowed to reference a single DTD only ("document type declaration", DOCTYPE).
A DTD has to be referenced AFTER the XML declaration but BEFORE the root element.
A DTD does NOT DEFINE the root element of an XML document. The root element is rather defined within the XML document itself using the DOCTYPE declaration; it can be an arbitrary element of the DTD.
Excursus, URL vs. URI: A URL (Uniform Resource Locator) identifies Internet resources on the basis of their location, using the Domain Name Service (DNS). A URI (Uniform Resource Identifier) identifies arbitrary resources on the basis of their names (e.g., ISBN#) or other properties of the resource. Each URL is a valid URI.
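The difference can be made visible by splitting two identifiers into their components. This is a small sketch with Python's standard urllib.parse; the ISBN digits in the URN are a made-up placeholder, not a reference to a real book.

```python
from urllib.parse import urlsplit

# A URL names a location: scheme plus a host that DNS can resolve.
url = urlsplit("http://www.jku.ac.at/index.html")

# A name-based URI (here a URN with a hypothetical ISBN) has no host part.
urn = urlsplit("urn:isbn:0000000000")

print("URL:", url.scheme, url.netloc, url.path)
print("URN:", urn.scheme, repr(urn.netloc), urn.path)
```

Both strings are syntactically URIs, but only the first carries a network location.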
DTD 7/8: Attribute Declaration - 10 Types
CDATA: string.
<!ATTLIST Producer name CDATA #REQUIRED>
ID, IDREF(S): ID ensures uniqueness of attribute values within a document. Only 1 attribute of type ID is allowed per element. IDREF is a reference to an attribute of type ID.
"Referential integrity" (untyped!) is checked by the (validating) XML processor. Values of ID and IDREF(S) attributes must be valid XML names, i.e., they must not start with a digit.
<!ATTLIST Example
  identity  ID    #IMPLIED
  reference IDREF #IMPLIED>
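To illustrate what the ID/IDREF check amounts to, here is a hand-written sketch in Python. It does not read a DTD: it simply assumes that identity is declared as ID and reference as IDREF, as in the ATTLIST declaration, and the sample document and the function check_id_idref are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; we ASSUME the DTD declares `identity` as ID
# and `reference` as IDREF.
DOC = """
<Examples>
  <Example identity="e1"/>
  <Example identity="e2" reference="e1"/>
  <Example identity="e3" reference="e9"/>
</Examples>
"""

def check_id_idref(root, id_attr="identity", ref_attr="reference"):
    """Return (duplicate_ids, dangling_refs) found in the tree below `root`."""
    seen, duplicates = set(), []
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in seen:
            duplicates.append(value)  # ID values must be unique per document
        seen.add(value)
    # An IDREF is only checked for existence of the target, not for its type.
    dangling = [
        elem.get(ref_attr)
        for elem in root.iter()
        if elem.get(ref_attr) is not None and elem.get(ref_attr) not in seen
    ]
    return duplicates, dangling

duplicates, dangling = check_id_idref(ET.fromstring(DOC))
print("duplicate IDs:", duplicates)
print("dangling IDREFs:", dangling)
```

The reference to e9 is reported because no element carries that ID. The check is "untyped" in the sense that it does not care which kind of element the target is.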
DTD 8/8: Attribute Declaration - 10 Types
Enumeration type: a pre-defined set of values consisting of XML name tokens.
<!ATTLIST Price contract (yes|no) "no">
ENTITY, ENTITIES: the attribute value is the name of a declared unparsed entity.
<!ATTLIST Image filename ENTITY #REQUIRED>
NMTOKEN(S): "XML name tokens" are an extended form of XML names. In addition, they can start with "0..9", "." and "-".
<!ATTLIST journal year NMTOKEN #REQUIRED>
NOTATION: the attribute value is the name of a declared notation; rarely used.
<!ATTLIST image type NOTATION (gif | tiff) #REQUIRED>
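The difference between XML names and name tokens can be sketched with two regular expressions. This is an ASCII-only simplification made for this example; the XML 1.0 grammar also admits many non-ASCII characters, and the helper names is_name and is_nmtoken are assumptions.

```python
import re

# ASCII-only simplification (an assumption of this sketch).
NAME_START = r"[A-Za-z_:]"          # first character of a Name
NAME_CHAR = r"[A-Za-z0-9_:.\-]"     # any further character

NAME = re.compile(rf"{NAME_START}{NAME_CHAR}*")
NMTOKEN = re.compile(rf"{NAME_CHAR}+")  # no restriction on the first character

def is_name(value: str) -> bool:
    return NAME.fullmatch(value) is not None

def is_nmtoken(value: str) -> bool:
    return NMTOKEN.fullmatch(value) is not None

for value in ("CMS", "2010", "-1", "a b"):
    print(f"{value!r}: name={is_name(value)}, nmtoken={is_nmtoken(value)}")
```

A year such as 2010 is therefore acceptable for an NMTOKEN attribute but not for an ID attribute, which requires a name.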
Alternative: CDATA section. Example:
<formular>x <![CDATA[<]]> y</formular>
"Within" a CDATA section only its end (']]>') is recognized. CDATA sections cannot be nested.
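A quick way to see the effect is to parse the example and compare it with the same content written using the predefined entity &lt;. This is a sketch with Python's standard ElementTree parser; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

# The slide's example: the "<" is protected by a CDATA section.
with_cdata = ET.fromstring("<formular>x <![CDATA[<]]> y</formular>")

# The same content using the predefined entity instead.
with_entity = ET.fromstring("<formular>x &lt; y</formular>")

print(repr(with_cdata.text))
print(with_cdata.text == with_entity.text)
```

After parsing, the application sees identical character data in both cases; the CDATA section is only a notation in the source text.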
Purpose: decomposition of the XML document (similar to the SSI, Server Side Include, mechanism), because of the document's size or for reuse.
Declaration within the DTD.
Characteristics: in principle well-formed, but may contain multiple root elements; a reference to a DTD is not allowed.
Usage: syntax analogous to internal entities; usable as element values of the XML document and within entities themselves; cyclic references are forbidden; NOT usable within attribute values.
Purpose: references to files with arbitrary formats, e.g., ASCII, non-well-formed XML, GIF, JPEG, QuickTime movies.
NDATA defines a "non-parsed" entity and specifies an arbitrary file format. A NOTATION declaration is necessary to identify a corresponding application (via a URI) which is able to process files of this format.
Usage: only as an attribute value of type ENTITY. Syntax: entity name within quotation marks (note: NO &...;). The processor only informs the application that a non-parsed entity exists at a certain location; there is no expansion!
Entities 7/9: User-Defined Entities - Example
<?xml version="1.0"?>
<!DOCTYPE PDACatalog SYSTEM "Catalog.dtd" [
<!ENTITY linkNokia "http://www.nokia.de/8210">
<!ENTITY address "<town>Linz</town>">
<!ENTITY features SYSTEM "feat8210.XML">
<!ENTITY bildNokia SYSTEM "/pictures/8210.jpg" NDATA jpeg>
<!NOTATION jpeg SYSTEM "image/jpeg">
…
<!ATTLIST Image filename ENTITY #REQUIRED>
]>
…
<PDA name="8210">
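How a processor expands the two internal entities from this example can be observed with Python's standard ElementTree parser. The sketch below is self-contained and therefore deviates from the slide: it declares only the internal entities in an internal DTD subset, and the elements link and shop are hypothetical containers added for illustration. The external and unparsed entities are left out because a non-validating parser does not load them.

```python
import xml.etree.ElementTree as ET

# Only linkNokia and address come from the slide; <link> and <shop>
# are hypothetical elements used to hold the entity references.
DOC = """<!DOCTYPE PDA [
<!ENTITY linkNokia "http://www.nokia.de/8210">
<!ENTITY address "<town>Linz</town>">
]>
<PDA name="8210"><link>&linkNokia;</link><shop>&address;</shop></PDA>"""

root = ET.fromstring(DOC)

# A text-only entity simply becomes character data ...
link_text = root.findtext("link")
# ... while an entity containing markup contributes elements to the tree.
town_text = root.findtext("shop/town")

print(link_text)
print(town_text)
```

The application never sees the references &linkNokia; or &address; themselves, only their replacement content.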
Entities 9/9: Parameter Entities - Overriding
External DTD:
<!ENTITY % residental_content "address,rooms">
Internal DTD of an XML document:
<!ENTITY % residental_content "address,rooms,baths">
A parameter entity defined within an external DTD can be arbitrarily overridden within the internal DTD of an XML document. This makes it possible to adapt the external DTD to the requirements of single XML documents without having to change the external DTD. Thus, the parameter entity is used as a kind of "customization hook".
Namespaces 1/5
An XML namespace (NS) allows a unique global identification of elements and attributes
W3C-REC "Namespaces in XML", 14th Jan. 1999 (13 pages)
For this, elements and attributes of a domain (e.g. MathML) are assigned to one or more NS
XSL uses, e.g., different namespaces for XSLT and XSL-FO
An NS is represented by a URI. The URI need not directly refer to the corresponding vocabulary. It thus provides a level of indirection that decouples the location of the vocabulary from the unique identifier, the URI.
The associated elements and attributes have to be qualified by means of this URI when they are used, which makes them globally unique.
This allows the reuse and especially the combination ("mixture") of different vocabularies.
Namespaces 2/5: NS with Prefix vs. Default NS
BUT: URIs cannot be used for direct qualification, since URIs normally contain characters that are not allowed as part of valid XML names (e.g., "/", "&").
Instead, user-defined prefixes have to be used.
One or more NS are declared on the basis of the pre-defined attribute xmlns. This attribute can be defined in the context of any element of the DTD. The name of the element where the NS has been declared, as well as its direct and indirect subelements and attributes, can be qualified with the NS ("NS inheritance").
Default NS: also declared via the pre-defined attribute xmlns, BUT only 1 per element and without declaring any prefix. Non-qualified subelements are automatically associated with the default NS, attributes are NOT. It can be overridden within subelements.
[Figure: annotated namespace declaration; labels: pre-defined attribute for NS declaration, NS prefix (optional), URI of the NS, default NS (no prefix).]
The NS of the element edi:price is http://ecommerce.org/schema. The NS of the elements model and price is the default NS http://www.mobildev.com/schema. The attributes name and währung have NO NS associated with them.
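The three statements about the example can be checked with a namespace-aware parser. Since the example document itself did not survive in this transcript, the sketch below uses a hypothetical reconstruction: only the element names, the attribute names and the two namespace URIs are taken from the text; the nesting and the values are assumed. Python's ElementTree reports qualified names in the form {URI}localname.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction: layout and values are assumptions.
DOC = """
<model xmlns="http://www.mobildev.com/schema"
       xmlns:edi="http://ecommerce.org/schema"
       name="8210">
  <price währung="EUR">100</price>
  <edi:price>95</edi:price>
</model>
"""

root = ET.fromstring(DOC)

# Element names are expanded to {namespace-URI}local-name.
tags = [elem.tag for elem in root.iter()]
# Unprefixed attributes stay unqualified: the default NS does not apply.
attribute_names = [name for elem in root.iter() for name in elem.attrib]

for tag in tags:
    print(tag)
print(attribute_names)
```

The prefix edi never appears in the parsed result; only the URI it stands for does, which is why prefixes can be chosen freely per document.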
Namespaces 4/5: ... and DTDs
NS are in principle independent of DTDs and can be used in documents with or without DTDs.
BUT: all elements and attributes that are qualified in the XML document must also be declared appropriately within the DTD. This is a huge overhead, since DTDs are not aware of NS:
<edi:HC> ... <!ELEMENT edi:HC (....)>
<edi:price> ... <!ELEMENT edi:price (#PCDATA)>
What can be done is to specify a default NS within the DTD:
<!ATTLIST edi:HC xmlns
Introduction: DTD versus XML Schema 1/2
Drawbacks of DTDs: proprietary syntax; few datatypes, in fact only one (string); global definition of elements; parameter entities for modularization and overriding; ID, IDREF(S): severe restrictions.
Advantages of XML Schema: XML as syntax; numerous pre-defined datatypes; user-defined simple and complex datatypes; inheritance; keys and references: flexible concept.
XML Schema: definition of the structure of XML documents. W3C REC May 2001, approx. 420 pages; W3C REC 2nd edition October 2004.
normalizedString: normalized string with whitespace replacement. Each tab, linefeed and CR is replaced by a blank.
token: "tokenized" string. All whitespace characters are replaced by blanks, all leading and trailing blanks are deleted, and multiple consecutive blanks are replaced by a single one.
language: standardized language codes (e.g., en, en-US, de, de-DE).
NMTOKEN: name token, a string without blanks (e.g., "CMS", "234234").
Name: XML name; must start with a letter, ":" or "_" (e.g., "CMS", "_1").
NCName: name without prefix.
string: string datatype without whitespace replacement.
hexBinary, base64Binary: binary datatypes encoded as strings.
QName: qualified name; supports the usage of names with an NS prefix.
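The two whitespace treatments described for the normalized and the tokenized string can be written down in a few lines. This is a sketch of the described behaviour in Python, not a schema processor; the function names and the sample value are chosen for this example.

```python
import re

def normalized_string(value: str) -> str:
    """Replace each tab, linefeed and carriage return by one blank."""
    return re.sub(r"[\t\n\r]", " ", value)

def token(value: str) -> str:
    """Replace as above, then squeeze runs of blanks and trim both ends."""
    return re.sub(r" +", " ", normalized_string(value)).strip(" ")

raw = "  XML\tSchema \n Definition  "
print(repr(normalized_string(raw)))
print(repr(token(raw)))
```

Replacement keeps the length of the value unchanged, whereas collapsing can shorten it, which matters when length restrictions are applied afterwards.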
Alternative definition possibilities: referencing an existing datatype via the attribute base; or a local definition from scratch by using simpleType as a subelement of the restriction element.
12 possible restrictions, depending on the base datatype: length, minLength, maxLength, pattern, enumeration, minInclusive, maxInclusive, minExclusive, maxExclusive, whiteSpace, totalDigits, fractionDigits.
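To give a feeling for how such restrictions narrow a base datatype, here is a rough Python sketch that applies four of the twelve restrictions to string values. The class StringFacets and the currency-code type are hypothetical, and Python's regular expressions only approximate the XML Schema pattern language.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence

@dataclass
class StringFacets:
    """A few of the 12 restrictions, applied to a string-based type."""
    min_length: Optional[int] = None
    max_length: Optional[int] = None
    pattern: Optional[str] = None
    enumeration: Optional[Sequence[str]] = None

    def violations(self, value: str) -> List[str]:
        """Return the names of all restrictions that `value` violates."""
        found = []
        if self.min_length is not None and len(value) < self.min_length:
            found.append("minLength")
        if self.max_length is not None and len(value) > self.max_length:
            found.append("maxLength")
        # A schema pattern must match the whole value, hence fullmatch.
        if self.pattern is not None and not re.fullmatch(self.pattern, value):
            found.append("pattern")
        if self.enumeration is not None and value not in self.enumeration:
            found.append("enumeration")
        return found

# Hypothetical restricted type: a three-letter currency code.
currency = StringFacets(min_length=3, max_length=3,
                        pattern=r"[A-Z]{3}", enumeration=("EUR", "USD"))

for candidate in ("EUR", "eur", "EURO"):
    print(candidate, currency.violations(candidate))
```

Which restrictions are applicable depends on the base datatype; totalDigits, for instance, would make no sense for the string type used here.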
Characteristics of a key (key): the value (combination) must be unique; the value must exist; the key must be defined as a subelement of another element, following the type definition.
Candidates for keys (field): elements with simple datatypes only; attributes; combinations of elements and attributes.
The scope can be defined (selector).
A reference to a key can be defined (keyref).
Elements, attributes and combinations thereof can be defined to be unique (unique): the value (combination) must be unique, but the value need NOT exist.
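The difference between key and unique comes down to whether a missing value counts as an error. The sketch below imitates both constraints in Python over a hypothetical catalog; the function check, its parameters and the sample data are assumptions, and real schema processors evaluate selector and field as restricted XPath expressions rather than as plain attribute names.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second pda has no name, two share a serial.
DOC = """
<catalog>
  <pda serial="A1" name="8210"/>
  <pda serial="A2"/>
  <pda serial="A2" name="9300"/>
</catalog>
"""

def check(root, selector, field, must_exist):
    """Return the violations of one identity constraint.

    selector:   path choosing the elements in scope
    field:      attribute supplying the value
    must_exist: True imitates key, False imitates unique
    """
    seen, problems = set(), []
    for elem in root.findall(selector):
        value = elem.get(field)
        if value is None:
            if must_exist:
                problems.append(f"missing {field}")
            continue
        if value in seen:
            problems.append(f"duplicate {field}={value}")
        seen.add(value)
    return problems

root = ET.fromstring(DOC)
key_serial = check(root, "pda", "serial", must_exist=True)
unique_name = check(root, "pda", "name", must_exist=False)
key_name = check(root, "pda", "name", must_exist=True)

print("key on serial:", key_serial)
print("unique on name:", unique_name)
print("key on name:", key_name)
```

The attribute name satisfies unique although one element lacks it, but it would fail as a key.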
Incorporation of other schemata via include, redefine and import.
include, redefine and import elements must be subelements of schema, prior to any other declaration.
Include of a schema (include): the NS of the included schema must be equal to the NS of the including schema, or the included schema must have no NS at all. The included schema can be used as if it were declared directly within the including schema.
Including and redefining a schema (redefine): same functionality as include. In addition, included components (simpleType, complexType, group, attributeGroup) can be newly defined; the new definitions replace the original ones.
Import of a schema (import): the imported schema can have an arbitrary NS (which may differ from the current one) or none.
Books
XML in a Nutshell: A Desktop Quick Reference, 3rd Edition, Elliotte Rusty Harold, W. Scott Means, O'Reilly & Associates, 2005
O'Reilly XML.com: http://www.xml.com
XML 1.1 Bible, Elliotte Rusty Harold, 2nd Edition, John Wiley & Sons, 2004
Elliotte Rusty Harold, Cafe con Leche XML News and Resources: http://www.ibiblio.org/xml
Conferences
XML Europe (XTech Conference Series): http://www.xmleurope.com
XML Conference & Exposition: http://www.xmlconference.org
Online Resources
Commented XML Standard (Tim Bray): http://www.xml.com/axml/testaxml.htm
W3Schools: http://www.w3schools.com/xml/
XML & DTD Patterns: http://www.xmlpatterns.com/
Overview of XML Editors: http://www.perfectxml.com/soft.asp?cat=6
Java and XML, Sun Microsystems, Inc.: http://java.sun.com/xml/
IBM XML Zone: http://www.ibm.com/developer/xml/
Microsoft XML Developer Center: http://msdn.microsoft.com/xml/default.asp
XML Schema Test Suites of the W3C: http://www.w3.org/2001/05/xmlschema-test-collection.html