EXtensible Markup Language (XML) Spring Technology Workshops March 1998 MacKenzie Smith Office for Information Systems.

Jan 05, 2016

eXtensible Markup Language (XML)

Spring Technology Workshops, March 1998

MacKenzie Smith

Office for Information Systems

XML Definition

• XML: eXtensible Markup Language

• New standard developed by the World Wide Web Consortium (W3C)

• Issued as a “recommendation” February 10, 1998 after broad industry review

• Meant as a replacement standard for HTML to render documents on the WWW

XML Definition

• Sponsored by major software companies (Microsoft, Netscape, Adobe)

• Hardware companies (Sun, HP, Fuji Xerox)

• Electronic publishing companies (Inso, ArborText, Texcel, SoftQuad)

• And other organizations involved in WWW development (W3C, NCSA)

XML Definition

• Both XML and HTML are based on SGML (Standard Generalized Markup Language)

XML and SGML

What is SGML?

• ISO 8879, published in 1986 and based on IBM’s Generalized Markup Language (GML)

• Widely used in government (especially the defense industry), the publishing industry, and academia for publishing materials both online and in print

XML and SGML

• Separates a document’s structure from its display (its rendering in a particular medium)

• Mark up a document once, then publish it many times, in many media, with a different “style sheet” customized for each medium

• Encodes the intellectual aspects of the document, which facilitates indexing and database applications
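
The "mark up once, publish many times" idea can be sketched in a few lines of Python. This is only an illustration, not how any SGML system works internally: the document, its tag names, and the two "style sheets" (plain dictionaries of rendering functions) are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names below are invented for illustration.
SOURCE = "<memo><title>Budget</title><para>Costs rose.</para></memo>"

# Two toy "style sheets": each maps a structural tag to a rendering function.
PLAIN_TEXT = {
    "title": lambda text: text.upper(),
    "para": lambda text: "  " + text,
}
HTML = {
    "title": lambda text: "<h1>" + text + "</h1>",
    "para": lambda text: "<p>" + text + "</p>",
}

def render(source, style):
    """Render each child of the root with the rule registered for its tag."""
    root = ET.fromstring(source)
    return "\n".join(style[child.tag](child.text or "") for child in root)

print(render(SOURCE, PLAIN_TEXT))
print(render(SOURCE, HTML))
```

The source text never changes; only the style dictionary passed to `render` does, which is the separation of structure from display described above.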

XML and SGML

• Each type of document has a Document Type Definition (DTD), which defines its structure and the tags that may be used to encode it, e.g.

– Encoded Archival Description (EAD)

– Text Encoding Initiative (TEI)

– AAP (Association of American Publishers)

XML and SGML

• Tag sets (from the DTD) should identify the structural components of the document

– e.g. in a letter, they could include: <recipient>, <returnaddress>, <salutation>, <paragraph>, <closing>, <signature>

– but also <date>, <name>, <company>, etc.

XML and SGML

Example of an SGML-encoded document:

<!DOCTYPE EAD PUBLIC "-//Society of American Archivists//DTD ead.dtd (Encoded Archival Description (EAD))//EN" "ead.dtd">

<EAD>

<EADHEADER><EADID>bak00006</EADID>

<FILEDESC><TITLESTMT><TITLEPROPER>Ezra F. Beal Diary, 1850-1862</TITLEPROPER></TITLESTMT></FILEDESC>

<PROFILEDESC><CREATION><DATE>12/27/1996</DATE>Anna Koch</CREATION></PROFILEDESC>

</EADHEADER>

<FRONTMATTER>

<TITLEPAGE>

<TITLEPROPER>Ezra F. Beal. Diary, 1850- 1862.</TITLEPROPER>

<AUTHOR>Baker Library</AUTHOR>

<PUBLISHER>Harvard Business School, Boston, MA 02163</PUBLISHER>

<P>&copy; The President and Fellows of Harvard College</P>

</TITLEPAGE>

</FRONTMATTER>....

XML and SGML

• SGML-encoded documents are published today in print, on CD-ROM, or on various networked media, including the WWW

• Each of these media requires a separate “style sheet” to decide how best to render each part of the document

XML and SGML

• To publish an SGML document on the Web today, readers need an SGML “viewer” installed for their web browser, such as SoftQuad’s Panorama software

• Panorama downloads the document, its DTD, and a style sheet from a web server, then renders the document on your monitor

XML and SGML

• It’s clunky, slow, and requires extra work by readers to configure their web browsers correctly. In other words, we don’t want to use it

• Adding SGML support directly into web browsers is considered far too difficult (parsing “real” SGML is very complicated)

XML and HTML

What about HTML?

• HTML is a simple SGML DTD, but

• HTML’s tag set is a mix of structural and display aspects of documents, with each tag tied to exactly one way of displaying it in a particular browser

XML and HTML

• Some examples of this:

– <H1>, <H2>, and <H3> are different sizes of headings

– <BlockQuote> indents the current paragraph

– <HR> puts a line across the page

– Tables are used for spacing text legibly more often than for tabular information

XML and HTML

• HTML has been repeatedly “extended” to make it more useful, but the extensions have also made it non-standard across different browser types, e.g.

– “Blinking” text (Netscape 1.0)

– Frames (Netscape 2.0)

– Forms (HTML 3.0)

– JavaScript

– MetaHTML

XML and HTML

• So HTML is at the end of its extensibility (though it probably won’t go away)

• HTML doesn’t allow

– different tag sets for different domains (like libraries vs. the defense industry vs. the medical industry)

– complex document structures (e.g. nesting)

– support for DTDs when you want them (for databases of web documents, etc.)

XML and HTML

• but as we’ve seen, SGML is too complex for the WWW

• and HTML isn’t flexible enough

• enter XML...

XML

What is XML?

• Based on SGML, but greatly simplified

• Disallows some things from SGML that made it very difficult to use on the Web

• Allows for a DTD, but doesn’t require it for rendering a document on the web
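
A rough way to see what "doesn't require a DTD" means: an XML parser can accept a document on well-formedness alone (one root element, every tag closed and properly nested). The sketch below uses Python's standard `xml.etree.ElementTree` parser, which does not validate against a DTD; the `<note>` documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# No DOCTYPE and no DTD: the parser only checks well-formedness.
WELL_FORMED = "<note><to>Reader</to><body>Hello</body></note>"

# HTML-style tag omission: <to> is never closed, so this is not XML.
NOT_WELL_FORMED = "<note><to>Reader<body>Hello</body></note>"

def is_well_formed(text):
    """Return True if the text parses as XML, without consulting any DTD."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(WELL_FORMED))
print(is_well_formed(NOT_WELL_FORMED))
```

Checking a document against a DTD (validity) is a separate, optional step that needs a validating parser, which this sketch does not attempt.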

XML

• Web browsers (Netscape, Microsoft) will support XML directly

– no need for a separate “viewer” or “plugin”

– will have a default style sheet for common tag sets like HTML

• MS Internet Explorer already supports XML (with a few tricks)

XML

• Documents will be marked up with a tag set from a DTD, as is currently done with SGML and HTML (although a DTD is not required)

• But they’ll also come with a style sheet and hyperlinks using two related standards:

– XSL (eXtensible Stylesheet Language)

– XLL (eXtensible Linking Language)

XML and XSL

eXtensible Stylesheet Language (XSL)

• Based on ISO/IEC 10179, the Document Style Semantics and Specification Language (DSSSL) standard

• Builds on the current Cascading Style Sheets (CSS) mechanism for displaying HTML, and will likely coexist with it

XML and XSL

• XSL supports display based on things like:

– formatting of source elements based on ancestry/descendancy, position, and uniqueness

– creation of formatting constructs, including generated text and graphics

– definition of reusable formatting macros and an extensible set of formatting objects

– writing-direction independent stylesheets (for internationalization)

XML and XLL

eXtensible Linking Language (XLL)

• Based on ISO/IEC 10744:1992, the HyTime standard for hypermedia (an extension to SGML)

• The XLL working group has split in two: XPointer and XLink. Their draft recommendations will be out within the next few days, but the final versions won’t be available for a while yet.

XML and XLL

• Specifies the mechanism for supporting true hyperlinks in XML documents, including:

– links to other documents (to replace the HTML tag <A HREF="URL">)

– links within documents

– location-independent naming (i.e. URNs)

– bidirectional linking

– link management outside of the documents to which the links apply (i.e. in a database)

HTML example

Example HTML for a simple document:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN//2.0">

<HTML>

<HEAD><TITLE>Home Page Title</TITLE></HEAD>

<BODY>

<H1 align=center>Home Page Title</H1>

<P>Introduction to the home page

<H1><A HREF="anotherpage.html">Current List</A></H1>

<P>Next part of the document

</BODY>

</HTML>

XML example

Example XML for the document, using a new DTD:

<!DOCTYPE MyDTD PUBLIC "-//IETF//DTD MyDTD Strict//EN//2.0">

<MyDTD>

<HEAD><TITLE>Home Page Title</TITLE> <AUTHOR>Webmaster</AUTHOR></HEAD>

<CONTENT>

<PAGEHEAD>Home Page Title</PAGEHEAD>

<PARA>Introduction to the home page</PARA>

<A XML-LINK="SIMPLE" HREF="http://server.harvard.edu/document">

Current List</A>

<PARA>Next part of the document</PARA>

</CONTENT>

</MyDTD>
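
Because every element in a document like this is explicitly closed, a generic XML parser can read it without knowing anything about MyDTD. The following is a minimal sketch using Python's standard library. It assumes the document body from the slide with the DOCTYPE line left out, straight quotes around attribute values, and the page heading tag written as one word (PAGEHEAD), since XML names cannot contain spaces.

```python
import xml.etree.ElementTree as ET

# The slide's document, minus the DOCTYPE line (see assumptions above).
DOC = """<MyDTD>
<HEAD><TITLE>Home Page Title</TITLE><AUTHOR>Webmaster</AUTHOR></HEAD>
<CONTENT>
<PAGEHEAD>Home Page Title</PAGEHEAD>
<PARA>Introduction to the home page</PARA>
<A XML-LINK="SIMPLE" HREF="http://server.harvard.edu/document">Current List</A>
<PARA>Next part of the document</PARA>
</CONTENT>
</MyDTD>"""

root = ET.fromstring(DOC)

# Structural tags let a program pick out parts of the document by role.
author = root.findtext("HEAD/AUTHOR")
paragraphs = [p.text for p in root.iter("PARA")]
links = [(a.get("HREF"), a.text) for a in root.iter("A")]

print(author)
print(paragraphs)
print(links)
```

The same kind of query is what makes "databases of web documents" practical: the author or the list of links is addressable data rather than text to be scraped.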

XSL Example

<xsl>

<rule><root/>

<HTML><BODY font-family="Arial, helvetica, sans-serif" font-size="12pt" background-color="#EEEEEE">

<children/>

</BODY></HTML>

</rule>

<rule><!-- Hides the eadheader --><target-element type="eadheader"/>

<empty/>

</rule>

<rule><target-element type="p"/>

<p color="black">

<children/>

</p>

</rule>
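
To make the rule mechanism in this example concrete, here is a rough Python imitation of the three rules: one for the root, one that hides eadheader, and one that wraps p. It is not an XSL processor. The input document is hypothetical, the BODY font attributes are left out for brevity, and the fallback for elements with no matching rule (just process their children) is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical input using the lowercase element types the rules target.
SOURCE = "<ead><eadheader>bak00006</eadheader><p>Diary, 1850-1862</p></ead>"

def children(element):
    """Process an element's text and child elements in document order."""
    parts = [element.text or ""]
    for child in element:
        parts.append(apply_rule(child))
        parts.append(child.tail or "")
    return "".join(parts)

# One entry per <rule>: target element type -> function producing output.
RULES = {
    "eadheader": lambda el: "",  # plays the role of <empty/>
    "p": lambda el: '<p color="black">' + children(el) + "</p>",
}

def apply_rule(element):
    """Use the matching rule; with no match, fall back to the children."""
    rule = RULES.get(element.tag, children)
    return rule(element)

def transform(source):
    """The root rule: wrap the processed children in HTML and BODY."""
    root = ET.fromstring(source)
    return "<HTML><BODY>" + children(root) + "</BODY></HTML>"

print(transform(SOURCE))
```

Each rule pairs a pattern (which source element it targets) with an action (what output to build around that element's children), which is the same shape as the `<rule>` elements on the slide.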

XML and Libraries

• So, other than using XML when HTML isn’t enough, why else should you care about XML?

XML and Metadata

• Resource Description Framework (RDF)

– new standard from the W3C RDF Model and Syntax Working Group

– chaired by Eric Miller from OCLC

– supersedes the “Warwick Framework” (mostly)

– currently a draft specification, in final review

– also supported by web industry software giants (Microsoft, Netscape, IBM, etc.)

XML and Metadata

• RDF

– defines a way to put searchable metadata on the web (with or without the web resources that are described by the metadata)

– and for structural metadata to relate groups of digital objects into logical constructs (like a bunch of scanned image files into an article)

XML and Metadata

• RDF

– specifies a generalized data model to handle all kinds of metadata, including Dublin Core, MARC, etc., and a transport syntax based on XML

– encodes sets of properties of web resources

• properties describe resources using various attributes

• and also relate resources to each other

XML and Metadata

RDF core record example, using Dublin Core:

<?xml:namespace name="http://purl.org/metadata/dublin_core#" as="DC"?>

<?xml:namespace name="http://www.w3.org/TR/WD-rdf-syntax#" as="RDF"?>

<RDF:RDF>

<RDF:Description RDF:HREF="http://www.somewhere.edu/some.doc">

<DC:Creator>John Smith</DC:Creator>

<DC:Title>John’s document</DC:Title>

<DC:Date>03/12/98</DC:Date>

</RDF:Description>

</RDF:RDF>
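
A record like this boils down to a set of (resource, property, value) statements. The sketch below pulls those statements out with Python's standard library. One adaptation is an assumption of this sketch rather than part of the draft syntax shown above: the two namespace processing instructions are replaced by xmlns attributes on the root element, because namespace-aware parsers need the DC and RDF prefixes declared that way. The namespace names themselves are the ones from the slide.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/TR/WD-rdf-syntax#"

# The slide's record, with prefixes declared through xmlns attributes.
RECORD = """<RDF:RDF xmlns:RDF="http://www.w3.org/TR/WD-rdf-syntax#"
         xmlns:DC="http://purl.org/metadata/dublin_core#">
  <RDF:Description RDF:HREF="http://www.somewhere.edu/some.doc">
    <DC:Creator>John Smith</DC:Creator>
    <DC:Title>John's document</DC:Title>
    <DC:Date>03/12/98</DC:Date>
  </RDF:Description>
</RDF:RDF>"""

def triples(record):
    """Yield (resource, property, value) for each described property."""
    root = ET.fromstring(record)
    for description in root.iter("{%s}Description" % RDF):
        resource = description.get("{%s}HREF" % RDF)
        for prop in description:
            # ElementTree names look like "{namespace}local"; keep the local part.
            local_name = prop.tag.split("}", 1)[1]
            yield (resource, local_name, prop.text)

for triple in triples(RECORD):
    print(triple)
```

Each printed tuple names the web resource, one of its Dublin Core properties, and that property's value, which is the "sets of properties of web resources" model in its simplest form.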

XML and Metadata

RDF aggregates record example:

<?xml:namespace name="http://purl.org/metadata/dublin_core#" as="DC"?>

<?xml:namespace name="http://www.w3.org/TR/WD-rdf-syntax#" as="RDF"?>

<RDF:RDF>

<RDF:Description RDF:HREF="http://www.somewhere.edu/.html">

<DC:Creator>

<RDF:Seq ID="CreatorsAlphabeticallyBySurname">

<RDF:LI>John Jones</RDF:LI>

<RDF:LI>John Smith</RDF:LI>

</RDF:Seq>

</DC:Creator>

</RDF:Description>

</RDF:RDF>

XML and Harvard

• Digital Finding Aids Project

– began in the summer of 1995

– using SGML to encode archival finding aids with the Encoded Archival Description (EAD) DTD

– EAD version 1 just released, will be fully XML compliant

– publishing both HTML and SGML for display today

XML and Harvard

• Digital Finding Aids Project

– Susan Von Salis, Manuscripts Department, Schlesinger Library, will talk about using SGML (and XML) in the library

XML and Libraries

• So, other than using XML when HTML isn’t enough, why else should you care about XML?

• Resource Description Framework (RDF)

• Library finding aids encoded with the Encoded Archival Description (EAD)

• Web-accessible full-texts encoded with the TEI

• Lots of other possibilities (MARC records, A&I databases, etc.)

XML

• Stay tuned for (many) more developments...

for more info see:

http://www.w3.org/XML/