
XML for Information Management – Day 2
Airi Salminen

University of Erlangen-Nuremberg
Computational Linguistics

Instructor: Professor Airi Salminen
http://users.jyu.fi/~airi/

12.1.-16.1. 2009

XML for Information Management


Day 2: Background of XML

Outline

1. Markup languages
2. Structured documents
3. World Wide Web Consortium


1. Markup languages

Markup

• intended for human readers
• intended for computers


Markup for human readers

to clarify the written expression

• punctuational
• presentational

Without markup: Texthasalwaysincludedsomekindofmarkupalsobeforethetimeofcomputers

With punctuational markup: Text has always included some kind of markup, also before the time of computers.


Markup for computers

to provide information for a software module

• presentational
• procedural
• descriptive (declarative)

In markup languages there is a clear separation of markup and primary content. Markup is metadata, adding some information to the primary data.


Presentational markup

information about the way the software module should present the primary content to the human perceiver

The markup in an HTML file:

In <i>markup languages</i> there is clear separation of <i>markup</i> and <i>primary content</i>. Markup is <i>metadata</i>, adding some information to the primary data.

The tags <i> and </i> represent presentational markup in HTML.
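The separation of markup and primary content in the HTML sentence above can be made visible with a small program. The following is only a sketch using Python's standard html.parser module; the class name MarkupSplitter and the function split_markup are hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collects tags (markup) and text (primary content) separately."""

    def __init__(self):
        super().__init__()
        self.markup = []   # tags in the order they occur (attributes are ignored)
        self.content = []  # text fragments found between the tags

    def handle_starttag(self, tag, attrs):
        self.markup.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.markup.append(f"</{tag}>")

    def handle_data(self, data):
        self.content.append(data)


def split_markup(source):
    """Return (list of tags, primary content with the tags removed)."""
    splitter = MarkupSplitter()
    splitter.feed(source)
    splitter.close()  # flush any text still buffered by the parser
    return splitter.markup, "".join(splitter.content)


source = (
    "In <i>markup languages</i> there is clear separation of <i>markup</i> "
    "and <i>primary content</i>. Markup is <i>metadata</i>, adding some "
    "information to the primary data."
)
markup, primary_content = split_markup(source)
print(markup)
print(primary_content)
```

The tags end up in one list and the readable sentence in one string, which is the separation the slide describes: the markup only adds information about how the primary content should be presented.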


Procedural markup

a processing instruction for the software module

The markup in an XML file:

<![CDATA[<element>Example of an XML element</element>]]>

The strings <![CDATA[ and ]]> represent procedural markup in XML.

<![CDATA[ instructs the XML processor to regard all text before ]]> as character data.

]]> instructs the XML processor to continue normal identification of markup.
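The effect of a CDATA section can be checked with any XML processor. The sketch below uses Python's standard xml.etree.ElementTree module; the wrapper element name example is an assumption, added only so that the fragment forms a complete document.

```python
import xml.etree.ElementTree as ET

# <example> is a hypothetical wrapper element: a CDATA section cannot stand
# alone as a document, it needs an enclosing element.
with_cdata = "<example><![CDATA[<element>Example of an XML element</element>]]></example>"
without_cdata = "<example><element>Example of an XML element</element></example>"

protected = ET.fromstring(with_cdata)
parsed = ET.fromstring(without_cdata)

# Inside the CDATA section the tags are treated as plain character data,
# so <example> has no child elements and its text contains the tags.
print(len(protected), repr(protected.text))

# Without the CDATA section, <element> is recognized as markup and
# becomes a child element of <example>.
print(len(parsed), parsed[0].tag, repr(parsed[0].text))
```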


Declarative markup

describes the content of a piece of primary content, what it is, or declares that the piece is a member of a particular class

The markup in an XML file:

<student>
  <first_name>Steve</first_name>
  <last_name>Chung</last_name>
  <email>[email protected]</email>
</student>

XML is primarily for declarative markup.
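Because declarative markup names what each piece of content is, software can address the pieces by name. A minimal sketch with Python's standard xml.etree.ElementTree module follows; the e-mail element is left out because the address is elided in the source.

```python
import xml.etree.ElementTree as ET

# The student record from the slide, without the elided <email> element.
record = """
<student>
  <first_name>Steve</first_name>
  <last_name>Chung</last_name>
</student>
"""

student = ET.fromstring(record.strip())

# The element names declare what the content is, so the program asks for
# "first_name" and "last_name" instead of relying on position or appearance.
full_name = f"{student.findtext('first_name')} {student.findtext('last_name')}"

print(student.tag)
print(full_name)
```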


Markup in XML

‣ All markup delivers information to the XML processor. A DTD represents metamarkup, facilitating the definition of the markup vocabulary.

‣ Markup in an XML document is usually classified with respect to the application.

‣ Processing instructions represent procedural markup.

‣ Element tags represent declarative markup.

‣ In the specification of an XML application, different kinds of meanings can be given to element names: they can be processing instructions to the application, or instructions about the way the content should be presented by the application.
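An XML processor reports processing instructions and elements as different kinds of nodes, which is what lets an application treat them differently. The following sketch uses Python's standard xml.dom.minidom module. The xml-stylesheet instruction and its pseudo-attributes are assumptions made for illustration; they do not come from the slides.

```python
from xml.dom import minidom

# A hypothetical document: one processing instruction followed by the root element.
document = (
    '<?xml-stylesheet type="text/css" href="style.css"?>'
    "<student><first_name>Steve</first_name></student>"
)

dom = minidom.parseString(document)

kinds = []
for node in dom.childNodes:
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
        # Procedural markup: an instruction addressed to an application.
        kinds.append(("processing instruction", node.target))
    elif node.nodeType == node.ELEMENT_NODE:
        # Declarative markup: a named piece of content.
        kinds.append(("element", node.tagName))

print(kinds)
```

How the application reacts to either kind of node is not decided by XML itself but by the specification of the XML application.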


Example of HTML markup

<html>
<head><title>University of Jyv&auml;skyl&auml; </title></head>
<body>
<h2>Faculties</h2>
<ul>
<li>Humanities
<li>Information Technology
<li>Social Sciences
</ul>
<br>
<address>[email protected]</address>
</body>
</html>

The element markup describes the structure for WWW publishing.


<university>
  <name>University of Jyväskylä</name>
  <faculties>Faculties
    <faculty>Humanities</faculty>
    <faculty>Information Technology</faculty>
    <faculty>Social Sciences</faculty>
  </faculties>
  <contact_email>[email protected]</contact_email>
</university>

The same primary content, with XML markup describing the content of the elements.


Logical structure of the HTML document

html
  head
    title
      University of Jyväskylä
  body
    h2
      Faculties
    ul
      li
        Humanities
      li
        Information Technology
      li
        Social Sciences
    br
    address
      [email protected]

Logical structure of the XML document

university
  name
    University of Jyväskylä
  faculties
    Faculties
    faculty
      Humanities
    faculty
      Information Technology
    faculty
      Social Sciences
  contact_email
    [email protected]
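The logical structure of the XML document can also be derived by a program. The sketch below, using Python's standard xml.etree.ElementTree module, prints element names and their text as an indented outline. The function name outline is hypothetical, and the contact_email element is omitted because the address is elided in the source.

```python
import xml.etree.ElementTree as ET

# The university document from the slides, without the elided e-mail address.
document = """
<university>
  <name>University of Jyväskylä</name>
  <faculties>Faculties
    <faculty>Humanities</faculty>
    <faculty>Information Technology</faculty>
    <faculty>Social Sciences</faculty>
  </faculties>
</university>
"""


def outline(element, depth=0):
    """Return the logical structure as indented lines: element name, then its text."""
    lines = ["  " * depth + element.tag]
    text = (element.text or "").strip()
    if text:
        lines.append("  " * (depth + 1) + text)
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(document.strip())
structure = outline(root)
print("\n".join(structure))
```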


2. Structured documents

Structured document

‣ structure, content, and external presentation can be separated from each other and processed separately

‣ structural components have names

‣ structural components can be recognized by software modules

‣ possible to define the structure


Structured document

Structure: different languages for defining the structure, e.g., DTD, XML Schema, RELAX NG for XML
Content: an open language standard, e.g., SGML, XML
Layout: different languages for defining the layout, e.g., CSS and XSL for XML


Structured document: Example

Structure: DTD.txt
Content: rhymes.txt, rhymes.xml
Layout: style.txt, style.css
Content with style attached: rhymes with style attachment.xml, rhymes with style attachment.txt


Management of structured documents

‣ document management

‣ management of the data contained in documents


Characteristics in the management of structured documents

‣ Design. Adopting the approach of structured document management in an environment often requires careful planning before the creation of documents. This includes schema design and layout design.

‣ Content production. Content can be produced by different types of software, e.g. by a syntax-directed editor. Validity is checked against the schema.

‣ Evolution. Schema versioning, layout versioning.

‣ Operations. The most typical operation is some kind of transformation.

‣ Software. Many kinds of software systems are used.
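As an illustration of the transformation operation mentioned in the list above, the sketch below turns the university XML document from section 1 into the corresponding HTML structure. In practice a transformation language such as XSLT is often used for this; plain Python with the standard xml.etree.ElementTree module is used here only to keep the sketch self-contained. The function name to_html is hypothetical, and the e-mail element is omitted because the address is elided in the source.

```python
import xml.etree.ElementTree as ET

source = """
<university>
  <name>University of Jyväskylä</name>
  <faculties>Faculties
    <faculty>Humanities</faculty>
    <faculty>Information Technology</faculty>
    <faculty>Social Sciences</faculty>
  </faculties>
</university>
"""


def to_html(university):
    """Build an HTML element tree from a <university> element."""
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = university.findtext("name")

    body = ET.SubElement(html, "body")
    faculties = university.find("faculties")
    # The text directly inside <faculties> becomes the heading.
    ET.SubElement(body, "h2").text = (faculties.text or "").strip()

    # Each <faculty> becomes one list item.
    ul = ET.SubElement(body, "ul")
    for faculty in faculties.findall("faculty"):
        ET.SubElement(ul, "li").text = faculty.text
    return html


html = to_html(ET.fromstring(source.strip()))
print(ET.tostring(html, encoding="unicode"))
```

The content is untouched by the transformation; only the markup changes, from names that describe the content to names that describe a structure for web publishing.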


Traditional document management

- No schema design.
- Processing applied to a document.
- Content, structure, and layout together.

Structured document management

- Schema design important. Layout is also designed.
- Schemas can be utilized in various ways. Semantic information attached in the schemas.
- Processing of document parts.
- Content, structure, and layout can be processed separately.
- Management required for content, schema, and stylesheet items and their different versions.


Database management

- Database often the information repository of one software system called Database Management System (DBMS), data processed by the operations of the DBMS.
- Design divided into schema design and view design.
- Content produced gradually, by the operations of the DBMS.
- Queries are the most important operations.

Structured document management

- Different software systems used to manipulate data.
- Schema design often related to extensive sectoral standard development. Layout requires design as well.
- Content produced by different kinds of programs, e.g. interactively by structure editors or automatically.
- Transformations are the most important operations.


Database languages

‣ definition languages
‣ query languages

Structured document languages

‣ definition languages
‣ style languages
‣ various manipulation, transformation and query languages
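A query on a structured document selects parts of it by their position in the element structure. As a rough sketch, Python's standard xml.etree.ElementTree module supports a limited subset of XPath, which is enough to ask for all faculty names in the university document from section 1 (shown here without the elided e-mail address).

```python
import xml.etree.ElementTree as ET

document = """
<university>
  <name>University of Jyväskylä</name>
  <faculties>Faculties
    <faculty>Humanities</faculty>
    <faculty>Information Technology</faculty>
    <faculty>Social Sciences</faculty>
  </faculties>
</university>
"""

root = ET.fromstring(document.strip())

# Path expression: every <faculty> element inside <faculties> under the root.
faculty_names = [faculty.text for faculty in root.findall("./faculties/faculty")]

print(faculty_names)
```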


3. World Wide Web Consortium

‣ W3C develops specifications to support the use of the web, publicly available at http://www.w3.org/TR/
‣ Development is systematic
‣ Development process is specified and published


Phases of the development process

‣ Working Draft: represents work in progress.
‣ Candidate Recommendation: has received significant review from its immediate technical community, explicit call for implementation and technical feedback.
‣ Proposed Recommendation: represents consensus in the development group, proposed to the Advisory Committee for review.
‣ Recommendation: represents consensus within W3C, widespread implementation encouraged.


What happens to a W3C Recommendation?

‣ It remains a Recommendation indefinitely.
‣ W3C rescinds the Recommendation. A report called Rescinded Recommendation is published.
‣ A new version of the Recommendation is developed.
‣ Minor modifications are made. A report called Proposed Edited Recommendation is published.