XML and QUERY

Shilpi Ahuja

CSE 591 - Data Mining

4th April 2002

What is XML? (Extensible Markup Language)

A markup language for structured documents.

A Structural and Semantic language, not a formatting language

Not just for Web pages

HTML vs. XML

External Presentation

Xaver Roe

Wikingerufer 7

10555 Berlin

XML

<Address>
  <name> Xaver Roe </name>
  <street> Wikingerufer 7 </street>
  <Town> Berlin </Town>
</Address>

HTML

<em>Xaver Roe</em> <br> Wikingerufer 7 <br> <strong> Berlin </strong>
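
Because the XML version names each part of the address, a program can ask for the town directly instead of guessing it from the layout. The following is a minimal sketch of that idea, assuming Python's standard xml.etree.ElementTree module; the variable names are illustrative and not part of the slides.

```python
import xml.etree.ElementTree as ET

# The address from the slide, written as one XML string.
address_xml = (
    "<Address>"
    "<name> Xaver Roe </name>"
    "<street> Wikingerufer 7 </street>"
    "<Town> Berlin </Town>"
    "</Address>"
)

address = ET.fromstring(address_xml)

# findtext returns the text of the first child element with that name.
town = address.findtext("Town").strip()
print(town)
```

The HTML version offers no comparable hook: <strong> says how Berlin should look, not that it is a town.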

Why "Extensible Markup Language"?

Language

• It has a grammar
• It has a vocabulary (sort of)
• It can be parsed by machines

Markup Language

• A mechanism to identify structures in a document
• It says what things are, not what they do
• It is not a programming language
• It is not compiled

Extensible

• You can add words to the language

XML describes structure and semantics, not formatting

XML documents form a tree

Element and attribute names reflect the kind of the element

Formatting can be added with a style sheet
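
The tree structure can be made visible by walking a parsed document and indenting each element by its depth. This is a small sketch assuming Python's standard xml.etree.ElementTree module; the helper name outline is made up for this example.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

doc = ET.fromstring(
    "<Address><name>Xaver Roe</name><street>Wikingerufer 7</street>"
    "<Town>Berlin</Town></Address>"
)
tree_lines = outline(doc)
print("\n".join(tree_lines))
```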

So Is XML Just Like HTML?

Discussion Question

Answer: No

In HTML, both the tag semantics and the tag set are fixed.

XML specifies neither semantics nor a tag set

XML lets you define your own tags

HTML describes layout

XML describes the structure of a document

XML separates content from presentation

So Is XML Just Like SGML?

No. Well, yes, sort of!

XML is a much-restricted form of SGML

It is defined as an application profile of SGML.

SGML is not well suited to serving documents over the web

So Why XML?

XML was created so that richly structured documents could be used over the web

HTML is bound to a fixed set of semantics and does not provide arbitrary structure

SGML provides arbitrary structure, but is too difficult to implement just for a web browser

What is the advantage of using XML?

Discussion Question

A Simple XML Document

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE NEWSPAPER SYSTEM "newspaper.dtd">
<NEWSPAPER>
  <ARTICLE EDITOR="Ernie Pyle" DATE="11/15/98" EDITION="Evening" AUTHOR="Jane Doe">
    <HEADLINE>Extensible Markup Language Proposed</HEADLINE>
    <BYLINE>Jane Doe, Staff Writer</BYLINE>
    <LEAD>The newly proposed XML Specification has been making a splash in the community.</LEAD>
    <BODY>The newly proposed XML draft stands to revolutionize the exchange of easily.</BODY>
    <NOTES>No Notes</NOTES>
  </ARTICLE>
  <ARTICLE AUTHOR="John Doe" EDITOR="Ernie Pyle" DATE="02/15/98" EDITION="Morning">
    <HEADLINE>XML 1.0 Recommendation Released</HEADLINE>
    <BYLINE>John E. Doe, Reporter</BYLINE>
    <LEAD>The W3C today released the final recommendation for XML</LEAD>
    <BODY>XML Developers, are already using the released recommendation </BODY>
    <NOTES>See www.w3c.org for more information</NOTES>
  </ARTICLE>
</NEWSPAPER>
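
To show how an application might read such a document, here is a hedged sketch using Python's standard xml.etree.ElementTree module. It works on a trimmed copy of the newspaper document (DOCTYPE and most child elements left out for brevity); the variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the newspaper document: two articles, two children each.
newspaper_xml = """<NEWSPAPER>
  <ARTICLE EDITOR="Ernie Pyle" DATE="11/15/98" EDITION="Evening" AUTHOR="Jane Doe">
    <HEADLINE>Extensible Markup Language Proposed</HEADLINE>
    <BYLINE>Jane Doe, Staff Writer</BYLINE>
  </ARTICLE>
  <ARTICLE AUTHOR="John Doe" EDITOR="Ernie Pyle" DATE="02/15/98" EDITION="Morning">
    <HEADLINE>XML 1.0 Recommendation Released</HEADLINE>
    <BYLINE>John E. Doe, Reporter</BYLINE>
  </ARTICLE>
</NEWSPAPER>"""

root = ET.fromstring(newspaper_xml)

# Attributes are read with get(); child element text with findtext().
headlines = [
    (article.get("AUTHOR"), article.findtext("HEADLINE"))
    for article in root.findall("ARTICLE")
]
for author, headline in headlines:
    print(f"{author}: {headline}")
```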

Characteristics

The document begins with an XML declaration, <?xml ...?>, which is written in lowercase and looks like a processing instruction.

Open and close all tags

Empty tags end with />

There is a unique root element

Elements may not overlap

Attribute values are quoted

< and & are only used to start tags and entity references
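
Several of these rules can be tried out by handing small fragments to a parser and seeing which ones it rejects. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper is_well_formed and the sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "properly nested": "<a><b>text</b></a>",
    "empty tag": "<a><b/></a>",
    "overlapping elements": "<a><b></a></b>",
    "unquoted attribute": "<a x=1></a>",
    "two root elements": "<a></a><b></b>",
    "bare ampersand": "<a>AT&T</a>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")
```

Only the first two samples should be accepted; each of the others breaks one of the rules listed above.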

Elements

Most common form of markup.

Example: ARTICLE, HEADLINE, and BYLINE are all elements

Delimited by angle brackets, most elements identify the nature of the content they surround.

Some elements may be empty, i.e., they have no content.

A non-empty element always begins with a start-tag, <element>, and ends with an end-tag, </element>.

Attributes

Attributes are name-value pairs that occur inside tags after the element name.

For example, <ARTICLE EDITOR="Ernie Pyle"> is the ARTICLE element with the attribute EDITOR having the value Ernie Pyle.

In XML, all attribute values must be quoted.

Entity References

Entities are used to represent special characters like left angle bracket, “<”

They’re also used to refer to often repeated or varying text and to include the content of external files.

Every entity must have a unique name

Entity references begin with the ampersand and end with a semicolon.

Declaring & Referencing Entities

<!ENTITY NEWSPAPER "Vervet Logic Times">

Using &NEWSPAPER; anywhere in the document inserts “Vervet Logic Times” at that location.

Internal entities allow you to define shortcuts for frequently typed text or text that is expected to change, such as the revision status of a document.
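
Entity expansion can be observed with a short experiment. This sketch assumes Python's standard xml.etree.ElementTree module, whose underlying parser expands internal entities declared in the document's own DOCTYPE; the sentence inside the element is made up for the example.

```python
import xml.etree.ElementTree as ET

# The entity is declared in the internal DTD subset and then referenced
# with an ampersand and a semicolon. &lt; is a predefined entity.
doc = """<!DOCTYPE NEWSPAPER [
  <!ENTITY NEWSPAPER "Vervet Logic Times">
]>
<NEWSPAPER>Welcome to the &NEWSPAPER;, where 1 &lt; 2.</NEWSPAPER>"""

root = ET.fromstring(doc)
print(root.text)
```

The application sees the replacement text, not the reference, so changing the declaration once changes every place the entity is used.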

Comments

Comments begin with “<!--” and end with “-->”.

Comments can contain any data except the literal string “--”.

Comments are not part of the textual content of an XML document. An XML processor is not required to pass them along to an application.

DTD (Document Type Definition)

Formally identifies the relationships between the various elements that form the document.

Can express constraints on the sequence and nesting of tags.

Can express constraints on attribute values and their types and defaults.

Can declare the names of external files that may be referenced, the formats of some external (non-XML) data that may be included, and entities that may be encountered.

The Newspaper DTD

<!-- A Sample Newspaper Article DTD -->

<!ENTITY NEWSPAPER "Vervet Logic Times">
<!ENTITY PUBLISHER "Vervet Logic Press">
<!ENTITY COPYRIGHT "Copyright 1998 Vervet Logic Press">

<!ELEMENT NEWSPAPER (ARTICLE+)>
<!ELEMENT ARTICLE (HEADLINE, BYLINE+, LEAD, BODY, NOTES?)>
<!ATTLIST ARTICLE
    AUTHOR  CDATA #REQUIRED
    EDITOR  CDATA #IMPLIED
    DATE    CDATA #IMPLIED
    EDITION CDATA #IMPLIED>

<!ELEMENT HEADLINE (#PCDATA)>
<!ELEMENT BYLINE (#PCDATA)>
<!ELEMENT LEAD (#PCDATA)>
<!ELEMENT BODY (#PCDATA)>
<!ELEMENT NOTES (#PCDATA)>

ELEMENT symbols

* zero or more times (as many times as needed)
+ at least once
? once or not at all
, must be in the listed order
| either one or the other, in any order

ATTRIBUTE options

#REQUIRED – must be present
#IMPLIED – can be present

Attribute data types

CDATA – character data
ENUMERATED – list of values
ID – unique ID
IDREF, IDREFS – referred value
ENTITY, ENTITIES – binary data
NMTOKEN, NMTOKENS, NOTATION

ELEMENT data type

#PCDATA – any characters

Types of declarations in XML

Element declarations

Attribute list declarations

Entity declarations

Notation declarations

Element Declarations

Identifies the names of elements and the nature of their content

Example

<!ELEMENT ARTICLE (HEADLINE, BYLINE+, LEAD, BODY, NOTES?)>

An ARTICLE must contain a HEADLINE, one or more BYLINEs, a LEAD, and a BODY, in that order, and may contain NOTES.
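
A content model of this kind behaves much like a regular expression over the names of the child elements. The sketch below is a hand translation of the ARTICLE model into a Python regular expression, not a real DTD validator; the function name matches_article_model and the two test documents are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# (HEADLINE, BYLINE+, LEAD, BODY, NOTES?) rewritten as a regular expression
# over child element names, each name followed by a single space.
ARTICLE_MODEL = re.compile(r"HEADLINE (BYLINE )+LEAD BODY (NOTES )?")

def matches_article_model(article):
    """Check the order and count of an ARTICLE element's children."""
    child_names = "".join(child.tag + " " for child in article)
    return ARTICLE_MODEL.fullmatch(child_names) is not None

# Two bylines and no notes: allowed by the model.
good = ET.fromstring(
    "<ARTICLE><HEADLINE/><BYLINE/><BYLINE/><LEAD/><BODY/></ARTICLE>"
)
# No byline at all: BYLINE+ requires at least one.
bad = ET.fromstring(
    "<ARTICLE><HEADLINE/><LEAD/><BODY/><NOTES/></ARTICLE>"
)

print(matches_article_model(good), matches_article_model(bad))
```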

ELEMENT DATA TYPE (PCDATA)

Parsed character data

Example:

<!ELEMENT Byline (#PCDATA | quote)*>
<!ELEMENT Body (#PCDATA)*>

The vertical bar indicates an “or” relationship

The asterisk indicates that the content is optional (may occur zero or more times)

Byline may contain zero or more characters and quote tags.
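
Mixed content like this reaches an application as alternating pieces of text and child elements. As a rough illustration, assuming Python's standard xml.etree.ElementTree module and a made-up Byline value, the pieces can be listed in document order:

```python
import xml.etree.ElementTree as ET

# A Byline with character data before, inside, and after a quote element.
byline = ET.fromstring(
    "<Byline>Jane Doe, <quote>Staff Writer</quote> since 1998</Byline>"
)

# ElementTree keeps the text before the first child in .text and the text
# that follows a child element in that child's .tail.
pieces = [byline.text]
for child in byline:
    pieces.append((child.tag, child.text))
    pieces.append(child.tail)
print(pieces)
```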

Attribute Declarations

Identify which elements may have attributes

What attributes they may have

What values the attributes may hold

What default value each attribute has

Attributes: Example

<!ATTLIST ARTICLE
    AUTHOR ID #REQUIRED
    EDITOR CDATA #IMPLIED
    STATUS (funny | notfunny) 'funny'>

• AUTHOR, which is an ID and is required;

• EDITOR, which is a string and is not required;

• STATUS, which must be either funny or notfunny and defaults to funny if not specified.
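
The effect of a default value can be seen by parsing a document that leaves STATUS out. This sketch assumes Python's standard xml.etree.ElementTree module, whose expat-based parser does not validate but is expected to supply defaults declared in the internal DTD subset; the AUTHOR value a1 and the element text are invented for the example.

```python
import xml.etree.ElementTree as ET

# The ATTLIST from the slide, placed in an internal DTD subset.
# The document itself only specifies AUTHOR.
doc = """<!DOCTYPE ARTICLE [
  <!ELEMENT ARTICLE (#PCDATA)>
  <!ATTLIST ARTICLE
    AUTHOR ID #REQUIRED
    EDITOR CDATA #IMPLIED
    STATUS (funny | notfunny) 'funny'>
]>
<ARTICLE AUTHOR="a1">Text</ARTICLE>"""

article = ET.fromstring(doc)

# STATUS appears with its default; EDITOR is #IMPLIED, so it stays absent.
attributes = dict(article.attrib)
print(attributes)
```

Because this parser is non-validating, it would not complain about a missing AUTHOR or a duplicate ID; enforcing #REQUIRED needs a validating parser.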

Types of Attributes

CDATA

ID

IDREF or IDREFS

ENTITY or ENTITIES

NMTOKEN or NMTOKENS

A list of names

Types of Default Values

#REQUIRED

#IMPLIED

"value"

#FIXED "value"

Notation Declarations

Identify specific types of external binary data.

This information is passed to the processing application, which may make whatever use of it it wishes.

A typical notation declaration is:

<!NOTATION GIF87A SYSTEM "GIF">

XML-QL: A Query Language for XML

Designed at AT&T Labs

XML-QL has a SELECT-WHERE construct, like SQL

It borrows features of query languages recently developed by the database research community for semi-structured data.

XML-QL can express queries, which extract pieces of data from XML documents

Features of XML-QL

Declarative: like SQL.

Relationally complete: it can express joins.

Easy implementation

Data Extraction: XML-QL can extract data from existing XML documents and construct new XML documents.

Views: Supports both ordered and unordered views on an XML document.

Availability: XML-QL is implemented as a prototype and is freely available in a Java version.

Features of XML-QL

Path Expressions: Supports partially specified path expressions.

Building new Elements: Supports creation of new elements

Combining Data Sources: Supports querying several data sources at the same time

Negation: XML-QL doesn’t support negation

Aggregation: Doesn’t support aggregate functions like min, max, sum, count and avg.

Update Language: XML-QL doesn’t provide any support for insert, delete and update of elements

Queries in XML-QL

Query 1: Produce all editors of the articles where author is John Doe

Feature Exploited: Selection, Projection and Data Extraction on element values

Query

Function query() {
  CONSTRUCT <result> {
    WHERE <NEWSPAPER.ARTICLE>
            <AUTHOR><NAME>"John Doe"</></>
            <EDITOR>$b</>
          </> IN "newspaper.xml"
    CONSTRUCT $b
  } </result>
}

Query Output

<?xml version="1.0" encoding="UTF-8"?>
<result>
  <NAME>Ernie Pyle</NAME>
</result>

Explanation

This query matches every <ARTICLE> element in the XML document newspaper.xml that has at least one <AUTHOR> element and an <EDITOR> element, and whose author name is “John Doe”. For each such match, it binds the variable b to the editor. The result is the list of editors bound to b.
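
The same matching and binding can be imitated in ordinary code. The sketch below is an approximation in Python using the standard xml.etree.ElementTree module, not an XML-QL implementation. It assumes a hypothetical input in which AUTHOR and EDITOR are child elements containing a NAME, which is the shape the query pattern expects; the first article, including the editor Bob Smith, is invented so that there is something to filter out.

```python
import xml.etree.ElementTree as ET

# Hypothetical newspaper.xml content with AUTHOR and EDITOR as elements.
newspaper = ET.fromstring("""<NEWSPAPER>
  <ARTICLE>
    <AUTHOR><NAME>Jane Doe</NAME></AUTHOR>
    <EDITOR><NAME>Bob Smith</NAME></EDITOR>
  </ARTICLE>
  <ARTICLE>
    <AUTHOR><NAME>John Doe</NAME></AUTHOR>
    <EDITOR><NAME>Ernie Pyle</NAME></EDITOR>
  </ARTICLE>
</NEWSPAPER>""")

result = ET.Element("result")
for article in newspaper.findall("ARTICLE"):
    # WHERE part: keep articles that have an AUTHOR whose NAME is John Doe.
    author_names = [name.text for name in article.findall("AUTHOR/NAME")]
    if "John Doe" in author_names:
        for editor in article.findall("EDITOR"):
            # $b is bound to the content of EDITOR; CONSTRUCT copies it out.
            result.extend(list(editor))

output = ET.tostring(result, encoding="unicode")
print(output)
```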

Discussion Question

Can XML be used for things besides the Internet?