Physical and Logical Structure
SNU IDB Lab.
Post on 21-Dec-2015
XML Documents 1: structure
– Peeping into XML document
– Physical view: Entity
– Logical view: DTD
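Peeping into XML document (1/5)
<?xml version="1.0" standalone="yes"?>
<GREETING> Hello, XML!! <!--this is greeting--></GREETING>
An XML document consists of markup and character data: the declaration, the tags, and the comment are markup; "Hello, XML!!" is character data.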
Peeping into XML document (2/5)
XML document: date.xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DATE [ <!ELEMENT DATE (#PCDATA)> ]>
<DATE> 001224</DATE>
<!--This is date -->
– XML declaration: declares that this is an XML document. It starts with <? and ends with ?>.
– DTD (Document Type Definition): defines the tags the user will use. Here it defines the DATE tag.
– Content: the DATE element and its character data.
– Comment: the parser ignores it.
Peeping into XML document (3/5)
Structure of XML document
– physical structure: allows components of the document, called entities, to be named and stored separately
– logical structure: allows a document to be divided into named units and sub-units, called elements
[Figure: logical structure (a document divided into units and sub-units, i.e. elements) next to physical structure (internal and separate entities)]
Peeping into XML document (4/5)
[Figure only; no text preserved for this slide]
Peeping into XML document (5/5)
<person>
<name> kim </name>
<ID>771224</ID>
<office>301-453</office>
<phone>1830</phone>
<photo source="k.jpg"/>
</person>
[Figure: the same person document shown twice. In the logical view it is a tree of elements; in the physical view the image "k.jpg" referenced by the photo element is a separate entity.]
XML Documents 1: structure
– Peeping into XML document
– Physical view: Entity
– Logical view: DTD
Content of Physical structure
– Entity
– Figures of Document Entity
– Defining an entity
– Grammar in Declaring Entity
– Examples of Entity Declaration
– URL format
Entity (1/3)
– An entity is a unit for physically isolating and storing any part of a document (a unit of information storage)
– Each unit of information is called an entity
<person>
<name> kim </name>
<ID>771224</ID>
<office>301-453</office>
<phone>1830</phone>
<photo source="k.jpg"/>
</person>
[Figure: physical structure with internal and separate entities; the image "k.jpg" is marked as an entity]
SNU OOPSLA Lab.
Entity (2/3)
Purpose of Entity
– Entities contain all the information of a document (well-formed XML data, other text files, binary data, …)
<person>
<name> kim </name>
<ID>771224</ID>
<office>301-453</office>
<phone>1830</phone>
<photo source="k.jpg"/>
</person>
– Document entity: the person document itself
– Image entity: "k.jpg"
Entity (3/3)
– Internal entity: an entity that is defined completely within the document itself
– External entity: an entity that receives its content from an external source identified by a URL
Figures of Document Entity
[Figure: three arrangements of a document entity: one with no other entities, one that holds the main content, and one that acts as a framework file. The referenced entities are labelled A to D.]
Defining an entity
– An entity must be defined before the first reference to it in the data stream
– Entities are declared in the DTD (Document Type Definition)
Entity definition in DTD:
<!DOCTYPE DOCUMENT [
<!ENTITY EMAIL "sjlee@oopsla.snu.ac.kr">
<!ENTITY TEXT "(#PCDATA)">
]>
Example: Entity Declaration (1/3)
Internal text entities
– <!ENTITY XML "eXtensible Markup Language">
– <!ENTITY DemoEntity 'The rule is 6" long.'>
– <!ENTITY sample "Use &quot; and ' as delimiters.">
Built-in entities
– &lt; for '<'
– &gt; for '>'
– &amp; for '&'
– &apos; for the apostrophe (')
– &quot; for the quotation mark (")
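A minimal sketch of how an internal text entity and the built-in entities are expanded at parse time. It assumes Python's standard-library xml.etree.ElementTree parser, which is not part of the slides, and the sentence inside DOCUMENT is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The XML entity is the one declared on the slide; the element content
# is a made-up sentence that references it alongside built-in entities.
DOC = """<?xml version="1.0"?>
<!DOCTYPE DOCUMENT [
  <!ENTITY XML "eXtensible Markup Language">
]>
<DOCUMENT>&XML; uses &lt;tags&gt; &amp; entities.</DOCUMENT>"""

root = ET.fromstring(DOC)

# By the time the application sees the text, every reference has been
# replaced by its replacement text.
expanded = root.text
print(expanded)
```

The application never sees &XML; or &lt; themselves; it receives only the replacement text.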
Example: Entity Declaration (2/3)
External text entities
– <!ENTITY myent SYSTEM "/ENTS/MYENT.XML">
– <!ENTITY myent PUBLIC "-//MyCorp//ENTITY Superscript Chars//EN" ….>
Binary entities
– <!ENTITY Jsphoto SYSTEM "/ENTS/Jsphoto.tif" NDATA TIFF>
Example: Entity Declaration (3/3)
URL format: a relative system identifier is resolved against the location of the document that contains the declaration
– Document at /xml/document.xml, entity at /xml/entities/entity9.xml:
<!ENTITY ent9 SYSTEM "entities/entity9.xml">
– Document at /xml/docs/document.xml, entity at /xml/entities/entity9.xml:
<!ENTITY ent9 SYSTEM "../entities/entity9.xml">
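The relative-URL resolution in the two directory layouts can be reproduced with a short sketch. It assumes Python's urllib.parse.urljoin as the resolver and a hypothetical host, example.org; neither comes from the slides.

```python
from urllib.parse import urljoin

BASE = "http://example.org"  # hypothetical host, for illustration only


def resolve_system_id(document_url: str, system_id: str) -> str:
    """Resolve an entity's system identifier against the document's URL."""
    return urljoin(document_url, system_id)


# Layout 1: document.xml sits next to the entities directory.
first = resolve_system_id(BASE + "/xml/document.xml", "entities/entity9.xml")

# Layout 2: document.xml sits in docs/, a sibling of entities/.
second = resolve_system_id(BASE + "/xml/docs/document.xml", "../entities/entity9.xml")

print(first)
print(second)
```

Both declarations end up pointing at the same file, /xml/entities/entity9.xml; only the path relative to the declaring document differs.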
XML Documents 1: structure
– Peeping into XML document
– Physical view: Entity
– Logical view: DTD
Content of Logical structure
– Concepts
– DTD Structure
– Element Declaration
– Attribute Declarations
– Parameter Entities
– Conditional Sections
– Notation Declarations
– DTD Processing Issues
Concepts of DTD (1/3)
DTD (Document Type Definition)
– An optional but powerful feature of XML
– Comprises a set of declarations that define a document structure tree
– XML processors read the DTD, check whether the document is valid, and use it to build the document model in memory
– Lets users describe their own tag set, which is what makes XML a meta-markup language
Concepts of DTD (2/3)
A DTD describes
– Elements, attributes, notations, and the relations between elements
– It establishes formal document structure rules
Concepts of DTD (3/3)
Declare vs. define
– Declare: "This document is a concert poster"
– Define: "A concert poster must have the following features"
A DTD defines
– Element types + attributes + entities
Valid vs. invalid
– Valid: conforms to the DTD
– Invalid: fails to conform to the DTD
[Figure: valid XML documents shown as a subset of well-formed XML documents]
Valid & Invalid Documents
Example: <!DOCTYPE GREETING [ <!ELEMENT GREETING (#PCDATA)> ]>
– Valid:
<GREETING>
various random text but no markup
</GREETING>
– Invalid: anything else, including
<GREETING>
<sometag>various random text</sometag>
<someEmptyTag/>
</GREETING>
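The difference between well-formed and valid can be seen with a non-validating parser. The sketch below assumes Python's standard-library xml.etree.ElementTree, which checks well-formedness only and never validates against the DTD. The helper names and the third, unclosed document are made up for illustration.

```python
import xml.etree.ElementTree as ET

DTD = "<!DOCTYPE GREETING [ <!ELEMENT GREETING (#PCDATA)> ]>"

valid_doc = DTD + "<GREETING>various random text but no markup</GREETING>"
invalid_doc = (DTD + "<GREETING><sometag>various random text</sometag>"
               "<someEmptyTag/></GREETING>")
broken_doc = DTD + "<GREETING>unclosed element"  # hypothetical, not well-formed


def is_well_formed(document: str) -> bool:
    """True if the parser accepts the document; the DTD is NOT enforced."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


def has_only_text(document: str) -> bool:
    """Hand-written stand-in for the (#PCDATA) rule: no child elements."""
    return len(ET.fromstring(document)) == 0


print(is_well_formed(valid_doc), has_only_text(valid_doc))
print(is_well_formed(invalid_doc), has_only_text(invalid_doc))
print(is_well_formed(broken_doc))
```

The invalid document still parses, because it is well-formed. Only a validating parser, or a hand-written check such as has_only_text, notices that it breaks the (#PCDATA) rule.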
DTD structure
A DTD is composed of a number of declarations
– ELEMENT (tag definition)
– ATTLIST (attribute definitions)
– ENTITY (entity definition)
– NOTATION (data type notation definition)
A DTD can be stored in an external subset or an internal subset
Internal and External Subset (1/3)
Internal subset
– Form:
<!DOCTYPE … [
<!-- Internal Subset -->
…
]>
– Pros: easy to write; the document and its declarations can be edited together, without moving between two files
– Cons: other documents cannot reuse the declarations without copying the internal subset
Internal and External Subset (2/3)
External subset
– It is usually better to use external DTDs
– Why? They bring benefits for document management, updating, and editing
A few reasons
– If you use an external DTD, you can use public DTDs (capability)
– External DTDs provide for better document management
– External DTDs make it easier to validate your document
Internal and External Subset (3/3)
[Figure: the full parsing path, covering the internal subset and the external subset]
Element Declarations
– Used to define a new element: the declaration gives the name of the element and its content model, which specifies the allowed content
– Each tag must be declared in an <!ELEMENT> declaration
– The content model uses a simple regular-expression-like grammar to specify precisely what is and isn't allowed in an element
– Element type declaration: '<!ELEMENT' S Name S contentspec S? '>'
Content Specifications
– ANY
– #PCDATA
– Sequences
– Choices
– Mixed Content
– Modifiers
– Empty
ANY
<!ELEMENT SEASON ANY>
– A SEASON can contain any child element and/or raw text (parsed character data)
– Rarely used in practice, because it places no constraint on structure
#PCDATA
<!ELEMENT YEAR (#PCDATA)>
– Parsed character data, i.e. raw text, no markup
– Represents normal data. The keyword is preceded by the hash symbol '#' to avoid confusion with an element of the same name when used within a model group (for example, '(#PCDATA | PCDATA)')
Use of #PCDATA in XML
Valid:
<YEAR>1999</YEAR>
<YEAR>99</YEAR>
<YEAR>1999 C.E.</YEAR>
<YEAR> The year of our Lord one thousand, nine hundred, and ninety-nine</YEAR>
Invalid:
<YEAR>
<MONTH>January</MONTH>
<MONTH>February</MONTH>
<MONTH>March</MONTH>
<MONTH>April</MONTH>
<MONTH>May</MONTH>
<MONTH>June</MONTH>
<MONTH>July</MONTH>
<MONTH>August</MONTH>
<MONTH>September</MONTH>
<MONTH>October</MONTH>
<MONTH>November</MONTH>
<MONTH>December</MONTH>
</YEAR>
Child Elements
To declare that a LEAGUE element must have a LEAGUE_NAME child:
<!ELEMENT LEAGUE (LEAGUE_NAME)>
<!ELEMENT LEAGUE_NAME (#PCDATA)>
Sequences (1/2)
Separate multiple required child elements with commas, e.g.
<!ELEMENT SEASON (YEAR, LEAGUE, LEAGUE)>
<!ELEMENT LEAGUE (LEAGUE_NAME, DIVISION, DIVISION, DIVISION)>
One or more children: +
<!ELEMENT DIVISION_NAME (#PCDATA)>
<!ELEMENT DIVISION (DIVISION_NAME, TEAM+)>
Sequences (2/2)
Zero or more children: *
<!ELEMENT TEAM (TEAM_CITY, TEAM_NAME, PLAYER*)>
<!ELEMENT TEAM_CITY (#PCDATA)>
<!ELEMENT TEAM_NAME (#PCDATA)>
Choices
<!ELEMENT PAYMENT (CASH | CREDIT_CARD)>
<!ELEMENT PAYMENT (CASH | CREDIT_CARD | CHECK)>
Grouping With Parentheses
– Parentheses combine several elements into a single unit
– A parenthesized group can be nested inside other parentheses in place of a single element
– A parenthesized group can be suffixed with a plus sign, an asterisk, or a question mark
<!ELEMENT dl (dt, dd)*>
<!ELEMENT ARTICLE (TITLE, (P | PHOTO | GRAPH | SIDEBAR | PULLQUOTE | SUBHEAD)*, BYLINE?)>
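Because content models are regular-expression-like, an element-only content model can be translated almost token by token into a regular expression over the sequence of child element names. The sketch below is one such translation in Python. The function names are hypothetical, #PCDATA and mixed content are deliberately not handled, and a real validating parser does not necessarily work this way.

```python
import re

# A token is an element name or one of the content-model operators.
TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[(),|*+?]")


def content_model_to_regex(model: str) -> re.Pattern:
    """Translate an element-only content model into a compiled regex."""
    parts = []
    for token in TOKEN.findall(model):
        if token == "(":
            parts.append("(?:")            # grouping only, no capture
        elif token == ",":
            continue                       # a sequence is plain concatenation
        elif token in {")", "|", "*", "+", "?"}:
            parts.append(token)            # same meaning in both grammars
        else:
            # Each child name is matched together with a trailing space,
            # so that P can never match the start of PHOTO.
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def children_match(model: str, children: list) -> bool:
    """Check a list of child element names against a content model."""
    encoded = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(encoded) is not None


ARTICLE = "(TITLE, (P | PHOTO | GRAPH | SIDEBAR | PULLQUOTE | SUBHEAD)*, BYLINE?)"
print(children_match(ARTICLE, ["TITLE", "P", "PHOTO", "BYLINE"]))
print(children_match(ARTICLE, ["P", "TITLE"]))
print(children_match("(dt, dd)*", ["dt", "dd", "dt", "dd"]))
```

The comma disappears in the translation because a sequence is simply one pattern followed by another, while |, *, + and ? carry over unchanged.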
Mixed Content
– Both #PCDATA and child elements in a choice
– #PCDATA must come first
– #PCDATA cannot be used in a sequence
<!ELEMENT TEAM (#PCDATA | TEAM_CITY | TEAM_NAME | PLAYER)*>
Empty elements
<!ELEMENT BR EMPTY>
Attribute Declarations
Consider this element:
<GREETING LANGUAGE="Spanish"> Hola!</GREETING>
It is declared like this:
<!ELEMENT GREETING (#PCDATA)>
<!ATTLIST GREETING LANGUAGE CDATA "English">
General form:
<!ATTLIST Element_name Attribute_name Type Default_value>
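A declared default value is supplied by the parser when the attribute is omitted. The sketch below assumes Python's standard-library xml.parsers.expat module, which by default reports both specified and defaulted attributes. The helper name attributes_seen and the two sample documents are made up for illustration.

```python
import xml.parsers.expat

DTD = """<!DOCTYPE GREETING [
  <!ELEMENT GREETING (#PCDATA)>
  <!ATTLIST GREETING LANGUAGE CDATA "English">
]>"""


def attributes_seen(document: str) -> dict:
    """Return {element name: attributes} as reported by the parser."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        # attrs includes values defaulted from the ATTLIST declaration
        seen[name] = dict(attrs)

    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return seen


defaulted = attributes_seen(DTD + "<GREETING>Hello XML!</GREETING>")
explicit = attributes_seen(DTD + '<GREETING LANGUAGE="Spanish">Hola!</GREETING>')
print(defaulted)
print(explicit)
```

In the first document LANGUAGE is reported as "English" even though it never appears in the start tag; in the second the explicit value takes precedence over the default.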
Multiple Attribute Declarations
Consider this element:
<RECTANGLE LENGTH="70px" WIDTH="85px"/>
With two attribute declarations:
<!ELEMENT RECTANGLE EMPTY>
<!ATTLIST RECTANGLE LENGTH CDATA "0px">
<!ATTLIST RECTANGLE WIDTH CDATA "0px">
With one attribute declaration:
<!ATTLIST RECTANGLE LENGTH CDATA "0px"
                    WIDTH  CDATA "0px">
Indentation is a convention, not a requirement
Attribute Types
– CDATA
– ID
– IDREF
– IDREFS
– ENTITY
– ENTITIES
– NOTATION
– NMTOKEN
– NMTOKENS
– Enumerated
CDATA
– Most general attribute type
– Value can be any string of text that does not contain a less-than sign (<) or the quotation mark used to delimit the value; write those characters as &lt; and &quot; instead
ID
– Value must be an XML name
  – May include letters, digits, underscores, hyphens, and periods, but may not begin with a digit, hyphen, or period
  – May not include whitespace
  – May contain colons only if used for namespaces
– Value must be unique among ID-type attributes in the document
– The default must be #REQUIRED or #IMPLIED; #REQUIRED is the usual choice
IDREF
– Value matches the ID of an element in the same document
– Used for links and the like
IDREFS
– A list of ID values in the same document
– Separated by white space
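The two ID constraints (unique IDs, and every IDREF or IDREFS token naming an existing ID) can be checked by hand. The sketch below is a simplified stand-in for what a validating parser does. It assumes the caller already knows which attributes are ID-typed and which are IDREF-typed, since no DTD is read, and the staff document and attribute names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: ID is ID-typed, MANAGER is IDREF, MEMBERS is IDREFS.
DOC = """<staff>
  <person ID="p1"/>
  <person ID="p2" MANAGER="p1"/>
  <person ID="p2" MANAGER="p9"/>
  <team MEMBERS="p1 p2 p7"/>
</staff>"""


def check_ids(root, id_attr, idref_attrs):
    """Return (duplicate ID values, reference tokens with no matching ID)."""
    seen, duplicates = set(), []
    for element in root.iter():
        value = element.get(id_attr)
        if value is not None:
            if value in seen:
                duplicates.append(value)
            seen.add(value)

    dangling = []
    for element in root.iter():
        for attr in idref_attrs:
            # IDREFS is a whitespace-separated list; IDREF is a single token.
            for token in element.get(attr, "").split():
                if token not in seen:
                    dangling.append(token)
    return duplicates, dangling


duplicates, dangling = check_ids(ET.fromstring(DOC), "ID", ["MANAGER", "MEMBERS"])
print(duplicates)
print(dangling)
```

All IDs are collected first and the references are checked in a second pass, because an IDREF may point at an element that appears later in the document.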
ENTITY
– Value is the name of an unparsed general entity declared in the DTD
ENTITIES
– Value is a list of unparsed general entities declared in the DTD
– Separated by white space
NOTATION
– Value is the name of a notation declared in the DTD
<!NOTATION Tex SYSTEM "..\TEXVIEW.EXE">
<!ENTITY Logo SYSTEM "LOGO.TEX" NDATA Tex>
[Figure: four numbered steps connecting the entity file LOGO.TEX to the helper application TEXVIEW.EXE]
NMTOKEN
– Value is an XML name token: like an XML name, except that any name character, such as a digit, may come first
NMTOKENS
– Value is a list of XML name tokens
– Separated by white space
Enumerated
– Not a keyword
– Refers to a list of possible values from which one must be chosen
– A default value is generally provided explicitly
<!ATTLIST P VISIBLE (TRUE | FALSE) "TRUE">
Attribute Default Values
– A literal string value
– One of these three keywords
  – #REQUIRED
  – #IMPLIED
  – #FIXED
#REQUIRED
– No default value is provided in the DTD
– Document authors must provide the attribute value for each element
<!ELEMENT IMG EMPTY>
<!ATTLIST IMG ALT CDATA #REQUIRED>
<!ATTLIST IMG WIDTH CDATA #REQUIRED>
<!ATTLIST IMG HEIGHT CDATA #REQUIRED>
#IMPLIED
– No default value in the DTD
– The author may (but does not have to) provide a value with each element
#FIXED
– Value is the same for all elements
– A default value must be provided in the DTD
– The document author may not change the default value
<!ELEMENT AUTHOR EMPTY>
<!ATTLIST AUTHOR NAME CDATA #REQUIRED>
<!ATTLIST AUTHOR EMAIL CDATA #REQUIRED>
<!ATTLIST AUTHOR EXTENSION CDATA #IMPLIED>
<!ATTLIST AUTHOR COMPANY CDATA #FIXED "TIC">
Example of Internal DTDs
<?xml version="1.0"?>
<!DOCTYPE GREETING [
<!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>Hello XML!</GREETING>
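Internal DTD Subsets
– Internal declarations override external declarations: the internal subset is read first, so its entity and attribute-list declarations take precedence over those in the external subset
<?xml version="1.0"?>
<!DOCTYPE GREETING SYSTEM "greeting.dtd" [
<!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>Hello XML!</GREETING>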