Technical University of Valencia Computer Science Department SOFSEM’07 (22/01/2007) A Program Slicing Based Method to Filter XML/DTD documents

Technical University of Valencia Computer Science Department

SOFSEM’07 (22/01/2007)

A Program Slicing Based Method to Filter XML/DTD documents


Contents

• Motivation
• Program Slicing
• XML
    • DTD
    • XSLT
• Slicing XML Documents
    • Example
• Implementation
• Conclusions & Future Work


Program Slicing

• Definition: Program transformation to extract the program statements that (potentially) affect the values computed at some point of interest.

• Origin: Introduced by Weiser.

• Example:

    (1)  read(n);
    (2)  i := 1;
    (3)  sum := 0;
    (4)  product := 1;
    (5)  while (i <= n) do
         begin
    (6)    sum := sum + i;
    (7)    product := product * i;
    (8)    i := i + 1;
         end;
    (9)  write(sum);
    (10) write(product);

Slicing Criterion = (10, product)
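To make the example concrete, the sketch below computes a backward slice of the ten-statement program for the criterion (10, product). It is a simplified, flow-insensitive approximation written for illustration, not Weiser's original data-flow algorithm; the `PROGRAM` table and the function name `backward_slice` are assumptions of this sketch.

```python
# Each statement: (variables defined, variables used, controlling statement or None).
# This hand-written table is an assumed encoding of the example program.
PROGRAM = {
    1: ({"n"}, set(), None),                 # read(n)
    2: ({"i"}, set(), None),                 # i := 1
    3: ({"sum"}, set(), None),               # sum := 0
    4: ({"product"}, set(), None),           # product := 1
    5: (set(), {"i", "n"}, None),            # while (i <= n) do
    6: ({"sum"}, {"sum", "i"}, 5),           # sum := sum + i
    7: ({"product"}, {"product", "i"}, 5),   # product := product * i
    8: ({"i"}, {"i"}, 5),                    # i := i + 1
    9: (set(), {"sum"}, None),               # write(sum)
    10: (set(), {"product"}, None),          # write(product)
}


def backward_slice(program, line, variable):
    """Collect the statements that may affect `variable` at statement `line`."""
    in_slice = {line}
    worklist = [(line, {variable})]
    while worklist:
        stmt, used = worklist.pop()
        # Data dependences: any statement defining a variable used here.
        deps = {s for s, (defs, _, _) in program.items() if defs & used}
        # Control dependence: the loop header that governs this statement.
        control = program[stmt][2]
        if control is not None:
            deps.add(control)
        for dep in deps - in_slice:
            in_slice.add(dep)
            worklist.append((dep, program[dep][1]))
    return sorted(in_slice)


print(backward_slice(PROGRAM, 10, "product"))
```

Under these assumptions the slice keeps statements 1, 2, 4, 5, 7, 8 and 10, and drops every statement that only concerns `sum`.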


Program Slicing

• Applications: debugging, code understanding, specialization, etc.

All these applications are based on Program Dependence Graphs (PDGs), which represent the structure and behaviour of programs.

What would happen if program slicing were applied to a data structure? Would it be interesting?


XML (eXtensible Markup Language)

• Origin: XML was developed by an XML Working Group formed under the auspices of the World Wide Web Consortium (W3C) in 1996.

• Structure: Documents are trees composed of ‘ELEMENTS’, which contain attributes.

Example of XML document


DTD (Document Type Definition)

• Objective: The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements.

• Structure: Documents are graphs composed of ‘ELEMENTS’.

Example of DTD document


XML (eXtensible Markup Language)

    <PersonalInfo>
        <Contact>
            <Status> Professor </Status>
            <Name> Ryan </Name>
            <Surname> Gibson </Surname>
        </Contact>
        <Teaching>
            <Subject>
                <Name> Logic </Name>
                <Sched> Mon/Wed 16-18 </Sched>
                <Course> 4-Mathematics </Course>
            </Subject>
            <Subject>
                <Name> Algebra </Name>
                <Sched> Mon/Tur 11-13 </Sched>
                <Course> 3-Mathematics </Course>
            </Subject>
            …
        </Teaching>
        <Research>
            <Project name="SysLog" year="2003-2004" budget="16000€" />
            ...
        </Research>
    </PersonalInfo>

DTD (Document Type Definition)

    <!ELEMENT PersonalInfo (Contact, Teaching, Research)>
    <!ELEMENT Contact (Status, Name, Surname)>
    <!ELEMENT Status ANY>
    <!ELEMENT Name ANY>
    <!ELEMENT Surname ANY>
    <!ELEMENT Teaching (Subject+)>
    <!ELEMENT Subject (Name, Sched, Course)>
    <!ELEMENT Sched ANY>
    <!ELEMENT Course ANY>
    <!ELEMENT Research (Project*)>
    <!ELEMENT Project ANY>
    <!ATTLIST Project
        name   CDATA #REQUIRED
        year   CDATA #REQUIRED
        budget CDATA #IMPLIED
    >


XSLT (eXtensible Stylesheet Language Transformations)

• Objective: XSLT is a language for transforming XML.

• Structure: An XSLT stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses a formatting vocabulary, such as (X)HTML or XSL-FO.

• XSLT is a programming language.

[Figure: Example of XSLT document (Source Code)]

[Figure: Example of XSLT document (Result)]


Slicing XML Documents

• We see XML documents and DTDs as trees.

[Figure: the example XML document next to its tree representation]

    PersonalInfo
        Contact
            Professor
            Ryan
            Gibson
        Teaching
            Subject
                Logic
                Mon/Wed 16-18
                4-Mathematics
            Subject
                Algebra
                Mon/Tur 11-13
                3-Mathematics
        Research
            Project
                Syslog
                2003-2004
                16000 €
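As a small illustration of the tree view, the following sketch parses an abridged copy of the example document with Python's standard `xml.etree.ElementTree` module and prints one indented line per node. The abridged document (a single Subject, no elisions) and the helper name `tree_lines` are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the example document (assumption: one Subject only).
XML_DOC = """
<PersonalInfo>
  <Contact>
    <Status>Professor</Status>
    <Name>Ryan</Name>
    <Surname>Gibson</Surname>
  </Contact>
  <Teaching>
    <Subject>
      <Name>Logic</Name>
      <Sched>Mon/Wed 16-18</Sched>
      <Course>4-Mathematics</Course>
    </Subject>
  </Teaching>
  <Research>
    <Project name="SysLog" year="2003-2004" budget="16000€"/>
  </Research>
</PersonalInfo>
"""


def tree_lines(element, depth=0):
    """Return one indented line per node: tag, attributes and text value."""
    text = (element.text or "").strip()
    attrs = " ".join(f"{key}={value}" for key, value in element.attrib.items())
    label = " ".join(part for part in (element.tag, attrs, text) if part)
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(tree_lines(child, depth + 1))
    return lines


for line in tree_lines(ET.fromstring(XML_DOC)):
    print(line)
```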


Slicing XML Documents

• The slicing criterion consists of a set of nodes in the tree.

• For each node in the slicing criterion, we extract from the tree all the nodes on the path from the root to that node.

[Diagram labels: Web Page (Original), Web Page (Slice), XML / DTD, Forward / Backward]
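The extraction step described above can be sketched as follows: every node on the path from the root to a criterion node is kept, and everything else is pruned. This is a minimal sketch built on Python's standard `xml.etree.ElementTree`, not the authors' Haskell implementation; the function names `backward_slice` and `prune` and the sample document are assumptions.

```python
import xml.etree.ElementTree as ET

XML_DOC = """
<PersonalInfo>
  <Contact><Status>Professor</Status><Name>Ryan</Name></Contact>
  <Teaching>
    <Subject><Name>Logic</Name><Sched>Mon/Wed 16-18</Sched></Subject>
    <Subject><Name>Algebra</Name><Sched>Mon/Tur 11-13</Sched></Subject>
  </Teaching>
</PersonalInfo>
"""


def prune(node, keep):
    """Copy `node`, retaining only the children that belong to `keep`."""
    clone = ET.Element(node.tag, dict(node.attrib))
    clone.text = node.text
    for child in node:
        if child in keep:
            clone.append(prune(child, keep))
    return clone


def backward_slice(root, criterion):
    """Keep every node on the path from the root to each criterion node."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: node for node in root.iter() for child in node}
    keep = set()
    for node in criterion:
        while node is not None:
            keep.add(node)
            node = parent.get(node)
    return prune(root, keep)


root = ET.fromstring(XML_DOC)
criterion = [root.find("Teaching/Subject/Name")]  # the "Logic" node
sliced = backward_slice(root, criterion)
print([element.tag for element in sliced.iter()])
```

With the "Logic" name as the only criterion node, the slice contains just the chain PersonalInfo, Teaching, Subject, Name; the Contact branch and the second Subject are discarded.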


Slicing XML Documents

• DTD backward slicing criterion.

[Figure: the example DTD and its tree representation, shown twice, with panels "Web Page (Original)" and "Web Page (Slice)"]

    PersonalInfo
        Contact
            Status
            Name
            Surname
        Teaching
            Subject
                Name
                Sched
                Course
        Research
            Project
                Name
                Year
                Budget


Slicing XML Documents

• XML backward slicing criterion.

[Figure: the example XML document and its tree representation, shown twice, with panels "Web Page (Original)" and "Web Page (Slice)"]


Slicing XML Documents

• We distinguish between DTD and XML slicing criteria.
    • XML slicing criteria are more fine-grained than DTD slicing criteria.

• We distinguish between forward and backward slices (or a combination of both).

[Diagram labels: Web Page (Original), Web Page (Slice), XML / DTD, Forward / Backward]
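The two slice directions can be contrasted with a short sketch. It assumes that a backward slice collects the ancestors of a criterion node (the path from the root) and that a forward slice collects its descendants; the second reading is an assumption of this sketch, since the slides only name the direction. The function `slice_nodes` and the sample document are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

XML_DOC = """
<PersonalInfo>
  <Contact><Status>Professor</Status></Contact>
  <Teaching>
    <Subject>
      <Name>Logic</Name>
      <Sched>Mon/Wed 16-18</Sched>
      <Course>4-Mathematics</Course>
    </Subject>
  </Teaching>
</PersonalInfo>
"""


def slice_nodes(root, criterion, backward=True, forward=False):
    """Return the set of nodes selected by a backward and/or forward slice."""
    parent = {child: node for node in root.iter() for child in node}
    keep = set()
    for node in criterion:
        keep.add(node)
        if backward:
            # Ancestors: walk up from the criterion node to the root.
            ancestor = parent.get(node)
            while ancestor is not None:
                keep.add(ancestor)
                ancestor = parent.get(ancestor)
        if forward:
            # Descendants: iter() yields the node itself and its whole subtree.
            keep.update(node.iter())
    return keep


def tags(nodes):
    return sorted(node.tag for node in nodes)


root = ET.fromstring(XML_DOC)
subject = root.find("Teaching/Subject")
print(tags(slice_nodes(root, [subject], backward=True, forward=False)))
print(tags(slice_nodes(root, [subject], backward=False, forward=True)))
print(tags(slice_nodes(root, [subject], backward=True, forward=True)))
```

Combining both flags yields the path from the root to the Subject node together with the Subject subtree, which matches the "backward-forward" combination mentioned on the slide under the stated assumption.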


Slicing XML Documents

• XML forward slicing criterion.

[Figure: the example XML document and its tree representation, shown twice, with panels "Web Page (Original)" and "Web Page (Slice)"]


Slicing XML Documents

• XML backward-forward slicing criterion.

[Figure: the example XML document and its tree representation, shown twice, with panels "Web Page (Original)" and "Web Page (Slice)"]


Slicing XML Documents

• What happens with DTDs? Slices are well-formed, but are they valid?

• For each XML slice we produce a DTD slice, and vice versa.

• We guarantee that XML slices are valid with respect to DTD slices.

[Diagram: inputs "DTD document", "XML document" and "Slicing Criterion"; component "Slicer"; outputs "DTD Slice document" and "XML Slice document"]
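One way to picture how a DTD slice could accompany an XML slice is to restrict each content model to the element types that survive in the slice. The sketch below does exactly that and nothing more; it is an illustrative assumption, not the algorithm of the slides. The dictionary encoding of the DTD, the function `slice_dtd`, and the fallback to `ANY` when no child is kept are all choices made for this sketch.

```python
# Content models of the example DTD: element -> list of (child, occurrence),
# or None for "ANY". This encoding is an assumption of the sketch.
DTD = {
    "PersonalInfo": [("Contact", ""), ("Teaching", ""), ("Research", "")],
    "Contact": [("Status", ""), ("Name", ""), ("Surname", "")],
    "Teaching": [("Subject", "+")],
    "Subject": [("Name", ""), ("Sched", ""), ("Course", "")],
    "Research": [("Project", "*")],
    "Status": None, "Name": None, "Surname": None,
    "Sched": None, "Course": None, "Project": None,
}


def slice_dtd(dtd, kept_paths):
    """Keep only the declarations and children that occur on the kept paths."""
    kept_children = {}
    for path in kept_paths:
        kept_children.setdefault(path[-1], set())
        if len(path) > 1:
            kept_children.setdefault(path[-2], set()).add(path[-1])
    declarations = []
    for name, model in dtd.items():
        if name not in kept_children:
            continue  # element type does not occur in the slice
        children = [child + occurrence for child, occurrence in (model or [])
                    if child in kept_children[name]]
        # Assumed fallback: an element with no kept children is declared ANY.
        content = "(" + ", ".join(children) + ")" if children else "ANY"
        declarations.append(f"<!ELEMENT {name} {content}>")
    return declarations


# Paths kept by a backward slice on a Subject's Name.
KEPT = {
    ("PersonalInfo",),
    ("PersonalInfo", "Teaching"),
    ("PersonalInfo", "Teaching", "Subject"),
    ("PersonalInfo", "Teaching", "Subject", "Name"),
}
for declaration in slice_dtd(DTD, KEPT):
    print(declaration)
```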


Slicing XML Documents

• A simple slicing algorithm


Slicing XML Documents

• In the case of a DTD criterion composed of a set of positions $C = \{p_1, \ldots, p_n\} \subseteq Pos(D)$, the algorithm would be the same, except that the first loop would be:

    For each $v_1.v_2.\ldots.v_n \in C$ do
        $V' := V' \cup \{v_1,\ v_1.v_2,\ \ldots,\ v_1.v_2.\ldots.v_n\}$;
        $W' := W' \cup \{v_1|_i.v_2|_j.\ldots.v_n|_k\}$
    where $v_1.v_2.\ldots.v_n \in V'$ and $v_1|_i.v_2|_j.\ldots.v_n|_k \in X$

Both algorithms produce valid XML and DTD slices with respect to the slicing criterion.
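The loop for a DTD criterion can be read as follows: V' gathers every prefix of each criterion position, and W' gathers every XML node whose sequence of element types is one of those prefixes. The sketch below follows that reading, which is an interpretation of the slide rather than the authors' code. Encoding positions as tuples of element names, and the names `dtd_criterion_slice`, `v_prime` and `w_prime`, are assumptions.

```python
import xml.etree.ElementTree as ET

XML_DOC = """
<PersonalInfo>
  <Contact><Status>Professor</Status><Name>Ryan</Name></Contact>
  <Teaching>
    <Subject><Name>Logic</Name><Sched>Mon/Wed 16-18</Sched></Subject>
    <Subject><Name>Algebra</Name><Sched>Mon/Tur 11-13</Sched></Subject>
  </Teaching>
</PersonalInfo>
"""


def dtd_criterion_slice(root, criterion):
    """Return (V', W') for a criterion given as DTD positions (tuples of tags)."""
    # V': each criterion position together with all of its prefixes.
    v_prime = {path[:end] for path in criterion for end in range(1, len(path) + 1)}
    # W': every XML node whose tag path from the root belongs to V'.
    w_prime = []

    def visit(node, prefix):
        path = prefix + (node.tag,)
        if path not in v_prime:
            return  # V' is prefix-closed, so no descendant can match either
        w_prime.append(node)
        for child in node:
            visit(child, path)

    visit(root, ())
    return v_prime, w_prime


root = ET.fromstring(XML_DOC)
criterion = {("PersonalInfo", "Teaching", "Subject", "Name")}
v_prime, w_prime = dtd_criterion_slice(root, criterion)
print(sorted(v_prime))
print([node.tag for node in w_prime])
```

Because the criterion names an element type rather than a single node, both Subject names are selected, which illustrates why XML criteria are more fine-grained than DTD criteria.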


Slicing XML Documents

The following theorem states the correctness of the technique:

Theorem. Let D be a well-formed DTD and X a well-formed XML document valid with respect to D. Given a slice D’ of D and a slice X’ of X computed with an XML slicing criterion C, and given a slice D’’ of D and a slice X’’ of X computed with a DTD slicing criterion C’, then

    a) D’ is well-formed and X’ is valid with respect to D’
    b) D’’ is well-formed and X’’ is valid with respect to D’’

If all the elements in C are of one of the types in C’, then

    c) D’ = D’’
    d) X’ is a subtree of X’’


Implementation

We have implemented a prototype in Haskell.

Haskell provides us with a formal basis that has many advantages for the manipulation of XML documents.

- The HaXml library.

It allows us to automatically translate XML or HTML documents into a Haskell representation. In particular, we use the following data structures, which can represent any XML/HTML document:

    data Element   = Elem Name [Attribute] [Content]  -- tag name, attributes, children
    type Attribute = (Name, Value)                    -- attribute name and its value
    data Content   = CElem Element                    -- a nested element
                   | CText String                     -- plain text


Implementation

From XML slices to Webpage slices

[Diagram, shown twice: XML (Data), XSLT (Presentation), WebPage]


Implementation

XSLT Implementation Guidelines

XSLT documents must generate the information and the presentation elements under the same conditions (i.e., the former is generated if and only if the latter is generated).

Both the XML data and the presentation labels are generated together.

This does not impose any restriction on the power of XSLT, since the same webpages can be generated. On the contrary, this way of programming forces the programmer to build transformations that can be easily reused and maintained, because the information and the presentation data that depend on the same condition are put together.


Implementation

The implementation, some examples and other material are publicly available at:

www.dsic.upv.es/~jsilva/xml


Conclusions

• We proposed the application of program slicing techniques to XML data structures.

• We defined an algorithm to slice XML and DTD documents.
    • It produces XML and DTD slices that are well-formed and valid.
    • Previous slicers can be used with a modest implementation effort.

• Slicing Web Pages
    • The slicer can use XSLT in order to slice webpages.
    • We proposed some guidelines to generate XSLT files.

• Future Work
    • Migration to XML Schema
    • New implementation based on XQuery