
1 4 Approaches to Structuring Lists February 22, 2009.

Mar 27, 2015


Wyatt Andersen

4 Approaches to Structuring Lists

February 22, 2009


Lists are everywhere

• A list of countries

• A list of religions

• A list of weights

• A list of students

• A list of days of the week

• A list of planets


The purpose of this document is to answer these questions

• What are the different approaches to structure lists?

• What are the pros and cons of each approach?

• Is there a way to structure lists to maximize their utility and minimize their overhead?


Lists should be usable for multiple purposes


Example

• We will use a country list to illustrate the four approaches.


Some ways we might use a country list

• Use it as values in an XForms pick list

• Merge it with other data to create a document that contains, for each country, sales figures (or death rates, births, political leadership, religions, etc.)

• Use it to validate an element's content

[Figure: the country list is used to validate the content of <country-visited>_______</country-visited>]


Approach #1: Express lists using the XML Schema vocabulary


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.countries.org"
           xmlns="http://www.countries.org"
           elementFormDefault="qualified">

    <xs:element name="countries" type="countriesType" />

    <xs:simpleType name="countriesType">
        <xs:restriction base="xs:string">
            <xs:enumeration value="Afghanistan"/>
            <xs:enumeration value="Albania"/>
            <xs:enumeration value="Algeria"/>
            ...
        </xs:restriction>
    </xs:simpleType>

</xs:schema>


Approach #2: Express lists using the RELAX NG vocabulary


<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.countries.org">

    <define name="countriesElement">
        <element name="countries">
            <ref name="countriesType" />
        </element>
    </define>

    <define name="countriesType">
        <choice>
            <value>Afghanistan</value>
            <value>Albania</value>
            <value>Algeria</value>
            ...
        </choice>
    </define>

</grammar>


Approach #3: Express lists using domain-specific vocabularies

The markup comes from terminology used by Subject Matter Experts (SMEs)


<?xml version="1.0" encoding="UTF-8"?>
<countries xmlns="http://www.countries.org">
    <country>Afghanistan</country>
    <country>Albania</country>
    <country>Algeria</country>
    ...
</countries>


Approach #4: Express lists using a generic list vocabulary


<?xml version="1.0" encoding="UTF-8"?>
<List>
    <Identifier>http://www.countries.org</Identifier>
    <li>Afghanistan</li>
    <li>Albania</li>
    <li>Algeria</li>
    ...
</List>


Analysis of Each Approach


Approach #1 & Approach #2

• Approach #1 and approach #2 make it easy to use a list for validation purposes. A schema simply imports (XML Schema) or includes (RELAX NG) the list schema, and the list's values are then immediately available for validating element content.

• Here is an XML Schema that imports the country list XML Schema and uses its simpleType as the datatype for the <country-visited> element:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.org"
           xmlns:c="http://www.countries.org"
           elementFormDefault="qualified">

    <xs:import namespace="http://www.countries.org"
               schemaLocation="countries.xsd" />

    <xs:element name="country-visited" type="c:countriesType" />

</xs:schema>


Approach #1 & Approach #2

• Here is a RELAX NG schema that includes the country list RELAX NG schema and uses its define element as the datatype for the <country-visited> element:

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://www.example.org">

    <include href="countries.rng"/>

    <start>
        <element name="country-visited">
            <ref name="countriesType" />
        </element>
    </start>

</grammar>


Approach #1 & Approach #2

• If the schema doing the importing is an XML Schema then it can't use the list if it's expressed using RELAX NG. And vice versa.

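[Figure: a country list (xsd) feeds a Schema (xsd), and a country list (rng) feeds a Schema (rng); neither list can be used by a schema written in the other language.]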


Approach #1 & Approach #2

• Although these two approaches make validation against a list efficient, they are not the most efficient format for the myriad other ways that a list may be used (rendering in a pick list, merging with other lists, searching, and so forth). This is discussed further in the analysis of approach #3 below.


Approach #3

• Recall that approach #3 uses domain-specific terminology. This can be helpful to Subject Matter Experts (SMEs) as they maintain the lists.

• Validation can be accomplished using a Schematron schema. Here is a Schematron schema which validates that the content of the <country-visited> element matches one of the values in the country list:

<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">

    <sch:ns uri="http://www.countries.org" prefix="c" />
    <sch:ns uri="http://www.example.org" prefix="ex" />

    <sch:pattern name="Country List Check">
        <sch:rule context="ex:country-visited">
            <sch:assert test=". = document('countries.xml')//c:country">
                The value of country-visited must be one of the countries in the countries' list.
            </sch:assert>
        </sch:rule>
    </sch:pattern>

</sch:schema>
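The same membership check can be sketched outside Schematron. The following is a minimal Python illustration, not part of the original slides: the function names `load_country_values` and `invalid_visits` are hypothetical, and `COUNTRIES_XML` is a stand-in for countries.xml that contains only the three values shown in this document.

```python
import xml.etree.ElementTree as ET

COUNTRY_NS = "http://www.countries.org"
EXAMPLE_NS = "http://www.example.org"

# Stand-in for countries.xml; only the three values shown in this document.
COUNTRIES_XML = """<countries xmlns="http://www.countries.org">
  <country>Afghanistan</country>
  <country>Albania</country>
  <country>Algeria</country>
</countries>"""


def load_country_values(xml_text):
    """Collect the text of every country element in the list namespace."""
    root = ET.fromstring(xml_text)
    return {el.text or "" for el in root.iter(f"{{{COUNTRY_NS}}}country")}


def invalid_visits(instance_xml, allowed):
    """Return the content of each country-visited element that is not in the list.

    Like the Schematron test, this compares the exact element content.
    """
    root = ET.fromstring(instance_xml)
    visited = root.iter(f"{{{EXAMPLE_NS}}}country-visited")
    return [el.text or "" for el in visited if (el.text or "") not in allowed]
```

An empty result from `invalid_visits` corresponds to the Schematron assertion holding for every <country-visited> element in the instance document.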


Approach #3

• With approach #3 the markup used to construct the list has semantics specific to the list:

{http://www.countries.org}countries
{http://www.countries.org}country

• This makes possible the creation of programs that are readily understood, as they use terminology consistent with the domain. For example, the XSLT program on the following slide uses the country list to generate an HTML list of all countries.


<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:c="http://www.countries.org"
                version="2.0">

    <xsl:output method="html"/>

    <xsl:template match="c:countries">
        <html>
            <head>
                <title>Countries of the World</title>
            </head>
            <body>
                <ol>
                    <xsl:apply-templates />
                </ol>
            </body>
        </html>
    </xsl:template>

    <xsl:template match="c:country">
        <li>
            <xsl:value-of select="." />
        </li>
    </xsl:template>

</xsl:stylesheet>

Note the template match values. They match on: {http://www.countries.org}countries {http://www.countries.org}country


Contrast with Approach #1 and Approach #2

• Conversely, with approach #1 and approach #2 the markup used to construct the list has semantics that are specific to the schema language:

{http://www.w3.org/2001/XMLSchema}element
{http://www.w3.org/2001/XMLSchema}simpleType
{http://www.w3.org/2001/XMLSchema}enumeration

{http://relaxng.org/ns/structure/1.0}define
{http://relaxng.org/ns/structure/1.0}choice
{http://relaxng.org/ns/structure/1.0}value

• Consequently programs must operate using schema terminology rather than domain terminology. For example, the XSLT program on the following slide generates an HTML list of all countries from the countries list specified by the XML Schema document.


<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="2.0">

    <xsl:output method="html"/>

    <xsl:template match="xs:simpleType">
        <html>
            <head>
                <title>Countries of the World</title>
            </head>
            <body>
                <ol>
                    <xsl:apply-templates />
                </ol>
            </body>
        </html>
    </xsl:template>

    <xsl:template match="xs:enumeration">
        <li>
            <xsl:value-of select="@value" />
        </li>
    </xsl:template>

</xsl:stylesheet>

Note the template match values. Rather than the XSLT program operating on <countries> and <country> elements, it operates on <schema>, <simpleType>, <restriction>, and <enumeration> elements. This makes programming challenging and error-prone.
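The same coupling to schema terminology shows up in general-purpose code. Here is a minimal Python sketch, offered as an illustration rather than as part of the original slides: the function name `enumeration_values` is hypothetical, and `COUNTRIES_XSD` holds only the three values shown in this document.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# The approach #1 list, limited to the three values shown in this document.
COUNTRIES_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.countries.org"
    xmlns="http://www.countries.org"
    elementFormDefault="qualified">
  <xs:element name="countries" type="countriesType"/>
  <xs:simpleType name="countriesType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Afghanistan"/>
      <xs:enumeration value="Albania"/>
      <xs:enumeration value="Algeria"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""


def enumeration_values(xsd_text, type_name):
    """Return the enumeration values of the named simpleType, in document order."""
    root = ET.fromstring(xsd_text)
    for simple_type in root.iter(XS + "simpleType"):
        # The list is found by a schema construct (simpleType/@name),
        # not by a domain element such as <countries>.
        if simple_type.get("name") == type_name:
            return [e.get("value") for e in simple_type.iter(XS + "enumeration")]
    return []
```

The code never mentions countries except through the string "countriesType"; everything else is expressed in terms of simpleType and enumeration.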


Approach #3

• With approach #3 a list can be used as a building block (data component) which can be immediately dropped into other documents to create compound documents. For example, consider a list of religions, also structured using approach #3:

<?xml version="1.0" encoding="UTF-8"?>
<religions xmlns="http://www.religions.org">
    <religion>Baha'i</religion>
    <religion>Buddhism</religion>
    <religion>Catholicism</religion>
    ...
</religions>


Approach #3

• It's easy to construct a compound document composed of the country and religion lists:

<?xml version="1.0" encoding="UTF-8"?>
<religions-per-country>
    <countries xmlns="http://www.countries.org">
        <country>Afghanistan</country>
        <country>Albania</country>
        <country>Algeria</country>
        ...
    </countries>
    <religions xmlns="http://www.religions.org">
        <religion>Baha'i</religion>
        <religion>Buddhism</religion>
        <religion>Catholicism</religion>
        ...
    </religions>
    <!-- markup that maps religions to countries -->
</religions-per-country>


Approach #3

• Due to the modularity provided by approach #3, it is possible to perform list-specific processing on this compound document. That is, a country-list-aware application would be able to extract the country list from this compound document and process it. Ditto for a religion-list-aware application.

<?xml version="1.0" encoding="UTF-8"?>
<religions-per-country>
    <countries xmlns="http://www.countries.org">
        <country>Afghanistan</country>
        <country>Albania</country>
        <country>Algeria</country>
        ...
    </countries>
    <religions xmlns="http://www.religions.org">
        <religion>Baha'i</religion>
        <religion>Buddhism</religion>
        <religion>Catholicism</religion>
        ...
    </religions>
    <!-- markup that maps religions to countries -->
</religions-per-country>

[Figure: a country-list-aware application processes the <countries> list; a religion-list-aware application processes the <religions> list.]
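As a rough sketch of what such a list-aware application might do, the Python below pulls one list out of the compound document purely by namespace. It is an illustration under stated assumptions: the helper names `namespace_of` and `values_in_namespace` are hypothetical, and the sample document holds only the values shown on the slide.

```python
import xml.etree.ElementTree as ET

# The compound document, limited to the values shown in this document.
COMPOUND_XML = """<religions-per-country>
  <countries xmlns="http://www.countries.org">
    <country>Afghanistan</country>
    <country>Albania</country>
    <country>Algeria</country>
  </countries>
  <religions xmlns="http://www.religions.org">
    <religion>Baha'i</religion>
    <religion>Buddhism</religion>
    <religion>Catholicism</religion>
  </religions>
</religions-per-country>"""


def namespace_of(tag):
    """ElementTree writes qualified names as '{uri}local'; return the uri part."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def values_in_namespace(xml_text, namespace_uri):
    """Return the text of every leaf element that belongs to the given namespace."""
    root = ET.fromstring(xml_text)
    values = []
    for el in root.iter():
        is_leaf = len(el) == 0
        if namespace_of(el.tag) == namespace_uri and is_leaf and el.text:
            values.append(el.text.strip())
    return values
```

Calling `values_in_namespace(COMPOUND_XML, "http://www.countries.org")` yields only the country values; passing the religions namespace yields only the religion values.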


Contrast with Approach #1 and Approach #2

• With approach #1 and approach #2 the XML vocabulary used to construct the list is the same regardless of the list. Here is the compound document using lists that are defined using the XML Schema vocabulary:

<?xml version="1.0" encoding="UTF-8"?>
<religions-per-country>
    <xs:simpleType xmlns:xs="http://www.w3.org/2001/XMLSchema" name="countriesType">
        <xs:restriction base="xs:string">
            <xs:enumeration value="Afghanistan"/>
            <xs:enumeration value="Albania"/>
            <xs:enumeration value="Algeria"/>
            ...
        </xs:restriction>
    </xs:simpleType>
    <xs:simpleType xmlns:xs="http://www.w3.org/2001/XMLSchema" name="religionsType">
        <xs:restriction base="xs:string">
            <xs:enumeration value="Baha'i"/>
            <xs:enumeration value="Buddhism"/>
            <xs:enumeration value="Catholicism"/>
            ...
        </xs:restriction>
    </xs:simpleType>
    <!-- markup that maps religions to countries -->
</religions-per-country>

Applications can't distinguish the country list from the religion list by namespace, because both lists are expressed in the XML Schema namespace. Thus, the benefits namespaces provide in terms of modularity are negated. It is not easy to create country-list-aware applications or religion-list-aware applications.
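A small Python sketch can make the point concrete. It groups the top-level components of the compound document by namespace; the function name `list_names_by_namespace` is hypothetical, and the sample holds only the values shown on the slide.

```python
import xml.etree.ElementTree as ET

# The compound document built from approach #1 lists, limited to the values shown.
COMPOUND_XSD_XML = """<religions-per-country>
  <xs:simpleType xmlns:xs="http://www.w3.org/2001/XMLSchema" name="countriesType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Afghanistan"/>
      <xs:enumeration value="Albania"/>
      <xs:enumeration value="Algeria"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType xmlns:xs="http://www.w3.org/2001/XMLSchema" name="religionsType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Baha'i"/>
      <xs:enumeration value="Buddhism"/>
      <xs:enumeration value="Catholicism"/>
    </xs:restriction>
  </xs:simpleType>
</religions-per-country>"""


def namespace_of(tag):
    """ElementTree writes qualified names as '{uri}local'; return the uri part."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def list_names_by_namespace(xml_text):
    """Group the name attribute of each top-level component by its namespace."""
    root = ET.fromstring(xml_text)
    buckets = {}
    for child in root:
        buckets.setdefault(namespace_of(child.tag), []).append(child.get("name"))
    return buckets
```

Both lists land in the single bucket for the XML Schema namespace, so an application in this sketch would have to fall back on the name attribute to tell them apart.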


Approach #3

• Approach #3 has minimal markup overhead.


Approach #4

• In this approach the vocabulary is not customized for a specific list as with approach #3; rather, it is a vocabulary for any list.

• An element in an XML instance document can be validated against the list using Schematron, in the same manner as described for approach #3.

• With the other approaches, the vocabulary is identified via a namespace. Approach #4 doesn't use namespaces; instead, it uses data to identify the list:

<Identifier>http://www.countries.org</Identifier>

This data indicates that this is a list of countries


Identifying a Vocabulary via a Namespace versus Identifying a Vocabulary via Data


Identifying a Vocabulary via a Namespace

One way of identifying an XML building block (data component) is by namespace. For example, this list component is identified by the namespace http://week.days.org

<DaysOfTheWeek xmlns="http://week.days.org">
    <Day>Sunday</Day>
    <Day>Monday</Day>
    <Day>Tuesday</Day>
    <Day>Wednesday</Day>
    <Day>Thursday</Day>
    <Day>Friday</Day>
    <Day>Saturday</Day>
</DaysOfTheWeek>

This list is identified by the namespace http://meetings.org

<Meetings xmlns="http://meetings.org">
    <Meeting>Dentist</Meeting>
    <Meeting>Doctor</Meeting>
    <Meeting>Boss</Meeting>
</Meetings>

Applications can be built that are namespace-aware.

Different data components can be mashed together into a single document and still be extracted and processed individually because each is in a namespace.


Identifying a Vocabulary via Data

There is an alternate way of identifying an XML building block (data component): by embedding an identifier within the document, as data. The weekday list could be expressed like this:

<List>
    <Identifier>http://week.days.org</Identifier>
    <li>Sunday</li>
    <li>Monday</li>
    <li>Tuesday</li>
    <li>Wednesday</li>
    <li>Thursday</li>
    <li>Friday</li>
    <li>Saturday</li>
</List>

And the meetings list could be expressed like this:

<List>
    <Identifier>http://meetings.org</Identifier>
    <li>Dentist</li>
    <li>Doctor</li>
    <li>Boss</li>
</List>


Cont.

Things to note on the previous slide:

1. Namespaces are not being used.

2. The list is identified by the content of <Identifier>

3. The same XML vocabulary is used for both lists. (In fact, the same XML vocabulary is used for all lists)

The two lists can be brought together into a single document and still be processed individually. Applications partition the document based on the value in <Identifier>
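Partitioning on the value in <Identifier> might look like the following Python sketch. It is illustrative only: the wrapper element <Lists> and the function name `lists_by_identifier` are assumptions introduced here, not part of the generic vocabulary shown in this document.

```python
import xml.etree.ElementTree as ET

# Two generic lists combined under a hypothetical <Lists> wrapper element.
COMBINED_XML = """<Lists>
  <List>
    <Identifier>http://week.days.org</Identifier>
    <li>Sunday</li>
    <li>Monday</li>
    <li>Tuesday</li>
    <li>Wednesday</li>
    <li>Thursday</li>
    <li>Friday</li>
    <li>Saturday</li>
  </List>
  <List>
    <Identifier>http://meetings.org</Identifier>
    <li>Dentist</li>
    <li>Doctor</li>
    <li>Boss</li>
  </List>
</Lists>"""


def lists_by_identifier(xml_text):
    """Map each list's <Identifier> content to the values of its <li> children."""
    root = ET.fromstring(xml_text)
    result = {}
    for list_el in root.iter("List"):
        identifier = (list_el.findtext("Identifier") or "").strip()
        result[identifier] = [li.text or "" for li in list_el.findall("li")]
    return result
```

An application interested only in meetings would look up the key "http://meetings.org" and ignore the rest of the document.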


Analysis

• The namespace approach has the benefit of being widely adopted. Most XML tools, parsers, and technologies are based on namespaces. For example, NVDL is entirely based on using namespaces to partition a compound document; an XSLT processor processes a document based on the XSLT namespace.


Cont.

• By using data to identify a list (rather than namespaces), the same XML vocabulary can be used for all lists. This makes list-processing algorithms and code independent of the content, so a single investment in software can be applied to every code list.

• However, that raises an interesting question: is content-specific processing easier when lists are expressed using a domain-specific vocabulary or when lists are expressed using a generic vocabulary?
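The content independence claimed in the first bullet can be sketched with one rendering routine that works for any list in the generic vocabulary. This is a minimal Python illustration; the function name `render_ordered_list` and the choice to show the identifier as a heading are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Any list in the generic vocabulary will do; this is the meetings list.
MEETINGS_XML = """<List>
  <Identifier>http://meetings.org</Identifier>
  <li>Dentist</li>
  <li>Doctor</li>
  <li>Boss</li>
</List>"""


def render_ordered_list(list_xml):
    """Render a generic <List> as an HTML heading plus an ordered list."""
    root = ET.fromstring(list_xml)
    title = escape(root.findtext("Identifier", default="").strip())
    items = "".join(f"<li>{escape(li.text or '')}</li>" for li in root.findall("li"))
    return f"<h1>{title}</h1><ol>{items}</ol>"
```

Nothing in `render_ordered_list` refers to meetings, days, or countries, which is the single-investment benefit. The flip side is the question raised above: any content-specific behavior has to branch on the identifier value rather than on element names.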


Analysis of all Approaches

• Regardless of which approach is used, the meaning of the list and its values must be clearly documented. It may be challenging to achieve consensus on meaning:

– The same terminology may be used by different people to mean different things. For example, one person expects to see Puerto Rico in a country list, whereas another person does not. This is because the first person defines "country" to include territories and protectorates, whereas the second defines "country" only as principal sovereignties.

– Further, some people use different terminology to mean the same thing. For example, one person calls it "country"; another calls it "principality."

• With all approaches the issue arises of which terminology and definitions to adopt.


Recommendation

• Each of the four approaches has pros and cons so, as always, be sure to understand the alternatives and decide which is best for your situation.


genericode

• genericode is a standardized generic list vocabulary [1]. That is, it is an example of approach #4.

• Here's the idea behind the design of genericode's vocabulary:

– Oftentimes when creating a list there are multiple ways to express each value in the list. For example, in a list of countries we may express the first value as Afghanistan or AF. genericode permits each value to be expressed in multiple ways. Thus, the list is expressed in terms of rows and columns: each row represents one list value, and each column in that row holds one way of expressing it.

[1] http://docs.oasis-open.org/codelist/cd-genericode-1.0/doc/oasis-code-list-representation-genericode.pdf


<gc:CodeList xmlns:gc="http://docs.oasis-open.org/codelist/ns/genericode/1.0/">
    <Identification>uniquely identify this list here</Identification>
    <SimpleCodeList>
        <Row>
            <Value>
                <SimpleValue>AF</SimpleValue>
            </Value>
            <Value>
                <SimpleValue>AFGHANISTAN</SimpleValue>
            </Value>
        </Row>
        <Row>
            <Value>
                <SimpleValue>AL</SimpleValue>
            </Value>
            <Value>
                <SimpleValue>ALBANIA</SimpleValue>
            </Value>
        </Row>
        ...
    </SimpleCodeList>
</gc:CodeList>
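The row-and-column idea can be made concrete with a short Python sketch that reads the sample above into tuples. It is an illustration only: the function name `read_rows` is hypothetical, the sample holds just the two rows shown, and treating the first column as the code and the second as the name is an assumption based on the order in the sample.

```python
import xml.etree.ElementTree as ET

# The sample above, limited to the two rows shown in this document.
GENERICODE_XML = """<gc:CodeList xmlns:gc="http://docs.oasis-open.org/codelist/ns/genericode/1.0/">
  <Identification>uniquely identify this list here</Identification>
  <SimpleCodeList>
    <Row>
      <Value><SimpleValue>AF</SimpleValue></Value>
      <Value><SimpleValue>AFGHANISTAN</SimpleValue></Value>
    </Row>
    <Row>
      <Value><SimpleValue>AL</SimpleValue></Value>
      <Value><SimpleValue>ALBANIA</SimpleValue></Value>
    </Row>
  </SimpleCodeList>
</gc:CodeList>"""


def read_rows(xml_text):
    """Return one tuple per <Row>, holding the <SimpleValue> of each <Value>."""
    root = ET.fromstring(xml_text)
    return [
        tuple(value.findtext("SimpleValue") for value in row.findall("Value"))
        for row in root.iter("Row")
    ]


# Assumption: column 0 is the short code and column 1 is the full name.
CODE_TO_NAME = dict(read_rows(GENERICODE_XML))
```

With the rows in hand, either column can serve as the lookup key, which is the practical payoff of expressing each value in more than one way.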


Acknowledgements

• Thanks to the following people for contributing to this document:

– Roger Costello

– Bruce Cox

– Ken Holman

– Rick Jelliffe

– Michael Kay

– Rob Simmons