Validating XML Data with an XML Schema

Date: May 2007
Version: DRAFT 0.2


Contents

1. XML Validation Concepts
   a. Concepts
   b. Errors
   c. Resources

2. Example: Validation with XMLSpy
   a. Downloading XMLSpy
   b. Creating a new XMLSpy Project
   c. Associate the homestead XML Schema with a folder
   d. Open the file in XMLSpy
   e. Add the active file to the folder
   f. Click the "Validate" button

3. Example: Manipulating Large XML Data Sets with Ant & Eclipse
   a. Tools for Records and Metadata vs. Tools for Data
   b. Apache Ant – DOS command line
   c. Eclipse – GUI interface
   d. V – The File Viewer – Viewing large files
   e. XML databases


Disclaimer

• The information and examples in this document are for demonstration purposes only.

• The information and examples are presented to help counties work with and validate XML datasets against Minnesota Revenue XML schemas.

• The Minnesota Department of Revenue does not endorse or support any products mentioned in this presentation. Supporting tools within each county is beyond the scope of the mission of the Property Tax Division.

• Your staff is responsible for ensuring that your tools match your business requirements.


XML Validation Concepts

[Diagram: an XML file and an XML Schema go into an XML validator, which reports any validation errors.]

If you have 1) a well-formed XML file and 2) a well-defined XML Schema, you can 3) use any standard XML validation program to check that the file is well-formed XML and contains all the required tags defined by the schema.

This is called validation.


XML Validation Concepts

• An XML file is a text file in which well-defined tags surround each data value.

• An XML Schema describes which tags are required and where they must appear in a particular file.

Tag example: <Zip_Code>55101</Zip_Code>

<xs:element name="Zip_Code">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

This fragment from an XML Schema defines a Zip_Code tag whose value must be exactly five digits.
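
To make the Zip_Code fragment concrete, here is a minimal Java sketch that checks a tag against it using the validation API built into the JDK (javax.xml.validation). The class name ZipCodeCheck, the method isValid, and the xs:schema wrapper around the fragment are assumptions added for illustration; they are not part of the homestead schema.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ZipCodeCheck {
    // The Zip_Code fragment from the slide, wrapped in an xs:schema root
    // (the wrapper is an assumption so the fragment can stand alone).
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"Zip_Code\"><xs:simpleType>"
      + "<xs:restriction base=\"xs:string\"><xs:pattern value=\"[0-9]{5}\"/>"
      + "</xs:restriction></xs:simpleType></xs:element>"
      + "</xs:schema>";

    /** Returns true when the XML text is well-formed and valid against XSD. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator()
                .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false; // not well-formed, or rejected by the schema
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<Zip_Code>55101</Zip_Code>")); // true
        System.out.println(isValid("<Zip_Code>5510A</Zip_Code>")); // false
    }
}
```

The second value fails because an xs:pattern must match the whole value, so anything other than exactly five digits is rejected.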


XML Validation Errors

If you have:

1) An XML file that is not well-formed: you get an invalid XML, malformed XML, or content error. Examples are missing tag brackets or other syntax errors.

2) A well-formed XML file with tag errors: you get a list of the tags that are inconsistent with the specific XML Schema being validated against.

[Diagram: an XML file and an XML Schema go into an XML validator, which reports any validation errors.]

Tag example: <Zip_Code>55101</Zip_Code>
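
The two kinds of error described above can be told apart in code by checking well-formedness first and schema validity second. The following is a rough sketch using only JDK classes; the class name ErrorKinds, the three returned labels, and the ZIP_XSD test schema are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ErrorKinds {
    // Hypothetical one-element schema used only for this illustration.
    static final String ZIP_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"Zip_Code\"><xs:simpleType>"
      + "<xs:restriction base=\"xs:string\"><xs:pattern value=\"[0-9]{5}\"/>"
      + "</xs:restriction></xs:simpleType></xs:element></xs:schema>";

    /** Returns "malformed", "schema error", or "valid". */
    static String classify(String xml, String xsd) {
        // Step 1: plain parse with no schema; a failure means a syntax error.
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
        } catch (SAXException e) {
            return "malformed";
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // Step 2: the file is well-formed, so any failure now is a schema error.
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)))
                .newValidator()
                .validate(new StreamSource(new StringReader(xml)));
            return "valid";
        } catch (SAXException e) {
            return "schema error";
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("<Zip_Code>55101", ZIP_XSD));            // missing end tag
        System.out.println(classify("<Zip_Code>ABCDE</Zip_Code>", ZIP_XSD)); // wrong content
        System.out.println(classify("<Zip_Code>55101</Zip_Code>", ZIP_XSD));
    }
}
```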


XML Validation Errors for XML Escape Characters

Five characters have special meaning in XML syntax. The characters < and & can never be used directly in a data value, and the other three must be escaped in some contexts, so it is safest to "escape" all five by replacing each character with its ampersand representation.

Character   Escape    Name
"           &quot;    double quote
'           &apos;    single quote or apostrophe
&           &amp;     ampersand
>           &gt;      greater than
<           &lt;      less than
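
As a small illustration of the five escapes, the sketch below replaces each special character with its ampersand representation before a value is written into a tag. The class name XmlEscape and the Owner tag are hypothetical; most XML libraries do this automatically when they write a document.

```java
class XmlEscape {
    /** Replaces the five XML special characters with their escape sequences. */
    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <Owner>Smith &amp; Sons</Owner>
        System.out.println("<Owner>" + escape("Smith & Sons") + "</Owner>");
    }
}
```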


10 Common XML Transmission Errors

1. Malformed XML
2. Missing namespace declarations
3. Invalid document structure
4. Missing required element
5. Missing data in element
6. Invalid document type code values
7. Invalid property type code value
8. Invalid character values
9. Incorrect number of repeating fields
10. Incorrect tax year

For more information about XML errors, please also refer to the document "XML and XML Errors".


XML & Validation Resources

• W3C XML Standards Page – http://www.w3.org/XML/

• OASIS XML Cover Pages – http://xml.coverpages.org/xml.html#xmlValResources (lots of references)

• XML.com – http://www.xml.com (up-to-date XML information)

• XML.com Schema Tools – http://www.xml.com/pub/a/2000/12/13/schematools.html (older list of schema tools)

• XMLSpy – http://www.Altova.com (free 30-day evaluation of XML tools and validation)

• XMLStar – http://xmlstar.sourceforge.net (free tools and validation)


Example:

Validating a Homestead File with XMLSpy


Validating with XMLSpy Steps

1. Download XMLSpy (30-day free evaluation) and the homestead zip file
2. Create a new XMLSpy Project
3. Associate the homestead XML Schema with a folder
4. Open the file in XMLSpy
5. Add the active file to the folder
6. Click the "Validate" button


Download XMLSpy

• http://www.altova.com/products/xmlspy/xml_editor.html

Altova will e-mail you a 30-day license key


Download Homestead Files

• http://taxes.state.mn.us/taxes/property_tax_administrators/other_supporting_content/homestead-ptr-2009a.zip

Start XMLSpy

• Double-click the XMLSpy icon

• Create a New Project


New Project Window

• Note: if the project window is not visible, use the Window/Project menu to show it


Set the Properties of the XML Folder

• Right-click over the XML files folder in the project view

• NOTE: RIGHT-CLICK, not left-click


Folder Properties

• Click the "Validate with:" check box


Browse… to the homestead schema

• Click OK and then double-click the XML data file to be validated


Add this file to your project

• RIGHT-click and select "Add Active File"


• Click the green check


View Results in Validation View

• If your file is valid a green check will appear in the validation view

• Error messages will appear in this same window


File Size Limitations

• XMLSpy tends to have problems validating files over about 25MB on a system with 1GB of RAM

• Use Apache Ant and/or Eclipse if you want to validate larger files


Example:

Manipulating Large XML Data Sets with Ant & Eclipse

Tips for XML Files Above 25MB


Agenda

• Tools for Records and Metadata vs. Tools for Data

• Apache Ant
  – DOS command line

• Eclipse
  – GUI interface

• V – The File Viewer
  – Viewing large files

• XML databases


Records vs. Databases

• XML file viewers (like XMLSpy) are ideal for viewing single records and metadata (XML Schemas)

• Visual editing tools tend to stop working when file sizes exceed about 25MB (given 2GB of RAM). For comparison, we don't use MS-Word to edit 100,000 records in a database.

• Other tools are more appropriate for debugging large data sets


In Memory vs. Streaming

• There are several different approaches to checking large files
  – Load the entire file into memory (DOM)
  – Stream the file through memory (SAX)
  – Page only relevant sections into memory (Chunking – used in V-The-File-Viewer)
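
To illustrate the streaming (SAX) approach, here is a minimal sketch that counts how often a tag occurs without ever holding the whole document in memory: the parser calls startElement once per opening tag and the handler keeps only a running total. The class name StreamingCount and the sample record are assumptions for the example; HomesteadParcel is borrowed from the transform slide later in this deck.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class StreamingCount {
    /** Streams the input once and counts start tags whose name equals elementName. */
    static long count(InputSource source, String elementName) {
        final long[] total = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals(elementName)) {
                    total[0]++; // only the counter is kept; the element itself is discarded
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return total[0];
    }

    /** Convenience wrapper for small in-memory samples. */
    static long countInString(String xml, String elementName) {
        return count(new InputSource(new StringReader(xml)), elementName);
    }

    public static void main(String[] args) {
        if (args.length == 2) {
            // args[0]: path to a (possibly very large) XML file, args[1]: tag name
            System.out.println(count(new InputSource(new File(args[0]).toURI().toString()), args[1]));
        } else {
            String sample = "<HomesteadParcels><HomesteadParcel/><HomesteadParcel/></HomesteadParcels>";
            System.out.println(countInString(sample, "HomesteadParcel")); // prints 2
        }
    }
}
```

A DOM-based version would first build a tree of the entire file, which is why memory use grows with file size in that approach.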


Apache Ant

• Open source build manager

• The user gives Ant a high-level description of a task

• Ant executes the task using dependency analysis (for example, only validate after extract)

• Called from a shell (DOS or UNIX)

• Called from an Integrated Development Environment (IDE)

See Wikipedia "Apache Ant"

Download link: http://www.uniontransit.com/apache/ant/binaries/apache-ant-1.7.0-bin.zip


Download .zip file


Adding tools.jar

• Apache Ant needs one missing jar file called "tools.jar" that is free with Sun's Software Development Tools

• It is freely available from the Java download as part of the Java SDK 1.4+ (but not the JRE)

• A temporary copy is on the Java Open Source User Group (JOSUG) web site (www.josug.org/tools.jar)

• The file is about 6MB!

• It must be in your build "Classpath"


Apache Ant 1.7

• Many new features

• Simple <schemavalidate> task

• Faster execution

<schemavalidate
    noNamespaceFile="homestead-data_v0.28.xsd"
    file="my-homestead-data.xml">
</schemavalidate>

noNamespaceFile: path to your XML Schema
file: path to your XML data


Ant From DOS Command Line

1. Download Apache Ant version 1.7.0
2. Copy the build.xml into a directory
3. Change the file locations in the properties of the build file to match your local files
4. Run ant.bat (using the full path name) in the folder where the build file is located

build.xml

<?xml version="1.0" encoding="UTF-8"?>
<project default="validate-homestead">

  <!-- Change these to match your local system -->
  <property name="SrcDir" value="C:/homestead/stress-test"/>
  <property name="SchemaDir" value="C:/homestead/schemas"/>

  <target name="validate-homestead">
    <schemavalidate
        noNamespaceFile="${SchemaDir}/homestead-data_v0.28.xsd"
        file="${SrcDir}/100MB-test.xml">
    </schemavalidate>
  </target>

</project>


Apache Ant Tasks

• schemavalidate
  – New Ant 1.7 optional task just for XML Schema

• xmlvalidate
  – Very general Ant 1.6 task for validation of XML files
  – Checks for well-formed files
  – Checks for validation against an XML Schema

• xslt
  – Transforms XML files

• replace
  – Replaces specific text in large files


schemavalidate options

http://ant.apache.org/manual/OptionalTasks/xmlvalidate.html
http://ant.apache.org/manual/OptionalTasks/schemavalidate.html


Example <schemavalidate> task

• 100MB file validates in 10 seconds


Sample Ant 1.6 Validate Script

• This will validate only the 100MB-test.xml file

• Replace this file name with *.xml and all XML files in the source directory will be validated
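
The "all *.xml files in a directory" idea can also be sketched outside Ant. The Java below is not the Ant script from the slide; it is a rough stand-alone equivalent that uses the JDK validator, and the class name ValidateFolder, the method validateAll, and the "OK" status string are assumptions for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ValidateFolder {
    /** Validates every *.xml file in dataDir; maps file name to "OK" or the first error message. */
    static Map<String, String> validateAll(File dataDir, File schemaFile) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(schemaFile);
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema could not be loaded: " + schemaFile, e);
        }
        Map<String, String> results = new TreeMap<>();
        File[] files = dataDir.listFiles((dir, name) -> name.toLowerCase().endsWith(".xml"));
        if (files == null) {
            return results; // dataDir is not a readable directory
        }
        for (File f : files) {
            try {
                schema.newValidator().validate(new StreamSource(f));
                results.put(f.getName(), "OK");
            } catch (SAXException | IOException e) {
                results.put(f.getName(), String.valueOf(e.getMessage()));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: java ValidateFolder <data-directory> <schema.xsd>");
            return;
        }
        validateAll(new File(args[0]), new File(args[1]))
            .forEach((name, status) -> System.out.println(name + ": " + status));
    }
}
```

One Schema object is compiled once and reused for every file, which mirrors how the build file names the schema once and applies it to each data file.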


Eclipse

• Open source integrated development environment (IDE) originally sponsored by IBM

• "GUI" front end to Apache Ant

See http://www.eclipse.org/


Sample Ant Classpath


Complete Ant 1.7 Build File

<?xml version="1.0" encoding="UTF-8"?>
<project default="validate-homestead">

  <property name="DataDir" value="C:/homestead/data-files"/>
  <property name="SchemaDir" value="C:/homestead/schemas"/>

  <target name="validate-homestead">
    <schemavalidate
        noNamespaceFile="${SchemaDir}/homestead-data_v0.28.xsd"
        file="${DataDir}/my-data-file.xml">
    </schemavalidate>
  </target>

</project>

Properties can be set once in the file and referenced many times. This makes your build files easier to maintain.


GUI "Point and Click" UI

• Sample "point and click" GUI interface

• Alt+Shift+X, Q to run a task


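XML Transform

• View the homestead record for a specific parcel ID

[Diagram: a big file (gigabytes) passes through an XML transform with matching rules; records that match go to a very small output file, and records that do not match are left out.]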


Sample XML Transform

<?xml version="1.0" encoding="UTF-8"?>
<!-- exclude-result-prefixes is an attribute of xsl:stylesheet, not of xsl:output -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mn="http://data.state.mn.us"
    xmlns:c="http://niem.gov/niem/common/1.0"
    xmlns:u="http://niem.gov/niem/universal/1.0"
    xmlns:mnr="http://revenue.state.mn.us"
    xmlns:mnr-ptx="http://propertytax.state.mn.us"
    exclude-result-prefixes="mn mnr c u mnr-ptx">

  <xsl:output indent="yes"/>

  <!-- only display the homestead record for this parcel ID -->
  <xsl:template
      match="/HomesteadRecordsDocument/CountyHomesteadRecord/HomesteadParcels/HomesteadParcel/CountyPropertyTaxStatement[mn:ParcelID='1234567']">
    <!-- copy the CountyHomesteadRecord that matched this parcel ID to the output -->
    <xsl:copy-of select="../../.."/>
  </xsl:template>

  <!-- do not output anything else -->
  <xsl:template match="@*|node()">
    <xsl:apply-templates select="@*|node()"/>
  </xsl:template>

</xsl:stylesheet>
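
A transform like the one above can be run from Java as well as from Ant. The sketch below uses the XSLT processor built into the JDK, which implements XSLT 1.0, so the stylesheet here is a simplified 1.0 version of the same pattern. The class name ParcelFilter and the element names Records, Record and Owner are hypothetical stand-ins, not the homestead schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ParcelFilter {
    // Same idea as the slide: copy the one matching record, output nothing else.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/Records/Record[ParcelID='1234567']\">"
      + "<xsl:copy-of select=\".\"/></xsl:template>"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:apply-templates select=\"@*|node()\"/></xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical two-record sample input.
    static final String SAMPLE =
        "<Records>"
      + "<Record><ParcelID>1234567</ParcelID><Owner>A</Owner></Record>"
      + "<Record><ParcelID>7654321</ParcelID><Owner>B</Owner></Record>"
      + "</Records>";

    /** Applies the stylesheet text to the XML text and returns the result as a string. */
    static String transform(String stylesheet, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSLT, SAMPLE)); // only the record for parcel 1234567
    }
}
```

Note that the JDK processor builds the whole input document in memory before applying the templates, so very large inputs may need a bigger Java heap or a different processor.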


V-The File Viewer

• $20 application (less in quantity)

• Easily allows viewing of files greater than 1GB (uses file "chunking" technology)

• Opens multi-gigabyte files in a few seconds

• Note: read-only tool

See http://www.fileviewer.com/


Use Goto Function

• Goto is (Ctrl-G)

or


XML Databases

• XML databases store XML in its native format

• You can associate a column in your database or a "collection" with the homestead XML Schema

• This allows you to have the database itself validate data before transmission to the state


Example of XML Databases

• IBM DB2 version 9 "PureXML"
  – Free and low-cost "express" versions for development and testing

• eXist (open source)
  – Native XML database with XML Schema validation

• Over 50 other free and low-cost solutions with 30, 60 or 90 day evaluation periods

See http://www.rpbourret.com/xml/XMLDatabaseProds.htm


DB2

• IBM DB2 version 9 supports fast searches on complex XML data sets

• Load records into the XML datatype

• Records are quickly validated using an XML Schema

• Searching is very fast


eXist

• Open source

• Built-in web administration

• Easy to set up and configure

• Allows data to be validated on insert

• Fast searches

• Every XQuery IS a REST web service


Microsoft SQL Server 2005

• Supports native XML datatype

• Supports fast indexing

• Add SOAP services to XML documents

• Support for XQuery and XQuery updates


Ant Book

• Covers Ant 1.7


Questions?