XML Tools Presentation by Jeiel & Joost 21-3-2007

Dec 29, 2015

Page 1: XML Tools Presentation by Jeiel & Joost 21-3-2007.

XML Tools

Presentation by

Jeiel & Joost

21-3-2007

Page 2: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Intro

• Introduction

• HaXml

• HXT

• Summary

Page 3: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Why Functional?

• Transforming an XML document: applying a transformation function to an XML argument and getting an XML result.

• XSLT is based on functional languages.

Page 4: XML Tools Presentation by Jeiel & Joost 21-3-2007.

HaXml features

• Combinators

• DTD2Haskell

• Haskell2Xml/Xml2Haskell (made easy with DrIFT)

• Xtract

• Parsing, validating, special tools for HTML.

Page 5: XML Tools Presentation by Jeiel & Joost 21-3-2007.

HaXml is not widely used

• Installation was a pain

• Little documentation and support available

• Google searches also not helpful

• Not very user-friendly, e.g. no easy way to validate an XML document.

Page 6: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Example: Validate a document

x = do f <- readFile "zetels.xml"
       g <- readFile "properties.dtd"
       z <- return . getElement $ xmlParse "zetels" f
       d <- return . fromJust $ dtdParse "properties" g
       return (validate d z)

getElement :: Document -> Element
getElement (Document _ _ e _) = e

Note: the DTD had to be saved to disk first.

Result: ["Document type should be <extsubset> but appears to be <properties>."]

Page 7: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Example: Combinators

coalitie :: CFilter
coalitie = foldXml (cat [ tag "properties"
                        , txt
                        , attrval ("key", AttValue [Left "CDA"])
                        , attrval ("key", AttValue [Left "PvdA"])
                        , attrval ("key", AttValue [Left "ChristenUnie"])
                        ])

main = processXmlWith coalitie

Result: ["Document type should be <extsubset> but appears to be <properties>."]

Page 8: XML Tools Presentation by Jeiel & Joost 21-3-2007.

DTD2Haskell

example = Properties
  (Properties_Attrs {propertiesVersion = Default ""})
  (Just (Comment "Tweede Kamer Zetel Verdeling 2003"))
  [ Entry (Entry_Attrs {entryKey = "CDA"}) "44"
  , Entry (Entry_Attrs {entryKey = "PvdA"}) "42"
  , Entry (Entry_Attrs {entryKey = "VVD"}) "28"
  ]

Based on Java Properties DTD: http://java.sun.com/dtd/properties.dtd

Page 9: XML Tools Presentation by Jeiel & Joost 21-3-2007.

showXML example

<?xml version='1.0' ?>
<properties>
  <comment>Tweede Kamer Zetel Verdeling 2003
  </comment>
  <entry key="CDA">44</entry>
  <entry key="PvdA">42</entry>
  <entry key="VVD">28</entry>
</properties>

Page 10: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Haskell2Xml

• Make your datatype an instance of Haskell2Xml

• Use the DrIFT tool to derive this class automatically.

class Haskell2Xml a where
  toHType      :: a -> HType
  toContents   :: a -> [Content]
  fromContents :: [Content] -> (a, [Content])

Page 11: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Haskell2Xml

• For free:

toXml     :: Haskell2Xml a => a -> Document
fromXml   :: Haskell2Xml a => Document -> a
readXml   :: Haskell2Xml a => String -> Maybe a
showXml   :: Haskell2Xml a => a -> String
fReadXml  :: Haskell2Xml a => FilePath -> IO a
fWriteXml :: Haskell2Xml a => FilePath -> a -> IO ()
hGetXml   :: Haskell2Xml a => Handle -> IO a
hPutXml   :: Haskell2Xml a => Handle -> a -> IO ()

Page 12: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Haskell2Xml example

• showXml (Just True)

<!DOCTYPE maybe-bool [
  <!ATTLIST bool value CDATA #REQUIRED>
  <!ELEMENT bool EMPTY>
  <!ELEMENT maybe-bool bool?>
]>
<maybe-bool><bool value="True"/></maybe-bool>

Page 13: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Xml2Haskell

• Works the same way as Haskell2Xml

• Make your datatype an instance of XmlContent instead of Haskell2Xml

• Use DTD2Haskell to generate the XmlContent instances

• Xml2Haskell and Haskell2Xml both export showXml, readXml, etc., so watch out for name clashes

Page 14: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Haskell XML Toolkit (HXT)

Features:

• Support for different character sets (Unicode and UTF-8, US-ASCII and ISO-Latin-1)

• Well-formed document parsing, validation, construction

• Namespace support: namespace propagation and checking

• XPath support for selection of document parts

• Liberal HTML parser for interpreting any text containing < ... > as HTML/XML

• Schema validator

• Integrated XSLT transformer

Page 15: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Datatypes

data NTree a = NTree a [NTree a]   -- already defined

data XNode = XText String
           | ...
           | XTag QName XmlTrees
           | XAttr QName
           | ...

data QName = QN { namePrefix   :: String
                , localPart    :: String
                , namespaceUri :: String
                }

type XmlTree  = NTree XNode
type XmlTrees = [XmlTree]

Page 16: XML Tools Presentation by Jeiel & Joost 21-3-2007.

More types - filters

type XmlFilter = XmlTree -> [XmlTree]

or more general:

type Filter a b = a -> [b]

but for now: XmlFilter

Page 17: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Simple predicate functions

isA :: (a -> Bool) -> a -> [a]
isA p x
  | p x       = [x]
  | otherwise = []

isText :: XmlFilter
isText t@(NTree (XText _) _) = [t]
isText _                     = []
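
The two predicates above can be tried without installing HXT. The following is a minimal sketch, assuming a simplified tree in which element names are plain strings and attributes are left out; the sample nodes `hello` and `para` are made up for illustration.

```haskell
-- Toy stand-ins for the HXT types (assumption: simplified, no QName, no attributes).
data NTree a = NTree a [NTree a] deriving (Eq, Show)

data XNode = XText String
           | XTag String
           deriving (Eq, Show)

type XmlTree   = NTree XNode
type XmlFilter = XmlTree -> [XmlTree]

-- Lift a predicate to a filter: one result on success, none on failure.
isA :: (a -> Bool) -> a -> [a]
isA p x
  | p x       = [x]
  | otherwise = []

-- Succeeds only on text nodes.
isText :: XmlFilter
isText t@(NTree (XText _) _) = [t]
isText _                     = []

-- Hypothetical sample nodes.
hello :: XmlTree
hello = NTree (XText "hello") []

para :: XmlTree
para = NTree (XTag "p") [hello]
```

Here `isText hello` returns the node itself in a singleton list, while `isText para` returns the empty list: "true" and "false" are encoded as non-empty and empty results.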

Page 18: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Another example

getChildren :: XmlFilter

getChildren (NTree n cs) = cs

getGrandChildren :: XmlFilter

getGrandChildren (NTree n cs) = concat [ getChildren c | c <- cs ]

Seems a bit overdone...

Page 19: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Combining filters

(>>>) :: XmlFilter -> XmlFilter -> XmlFilter
(f >>> g) t = concat [ g t' | t' <- f t ]

So the definition of grandchildren becomes:

getGrandChildren :: XmlFilter
getGrandChildren = getChildren >>> getChildren

Or selecting only text children:

getTextChildren :: XmlFilter
getTextChildren = getChildren >>> isText
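
A small runnable sketch of this composition, again on a simplified toy tree rather than the real HXT types. The sample document and the fixity declaration (copied from `Control.Arrow`) are assumptions made for the example.

```haskell
-- Toy stand-ins for the HXT types (assumption: simplified, no attributes).
data NTree a = NTree a [NTree a] deriving (Eq, Show)
data XNode   = XText String | XTag String deriving (Eq, Show)

type XmlTree   = NTree XNode
type XmlFilter = XmlTree -> [XmlTree]

-- Sequential composition: feed every result of f into g.
infixr 1 >>>
(>>>) :: XmlFilter -> XmlFilter -> XmlFilter
(f >>> g) t = concat [ g t' | t' <- f t ]

getChildren :: XmlFilter
getChildren (NTree _ cs) = cs

isText :: XmlFilter
isText t@(NTree (XText _) _) = [t]
isText _                     = []

getGrandChildren :: XmlFilter
getGrandChildren = getChildren >>> getChildren

getTextChildren :: XmlFilter
getTextChildren = getChildren >>> isText

-- Hypothetical sample: <html><body>hi<b>there</b></body></html>
hi, there, bold, body, doc :: XmlTree
hi    = NTree (XText "hi") []
there = NTree (XText "there") []
bold  = NTree (XTag "b") [there]
body  = NTree (XTag "body") [hi, bold]
doc   = NTree (XTag "html") [body]
```

With these definitions `getGrandChildren doc` yields the two children of `body`, and `getTextChildren body` keeps only the text node `hi`.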

Page 20: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Logical and/or

• The >>> is a logical and, using predicates as XmlFilters

• Where is the logical or?

Answer:

(<+>) :: XmlFilter -> XmlFilter -> XmlFilter
(f <+> g) t = f t ++ g t
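
To see the "or" in action, here is a sketch on the same kind of toy tree. The helper `isTagNamed` and the sample nodes are hypothetical; the fixities are borrowed from `Control.Arrow`.

```haskell
-- Toy stand-ins for the HXT types (assumption: simplified, no attributes).
data NTree a = NTree a [NTree a] deriving (Eq, Show)
data XNode   = XText String | XTag String deriving (Eq, Show)

type XmlTree   = NTree XNode
type XmlFilter = XmlTree -> [XmlTree]

infixr 1 >>>
(>>>) :: XmlFilter -> XmlFilter -> XmlFilter
(f >>> g) t = concat [ g t' | t' <- f t ]

-- Union of results: everything f finds, followed by everything g finds.
infixr 5 <+>
(<+>) :: XmlFilter -> XmlFilter -> XmlFilter
(f <+> g) t = f t ++ g t

getChildren :: XmlFilter
getChildren (NTree _ cs) = cs

isText :: XmlFilter
isText t@(NTree (XText _) _) = [t]
isText _                     = []

-- Hypothetical helper: succeeds on elements with the given name.
isTagNamed :: String -> XmlFilter
isTagNamed n t@(NTree (XTag m) _) | n == m = [t]
isTagNamed _ _                             = []

-- Hypothetical sample: <body>hi<b/><i/></body>
hi, bold, italic, body :: XmlTree
hi     = NTree (XText "hi") []
bold   = NTree (XTag "b") []
italic = NTree (XTag "i") []
body   = NTree (XTag "body") [hi, bold, italic]
```

`(getChildren >>> (isText <+> isTagNamed "b")) body` keeps `hi` and `bold` but drops `italic`. Note that `<+>` simply concatenates, so a node matching both sides appears twice.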

Page 21: XML Tools Presentation by Jeiel & Joost 21-3-2007.

More combinators

orElse :: XmlFilter -> XmlFilter -> XmlFilter
orElse f g t
  | null (f t) = g t
  | otherwise  = f t

guards :: XmlFilter -> XmlFilter -> XmlFilter
guards g f t
  | null (g t) = []
  | otherwise  = f t

when :: XmlFilter -> XmlFilter -> XmlFilter
when f g t
  | null (g t) = [t]
  | otherwise  = f t
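
The three combinators differ only in what happens when the test fails. A sketch on a toy tree, where `isTagNamed`, `none` and the sample nodes are assumptions introduced for the example:

```haskell
-- Toy stand-ins for the HXT types (assumption: simplified, no attributes).
data NTree a = NTree a [NTree a] deriving (Eq, Show)
data XNode   = XText String | XTag String deriving (Eq, Show)

type XmlTree   = NTree XNode
type XmlFilter = XmlTree -> [XmlTree]

-- Try f; fall back to g only if f finds nothing.
orElse :: XmlFilter -> XmlFilter -> XmlFilter
orElse f g t
  | null (f t) = g t
  | otherwise  = f t

-- Run f only if the guard g succeeds; otherwise fail.
guards :: XmlFilter -> XmlFilter -> XmlFilter
guards g f t
  | null (g t) = []
  | otherwise  = f t

-- Run f only if g succeeds; otherwise pass the node through unchanged.
when :: XmlFilter -> XmlFilter -> XmlFilter
when f g t
  | null (g t) = [t]
  | otherwise  = f t

getChildren :: XmlFilter
getChildren (NTree _ cs) = cs

isText :: XmlFilter
isText t@(NTree (XText _) _) = [t]
isText _                     = []

-- Hypothetical helpers.
isTagNamed :: String -> XmlFilter
isTagNamed n t@(NTree (XTag m) _) | n == m = [t]
isTagNamed _ _                             = []

none :: XmlFilter          -- the filter that always fails
none = const []

-- Hypothetical sample: <p>hello</p>
hello, para :: XmlTree
hello = NTree (XText "hello") []
para  = NTree (XTag "p") [hello]
```

For example, `(none `when` isText)` deletes text nodes and leaves everything else alone, whereas `(isTagNamed "div" `guards` getChildren) para` fails outright because the guard does not hold.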

Page 22: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Tree traversal

deep :: XmlFilter -> XmlFilter
deep f = f `orElse` (getChildren >>> deep f)

multi :: XmlFilter -> XmlFilter
multi f = f <+> (getChildren >>> multi f)
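
The difference between the two traversals shows up on nested matches: `deep` stops descending once `f` succeeds, `multi` keeps going. A sketch on a toy tree (the nested `div` sample and `isTagNamed` are made up for illustration):

```haskell
-- Toy stand-ins for the HXT types (assumption: simplified, no attributes).
data NTree a = NTree a [NTree a] deriving (Eq, Show)
data XNode   = XText String | XTag String deriving (Eq, Show)

type XmlTree   = NTree XNode
type XmlFilter = XmlTree -> [XmlTree]

infixr 1 >>>
(>>>) :: XmlFilter -> XmlFilter -> XmlFilter
(f >>> g) t = concat [ g t' | t' <- f t ]

infixr 5 <+>
(<+>) :: XmlFilter -> XmlFilter -> XmlFilter
(f <+> g) t = f t ++ g t

orElse :: XmlFilter -> XmlFilter -> XmlFilter
orElse f g t
  | null (f t) = g t
  | otherwise  = f t

getChildren :: XmlFilter
getChildren (NTree _ cs) = cs

isText :: XmlFilter
isText t@(NTree (XText _) _) = [t]
isText _                     = []

-- Hypothetical helper: succeeds on elements with the given name.
isTagNamed :: String -> XmlFilter
isTagNamed n t@(NTree (XTag m) _) | n == m = [t]
isTagNamed _ _                             = []

-- Topmost matches only: do not look below a node where f succeeded.
deep :: XmlFilter -> XmlFilter
deep f = f `orElse` (getChildren >>> deep f)

-- All matches, including matches nested inside other matches.
multi :: XmlFilter -> XmlFilter
multi f = f <+> (getChildren >>> multi f)

-- Hypothetical sample: <div><div>inner</div>outer</div>
innerText, outerText, inner, outer :: XmlTree
innerText = NTree (XText "inner") []
outerText = NTree (XText "outer") []
inner     = NTree (XTag "div") [innerText]
outer     = NTree (XTag "div") [inner, outerText]
```

`deep (isTagNamed "div") outer` returns only the outer `div`, while `multi (isTagNamed "div") outer` returns both. For text nodes, which never nest, the two agree.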

Page 23: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Partial summary

• Combinators are powerful and elegant

• More elegant than writing every filter out as an explicit function

• But filters alone are not general enough

• What about side effects?

• A big difference between HaXml and HXT is... can you guess?

Page 24: XML Tools Presentation by Jeiel & Joost 21-3-2007.

HXT uses Arrows!!!

• >>> and arr are in Arrow

• <+> is in ArrowPlus

• Functions like isA and isText lift pure functions into the arrow

• List filters are in ArrowList

• Choice filters are in ArrowIf

• Tree filters are in ArrowTree

• XML-specific filters are in Text.XML.HXT.XmlArrow

Page 25: XML Tools Presentation by Jeiel & Joost 21-3-2007.

4 types

In HXT there are 4 list arrow types:

newtype LA a b      = LA    { runLA    :: a -> [b] }
newtype SLA s a b   = SLA   { runSLA   :: s -> a -> (s, [b]) }
newtype IOLA a b    = IOLA  { runIOLA  :: a -> IO [b] }
newtype IOSLA s a b = IOSLA { runIOSLA :: s -> a -> IO (s, [b]) }

All of which are instances of ArrowXml:

class (Arrow a, ArrowList a, ArrowTree a) => ArrowXml a where ...
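
To make the link between list filters and arrows concrete, here is a sketch of how the simplest of the four types, `LA`, can be given `Category` and `Arrow` instances using only the `base` library. The instances are written for illustration and are not copied from HXT; `divisors` and `isA` are hypothetical examples.

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))
import Control.Arrow (Arrow (..), (>>>))

-- A pure list arrow: a function returning zero or more results.
newtype LA a b = LA { runLA :: a -> [b] }

instance Category LA where
  id          = LA (\x -> [x])
  LA g . LA f = LA (\x -> concatMap g (f x))   -- same idea as the filter (>>>)

instance Arrow LA where
  arr f        = LA (\x -> [f x])              -- lift a pure function
  first (LA f) = LA (\(x, y) -> [ (x', y) | x' <- f x ])

-- Lift a predicate, as on the "simple predicate functions" slide.
isA :: (a -> Bool) -> LA a a
isA p = LA (\x -> [ x | p x ])

-- Hypothetical example arrow: all divisors of a number.
divisors :: LA Int Int
divisors = LA (\n -> [ d | d <- [1 .. n], n `mod` d == 0 ])
```

Because `>>>` now comes from `Control.Arrow`, the same operator composes `LA` values and would compose the state and IO variants too: `runLA (divisors >>> isA even) 6` yields `[2,6]`.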

Page 26: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Examples

Selecting all text nodes from an XML document:

selectAllText :: ArrowXml a => a XmlTree XmlTree

selectAllText = deep isText

Selecting all tags in an XML document with a certain name:

selectAllTags :: ArrowXml a => String -> a XmlTree XmlTree

selectAllTags s = deep (isElem >>> hasName s)

Page 27: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Text-based browser?

selectAllTextAndRealAltValues :: ArrowXml a => a XmlTree XmlTree
selectAllTextAndRealAltValues
  = deep
    ( isText
      <+>
      ( isElem >>> hasName "img"
        >>> getAttrValue "alt"
        >>> isA significant
        >>> arr addBrackets
        >>> mkText
      )
    )
  where
    significant :: String -> Bool
    significant = not . all (`elem` " \n\r\t")

    addBrackets :: String -> String
    addBrackets s = " [[ " ++ s ++ " ]] "

Page 28: XML Tools Presentation by Jeiel & Joost 21-3-2007.

All images in an HTML document

imageTable :: ArrowXml a => a XmlTree XmlTree
imageTable
  = selem "html"
      [ selem "head"
          [ selem "title" [ txt "Images in Page" ] ]
      , selem "body"
          [ selem "h1" [ txt "Images in Page" ]
          , selem "table" [ collectImages >>> genTableRows ]
          ]
      ]
  where
    collectImages = deep ( isElem >>> hasName "img" )
    genTableRows  = selem "tr" [ selem "td" [ getAttrValue "src" >>> mkText ] ]
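
How a constructor such as selem fits the filter model can be sketched with plain list functions. This is a toy model, not HXT's implementation: attributes are a simple association list, `getAttrValue` fails when the attribute is missing (a simplification), and the sample `page` is made up.

```haskell
-- Toy stand-ins for the HXT types (assumption: simplified names and attributes).
data NTree a = NTree a [NTree a] deriving (Eq, Show)
data XNode   = XText String
             | XTag String [(String, String)]
             deriving (Eq, Show)

type XmlTree    = NTree XNode
type Filter a b = a -> [b]
type XmlFilter  = Filter XmlTree XmlTree

infixr 1 >>>
(>>>) :: Filter a b -> Filter b c -> Filter a c
(f >>> g) x = concat [ g y | y <- f x ]

getChildren :: XmlFilter
getChildren (NTree _ cs) = cs

hasName :: String -> XmlFilter
hasName n t@(NTree (XTag m _) _) | n == m = [t]
hasName _ _                               = []

-- Simplification: no result when the attribute is absent.
getAttrValue :: String -> Filter XmlTree String
getAttrValue k (NTree (XTag _ as) _) = [ v | (k', v) <- as, k' == k ]
getAttrValue _ _                     = []

mkText :: Filter String XmlTree
mkText s = [NTree (XText s) []]

-- Build an element whose children are the results of the given filters.
selem :: String -> [Filter a XmlTree] -> Filter a XmlTree
selem name fs x = [NTree (XTag name []) (concat [ f x | f <- fs ])]

deep :: XmlFilter -> XmlFilter
deep f t = case f t of
  [] -> (getChildren >>> deep f) t
  rs -> rs

-- One table row per image found anywhere in the input.
imageTable :: XmlFilter
imageTable = selem "table" [ deep (hasName "img") >>> genTableRow ]
  where
    genTableRow = selem "tr" [ selem "td" [ getAttrValue "src" >>> mkText ] ]

-- Hypothetical sample: <body><img src="a.png"/><p><img src="b.png"/></p></body>
page :: XmlTree
page = NTree (XTag "body" [])
         [ NTree (XTag "img" [("src", "a.png")]) []
         , NTree (XTag "p" []) [ NTree (XTag "img" [("src", "b.png")]) [] ]
         ]
```

Since each child filter is run on the same input and the results are concatenated, `collectImages >>> genTableRows` inside the table produces one row per image, which is the point of the slide's example.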

Page 29: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Class ArrowXml a

• Contains a lot of XML-specific functions

• Some of these have implementations based on others for convenience:

– mkelem

– aelem

– selem

– eelem

• Extension class is ArrowDTD a, which contains DTD-specific functions.

Page 30: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Tools over XML

• As mentioned before, HXT contains an XPath module

– We’ll have a quick look at it in a minute.

• As a result of a master’s thesis, a basic XSLT processor has been implemented on top of HXT

– Pure Haskell

– 1800 lines of code (compare to XALAN’s 347000!!)

– The processor is not yet optimized

– Kind of hard to give a quick view.

Page 31: XML Tools Presentation by Jeiel & Joost 21-3-2007.

XPath

• XPath in C#:

XmlNodeList XmlElement.SelectNodes(string xpath);

• XPath is just another plain old XmlFilter:

getXPath                  :: String -> XmlFilter
getXPathWithNsEnv         :: NsEnv -> String -> XmlFilter
getXPathSubTrees          :: String -> XmlFilter
getXPathSubTreesWithNsEnv :: NsEnv -> String -> XmlFilter
getXPathNodeSet           :: String -> XmlTree -> XmlNodeSet
getXPathNodeSetWithNsEnv  :: NsEnv -> String -> XmlTree -> XmlNodeSet
evalExpr                  :: Env -> Context -> Expr -> XPathFilter

Page 32: XML Tools Presentation by Jeiel & Joost 21-3-2007.

Summary