Dr. Sabin Buraga http://www.purl.org/net/busaco
Semantic Web <?xml version=“1.0” ?><curs desc=“…” />
Semantic Web
Dr. Sabin-Corneliu Buraga
Faculty of Computer Science, "A.I. Cuza" University – Iasi, Romania
http://www.infoiasi.ro/~busaco/
Data markup for the <Web />
Native XML databases
The XQuery query language
Details in [TX, 153–177]
contents
Introduction
Storing XML data
Definitions
Querying XML data
The XQuery language
XML information management systems
intro
Strictly speaking, an XML document is a database (a collection of data)
a document-centric view
intro
To manage XML data, there must be support for:
storage (XML documents/trees)
schemas: DTD, XML Schema, RELAX NG etc.
query languages: XPath, XSLT, XQuery,...
processing via APIs (SAX, DOM,...)
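The two processing styles named above can be contrasted with a short Python sketch that uses only the standard library: `xml.dom.minidom` builds the whole tree in memory first (DOM), while `xml.sax` pushes parsing events to a handler (SAX). The sample document and the `name` element are hypothetical, chosen only for illustration.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document (element names are illustrative only).
SAMPLE = b"<students><student><name>Ana</name></student><student><name>Ion</name></student></students>"


def names_via_dom(data: bytes) -> list[str]:
    """DOM: the whole document is parsed into an in-memory tree, then navigated."""
    doc = minidom.parseString(data)
    return [el.firstChild.data for el in doc.getElementsByTagName("name")]


class NameHandler(xml.sax.ContentHandler):
    """SAX: the parser pushes events; the handler keeps only the state it needs."""

    def __init__(self) -> None:
        super().__init__()
        self.names: list[str] = []
        self._in_name = False
        self._buf: list[str] = []

    def startElement(self, name, attrs):
        if name == "name":
            self._in_name = True
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered.
        if self._in_name:
            self._buf.append(content)

    def endElement(self, name):
        if name == "name":
            self.names.append("".join(self._buf))
            self._in_name = False


def names_via_sax(data: bytes) -> list[str]:
    handler = NameHandler()
    xml.sax.parseString(data, handler)
    return handler.names


if __name__ == "__main__":
    print(names_via_dom(SAMPLE))
    print(names_via_sax(SAMPLE))
```

DOM is convenient for random access and restructuring; SAX keeps memory usage low for large documents, at the price of managing state by hand.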
intro
The following must also be provided:
efficient storage
indexing mechanisms
data security
transactions
data integrity
concurrent multi-user access
triggers
fast queries over multiple documents
...
storing XML data
Approaches:
data-centric documents
the document-centric view (document-centric documents)
storing XML data
Data-centric documents: XML is used to transport data
XML documents designed to be processed efficiently by computers: regular structure, fine-grained representation of the data
Examples: flight schedules, scientific data, payment orders
Structured nature of the data → storage in relational databases + data interchange
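Because data-centric documents are so regular, they map naturally onto relational tables. A minimal Python sketch of this mapping ("shredding"), in which each `<flight>` element becomes one row of an SQLite table; the `flights` document, its element names and the table layout are hypothetical assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical data-centric document: regular structure, one record per <flight>.
FLIGHTS_XML = """
<flights>
  <flight code="RO301"><from>IAS</from><to>OTP</to><departs>07:10</departs></flight>
  <flight code="RO305"><from>IAS</from><to>OTP</to><departs>18:40</departs></flight>
</flights>
"""


def shred(xml_text: str, conn: sqlite3.Connection) -> None:
    """Map each <flight> element to one row of a relational table."""
    conn.execute(
        "CREATE TABLE flight (code TEXT PRIMARY KEY, origin TEXT, dest TEXT, departs TEXT)"
    )
    root = ET.fromstring(xml_text.strip())
    rows = [
        (f.get("code"), f.findtext("from"), f.findtext("to"), f.findtext("departs"))
        for f in root.findall("flight")
    ]
    conn.executemany("INSERT INTO flight VALUES (?, ?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
shred(FLIGHTS_XML, conn)
for row in conn.execute("SELECT code, departs FROM flight ORDER BY departs"):
    print(row)
```

The mapping works only because every `<flight>` has the same children; irregular, document-centric content does not fit such a fixed table layout.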
storing XML data
Document-centric view: XML is used for data intended for humans
Examples: e-books, email messages, XHTML documents
Semi-structured nature of the data (the internal structure is not as regular)
The order in which elements on the same level (siblings) appear is usually significant
The documents are written by hand or converted from other formats, without being stored in classic databases
storing XML data
In practice, the two categories cannot always be clearly distinguished
storing XML data
The data are stored in relational, object-oriented or hierarchical databases
The data can be accessed in XML format via XML-enabled systems:
DB schema ↔ XML schema
adopting an object-relational model
XML binding
The documents (not the data) are stored in native XML databases or in content management systems (CMSs)
The RHOX model (Relational, Hypertext, Object, XML)
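XML binding maps elements to programming-language objects and back. The Python sketch below performs the mapping by hand with a dataclass; binding frameworks (for instance JAXB in Java) generate such mappings from a schema instead. The `Student` class, its fields and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical binding: each field corresponds to one child element.
    name: str
    year: int
    project: str


def unmarshal(elem: ET.Element) -> Student:
    """XML -> object: read the child elements and convert their types."""
    return Student(
        name=elem.findtext("name"),
        year=int(elem.findtext("year")),
        project=elem.findtext("project"),
    )


def marshal(s: Student) -> ET.Element:
    """Object -> XML: the inverse mapping."""
    elem = ET.Element("student")
    for tag, value in (("name", s.name), ("year", str(s.year)), ("project", s.project)):
        ET.SubElement(elem, tag).text = value
    return elem


src = ET.fromstring("<student><name>Ana</name><year>2</year><project>P1</project></student>")
obj = unmarshal(src)
print(obj)
print(ET.tostring(marshal(obj), encoding="unicode"))
```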
“definitions”
Native XML databases ≡ databases designed specifically to store XML documents
the fundamental (logical) storage unit is the XML document (not the record)
they are not tied to any particular physical storage model
“definitions”
Native XML databases
the internal architecture can be text-based (file, BLOB/CLOB field) or model-based (DOM tree)
used for storing document-centric information and for managing semi-structured data
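To make the text-based variant concrete, here is a Python sketch that keeps each document verbatim in an SQLite TEXT column (standing in for a file or a BLOB/CLOB field) and builds a tree only when one is requested. The `docs` table and the helper names are hypothetical; a model-based store would persist the node tree itself rather than re-parsing text.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Text-based storage sketch: each document is kept as-is in a TEXT column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (name TEXT PRIMARY KEY, body TEXT NOT NULL)")


def store(name: str, xml_text: str) -> None:
    ET.fromstring(xml_text)  # raises ET.ParseError if the text is not well-formed
    conn.execute("INSERT INTO docs VALUES (?, ?)", (name, xml_text))


def load_text(name: str) -> str:
    (body,) = conn.execute("SELECT body FROM docs WHERE name = ?", (name,)).fetchone()
    return body


def load_tree(name: str) -> ET.Element:
    """Tree view on demand, rebuilt from the stored text."""
    return ET.fromstring(load_text(name))


original = '<note lang="en"><!-- draft --><to>Ana</to></note>'
store("note1", original)
print(load_text("note1") == original)  # comments and layout survive unchanged
print(load_tree("note1").findtext("to"))
```

The text-based approach returns documents exactly as stored, which is convenient for document-centric content; queries, however, require parsing or separate indexes.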
important aspects
Normalization: functional dependencies; redundancy & anomalies caused by updates to XML data
Referential integrity: ID, IDREF, key & keyref (XML Schema), XLink
Scalability: operations should complete in a reasonable time even for large XML documents
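As an illustration of referential integrity within a document, the Python sketch below checks ID/IDREF-style links: every `ref` value must match some `id`, and ids must be unique. The attribute names are assumptions; `ElementTree` does not read DTD attribute types, so the check relies on a naming convention rather than on declared `ID`/`IDREF` types.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: projects carry an id, students point to one via @ref.
DOC = """<univ>
  <project id="p1"/><project id="p2"/>
  <student ref="p1"/><student ref="p3"/>
</univ>"""


def dangling_refs(root: ET.Element) -> list[str]:
    """Return reference values that match no declared id (a broken IDREF/keyref)."""
    ids = [e.get("id") for e in root.iter() if e.get("id") is not None]
    duplicates = {i for i in ids if ids.count(i) > 1}
    if duplicates:
        raise ValueError(f"duplicate ids: {sorted(duplicates)}")
    known = set(ids)
    return [
        e.get("ref")
        for e in root.iter()
        if e.get("ref") is not None and e.get("ref") not in known
    ]


print(dangling_refs(ET.fromstring(DOC)))  # ['p3']
```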
facilities
Use of document collections: collection ≡ table (in the relational model) or directory (in the file system)
Use of query languages
Support for transactions, locking, concurrency
Use of APIs
facilities
Access optimization and round-tripping (a stored document can be retrieved as it was stored)
Remote data access
Indexing support (value, structural and full-text indexes)
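A value index can be sketched as a map from element values to the nodes that contain them, so that a lookup replaces a scan over the whole document. This Python version (a hypothetical `build_value_index` helper over sample data) indexes the text of one child element; XML databases typically persist such indexes and combine them with structural and full-text ones.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = """<students>
  <student><name>Ana</name><year>2</year></student>
  <student><name>Ion</name><year>3</year></student>
  <student><name>Maria</name><year>2</year></student>
</students>"""


def build_value_index(root: ET.Element, tag: str) -> dict[str, list[ET.Element]]:
    """Map the text of every <tag> child to its parent elements, in document order."""
    index: dict[str, list[ET.Element]] = defaultdict(list)
    for parent in root.iter():
        for child in parent:
            if child.tag == tag and child.text is not None:
                index[child.text.strip()].append(parent)
    return index


root = ET.fromstring(DOC)
by_year = build_value_index(root, "year")
# One dictionary lookup instead of scanning //student[year = 2]
print([s.findtext("name") for s in by_year["2"]])  # ['Ana', 'Maria']
```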
RDBMS vs. XMLDBS
Relational databases | Native XML databases
A relational database contains tables | An XML database contains collections
A relational database contains records that share the same schema | A collection contains XML documents with identical or different schemas
A record is an unordered list of values identified by name and having predefined types | An XML document is a tree of nodes, which may include semi-structured data
A query returns an unordered set of records | A query returns an ordered sequence of nodes
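The last row of the comparison can be observed directly in Python: an `ElementTree` path query returns matching elements in document order, whereas the relational copy reproduces that order only if the position is stored as a column and requested with `ORDER BY`. The sample data and table layout are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = "<chapters><chapter>Intro</chapter><chapter>Storage</chapter><chapter>XQuery</chapter></chapters>"

# XML side: the path query returns nodes in document order.
xml_result = [c.text for c in ET.fromstring(DOC).findall("chapter")]

# Relational side: rows form an unordered set; order is guaranteed only with
# ORDER BY, so the original position has to be stored explicitly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chapter (pos INTEGER, title TEXT)")
conn.executemany("INSERT INTO chapter VALUES (?, ?)", enumerate(xml_result))
sql_result = [t for (t,) in conn.execute("SELECT title FROM chapter ORDER BY pos")]

print(xml_result)
print(sql_result == xml_result)
```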
querying
Goal #1: Developing query languages for XML content
querying
Requirements:
obtaining the results in XML format
support for server-side processing
performing complex operations (selection, extraction, reduction, restructuring,...)
support for other XML standards (XPath, namespaces, XML Schema etc.)
support for new data types
querying
Precursors: WebSQL, XML-QL, XML-GL, XQL, UnQL, XMAS,...
Standards:
XPath 1.0/2.0 (data model): www.w3.org/TR/xpath-datamodel/
XSLT 1.0/2.0: www.w3.org/TR/xslt20/
XQuery 1.0: www.w3.org/TR/xquery
XQueryX (an XML syntax for XQuery): www.w3.org/TR/xqueryx
querying: xquery
Model: XML document ≡ tree made of nodes
(as in XPath 2.0 + XSLT 2.0)
The queried XML document may have an associated schema (optional)
querying: xquery
A value ≡ a sequence of items, each of them a node or an atomic value
A sequence of nodes is ordered, usually in the order in which the nodes appear in the document
The atomic values are the usual ones (see XML Schema)
querying: xquery
Namespaces that can be used:
xs (XML Schema): http://www.w3.org/2001/XMLSchema
xsi (XML Schema instance): http://www.w3.org/2001/XMLSchema-instance
xdt (XPath 2.0 data types): http://www.w3.org/2003/xpath-datatypes
local (local XQuery functions): http://www.w3.org/2003/11/xquery-local-functions
xml (XML): http://www.w3.org/XML/1998/namespace
querying: xquery
XPath is used to select nodes
Virtually any XPath 2.0 expression is also an XQuery program
The result of an XQuery program is a sequence, typically a forest (a sequence of XML trees)
As in XSLT, queries can be embedded in templates that generate the desired result
<proiecte>{ /projects/project/* }</proiecte>
(an XQuery program containing an XQuery expression)
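A rough Python analogue of the template above, assuming a hypothetical `projects` document: the literal `<proiecte>` element plays the role of the template, and `findall("project/*")` (ElementTree's limited XPath subset) stands in for the embedded path expression. As with XQuery element constructors, the selected nodes are copied into the new element.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical input document.
PROJECTS = """<projects>
  <project id="p1"><title>XML store</title><desc>Native storage</desc></project>
  <project id="p2"><title>Query engine</title></project>
</projects>"""


def wrap_children(xml_text: str) -> ET.Element:
    """Rough equivalent of <proiecte>{ /projects/project/* }</proiecte>."""
    root = ET.fromstring(xml_text)          # context: the <projects> element
    result = ET.Element("proiecte")         # literal element from the template
    for node in root.findall("project/*"):  # the embedded path expression
        result.append(copy.deepcopy(node))  # the constructor copies the selected nodes
    return result


print(ET.tostring(wrap_children(PROJECTS), encoding="unicode"))
```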
querying: xquery
An expression of the form $name is a reference to the variable name
Variables can also be used inside XPath expressions
Variables can hold values or the results of FLWOR expressions
(For, Let, Where, Order by, Return)
querying: xquery – expressions
for binds values to variables iteratively, one binding per item (joins are also possible)
let assigns the value of an expression to a variable
where states conditions (data filtering) on the bindings produced by the for/let clauses
order by specifies the order in which the results are produced
return builds the result of the expression; it may include templates (element constructors), XPath expressions and nested FLWOR sub-expressions
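The clause-by-clause semantics can be mimicked in Python to make the data flow visible: `for` produces one binding per item, `let` adds a binding, `where` filters, `order by` sorts and `return` builds one result per surviving binding. The sample document, the `grade` element and the query in the docstring are hypothetical.

```python
import xml.etree.ElementTree as ET

STUDENTS = """<students>
  <student><name>Maria</name><year>2</year><grade>9</grade></student>
  <student><name>Ion</name><year>3</year><grade>7</grade></student>
  <student><name>Ana</name><year>2</year><grade>10</grade></student>
</students>"""


def flwor(root: ET.Element) -> list[str]:
    """Mimics the hypothetical query:
         for $s in //student
         let $g := xs:integer($s/grade)
         where $s/year = 2
         order by $g descending
         return string($s/name)
    """
    tuples = []
    for s in root.iter("student"):                  # for: one binding per item
        g = int(s.findtext("grade"))                # let: extra binding per tuple
        if int(s.findtext("year")) == 2:            # where: filter the tuples
            tuples.append((s, g))
    tuples.sort(key=lambda t: t[1], reverse=True)   # order by $g descending
    return [s.findtext("name") for s, _ in tuples]  # return: one result per tuple


print(flwor(ET.fromstring(STUDENTS)))  # ['Ana', 'Maria']
```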
querying: xquery – expressions
Example:
for $stud in doc ("students.xml")//student
where $stud/year = 2
return $stud/name
is roughly equivalent to the SQL statement (the XQuery version returns <name> elements, in document order)
select stud.name from students stud where stud.year = 2
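To check the equivalence on concrete (hypothetical) data, the Python sketch below runs the path-and-filter logic over an XML document and the SQL statement over the same data loaded into SQLite. `ORDER BY stud.rowid` is added on the SQL side only to make the row order deterministic, mirroring XQuery's document order.

```python
import sqlite3
import xml.etree.ElementTree as ET

STUDENTS_XML = """<students>
  <student><name>Ana</name><year>2</year></student>
  <student><name>Ion</name><year>1</year></student>
  <student><name>Maria</name><year>2</year></student>
</students>"""

# XQuery-style: for $stud in ...//student where $stud/year = 2 return $stud/name
root = ET.fromstring(STUDENTS_XML)
xml_names = [s.findtext("name") for s in root.iter("student") if int(s.findtext("year")) == 2]

# SQL-style: the same data as a table, queried with the SELECT statement above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(s.findtext("name"), int(s.findtext("year"))) for s in root.iter("student")],
)
sql_names = [n for (n,) in conn.execute(
    "SELECT stud.name FROM students stud WHERE stud.year = 2 ORDER BY stud.rowid")]

print(xml_names, sql_names)
```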
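querying: xquery – expressions
The facilities of XPath 2.0 can be used: quantifiers (some and every), conditional expressions (if...then...else), operators (those of XPath 2.0 plus additional ones), functions (built-in or user-defined)
Not all XPath axes are required: XQuery 1.0 mandates only child, descendant, attribute, self, descendant-or-self and parent; the other axes (ancestor, following-sibling, preceding etc.) belong to the optional Full Axis Feature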
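The two quantifiers correspond to Python's `any()` and `all()`, including their behaviour on empty sequences (`some` yields false and `every` yields true when there is nothing to test). A small sketch over hypothetical course data; `titles_where` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

DOC = """<courses>
  <course><title>XML</title><grade>9</grade><grade>10</grade></course>
  <course><title>Web</title><grade>4</grade><grade>8</grade></course>
</courses>"""


def titles_where(root: ET.Element, pred) -> list[str]:
    """Titles of the courses whose grade list satisfies pred."""
    return [
        c.findtext("title")
        for c in root.iter("course")
        if pred([int(g.text) for g in c.findall("grade")])
    ]


root = ET.fromstring(DOC)
# some $g in $c/grade satisfies $g < 5
print(titles_where(root, lambda gs: any(g < 5 for g in gs)))   # ['Web']
# every $g in $c/grade satisfies $g >= 5
print(titles_where(root, lambda gs: all(g >= 5 for g in gs)))  # ['XML']
```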
querying: xquery – expressions
Data types (those offered by XML Schema) can be declared for variables and returned values
for $contor as xs:integer in 1 to 5
return $contor * 3
(: result: 3 6 9 12 15 :)
querying: xquery – common functions
Errors & debugging: error(), trace()
Data accessors: node-name(), string(), data()
Numeric: abs(), ceiling(), round(), count(), avg(), max(), min(), sum(),...
Strings: compare(), concat(), string-join(), substring(), string-length(), upper-case(), translate(), escape-uri(), contains(), starts-with(), ends-with(), substring-before(), substring-after(),...
Regular expressions: matches(), replace(), tokenize()
querying: xquery – common functions
Date & time: duration-equal(), time-less-than()...
Nodes: name(), local-name(), namespace-uri(), root()
Sequences: zero-or-one(), one-or-more(), exactly-one(), index-of(), empty(), exists(), distinct-values(), remove(), insert-before(), reverse(), unordered(), deep-equal(),...
Context: position(), last(), current-dateTime(),...
Sequence generation: doc(), collection()
Details at http://www.w3.org/TR/xpath-functions/
External functions can also be used
querying: xquery – examples
(: $stud is assumed to be bound by an enclosing clause,
   e.g. for $stud in doc ("students.xml")//student :)
for $proj in doc ("http://www.infoiasi.ro/projects.xml")/projects/*
where some $projid in $proj/@id satisfies ($projid = $stud/project)
return
  <project class="{ $proj/@class }">
    { $proj/title }{ $proj/desc }
  </project>
Generating a different XML structure
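The join above can be sketched in Python to show what the query computes: for a given student, keep each project whose `id` matches one of the student's `project` values, and build a new `<project>` element holding copies of `title` and `desc`. The sample documents and the `projects_of` helper are hypothetical; a missing `class` attribute becomes an empty string, which is also what the XQuery attribute template produces for an empty sequence.

```python
import copy
import xml.etree.ElementTree as ET

PROJECTS = """<projects>
  <project id="p1" class="web"><title>Portal</title><desc>Student portal</desc></project>
  <project id="p2" class="db"><title>XML store</title><desc>Native storage</desc></project>
</projects>"""
STUDENT = "<student><name>Ana</name><project>p2</project></student>"


def projects_of(student: ET.Element, projects: ET.Element) -> list[ET.Element]:
    wanted = {p.text for p in student.findall("project")}
    out = []
    for proj in projects:                       # for $proj in .../projects/*
        if proj.get("id") in wanted:            # where some $projid in $proj/@id satisfies ...
            new = ET.Element("project", {"class": proj.get("class", "")})
            for tag in ("title", "desc"):       # { $proj/title }{ $proj/desc }
                new.extend(copy.deepcopy(e) for e in proj.findall(tag))
            out.append(new)
    return out


for p in projects_of(ET.fromstring(STUDENT), ET.fromstring(PROJECTS)):
    print(ET.tostring(p, encoding="unicode"))
```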
querying: xquery – examples
(: Lists, as XHTML, all students whose names contain 'Ping' and end with 'uin' :)
<div class="stud">{
  for $stud in doc ("students.xml")//student
  let $e := $stud/name[ contains (string (.), "Ping")
                        and ends-with (string (.), "uin") ]
  where exists ($e)
  return
    <p><span class="name"> { $stud/name/text() } </span>
       with project <span class="title"> { $stud/project/text() } </span></p>
}</div>
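For comparison, the same filter and XHTML construction written in Python with `ElementTree`, over hypothetical sample data. Whitespace handling is simplified: the sketch places the literal text " with project " between the two spans, whereas the exact whitespace in the XQuery output depends on how the query is laid out.

```python
import xml.etree.ElementTree as ET

STUDENTS = """<students>
  <student><name>Ping Penguin</name><project>XML store</project></student>
  <student><name>Ping Pong</name><project>Portal</project></student>
</students>"""


def render(root: ET.Element) -> ET.Element:
    div = ET.Element("div", {"class": "stud"})
    for stud in root.iter("student"):
        name = stud.findtext("name", "")
        if "Ping" in name and name.endswith("uin"):  # contains(...) and ends-with(...)
            p = ET.SubElement(div, "p")
            span_name = ET.SubElement(p, "span", {"class": "name"})
            span_name.text = name
            span_name.tail = " with project "        # literal text between the spans
            ET.SubElement(p, "span", {"class": "title"}).text = stud.findtext("project", "")
    return div


print(ET.tostring(render(ET.fromstring(STUDENTS)), encoding="unicode"))
```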
querying: xquery – examples
(: Recursive Fibonacci, after Chris Wallace, 2008 :)
declare function local:fib-rec ($n as xs:integer) as xs:integer? {
  if ($n < 0) then ()
  else if ($n = 0) then 0
  else if ($n = 1) then 1
  else local:fib-rec ($n - 1) + local:fib-rec ($n - 2)
};
local:fib-rec (10)   (: 55 :)
See also http://en.wikibooks.org/wiki/XQuery/
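A direct Python port of `fib-rec`, with the empty sequence `()` mapped to `None`, plus a memoized variant. The port makes the cost of the naive definition visible: it recomputes the same subproblems and needs an exponential number of calls, while caching with `functools.lru_cache` (a technique swapped in here, not part of the original example) makes it linear.

```python
from functools import lru_cache
from typing import Optional


def fib_rec(n: int) -> Optional[int]:
    """Port of local:fib-rec: None plays the role of the empty sequence ()."""
    if n < 0:
        return None
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib_rec(n - 1) + fib_rec(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n: int) -> Optional[int]:
    """Same definition, but each fib_memo(k) is computed only once."""
    if n < 0:
        return None
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib_memo(n - 1) + fib_memo(n - 2)


print(fib_rec(10))   # 55
print(fib_memo(80))  # fast; fib_rec(80) would take far too long
```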
querying: xquery – examples
Modularization via module files (.xqm) and user-defined functions
module namespace utils = "http://urn:infoiasi.ro:xdb-utils";
(: Declares a sequence of month names :)
declare variable $utils:months := ("Ian", "Feb", "Mar", "Apr", "Mai", "Iun",
                                   "Iul", "Aug", "Sep", "Oct", "Nov", "Dec");
(: Formats a date as dd Mmm yyyy :)
declare function utils:format-date-RO ($date as xs:dateTime) as xs:string {
  string-join ((
    string (day-from-dateTime ($date)),
    $utils:months[month-from-dateTime ($date)],
    string (year-from-dateTime ($date))), " ")
};
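The module's date formatting can be cross-checked with a Python equivalent. Note the index shift: XQuery sequences are 1-based (`$utils:months[1]` is "Ian"), while Python tuples are 0-based. The function name `format_date_ro` is a hypothetical counterpart of `utils:format-date-RO`.

```python
from datetime import datetime

# Romanian month abbreviations, as in $utils:months.
MONTHS_RO = ("Ian", "Feb", "Mar", "Apr", "Mai", "Iun",
             "Iul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_date_ro(d: datetime) -> str:
    """Day Mmm yyyy; the day is not zero-padded, since day-from-dateTime returns an integer."""
    return " ".join((str(d.day), MONTHS_RO[d.month - 1], str(d.year)))


print(format_date_ro(datetime(2009, 3, 5)))  # 5 Mar 2009
```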
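querying: xquery – examples
Using the previously defined function:
import module namespace utils = "http://urn:infoiasi.ro:xdb-utils";
declare variable $data as xs:dateTime external;
<data>{ utils:format-date-RO ($data) }</data>
Depending on the processor, the import may also need a location hint (at "...") pointing to the .xqm file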
querying: xquery – examples
(: Authentication using the account name & password taken
   from the current HTTP session; returns a (user, password) pair
   or an empty sequence on failure :)
declare function main:checkUser() as xs:string* {
  let $user := request:get-session-attribute ("user"),
      $pass := request:get-session-attribute ("password"),
      $login := xdb:authenticate ("xmldb:exist:///db", $user, $pass)
  return
    if ($login) then ($user, $pass)
    else ()
};
External functions are used, here provided by the eXist server
querying: other languages
The current version of XQuery (1.0) offers no support for modifying XML data (the Create, Update and Delete parts of CRUD: Create, Read, Update, Delete)
XQuery Update Facility: in the process of standardization, but already supported by a number of applications: www.w3.org/TR/xquery-update-10/ (not to be confused with XUpdate, an earlier update language from the XML:DB initiative)
Future versions of XQuery will also include data update capabilities
querying: other languages
For particular types of documents, special query languages can be used
example: SPARQL for RDF
querying: implementations
Tools, applications & implementations (examples):
Editing & debugging: <oXygen />, Stylus Studio
Native XML database support: Berkeley DB XML, eXist, Mark Logic’s CIS, Sedna
XML-enabled relational systems: Oracle, MS SQL Server
APIs and processors, e.g. Saxon (Java, .NET), XML::XQuery (Perl module), XQuery API for Java (XQJ, JSR 225)
management
Goal #2: Implementing management systems for XML information
management
Middleware: DB2XML, DBIx::XML, XDBC (XML DataBase Connectivity)
Native XML database systems: Berkeley DB XML, DBDOM, eXist, Tamino, XDB, Xindice,...
XML(-aware) servers: AxKit, Enhydra, WebObjects
XML-enabled servers: DB2 Information Integrator, MS SQL, Oracle,…
management
XML query engines
Connectors (XML data binding)
Content management systems: Docato, Dynabase, Frontier, iENGINE, Mark Logic’s CIS, Prowler, Syncato, UltraXML,...
APIs: Persistent DOM (PDOM), XML:DB, XQJ
management: uses
Integration of semi-structured data: business data
“Get a coherent view of the mess in the back office” (Michael Champion)
analysis of payment claims
(online) news production
financial data
management: uses
Integration of semi-structured data: information on (air) transport
industries: medical, entertainment,...
customer support (in the context of CRM, Customer Relationship Management)
information from the legal domain
management: uses
Need for transaction support:
storage
message queues
archives
metadata
repositories
management: examples
Elsevier Science: over 2 TB of information
>5 million articles, >60 million references + abstracts, >1000 books,... (Mark Logic’s CIS)
Oxford University Press: online publishing platform, content stored in XML format, search facilities (Mark Logic’s CIS)
management: examples
Autodesk: manuals (HTML, PDF, CHM,...) in >30 languages (X-Hive/DB)
Las Vegas Sun: >10 GB (over 750,000 XML documents, images and PDF documents) (FDX XML Server)
US Navy: over 100,000 technical volumes (Tamino)
summary
Introduction
Storing XML data
Definitions
Querying XML data
The XQuery language
XML information management systems