Transcript
Slide 1
Directions: Web X.0, NoSQL DBs and the Semantic Web
Slide 2
Quick review: Web development frameworks
Web 2.0/3.0 is about making websites faster, smarter, more media rich and more intuitive
There is a generation of web development frameworks that focus on faster and smarter; they attack the back end, not the front end
  Ruby on Rails, Grails, Django, Symfony, and others
They tend to use some kind of relational-to-object mapping/wrapping
They support, and in fact enforce, the MVC approach to developing websites
  Model (the database with mapping/wrapping), View (web pages), Controller (pieces of code that map view manipulations into model manipulations and vice versa)
They are AJAX friendly
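The division of labor among the three MVC parts can be sketched in a few lines of Python. This is only an illustration, not how Rails, Django, or any other framework is implemented. The class names StudentModel, StudentView, and StudentController are made up for the example, and an in-memory dictionary stands in for the mapped database.

```python
class StudentModel:
    """Model: wraps the data store (a dict stands in for the mapped database)."""

    def __init__(self):
        self._rows = {}

    def add(self, student_id, name):
        self._rows[student_id] = name

    def all(self):
        return sorted(self._rows.items())


class StudentView:
    """View: turns model data into markup the browser can show."""

    def render(self, rows):
        items = "".join(f"<li>{sid}: {name}</li>" for sid, name in rows)
        return f"<ul>{items}</ul>"


class StudentController:
    """Controller: maps a user action on the view to a model update, then re-renders."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def handle_add(self, student_id, name):
        self.model.add(student_id, name)
        return self.view.render(self.model.all())


controller = StudentController(StudentModel(), StudentView())
page = controller.handle_add("s1", "Joe")
print(page)
```

The point of the split is that the view never touches the store directly, so either side can be replaced without changing the other.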
Slide 3
MVC
Slide 4
Client-side web development
There is another generation of web frameworks that focus on making it easy to create rich web interfaces
  Flash Builder (gone open source from Adobe)
  Silverlight (from Microsoft and perhaps dead?)
  They support 2D and some 3D graphics
  They use upfront loading to minimize interaction with the server
There is a newer effort involving HTML5
  Graphics are supported, with 2D and some 3D
  Local storage with simple insert and delete; can use SQLite
  Better multimedia support
More powerful JavaScript libraries are coming out as well, e.g. jQuery
Slide 5
Important to note
Web X.0 efforts try to make use of graphics in interfaces, as well as provide better display of media
But support for blob and continuous data access (images, video, audio, etc.) is still very rudimentary
  Problem: we cannot screen media in real time
  Problem: it is very difficult to capture the semantics of media
The solution: we tend to build accompanying meta databases with tag sets (one per piece of media) assigned by experts using specialized namespaces
  To enhance accuracy, there is sometimes a feedback loop where users can train the search facility
Slide 6
Quick review: the Semantic Web
This is oriented around making the web more automatically searchable
Main foci:
  Assertions and inferences
  Exposing databases that contain hidden data
  Searching of media bases (blob and continuous), i.e., exposing them
  Searching document bases, i.e., exposing them
  Data mining
Slide 7
Querying the Semantic Web
RDF: triples
  We can use URIs for all three pieces of a triple
SPARQL: a query language for triples, used for spanning Web boundaries
Example: THE BALL is ORANGE. ORANGE is an UGLY COLOR.
  The inference we can make is: THE BALL has an UGLY COLOR
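The ball example can be mimicked with plain Python tuples, to show how an inference falls out of two chained assertions. This is a minimal sketch, not an RDF engine. The predicate names "is", "is-a", and "has" are assumptions chosen for the illustration, and the single rule is hard-coded.

```python
# Each assertion is a (subject, predicate, object) triple.
triples = {
    ("THE BALL", "is", "ORANGE"),
    ("ORANGE", "is-a", "UGLY COLOR"),
}


def infer_ugly_color(facts):
    """If X is C, and C is an ugly color, conclude that X has an ugly color."""
    inferred = set()
    for subject, predicate, obj in facts:
        if predicate == "is" and (obj, "is-a", "UGLY COLOR") in facts:
            inferred.add((subject, "has", "UGLY COLOR"))
    return inferred


new_facts = infer_ugly_color(triples)
print(new_facts)
```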
Slide 8
An RDF Example
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:zx="http://www.someurl.org/zx/"
funstuff
http://www.yetanotherurl.org/professor
Slide 9
The assertions and an inference
www.awebsite.org/index.html   funstuff
  The topic of the resource at www.awebsite.org/index.html is funstuff
www.awebsite.org/index.html   http://www.anotherurl.org/buzz
  www.awebsite.org/index.html was created by someone who is identified by the URL http://www.anotherurl.org/buzz
We see that the value in the first triple, which concerns the topic of our resource, consists of a character string, but the value in the second triple, which concerns the created-by of our resource, is actually a URL.
Slide 10
SPARQL
SPARQL is a recursive acronym for SPARQL Protocol and RDF Query Language, and it is pronounced "sparkle".
It is a language that can be used to traverse graphs that consist of RDF triples that are chained together into an object network.
prefix website1: SELECT ?x WHERE { website1:was-created-by ?x }
This code will find the creators of http://awebsite.org
It will search through all of these triples and find the ones of interest to us, and then pluck off the names of the creators.
These triples could be distributed all around the Web
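What a triple pattern such as "was-created-by ?x" does can be approximated in plain Python. The sketch below is not SPARQL and uses no RDF library. The match function and the sample triples are hypothetical; the URLs reuse the ones that appear in these slides.

```python
def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern.

    Pattern terms starting with "?" are variables; every other term must
    equal the corresponding term of the triple.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Hypothetical data; in practice the triples could live on many sites.
triples = [
    ("http://awebsite.org", "topic", "funstuff"),
    ("http://awebsite.org", "was-created-by", "http://www.anotherurl.org/buzz"),
    ("http://www.someurl.org", "was-created-by", "http://www.yetanotherurl.org/professor"),
]

pattern = ("http://awebsite.org", "was-created-by", "?x")
creators = [b["?x"] for b in match(pattern, triples)]
print(creators)
```

A real SPARQL engine does the same kind of matching over several patterns at once and joins the bindings.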
Slide 11
The Semantic Web, continued
Main tools
  Namespaces posted on the web and shared
  XML
  Ontologies of assertions
    Tall people play basketball
    Joe is tall (note both schema and instance based)
  Walking paths linked by assertions with languages like SPARQL
    Forming inferences from assertions along the way
  XML extensions to accommodate complex data and non-string data and querying of large datasets
    Support pointers to namespaces
    Support complex, non-textual documents, along with object IDs, keys and foreign keys
Slide 12
XML
Slide 13
Continued
Slide 14
Accommodating complex data
Schemas
  Initially DTDs
  Later XML Schema
  Save schema fragments and import them
Non-string data types
Keys and FKs
Type constructors
  Primitive: integer, float, boolean, date, ID
  Simple: list, union
  Complex: groups of elements
Slide 15
Data types in XML Schema
Slide 16
Continued
Slide 17
DTDs
Slide 18
XML schema and namespaces
Slide 19
XPath for searching XML documents hierarchically
An XPath expression takes a document tree as input and returns a multi-set of nodes of the tree
Expressions that start with / are absolute path expressions
  / returns the root node of the XPath tree
  /Students/Student returns all Student elements that are children of Students elements, which in turn must be children of the root
  /Student returns the empty set (no such children at the root)
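The two absolute paths above can be tried out with Python's standard xml.etree.ElementTree module, which implements only a subset of XPath. ElementTree paths are always relative to the element they are called on, so this sketch wraps the document in a synthetic parent element that plays the role of the XPath root. That wrapper, its name, and the sample document are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
students = ET.fromstring(
    "<Students>"
    "<Student StudentId='s1'><Name><Last>Doe</Last></Name></Student>"
    "<Student StudentId='s2'><Name><Last>Roe</Last></Name></Student>"
    "</Students>"
)

# A synthetic parent stands in for the XPath root node "/".
root_node = ET.Element("document-root")
root_node.append(students)

# /Students/Student: Student children of the Students child of the root
found = root_node.findall("./Students/Student")
print([s.get("StudentId") for s in found])

# /Student: the root has no Student child, so nothing is returned
print(root_node.findall("./Student"))
```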
Slide 20
XPath, continued
A current (or context) node exists during the evaluation of XPath expressions (and in other XML query languages)
  . denotes the current node; .. denotes the parent
  foo/bar returns all bar elements that are children of foo nodes, which in turn are children of the current node
  ./foo/bar is the same
  ../abc/cde returns all cde e-children of abc e-children of the parent of the current node
Expressions that don't start with / are relative (to the current node)
Slide 21
Attributes, text, comments
/Students/Student/@StudentId returns all StudentId a-children of Student, which are e-children of Students, which are children of the root
/Students/Student/Name/Last/text( ) returns all t-children of the Last e-children selected by the path
/comment( ) returns comment nodes under the root
XPath provides means to select other document components as well
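Python's xml.etree.ElementTree does not accept @StudentId or text( ) as path steps, so the sketch below selects the elements with a path and then reads the attribute or text from each one. The sample document is invented. Because the paths are evaluated from the Students element, "./Student" here corresponds to /Students/Student.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
students = ET.fromstring(
    "<Students>"
    "<Student StudentId='s1'><Name><First>John</First><Last>Doe</Last></Name></Student>"
    "<Student StudentId='s2'><Name><First>Jane</First><Last>Roe</Last></Name></Student>"
    "</Students>"
)

# Counterpart of /Students/Student/@StudentId
ids = [s.get("StudentId") for s in students.findall("./Student")]

# Counterpart of /Students/Student/Name/Last/text( )
last_names = [last.text for last in students.findall("./Student/Name/Last")]

print(ids)
print(last_names)
```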
Slide 22
XQuery
General structure:
  FOR variable declarations
  WHERE condition
  RETURN document
Example:
  (: students who took MAT123 :)
  for $t in doc("http://xyz.edu/transcript.xml")//Transcript
  where $t/CrsTaken/@CrsCode = "MAT123"
  return $t/Student
Result: the Student elements of the Transcript elements that have a CrsTaken child with CrsCode MAT123
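For readers without an XQuery engine at hand, the same selection can be sketched with Python's standard library. The transcript document below is invented for the illustration, since the file at xyz.edu is not reproduced in these slides. Only the element names and the CrsCode attribute come from the query; the other attribute names and values are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for transcript.xml.
doc = ET.fromstring("""
<Transcripts>
  <Transcript>
    <Student StudId="111111111" Name="John Doe"/>
    <CrsTaken CrsCode="MAT123" Grade="B"/>
    <CrsTaken CrsCode="CS305" Grade="A"/>
  </Transcript>
  <Transcript>
    <Student StudId="222222222" Name="Jane Roe"/>
    <CrsTaken CrsCode="CS305" Grade="C"/>
  </Transcript>
</Transcripts>
""")

result = []
for t in doc.iter("Transcript"):                  # for $t in doc(...)//Transcript
    taken = {c.get("CrsCode") for c in t.findall("CrsTaken")}
    if "MAT123" in taken:                         # where $t/CrsTaken/@CrsCode = "MAT123"
        result.extend(t.findall("Student"))       # return $t/Student

names = [s.get("Name") for s in result]
print(names)
```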
Slide 23
XML and Web X.0: Flash Builder
Slide 24
Results in
Slide 25
Semantic Web big problems
Massive reengineering effort to make use of Semantic Web technology
Assertions that span nodes can be extremely time consuming to traverse
Making media accessible
  Easy enough to generate low-level assertions automatically
  Very time consuming to add assertions manually by experts
  Our main tools are tagging and image/sound processing packages that are very complex and very heuristic driven
XML Schema, the big XML extension, is unwieldy
Slide 26
Web X.0 big problems
We are not just trying to search relational databases
Graphics are often used in a gratuitous, non-useful, even distracting fashion, and they eat up download time and computational time
We still cannot manipulate or search or interpret media
Slide 27
Comparison with NoSQL DBs
Key-document and key-value databases are a way of organizing document, value (blob), and continuous databases so they can be searched quickly by next-generation web applications, as well as by programs automatically searching the web
Graph databases are a way of dynamically extending assertions between objects, but don't play well with large networks
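The three storage styles named above can be contrasted with ordinary Python data structures. This is a toy sketch, not a model of any particular NoSQL product. All keys, documents, and field names are invented for the illustration.

```python
from collections import defaultdict

# Key-value: the value is an opaque blob, reachable only through its key.
kv_store = {"video:17": b"\x00\x01\x02"}

# Key-document: the value has visible structure, so fields can be indexed.
doc_store = {
    "doc:1": {"title": "Trip report", "tags": ["travel", "funstuff"]},
    "doc:2": {"title": "Lecture notes", "tags": ["xml"]},
}

# A secondary index from tag to document keys makes tag searches fast.
tag_index = defaultdict(set)
for key, doc in doc_store.items():
    for tag in doc["tags"]:
        tag_index[tag].add(key)

# Graph: assertions between objects can be added at any time.
graph = defaultdict(set)
graph["Joe"].add(("is", "tall"))
graph["Joe"].add(("plays", "basketball"))

print(sorted(tag_index["funstuff"]))
print(sorted(graph["Joe"]))
```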
Slide 28
Nice things about NoSQL DBs and the Semantic Web and Web X.0
NoSQL DBs are minimalistic in just the right way
Much easier to plug in than complex XML Schema front ends to databases, and can work with existing relational DBs
Documents are natural to both efforts
Media blobs are natural to both efforts
Graphs are natural to the Semantic Web
Slide 29
Web services
Supports non-interactive database access
Uses XML, HTTP, etc.
Examples are Google and Amazon
Universal Description, Discovery, and Integration (UDDI) for creating distributed registries of web services
Web Services Description Language (WSDL)
Simple Object Access Protocol (SOAP) is XML based and is a protocol that allows apps to send messages to each other over the Internet
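To make "XML based" concrete, here is a minimal sketch that builds a SOAP 1.1 request envelope with Python's standard library. The envelope namespace is the standard SOAP 1.1 one. The service namespace, the GetPrice operation, and the Isbn parameter are hypothetical and do not describe any real Google or Amazon service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://www.example.org/books"                # hypothetical service namespace

ET.register_namespace("soap", SOAP_NS)


def build_request(isbn):
    """Wrap a hypothetical GetPrice call in a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetPrice")
    ET.SubElement(call, f"{{{APP_NS}}}Isbn").text = isbn
    return ET.tostring(envelope, encoding="unicode")


message = build_request("0-123-45678-9")
print(message)
```

In a real exchange this string would be sent as the body of an HTTP POST, and the WSDL file would tell the client which operations and parameters exist.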
Slide 30
Security
The complexity of server-side technology, along with its heterogeneity
The need to allow dynamic web page support, email, FTP, etc.
The need to support services
Access to databases from multiple sources on either side of the firewall
Slide 31
continued
The tendency to loosen firewalls when things don't work
Email attachments
Rapid rate of change of software and content and services
The use of open source and legacy DBs that are poorly understood
Slide 32
Another security issue
Web and database servers are used to support newer sorts of data and service access
  Warehousing data (usually, but not always, inside the firewall)
  Mining data, which is often outside the firewall
  Specialized document retrieval systems
  Specialized advanced media retrieval systems
  Integration of heterogeneous data
  Sharing of namespaces, schema fragments, and query code (often in XML technologies)
Slide 33
continued
All of these can be layered and span multiple sites, such as
  Hierarchical data marts
  Mediator-based integration hierarchies
A wide class of people, inside and outside of the organization, must have access to data (such as content taggers)
Slide 34
Data Privacy
HIPAA
Authorization of users and applications
  Passwords
  Two factor (like a password or code and a physical code)
  Mediated (using a third party)
Encryption
  Storage
  Transmission
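A two-factor check can be sketched with Python's standard library: a salted password hash as the first factor, and a counter-based one-time code of the kind a hardware token produces (HOTP, RFC 4226) as the second. This is an illustrative sketch under those assumptions, not a production design. The function names, the iteration count, and the idea that the server tracks a token counter are choices made for the example.

```python
import hashlib
import hmac
import struct


def hash_password(password, salt, iterations=100_000):
    """First factor: store only a salted, slow hash of the password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)


def hotp(secret, counter, digits=6):
    """Second factor: one-time code from a shared secret and a counter (RFC 4226)."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


def authenticate(password, code, stored_hash, salt, secret, counter):
    """Accept the login only if both factors check out."""
    password_ok = hmac.compare_digest(hash_password(password, salt), stored_hash)
    code_ok = hmac.compare_digest(code, hotp(secret, counter))
    return password_ok and code_ok


# Hypothetical enrollment data; a real system would use a random salt.
salt = b"example-salt"
secret = b"12345678901234567890"
stored = hash_password("correct horse", salt)

print(authenticate("correct horse", hotp(secret, 0), stored, salt, secret, 0))
```

hmac.compare_digest is used for both comparisons so that the time taken does not reveal how much of a guess was correct.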