Semantic Web infrastructure Trisolda: current state and perspectives
Filip Zavoral, Jiří Dokulil
SemWex - KSI MFF UK
http://www.ksi.mff.cuni.cz/semwex/
10. Mixer, 26.11.2008
Apr 02, 2015
Semantic web vs. semantization
Semantic web vision: Tim Berners-Lee, “The Semantic Web,” Scientific American, 2001
Semantic research: generously funded, yet 'hardly one has ever seen ...'
New buzzwords: Web 2.0, Web 3.0, Social web, Web of data, Mashups, …
Has the Semantic Web died? No, it has not been born yet
Web Semantization
Semantic technologies
TCP/IP
HTTP
HTML
Browser
Technical details
Semantic web services
Trisolda
Motto: 'hardly one has ever seen ...' the semantic web
Data from real life: incomplete, duplicated, inaccurate; more than 20 million triples
Jena: very slow load; crashes above 1 million triples
Sesame: unable to load more than 200 000 triples; exponential complexity of loading
Where is a working platform for semantic web research?
Technology background: Repository - data integration, DataPile
Trisolda
Trisolda Architecture
Import interfaces
Repository
Querying & Executors
Repository
Trisolda Repository: stores incoming data; retrieves results for queries; stores the used ontology
DataPile structure: holds data in any format
Applications server
Not all data and knowledge are available when imported; the knowledge is not accurate
Background worker: inferencing, data unification, reasoner
Framework for plug-ins
Import
Direct import: data in data sources; converters to the used ontology
Crawling the wild Web: Egothor web crawler
AgentMat: parsed pages are stored; deductors deduce data and ontology
Real life data: incomplete, duplicated, inaccurate
Import modes: batch insert, immediate insert
Querying
Query API: based on simple graph matching
Query: a set of RDF triples with variables
Result: a multiset of possible variable mappings, i.e. a relation
Not another SQL-like language: a set of C++ classes and operators
Query evaluation: levels of support by query engines
Query environments: present outputs; examples: repository browser, RDF visualizer
Semantic executors: service composition, conductors
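The matching model on this slide can be sketched in a few lines. The following is only an illustration in Python, not the Trisolda C++ Query API: the function names, the `?` prefix for variables, and the `ex:` sample data are all assumptions made for the example. A query is a list of triple patterns, and the result is a list (a multiset) of variable mappings.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # Assumption of this sketch: variables are strings starting with "?".
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)  # work on a copy, the caller's mapping stays intact
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against its earlier value.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def evaluate(query: List[Triple], data: List[Triple]) -> List[Binding]:
    """Return all variable mappings under which every pattern is in `data`."""
    bindings: List[Binding] = [{}]
    for pattern in query:
        extended_bindings = []
        for b in bindings:
            for triple in data:
                extended = match_triple(pattern, triple, b)
                if extended is not None:
                    extended_bindings.append(extended)
        bindings = extended_bindings
    return bindings


# Hypothetical sample data and query.
DATA = [
    ("ex:p1", "ex:firstname", "John"),
    ("ex:p1", "ex:lastname", "Smith"),
    ("ex:p2", "ex:firstname", "John"),
    ("ex:p2", "ex:lastname", "Doe"),
]
QUERY = [
    ("?person", "ex:firstname", "?firstname"),
    ("?person", "ex:lastname", "?lastname"),
]
rows = evaluate(QUERY, DATA)
```

Each element of `rows` is one row of the resulting relation. The value "John" appears twice in the `?firstname` column, which is why the result is a multiset rather than a set.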
AgentMat - data semantization framework
AgentMat - data extraction
Future work
Conclusions
Working infrastructure (currently not working: re-deployment, AgentMat & TriQ integration)
Gathering, storing and querying of semantic data
Platform for research and experiments
Future work & long-term goals
Specialized semantic data storage
Semantic acquisition, data semantization
Interface-based, loosely coupled network of Semantic Web repositories
Semantic computing, services, composition, executors ...
Selected Publications
Beňo, Míšek, Zavoral: AgentMat: Framework for Data Scraping and Semantization, 3rd International Conference on Research Challenges in Information Science, IEEE, 2009
Dokulil, Yaghob, Zavoral: Trisolda: The Environment for Semantic Data Processing, International Journal On Advances in Software, IARIA, 2009
Podzimek, Dokulil, Yaghob, Zavoral: Mám hlad: pomůže mi Sémantický web?, Informačné technológie - Aplikácia a Teória, ITAT 2008
Dokulil, Tykal, Yaghob, Zavoral: Semantic Web Repository And Interfaces, International Conference on Advances in Semantic Processing, SEMAPRO 2007, IEEE Computer Society Press - Best Paper Award
Dokulil, Tykal, Yaghob, Zavoral: Semantic Web Infrastructure, IEEE International Conference on Semantic Computing ICSC, IEEE Computer Society Press 2007
Yaghob, Zavoral: Semantic Web Infrastructure using DataPile, Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Hong Kong, IEEE Computer Society Press 2006
PART II
Tables in RDF querying - do we really need them?
SPARQL
Syntax: SQL-like; at first look a “simple language”, but with a complex grammar
{?x ?y ?z . OPTIONAL { ?a ?b ?c . } . ?k ?l ?m . }
{?x ?y ?z OPTIONAL { ?a ?b ?c } ?k ?l ?m }
SPARQL
Semantics: a lot of changes, now stable; based on algebra
Works with sets of variable mappings, i.e. tables; very different from SQL
“Closed”: no compositionality
SPARQL
RDF is a graph
SPARQL provides pattern (subgraph) matching, no other graph handling
SPARQL handles only fixed-size graphs
RDFS supports an arbitrary hierarchy of classes
SPARQL has no aggregate functions, no “group by”, no constructors
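The fixed-size limitation can be illustrated outside SPARQL: a pattern with a fixed number of triples follows a fixed number of rdfs:subClassOf edges, while a class hierarchy may have any depth. A minimal Python sketch (not part of Trisolda; the function name and the `ex:` class names are hypothetical) collects all superclasses by iterating until no new class is found:

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def superclasses(cls: str, triples: List[Triple]) -> Set[str]:
    """Collect every direct and indirect superclass of `cls`."""
    # Index the direct rdfs:subClassOf edges: class -> list of parents.
    parents: Dict[str, List[str]] = {}
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            parents.setdefault(s, []).append(o)

    # Walk the hierarchy; the number of steps depends on the data,
    # not on the size of any query pattern.
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical three-level hierarchy.
HIERARCHY = [
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
    ("ex:Agent", "rdfs:subClassOf", "ex:Thing"),
]
student_supers = superclasses("ex:Student", HIERARCHY)
```

A single triple pattern `ex:Student rdfs:subClassOf ?c` would find only `ex:Person`; reaching `ex:Thing` takes three edges, and a deeper hierarchy would need a longer pattern still.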
Seasoned SQL developer
Idea… ?
Make the language SQL-like inside, not just outside: joins, selection, projection, grouping, aggregation
Relational algebra works with relations, i.e. sets of tuples; the database is made of relations
RDF data is made of… RDF graphs
Maybe we should work with RDF graphs
Tables – Graphs
Graph nodes (persons): John Smith, John Doe, Jane Doe, Bill Jackson
The same data as a table (first name, last name):
John | Smith
John | Doe
Jane | Doe
Bill | Jackson
Basic pattern
Variables -> “columns”
?person ex:firstname ?firstname
?person ex:lastname ?lastname
Further operations
Selection, joins, aggregation, projection
Group by
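To make the relational operations concrete, here is a small Python sketch of join, selection and projection over tables of variable mappings. It is an illustration only, not the language proposed in the talk: the helper names and the `ex:p1` … `ex:p4` identifiers are assumptions, and the names are taken from the Tables – Graphs slide.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Combine rows that agree on every variable they share."""
    joined = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[k] == r[k] for k in shared):
                joined.append({**l, **r})
    return joined


def select(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def project(rows: List[Row], columns: List[str]) -> List[Row]:
    """Keep only the listed columns (variables)."""
    return [{c: r[c] for c in columns} for r in rows]


# One table per triple pattern (hypothetical person identifiers).
FIRST = [
    {"?person": "ex:p1", "?firstname": "John"},
    {"?person": "ex:p2", "?firstname": "John"},
    {"?person": "ex:p3", "?firstname": "Jane"},
    {"?person": "ex:p4", "?firstname": "Bill"},
]
LAST = [
    {"?person": "ex:p1", "?lastname": "Smith"},
    {"?person": "ex:p2", "?lastname": "Doe"},
    {"?person": "ex:p3", "?lastname": "Doe"},
    {"?person": "ex:p4", "?lastname": "Jackson"},
]

people = natural_join(FIRST, LAST)  # join on the shared ?person column
does = project(select(people, lambda r: r["?lastname"] == "Doe"), ["?firstname"])
```

Because each operation takes tables and returns a table, the operations compose freely, which is the property the SPARQL slides describe as missing.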
Local and global aggregations
more values in one “column”
Maximal number of mails
Total count of mails
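One possible reading of the two examples, sketched in Python: a local aggregation counts the mails of each person, and a global aggregation then takes the maximum or the total over all persons. The `ex:mail` predicate and the sample triples are hypothetical, and the sketch does not reproduce the syntax proposed in the talk.

```python
from collections import Counter
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data: persons with several values in the "mail" column.
MAILS: List[Triple] = [
    ("ex:p1", "ex:mail", "john.smith@example.org"),
    ("ex:p1", "ex:mail", "jsmith@example.org"),
    ("ex:p2", "ex:mail", "john.doe@example.org"),
    ("ex:p3", "ex:mail", "jane@example.org"),
    ("ex:p3", "ex:mail", "jane.doe@example.org"),
    ("ex:p3", "ex:mail", "jdoe@example.org"),
    ("ex:p3", "ex:firstname", "Jane"),  # ignored, not a mail triple
]


def mails_per_person(triples: List[Triple]) -> Counter:
    """Local aggregation: number of ex:mail values per subject."""
    return Counter(s for s, p, _ in triples if p == "ex:mail")


per_person = mails_per_person(MAILS)

# Global aggregations over the local results.
max_mails = max(per_person.values())
total_mails = sum(per_person.values())
```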
What’s more?
Optional parts of the graph
Regular expressions
Textual representation (language)
Conclusion
The current state is bad
Try something different?
PART III
Let’s have a look – RDF visualizer
RDF
subject – the thing we are describing
predicate – the property of the thing
object – the value of the property
a graph (directed, labeled)
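As a small illustration of the triple model, the Python sketch below turns a list of triples into a directed, labeled graph: subjects and objects become nodes, and each predicate labels an edge from subject to object. The representation (a dictionary of outgoing edges) and the `ex:` sample data are assumptions of this example, not the visualizer's internal structure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def to_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each subject to its outgoing (predicate, object) edges."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


# Hypothetical sample triples.
TRIPLES = [
    ("ex:p1", "ex:firstname", "John"),
    ("ex:p1", "ex:lastname", "Smith"),
    ("ex:p1", "ex:knows", "ex:p2"),
    ("ex:p2", "ex:firstname", "Jane"),
]
graph = to_graph(TRIPLES)
```

Here `graph["ex:p1"]` lists three labeled edges, one of which leads to another node that has edges of its own.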
Visualization
Triangle layout: layered drawing for trees
Node merging: more information for a node
Navigation: the way to handle huge data
Let’s have a look
A picture is worth a thousand words…