A Taxonomy of Semantic Web Data Retrieval Techniques
Anila Sahar Butt, Armin Haller, Lexing Xie
The Australian National University, Australia
Introduction
The Semantic Web
• Provides access to an increasing amount of structured information in a wide variety of domains
• Information overload due to the large amount of structured data is as much a problem as on the traditional Web
• Many Semantic Web data retrieval (SWR) techniques have been proposed
The questions:
• Is the field of Semantic Web data retrieval making progress?
• What are the directions that have been taken?
• What are some of the promising directions for future research?
Semantic Web Data Retrieval Approaches
o Ontology Retrieval Techniques
  Swoogle, BioPortal, AKTiveRank, LOV, OntoSearch2, OBO, OntoSelect, WATSON, OntoKhoj
o Linked/RDF Data Retrieval Techniques
  Sindice, Sig.ma, SWSE, SemRank, LTR
o Graph/Structured Data Retrieval Techniques
  SLQ, OSQ, Lindex, NeMa, BLINK, SSSGD
Overview of Semantic Web retrieval process
[Figure: retrieval pipeline with the components Data Acquisition, Data Warehousing, Reasoning, Indexing, Ranking, Query Evaluation and User Interface, divided into a pre-processing phase and an online-processing phase]
Semantic Web Data Retrieval Techniques: Taxonomy
o Retrieval Aspects: Scope, Query Model, Results Type
o Storage & Search: Data Acquisition, Data Storage, Indexing, Query Match
o Ranking: Ranking Scope, Ranking Factor, Ranking Domain
o Evaluation: Efficiency, Effectiveness, Scalability
o Practical Aspects: Implementation, Datasets, User Interface
Semantic Web Data Retrieval Techniques: Retrieval Aspects
o Scope: Type(s) of the data that can be explored with the approach
  • Ontologies
  • Linked/RDF Data
  • Graph/Structured Data
o Query Model: Way(s) a user can initiate the retrieval process
  • Keyword search
  • Structured query search
  • Faceted browsing
  • Hyperlink-based navigation
o Results Type: Type(s) of the output as a result of a user’s query
  • Relation centric
  • Entity centric
  • Document centric
Semantic Web Data Retrieval Techniques: Storage & Search
o Data Acquisition: Way(s) and rule(s) of the data collection
  • Manual collection
  • HTML agnostic crawlers
  • HTML aware crawlers
  • Focussed crawlers
o Data Storage: Type(s) of storage structures
  • Relational databases
  • Native storage
  • NoSQL databases
o Indexing: Type(s) of index structures
  • No index
  • Full text index
  • Structural index
  • Graph index
  • Multi-level indexing
o Query Match: Way(s) to find query matches in the data collection
  • Exact Match
  • Partial Match
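As a rough illustration of the "Full text index" and "Exact Match / Partial Match" options, the sketch below builds a token-to-subject index over literal values and answers a keyword query in both modes. The triples, the `ex:` identifiers and the reading of "exact" as "all query tokens present" and "partial" as "at least one token present" are assumptions made for this example, not definitions taken from the surveyed systems.

```python
from collections import defaultdict

# Toy (subject, predicate, literal) triples; all values are invented for illustration.
TRIPLES = [
    ("ex:doc1", "rdfs:label", "Semantic Web search engine"),
    ("ex:doc2", "rdfs:label", "Graph database index"),
    ("ex:doc3", "rdfs:label", "Web search"),
]

def build_full_text_index(triples):
    """Map each lower-cased token of a literal to the subjects that contain it."""
    index = defaultdict(set)
    for subject, _predicate, literal in triples:
        for token in literal.lower().split():
            index[token].add(subject)
    return index

def exact_match(index, query):
    """Subjects whose literals contain every query token (assumed meaning of 'exact')."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)

def partial_match(index, query):
    """Subjects containing at least one query token, with the fraction of tokens matched."""
    tokens = set(query.lower().split())
    hits = defaultdict(int)
    for token in tokens:
        for subject in index.get(token, set()):
            hits[subject] += 1
    return {subject: count / len(tokens) for subject, count in hits.items()}

index = build_full_text_index(TRIPLES)
print(exact_match(index, "web search"))
print(partial_match(index, "graph search"))
```

The matched fraction returned by `partial_match` could feed a relatedness-style ranking factor; a structural or graph index would replace the token map with path or subgraph keys.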
Semantic Web Data Retrieval Techniques: Ranking
o Ranking Scope: Query dependence of a ranking model
  • Global
  • Focus
o Ranking Factor: Factor(s) based on which the ranks are calculated
  • No Ranking
  • Popularity
  • Authority
  • Informativeness
  • Relatedness
  • Coverage
  • Centrality
  • Learned model
  • Feedback
o Ranking Domain: Original domain of a ranking model
  • Semantic web
  • Graph databases
  • Document retrieval
  • Machine learning
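To make a global (query-independent) ranking factor concrete, the sketch below computes a PageRank-style score over a small link graph. It is a generic technique borrowed from document retrieval and is shown only as one possible way to obtain a popularity or authority score; the `IMPORTS` graph, the `ex:` names and the parameter values are hypothetical, and no surveyed system is claimed to use exactly this procedure.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: ontologies A and B both reference (e.g. import) ontology C.
IMPORTS = {"ex:A": ["ex:C"], "ex:B": ["ex:C"], "ex:C": []}

scores = pagerank(IMPORTS)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

Because the scores depend only on the link structure, they can be computed in the pre-processing phase and reused for every query, which is what distinguishes a global ranking scope from a focused one.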
Semantic Web Data Retrieval Techniques: Evaluation
o Efficiency: Time taken to retrieve the relevant results
  • Query execution time
  • Index construction time
  • Index update time
o Effectiveness: Correctness of the retrieved or relevant results
  • Recall
  • Precision
  • F-Measure
  • MAP
  • NDCG
o Scalability: Flexibility of the approach
  • Data size
  • Data complexity
  • Query size
  • Query complexity
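The effectiveness measures listed above have standard definitions in information retrieval. The sketch below shows one common formulation of each; the function names and the toy result list are assumptions for this example, the F-Measure is taken to be the balanced F1, and NDCG uses the log2 discount with graded gains.

```python
import math

def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall and balanced F-measure (F1)."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k at which a relevant item appears.

    Assumes the ranked list contains no duplicates.
    """
    relevant_set = set(relevant)
    hits, total = 0, 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant_set:
            hits += 1
            total += hits / position
    return total / len(relevant_set) if relevant_set else 0.0

def mean_average_precision(runs):
    """MAP over a list of (ranked, relevant) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs) if runs else 0.0

def ndcg(ranked, gains, k=None):
    """Normalised discounted cumulative gain; items missing from `gains` count as 0."""
    cutoff = len(ranked) if k is None else k

    def dcg(values):
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    actual = dcg([gains.get(item, 0) for item in ranked[:cutoff]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:cutoff])
    return actual / ideal if ideal > 0 else 0.0

ranked = ["a", "b", "c", "d"]          # hypothetical system output
relevant = {"a", "c", "e"}             # hypothetical ground truth
print(precision_recall_f1(ranked, relevant))
print(average_precision(ranked, relevant))
print(ndcg(ranked, {"a": 3, "c": 2, "e": 1}))
```

Precision, recall and F-measure ignore result order, whereas MAP and NDCG reward placing relevant results near the top, which is why ranking-focused evaluations tend to report the latter.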
Semantic Web Data Retrieval Techniques: Practical Aspects
o Implementation: Programming language(s) adopted
  • Java
  • Python
  • C#
  • C
o Datasets: Type of dataset used for implementation or evaluation
  • Synthetic
  • Real
o User Interface: Way(s) of user interaction
  • GUI
  • API
Survey of Existing SWR Techniques
o Ontology Retrieval Techniques
  Swoogle, BioPortal, AKTiveRank, LOV, OntoSearch2, OBO, OntoSelect, WATSON, OntoKhoj
o Linked/RDF Data Retrieval Techniques
  Sindice, Sig.ma, SWSE, SemRank, LTR
o Graph/Structured Data Retrieval Techniques
  SLQ, OSQ, Lindex, NeMa, BLINK, SSSGD
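Categorization of Prominent Semantic Web Retrieval Techniques 1/2
Column groups: Search Aspect (Search Scope, Query Model, Result Type); Storage and Search (Data Acquisition, Data Storage, Indexing, Query Match); Ranking (Ranking Scope, Ranking Factor, Ranking Domain); Evaluation (Efficiency, Effectiveness, Scalability); Practical Aspect (Implementation, Datasets, User Interface)

Technique | Search Scope | Query Model | Result Type | Data Acquisition | Data Storage | Indexing | Query Match | Ranking Scope | Ranking Factor | Ranking Domain | Efficiency | Effectiveness | Scalability | Implementation | Datasets | User Interface
LOV [16] | O | K,S | D | M | N | F | E | G | P | D | - | - | - | J | R | G,A
BioPortal [10] | O | K,F | D | M | - | - | E | G | F | D | - | - | - | J | R | G,A
OntoSearch2 [14] | O | K,F | D | M | N | S | P | G | Co | - | Q | - | DS | - | R | G
OBO [13] | O | H | D | M | - | N | - | - | - | - | - | - | - | - | R | G
OntoSelect [3] | O | K,F | D | M,AG | - | - | - | - | - | D | - | - | - | - | R | G,A
Swoogle [6] | O,L | K | D | AG | R | F | P | G | P | D | Q | - | - | J | R | G
WATSON [5] | O,L | K,S | D | AG | N | S,F | E | - | N | D | - | - | - | J | R | G,A
OntoKhoj [12] | O | K | D | AG | R | F | - | G | P | D | - | F | - | - | R | G,A
AKTiveRank [1] | O | K | D | - | - | - | P | F | P,Co,C | G,D | - | P | - | - | - | -
Sindice [11] | L | K | D | F | NoS | F | P | F | Co | D | Q,C,U | - | DS | J | R | G,A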
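Categorization of Prominent Semantic Web Retrieval Techniques 2/2
Column groups: Search Aspect (Search Scope, Query Model, Result Type); Storage and Search (Data Acquisition, Data Storage, Indexing, Query Match); Ranking (Ranking Scope, Ranking Factor, Ranking Domain); Evaluation (Efficiency, Effectiveness, Scalability); Practical Aspect (Implementation, Datasets, User Interface)

Technique | Search Scope | Query Model | Result Type | Data Acquisition | Data Storage | Indexing | Query Match | Ranking Scope | Ranking Factor | Ranking Domain | Efficiency | Effectiveness | Scalability | Implementation | Datasets | User Interface
Sig.ma [15] | L | K | E | - | NoS | F | P | F | A,Co | D | Q | - | - | J | R | G,A
SWSE [8] | L | K | E | AG | N | S | P | F | A,P | D | Q,C,U | - | DS | J | R | G
SemRank [2] | L | - | R | M | - | - | E | F | - | M | - | - | - | - | S | -
LTR [4] | L | K | E | M | - | - | - | G | L | M | - | N | - | - | R | -
SLQ [19] | G | K | D | M | N | S,F | P | F | C | M | Q | P,M,N | DS | J | R,S | -
OSQ [17] | G | S | D | M | N | G | P | - | - | - | Q,C | P,M,N | DS | - | R,S | -
Lindex [20] | G | S | D | M | N,M | G | P | - | - | - | Q,C,U | - | DS | - | R,S | -
NeMa [9] | G | S | D | M | N | G | P | - | - | - | Q,C,U | P,R,F | DS | - | R | -
BLINK [7] | G | K | D | M | N | M | P | F | C | G | Q,C | - | DS | - | R | -
SSSGD [18] | G | S | D | M | N | G | P | - | - | - | Q | - | QS,DS | - | R,S | -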
Discussion and Research Directions 1/3
o Dynamic Faceted Browsing
  Challenge: Syntactic diversity in describing the same property. For example, the title of a resource can be described as a name, a title, a label, etc.
  Solution: Clustering similar types of properties into a single group
o Ontology Retrieval
  Challenge: Discovering the most related vocabularies for a query string
  Solution: Find relevant ontologies that cover most of the query terms or concepts related to these terms
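The coverage idea for ontology retrieval can be sketched as follows: score each ontology by the fraction of query terms that appear among its class and property labels, then sort by that score. The ontology names, the label sets and the plain case-insensitive string comparison are assumptions for this example; a real system would also need to match related concepts, which is not attempted here.

```python
def coverage(query_terms, ontology_labels):
    """Fraction of distinct query terms that occur among an ontology's labels."""
    terms = {term.lower() for term in query_terms}
    labels = {label.lower() for label in ontology_labels}
    return len(terms & labels) / len(terms) if terms else 0.0

def rank_by_coverage(query_terms, ontologies):
    """Return (score, name) pairs, best coverage first; ties broken by name."""
    scored = [(coverage(query_terms, labels), name) for name, labels in ontologies.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

# Hypothetical vocabularies and their class/property labels.
ONTOLOGIES = {
    "ex:people": {"Person", "name", "email"},
    "ex:pubs": {"Publication", "title", "author", "Person"},
}

for score, name in rank_by_coverage(["person", "author", "title"], ONTOLOGIES):
    print(name, round(score, 2))
```

Ranking on coverage alone favours large vocabularies that mention many terms, so in practice it would likely be combined with other ranking factors from the taxonomy.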
Discussion and Research Directions 2/3
o Ontology Ranking Models
  Challenge: Match of a search term with a more expressive class, property or ontology description
  Solution: Ranking models that consider the design perspective, level of detail, and extension in ontologies
o Linked Data Retrieval: Effectiveness vs. Efficiency
  Challenge: Effectiveness-focussed techniques vs. efficiency-focussed techniques
  Solution: A reasonable trade-off between effectiveness and efficiency
Discussion and Research Directions 3/3
o Ranking of triples for entity retrieval approaches
  Challenge: The rank of a property depends on the entity it belongs to, and the rank of each object value of a multivalued property likewise depends on that entity
  Solution: Ranking of triples for the entity to prioritize relevant attributes and object values of that entity
o An evaluation framework for Semantic Web data retrieval techniques
  Challenge: Comparative evaluation of different SWR techniques with regard to their effectiveness, efficiency and scalability
  Solution: Conducting comprehensive comparative experimental studies
Conclusion
o An overview of Semantic Web data retrieval
o A taxonomy for Semantic Web data retrieval techniques
o Categorization of prominent Semantic Web retrieval techniques
o Future research directions
Thanks!
o For more details
Anila Sahar Butt, Armin Haller, Lexing Xie, “A Taxonomy of Semantic Web Data Retrieval Techniques” In the Proceedings of the 8th International Conference on Knowledge Capture, Palisades, NY, USA, 7th – 10th October 2015
o Contact [email protected]@anu.edu.au
References
[1] H. Alani, C. Brewster, and N. Shadbolt. Ranking ontologies with AKTiveRank. In The Semantic Web-ISWC 2006, pages 1–15. Springer, 2006.
[2] K. Anyanwu, A. Maduko, and A. Sheth. SemRank: ranking complex relationship search results on the semantic web. In Proceedings of the 14th international conference on World Wide Web, pages 117–127. ACM, 2005.
[3] P. Buitelaar, T. Eigner, and T. Declerck. OntoSelect: A dynamic ontology library with support for ontology selection. In Proceedings of the Demo Session at the International Semantic Web Conference. Citeseer, 2004.
[4] L. Dali, B. Fortuna, T. T. Duc, and D. Mladenić. Query-independent learning to rank for RDF entity search. In The Semantic Web: Research and Applications, pages 484–498. Springer, 2012.
[5] M. d’Aquin and E. Motta. Watson, more than a semantic web search engine. Semantic Web, 2(1):55–63, 2011.
[6] L. Ding, T. Finin, A. Joshi, R. Pan, R. S. Cost, Y. Peng, P. Reddivari, V. Doshi, and J. Sachs. Swoogle: a search and metadata engine for the semantic web. In Proceedings of the thirteenth ACM international conference on Information and knowledge management, pages 652–659. ACM, 2004.
[7] H. He, H. Wang, J. Yang, and P. S. Yu. BLINKS: Ranked keyword searches on graphs. In Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 305–316. ACM, 2007.
[8] A. Hogan, A. Harth, J. Umbrich, S. Kinsella, A. Polleres, and S. Decker. Searching and browsing linked data with SWSE: The semantic web search engine. Web Semantics: Science, Services and Agents on the World Wide Web, 9(4):365–401, 2011.
[9] A. Khan, Y. Wu, C. C. Aggarwal, and X. Yan. NeMa: Fast graph search with label similarity. Proceedings of the VLDB Endowment, 6(3):181–192, 2013.
[10] N. F. Noy, N. H. Shah, P. L. Whetzel, B. Dai, M. Dorf, N. Griffith, C. Jonquet, D. L. Rubin, M.-A. Storey, C. G. Chute, et al. BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Research, 37(suppl 2):W170–W173, 2009.
[11] E. Oren, R. Delbru, M. Catasta, R. Cyganiak, H. Stenzhorn, and G. Tummarello. Sindice.com: A document-oriented lookup index for open linked data. International Journal of Metadata, Semantics and Ontologies, 3(1):37–52, 2008.
[12] C. Patel, K. Supekar, Y. Lee, and E. Park. OntoKhoj: a semantic web portal for ontology searching, ranking and classification. In Proceedings of the 5th ACM international workshop on Web information and data management, pages 58–61. ACM, 2003.
[13] B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, L. J. Goldberg, K. Eilbeck, A. Ireland, C. J. Mungall, et al. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nature Biotechnology, 25(11):1251–1255, 2007.
[14] E. Thomas, J. Z. Pan, and D. Sleeman. Ontosearch2: Searching ontologies semantically. In Proceedings of the OWLED 2007 Workshop on OWL: Experiences and Directions, volume 258 of CEUR Workshop Proceedings, 2007.
[15] G. Tummarello, R. Cyganiak, M. Catasta, S. Danielczyk, R. Delbru, and S. Decker. Sig.ma: Live views on the web of data. Web Semantics: Science, Services and Agents on the World Wide Web, 8(4):355–364, 2010.
[16] P.-Y. Vandenbussche and B. Vatant. Linked Open Vocabularies. ERCIM news, 96:21–22, 2014.
[17] Y. Wu, S. Yang, and X. Yan. Ontology-based subgraph querying. In Data Engineering (ICDE), 2013 IEEE 29th International Conference on, pages 697–708. IEEE, 2013.
[18] X. Yan, P. S. Yu, and J. Han. Substructure similarity search in graph databases. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pages 766–777. ACM, 2005.
[19] S. Yang, Y. Wu, H. Sun, and X. Yan. Schemaless and structureless graph querying. Proceedings of the VLDB Endowment, 7(7), 2014.
[20] D. Yuan and P. Mitra. Lindex: a lattice-based index for graph databases. The VLDB Journal, 22(2):229–252, 2013.