Page 1: download

1

Practical semantic web mining platform

Page 2: download

2

What is SWM?

SWM includes:
  Semantic Web and RDF
  Regular expressions, Web agents
  HMMs and information extraction
  Rule mining, F-Logic, Description Logic
  Information integration
  Planning for data gathering
  Ontologies: learning, editing
  Text classification
  Applications: e-commerce, Web services, Semantic Web browser, etc.

Page 3: download

3

Some Background

Page 4: download

4

[Figure: platform architecture. Sources: XML, text, multimedia, the Deep Web, Web Services, and the WWW via a Semantic Crawler. Pipeline: Semantic Annotation -> Semantic Content -> Semantic Indexing/Integration (efficient indexing engine) -> Semantic Retrieval -> Reasoning -> Application, supported by an Ontology base. Learning components: learn annotation, learn mapping/link, learn reasoning rules. Methods: Active Learning; Machine Learning (BC, NN, GA, AR); Multi-view Learning; Multi-view detection; active-learning-driven OM/OL; ontology-based summary.]

Page 5: download

5

Algorithm/theory of ML

Techniques of machine learning / data mining
  Bayesian classification, NN, GA
  Statistical techniques
Active learning, multi-view learning
Risk minimization / maximum entropy model

Page 6: download

6

Annotation

Multiple sources
Annotation tools
Using ML to automate the process
Learn annotation rules
  Active-learning driven (reduces the training samples needed)
  Multi-view (improves performance)
  Multi-view detection (improves performance further)
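
A minimal sketch of how active learning could reduce the training samples needed for annotation, assuming uncertainty sampling (the slides do not say which selection strategy is intended). The function name, the "Skill" label, and the scores are hypothetical.

```python
from typing import Callable, List, Sequence


def most_uncertain(pool: Sequence[str],
                   prob_positive: Callable[[str], float],
                   k: int = 1) -> List[str]:
    """Return the k pool items whose predicted probability is closest to 0.5."""
    return sorted(pool, key=lambda item: abs(prob_positive(item) - 0.5))[:k]


# Hypothetical confidence that each text span should be annotated as "Skill".
scores = {"Java developer": 0.95, "Beijing": 0.52, "2004": 0.10, "Tsinghua": 0.40}

# Only these spans are sent to the human annotator in this round.
queries = most_uncertain(list(scores), lambda span: scores[span], k=2)
```

The annotator labels only the spans the current model is least sure about, and the model is retrained on them before the next round.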

Page 7: download

7

Mapping & Link

Mapping
  Find mapping points
  Find complex mapping points (subof, superof, 5*(a+b), even conjunct of, etc.)
  Translate instances based on the mapping
Link
  Find link points
  Find complex links
  Integrate ontologies
Mapping/link detection
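
To make "translate instances based on the mapping" concrete, here is a small sketch under stated assumptions: a mapping is represented as one function per target attribute, "conjunct of" is read as joining several source attributes, and all attribute names are invented for illustration.

```python
from typing import Any, Callable, Dict

Instance = Dict[str, Any]
Mapping = Dict[str, Callable[[Instance], Any]]

# Hypothetical mapping from a source schema to a target schema.
mapping: Mapping = {
    # Simple one-to-one mapping point.
    "title": lambda s: s["job_title"],
    # Complex arithmetic mapping of the form 5*(a+b).
    "total_pay": lambda s: 5 * (s["base"] + s["bonus"]),
    # Complex mapping that joins two source attributes.
    "full_name": lambda s: f'{s["first_name"]} {s["last_name"]}',
}


def translate(instance: Instance, mapping: Mapping) -> Instance:
    """Build a target instance by applying every mapping point to the source."""
    return {target: rule(instance) for target, rule in mapping.items()}


source = {"job_title": "Engineer", "base": 1000, "bonus": 200,
          "first_name": "Li", "last_name": "Wei"}
translated = translate(source, mapping)
```

Finding the mapping points is the hard part; once they are known, translation is a direct application of each rule.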

Page 8: download

8

Page 9: download

9

Mapping & Link

Multi-view: name, instance, relationship, etc.

Active learning: ask the user to confirm the mapping/link the learner is least certain about.

Multi-view detection: improves the performance.
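
One possible reading of combining views with active learning, sketched below: score each candidate mapping with a name view and an instance view, then ask the user about the candidate on which the two views disagree most. The similarity measures, the disagreement criterion, and the sample schemas are assumptions, not something the slides specify.

```python
from difflib import SequenceMatcher
from typing import Iterable


def name_view(a: str, b: str) -> float:
    """Similarity of two attribute names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def instance_view(xs: Iterable, ys: Iterable) -> float:
    """Jaccard overlap of the values observed for two attributes."""
    xs, ys = set(xs), set(ys)
    return len(xs & ys) / len(xs | ys) if xs | ys else 0.0


# Hypothetical source and target attributes with sample instance values.
source = {"salary": [8000, 9000, 12000], "location": ["Beijing", "Shanghai"]}
target = {"pay": [8000, 9000, 12000],
          "city": ["Beijing", "Shanghai", "Paris"],
          "salary_note": ["n/a"]}

candidates = [
    (s, t, name_view(s, t), instance_view(s_vals, t_vals))
    for s, s_vals in source.items()
    for t, t_vals in target.items()
]

# The candidate whose two views disagree most is the one shown to the user.
query = max(candidates, key=lambda c: abs(c[2] - c[3]))
```

Here the names "salary" and "salary_note" look alike while their instances do not overlap, so that pair is the one worth a user's attention.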

Page 10: download

10

Indexing

What is the difference between semantic indexing (SI) and text indexing/XML indexing?

How to define the data structure of the SI? (Note that the structure should reflect the characteristics of the Semantic Web and the ontology.)

How to make it efficient? (How does it compare with others' work? Is there existing work on it?)
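
As one illustration of a structure that reflects the ontology, the sketch below indexes each instance under its class and all ancestor classes, and indexes triples by (predicate, object). The class `SemanticIndex`, its methods, and the sample data are hypothetical; it also assumes an acyclic, single-parent class hierarchy.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple


class SemanticIndex:
    def __init__(self, subclass_of: Dict[str, str]) -> None:
        self.subclass_of = subclass_of  # child class -> parent class (acyclic)
        self.by_class: Dict[str, Set[str]] = defaultdict(set)
        self.by_po: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def add_instance(self, instance: str, cls: str) -> None:
        """Index the instance under its class and every ancestor class."""
        current: Optional[str] = cls
        while current is not None:
            self.by_class[current].add(instance)
            current = self.subclass_of.get(current)

    def add_triple(self, subject: str, predicate: str, obj: str) -> None:
        self.by_po[(predicate, obj)].add(subject)

    def find(self, cls: str, predicate: str, obj: str) -> Set[str]:
        """Instances of cls (or its subclasses) having the given property value."""
        return (self.by_class.get(cls, set())
                & self.by_po.get((predicate, obj), set()))


index = SemanticIndex({"JavaDeveloper": "Developer", "Developer": "Person"})
index.add_instance("resume1", "JavaDeveloper")
index.add_instance("resume2", "Developer")
index.add_triple("resume1", "livesIn", "Beijing")
index.add_triple("resume2", "livesIn", "Beijing")
developers_in_beijing = index.find("Developer", "livesIn", "Beijing")
```

A plain text index would miss `resume1` for the query "Developer", because the word never appears there; the subclass closure is what makes the lookup semantic.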

Page 11: download

11

Semantic Retrieval

Domain vs. general
Make use of SI and the ontology to improve the performance.
Make use of reasoning techniques to improve it further.

Page 12: download

12

Reasoning

Learning reasoning rules
Example: resumes and jobs
  How to find the most appropriate job for an individual?
  How to find the most appropriate person for a specified job?
Define the rules: if Person.Age(x) < 30 then Job(y).Salary > 8000
Rule discovery
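
The example rule can be sketched as a filter over candidate jobs. This assumes the rule is meant as a matching constraint (a person under 30 is only matched with jobs paying more than 8000); the `Person` and `Job` classes and the data are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    name: str
    age: int


@dataclass
class Job:
    title: str
    salary: int


def rule_allows(person: Person, job: Job) -> bool:
    """if Person.Age(x) < 30 then Job(y).Salary > 8000"""
    # An implication holds when the premise is false or the conclusion is true.
    return not (person.age < 30) or job.salary > 8000


def candidate_jobs(person: Person, jobs: List[Job]) -> List[Job]:
    return [job for job in jobs if rule_allows(person, job)]


jobs = [Job("Tester", 6000), Job("Engineer", 9000)]
young = candidate_jobs(Person("x", 25), jobs)
senior = candidate_jobs(Person("z", 40), jobs)
```

Rule discovery would aim to learn conditions like this one from existing resume and job data instead of writing them by hand.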

Page 13: download

13

Applications

Jobs & resumes
E-commerce, e.g. travel, tickets, etc.
Personal assistant: track one's work and interests to find new information automatically.
Semantic Web browser

Page 14: download

14

Free discussion of the platform

Page 15: download

15

Aspects

Data
Content
  What we will do, what we can do, and what is out of scope.
  Semantic Web, Semantic Web services
Theory ->> may be the basis for SCI
Practical application!!!! Important
Proposal & schedule

Page 16: download

16

Data

Data preparation
  Domains: jobs & resumes, software (from SourceForge), travel web services.
  Ontology: metadata & instances
Work items
  Metadata definition: integrate an ontology editor (Protégé, OntoEdit, or Orient).
  Instance database: use annotation or IE techniques to extract information from specific web sites.
  How to save: use Jena to save the data in a database and query it with RQL.
  Indexing?

Page 17: download

17

Content

Ontology building, knowledge base building: use WordNet to assist.

Composition for web services. If not web services, what else can we do, e.g. jobs & resumes?

Annotation & deep annotation: web service annotation, text annotation, even image annotation.

Mapping: concept mapping, instance mapping; translation, merging, meaning negotiation (mapping representation).

Data integration: combine annotation and mapping.

Page 18: download

18

Content

Semantic search engine
  Its definition? Simple search = data search; how, then, do we make use of the ontology?
  Reasoning?
  How to make it practical, that is, how to do it in our domain.
  Should it be a general engine or a domain-specific one?
Ontology summary (needs a better name): output the knowledge in an ontology using NLP.
Indexing?
Tools integration

Page 19: download

19

Theory

ML, data mining
  Inductive learning: NN, Bayes, SVM, GA. Code them, or one of them, ourselves. It will cost us time, but that does not mean the time is wasted.
  Transductive learning. Selective learning.
More general theory: risk minimization. Note that RM is not an algorithm; it is a framework for ML. Any learning algorithm can be used as its implementation.
Active learning + multi-view: reduce the number of training samples; improve the precision.
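
The point that risk minimization is a framework can be illustrated with a short sketch of empirical risk minimization: the loss and the hypothesis family are plug-in parts, and the framework only chooses the hypothesis with the lowest average loss. The threshold classifiers, the zero-one loss, and the data are assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]
Hypothesis = Callable[[float], int]
Loss = Callable[[int, int], float]


def zero_one_loss(predicted: int, actual: int) -> float:
    return 0.0 if predicted == actual else 1.0


def empirical_risk(h: Hypothesis, data: Sequence[Example], loss: Loss) -> float:
    """Average loss of hypothesis h over the training data."""
    return sum(loss(h(x), y) for x, y in data) / len(data)


def make_threshold(t: float) -> Hypothesis:
    """A simple hypothesis family: predict 1 when x exceeds the threshold."""
    return lambda x: 1 if x > t else 0


def minimize_risk(thresholds: List[float], data: Sequence[Example],
                  loss: Loss) -> float:
    """Return the threshold whose classifier has the lowest empirical risk."""
    return min(thresholds,
               key=lambda t: empirical_risk(make_threshold(t), data, loss))


data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
best = minimize_risk([0.5, 1.5, 2.5, 3.5], data, zero_one_loss)
```

Swapping in a different loss or a different hypothesis family (NN, Bayes, SVM) changes the learner without changing the framework.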

Page 20: download

20

Practical application

Jobs & resumes. Targets: to find the best-qualified resumes/persons for a specified job, or to find the best jobs for a person.

Software from SourceForge, etc. Aims: software composition, web service composition, software search.

Page 21: download

21

Practical application

more?

Page 22: download

22

Proposal & schedule

Why a proposal? Why a schedule?
Can we work together on the possible platform?
