Semantically-supported team building in a KDD virtual environment

Semantically-supported Team Building in a KDD Virtual Environment

Claudia Diamantini, Domenico Potena and Emanuele Storti Dipartimento di Ingegneria dell’Informazione

Università Politecnica delle Marche via Brecce Bianche, 60131 Ancona

{c.diamantini, d.potena, e.storti}@univpm.it

Abstract— Team building plays a crucial role in many collaborative projects. The use of semantic technologies and tools, like ontologies and metadata management proved to be a powerful approach to organize people competencies and support the formation of teams. Although Knowledge Discovery in Databases (KDD) has inherent collaborative characteristics, team building in this domain has not been the subject of extensive work yet. In this paper we start filling this gap by presenting TeamOnto, an ontology for the representation of project teams. TeamOnto is part of a wide, modular and integrated Knowledge Base about KDD projects, of which we briefly illustrate some modules, showing their use to support team building in the KDD domain.

Keywords-Knowledge Representation and Ontology; KM for Collaboration and Decision Support; Knowledge Management Systems and Applications; Knowledge Discovery in Databases; Team Building

I. INTRODUCTION

Knowledge Management (KM) is nowadays considered essential for enterprises to remain competitive. While KM is important in individual tasks, it becomes strategic in collaborative work. Skill Management (SM) and Knowledge Discovery (KD) are two different perspectives of KM jointly considered in this paper. SM is the part of Knowledge Management devoted to understand and codify skills within an organization, mainly for two purposes: mapping competencies already available in an organization and searching for collaborators suitable for certain tasks, in order to build a team. KD deals with the set of methodologies and tools enabling organizations to gain new insights on a problem or opportunity, for decision support. When KD is performed starting from past data stored in the organization’s or third parties’ databases, then the set of methodologies, techniques and tools to analyze data and extract novel and potentially useful information from them is known as Knowledge Discovery in Databases (KDD). KDD is a non-trivial task, requiring diverse competencies ranging from statistics’ and machine learning’s theoretical background, to technical skills and experience with the application domain. Since this vast amount of competencies can hardly be possessed by a single individual, teams of (possibly geographically distributed) experts are constituted to design and manage KDD projects. In this paper we present the design and implementation of a Knowledge Base for the

representation of skills of members of one or more organizations engaged in the development of KDD projects. The Knowledge Base is mainly aimed at providing searching capabilities to support efficient and effective team building. It is structured to maintain the formal definition of core concepts in ontologies as well as records of past projects and their characteristics. In this way, the Knowledge Base can be exploited to find people that have worked in projects similar to the one at hand, where similarity can be assessed with respect to the applicative domain (e.g. bioinformatics, marketing) or the final goal (e.g. protein classification, user profiling). Also, people can be characterized with respect to their knowledge of a particular data analysis technique or tool, where knowledge can be assessed by evaluating the kind of activities performed, like authoring of documents about the technique, development of the software tool, or frequent use of the technique/tool inside different projects. Finally, collaboration in past teams, the role of people therein and the kind of projects can be analyzed to find complementary skills and complete a team. More details on the functionalities enabled by the Knowledge Base will be given after its full definition is established.

The specification of the Knowledge Base starts from requirements gathered during the development of the Knowledge Discovery in Databases Virtual Mart (KDDVM). KDDVM is a platform aiming at establishing a “virtual market” where autonomous, geographically distributed users can easily share resources, capabilities, and find useful tools supporting collaborative design and execution of KDD processes [1]. This requires to deal with heterogeneity and interoperability problems, and to conceive an inherently open and flexible architecture. The basis of such architecture is the Service Oriented paradigm, hence tools and architecture components are all implemented by using Web Service standards. Ontologies and metadata management are also central in the development of the platform to push interoperability to a higher extent. The wide range of knowledge that must be compiled and maintained in KDDVM has been the subject of several work by the authors. In particular, previous work has explored the characterization of the KDD domain and of KDD computational resources and processes. In this work we focus on the definition of TeamOnto, an ontology for the characterization of people and teams involved in the design and the execution of KDD processes.

978-1-4673-1382-7/12/$31.00 ©2012 IEEE 45

https://www.researchgate.net/publication/242329560_A_virtual_mart_for_knowledge_discovery_in_databases?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==

In Section II we briefly survey related work on the use of ontologies and metadata in Skill Management Systems, while Section III presents the design methodology, conceptualization and axiomatization of TeamOnto. Then, in Section IV relationships between TeamOnto and other existing parts of the Knowledge Base are established. Section V demonstrates how this modular and integrated Knowledge Base can be exploited for team building support in the context of KDD projects. Finally, Section VI concludes the paper.

II. RELATED WORK

Enterprises make frequent use of Skill Management Systems (SMS), in which skills are identified and tracked. SMS implement functionalities for retrieval of employers with certain characteristics. The most effective solutions are capable of keeping track of tasks and job achieved by an employer so to automatically update the corresponding skills. According to [2] the core competencies of an organization include tacit and explicit knowledge, and should be conceived of as a mix of skills and technologies. In this context, the concepts of knowledge and competence are closely related [3]. The knowledge domain in a skill management application can be structured in many ways: dictionaries, thesaurus, metadata, knowledge graphs, and ontologies, that nowadays are recognized as an effective tool for knowledge management. In the Literature, especially in Human Resources Management and Intellectual Capital Management fields, there exist several proposals dealing with skill representation and management that are based on the aforementioned knowledge representation structures. In this Section we have chosen to refer only to work based on ontologies for two main reasons: our goal is to extend the KDDVM knowledge base coherently with already developed modules, and we think that using reasoning mechanisms over ontologies can profitably support the team building task (these points are detailed in Sections IV and V respectively). As a matter of fact, inference over a skill ontology allows to discover implicit knowledge in a company, while explicit knowledge about employers is useful to allocate these skills. Besides this, ontologies can help to reconcile semantic and linguistic divergence that typically exists between identical information gathered from different sources or expressed by people with different background.

Many publications deal with a small set of skills, but only few of them exploit more detailed representations. The KOWIEN project [4] developed a taxonomy for skills, as well as SkiM system [5]. [6] proposes an ontology for evaluating skill-based human performance. [7] proposes a human resource ontology realized by means of the integration of existing standards and classifications, and a skill sub-ontology, derived from KOWIEN ontology, that defines concepts representing competencies and competence levels. Also [8] defines competence levels for comparing competency profiles of certain individuals with those of expected competencies in job offers, and also in competence management in universities.

More advanced SMS, moreover, offer skill detection, diagnosis of the missing skills needed for a certain task, planning of an educational process to support the acquisition of missing skills, and continuous monitoring of competencies. A first work in this direction is the People Capability Maturity

Model released by the Software Engineering Institute [9], initially introduced with the aim to support the improvement of the knowledge and competencies of its workforce, thus aligning performance with the business objectives of the organization. Similarly, in [10] authors identify typical skills metadata for the design of mechanical components. Then, such skills are exploited in a E-Learning Support System, with the aim to support designers to acquire the needed skills for mechanical design in a more efficient way. Within the same applicative domain, in [11] authors use a set of controlled vocabularies to describe the details of a job posting and the CV of a job seeker. The ontology is composed of two main modules, namely Job Offer and Job Seeker, and several support modules, among which: Competence, Education, Language, Occupation, Skill, Time. While the Education ontology represents knowledge of educational level and fields, the Skill ontology models knowledge of Information Technology (IT) skills and Organizational skills. It was built from scratch taking as reference the skills and abilities requested by the IT employment market. Support modules are used by the main ontologies to define a job in terms of its requirements, or a CV in terms of its features about previous jobs, competencies, education, spoken languages, and so forth.

As concerns application of skill ontology for team building, although much work has been devoted to the description of organizations and its components, few work propose their application specifically for KDD. [12] introduces a framework where organization ontologies are exploited to structure the activities of the Business Understanding phase and to integrate their outcomes with succeeding phases. To the best of our knowledge no other specific proposal for a KDD team ontology has been presented yet.

Besides domain, technical and organizational knowledge, it would be desirable to have at disposal a sort of psychological knowledge, in order to build up tools supporting the definition of human-centric interface and to personalize the collaboration system. The formalization of such knowledge is not considered in the present work as well as in other works related to KDD.

III. TEAM ONTOLOGY

Being a complex task, ontology engineering requires formal care and proper domain knowledge to be accomplished. In order to avoid errors and to obtain a valid and useful ontology, firstly the ontology goals and then the design methodology will be defined.

Defining the scope of this ontology consists in the identification of the most important concepts to be represented, and the adequate granularity level. TeamOnto is mainly aimed at supporting the team building by enabling the search of KDD experts and practitioners with specific needed skills. Important concepts to be taken into account include: personal details, the organizations in which they work, the KDD projects they joined in the past, scientific or technical publications they wrote.

According to several authors (e.g., [10]), a unique definition of skill is not feasible, as every field of study has a specific definition for it. Moreover, different applicative contexts lead to further distinctions.

46

https://www.researchgate.net/publication/221407647_Features_Missing_in_Action_Knowledge_Management_Systems?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==

https://www.researchgate.net/publication/4315037_Organization-Ontology_Based_Framework_for_Implementing_the_Business_Understanding_Phase_of_Data_Mining_Projects?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==

https://www.researchgate.net/publication/200772385_Nonaka_I_A_Dynamic_Theory_of_Organizational_Knowledge_Creation_Organization_Science_51_14-37?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==

https://www.researchgate.net/publication/228864162_Ontology_development_for_human_resource_management?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==

https://www.researchgate.net/publication/220348734_Ontology-Based_Skills_Management_Goals_Opportunities_and_Challenges?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==

https://www.researchgate.net/publication/220346320_Skill_Ontology_for_Mechanical_Design_of_Learning_Contents?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==

https://www.researchgate.net/publication/220346320_Skill_Ontology_for_Mechanical_Design_of_Learning_Contents?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==

https://www.researchgate.net/publication/2954722_Developing_organizational_competence?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==

For the purposes of this work, according to the mentioned goals of the KDDVM platform, we refer to “skill” as the ability to use a computational resource that takes part in a KDD process, e.g., a KDD web service or its abstract specification as an algorithm. Hence, a user has a skill about the C4.5 algorithm if she knows how to use it, including how to provide the input, how to setup its parameters and how to interpret its output. We assess such a skill both if the user has expertise on the algorithm and if she authored a publication about the algorithm at hand. Besides technical skills like those introduced, which are useful for the data manipulation phases of a KDD process, a user can have knowledge skills, useful in the initial business understanding and the final model interpretation steps. In this case either she is an expert about a business domain or she authored at least a publication about it.

Detailed information about non-central concepts of the ontology, like projects and computational resources, are already available in dedicated knowledge bases within the KDDVM platform. Therefore, mappings will be provided to link concepts in TeamOnto to those, as explained in Section IV.

A. Methodology Several methodologies for ontology development have been

proposed in the last 15 years. However, despite the growing popularity of semantic technologies, no widely accepted standard has emerged so far.

According to [13], to which we refer for a review of the soundest methodologies in the Literature, the reason is that none of the existing approaches cover all aspects of ontology engineering, and many issues are still to be faced. However, in many cases several overlaps can be recognized among such methodologies: indeed, they are usually made of interactive and iterative steps which, starting from the identification of main representative terms of the domain, lead to the final ontology through step-wise refinements.

Early proposals, e.g. [15], [16] and [17], provided the theoretical ground and principles for knowledge engineering in the context of domain ontologies. CommonKADS [16] is mainly focused on the corporate knowledge management, for the design and implementation of knowledge-intensive information systems, rather than ontology engineering. Instead, Enterprise Ontology [15], which influenced much work in ontology community, represents one of the first attempts to define a sound development methodology. In [17], authors propose some criteria specifically conceived for ontology design, which are based on concepts and theories derived from logics, linguistics and analytical philosophy areas. The main idea is to realize an ontology as an axiomatic theory, i.e., a set of logical axioms formalizing relationships among domain concepts, capable to clarify and disambiguate the meaning of terms, and to support the integration and comparison of distinct ontologies. An important contribution of [17] is the introduction of a set of quality requirements that a formal ontology should satisfy:

• coherence: the ontology must be intrinsically non-contradictory;

• clarity: the ontology should effectively communicate the intended meaning of defined terms in a non-ambiguous fashion;

• extensibility: the ontology should be extensible for satisfying new goals without a revision of the existing definitions;

• minimization of ontological commitment: an ontology should define only those terms that are essential to the communication of knowledge consistent with that theory.

It has to be noted that [17] only gives the design criteria to build an ontology, and does not define any development step, except for first individuating “primitive” concepts, from which every other domain concept is then derived. Some of the contributions, like [14] and [18] are specifically conceived for knowledge engineering, while some others [19] are derived from software engineering field. In more detail, METHONTOLOGY [14] is a complete methodology to build ontologies from scratch, which defines the steps to be taken to perform each activity: specification, conceptualization, formalization, integration, implementation, and maintenance. Other phases, such as quality assurance, are carried out during the lifecycle, which is based on evolving prototypes. OTK (On-To-Knowledge) methodology [18], which partially comes from [16] refined through feedback from industries, has a special emphasis on knowledge maintenance and is characterized by a strong iterative cycle among the refinement/evaluation phases; UPON [19] is based upon Unified Software Development Process and is supported by UML (Unified Modeling Language).

Our approach for the building of TeamOnto is based on a blend of the approaches described in [14] and [18] where the previously introduced quality requirements [17] are taken into account. The high-level steps of the employed methodology are the following:

• Concepts identification and definition: the identification of the primitive concepts of domain and to provide as accurate as possible definitions of concepts in natural language. The output of such a phase is the glossary of terms describing domain concepts;

• Building of a conceptual schema: concepts are represented through classes and relations. Each concept definition should be represented, if possible, through axioms and formal statements. At the end of this phase the axiomatic theory is provided;

• Implementation: the goal of this phase is the translation of axioms in a machine readable language, on which automatic inference may be applied. The chosen language is OWL, which is enough expressive for representing the domain at hand. However, an excessive expressiveness could lead to time-consuming inferential procedures or even to undecidable inferences. For such a reason, the OWL-DL sublanguage is needed to balance between expressiveness and decidability.

47

https://www.researchgate.net/publication/2626138_Toward_Principles_for_the_Design_of_Ontologies_Used_for_Knowledge_Sharing?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==





https://www.researchgate.net/publication/220504147_A_software_engineering_approach_to_ontology_building?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==

https://www.researchgate.net/publication/220504147_A_software_engineering_approach_to_ontology_building?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==

https://www.researchgate.net/publication/2601385_Gruninger_M_Ontologies_Principles_methods_and_applications_Knowledge_Eng_Rev_112_93-155?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==

https://www.researchgate.net/publication/2601385_Gruninger_M_Ontologies_Principles_methods_and_applications_Knowledge_Eng_Rev_112_93-155?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==

https://www.researchgate.net/publication/220628822_Knowledge_Processes_and_Ontologies?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==



https://www.researchgate.net/publication/220628697_CommonKADS_A_comprehensive_methodology_for_KBS_development?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==



https://www.researchgate.net/publication/50236211_METHONTOLOGY_from_ontological_art_towards_ontological_engineering?el=1_x_8&enrichId=rgreq-0cb925d5-391e-4062-a881-8727b8ea92bb&enrichSource=Y292ZXJQYWdlOzIzNTc5MDM2MjtBUzoxMDY2MTQ5NTg4NTQxNDhAMTQwMjQzMDM5NTU3Nw==



Figure 1. TeamOnto classes and relations.

At the end of each step the satisfaction of functional and non-functional requirements is verified.

B. Conceptual Definition Being aimed at team building, the central element in

TeamOnto is person, which is described according to several dimensions related to skills and competencies. Hence, the key concepts in the ontology are the following:

• Person: a user that takes part in a team, in order to work together in a project, or that authors an algorithm, a web service or a publication. A person refers to an organization, and has skills on some algorithms and web services.

• Business Domain: an applicative domain that a person knows, an organization focuses in, a project or a publication is about.

• Organization: a private business or a public structure dealing with a specific business domain, in which some persons work.

• Algorithm: an algorithm suitable for a specific KDD task; it is described by its name and a description. Each algorithm may be implemented by one or more web services (see Subsection IV.A);

• Web Service: a web service for a KDD task. It implements a specific algorithm and is described by its WSDL’s URL, a name and a description.

• Project: a project is the collaborative development of a KDD process, undertaken by a team of persons, which is executed over data referring to a business domain. A project is characterized by a name, a description, a starting/ending date, an optional URI to an external descriptive document.

• Publication: every public virtual/printed issue describing an algorithm, a web service, or a business domain, written by some persons. It is defined by a URI referring to the corresponding document, a title, an abstract, a set of keywords, the publication date, and a typology.

C. Axiomatization Representing concepts through axioms helps to make the

model clearer, and also enables to represent the schema with a formal language. In this way deductive inference can be performed on the model, with the aim to extract hidden knowledge and to check consistency and its non-contradictory nature. Hereafter descriptions are given in First-order Logic syntax.

Analyzing concepts definition given in the previous Section, it is straightforward to recognize classes and relations of the ontology. First, we identify the primitive classes, namely classes that cannot be defined in terms of others. They correspond to the above described 7 key concepts:

Person(x)→ Object(x) Domain(x)→ Object(x) Organization(x)→ Object(x)

48

Algorithm(x)→ Object(x) WebService(x)→ Object(x) Project(x)→ Object(x) Publication(x)→ Object(x)

The schema of the TeamOnto is fully given when relations between classes are introduced. For lack of space, here we describe only relations involving the class Person, which is the core class of the TeamOnto. The full schema is shown in Figure 1.

• worksIn links a person to the organization where she works:

∀ x,y.Person(x)∧ worksIn(x,y)→Organization(y)

• writes links a person to the instances of the class Publication she has authored:

∀ x,y.Person(x)∧ writes(x,y)→Publication(y)

• memberOf represents the membership of a person to a project team. Since for each project a process (and its versions) exists, this relation links persons to processes as well. It concerns both existing and past projects:

∀ x,y.Person(x)∧ memberOf(x,y)→Project(y)

• knows is used to represent the skills that a person has at the moment she registers at the system, that is when a new instance is created in the ontology. Skills are referred to business domains, algorithms as well as web services:

∀ x,y.Person(x)∧ knows (x,y)→Domain(y) ∨ Algorithm(y) ∨ WebService(y)

• uses describes the skills that a person acquires

participating in KDD project teams; for instance, such relation links the person to algorithms and web services that she has chosen to insert during the design of the process, or whose execution’s management she has been responsible for:

∀ x,y.Person(x)∧ uses(x,y)→ Algorithm(y)∨ WebService(y)

• develops identifies the person who has contributed in

the development of a given algorithm or a given web service. Note that such relation assigns a level of expertise on an algorithm (or web service) higher than the one given by the knows relation:

∀ x,y.Person(x)∧ develops(x,y)→Algorithm(y)∨ WebService(y)

Other interesting relations are:

• about, which links an instance of Publication to the instance of Domain, of Algorithm and of WebService, it is about.

• studies, which associates a project with a domain.

• implements, linking a web service with the implemented algorithm. The inverse of such relation is implementedIn.

Subclasses can be described as derived by the above defined relations and primitive classes. For instance, persons are further specified in authors of a publication, developers of and experts about an algorithm or a web service, according to the kind of relations they are involved in:

Author(x)→Person(x) ∀ x,y. Person(x)∧ Publication(y)∧ writes(x,y)→ Author(x) Developer(x)→Person(x) ∀ x,y. Person(x)∧ develops(x,y)∧ (Algorithm(y)∨ WebService(y))→Developer(x) Expert(x)→ Person(x) ∀ x,y. Person(x)∧ (knows(x,y)∨ uses(x,y))∧ (Algorithm(y)∨ WebService(y))→Expert(x)

Note that a person is an expert if she knows or has used a specific algorithm or service.

The taxonomy is further expanded by introducing subclasses of Author: DomainAuthor, AlgorithmAuthor and WebServiceAuthor; according to what the publication is about. For lack of space we show only one of these:

AlgorithmAuthor(x)→Author(x) ∀ x,y,z. Person(x)∧ Publication(y)∧ writes(x,y)∧ Algorithm (z)∧ about(y,z)→AlgorithmAuthor(x)

Similarly, AlgorithmExpert and WebServiceExpert are subclasses of Expert; AlgorithmDeveloper and WebServiceDeveloper are subclasses of Developer.

IV. DETAILING TEAM KNOWLEDGE

TeamOnto describes the skills of actual and possible users of the platform in terms of the algorithms and web services they know and processes they are involved in. Knowing the properties of algorithms, services and processes allows us to associate such properties to user’s skills and, hence, to have a richer user characterization and a more sophisticated team building functionality. In order to enrich the description of team skills we link TeamOnto to other knowledge sources developed in previous work, de-facto making the TeamOnto a wide, modular and integrated knowledge base. Here we briefly survey information managed about algorithms, services and processes that are relevant for the present work. For more details we refer the interested reader to [1].

• Algorithms are described in a dedicated ontology called KDDONTO, where abstract properties of a computational resource are formalized. The basic structure of the ontology contains a categorization of algorithms that can be exploited for KDD, e.g. algorithms that can be used for pre-processing data, for

49

classification, for clustering and so forth. In turn, classification and clustering are seen as special types of Data Mining task. Properties of algorithms are reported: among the most important we recall the kind of input needed and the output produced, the preconditions that allow the usage of the algorithm, and the characteristics of every input and output data.

• Web services are concrete (i.e. executable) computational resources that implement specific algorithms. According to Web service standards, their technical description is given by a WSDL document. We extended the WSDL structure into an eSAWSDL to accommodate specific information about the semantics of input and output data. eSAWSDL are indexed by a UDDI registry, whose records contain additional information like the organization responsible for its publication, and some QoS indexes, enabling discovery. A link to KDDONTO allows also to implement more advanced, semantic discovery capabilities.

• Processes are workflows whose activities are Web Services. About processes we maintain information on its schema, execution logs, and versions. We distinguish between design versions, where the schema is altered, and execution versions, where the same schema is executed with different parameter settings. Executions logs contain information about the date of execution, and performance like execution time and quality of the discovered model. Processes are organized in a lattice describing the most frequent and representative sub-processes. This enables efficient search and the representation of significant usage patterns.

Figure 2 shows the links occurring among modules of the whole knowledge base, and in particular between TeamOnto and other knowledge sources. Instances of the Algorithm, WebService and Project classes refer respectively to an instance of the algorithm class of KDDONTO, a record in the UDDI registry and an element of the process repository. Furthermore, each datum described in every eSAWSDL document, and referring to a concept of the KDD domain, is annotated by linking it to the related KDDONTO concept. The link between UDDI and KDDONTO allows us to assign a web service to the algorithm it implements.

Following the paths in the figure one can satisfy several information needs for user skills discovery and team building. In the next Section we illustrate such capability by means of a case study.

Some considerations have to be done on the population of the TeamOnto. Characterization of persons occurs by direct and indirect interactions with the Knowledge Base. By direct interactions we mean all the activities devoted to insert or update instances in TeamOnto by directly accessing the ontology. These include for example the introduction of a new person with her affiliation and skills, or updates due to some changes occurred during her life inside the platform. The introduction of new instances can cause the inconsistency of the whole knowledge base, when the new instance belongs to

the WebService or Algorithm classes. In fact in these cases the new skill could refer to a service or technique that are not yet present in the UDDI registry or in KDDONTO. Proper integrity enforcing actions have to be established, like requesting to deploy the declared service, or starting a procedure of ontology evolution to cope with this issue. On the contrary, the introduction of a new instance of Project will not cause inconsistencies since it is assumed that a new project have no process associated with it.

Figure 2. KDDVM Knowldege Base

Indirect interaction refers to all updates that take place as a consequence of the interactions of people inside the platform. For example, the design of a new process will cause the insertion of new uses relation instances between a service involved in the process and the actor who has chosen it.

Integrity enforcing actions and indirect interaction has not been implemented yet, and will be the subject of future work.

V. QUERYING THE KNOWLEDGE BASE

In the following we show some queries performed on the knowledge base that demonstrate how it can support the team building task. These queries are basic bricks that can be composed to respond to more complex information needs. It is to be noted that some of this information could not be obtained without a semantic support.

We start from the following scenario: let us suppose that a new team has to be constituted to manage a KDD project in the marketing domain, and in particular for an email marketing campaign. The project responsible is a marketing expert, but she does not have any experience on the use of KDD for marketing. For lack of space, details of some of these queries have been omitted.

1. Find people who are expert in KDD and have previously worked in email marketing or similar projects:

Person(x) ∧ memberOf(x,y) ∧ Project(y) ∧ studies(y,z) ∧ (equal(y,”em”) ∨ subcl(”em”,y))

Here subcl is a short for subsumption reasoning.

50

2. Find which kind of data mining techniques can be exploited for email marketing (the expected result can be: clustering, association rules). This query requires to:

a. find email marketing projects:

Project(x) ∧ studies(x,y) ∧ (equal(y,”em”) ∨ subcl(”em”,y))

b. find services used in these projects:

WebService(y) ∧ applies(x,y)

c. Output the task of data mining algorithms implemented by these services:

implements(y,z) ∧ Algorithm(z) ∧ KDDONTO.specifies_phase(z,”data_mining”) ∧ KDDONTO.has_task(z,w)

Note that in step c the prefix KDDONTO behaves as a namespace in Semantic Web languages, demonstrating the integrated use of different modules of the Knowledge Base.

3. For a given service A, find services to be executed before (and after), and people who are experts of their use in order to complete the team:

implements(A,x) ∧ KDDONTO.match(x,y) ∧ implements(z,y) ∧ (uses(p,y) ∨ uses (p,z))

Here KDDONTO.match is an advanced reasoning procedure already implemented over the KDDONTO which performs a similarity match of inputs of an algorithm with the outputs of another in order to assess their composability.

By using KDDONTO.match, query 3 exploits the a-priori knowledge compiled by experts in KDDONTO about composability. The exploitation of actual executed processes allows to gain a different notion of composability, with respect to the best practices of the community of experts. As a matter of fact, we can search the process repository to find the most frequent subprocesses being used in the email marketing domain and containing the given service A. In alternative, all frequent subprocesses in the email marketing domain can be extracted, when the project responsible has no special clues about the algorithms to use. The set of services can be used as starting point in a query similar to 3 to find experts of each service and complete the team.

VI. CONCLUSIONS

The paper presented an approach to team building in collaborative KDD. The approach is based on the definition of a knowledge base describing team capabilities as well as computational resources involved in a KDD project. In particular the paper focuses on steps to design TeamOnto, the ontology for team characterization. The integration of TeamOnto with other modules of the Knowledge Base is also discussed. We have shown some representative queries demonstrating the expressive power of the Knowledge Base in

responding to major information needs for team building during the development of a KDD project.

We believe that the results presented in this work can be generalized to define skill ontologies in other e-Science domains as well as in every domain where the notion of algorithm can be substituted by the notion of activity and activities can be at least partially implemented by software tools. We plan to investigate this extension in future work and to compare the resulting methodology with similar approaches in the Literature.

REFERENCES [1] C. Diamantini, D. Potena, and E. Storti, “A virtual mart for knowledge

discovery in databases,” Journal of Information Systems Frontiers, in press.

[2] I. Nonaka, “A dynamic theory of organizational knowledge creation,” Organization Science, vol. 5, no. 1, pp. 14–37, 1994.

[3] R. Lindgren and C. Wallström, “Features missing in action: knowledge management systems in practice,” in Proc of the 8th European Conference on Information Systems, Trends in Information and Communication Systems for the 21st Century, Vienna, Austria, pp. 701–708, July 2000.

[4] L. Dittmann and S. Zelewski, “Ontology-based skills management,” in Proc. of the 8th World Multi-conference on Systemics, Cybernetics and Informatics, vol. 4, pp. 190-195, 2004.

[5] J. R. Reich, P. Brockhausen, T. Lau, and U. Reimer, “Ontology-based skills management: goals, opportunities and challenges,” Journal of Universal Computer Science, Vol. 8, No. 5, pp. 506-515, May 2002.

[6] K. Noda, “Towards a representational model of evaluation ontology,” in Proc. of International Symposium on Large Scale Knowledge Resources, pp. 159-160, 2006.

[7] C. Bizer, R. Heese, M. Mochol, R. Oldakowski, R. Tolksdorf, and R. Eckstein, “The impact of semantic web technologies on job recruitment processes,” in International Conference Economical Informatics, 2005.

[8] J. Dorn, T. Naz, and M. Pichlmair, “Ontology development for human resource management,” in Proceedings of the 2007 International Conference on Knowledge Management, pp. 109-120, 2007.

[9] B. Curtis, W. E. Hefley, S. Miller, and M. Konrad, “Developing organizational competence,” Computer, Vol. 30, No. 3, pp. 122-124, March 1997.

[10] J. Zhou and K. Watanuki, “Skill ontology for mechanical design of learning contents,” in Proc. of 2010 3rd IEEE International Conference on Computer Science and Information Technology (ICCSIT), pp. 295-299, 9-10 July 2010.

[11] A. Gómez-Pérez, J. Ramírez, and B. Villazón-Terrazas, “An ontology for modelling human resources management based on standards,” in Proc of the 11th international conference KES 2007 and XVII Italian workshop on neural networks conference on Knowledge-based intelligent information and engineering systems: Part I, Lecture Notes in Computer Science, Springer, Vol. 4692, pp. 534-541, 2007.

[12] S. Sharma and K. Osei-Bryson, “Organization-ontology based framework for implementing the business understanding phase of data

51

mining projects,” in Proceedings of the 41st Annual Hawaii International Conference on System Sciences, Washington, DC, USA, p. 77, 2008.

[13] Y. Sure, C. Tempich, and D. Vrandecic, “Ontology engineering methodologies,” in SEMANTIC WEB TECHNOLOGIES: TRENDS AND RESEARCH IN ONTOLOGY-BASED SYSTEMS, J. Davies, R. Studer and P. Warren, Eds., John Wiley & Sons, Ltd, Chichester, UK, pp. 171-190, 2006.

[14] M. Fernandez, A. Perez, and N. Juristo, “METHONTOLOGY: from ontological art towards ontological engineering,” in Proc. of the AAAI97 Spring Symposium Series on Ontological Engineering, pp. 33-40, March 1997.

[15] M. Uschold and M. Gruninger, “Ontologies: principles, methods and applications,” The Knowledge Engineering Review, Vol. 11 , pp. 93-136, 1996.

[16] G. Schreiber, B. Wielinga, R. de Hoog, H. Akkermans, and W. Van de Velde, “CommonKADS: a comprehensive methodology for KBS development,” IEEE Expert, Vol. 9, No. 6, pp. 28-37, 1994.

[17] T. Gruber, “Toward principles for the design of ontologies used for knowledge sharing,” International Journal Human-Computer Studies, Vol. 43, No.5-6, pp. 907-928, 1995.

[18] S. Staab, R. Studer, H. Schnurr, and Y. Sure, “Knowledge processes and ontologies,” IEEE Intelligent Systems, Vol. 16, No. 1, pp. 26-34, 2001.

[19] A. DeNicola, M. Missikoff, and R. Navigli, “A software engineering approach to ontology building,” Information Systems, Vol. 34, pp. 258-275, 2009.

52

Semantically-supported team building in a KDD virtual environment

Documents