Politecnico di Milano Dipartimento di Elettronica e Informazione Searching Repositories of Web Application Models Alessandro Bozzon, Marco Brambilla, Piero Fraternali ICWE 2010 Vienna, July 7th 2010

Searching Repositories of Web Application Models

May 06, 2015


Marco Brambilla

Project repositories are a central asset in software development, as they preserve the technical knowledge gathered in past development activities. However, locating relevant information in a vast project repository is problematic, because it requires manually tagging projects with accurate metadata, an activity which is time consuming and prone to errors and omissions. This paper investigates the use of classical Information Retrieval techniques for easing the discovery of useful information from past projects. Differently from approaches based on textual search over the source code of applications or on querying structured metadata, we propose to index and search the models of applications, which are available in companies applying Model-Driven Engineering practices. We contrast alternative index structures and result presentations, and evaluate a prototype implementation on real-world experimental data.
Transcript
Page 1: Searching Repositories of Web Application Models

Politecnico di Milano, Dipartimento di Elettronica e Informazione

Searching Repositories of Web Application Models

Alessandro Bozzon, Marco Brambilla, Piero Fraternali

ICWE 2010, Vienna, July 7th 2010

Page 2: Searching Repositories of Web Application Models

A. Bozzon, M. Brambilla, P. Fraternali. Searching Repositories of Web Application Models

Context

• Project repositories are a central asset in (Web) software development

• they preserve the technical knowledge gathered in past development activities

• repositories now cross the boundaries of individual organizations and have a social role in the diffusion of coding and design solutions

• they allow for reuse of knowledge and artifacts

• Locating relevant information in a vast project repository is problematic

• Two options

1. Manual tagging: time-consuming and prone to errors, omissions, and inconsistencies

2. Automatic analysis: much of the semantics can be lost in the process

Page 3: Searching Repositories of Web Application Models

Addressed Problem

• Objective: easing the discovery of useful information from past software projects

• Main resource: application models

• available in companies applying Model-Driven Engineering practices

• In contrast to existing solutions, which mainly focus on the discovery of code, documentation, and annotations

• Why is dealing with application models an advantage?

• Increased result quality (thanks to the more valuable information embedded in models with respect to the code)

• Less need for manual tagging

Page 4: Searching Repositories of Web Application Models

Related work: Component search

• Retrieval of annotated pieces of software dates back to the '90s. Various approaches:

• Worldwide search engine based on JavaBeans and CORBA [Agora, Internet Computing, 1998]

• Search engines for Web services based on indexed Vector Space Model characterization of their properties [Dustdar et al., ECWS 2005]

• Significance-based search that exploits graph models of a software component library (usage relations used as links propagating significance) [Inoue et al., TOSEM 2005]

• Combination of formal and semi-formal specification to describe behaviour and structure of components [Khalifa et al., ASEA 2008]

Page 5: Searching Repositories of Web Application Models

Related work: Source code search

• Several communities and online tools for sharing and retrieving code: Google Code, Snipplr, Koders, Codase, Jexamples, SourceForge

• Keyword queries directly matched to the code

• Results are the exact locations where the keyword(s) appear

• Plus advanced behaviours: regular expressions (Google), wildcards (Codase), restriction to specific concept types (Jexamples, Codase), and advanced ranking, e.g., based on relevance of match, activity, date of registration, and recency of last update (SourceForge)

• Other approaches:

• Information retrieval techniques for software reuse [Frakes et al., SIGIR Forum 1987]

• taking advantage of code structural information [Holmes and Murphy, ICSE 2005] and [Sourcerer Project by Bajracharya et al., SUITE ICSE workshop 2009]

Page 6: Searching Repositories of Web Application Models

Related work: Model search

• The problem is usually restricted to searching UML or ER models

• XML / XMI format for seamlessly indexing UML models, text files, and others [Gibb et al., 2000] [Lorens et al., 2004] [Moogle: Lucredio et al., Models 2008]

• UML artifacts classified with WordNet terms and extracted through Case-Based Reasoning [Gomes et al., AI Comm., 2004]

• Database conceptual model retrieval based on text search, schema matching, and structurally-aware scoring methods, with both query-by-example and keyword-based queries [Schemr: Chen, Halevy, SIGMOD09]

• IR techniques applied to models and code together, for tracing the association between requirements, design artifacts, and code [Antoniol et al., 2000] […]

Page 7: Searching Repositories of Web Application Models

Related work: Business Process Discovery

• Different approaches to extraction of BP models from repositories

• Based on the workflow topology only: graph-based comparison or XML-based querying [Beeri et al., VLDB 2006] [Lu et al., BPM 2006] [Shao et al., ICDE 2009]

• Based on semantic reasoning and discovery, using SPARQL, query by example, SQL-like languages, and so on [Kiefer et al., ESWC 2007] [Goderis et al., ICWS 2006] [Awad et al., EDOC 2008] [Zhuge 2002] [Belhajjame, Brambilla, BPMDS 2009]

• Based on IR techniques [Dongen, Dijkman et al., CAiSE 2008]

Page 8: Searching Repositories of Web Application Models

Our contribution

A model-based search solution, with several innovations:

• it automatically exploits the semantics of the searched conceptual models

• it does not require manual annotation

• it supports alternative indexing and ranking functions, based on the meta-model of the considered DSL(s)

• it is based on a model-independent framework, which can be customized to any meta-model

User study to evaluate acceptance and the quality perceived by users

Performance tests to evaluate scalability

Page 9: Searching Repositories of Web Application Models

Overall Architecture of the System

[Figure: architecture diagram. Components: project repository; content processing (project analysis, segmentation, segment analysis, linguistic normalization); indexing; index; search engine; query processing; search user interface. Inputs and outputs: DSL metamodel, metadata, QUERY (keyword), RESULTS. Legend: content processing flow; query and result presentation flow.]

Page 10: Searching Repositories of Web Application Models

Overall Architecture of the System

The content processing flow extracts meaningful information from projects and uses it to create the search engine index.

1. CONTENT PROCESSING

• project analysis captures project-level, global metadata

• segmentation splits the project into smaller units

• segment analysis extracts from segments the information to be indexed

• linguistic normalization applies the typical normalization operations of IR
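
The four content processing steps can be sketched as a small pipeline. This is only an illustration under stated assumptions: the dictionary-based project layout, the `Segment` class, the stop-word list, and all function names are hypothetical and are not taken from the prototype; segmentation here simply produces one segment per site view.

```python
import re
from dataclasses import dataclass, field

# Hypothetical in-memory project: a name plus site views, each holding
# (concept_type, label) pairs. Real WebML projects are far richer.
PROJECT = {
    "name": "Product Catalogue Application",
    "site_views": {
        "Catalogue": [("page", "Home Page"), ("unit", "List Products")],
    },
}

STOP_WORDS = {"of", "the", "in", "a"}  # illustrative stop list (assumption)


@dataclass
class Segment:
    project: str
    name: str
    terms: list = field(default_factory=list)


def analyze_project(project):
    """Project analysis: capture project-level, global metadata."""
    return {"projectName": project["name"]}


def segment(project):
    """Segmentation: split the project into smaller units (one per site view)."""
    return list(project["site_views"].items())


def analyze_segment(labels):
    """Segment analysis: extract the text to be indexed from a segment."""
    return [label for _concept, label in labels]


def normalize(texts):
    """Linguistic normalization: lowercase, tokenize, drop stop words."""
    tokens = []
    for text in texts:
        words = re.findall(r"[a-z0-9]+", text.lower())
        tokens += [w for w in words if w not in STOP_WORDS]
    return tokens


def process(project):
    """Run the four steps in order and return one Segment per unit."""
    meta = analyze_project(project)
    return [
        Segment(meta["projectName"], name, normalize(analyze_segment(labels)))
        for name, labels in segment(project)
    ]


segments = process(PROJECT)
```

Each resulting `Segment` carries the project-level metadata together with its own normalized terms, which is the input the indexing step needs.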

Page 11: Searching Repositories of Web Application Models

Overall Architecture of the System

The content processing flow extracts meaningful information from projects and uses it to create the search engine index.

2. INDEXING

• each project or segment is physically represented as a document

• the search engine indexes are built based on the documents

• the DSL metamodel is taken into account
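
As a rough illustration of "one document per project or segment", the sketch below builds a plain inverted index over already-normalized documents. The document identifiers and contents are made up, and the metamodel-aware part of indexing is deliberately left out; the prototype itself relies on Apache Lucene rather than on code like this.

```python
from collections import defaultdict

# Hypothetical documents: one per project segment, already normalized.
DOCUMENTS = {
    "p1/catalogue": ["product", "catalogue", "home", "page"],
    "p1/admin": ["product", "edit", "page"],
}


def build_index(documents):
    """Build an inverted index: term -> {document id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, terms in documents.items():
        for term in terms:
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return dict(index)


index = build_index(DOCUMENTS)
```

Looking up `index["product"]` returns every segment that mentions the term, together with how often it occurs there.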

Page 12: Searching Repositories of Web Application Models

Overall Architecture of the System

The query and result presentation flow deals with the submitted queries and the production of the result set.

1. USER INTERFACE supports

• Keyword-based queries

• Content-based queries (aka QBE)

• Rendering of the results

Page 13: Searching Repositories of Web Application Models

Overall Architecture of the System

The query and result presentation flow deals with the submitted queries and the production of the result set.

2. QUERY PROCESSING

• matches the query to the indexed content using a given similarity criterion

• produces ranked results
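
A minimal sketch of query processing, assuming TF-IDF as the similarity criterion (the ranking the deck later attributes to experiments A and B). The documents, the exact TF-IDF variant, and the function name are illustrative assumptions, not the prototype's Lucene-based implementation.

```python
import math
from collections import Counter

# Hypothetical normalized documents.
DOCS = {
    "d1": ["product", "catalogue", "home", "page"],
    "d2": ["product", "details", "page"],
    "d3": ["ticket", "list"],
}


def tf_idf_scores(query_terms, docs):
    """Score each document as sum(tf * log(N / df)) over matching query terms."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for terms in docs.values() for term in set(terms))
    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        score = sum(tf[t] * math.log(n / df[t]) for t in query_terms if t in tf)
        if score > 0:
            scores[doc_id] = score
    # Ranked results: highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = tf_idf_scores(["catalogue", "product"], DOCS)
```

Here `d1` ranks first because it matches both query terms, and the rarer term "catalogue" contributes more than "product", which appears in two of the three documents.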

Page 14: Searching Repositories of Web Application Models

Design Dimensions of Model Retrieval (1/2)

Segmentation Granularity: the "size" of the atomic unit of retrieval for the user

• Project

• Sub-project

• Model concepts (all or only the main ones)

Index structure: one or more fields (associated with a boosting score)

• Flat: a simple list of terms without taking into account model semantics

• Weighted: model concepts used to weight terms in the ranking

• Multi-field: terms belonging to different model concepts are collected into separate fields

• Structured: the model is translated into a representation that reflects the hierarchies and associations among concepts
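
To make the difference between the first three index structures concrete, the sketch below derives each of them from the same extracted terms. The concept names, the per-concept weights, and the function names are assumptions chosen for illustration; the structured variant is omitted.

```python
# Hypothetical extracted terms, tagged with the model concept they came from.
TERMS = [
    ("siteview", "catalogue"),
    ("page", "home"),
    ("page", "page"),
    ("unit", "list"),
    ("unit", "products"),
]

# Illustrative per-concept boosts (assumed values, not the tuned ones).
CONCEPT_WEIGHT = {"siteview": 2.0, "page": 1.0, "unit": 0.5}


def flat_index(terms):
    """Flat: a plain list of terms, ignoring model semantics."""
    return [term for _concept, term in terms]


def weighted_index(terms, weights):
    """Weighted: each term carries the weight of its model concept."""
    return [(term, weights[concept]) for concept, term in terms]


def multi_field_index(terms):
    """Multi-field: terms grouped into one field per model concept."""
    fields = {}
    for concept, term in terms:
        fields.setdefault(concept, []).append(term)
    return fields
```

The flat variant loses the information about where a term came from, while the weighted and multi-field variants keep it in two different forms: as a score attached to the term, or as the name of the field holding it.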

Page 15: Searching Repositories of Web Application Models

Example of Model Indexing

[Figure panels: Metamodel, Model, Model XML Representation]

Multi-Field index document:

ID: 1

PROJECT NAME: Product Catalogue Application

HYPERTEXT MODEL: Product Catalogue Catalogue Home Page List Products List of product in the catalogue View Details Details of a selected product

Page 16: Searching Repositories of Web Application Models

Example of Model Indexing

[Figure panels: Metamodel, Model, Model XML Representation. Concept weights shown in the figure: 2.0, 1.0, 0.5]

Multi-Field, Weighted Index document:

ID: 1

PROJECT NAME: Product Catalogue Application

HYPERTEXT MODEL: Product|2.0 Catalogue|2.0 Catalogue|1.0 Home|1.0 Page|1.0 List|0.5 Products|0.5 List|0.2 of|0.2 products|0.2 in|0.2 the|0.2 catalogue|0.2 View Details Details of a selected product
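
The `term|weight` notation of this example can be read back into (term, weight) pairs with a few lines of code. This is a sketch only: the function name and the default weight of 1.0 for unannotated terms are assumptions, and the field content is a shortened copy of the slide's example.

```python
def parse_weighted_field(text, default_weight=1.0):
    """Split 'Term|weight' tokens into (term, weight) pairs.

    Tokens without a '|weight' suffix receive default_weight (an assumption).
    """
    pairs = []
    for token in text.split():
        term, sep, weight = token.partition("|")
        pairs.append((term.lower(), float(weight) if sep else default_weight))
    return pairs


# Shortened version of the weighted hypertext-model field shown on the slide.
FIELD = "Product|2.0 Catalogue|2.0 Home|1.0 Page|1.0 List|0.5 Products|0.5"
pairs = parse_weighted_field(FIELD)
```

A ranking function can then multiply each term's contribution by its weight, so that a match on a highly weighted concept counts more than a match in a low-weighted description.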

Page 17: Searching Repositories of Web Application Models

Design Dimensions of Model Retrieval (2/2)

Query Language and Result Presentation: the way queries and results are presented.

• Keyword-based search

• Document-based search: the system extracts the most significant words and submits them as a query

• Search by example: the query is a model, analyzed and matched by similarity

• Faceted search: exploration using facets (i.e., property-value pairs) extracted from the indexed documents

• Snippet visualization: with the matching points highlighted in graphical or textual form
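
Faceted search, as described in the list above, can be sketched as counting property-value pairs over the current result set and filtering on a chosen value. The field names `projectName` and `documentType` appear in the experiment configuration; the sample results and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical result set: each hit exposes a few indexed properties.
RESULTS = [
    {"projectName": "Product Catalogue Application", "documentType": "page"},
    {"projectName": "Product Catalogue Application", "documentType": "entity"},
    {"projectName": "Trouble Ticketing", "documentType": "page"},
]


def facet_counts(results, facet):
    """Count how many results carry each value of the given facet."""
    return Counter(doc[facet] for doc in results if facet in doc)


def refine(results, facet, value):
    """Keep only the results whose facet has the selected value."""
    return [doc for doc in results if doc.get(facet) == value]


type_facet = facet_counts(RESULTS, "documentType")
pages_only = refine(RESULTS, "documentType", "page")
```

The counts are what a user interface would show next to each facet value; selecting a value narrows the result set without a new keyword query.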

Page 18: Searching Repositories of Web Application Models

Our model-based search engine prototype

General purpose, model-independent, configurable system:

• Configuration of a general purpose search engine according to the selected design dimensions

• metamodel-aware rules to analyze models and populate the index

• segmentation and text-extraction steps: model transformation rules

• Offline collection analysis: compute statistics for fine-tuning the retrieval and ranking

• Stop Domain Concept removal

• optimization of the weights assigned to each model concept

• Provides a visual interface to perform queries and inspect results

Content processing has been implemented by extending the text processing and analysis components provided by Apache Lucene.

Page 19: Searching Repositories of Web Application Models

Detailed indexing process

Page 20: Searching Repositories of Web Application Models

Experiment Settings - Dataset

48 real-world WebML projects from WebRatio

• trouble ticketing, human resource management, multimedia search engines, Web portals, etc.

• Italian and English

• About 250 modeling concepts

• 3,800 data model entities (with about 35,000 attributes and 3,800 relationships)

• 138 site views with about 10,000 pages and 470,000 units, and 20 Web services

• The overall repository takes around 85 MB of disk space

Page 21: Searching Repositories of Web Application Models

Experiment settings - Configurations

• 3 different settings of the design dimensions: A, B, C

• A: flat index structure

• B and C: multi-field weighted (projectID, projectName, documentType, text)

Option            Description                                                 A  B  C

Segmentation Granularity
  Project         Entire project                                              X  -  -
  Sub-project     Subproject                                                  -  X  X
  Single-Concept  Arbitrary model concepts                                    -  X  X

Index Structure
  Flat            Flat list of words                                          X  X  -
  Weighted        Words weighted by the model concepts they belong to         -  -  X
  Multi-field     Words belonging to each model concept in separate fields    -  X  X

Query Language and Result Presentation
  Keyword-based   Query by keywords                                           X  X  X
  Faceted         Query refined through specific dimensions                   X  X  X
  Snippets        Visualization and exploration of result previews            X  X  X

Page 22: Searching Repositories of Web Application Models

Experiment C – model-based scoring function

• Experiments A and B exploit a traditional TF-IDF ranking function

• Experiment C exploits the DSL metamodel

• mtw(m, t): Model Term Weight, a metamodel-specific boost that depends on the concept m containing the term t

• dw(d): Document Weight, a metamodel- and model-specific boosting value that expresses the importance of a given document (according to the selected granularity)
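
The transcript does not preserve the scoring formula itself, so the sketch below shows only one plausible way the two boosts could enter a TF-IDF-style score: each matching term contributes its inverse document frequency multiplied by mtw, and the sum is scaled by dw. This combination, the smoothed idf, and all weights and documents are assumptions for illustration, not the authors' function.

```python
import math

# Hypothetical documents: a document weight dw plus (concept, term) pairs.
DOCS = {
    "siteview:catalogue": {
        "dw": 1.5,
        "terms": [("siteview", "catalogue"), ("page", "home")],
    },
    "entity:product": {
        "dw": 1.0,
        "terms": [("entity", "product"), ("attribute", "catalogue")],
    },
}

# Illustrative model term weights per concept (assumed values).
MTW = {"siteview": 2.0, "page": 1.0, "entity": 1.0, "attribute": 0.5}


def score(query_terms, doc, docs, mtw):
    """Assumed form: dw(d) * sum over matches of mtw(m, t) * idf(t)."""
    n = len(docs)
    total = 0.0
    for concept, term in doc["terms"]:
        if term in query_terms:
            # Number of documents containing the term.
            df = sum(any(t == term for _c, t in d["terms"]) for d in docs.values())
            idf = 1.0 + math.log(n / df)
            total += mtw[concept] * idf
    return doc["dw"] * total


scores = {doc_id: score({"catalogue"}, doc, DOCS, MTW) for doc_id, doc in DOCS.items()}
```

Both documents contain "catalogue", but the match inside a site view name outranks the match inside an attribute, which is the kind of effect a metamodel-aware boost is meant to produce.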

Page 23: Searching Repositories of Web Application Models

User Interface

(A) Rendered result set and facets

(B) Snippet window with highlighted matches

Page 24: Searching Repositories of Web Application Models

User Evaluation – Perceived Quality

• A user study was conducted with 5 expert WebML designers to assess the quality and perception of the alternative configurations

• Users rated the results in the result sets

• Votes ranged from 1 (highly inappropriate) to 5 (highly appropriate)

• Experiments B and C got more votes in the high range of the scale

• Success factor:

• Injecting the semantics of the meta-model

Page 25: Searching Repositories of Web Application Models

User Evaluation - Acceptance

• Users were asked 10 questions about the features of the application

• Votes ranged from 1 (bad) to 5 (good)

Question                                                         Avg.  Var.

Features
  Keyword Search                                                 3.6   0.24
  Search Result Ranking                                          3.2   0.16
  Faceted Search                                                 3.8   0.16
  Match Highlighting                                             3.6   0.24

Application
  Help reducing the maintenance costs                            3.2   0.56
  Help improving the quality of the delivered application?       3.0   0.4
  Help understanding the model assets in the company?            4.4   0.24
  Help providing better estimates for future application costs?  2.8   0.56

Wrap Up
  Overall Evaluation of the system                               4.0   0.4
  Would you use the system in your activities?                   3.0   1.2

• useful for model maintenance and reuse

• role in improving the quality of the applications

• a certain distance between the overall judged quality and the adoption likelihood

• But there is a bias due to the lack of a graphical viewer

Page 26: Searching Repositories of Web Application Models

Performance Evaluation - Query Time

• About 400 randomly generated two-term and three-term keyword queries

• Each query has been executed 20 times

• Query time is well below one second, and the curves indicate sub-linear growth

• The addition of Faceted Search and Snippet Visualization impacts heavily

• with the number of inde

Legend: NOSeg: No Segmentation; Seg: Segmentation; KS: Keyword Search; FS: Faceted Search; Snip: Snippet

Page 27: Searching Repositories of Web Application Models

Performance Evaluation - Index Size

• Size grows almost linearly with the number of projects in all configurations

• Baseline configurations feature index sizes about 10 times smaller than the repository size

• Faceted Search doubles the index size

Legend: NOSeg: No Segmentation; Seg: Segmentation; KS: Keyword Search; FS: Faceted Search; Snip: Snippet

Page 28: Searching Repositories of Web Application Models

Conclusions and future directions

• A metamodel-aware approach and a system prototype for searches over model repositories

• Scalability tests and user studies in different experimental settings

• Future work:

• Integration of content-based search

• Improve result visualization: integration in the WebRatio tool-suite for WebML and visual highlighting of the matches in the projects

• Adaptive fine-tuning for improving precision and recall

• Experiments with more modeling languages (e.g. BPMN)

• Definition of generic benchmark criteria for model-driven repository search

Page 29: Searching Repositories of Web Application Models

Thanks!

Questions?

Alessandro Bozzon, Marco Brambilla, Piero Fraternali

[email protected]
