Software Connector Classification and Selection for Data-Intensive Systems Chris A. Mattmann, David Woollard, Nenad Medvidovic, Reza Mahjourian 2nd Intl. Workshop on Incorporating COTS Software into Software Systems

Dec 20, 2015


Software Connector Classification and Selection for Data-Intensive Systems

Chris A. Mattmann, David Woollard, Nenad Medvidovic, Reza Mahjourian

2nd Intl. Workshop on Incorporating COTS Software into Software Systems (IWICSS 2007)


Agenda

• Research Problem and Importance
• Our Approach
  – Classification
  – Selection
  – Analysis
• Evaluation
  – Precision, Recall, Accuracy Measurements
• Related Work
• Conclusion & Future Work


Research Problem and Importance

• Content repositories are growing rapidly in size

• At the same time, we expect more immediate dissemination of this data

• How do we distribute it…
  – In a performant manner?
  – Fulfilling system requirements?

[Figure: NASA Planetary Data System Archive Volume Growth. x-axis: Year (1990 to 2008); y-axis: accumulated archive volume in TBytes (0 to 90).]


Software Architecture

• The definition of a system in the form of its canonical building blocks
  – Software Components: the computational units in the system
  – Software Connectors: the communications and interactions between software components
  – Software Configurations: arrangements of components and connectors and the rules that guide their composition
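
As a rough illustration of these three building blocks, the sketch below models components, connectors, and a configuration as plain Python data classes. All class names, field names, and the single composition rule are assumptions made for this example; they do not come from the slides or from any architecture description language.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """A computational unit in the system."""
    name: str


@dataclass(frozen=True)
class Connector:
    """An interaction between components (hypothetical minimal form)."""
    name: str
    source: Component
    targets: tuple


@dataclass
class Configuration:
    """An arrangement of components and connectors plus one composition rule."""
    components: list = field(default_factory=list)
    connectors: list = field(default_factory=list)

    def add_connector(self, connector: Connector) -> None:
        # Composition rule (an assumption for this sketch): a connector may
        # only link components that are already part of the configuration.
        endpoints = (connector.source, *connector.targets)
        missing = [c.name for c in endpoints if c not in self.components]
        if missing:
            raise ValueError(f"unknown components: {missing}")
        self.connectors.append(connector)


# One producer feeding four consumers through a single connector.
producer = Component("Data Producer")
consumers = [Component(f"Data Consumer {i}") for i in range(1, 5)]
config = Configuration(components=[producer, *consumers])
config.add_connector(Connector("distribution", producer, tuple(consumers)))
print(len(config.components), len(config.connectors))
```

Keeping the rule inside `Configuration` reflects the idea that composition constraints belong to the arrangement rather than to any single component or connector.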


Data Distribution Systems

[Diagram labels: a Data Producer and four Data Consumers (components), with "data" flowing between them through an unknown link marked "???", labeled as the Connector.]

Insight: Use Software Connectors to model data distribution technologies


Data Movement Technologies

• Wide array of available OTS “large-scale” connector technologies
  – GridFTP, Aspera software, HTTP/REST, RMI, CORBA, SOAP, XML-RPC, BitTorrent, JXTA, UFTP, FTP, SFTP, SCP, Siena, GLIDE/PRISM-MW, and more
• Which one is the best one?
• How do we compare them
  – Given our current architecture?
  – Given our distribution scenarios & requirements?


Research Question

• What types of software connectors are best suited for delivering vast amounts of data to users in these hugely distributed data systems, in a manner that is performant and scalable and that satisfies the users’ particular scenarios?


Data Distribution Problem Space

[Figure: data distribution problem space. The image was not preserved in this transcript.]


Broad variety of distribution connector families

• P2P, Grid, Client/Server, and Event-based
• Though each connector family varies slightly in some form or fashion
  – They all share 3 common atomic connector constituents
    • Data Access, Stream, Distributor
    • Adapted from Mehta et al.’s Connector Taxonomy


Connector Tradeoff Space

• Surveyed properties of 13 representative distribution connectors, across all 4 distribution connector families, and classified them
  – Client/Server
    • SOAP, RMI, CORBA, HTTP/REST, FTP, UFTP, SCP, Commercial UDP Technology
  – Peer to Peer
    • BitTorrent
  – Grid
    • GridFTP, bbFTP
  – Event-based
    • GLIDE, Siena
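
The family grouping above can be captured as a simple lookup table. The sketch below only restates the slide's list; the names `FAMILIES` and `family_of` are hypothetical helpers introduced for this example.

```python
# Connector families and members as listed on the slide.
FAMILIES = {
    "Client/Server": [
        "SOAP", "RMI", "CORBA", "HTTP/REST",
        "FTP", "UFTP", "SCP", "Commercial UDP Technology",
    ],
    "Peer to Peer": ["BitTorrent"],
    "Grid": ["GridFTP", "bbFTP"],
    "Event-based": ["GLIDE", "Siena"],
}


def family_of(connector: str) -> str:
    """Return the family a connector was classified under."""
    for family, members in FAMILIES.items():
        if connector in members:
            return family
    raise KeyError(connector)


total = sum(len(members) for members in FAMILIES.values())
print(total)                 # 13 connectors across 4 families
print(family_of("GridFTP"))  # Grid
```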


Large Heterogeneity in Connector Properties

[Figure: four bar charts. In each, the y-axis is "Num Connectors" and the bars count how many connectors exhibit each property value. Bar heights were not preserved in this transcript.]

• Procedure Call Connector Breakdown (5 connectors, 2 families)
  – Dimensions: proc_call_params_return_value, proc_call_cardinality_senders, proc_call_invocation_explicit, proc_call_params_invocation_record, proc_call_params_datatransfer, proc_call_accessibility, proc_call_semantics
  – Values shown: HTTP Response, RMI message, GridFTP message, SOAP message, CORBA message, one sender, Method Call, Globus Log Layer, HTTP Server log, RMI Registry, CORBA Name Registry, Web Server, value, reference, public, protected, private, one receiver, keyword

• Data Access Connector Breakdown (8 connectors, 4 families)
  – Dimensions: data_access_locality, data_access_persistence, data_access_avail_transient, data_access_cardinality_receivers, data_access_accesses, data_access_cardinality_senders
  – Values shown: Process, Global, Dynamic Data Exchange, Database Access, Repository Access, File I/O, Session-Based, Cache, Peer-Based, Many Receivers, One Receiver, Accessor, Mutator, Many Senders, One Sender

• Distributor Connector Breakdown (8 connectors, 4 families)
  – Dimensions: distributor_routing_membership, distributor_delivery_type, distributor_naming_type, distributor_naming_structures, distributor_routing_type, distributor_delivery_semantics, distributor_routing_path, distributor_delivery_mechanisms
  – Values shown: ad-hoc, bounded, RMI Message, GridFTP Message, SOAP Message, Event, HTTP Message, Peer Pieces, registry-based, attribute-based, Hierarchical, Flat, content-based, tcp/ip, architecture configuration, tracker, Exactly Once, At least once, Best Effort, dynamic, cached, static, Unicast, Multicast, Broadcast

• Stream Connector Breakdown (8 connectors, 4 families)
  – Dimensions: stream_formats, stream_cardinality_senders, stream_localities, stream_deliveries, stream_throughput, stream_cardinality_receivers, stream_state, stream_identity, stream_bounds, stream_synchronicity, stream_buffering
  – Values shown: Raw, Structured, Many Senders, One Sender, Remote, Local, Exactly Once, At least once, Best Effort, bps, Many Receivers, One Receiver, Stateful, Stateless, Named, Bounded, Asynchronous, Time Out Synchronous, Buffered


How do experts make these decisions?

• Performed survey of 33 “experts”
• Experts defined to be
  – Practitioners in industry, building data-intensive systems
  – Researchers in data distribution
  – Admitted architects of data distribution technologies
• General consensus?
  – They don’t know the how and the why about which connector(s) are appropriate
  – They rely on anecdotal evidence and “intuition”
• 45% of respondents claimed to be uncomfortable being addressed as a data distribution expert.

[Figure: pie chart, "Percentage Breakdown of Expert Responses". Slice values as extracted: 67%, 15%, 15%, 3%. Legend as extracted: No Response, Not Comfortable, No Time, Full Response.]

[Figure: pie chart, "Expert Survey Demographic". Slice values as extracted: 6%, 18%, 12%, 12%, 6%, 22%, 6%, 12%, 6%. Legend as extracted: Cancer Research, Planetary Science, Earth Science, Industry, Grid Computing, Professors, Web Technologies, Open Source, Students.]


Our Approach: DISCO

• Develop a software framework for:
  – Connector Classification
    • Build metadata profiles of connector technologies, describing their intrinsic properties (DCPs)
  – Connector Selection
    • Adaptable, extensible algorithm development framework for selecting the “right” connectors (and identifying wrong ones)
  – Connector Selection Analysis
    • Measurement of accuracy of results
  – Connector Performance Analysis
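
To make the idea of a metadata profile concrete, here is a minimal sketch of what a DCP could look like as a data structure. The dimension names (for example `stream_deliveries`) are taken from the chart legends earlier in the deck; the class, the method, the connector names, and the property values assigned to them are placeholders invented for this example and do not describe any real connector or DISCO's actual profile format.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectorProfile:
    """Hypothetical stand-in for a connector metadata profile (DCP)."""
    name: str
    family: str
    properties: dict = field(default_factory=dict)

    def differences(self, other: "ConnectorProfile") -> dict:
        """Return the dimensions on which two profiles disagree."""
        keys = sorted(set(self.properties) | set(other.properties))
        return {
            key: (self.properties.get(key), other.properties.get(key))
            for key in keys
            if self.properties.get(key) != other.properties.get(key)
        }


# Placeholder connectors with illustrative (not measured) values.
a = ConnectorProfile(
    "ExampleConnectorA", "Client/Server",
    {"stream_deliveries": "Exactly Once",
     "distributor_delivery_mechanisms": "Unicast"},
)
b = ConnectorProfile(
    "ExampleConnectorB", "Peer to Peer",
    {"stream_deliveries": "Best Effort",
     "distributor_delivery_mechanisms": "Unicast"},
)
print(a.differences(b))
```

Storing properties as a flat dictionary keyed by dimension name keeps profiles easy to compare and to extend with new dimensions, which seems in line with the "adaptable, extensible" goal stated above.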


DISCO in a Nutshell


Building DCPs of all 13 connectors (Classification)

• Rely on Mehta et al. metadata to describe data distribution connectors

• Carefully select metadata to include/exclude


Develop complementary selection algorithms
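
The deck does not spell out the selection algorithms themselves. As one possible reading of a white-box, score-based selector, the sketch below sums the outputs of small score functions over each connector profile and ranks the connectors. Everything in it is an assumption made for illustration: the connector names, the property values, the scenario fields (`needs_reliable_delivery`, `num_consumers`), and the two scoring rules.

```python
# Hypothetical connector profiles: names and property values are placeholders.
PROFILES = {
    "ConnectorA": {
        "stream_deliveries": "Exactly Once",
        "distributor_delivery_mechanisms": "Unicast",
    },
    "ConnectorB": {
        "stream_deliveries": "Best Effort",
        "distributor_delivery_mechanisms": "Multicast",
    },
}


def reliability_score(scenario, profile):
    """Assumed rule: reward exactly-once delivery when reliability is required."""
    wants_reliable = scenario.get("needs_reliable_delivery", False)
    if wants_reliable and profile.get("stream_deliveries") == "Exactly Once":
        return 1.0
    return 0.0


def fanout_score(scenario, profile):
    """Assumed rule: reward multicast when there is more than one consumer."""
    many = scenario.get("num_consumers", 1) > 1
    if many and profile.get("distributor_delivery_mechanisms") == "Multicast":
        return 1.0
    return 0.0


SCORE_FUNCTIONS = [reliability_score, fanout_score]


def rank_connectors(scenario, profiles=PROFILES, score_functions=SCORE_FUNCTIONS):
    """Sum all score functions per connector and sort best first."""
    totals = {
        name: sum(fn(scenario, profile) for fn in score_functions)
        for name, profile in profiles.items()
    }
    # Sort by descending score, then by name for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_connectors({"needs_reliable_delivery": True, "num_consumers": 1})
print(ranking)
```

Because each rule is an ordinary function, a reader can inspect exactly why a connector scored as it did, which is the sense in which such an approach is "white box".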


Preliminary Evaluation

• We developed 13 connector profiles
  – Based on literature, expert reviews, and our own development experience
• 30 distribution scenarios
• 24 score functions (white box) and Bayesian domain profiles with 100 conditional probabilities (black box)

[Diagram labels: Connector Profiles, Distribution Scenarios, DISCO, Score, Bayesian, Clustering (two boxes), Answer Key, Precision-Recall Analysis.]
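
The slides mention Bayesian domain profiles built from conditional probabilities but give no formula. The sketch below assumes a naive Bayes style update, in which each observed scenario attribute contributes one likelihood term; this is an illustrative assumption, not the authors' documented algorithm, and the function name and all numbers are made up.

```python
def posterior_appropriate(prior, likelihoods_if_appropriate, likelihoods_if_not):
    """Posterior probability that a connector is appropriate for a scenario.

    prior: P(appropriate) before looking at the scenario.
    likelihoods_if_appropriate: P(attribute value | appropriate), one per attribute.
    likelihoods_if_not: P(attribute value | not appropriate), one per attribute.
    Attributes are treated as conditionally independent (naive Bayes assumption).
    """
    p_yes = prior
    p_no = 1.0 - prior
    for l_yes, l_no in zip(likelihoods_if_appropriate, likelihoods_if_not):
        p_yes *= l_yes
        p_no *= l_no
    evidence = p_yes + p_no
    if evidence == 0.0:
        raise ValueError("observed attribute values have zero probability")
    return p_yes / evidence


# Made-up numbers: two scenario attributes, uniform prior.
posterior = posterior_appropriate(0.5, [0.8, 0.6], [0.2, 0.5])
print(round(posterior, 4))
```

In contrast to explicit score functions, the reasoning here is carried entirely by the probability tables, which is one way to read the "black box" label on the slide.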


Precision-Recall Results

• Error Rate
  – Probability of incorrectly labeling a connector as appropriate for a scenario
• Precision
  – The fraction of selected connectors appropriate for a scenario
• Recall
  – Probability of detecting a connector as appropriate for a scenario

                      Bayesian   Score-based
True Positive (TP)    101        63
False Positive (FP)   25         200
True Negative (TN)    245        67
False Negative (FN)   19         60

                      Bayesian   Score-based
Error Rate            11.28%     32.56%
Precision             80.16%     48.46%
Recall                25.90%     16.15%
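
The measures can be recomputed from the confusion counts. The sketch below uses the standard textbook definitions, which is an assumption about what the authors computed, and applies them to the Bayesian column (390 decisions in total, matching 13 connectors × 30 scenarios). Error rate and precision reproduce the reported 11.28% and 80.16%. The standard recall TP/(TP+FN) comes out at 84.17%, whereas the reported 25.90% equals TP divided by all 390 decisions, so that ratio is computed separately under the hypothetical name `tp_share`. For the score-based column, the same formulas reproduce the reported percentages only if the FP and TN counts are read as 67 and 200, which may indicate that the two counts were transposed.

```python
def metrics(tp, fp, tn, fn):
    """Standard confusion-matrix measures, plus TP as a share of all decisions."""
    total = tp + fp + tn + fn
    return {
        "error_rate": (fp + fn) / total,   # fraction of wrong decisions
        "precision": tp / (tp + fp),       # selected connectors that were right
        "recall": tp / (tp + fn),          # appropriate connectors that were found
        "tp_share": tp / total,            # matches the slide's "Recall" figure
    }


# Bayesian column from the table.
bayes = metrics(tp=101, fp=25, tn=245, fn=19)
for name, value in bayes.items():
    print(f"{name}: {value:.2%}")
```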


Related Work


Conclusions & Future Work

• Conclusions
  – Domain experts (gurus) rely on tacit knowledge and often cannot explain design rationale
  – DISCO provides a quantification of & framework for understanding an ad hoc process
  – Bayesian algorithm has a higher precision rate
• Future Work
  – Explore the tradeoffs between white-box and black-box approaches
  – Investigate the role of architectural mismatch in connectors for data system architectures


Thank You!

Questions?