Little eScience Andrea Wiggins June 18, 2009

Little eScience

May 15, 2015



Andrea Wiggins

Presentation for the myGrid team at the University of Manchester, putting the practice of eScience into the context of little science
Transcript
Page 1: Little eScience

Little eScience

Andrea Wiggins, June 18, 2009

Page 2: Little eScience

Overview

• Background

• Exposition: Sociology of Science

• Broad generalizations about science

• Example: FLOSS Research

• Little science context for eScience research

• Expectations: What next?

http://www.flickr.com/photos/pmtorrone/304696349/

Page 3: Little eScience

My Background

• BA: Maths with economics

• Nonprofit & IT industry work

• Adult literacy, nonprofit management support, professional theatre

• Web analytics

• MSI: Human-computer interaction, complex systems & network science

• PhD: Information science & technology

Page 4: Little eScience

Science

• Systematic investigation for the production of knowledge

• Scientific method emphasizes reproducibility

• Not all phenomena are reproducible...

• Many categories

• Experimental, applied, social, etc.

• Categories are not mutually exclusive

http://www.flickr.com/photos/radiorover/419414206/

Page 5: Little eScience

Paradigms & Revolutions

• Kuhn - Laws, theories, applications & instrumentation that create coherent traditions of scientific research

• Paradigms help us direct our research, but limit our view of the world

• New technologies can lead to scientific revolutions by revealing anomalies

http://www.flickr.com/photos/weichbrodt/644302381/

Page 6: Little eScience

Normal Science

• Kuhn - “normal science” is research based on broadly accepted scientific paradigms

• Shared paradigms are based on rules and standards for scientific practice

• Key requirement: agreement on focus and conduct of research

• ∃(Grand Challenges) | Discipline

http://www.flickr.com/photos/themadlolscientist/2421152973/

Page 7: Little eScience

Big Science

• de Solla Price - “Big Science” is...

• Inherently paradigmatic

• Always normal science

• Produces detailed insights into the minutiae of phenomena studied in the paradigm

http://www.flickr.com/photos/31333486@N00/1883498062/

Page 8: Little eScience

Pre-paradigmatic Science

• Paradigms require agreement on...

• Epistemology

• Ontology

• Methodology

• Most social sciences are pre-paradigmatic

• Primarily exploratory research

• Very little replication

http://www.flickr.com/photos/askpang/327577395/

Page 9: Little eScience

Little Science

• de Solla Price - “Little Science” is a romanticized precursor to Big Science, featuring lone, long-haired geniuses misunderstood by society, etc.

• If it’s not Big Science, it’s Little Science

• Pre-paradigmatic and fraught with ambiguity

• Often fundamentally exploratory

• Epistemological/theoretical/methodological divergence among researchers

http://www.flickr.com/photos/mrjoax/2548045246/

Page 10: Little eScience

Social Science

• Social science is real science: the goal is systematic knowledge production

• Focuses on the study of the social life of human groups and individuals

• IMHO, fundamentally more difficult than “hard” sciences due to infinite complexity of social phenomena

• Replicability is a major challenge with respect to scientific method

• Not all social science can or should aspire to replicability

http://www.flickr.com/photos/smiteme/2379629501/

Page 11: Little eScience

Normalizing Science

• Becoming a normal science requires community and convergence

• ∃(community) != ∃(agreement)

• Establishing grand challenges and methods are primary tasks of normalizing

• Resistance to change is pervasive

http://www.flickr.com/photos/9036026@N08/2949211479/

Page 12: Little eScience

Scientific Collaboration

• Collaboration requires common focus, if not also epistemology and ontology

• Challenging enough in normal sciences

• Harder in pre-paradigmatic research

• Economics: systemic disincentives to collaborate, versus potential benefits and ideals of science

http://www.flickr.com/photos/richardsummers/542738965/

Page 13: Little eScience

Big Science Collaboration

• LHC, CERN, etc.

• Thousands of collaborators

• Complex but coordinated, at least somewhat centralized

• Requires shared goals and resources, plus (lots of) communication

• Only happens in normal sciences

http://www.flickr.com/photos/8767020@N08/531355152/

Page 14: Little eScience

Little Science Collaboration

• A professor & a grad student, give or take

• Localized goals and resources

• -> localized research practices

• Small research teams

• Fundamentally difficult to achieve consensus that allows larger groups

• Restricts the ability to obtain funding and undertake ambitious projects

http://www.flickr.com/photos/lamazone/2735939345/

Page 15: Little eScience

Scientific Collaboration Requirements

• Shared goals

• Establishes focus of research

• Shared research resources

• Both social and artifactual

• Social aspects include training and community socialization

http://www.flickr.com/photos/ryanr/142455033/

we can has share?

Page 16: Little eScience

Historical Research Artifacts

• Letters, Books, Journals, Lectures

• Also technologies: methods, instrumentation

• Sharing?

• Recordkeeping is not always a researcher’s main priority

• Without records, there’s not much to share except the research outputs

http://www.flickr.com/photos/smailtronic/1535870363/

Page 17: Little eScience

Today’s Research Artifacts

• Large scale datasets, scripts, software, workflows, papers, images, video, audio, annotations, ephemera, web sites...

• “Research objects” - bundling all the pieces together

• Hybrids of boundary objects and touchstones

• Technologies -> scientific revolution!

• Open science

http://www.flickr.com/photos/smiteme/2379630899/

Page 18: Little eScience

Example: FLOSS Research

• Phenomenological & interdisciplinary

• Software engineering, Information Systems, Anthropology, Sociology, CSCW, etc...

• Ethos

• (Idealistic) combination of open source values and scientific values

http://www.flickr.com/photos/themadlolscientist/2542236565/

Page 19: Little eScience

FLOSS Phenomenon

• Free/Libre Open Source Software “Free as in speech, free as in beer” - liberty versus cost

• Distributed collaboration to develop software

• Volunteers and sponsored developers

• Community-based model of development

http://www.flickr.com/photos/prawnwarp/541526661/

Page 20: Little eScience

Typical FLOSS Research Topics

• Coordination and collaboration

• Growth and evolution (social and code)

• Code quality

• Business models and firm involvement

• Motivation, leadership, success

• Culture and community

• Intellectual property and copyright

http://www.flickr.com/photos/eean/519258881/

Page 21: Little eScience

What we study @ SU

• Social aspects of FLOSS

• What practices make some distributed work teams more effective than others?

• How are these practices developed?

• What are the dynamics through which self-organizing distributed teams develop and work?

Page 22: Little eScience

Sharing FLOSS Research Artifacts

• Community: Small but growing, maybe around 400 researchers worldwide, with lively face-to-face interaction but relatively low listserv activity

• Data: Lots of it, and readily available, though often difficult to use for several reasons

• Analyses and tools: Not quite as easy to get, but there if you can find them

• Papers: Repositories are as yet underdeveloped, but efforts are underway

http://www.flickr.com/photos/12698507@N08/2762563631/

Page 23: Little eScience

FLOSS Research Community

• Handful of small research groups, mostly in UK & Europe

• Most often found in Software Engineering departments

• International conferences targeted to academics, developers, or both

• OSS, ICSE, FOSDEM, etc.

• IFIP WG 2.13

http://www.flickr.com/photos/steevithak/2883218362/

Page 24: Little eScience

FLOSS Research Data

• Data sources include interviews, surveys, and ethnographic fieldwork

• Digital “trace” data: archival, secondary, by-product of work, easy but hard

• Repositories

• Hosting “forges” like SourceForge, FreshMeat, RubyForge, etc.

• RoRs: Repositories of Repositories

• Data sources for research

Page 25: Little eScience

We Built It...

• Motivations

• Stop hammering forge servers, getting entire campus IPs blocked...

• Stop reinventing the wheel!

• Adoption

• Shared data sources seeing increasing use

• Next step is harder: sharing tools and workflows

http://www.flickr.com/photos/circulating/997909242/

Page 26: Little eScience

RoRs: FLOSSmole

• Multiple PIs @ Syracuse, Elon, & Carnegie Mellon; one grad student @ SU (me), a couple of undergrads @ Elon

• Public access to 300+ GB data on

• 300K+ projects from 8 repositories

• Flat files & SQL datamarts

• Released via SF & GC

• 5 TB allotment on TeraGrid @ SDSC

Page 27: Little eScience

RoRs: FLOSSmetrics

• Produced by LibreSoft with academic and corporate partners

• Public access to data for 2800+ projects

• Analyzed & raw data from CVS, email, trackers

• Tools for:

• calculating code metrics

• parsing trackers

• parsing email lists

Page 28: Little eScience

RoRs: SRDA

• SourceForge Research Data Archive

• One PI @ University of Notre Dame

• One massive 300 GB+ SQL db of monthly dumps from SourceForge

• Original obtuse structure, regular table deprecation, some documentation

• Gated access: researchers only, condition of data release from SF

Page 29: Little eScience

RoRs: Emerging Sources

• Ultimate Debian Database (UDD)

• 300 MB compressed Postgres DB, produced by Debian community

• Planning to add to FLOSSmole

Page 30: Little eScience

FLOSS Research Analyses

• When available...

• Bespoke Scripts

• Taverna workflows

Page 31: Little eScience

FLOSS Research Papers

• First, there was opensource.mit.edu

• They no longer maintain it, and gave us the data

• Work-in-progress working papers repository at FLOSSpapers.org

• Essential viability problem is that repositories require long-term stewardship...

• ...which requires long-term commitments of funding and personnel, not just volunteers

Page 32: Little eScience

FLOSS Research Collaboration

• Multiple partners involved in producing FLOSSmole & FLOSSmetrics

• Federated data sources by choice, starting to develop ontologies

• As yet, a Little Science domain

• Cross-institutional collaboration poses many challenges

• Usual difficulties magnified by general lack of resources, both financial and human

Page 33: Little eScience

Latest Initiatives

• Resource-oriented

• Expanding resources: data, research artifacts, and pedagogical materials

• DOIs: 10.4118/*

• Semantic data interoperability

• Community-oriented

• FLOSShub.org

Page 34: Little eScience

Evangelizing eScience

• Made presentations at OSS conferences: well received, but hard to make converts for several reasons

• Tried to get other research group members to use Taverna: learning overhead is too high for most

• Submitted a paper on eScience to an IS conference: rejected because reviewers were unable to adequately evaluate eScience as a topic, as it’s too unfamiliar

• Currently just doing our work this way, as an exemplar

http://www.flickr.com/photos/naezmi/2418745377/

Page 35: Little eScience

Barriers to Uptake

• Lack of agreement in research focus, theory, methods; researcher isolation

• Bimodal distribution of requisite skills

• “I can’t possibly do that! I can’t code!”

• “Why bother? I can code my own. You should too; just use Python.”

“Overheard” on Twitter:

Friend #1: i HATE that openoffice automatically took over my "open with..." defaults.

Friend #2: @Friend #1 <opensourcedeveloper> If you don't like it, then why don't you submit code to change the behavior!? </opensourcedeveloper>

http://www.flickr.com/photos/noner/1739876378/

Page 36: Little eScience

What I had to learn to get this far

• Taverna

• A lot more Unix terminal & XML

• Relational DB management & SQL

• More R, plus packages and dependency management

• Java & Eclipse - just enough to write my own Beanshells

• SVN & SSH

• A little bit of OWL, RDF, & SPARQL

• I would not have taken this on if I had known what was in store, but once I got started, I was hooked

http://www.flickr.com/photos/sashala/292868436/

Page 37: Little eScience

Sociotechnical Engineering

• Tools are part of the solution, thanks to brilliant CS and SE people

• Social elements are the true barrier

• Awareness of methods and benefits

• Incentive systems

• Resistance to change (paradigms again)

• Proof of concept is difficult

http://www.flickr.com/photos/pinprick/3117108495/

Page 38: Little eScience

Using Taverna for Little eScience

• Implementing analysis is usually easy

• Data handling is almost always hard

• All data are in SQL databases, with consistent IDs

• Lots of data manipulation is required

• Avoiding web services as much as possible

• Infrastructure and resources are limited

• Benefit is truly questionable: AFAIK, I am 50% of the user base...
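
As a rough illustration of the kind of data handling this slide describes, the sketch below joins tables on a shared project ID. It does not reproduce any real FLOSSmole or SRDA schema: the table names, column names, helper functions, and rows are all hypothetical, and an in-memory SQLite database stands in for whatever database a workflow would actually query.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up tables and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE projects  (project_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE releases  (project_id INTEGER, version TEXT, released TEXT);
        CREATE TABLE downloads (project_id INTEGER, day TEXT, n_downloads INTEGER);
    """)
    conn.execute("INSERT INTO projects VALUES (1, 'demo')")
    conn.execute("INSERT INTO releases VALUES (1, '0.6', '2009-01-03')")
    conn.executemany(
        "INSERT INTO downloads VALUES (?, ?, ?)",
        [(1, "2009-01-01", 10), (1, "2009-01-02", 12),
         (1, "2009-01-03", 50), (1, "2009-01-04", 30)],
    )
    return conn


def downloads_since_release(conn, name, version):
    """Return (day, n_downloads) rows on or after the given release date.

    The consistent project_id is what lets the three tables be joined
    without any fuzzy matching on project names.
    """
    query = """
        SELECT d.day, d.n_downloads
        FROM projects p
        JOIN releases  r ON r.project_id = p.project_id
        JOIN downloads d ON d.project_id = p.project_id
        WHERE p.name = ? AND r.version = ? AND d.day >= r.released
        ORDER BY d.day
    """
    return conn.execute(query, (name, version)).fetchall()


conn = build_demo_db()
for day, n in downloads_since_release(conn, "demo", "0.6"):
    print(day, n)
```

Dates are stored as ISO 8601 strings here so that plain string comparison orders them correctly; a real data mart may use a different date type.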

Page 39: Little eScience

Example: Our Recent Research

• Estimating user base and potential user interest in FLOSS projects

• Based on common release-and-download patterns

• Proxy for project success, a common dependent variable

[Figure: downloads plotted across releases Version 0.5, Version 0.6, and Version 0.7. Annotations: "Area under curve is active users updating"; "Active user base growth"; "Potential user experimentation growth (good publicity?)"]

Page 40: Little eScience

“Normal” Download-Release Patterns

[Figure: BibDesk downloads, Oct 2005 to Apr 2007 (y-axis: downloads, 1000 to 5000). Legend "measure": user_base, baseline]

Page 41: Little eScience

External effects! Taverna’s Download-Release Patterns

[Figure annotations: 1.3.2-RC1+2, presentations, 1.5.0, ?, ?]

Page 42: Little eScience

Taverna’s Estimated Baseline & User Base

14 day baseline & drop-off

Page 43: Little eScience

Taverna’s Estimated Baseline & User Base

7 day baseline & drop-off
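
The baseline and drop-off windows named on these slides can be sketched roughly as follows. This is a minimal illustration, not the authors' actual method. It assumes the baseline is the mean daily download count over the `window` days before a release, and that the user base is the area between the download curve and that baseline over the `window` days starting at the release. The function names, input format, and sample numbers are all hypothetical.

```python
from datetime import date, timedelta
from statistics import mean


def estimate_user_base(daily_downloads, release_day, window=14):
    """Return (baseline, user_base) for one release.

    daily_downloads: dict mapping datetime.date -> downloads that day
                     (hypothetical input format; missing days count as 0).
    release_day:     date of the release.
    window:          days used for both the pre-release baseline and the
                     post-release drop-off period (7 or 14 on the slides).
    """
    # Baseline: mean daily downloads over the `window` days before the release.
    before = [daily_downloads.get(release_day - timedelta(days=i), 0)
              for i in range(1, window + 1)]
    baseline = mean(before)

    # Drop-off period: the release day plus the following window - 1 days.
    after = [daily_downloads.get(release_day + timedelta(days=i), 0)
             for i in range(window)]

    # Area between the download curve and the baseline, ignoring days that
    # fall below the baseline.
    user_base = sum(max(count - baseline, 0) for count in after)
    return baseline, user_base


def make_demo_series(release_day, days=14):
    """Build made-up data: a flat 100 downloads/day with a post-release spike."""
    series = {release_day + timedelta(days=i): 100 for i in range(-days, days)}
    for offset, count in enumerate([500, 400, 300, 200]):
        series[release_day + timedelta(days=offset)] = count
    return series


release = date(2009, 1, 15)  # hypothetical release date
baseline, user_base = estimate_user_base(make_demo_series(release), release, window=7)
print(f"baseline={baseline}, estimated user base={user_base}")
```

Changing `window` from 14 to 7 shortens both the baseline and drop-off periods, which is the kind of sensitivity the two Taverna slides appear to compare.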

Page 44: Little eScience

Interpretation

• Taverna is not a “normal” open source project

• Speaking tours, tutorials, articles, and other events influence downloads

• What this demonstrates...

• Care is needed with quantitative measures

• Not all open source projects are the same

• Taverna users are just as reactive as any

http://www.flickr.com/photos/pagedooley/2121472112/

Page 45: Little eScience

Where next?

• Adoption is a long-term agenda, as changing social practices doesn’t happen overnight

• For FLOSS research and our disciplinary communities

• We will keep doing our work this way, and hope to draw in others

“Won’t you come out and play?”

http://www.flickr.com/photos/atiq/2658884520/

Page 46: Little eScience

Thanks!

• Credits where they are due

• Kevin Crowston, my advisor

• James Howison, my collaborator

• Everett Wiggins, my husband