Building communities around open-source scientific software
Karen Cranston, National Evolutionary Synthesis Center (NESCent)
@kcranstn
http://www.slideshare.net/kcranstn
May 10, 2015
NESCent: National Evolutionary Synthesis Center
www.nescent.org
fieldwork
lab work
method development
meta-analysis
data synthesis
Species              A (mm^2)      F (mm^2/mm^2)  N (mm^-2)  S (mm^4)
Abelia biflora       0.002375829   0.924197654    389.0      6.11E-06
Abelia dielsii       0.00115375    0.357418211    331.0      3.49E-06
Abelia integrifolia  0.001134115   0.240432369    212.0      5.35E-06
Abelia mosanensis    0.000855299   0.632065665    739.0      1.16E-06
Abelia serrata       0.000706858   0.206402637    292.0      2.42E-06
Abelia spathulata    0.000804248   0.230819095    287.0      2.80E-06
Abutilon fruticosum  0.001452201   0.137959114    95.0       1.53E-05
Abutilon pannosum    0.003117245   0.124689812    40.0       7.79E-05
Acacia albida        0.012271846   0.049087385    4.0        0.003067962
Acacia ataxacantha   0.013069811   0.169907541    13.0       0.00100537
Acacia borleae       0.004071504   0.061072561    15.0       0.000271434
Acacia burkei        0.008992024   0.053952141    6.0        0.001498671
Acacia caffra        0.010207035   0.214347725    21.0       0.000486049
trait data about species + evolutionary trees
Outcomes: Community
Brian O'Meara, Michael Alfaro, Charles Bell, Ben Bolker, Marguerite Butler, Peter Cowan, Damien de Vienne, Richard Desper, Joe Felsenstein, Luke Harmon, Christoph Heibl, Andrew Hipp, Gene Hunt, Thibaut Jombart, Steve Kembel, Hilmar Lapp, Scott Loarie, Wayne Maddison, Peter Midford, David Orme, Emmanuel Paradis, Sam Price, Dan Rabosky, Brian Sidlauskas, Stacey Smith, Dave Swofford, Todd Vision, Peter Waddell, Amy Zanne, Derrick Zwickl [bold indicates organizer]
Comparative methods in R hackathon
Rationale
The R statistical analysis package has emerged as a popular platform for implementation of powerful comparative phylogenetic methods to understand the evolution of organismal traits and diversification. It includes methods such as independent contrasts, ancestral state estimation, various models of continuous and discrete trait evolution, lineage through time plots, diversification tests, generalized estimating equations, tree plotting, and more. This event was designed to bring together active R developers as well as end-users working on the integration of comparative phylogenetic methods within R to actively address issues of data exchange standards, code interoperability, usability, documentation quality, and the breadth of functionality for comparative methods available within R. The idea originated from a whitepaper submitted by NESCent postdocs Amy Zanne and Sam Price.
Work at hackathon (Dec. 10-14, 2007)
• 30 developers and users worked on programming & writing documentation
• Split into subgroups on diversification, divergence times, documentation, class design, Mesquite-R interaction, input/output, and trait evolution
• Package source code stored on shared repository hosted at R-Forge (“PhyloConductor”)
Hackathon participants (red were flown to NESCent, purple participated remotely). Map from Google Maps
• Designed and began implementing a new S4 class for data and trees
• Ran “bootcamps” for developers on numerical optimization and S4 coding
• Used the Nexus Class Library (Lewis & Holder) and Rcpp (Samperi) for reading and interpreting Nexus tree and data files
• Began work on R tutorials
• Tested existing methods in R, identifying errors
• Developed ways for R to call Mesquite and Mesquite to call R
[Figure: Commits by date, 12/10 to 12/15; y-axis 0 to 300]
• R-Phylo Wiki (http://www.r-phylo.org): Tutorials and overview of available analyses and packages from the hackathon have been placed on a public website for all to use and improve. It’s had >7,000 page visits from >30 countries and >600 edits since it went live in March 2008.
• R-sig-phylo mailing list (https://stat.ethz.ch/mailman/listinfo/r-sig-phylo): A mailing list for users of R for comparative methods and phylogenetics. Over 100 messages in its first four months.
• Comparative methods in R user tutorials planned for 2009 Society for Integrative and Comparative Biology and Evolution meetings.
• Addition of R track to NESCent summer course in phyloinformatics, featuring software developed at hackathon and taught by hackathon participant Marguerite Butler.
• Proposal to NSF for summer course in R for phyloinformatics.
• Ongoing collaborations between hackathon participants.
• Two Google Summer of Code projects to sponsor student developers:
  • Peter Cowan: Tree and data plotting in the phylobase project (see right)
  • Matthew Helmus: Enhancing the representation of ecophylogenetic tools in R in the picante project
NESCent informatics
Incompatible tree formats are used in different R packages
Package          Function
geiger 1.0-9.1   sim.char
ouch 1.2-4       brown.dev
picante          evolve.brownian
ape 2.01         evolve.phylo
Redundancy (at least four functions to evolve traits up the tree using simple Brownian motion)
Can be intimidating to beginners
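For readers unfamiliar with the operation those four functions duplicate, here is a minimal sketch of simulating a trait along a tree under simple Brownian motion. It is written in Python for illustration and is not the implementation used by geiger, ouch, picante, or ape. The dictionary-based tree layout, the function name `simulate_brownian`, and the toy tree are all hypothetical.

```python
import random


def simulate_brownian(tree, root_value=0.0, sigma2=1.0, rng=None):
    """Simulate one continuous trait on a rooted tree under Brownian motion.

    `tree` (hypothetical layout) maps each node name to a list of
    (child_name, branch_length) pairs; tips map to an empty list.
    The first key in the dictionary is treated as the root.
    Returns a dict of simulated trait values for the tips only.
    """
    rng = rng or random.Random()
    root = next(iter(tree))
    values = {root: root_value}
    stack = [root]
    while stack:
        node = stack.pop()
        for child, length in tree[node]:
            # The change along a branch is Normal(0, sigma2 * branch length).
            step = rng.gauss(0.0, (sigma2 * length) ** 0.5)
            values[child] = values[node] + step
            stack.append(child)
    return {name: value for name, value in values.items() if not tree[name]}


# Hypothetical three-tip tree, equivalent to ((A:1,B:1):1,C:2) in Newick.
toy_tree = {
    "root": [("AB", 1.0), ("C", 2.0)],
    "AB": [("A", 1.0), ("B", 1.0)],
    "A": [],
    "B": [],
    "C": [],
}

tip_values = simulate_brownian(toy_tree, rng=random.Random(42))
print(sorted(tip_values))
```

Because A and B share the branch from the root to AB, their simulated values are correlated, which is the property comparative methods rely on.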
Coding at hackathon
The US National Evolutionary Synthesis Center (http://www.nescent.org) encourages synthetic, interdisciplinary, and transformative research in evolutionary biology. NESCent, a collaborative effort of Duke, NC State University and UNC Chapel Hill, is located in Durham NC and is supported by the National Science Foundation (EF-0423641).
A major goal of NESCent's Informatics branch is to promote community-driven, collaborative open-source software development. This is achieved through hackathons, internships (such as the Google Summer of Code), summer courses, conference workshops, and by externally funded collaborations for the development and support of important cyberinfrastructure resources. NESCent accepts whitepapers that provide suggestions for future informatics activities from anyone at any time. See the website or the NESCent booth in the exhibit hall for more information.
Outcomes: Software
• Phylobase (http://r-forge.r-project.org/projects/phylobase/): New package for phylogenetic trees and data. Can load trees and data from Nexus files, output to other tree formats, coordinate pruning of taxa from data and tree, traverse tree, handle DNA, morphological, and continuous data types. Work is ongoing (below) to enhance tree plotting and other functions. As with all hackathon products, new developers are welcome to join to further improve the code (one already has).
URL: http://hackathon.nescent.org/R_Hackathon_1 email: [email protected]
[Figure: Commits by date, 12/16 to 6/1; y-axis 0 to 200]
• Movement of existing packages to source code repositories, allowing more collaborative development (e.g., the Picante package has a new Google Summer of Code 2008 developer, Matthew Helmus)
• R-Mesquite interaction: Code written to allow Mesquite (Maddison & Maddison, 2007) to call R packages (such as OUCH (Butler & King 2004) and APE (Paradis et al. 2004)), and for R to call headless Mesquite, although an easier installation process is still needed.
• Continuing improvement and release of packages by hackathon participants (GEIGER, LASER, ape).
• See http://hackathon.nescent.org/R_Hackathon_1 for more info.
Coding for PhyloBase
Nature Precedings : doi:10.1038/npre.2008.2126.1 : Posted 28 Jul 2008
O’Meara et al. Nature Precedings. 2008 http://dx.doi.org/10.1038/npre.2008.2126.1
R-sig-phylo mailing list
32 R packages for comparative biology; maintained by a hackathon participant
Informatics team
Evolutionary biologists
computational skills
domain knowledge
short bootcamps teaching computational skills to domain scientists
bringing students into open-source programming communities
A grassroots approach to software sustainability. Karen Cranston, Todd Vision, Brian O'Meara, Hilmar Lapp. http://dx.doi.org/10.6084/m9.figshare.790739