Genomes on Rails has_many :sequences
May 19, 2015
Hello
➊ Previously
➋ Production
➌ Process
➊ Previously
The human genome
15 years to decode
3 billion letters
$3 billion
$3 billion ++
Race for the prize
Open data
Open source
Perl
Lots of Perl: ~4500 modules
Onwards!
40 species
Map evolutionary space
Compare genomes
compare species
compare individuals
More Perl: ~1500 modules
Quantum leap!
1000 personal genomes
beyond 23andme
Hypertension
Diabetes
Coronary heart disease
Bipolar disorder
Malaria
➋ Production
Register projects
Register samples
Sample prep
Sequencing
Analysis
Change!
Flexible data capture
Virtual fields
Sample
Name
Organism
Concentration
class Sample < ActiveRecord::Base
  has_many :descriptors
  has_many :descriptor_values
end
Key value pairs
Faster than you’d think
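A minimal plain-Ruby sketch of the key-value idea, assuming each virtual field is a descriptor with one value per sample. `Descriptor` and `DescriptorValue` echo the associations on the slide; the `set`, `get`, and `field_names` helpers are hypothetical, and no Rails or database is involved.

```ruby
# Plain-Ruby stand-ins for the descriptor tables (hypothetical, no Rails needed).
Descriptor = Struct.new(:name)
DescriptorValue = Struct.new(:descriptor, :value)

class Sample
  attr_reader :descriptor_values

  def initialize
    @descriptor_values = []
  end

  # Store a value against a named descriptor, replacing any earlier value.
  def set(name, value)
    existing = @descriptor_values.find { |dv| dv.descriptor.name == name }
    if existing
      existing.value = value
    else
      @descriptor_values << DescriptorValue.new(Descriptor.new(name), value)
    end
    value
  end

  # Look up a value by descriptor name; nil when the field was never captured.
  def get(name)
    found = @descriptor_values.find { |dv| dv.descriptor.name == name }
    found && found.value
  end

  # Names of the fields currently captured for this sample.
  def field_names
    @descriptor_values.map { |dv| dv.descriptor.name }
  end
end

sample = Sample.new
sample.set("Name", "S-001")
sample.set("Organism", "Homo sapiens")
sample.set("Concentration", 12.5)
sample.set("Origin", "Blood") # a new field, captured without a schema change
```

Adding "Origin" here is just another key-value pair, which is the point of the design: new fields need no migration.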
Change!
Sample V1: Name, Organism, Concentration
Sample V2: Name, Organism, Concentration, Origin, Quality metric
Rationalize!
Sample V1: Name, Organism, Concentration
Sample V2: Name, Organism, Concentration, Origin, Quality metric
Mapping!
Sample V1: Name, Organism, Concentration
Sample V3: Name, Species, Concentration, Origin, Quality metric
Origin
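One way to picture the mapping step, as a sketch only: a lookup table renames old fields to their new names when a V1 record is read as V3. The only rename taken from the slides is Organism to Species; the constant names and the `upgrade_sample` helper are hypothetical.

```ruby
# Field renames between versions. Only Organism -> Species comes from the slides.
V1_TO_V3 = { "Organism" => "Species" }.freeze

# Field list for a V3 sample, as shown on the slide.
V3_FIELDS = ["Name", "Species", "Concentration", "Origin", "Quality metric"].freeze

# Translate a V1 record (a Hash) into V3 shape: rename mapped keys,
# carry over the rest, and leave fields V1 never captured as nil.
def upgrade_sample(v1)
  renamed = v1.each_with_object({}) do |(field, value), out|
    out[V1_TO_V3.fetch(field, field)] = value
  end
  V3_FIELDS.each_with_object({}) { |field, out| out[field] = renamed[field] }
end

v1 = { "Name" => "S-001", "Organism" => "Homo sapiens", "Concentration" => 12.5 }
v3 = upgrade_sample(v1)
```

Old records stay untouched; the mapping is applied on the way out, so V1 and V3 data can coexist.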
Pipeline management
Task 1, Task 2, Task 3
Workflow: Name, Operator
Instrument: Name, Serial number
Kit: Name, Passed
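A rough sketch of how these records might fit together, in plain Ruby. The field names come from the slide; attaching an instrument and a kit to each task, and stopping at the first kit that did not pass, are assumptions made for illustration.

```ruby
# Records from the slide, modelled as plain Structs (no Rails needed).
Instrument = Struct.new(:name, :serial_number)
Kit = Struct.new(:name, :passed)

# Assumption: each task records the instrument and kit it used.
Task = Struct.new(:name, :instrument, :kit)

class Workflow
  attr_reader :name, :operator, :tasks

  def initialize(name, operator, tasks)
    @name = name
    @operator = operator
    @tasks = tasks
  end

  # Walk the tasks in order, stopping at the first one whose kit did not pass.
  # Returns the names of the tasks that completed.
  def run
    completed = []
    @tasks.each do |task|
      break unless task.kit.passed
      completed << task.name
    end
    completed
  end
end

sequencer = Instrument.new("Sequencer A", "SN-0001")
workflow = Workflow.new("Library prep", "operator-1", [
  Task.new("Task 1", sequencer, Kit.new("Kit A", true)),
  Task.new("Task 2", sequencer, Kit.new("Kit B", false)),
  Task.new("Task 3", sequencer, Kit.new("Kit C", true))
])
completed = workflow.run
```

Here only Task 1 completes, because the kit used by Task 2 is marked as not passed.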
Throughput!
320Tb 450 CPU Archive
75Tb
pilot study!
Multiple apps
Multiple instances
Loosely coupled
Loose coupling is hard
Deployment
Maintenance
Monitoring
Hard to maintain separation
Support novel science
Single code base
nginx reverse proxy
fairnginx
Mongrel
Fast deployment
Automate everything
Interoperability!
Play well with others!
Legacy databases
RESTful services
Generate API stubs
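Stub generation could look something like this in Ruby: given a resource name, build a client class whose methods describe the matching REST calls. The `api_stub_for` helper and the returned hash shape are hypothetical, and nothing here performs HTTP.

```ruby
# Build a stub client class for one resource. Method names follow the usual
# REST conventions (index, show, create); each returns a description of the
# request rather than sending it.
def api_stub_for(resource)
  Class.new do
    define_method(:index)  { { method: "GET", path: "/#{resource}" } }
    define_method(:show)   { |id| { method: "GET", path: "/#{resource}/#{id}" } }
    define_method(:create) { |attrs| { method: "POST", path: "/#{resource}", body: attrs } }
  end
end

samples = api_stub_for("samples").new
show_request = samples.show(42)
```

Because the stubs are generated from resource names, other groups can code against the interface before the real service is wired in.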
SCALE!
Trillionics
2X
150Tb per week
Over 6 months
More hardware
400 additional nodes
additional 360 Tb
Towards a Virtual Institute
Lots of data, lots of people, lots of compute, lots of uses, lots and lots
and lots and lots...
➌ Process
Concept Requirements Development Product
takes too long
Concept Requirements Development Product
these change
takes too long
Concept
What we need Get ready
Development Plan
REVIEW
Focused
Project owner is key
Weekly releases
More flexible
Less time
Better transparency
Less software
Sequencing informatics
Thank you
GREENISGOOD.CO.UK