Short Introduction to Software engineering for bioinformatics Joe Miyamoto
Progress of software development practice
Waterfall -> Agile -> DevOps
There is no silver bullet in software engineering, and each approach has its own advantages. But in bioinformatics, there is very little chance of adopting Waterfall.
Waterfall-style development
• A very old and popular style of product development
• Never go back to a previous phase
• What we really want to make has to be clear from the start
• Suited to large-scale development
“A few A-class architects and a mass of C-class programmers”
Ref: http://fireside.gamejolt.com/post/the-game-creation-process-part-2-designing-the-idea-viq5rk2t
Waterfall-style development
Advantage
• Easy to manage progress (if no contingency comes up)
Disadvantage
• Hard to manage progress (if a contingency does come up)
Waterfall (spiral)
• Iteration of Waterfall
Advantage
• Task becomes clear by each iteration
Disadvantage
• Time consuming
• Hard to determine how much we have to elaborate in the first iteration
ref: http://www.qmetry.com/spiral.html
Agile
• Antithesis of Waterfall
• Not a technique, but a philosophy
• One iteration is 1 to 4 weeks, with one feature per iteration.
Ref: https://www.linkedin.com/pulse/essential-resources-services-technologies-your-startup-jason-oh
Agile
Advantage
• Easy to adapt to changes
• Make clear where we are and where we want to go
Disadvantage
• Need for refactoring -> CI (we will see this later)
• Communication cost -> no more than about 20 people
Difference between Agile and Spiral
• Spiral … works on every feature in each iteration
• Agile … implements only one feature in each iteration
Scrum
One incarnation of Agile
Focus on communication among developers
• Make a list of the features we want to implement and update it constantly
• Each iteration is 30 days, and the software has to be deployable at the end
• A 15-minute standing meeting every day
• No partitioning
eXtreme Programming (XP)
One incarnation of Agile
Focus on maintainability of code
• Test-Driven Development (TDD)
• Pair Programming
• Joint ownership of code
• Continuous Integration (CI)
• Issue Tracking
Two purposes of software testing
• Tests for users
• Tests for developers
  Focused on in Agile
  Run the tests every time we make a change to the source code
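A developer-facing test in the TDD style could look like the minimal sketch below. The function `reverse_complement` and its test cases are hypothetical examples chosen for a bioinformatics setting; they do not come from the slides. The tests use Python's standard `unittest` module.

```python
import unittest

# Hypothetical function under test: reverse complement of a DNA string.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    if set(seq.upper()) - set("ACGT"):
        raise ValueError(f"unexpected base in sequence: {seq!r}")
    return seq.translate(_COMPLEMENT)[::-1]


class ReverseComplementTest(unittest.TestCase):
    """In TDD these tests are written first, then the function is made to pass."""

    def test_simple_sequence(self):
        self.assertEqual(reverse_complement("ATGC"), "GCAT")

    def test_empty_sequence(self):
        self.assertEqual(reverse_complement(""), "")

    def test_rejects_unknown_base(self):
        with self.assertRaises(ValueError):
            reverse_complement("ATGN")
```

Saved as a file such as `test_revcomp.py` (an assumed name), the tests can be run with `python -m unittest` after every change to the source code.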
• Distributed Version Control System (DVCS)
• Able to share the history of changes
• Cut a branch for every single feature or subproject
Ref: http://gotgroove.com/ecommerce-blog/guide-to-version-control-for-magento-using-git-and-beanstalk/
Mercurial (a simpler DVCS for Pythonistas) could be enough for some bioinformaticians, though…
Workflow using Git (≒ how to branch)
There are several branching practices, but the following are the basic rules
• One feature, one branch
• Master always has to be deployable
Ref: https://www.atlassian.com/ja/git/workflows#!workflow-gitflow
• Hosting service for Git
• Filing an issue for every topic makes the project trackable
Coding -> Pull Request -> Review -> Merge
By following this flow, the source code becomes less dependent on any particular person
Workflow using Git & GitHub
Fork & clone -> Work in local repository -> push -> Pull Request -> Code Review -> merge
Ref: http://acrl.ala.org/techconnect/post/coding-collaboration-on-github
Ticketing ↓ Issue Tracking
Build test ↓ CI
Continuous Integration (CI)
• Run automated tests constantly
• Makes it easy to track down a problem
Jenkins: The CI tool
Ref: http://www.slideshare.net/whyme/jenkins-reviewbot
GitHub and CI tools
Run the tests on every push to the remote
A common combination is GitHub + [Travis CI or Jenkins]
Ref: https://github.com/hltfbk/Excitement-Open-Platform/wiki/Developers
Practice for issue tracking
• The rough schedule is tracked with a Gantt chart or a burn-down chart
Ref: https://en.wikipedia.org/wiki/Gantt_chart
Ref: http://chandoo.org/wp/2009/07/21/burn-down-charts/
• The more precise schedule is managed with tickets or issues
Redmine, GitHub + ZenHub
(Figures: burn-down chart, Gantt chart)
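The numbers behind a burn-down chart are simple arithmetic: the remaining work per day, compared against an ideal constant-rate line. The sketch below computes both series. The function names and the use of "points" as the unit of work are assumptions for illustration.

```python
from typing import List


def ideal_burn_down(total_points: float, days: int) -> List[float]:
    """Remaining work per day if it burned down at a constant rate."""
    return [total_points * (days - d) / days for d in range(days + 1)]


def actual_burn_down(total_points: float, done_per_day: List[float]) -> List[float]:
    """Remaining work after each day, given the points completed on each day."""
    remaining = [total_points]
    for done in done_per_day:
        remaining.append(remaining[-1] - done)
    return remaining
```

Plotting the two lists against the day number gives the chart; whenever the actual line sits above the ideal line, the iteration is behind schedule.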
Ticket-Driven Development
• Manage tasks centrally as tickets
• Make small tasks clear and trackable
Ref: http://itpro.nikkeibp.co.jp/article/COLUMN/20130927/507265/?SS=imgview&FD=55983188&ST=devops
Is a commonly used tool
DevOps
• Extending “Agile” from development to operations
That is ..
• Reflect changes in the running system instantly when we update the code.
Not only developing software, but developing a whole system.
Technologies for DevOps
• Virtualization using containers
• Configuration management tools
Ref: http://blog.xebialabs.com/2014/12/05/rocket-vs-docker-myth-simple-lightweight-enterprise-platform/
Fabric
Typical situation in bioinformatics
• Small daily analysis on a laptop
• Realize the need for more computation power
• Move the pipeline to a high-performance server
• Able to use the cloud?
• Use CloudBioLinux or another VM image from bioimg.org
Dependency hell!
Software (or package) version differences
No reproducibility!
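Version differences of this kind can at least be detected before a pipeline runs. The sketch below compares a pinned list of package versions against what is installed, using the standard library's `importlib.metadata` (Python 3.8 or later). The function names, package names, and version numbers are hypothetical.

```python
from importlib import metadata
from typing import Callable, Dict, List


def installed_version(package: str) -> str:
    """Return the installed version of a package, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def version_mismatches(
    pinned: Dict[str, str],
    lookup: Callable[[str], str] = installed_version,
) -> List[str]:
    """List every package whose installed version differs from the pinned one."""
    problems = []
    for package, wanted in sorted(pinned.items()):
        found = lookup(package)
        if found != wanted:
            problems.append(f"{package}: expected {wanted}, found {found}")
    return problems
```

A call such as `version_mismatches({"biopython": "1.65"})` (hypothetical pin) returns an empty list only when the environment matches. This detects the drift but does not fix it.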
Container virtualization (Docker)
• Include all third-party software in one container.
• Build once, run anywhere
• Version-controllable, with a GitHub-like hosting service
Easy to transport between servers
Develop the whole container as “software”
Progress of virtualization
• chroot, cgroups: isolation of file and process space
• KVM, VirtualBox: OS virtualization
Drawbacks
• Heavy
• Not easy to provision
• Hard to use base images
• (chroot carries) a danger of one user depleting the computing resources
Docker: tries to take advantage of both
Emergence of counterforce
• Security problem: becoming root is a must
• Dockerfile problem: some bugs around caching? Peculiar way of writing -> better to use Packer
• Portability problem: better to run on Linux kernel version >= 3.8
Cloudius OSv
Problems of Docker
• Not user-friendly enough so far
• Not enough community resources such as base images
Not mature enough to use