Short Introduction to Software engineering for bioinformatics Joe Miyamoto

Short Introduction of software engineering for bioinformatics

Jan 23, 2018

Transcript
Page 1: Short Introduction of software engineering for bioinformatics

Short Introduction to Software engineering for bioinformatics

Joe Miyamoto

Page 2: Short Introduction of software engineering for bioinformatics

Progress of software development practice

Waterfall Agile DevOps

There is no silver bullet in software engineering, and each approach has its own advantages. But in bioinformatics, there is very little chance of adopting Waterfall.

Page 3: Short Introduction of software engineering for bioinformatics

Evolution rather than progress?

Waterfall Agile DevOps

Page 4: Short Introduction of software engineering for bioinformatics

More precisely…

Non-spiral Spiral Scrum

eXtreme Programming (XP)

Waterfall Agile DevOps

Page 5: Short Introduction of software engineering for bioinformatics

Waterfall Agile DevOps

Page 6: Short Introduction of software engineering for bioinformatics

Waterfall-style development

• A very old and popular style of product development

• Never go back to a previous phase

• What we really want to build has to be clear from the start

• Suited to large-scale development

“A few A-class architects and a mass of C-class programmers”

Ref: http://fireside.gamejolt.com/post/the-game-creation-process-part-2-designing-the-idea-viq5rk2t

Page 7: Short Introduction of software engineering for bioinformatics

Waterfall-style development

Advantage

• Easy to manage progress (if no contingency arises)

Disadvantage

• Hard to manage progress (if a contingency does arise)

Page 8: Short Introduction of software engineering for bioinformatics

Waterfall (spiral)

• Iterations of waterfall

Advantage

• The task becomes clearer with each iteration

Disadvantage

• Time-consuming

• Hard to decide how much detail to work out in the first iteration

ref: http://www.qmetry.com/spiral.html

Page 9: Short Introduction of software engineering for bioinformatics

Waterfall Agile DevOps

Page 10: Short Introduction of software engineering for bioinformatics

Agile

• The antithesis of waterfall

• Not a technique but a philosophy

• One iteration lasts 1 to 4 weeks, with one feature per iteration.

Ref: https://www.linkedin.com/pulse/essential-resources-services-technologies-your-startup-jason-oh

Page 11: Short Introduction of software engineering for bioinformatics

Agile

Advantage

• Easy to adapt to changes

• Makes clear where we are and where we want to go

Disadvantage

• Requires refactoring -> CI (we will see this later)

• Communication cost -> works for teams of no more than about 20 people

Page 12: Short Introduction of software engineering for bioinformatics

Difference between agile and spiral

• Spiral … works on every feature in each iteration

• Agile … implements only one feature for each iteration.

Page 13: Short Introduction of software engineering for bioinformatics

Non-spiral Spiral Scrum

eXtreme Programming (XP)

Waterfall Agile DevOps

Page 14: Short Introduction of software engineering for bioinformatics

Scrum

One incarnation of agile

Focuses on communication among developers

• Make a list of the features we want to implement and update it constantly

• Each iteration is 30 days, and the software has to be deployable at the end

• A 15-minute stand-up meeting every day

• No partitioning

Page 15: Short Introduction of software engineering for bioinformatics

Non-spiral Spiral Scrum

eXtreme Programming (XP)

Waterfall Agile DevOps

Page 16: Short Introduction of software engineering for bioinformatics

eXtreme Programming (XP)

One incarnation of agile

Focuses on maintainability of code

• Test Driven Development (TDD)

• Pair Programming

• Joint ownership of code

• Continuous Integration (CI)

• Issue Tracking

Page 17: Short Introduction of software engineering for bioinformatics

eXtreme Programming (XP)

One incarnation of agile

Focuses on maintainability of code

• Test Driven Development (TDD)

• Pair Programming

• Joint ownership of code

• Continuous Integration (CI)

• Issue Tracking

Page 18: Short Introduction of software engineering for bioinformatics

Two purposes of software testing

Test for users

Test for developers

• The focus in Agile

• Run the tests every time we make a change to the source code
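
As a minimal sketch of a "test for developers", the snippet below pairs a small function with unit tests that can be rerun after every change to the source code. The function `gc_content` and its test cases are hypothetical examples and do not come from the slides.

```python
import unittest


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (hypothetical example)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


class GcContentTest(unittest.TestCase):
    # In TDD these tests are written first and fail until gc_content is implemented.
    def test_half_gc(self):
        self.assertAlmostEqual(gc_content("ATGC"), 0.5)

    def test_case_insensitive(self):
        self.assertAlmostEqual(gc_content("ggcc"), 1.0)

    def test_empty_sequence_is_rejected(self):
        with self.assertRaises(ValueError):
            gc_content("")
```

Saved as, say, `test_gc.py`, the tests can be run with `python -m unittest test_gc`.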

Page 19: Short Introduction of software engineering for bioinformatics

eXtreme Programming (XP)

One incarnation of agile

Focuses on maintainability of code

• Test Driven Development (TDD)

• Pair Programming

• Joint ownership of code

• Continuous Integration (CI)

• Issue Tracking

Page 20: Short Introduction of software engineering for bioinformatics

• Distributed Version Control System (DVCS)

• Able to share the history of changes

• Cut a branch for every single feature or subproject

Ref: http://gotgroove.com/ecommerce-blog/guide-to-version-control-for-magento-using-git-and-beanstalk/

Mercurial (a simpler DVCS for Pythonistas) could be enough for some bioinformaticians, though…

Page 21: Short Introduction of software engineering for bioinformatics

Workflow using Git (≒ how to branch)

There are several branching practices, but the following are the basic rules

• 1 feature, 1 branch

• Master always has to be deployable

Source: https://www.atlassian.com/ja/git/workflows#!workflow-gitflow

Page 22: Short Introduction of software engineering for bioinformatics

• Hosting service for Git

• Filing an issue for every topic makes the project trackable

Coding -> Pull Request -> Review -> merge

By following this flow, the source code becomes less dependent on any particular person

Page 23: Short Introduction of software engineering for bioinformatics

Workflow using Git & GitHub

Fork & clone -> Work in local repository -> push -> Pull Request -> Code Review -> merge

Ref: http://acrl.ala.org/techconnect/post/coding-collaboration-on-github

Page 24: Short Introduction of software engineering for bioinformatics

Workflow using Git & GitHub

Fork & clone -> Work in local repository -> push -> Pull Request -> Code Review -> merge

Ref: http://acrl.ala.org/techconnect/post/coding-collaboration-on-github

Ticketing -> Issue Tracking

Build test -> CI

Page 25: Short Introduction of software engineering for bioinformatics

eXtreme Programming (XP)

One incarnation of agile

Focuses on maintainability of code

• Test Driven Development (TDD)

• Pair Programming

• Joint ownership of code

• Continuous Integration (CI)

• Issue Tracking

Page 26: Short Introduction of software engineering for bioinformatics

Continuous Integration (CI)

• Run automated tests constantly

• Makes it easy to track down a problem

Jenkins: a CI tool

Ref: http://www.slideshare.net/whyme/jenkins-reviewbot

Page 27: Short Introduction of software engineering for bioinformatics

GitHub and CI tools

Run the tests on every push to the remote

A common combination is GitHub + [Travis CI or Jenkins]

Ref: https://github.com/hltfbk/Excitement-Open-Platform/wiki/Developers

Page 28: Short Introduction of software engineering for bioinformatics

eXtreme Programming (XP)

One incarnation of agile

Focuses on maintainability of code

• Test Driven Development (TDD)

• Pair Programming

• Joint ownership of code

• Continuous Integration (CI)

• Issue Tracking

Page 29: Short Introduction of software engineering for bioinformatics

Practices for issue tracking

• The rough schedule is tracked with a Gantt chart or a burn-down chart

Ref (Gantt chart): https://en.wikipedia.org/wiki/Gantt_chart

Ref (burn-down chart): http://chandoo.org/wp/2009/07/21/burn-down-charts/

• The detailed schedule is managed with tickets or issues

Tools: Redmine, GitHub + ZenHub
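
A burn-down chart plots the remaining work against time and compares it with an ideal straight line. The sketch below computes both series; the function names and the 5-day, 20-point iteration are made-up illustration values.

```python
from typing import List


def ideal_burn_down(total_points: float, days: int) -> List[float]:
    """Ideal remaining work at the end of each day, falling linearly to zero."""
    return [total_points * (days - day) / days for day in range(days + 1)]


def actual_burn_down(total_points: float, completed_per_day: List[float]) -> List[float]:
    """Remaining work after each day, given the points completed on that day."""
    remaining = [total_points]
    for done in completed_per_day:
        remaining.append(remaining[-1] - done)
    return remaining


# Hypothetical iteration: 20 points of work spread over 5 days.
ideal = ideal_burn_down(20, 5)
actual = actual_burn_down(20, [2, 5, 3, 6, 4])
```

Days on which the actual value lies above the ideal value indicate that the iteration is behind schedule.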

Page 30: Short Introduction of software engineering for bioinformatics

Ticket Driven Development (TiDD)

• Manage tasks centrally as tickets

• Make small tasks clear and trackable

Source: http://itpro.nikkeibp.co.jp/article/COLUMN/20130927/507265/?SS=imgview&FD=55983188&ST=devops

Is a commonly used tool

Page 31: Short Introduction of software engineering for bioinformatics

Waterfall Agile DevOps

Page 32: Short Introduction of software engineering for bioinformatics

DevOps

• Extends “Agile” from development to operations

That is…

• Changes are reflected in the running system as soon as we update the code.

Not only developing software, but developing a whole system.

Page 33: Short Introduction of software engineering for bioinformatics

Technologies for DevOps

• Virtualization using containers

• Configuration management tools

http://blog.xebialabs.com/2014/12/05/rocket-vs-docker-myth-simple-lightweight-enterprise-platform/

Fabric

Page 34: Short Introduction of software engineering for bioinformatics

Technologies for DevOps

• Virtualization using containers

• Configuration management tools

http://blog.xebialabs.com/2014/12/05/rocket-vs-docker-myth-simple-lightweight-enterprise-platform/

Fabric

Page 35: Short Introduction of software engineering for bioinformatics

Typical situation in bioinformatics

Small daily analysis on a laptop

Realize the need for more computing power

Move the pipeline to a high-performance server

Able to use the cloud?

Use CloudBioLinux or another VM image from bioimg.org

_人人人人人人人人人人_
> Dependency hell <
 ̄Y^Y^Y^Y^Y^Y^Y^Y^Y^Y ̄

Software (or package) version differences

_人人人人人人人人人人_
> No reproducibility <
 ̄Y^Y^Y^Y^Y^Y^Y^Y^Y^Y ̄
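
One small step against version drift is to record exactly which versions an analysis ran with. The sketch below uses only the Python standard library; the function names `environment_snapshot` and `snapshot_json` are hypothetical, and it covers only Python packages, not external command-line tools.

```python
import json
import platform
from importlib import metadata


def environment_snapshot() -> dict:
    """Collect interpreter, OS, and installed Python package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


def snapshot_json() -> str:
    """Serialize the snapshot so it can be stored next to the analysis results."""
    return json.dumps(environment_snapshot(), indent=2)
```

Comparing two such snapshots, one from the laptop and one from the server, shows which package versions differ.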

Page 36: Short Introduction of software engineering for bioinformatics

Container virtualization (Docker)

• Bundle all third-party software into one container.

• Build once, run anywhere

• Version-controllable, with a GitHub-like hosting service

Easy to transport between servers

Develop the whole container as “software”

Page 37: Short Introduction of software engineering for bioinformatics

Progress of virtualization

chroot, cgroups: isolation of file and process space

KVM, VirtualBox: OS virtualization

Drawbacks:

• Heavy

• Provisioning is not easy

• Hard to use a base image

• (chroot carries) a risk that a single user exhausts the computing resources

Tries to take advantage of both

Page 38: Short Introduction of software engineering for bioinformatics

Problems of Docker

• Security problem: becoming root is a must

• Dockerfile problem: some bugs around caching? Peculiar syntax -> better to use Packer

• Portability problem: better to run on Linux kernel version >= 3.8

Emergence of counterforces

Cloudius OSv

Not user-friendly enough so far

Not enough community resources such as base images

Not mature enough to use

Page 39: Short Introduction of software engineering for bioinformatics

Technologies for DevOps

• Virtualization using containers

• Configuration management tools

http://blog.xebialabs.com/2014/12/05/rocket-vs-docker-myth-simple-lightweight-enterprise-platform/

Fabric

Page 40: Short Introduction of software engineering for bioinformatics

Infrastructure as code

• Maintain server configuration as code

• Ensure that operations are idempotent

• Easily transport pipelines between servers
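
Idempotent means that applying the same configuration step twice leaves the server in the same state as applying it once. The sketch below illustrates the idea in plain Python; it is not the API of Fabric or Chef, and `ensure_line` is a hypothetical helper.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file at `path`.

    Returns True if the file was changed, False if it was already in the desired state.
    """
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False  # already configured: do nothing
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(prefix + line + "\n")
    return True
```

Running the step a second time changes nothing, which is what makes it safe to reapply the whole configuration on every deployment.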

Tools (axes: Ruby-based vs. Python-based, simple vs. complex): Fabric, Chef Zero

• Chef requires users to learn a lot of jargon

• CloudBioLinux supports Fabric

Better to start with Fabric