Learning Life through Bioinformatics…
What is Bioinformatics?
What is Bio-Bio-1?
Fokhruz Zaman
Founder & Evangelist, Bio-Bio-1
4th July, 2013; UODA BI Workshop, Dhaka
What is Bioinformatics?
Finding patterns in molecular biological data
Implies:
• managing molecular biological data
• identifying correlations in molecular biological data
Goals:
• characterise biological patterns & processes
• predict biological properties
Bioinformatics: neighbour disciplines
• Computational biology
– Broader concept: includes computational ecology, physiology, neurology etc.
• -omics:
– Genomics
– Transcriptomics
– Proteomics
• Systems biology
– Putting it all together...
– Building models, identifying control & regulation
Bioinformatics: prerequisites
• Bio- side:
– Molecular biology
– Cell biology
– Genetics
– Evolutionary theory
• -informatics side:
– Computer science
– Statistics
– Theoretical physics
Central Dogma of Molecular Biology
GENOTYPE (e.g. Aa) → PHENOTYPE (pink)
GENE (DNA) → MESSENGER (RNA) → PROTEIN → TRAIT
DNA: ATGCAAGTCCACTGTATTCCA
RNA: UACGUUCAGGUGACAUAAGGU
Processes: replication (DNA → DNA), transcription (DNA → RNA), reverse transcription (RNA → DNA), translation (RNA → protein)
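The transcription step on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the DNA strand shown is read as the template and copied base by base (A→U, T→A, G→C, C→G). Strand direction (5'→3') is ignored, and the names `DNA_TO_RNA` and `transcribe_template` are made up for this example.

```python
# Base-pairing rules for copying a DNA template strand into RNA.
# (Illustrative helper; strand orientation is deliberately ignored.)
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(dna: str) -> str:
    """Return the RNA complement of a DNA template strand, base by base."""
    try:
        return "".join(DNA_TO_RNA[base] for base in dna.upper())
    except KeyError as err:
        # Anything other than A, T, G, C is rejected rather than guessed at.
        raise ValueError(f"not a DNA base: {err.args[0]!r}") from None


if __name__ == "__main__":
    # The DNA string from the slide.
    print(transcribe_template("ATGCAAGTCCACTGTATTCCA"))
```

Translation (RNA → protein) would need a codon table on top of this and is left out to keep the sketch short.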
Molecular biology data...
• DNA sequences (FASTA format)
>alpha-D
ATGCTGACCGACTCTGACAAGAAGCTGGTCCTGCAGGTGTGGGAGAAGGTGATCCGCCACCCAGACTGTGGAGCCGAGGCCCTGGAGAGGTGCGGGCTGAGCTTGGGGAAACCATGGGCAAGGGGGGCGACTGGGTGGGAGCCCTACAGGGCTGCTGGGGGTTGTTCGGCTGGGGGTCAGCACTGACCATCCCGCTCCCGCAGCTGTTCACCACCTACCCCCAGACCAAGACCTACTTCCCCCACTTCGACTTGCACCATGGCTCCGACCAGGTCCGCAACCACGGCAAGAAGGTGTTGGCCGCCTTGGGCAACGCTGTCAAGAGCCTGGGCAACCTCAGCCAAGCCCTGTCTGACCTCAGCGACCTGCATGCCTACAACCTGCGTGTCGACCCTGTCAACTTCAAGGCAGGCGGGGGACGGGGGTCAGGGGCCGGGGAGTTGGGGGCCAGGGACCTGGTTGGGGATCCGGGGCCATGCCGGCGGTACTGAGCCCTGTTTTGCCTTGCAGCTGCTGGCGCAGTGCTTCCACGTGGTGCTGGCCACACACCTGGGCAACGACTACACCCCGGAGGCACATGCTGCCTTCGACAAGTTCCTGTCGGCTGTGTGCACCGTGCTGGCCGAGAAGTACAGATAA
>alpha-A
ATGGTGCTGTCTGCCAACGACAAGAGCAACGTGAAGGCCGTCTTCGGCAAAATCGGCGGCCAGGCCGGTGACTTGGGTGGTGAAGCCCTGGAGAGGTATGTGGTCATCCGTCATTACCCCATCTCTTGTCTGTCTGTGACTCCATCCCATCTGCCCCCATACTCTCCCCATCCATAACTGTCCCTGTTCTATGTGGCCCTGGCTCTGTCTCATCTGTCCCCAACTGTCCCTGATTGCCTCTGTCCCCCAGGTTGTTCATCACCTACCCCCAGACCAAGACCTACTTCCCCCACTTCGACCTGTCACATGGCTCCGCTCAGATCAAGGGGCACGGCAAGAAGGTGGCGGAGGCACTGGTTGAGGCTGCCAACCACATCGATGACATCGCTGGTGCCCTCTCCAAGCTGAGCGACCTCCACGCCCAAAAGCTCCGTGTGGACCCCGTCAACTTCAAAGTGAGCATCTGGGAAGGGGTGACCAGTCTGGCTCCCCTCCTGCACACACCTCTGGCTACCCCCTCACCTCACCCCCTTGCTCACCATCTCCTTTTGCCTTTCAGCTGCTGGGTCACTGCTTCCTGGTGGTCGTGGCCGTCCACTTCCCCTCTCTCCTGACCCCGGAGGTCCATGCTTCCCTGGACAAGTTCGTGTGTGCCGTGGGCACCGTCCTTACTGCCAAGTACCGTTAA
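Sequence records like these are usually the first thing a bioinformatics script has to read. The sketch below is one simple way to do it, assuming the common FASTA convention: a header line starting with ">" followed by one or more sequence lines. The function names `parse_fasta` and `gc_content` are made up for illustration, and the `EXAMPLE` records are shortened prefixes of the two sequences on this slide, not the full data.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record name (header text after '>') to its joined sequence."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequence lines of one record may be wrapped; join them.
            records[name] += line.upper()
    return records


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Shortened, illustrative records (only the first bases of each sequence).
EXAMPLE = """>alpha-D
ATGCTGACCGAC
TCTGACAAG
>alpha-A
ATGGTGCTGTCT
"""

if __name__ == "__main__":
    for name, seq in parse_fasta(EXAMPLE.splitlines()).items():
        print(name, len(seq), round(gc_content(seq), 2))
```

For real work, an established library such as Biopython is the safer choice. The point here is only to show how little structure the format has.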
Molecular biology data...
• Amino acid sequences
• Protein structure:
– X-ray crystallography
– NMR (Nuclear Magnetic Resonance) Spectroscopy
Cell biology & proteomics data...
• Subcellular localization
Cell biology & proteomics data...
• Protein-protein interactions
Transcriptomics: DNA microarray technology
Proteomics & transcriptomics data
Proteins encoded by periodically expressed genes: a functionally diverse protein category
Phenotype data: human diseases
Jute Genome Sequencing
• Genome sequence of jute (and with it the mystery of jute's origin) announced by Bangladeshi scientists, June 2010
• Opens up a new vista for developing varieties of the world's most biodegradable natural fibre
Abiotic Stress tolerance and Crop Improvement
National Plant Genome Initiative (2009-2013)
1. Expand genomic resources for every major plant of economic importance
• Understanding of plant epigenomes
• Mining plant diversity
• Survey sequence resources for thousands of plants
• A new kind of reference genome
• Integrated comparative sequence resources
2. Advance plant systems biology
• Toolkits to enable systems-level analysis of key plant processes
• Regulation of plant structure and composition
3. Translate basic discovery to the field
• High-throughput phenotyping under field conditions
• Breeding for improved local adaptation to biotic and abiotic stress
• A National Genetic Trait Index
Ref: Shortliffe, 1995
Bioinformatics & Human Health
Bioinformatics as in-silico biology
- generates testable hypotheses for the biologist
- explores domains that cannot be addressed experimentally
Translational Bioinformatics / Medicine
Bioinformatics & Human Health
Would we want to live longer, healthier?
Would we benefit from better crops?
Bioinformatics
Why Bioinformatics?
Traditional Methods of Drug Discovery
natural (plant-derived) treatment for illness / ailments
↓
isolation of active compound (small, organic)
Traditional Methods of Drug Discovery
synthesis of compound
↓
manipulation of structure to get better drug (greater efficacy, fewer side effects)
e.g. Aspirin
Modern Methods of Drug Discovery
What’s different?
• Drug discovery process begins with a disease (rather than a treatment)
• Use disease model to pinpoint relevant genetic / biological components (i.e. possible drug targets)
Modern Drug Discovery
disease → genetic / biological target
↓
discovery of a “lead” molecule
- design assay to measure function of target
- use assay to look for modulators of target’s function
↓
high throughput screen (HTS)
- to identify “hits” (compounds with binding in low nM to low μM range)
Modern Drug Discovery
small molecule hits
↓
manipulate structure to increase potency
i.e. decrease Ki into the low nM range
↓
*optimization of lead molecule into candidate drug*
fulfillment of required pharmacological properties: potency, absorption, bioavailability, metabolism, safety
↓
clinical trials
Interesting facts...
• Over 90% of drugs entering clinical trials fail to make it to market
• The average cost to bring a new drug to market is estimated at $770 million
Drug discovery and development
• It costs billions of USD and takes ~11 years
• In 2010, the cost of bringing a new drug to market was USD 1.2 billion
• 4 out of 5 drugs fail during Phase 1 of clinical trials
In-silico methods
• save an average of $130 million and 0.8 years per drug (2003)
Beowulf Cluster Computing
Each Computer in the cluster is equipped with:
– Intel Core 2 Duo 6400 Processor (Master: Core 2 Duo 6700)
– 2 Gigabytes of DDR RAM in Dual Channel
– D-Link Gigabit Network Interface Card (Master: 2x Cards)
– 60 Gigabyte Hard Drive (Master: 1000 Gigabyte RAID 5)
Sample Cluster Computer
CLUSTER USES: Clusters have a variety of applications. They are used in bioinformatics to run DNA string matching algorithms or protein folding applications. Geologists also use clusters to simulate and predict earthquakes and to model the interior of the Earth and the sea floor. Clusters are even used to render and manipulate high-resolution graphics in engineering. Our completed Beowulf cluster will use BLAST (Basic Local Alignment Search Tool) to analyze massive sets of DNA sequences for bioinformatics research.
Researcher: Ben Case
Researcher: Stephen Ciesla
Advisor: Ed Harcout
Biology Consultant: Lorraine Olendzenski
Node Computers
Master Computer
PROJECT: We constructed a parallel processing computer system using the Beowulf cluster computing design created at NASA in an attempt to build a powerful computer that could assist in Bioinformatics research and data analysis.
BEOWULF CLUSTERS: A Beowulf Cluster is a computer design that uses parallel processing across multiple computers to create cheap and powerful supercomputers. A Beowulf Cluster in practice is usually a collection of generic computers, either stock systems or wholesale parts purchased independently and assembled, connected through an internal network.
A cluster has two types of computers, a master computer, and node computers. When a large problem or set of data is given to a Beowulf cluster, the master computer first runs a program that breaks the problem into small discrete pieces; it then sends a piece to each node to compute. As nodes finish their tasks, the master computer continually sends more pieces to them until the entire problem has been computed.
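The master/node division of labour described above can be imitated on a single machine. The sketch below is only an analogy, not the poster authors' code. It assumes worker threads from Python's standard `concurrent.futures` module stand in for the node computers, and the "problem" is counting G and C bases in a DNA string. The names `split_into_pieces`, `node_task` and `master` are made up for this example. A real Beowulf cluster would use message passing between separate computers instead of threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_pieces(sequence: str, piece_size: int) -> List[str]:
    """Master step 1: break the problem into small, independent pieces."""
    return [sequence[i:i + piece_size] for i in range(0, len(sequence), piece_size)]


def node_task(piece: str) -> int:
    """Work done by one node: count the G and C bases in its piece."""
    return sum(1 for base in piece if base in "GC")


def master(sequence: str, piece_size: int = 4, nodes: int = 7) -> int:
    """Hand pieces to the worker pool and combine the partial results."""
    pieces = split_into_pieces(sequence, piece_size)
    # Threads are a stand-in for node computers (assumption for this sketch).
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        # Queued pieces are picked up by whichever worker becomes free.
        partial_counts = pool.map(node_task, pieces)
        return sum(partial_counts)


if __name__ == "__main__":
    print(master("ATGCAAGTCCACTGTATTCCA"))
```

The default of seven workers is arbitrary. What matters is that the pieces are independent, so the partial results can be computed in any order and simply added up at the end.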
MPICH2: For the master and node computers to communicate, some sort of message-passing control structure is required. MPI (Message Passing Interface) is the most commonly used such structure, and the one that we have incorporated into our project. MPICH2 is an implementation of MPI that was specifically designed for use with cluster computing systems and parallel processing. It is an open-source set of libraries for various high-level programming languages that gives programmers tools to control how large problems are broken apart and distributed to the computers in a cluster.
OUR CLUSTER: Built with funding from the Biology department, our cluster contains eight computers: one master and seven nodes. Each computer contains a dual-core processor, giving us a total of 16 processor cores to use. Each runs the Fedora Core 6 version of Linux and uses the MPICH2 libraries for message passing. They are all connected on an internal network through a high-speed gigabit switch.
2 GB RAM
SATA Hard Drives
D-Link Network Card
Intel Core 2 Processor
RESULTS: The total processing power of our cluster has yet to be determined. Once the cluster has been completely streamlined and stabilized, we will run benchmark tests to measure its average and peak performance.
CLUSTER LAYOUT AND DESIGN:
Expertise level
SYS  ALGO  STAT  VERB  LUCK  BIO
apprentice ~ 2,000 hours
mastery ~ 10,000 hours
critical weakness – below freshman level knowledge
Let’s ALL try to be Bioinformaticians!
SYS  ALGO  STAT  VERB  LUCK (Serendipity)  BIO
Let’s ALL try to be Bioinformaticians!
SYStems: ability to identify, understand, run, troubleshoot existing bioinformatics tools and techniques. (near mastery skill needed ASAP)
ALGOrithms: ability to create new algorithms or to implement them as software tools. (near freshman skill needed to start)
STATistics: ability to identify proper statistical method and to devise a new statistical approach to extract knowledge from data. (above apprentice skill needed ASAP)
BIOlogy: ability to interpret bioinformatics results in the proper biological context. (well above apprentice skill needed ASAP)
VERBal: ability to understand the needs of individuals from diverse backgrounds and to communicate with them in the language of their discipline. (near mastery skill needed ASAP)
LUCK: ability to be in the right place at the right time and have the skill to work on unexpected tasks. (CHANCE favors ONLY the Prepared Minds !!!)
Five websites that all Bioinformaticians should know
• NCBI (The National Center for Biotechnology Information)
– http://www.ncbi.nlm.nih.gov/
• EBI (The European Bioinformatics Institute)
– http://www.ebi.ac.uk/
• The Canadian Bioinformatics Resource
– http://www.cbr.nrc.ca/
• SwissProt/ExPASy (Swiss Bioinformatics Resource)
– http://expasy.cbr.nrc.ca/sprot/
• PDB (The Protein Databank)
– http://www.rcsb.org/PDB/
• First:
– Fix your critical weakness.
• Second:
– Where should you invest next?
• To strengthen your stronger skills? … Or …
• To improve on your weaker skills?
• Answer ???
Continued…
Let’s ALL try to be Bioinformaticians!
Invest in what you are already good at!
People with complementary skill-sets are more valuable
Differentiate yourself
Pick a paper that interests you and redo it! Compute the same quantities for a different genome/annotations
Strengthen Your Stronger Skills
So… what is Bio-Bio-1?
• A TEAM with Passion for Learning Life through Bioinformatics Knowledge and Skills…
• A TEAM with BIG Dreams to help the Bioinformatics discipline flourish in Bangladesh and in the World…
• A voluntary not-for-profit organization, formed by some passionate individuals in late 2008 to learn Bioinformatics and make some sense of the enigma of life.
• Aims to spread the R&D excitement by infecting young individuals through several programs (weekly study circles, workshops, etc.)
Bio-Bio-1 Main Objectives
• Learn Bioinformatics from closely interacting multiple academic and professional disciplines, including:
– Life Sciences, Computing Sciences, Mathematics, Statistics, Software Engineering, High Performance Computing and Large Scale Database optimization
• Popularize and spread the need for Bioinformatics learning among the local students and professionals in Bangladesh.
• Procure offshore programming and development projects in Bioinformatics from abroad.
• Write practical handbooks in Bioinformatics and publish papers in reputed journals.
Bio-Bio-1 Current Main Activities
• Organizing regular weekly Bioinformatics Study Circle
– Every Saturday at KAL Gallery, Dept. of Biochemistry and Molecular Biology, University of Dhaka.
• Organizing Hands-On Bioinformatics Boot Camps & Workshops.
• Working collaboratively with the Microbial Genetics and Bioinformatics Lab, Dept. of Microbiology, University of Dhaka, on the projects:
– “FMDV vaccine design and development” with Prof. Dr. Anwar Hossain.
– With Prof. Dr. Mojammel Hoque, identifying the factors that increase the productivity and enzyme activity of protease and keratinase of Bacillus licheniformis.
Bio-Bio-1 Prime Movers (& Shakers)
• Zohirul Alam Tiemoon
– Founder and General Manager, Bio-Bio-1. Consultant, BASIS (www.basis.org.bd), Dhaka.
– Leading the Bio-Bio-1 v3.0 with new vigor!
• Saddam Hossain
– Founding Core Member and Chief Researcher, Bio-Bio-1. Head, Business Intelligence, Airtel Bangladesh
– Re-incarnated Bio-Bio-1 from long hibernation! Leading the R&D Projects both in Bio-Bio-1 v2.0 and v3.0!
• Farjana Khatun
– Founding Core Member and Coordinator, Bio-Bio-1. Lecturer, Department of Pharmacy, East West University
– The Pioneer Biology knowledge provider in Bio-Bio-1. Patient and Persistent Coordinator!
• Mosharraf Hossain
– Core Member and Coordinator, Bio-Bio-1. Head, Operations Support System & Business Support System, Novo Tel Ltd.
– Passionate Mentor for the Bio-Bio-1 Study Circle Participants!
• Arafat Rahman
– Core Member and Researcher, Bio-Bio-1. Research Student, Microbial Genetics and Bioinformatics Lab, Dept. of Microbiology, University of Dhaka.
– Great Analytical Mind with knowledge and interest in diverse scientific disciplines! Still a great Bio-Bio-1 Anchor!
• Arif Ashraf Opu
– Core Member and Researcher, Bio-Bio-1. Research Student, Plant Biotechnology Lab, University of Dhaka
– Great Out-of-the-Box Thinker with lucid Presentation Skills!
Credits / Acknowledgments
• www.cbs.dtu.dk/phdcourse/cookbooks/What_is_bioinformatics.ppt
• Prof. Zeba Islam Seraj, Professor, Dept of Biochemistry and Molecular Biology, Dhaka University
• Prof. Supten Sarbadhikary, India - Chair, HL7 India, Visiting Professor, Dept of Health Informatics, Bangladesh University of Health Sciences
• The Bio-Bio-1 CORE TEAM
• The Google & The WWW
Thank You very much for all your time and patience
Hope you will enjoy this 2-day workshop at UODA!