
Machine Learning and Bioinformatics 機器學習與生物資訊學

Jan 11, 2016


Dennis McElroy

Transcript
Page 1: Machine Learning and Bioinformatics 機器學習與生物資訊學

Machine Learning & Bioinformatics 1

Molecular Biomedical Informatics Laboratory (分子生醫資訊實驗室)

Machine Learning and Bioinformatics (機器學習與生物資訊學)

Page 2: Machine Learning and Bioinformatics 機器學習與生物資訊學

Molecular biology

Nucleic acid
– DNA
– RNA

Central dogma
– Transcription
– Translation

Protein
– Amino acid
– Primary structure
– Secondary structure
– Tertiary structure

Page 3: Machine Learning and Bioinformatics 機器學習與生物資訊學

Nucleic acid

A nucleic acid is a macromolecule composed of chains of monomeric nucleotides.

In biochemistry, these molecules carry genetic information or form structures within cells.

The most common nucleic acids are deoxyribonucleic acid (DNA) and ribonucleic acid (RNA).

Page 5: Machine Learning and Bioinformatics 機器學習與生物資訊學

Nucleic acid components

Sugar


http://www.mun.ca/biology/scarr/Fg10_09b_revised.gif

Page 6: Machine Learning and Bioinformatics 機器學習與生物資訊學

Nucleic acid components

Base

Purine
– Adenine (A) and guanine (G)

Pyrimidine
– Thymine (T), cytosine (C)
– Uracil (U, only in RNA)
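The base classes listed on this slide can be written down as a small lookup. This is only a sketch; the names `PURINES`, `PYRIMIDINES`, and `base_class` are made up for illustration.

```python
# Base classes as listed on the slide (hypothetical helper names).
PURINES = frozenset("AG")        # adenine, guanine
PYRIMIDINES = frozenset("CTU")   # cytosine, thymine, uracil (U only in RNA)


def base_class(base: str) -> str:
    """Return 'purine' or 'pyrimidine' for a one-letter base code."""
    letter = base.upper()
    if letter in PURINES:
        return "purine"
    if letter in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"not a standard base: {base!r}")


print(base_class("A"))  # purine
print(base_class("U"))  # pyrimidine
```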

Page 9: Machine Learning and Bioinformatics 機器學習與生物資訊學

DNA

Chemically, DNA is a long polymer of simple units called nucleotides, with a backbone made of sugars and phosphate groups joined by ester bonds.

Attached to each sugar is one of four types of molecules called bases.

It is the sequence of these four bases along the backbone that encodes information.

http://upload.wikimedia.org/wikipedia/commons/8/87/DNA_orbit_animated_small.gif

Page 10: Machine Learning and Bioinformatics 機器學習與生物資訊學

DNA

Base pairing

Each type of base on one strand forms a bond with just one type of base on the other strand.

Here, purines form hydrogen bonds to pyrimidines, with A bonding only to T, and C bonding only to G.

DNA sequence
– 5’CpGpCpApApTpT
  3’TpTpApApCpGpC
– CGCGAATT
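The pairing rule above (A with T, C with G) is enough to derive one strand from the other. The following is a minimal sketch, assuming sequences are plain strings over A, C, G, T; the function names are illustrative only.

```python
# Watson-Crick pairing from the slide: A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Base-by-base complement, kept in the same left-to-right order."""
    return strand.upper().translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Partner strand written in its own 5'->3' direction."""
    return complement(strand)[::-1]


print(complement("CGCGAATT"))          # GCGCTTAA
print(reverse_complement("CGCGAATT"))  # AATTCGCG
```

The reverse step is needed because the two strands run antiparallel, so the partner strand read 5'->3' is the complement read backwards.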

Page 13: Machine Learning and Bioinformatics 機器學習與生物資訊學

Hydrogen bond

A hydrogen bond exists between an electronegative atom and a hydrogen atom bonded to another electronegative atom.

This type of force always involves a hydrogen atom. The strongest hydrogen bonds approach the energy of weak covalent bonds (about 155 kJ/mol), hence the name, although most are considerably weaker.

Biological functions
– DNA/RNA base pairing
– protein secondary/tertiary structure formation
– some properties of the water molecule
– antibody-antigen (and other protein-protein) binding
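As a rough illustration of hydrogen bonding in base pairing, the sketch below counts the hydrogen bonds in a fully paired DNA duplex. It assumes the canonical Watson-Crick counts of two bonds per A-T pair and three per G-C pair, which the slides do not state; the helper name is hypothetical.

```python
# Assumption (not from the slides): canonical Watson-Crick pairs have
# 2 hydrogen bonds for A-T and 3 for G-C.
BONDS_PER_PAIR = {"A": 2, "T": 2, "G": 3, "C": 3}


def duplex_hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds when `strand` is fully paired with its complement."""
    return sum(BONDS_PER_PAIR[base] for base in strand.upper())


print(duplex_hydrogen_bonds("CGCGAATT"))  # 20
```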

Page 14: Machine Learning and Bioinformatics 機器學習與生物資訊學

http://upload.wikimedia.org/wikipedia/commons/4/43/Liquid_water_hydrogen_bond.png

Hydrogen bonds result from differences in electronegativity

Page 16: Machine Learning and Bioinformatics 機器學習與生物資訊學

DNA structure


http://www.youtube.com/watch?v=qy8dk5iS1f0&NR=1

Page 17: Machine Learning and Bioinformatics 機器學習與生物資訊學

Any questions about DNA?

Page 18: Machine Learning and Bioinformatics 機器學習與生物資訊學

http://fig.cox.miami.edu/~cmallery/255/255hist/mcb4.1.dogma.jpg

Central dogma

Page 19: Machine Learning and Bioinformatics 機器學習與生物資訊學

Central dogma

The process by which information is extracted from the nucleotide sequence of a gene and then used to make a protein is essentially the same for all living things on Earth, and is described by the grandly named central dogma of molecular biology.

Information in cells passes from DNA to RNA to proteins.

http://upload.wikimedia.org/wikipedia/commons/3/3a/Crick's_1958_central_dogma.svg

Page 20: Machine Learning and Bioinformatics 機器學習與生物資訊學

RNA

Information stored in DNA is used to make a more transient, single-stranded polynucleotide called RNA (ribonucleic acid).

RNA is very similar to DNA, but differs in a few important structural details:
– in the cell, RNA is usually single-stranded, while DNA is usually double-stranded
– RNA nucleotides contain ribose, while DNA nucleotides contain deoxyribose (a type of ribose that lacks one oxygen atom)
– in RNA, the base uracil substitutes for thymine, which is present in DNA

Page 22: Machine Learning and Bioinformatics 機器學習與生物資訊學

Central dogma

Transcription

Transcription is the synthesis of RNA under the direction of DNA.

Both nucleic acid sequences use the same language, and the information is simply transcribed, or copied.

The DNA sequence is copied by RNA polymerase to produce a complementary RNA strand, called messenger RNA (mRNA).
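Transcription as described here can be sketched as building the RNA complement of the DNA template strand. The example assumes the template is given 5'->3' as a plain string and ignores promoters, termination, and RNA processing; `transcribe_template` is a made-up name.

```python
# DNA template base -> complementary RNA base (uracil replaces thymine).
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe_template(template_5to3: str) -> str:
    """Return the mRNA (5'->3') complementary to a DNA template strand."""
    paired = template_5to3.upper().translate(DNA_TO_RNA)
    # The strands are antiparallel, so reverse to read the mRNA 5'->3'.
    return paired[::-1]


print(transcribe_template("TTAGGCCAT"))  # AUGGCCUAA
```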

Page 23: Machine Learning and Bioinformatics 機器學習與生物資訊學

DNA transcription


http://www.youtube.com/watch?v=vJSmZ3DsntU

Page 25: Machine Learning and Bioinformatics 機器學習與生物資訊學

RNA

Various types

mRNA
– messenger RNA (mRNA) is the RNA that carries information from DNA to the ribosome
– the coding sequence of the mRNA determines the amino acid sequence in the protein that is produced

Non-coding RNA

Page 26: Machine Learning and Bioinformatics 機器學習與生物資訊學

Various RNA types

Non-coding RNA

Many RNAs do not code for protein.

These ncRNAs are encoded by their own genes (RNA genes) or derived from mRNA introns.

The most common ncRNAs are transfer RNA (tRNA) and ribosomal RNA (rRNA).

Other ncRNAs, such as microRNA (miRNA), are involved in post-transcriptional gene regulation.

Page 28: Machine Learning and Bioinformatics 機器學習與生物資訊學

Central dogma

Translation

Translation is the second stage of protein biosynthesis.

Translation occurs in the cytoplasm, where the ribosomes are located.

In translation, mRNA is decoded to produce a specific polypeptide according to the rules specified by the genetic code.
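Decoding mRNA with the genetic code can be sketched as a codon lookup. The example below builds the standard genetic code table and makes two simplifying assumptions that the slides do not state: translation starts at the first AUG in the string and stops at the first in-frame stop codon. All names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; '*' marks stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    residues = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


print(translate("GGAUGGCCUUUUAAGG"))  # MAF
```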

Page 29: Machine Learning and Bioinformatics 機器學習與生物資訊學

From RNA to protein synthesis


http://www.youtube.com/watch?v=NJxobgkPEAo

Page 30: Machine Learning and Bioinformatics 機器學習與生物資訊學

Protein translation


http://www.youtube.com/watch?v=nl8pSlonmA0

Page 32: Machine Learning and Bioinformatics 機器學習與生物資訊學

Any questions about the central dogma?

Page 33: Machine Learning and Bioinformatics 機器學習與生物資訊學

Protein


Page 34: Machine Learning and Bioinformatics 機器學習與生物資訊學

Protein

Proteins are large organic compounds made of amino acids arranged in a linear chain and joined together by peptide bonds between the carboxyl and amino groups of adjacent amino acid residues.

Proteins can also work together to achieve a particular function, and they often associate to form stable complexes.

Page 35: Machine Learning and Bioinformatics 機器學習與生物資訊學

Protein

Amino acid

In chemistry, an amino acid is a molecule that contains both amine and carboxyl functional groups.

In biochemistry, this term refers to alpha-amino acids with the general formula H2NCHRCOOH, where R is an organic substituent.

Page 37: Machine Learning and Bioinformatics 機器學習與生物資訊學

Amino acid

Various side chains

The various alpha-amino acids differ in which side chain (R group) is attached to their alpha carbon.

They can vary in size from just a hydrogen atom in glycine, through a methyl group in alanine, to a large heterocyclic group in tryptophan.

Page 41: Machine Learning and Bioinformatics 機器學習與生物資訊學


http://www.russell.embl-heidelberg.de/aas/other_images/lb3.gif

Page 42: Machine Learning and Bioinformatics 機器學習與生物資訊學

Amino acid

The building blocks of proteins

Amino acids combine in a condensation reaction, and the resulting amino acid residues are held together by peptide bonds.

Proteins are defined by their unique sequence of residues (primary structure).

Just as letters form various words, amino acids form a vast variety of sequences and proteins.
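To give a sense of how vast this variety is, the number of distinct sequences of a given length can be computed directly. The sketch assumes an alphabet of the 20 standard amino acids, a number the slides do not state explicitly; `sequence_space` is a hypothetical helper.

```python
def sequence_space(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences of `length` residues.

    Assumes each position can independently hold any of `alphabet_size`
    residues (20 standard amino acids by default).
    """
    return alphabet_size ** length


print(sequence_space(3))              # 8000 possible tripeptides
print(len(str(sequence_space(100))))  # 131 digits for a 100-residue chain
```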

Page 46: Machine Learning and Bioinformatics 機器學習與生物資訊學

Protein

After knowing amino acids

Amino acids form short polymer chains called peptides, or longer chains called either polypeptides or proteins.

The process of forming such chains from an mRNA template (obeying the genetic code) is known as translation, which is part of protein biosynthesis.

Page 47: Machine Learning and Bioinformatics 機器學習與生物資訊學

Protein structure hierarchy


Page 52: Machine Learning and Bioinformatics 機器學習與生物資訊學

Protein structure hierarchy

Secondary structure

In biochemistry and structural biology, secondary structure is the general three-dimensional form of local segments of biopolymers such as proteins and nucleic acids.

It does not, however, describe specific atomic positions in three-dimensional space, which are considered to be tertiary structure.
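Because secondary structure describes local segments rather than atomic positions, it is often stored as one label per residue. The sketch below assumes a common three-letter convention (H for helix, E for strand, C for coil), which is not taken from the slides, and groups consecutive identical labels into segments; the example string is invented.

```python
from itertools import groupby


def segments(labels: str) -> list[tuple[str, int, int]]:
    """Split a per-residue label string into (label, start, end) runs.

    Positions are 0-based and inclusive. The H/E/C labels are an assumed
    convention, not something defined in the slides.
    """
    result = []
    position = 0
    for label, run in groupby(labels):
        length = len(list(run))
        result.append((label, position, position + length - 1))
        position += length
    return result


# Invented 12-residue example.
print(segments("CCHHHHCCEEEC"))
# [('C', 0, 1), ('H', 2, 5), ('C', 6, 7), ('E', 8, 10), ('C', 11, 11)]
```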

Page 54: Machine Learning and Bioinformatics 機器學習與生物資訊學

Protein structure hierarchy

Tertiary structure

The three-dimensional structure of a protein or any other macromolecule, as defined by the atomic coordinates.

Describes the spatial relations among its secondary structures.

Tertiary structure is considered to be largely determined by the protein’s primary sequence.
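Since tertiary structure is defined by atomic coordinates, spatial relations reduce to geometry on those coordinates. The following is a minimal sketch with invented coordinates and a hypothetical `Atom` type; real structures would be read from a structure file, which is not shown here.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    """A hypothetical atom record: a name plus Cartesian coordinates."""
    name: str
    x: float
    y: float
    z: float


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in the coordinates' own units."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


# Invented coordinates, for illustration only.
first = Atom("CA", 0.0, 0.0, 0.0)
second = Atom("CA", 3.0, 4.0, 0.0)
print(distance(first, second))  # 5.0
```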

Page 55: Machine Learning and Bioinformatics 機器學習與生物資訊學

Protein tertiary structure

Experimental techniques

The majority of protein structures have been solved with X-ray crystallography.

The second most common method is NMR (nuclear magnetic resonance)
– lower resolution
– limited to small proteins
– provides time-dependent information in solution

Page 57: Machine Learning and Bioinformatics 機器學習與生物資訊學

Protein structure hierarchy

Quaternary structure

Many proteins are actually assemblies of more than one polypeptide chain, which in the context of the larger assemblage are known as protein subunits.

In addition to the tertiary structure of the subunits, multiple-subunit proteins possess a quaternary structure, which is the arrangement into which the subunits assemble.

http://courses.cm.utexas.edu/jrobertus/ch339k/overheads-1/ch6_quat-struct1.jpg

Page 58: Machine Learning and Bioinformatics 機器學習與生物資訊學

Protein sub-structure


Page 59: Machine Learning and Bioinformatics 機器學習與生物資訊學

Protein sub-structure

Domain

A part of a protein's sequence and structure that can evolve, function, and exist independently.

About 25–500 aa (amino acids) long.

Often forms a functional unit.

http://upload.wikimedia.org/wikipedia/commons/6/67/1pkn.png

Page 60: Machine Learning and Bioinformatics 機器學習與生物資訊學

http://upload.wikimedia.org/wikipedia/commons/7/79/Zinc_finger_DNA_complex.png

Zinc fingers are small protein structural motifs that can coordinate zinc ions to help stabilize their folds

Page 61: Machine Learning and Bioinformatics 機器學習與生物資訊學

Protein sub-structure

Motif

A sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and often has biological significance.

For proteins, a sequence motif is distinguished from a structural motif, which is formed by the three-dimensional arrangement of amino acids that may not be adjacent in the sequence.
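A sequence motif can often be expressed as a regular expression over the one-letter amino acid codes. As an example that is not taken from the slides, the sketch below searches for the N-glycosylation consensus pattern N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline). The function name and test sequence are invented.

```python
import re

# N-{P}-[ST]-{P} written as a regex; the lookahead lets overlapping hits be found.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for every motif occurrence."""
    return [
        (match.start(), match.group(1))
        for match in N_GLYCOSYLATION.finditer(sequence.upper())
    ]


# Invented peptide: the N at index 6 is followed by P, so it does not match.
print(find_motif("MNGTAANPSQ"))  # [(1, 'NGTA')]
```

A pattern match only flags a candidate site; whether it is biologically meaningful depends on context the sequence alone does not provide.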

Page 62: Machine Learning and Bioinformatics 機器學習與生物資訊學

Protein sub-structure

Structural motif

A 3D structural element or fold that also appears in a variety of other molecules.

In the context of proteins, the term is sometimes used interchangeably with “structural domain,” although a domain need not be a motif, nor, if it contains a motif, need it be made up of only one.

Page 63: Machine Learning and Bioinformatics 機器學習與生物資訊學
Page 66: Machine Learning and Bioinformatics 機器學習與生物資訊學

Molecular biology

Reference

Website of Prof. Rong-Huay Juang, National Taiwan University
– http://juang.bst.ntu.edu.tw/BC2008/index.htm

Molecular biology website, National Chiao Tung University
– http://www.life.nctu.edu.tw/~mb/c40101.htm

Page 67: Machine Learning and Bioinformatics 機器學習與生物資訊學

Any questions about molecular biology?