2015 bioinformatics bio_python_part4

Jan 11, 2017

Transcript
Page 1: 2015 bioinformatics bio_python_part4
Page 2: 2015 bioinformatics bio_python_part4

FBW1-12-2015

Wim Van Criekinge

Page 3: 2015 bioinformatics bio_python_part4

GitHub: Hosted GIT

• Largest open source git hosting site
• Public and private options
• User-centric rather than project-centric
• http://github.ugent.be (use your UGent login and password)
  – Accept invitation from Bioinformatics-I-2015
• URI:
  – https://github.ugent.be/Bioinformatics-I-2015/Python.git

Page 4: 2015 bioinformatics bio_python_part4

Control Structures

if condition:
    statements
[elif condition:
    statements] ...
else:
    statements

while condition:
    statements

for var in sequence:
    statements

break
continue
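A minimal sketch of these constructs applied to a short peptide. The peptide string and the two residue groupings are illustrative assumptions, not taken from the slides.

```python
# Hypothetical example peptide; '*' stands for a stop symbol.
peptide = "MKVLAD*GG"

# Illustrative residue groupings (assumed for this sketch only).
HYDROPHOBIC = set("AVLIMFWC")
CHARGED = set("DEKR")

counts = {"hydrophobic": 0, "charged": 0, "other": 0}

for residue in peptide:            # for var in sequence
    if residue == "*":
        break                      # leave the loop at the stop symbol
    if residue == "M":
        continue                   # skip methionine, only to demonstrate continue
    if residue in HYDROPHOBIC:     # if / elif / else
        counts["hydrophobic"] += 1
    elif residue in CHARGED:
        counts["charged"] += 1
    else:
        counts["other"] += 1

# while: walk forward until the first charged residue is found
first_charged = 0
while first_charged < len(peptide) and peptide[first_charged] not in CHARGED:
    first_charged += 1

print(counts, first_charged)
```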

Page 5: 2015 bioinformatics bio_python_part4

Extra Questions (2)

• How many human proteins are in Swiss-Prot?
• What is the longest human protein? The shortest?
• Calculate the MW and pI for all human proteins and display them as two histograms (or a 2D scatter plot?)
• How many human proteins have “cancer” in their description?
• Which gene has the highest number of SNPs/somatic mutations (COSMIC)?
• How many human DNA-repair enzymes are represented in Swiss-Prot (using description / GO)?
• List proteins that contain only alpha-helices based on the Chou-Fasman algorithm
• List proteins based on the number of predicted transmembrane regions (Kyte-Doolittle)
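One possible starting point for the first questions, sketched on a toy data set. The three records below are made up for illustration; only the `OS=` organism field mirrors the UniProt FASTA header convention. With a real Swiss-Prot download, the in-memory string would be replaced by an open file, and the hand-written parser could be swapped for Biopython's `SeqIO.parse(handle, "fasta")`.

```python
from io import StringIO

# Toy records in UniProt-style FASTA; accessions, names and sequences are invented.
TOY_FASTA = """\
>sp|X00001|AAA_HUMAN Toy protein A OS=Homo sapiens OX=9606
MKTAYIAKQR
QISFVKSHFS
>sp|X00002|BBB_MOUSE Toy protein B OS=Mus musculus OX=10090
MDILCEENTSLSS
>sp|X00003|CCC_HUMAN Toy cancer-related protein C OS=Homo sapiens OX=9606
MSTNP
"""


def read_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA-formatted handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Keep only records whose header names Homo sapiens as the organism.
human = [(h, s) for h, s in read_fasta(StringIO(TOY_FASTA))
         if "OS=Homo sapiens" in h]

longest = max(human, key=lambda rec: len(rec[1]))
shortest = min(human, key=lambda rec: len(rec[1]))
with_cancer = [h for h, _ in human if "cancer" in h.lower()]

print(len(human), "human proteins")
print("longest: ", longest[0].split()[0], len(longest[1]))
print("shortest:", shortest[0].split()[0], len(shortest[1]))
print(len(with_cancer), "with 'cancer' in the description")
```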

Page 6: 2015 bioinformatics bio_python_part4

Primary sequence reveals important clues about a protein

DnaG E. coli   ...EPNRLLVVEGYMDVVAL...
DnaG S. typ    ...EPQRLLVVEGYMDVVAL...
DnaG B. subt   ...KQERAVLFEGFADVYTA...
gp4 T3         ...GGKKIVVTEGEIDMLTV...
gp4 T7         ...GGKKIVVTEGEIDALTV...

: *: :: * * : :

Legend: small hydrophobic, large hydrophobic, polar, positive charge, negative charge

• Evolution conserves amino acids that are important to protein structure and function across species. Sequence comparison of multiple “homologs” of a particular protein reveals highly conserved regions that are important for function.

• Clusters of conserved residues are called “motifs” -- motifs carry out a particular function or form a particular structure that is important for the conserved protein.
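The idea of conserved positions can be checked directly on the five fragments shown on this slide. The sketch below assumes the fragments are already aligned without gaps, and reports only strictly identical columns; the function name `conserved_columns` is a hypothetical helper.

```python
# Fragments as shown on the slide, assumed to be aligned column by column.
fragments = {
    "DnaG E. coli":  "EPNRLLVVEGYMDVVAL",
    "DnaG S. typ":   "EPQRLLVVEGYMDVVAL",
    "DnaG B. subt":  "KQERAVLFEGFADVYTA",
    "gp4 T3":        "GGKKIVVTEGEIDMLTV",
    "gp4 T7":        "GGKKIVVTEGEIDALTV",
}


def conserved_columns(seqs):
    """Return 0-based column indices where all sequences share one residue."""
    seqs = list(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length (pre-aligned)")
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) == 1]


cols = conserved_columns(fragments.values())
residues = "".join(fragments["DnaG E. coli"][i] for i in cols)
print(cols, residues)
```

Columns that differ only between similar residues (for example I, L and V) are not counted here; scoring those would need a substitution matrix or residue classes.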

motif

Page 7: 2015 bioinformatics bio_python_part4

The hydropathy index of an amino acid is a number representing the hydrophobic or hydrophilic properties of its side-chain.

It was proposed by Jack Kyte and Russell Doolittle in 1982.

The larger the number is, the more hydrophobic the amino acid. The most hydrophobic amino acids are isoleucine (4.5) and valine (4.2). The most hydrophilic ones are arginine (-4.5) and lysine (-3.9).

This is very important in protein structure; hydrophobic amino acids tend to be internal in the protein 3D structure, while hydrophilic amino acids are more commonly found towards the protein surface.

Hydropathy index of amino acids
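The scale can be written down as a lookup table. The values below are the Kyte-Doolittle hydropathy indices for the 20 standard amino acids; the helper `mean_hydropathy` is a hypothetical name introduced for this sketch.

```python
# Kyte-Doolittle hydropathy index, keyed by one-letter amino acid code.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Average hydropathy index over all residues of a sequence."""
    return sum(KD[aa] for aa in seq) / len(seq)


# Rank residues from most hydrophobic to most hydrophilic.
ranked = sorted(KD, key=KD.get, reverse=True)
print("most hydrophobic:", ranked[:2])
print("most hydrophilic:", ranked[-2:][::-1])
print(mean_hydropathy("EPNRLLVVEGYMDVVAL"))
```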

Page 8: 2015 bioinformatics bio_python_part4

5-hydroxytryptamine receptor 2A isoform 1 [Homo sapiens]
NCBI Reference Sequence: NP_000612.1

>gi|10835175|ref|NP_000612.1| 5-hydroxytryptamine receptor 2A isoform 1 [Homo sapiens]
MDILCEENTSLSSTTNSLMQLNDDTRLYSNDFNSGEANTSDAFNWTVDSENRTNLSCEGCLSPSCLSLLHLQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYRWPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVGISMPIPVFGLQDDSKVFKEGSCLLADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKEATLCVSDLGTRAKLASFSFLPQSSLSSEKLFQRSIHREPGSYTGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESCNEDVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENKKPLQLILVNTIPALAYKSSQLQMGQKKNSKQDAKTTDNDCSMVALGKQHSEEASKDNSDGVNEKVSCV

Page 9: 2015 bioinformatics bio_python_part4

Kyte-Doolittle Hydropathy Plot (http://gcat.davidson.edu/DGPB/kd/kyte-doolittle.htm)
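A hydropathy plot of this kind is a sliding-window average of the per-residue index. The sketch below computes the profile values only (no plotting); the function name `hydropathy_profile` and the choice of a 19-residue window in the example are assumptions for illustration. The test segment is a stretch of the 5-HT2A sequence shown earlier.

```python
# Kyte-Doolittle hydropathy index, keyed by one-letter amino acid code.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every full window; entry i covers seq[i:i+window]."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(seq) - window + 1)
    ]


# Stretch taken from the 5-HT2A receptor sequence (NP_000612.1).
segment = "SALLTAVVIILTIAGNILVIMAVSLEKKLQ"
profile = hydropathy_profile(segment, window=19)
peak = max(profile)
print(len(profile), "windows; highest mean hydropathy:", round(peak, 2))
```

To draw the plot, the profile values would be plotted against the window's centre position, `start + window // 2`.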

Page 10: 2015 bioinformatics bio_python_part4

Possible transmembrane fragment

Page 11: 2015 bioinformatics bio_python_part4

Window size = 9: strong negative peaks indicate possible surface regions

Surface region of a protein
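Candidate surface regions can be picked out by scanning for runs of strongly negative windows. In this sketch the window size of 9 follows the slide, while the cutoff of -2.0 and the function name `low_hydropathy_regions` are assumptions chosen for illustration; a suitable cutoff would have to be judged from the plot itself.

```python
# Kyte-Doolittle hydropathy index, keyed by one-letter amino acid code.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def low_hydropathy_regions(seq, window=9, threshold=-2.0):
    """Return (start, end) spans, 0-based and end-exclusive, that cover runs
    of consecutive windows whose mean hydropathy is below the threshold."""
    scores = [KD[aa] for aa in seq]
    regions = []
    run_start = None
    run_end = None
    for start in range(len(seq) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean < threshold:
            if run_start is None:
                run_start = start          # a new run of negative windows begins
            run_end = start + window       # extend the run to this window's end
        elif run_start is not None:
            regions.append((run_start, run_end))
            run_start = None
    if run_start is not None:              # close a run that reaches the end
        regions.append((run_start, run_end))
    return regions


# C-terminal stretch of the 5-HT2A receptor sequence (NP_000612.1).
tail = "KKNSKQDAKTTDNDCSMVALGKQHSEEASKDNSDGVNEKVSCV"
for start, end in low_hydropathy_regions(tail):
    print(start, end, tail[start:end])
```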

Page 12: 2015 bioinformatics bio_python_part4
Page 13: 2015 bioinformatics bio_python_part4

Prediction of transmembrane helices in proteins (TMHMM)

Page 14: 2015 bioinformatics bio_python_part4

5-hydroxytryptamine receptor 2A (Mus musculus)

Page 15: 2015 bioinformatics bio_python_part4

5-hydroxytryptamine receptor 2 (Graphical output)

Page 16: 2015 bioinformatics bio_python_part4

Examen.py

http://bioinformatics.biobix.be/examen/

Check availability of PC rooms plus additional dates