BIOINFORMATICS I, EXERCISE 2 http://icbi.at/bioinf


mRNA processing


splicing

Spliceosome assembly

[Figure: spliceosome assembly on the intron: snRNPs U1, U2, U4, U5, U6 and U2AF; intron signals GU (5' splice site), A (branch point) and YAG (3' splice site, Y = pyrimidine). Additional components: ~200 non-snRNP proteins, including hnRNPs, SR proteins, RNA helicases, kinases and phosphatases, and cyclophilins.]


Different levels of regulation


Regulation of transcription

ChIP procedure (Farnham, Nature Rev Genetics, 2009)

[Figure: PPAR/RXR heterodimer (receptor domains A/B, C and E/F) bound to a PPAR response element (PPRE) in DNA; example PPRE sequence AACTAGGTCAAAGGTCA.]
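The example PPRE above contains two AGGTCA half-sites separated by a single base (AGGTCA A AGGTCA), a so-called DR1 direct repeat. A minimal sketch for scanning an extracted binding-region sequence (for example from exercise 1.6) for DR1 sites on both strands, assuming the strict consensus AGGTCA-N-AGGTCA; real PPREs are degenerate, so exact matching will miss many true sites.

```python
import re

# Strict DR1 consensus: half-site AGGTCA, one spacer base, half-site AGGTCA.
# Assumption: exact matches only. The lookahead allows overlapping hits.
DR1 = re.compile(r"(?=(AGGTCA[ACGT]AGGTCA))")

def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_dr1(seq: str) -> list[tuple[int, str, str]]:
    """Return (0-based start on the + strand, strand, site as read on its strand)."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in DR1.finditer(seq)]
    for m in DR1.finditer(reverse_complement(seq)):
        # convert the start on the reverse complement back to + strand coordinates
        start = len(seq) - (m.start() + len(m.group(1)))
        hits.append((start, "-", m.group(1)))
    return sorted(hits)

print(find_dr1("AACTAGGTCAAAGGTCA"))  # [(4, '+', 'AGGTCAAAGGTCA')]
```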


microRNAs

http://www.mirbase.org/


Ensembl BioMart


UCSC Table Browser



Notepad++ and regular expressions

^ > . * \r \n

^   beginning of line
>   the character ">" itself
.   any symbol
*   0 or more times
\r  carriage return (CR)
\n  line feed (LF)
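The same pattern works in Python's `re` module, which is handy for checking what a Notepad++ replacement will do. A minimal sketch, assuming a small Windows-style FASTA text with CRLF line endings, that deletes every header line; `re.MULTILINE` is needed so that `^` matches at the start of every line rather than only at the start of the whole text.

```python
import re

fasta = ">seq1 some header\r\nACGTACGT\r\n>seq2 another header\r\nGGCCTTAA\r\n"

# ^ start of line, > literal '>', .* any characters (0 or more), \r\n = CRLF line break.
# In Python "." matches "\r" but not "\n", so ".*" stops inside the header line.
no_headers = re.sub(r"^>.*\r\n", "", fasta, flags=re.MULTILINE)
print(repr(no_headers))  # 'ACGTACGT\r\nGGCCTTAA\r\n'
```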


Notepad++ and regular expressions

character  meaning
\          escape; makes a special character literal (e.g. \. matches a dot)
()         group; its contents can be reused in the replacement, e.g. \1 for the first group
[]         matches any one of the characters inside
.          matches any character
*          matches the previous character 0 or more times
+          matches the previous character 1 or more times
{n}        matches the previous character exactly n times
^          as the first character of the regex: "beginning of line"; as the first character inside []: "not"
$          as the last character of the regex: "end of line"
\s         any whitespace character (space, tab, and also line-break characters)
\t         tab (-->)
\r         carriage return (CR)
\n         line feed (LF)
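Most symbols in the table behave the same way in Python's `re` module, including `\1` in the replacement string. A short sketch on hypothetical sequences, one line per symbol group:

```python
import re

line = "ATGGCCAAGTTTNNACGT"  # hypothetical sequence containing an ambiguous base N

has_non_acgt = re.search(r"[^ACGT]", line) is not None  # [^...]: any character NOT listed
a_runs = re.findall(r"A+", "AAGTAAAC")                   # +: one or more of the previous character
first_codon = re.sub(r"^(.{3}).*$", r"\1", line)          # (...) group, {3} exactly 3 times, \1 = group 1
tabbed = re.sub(r"\s+", "\t", "chr11  1000   2000")       # \s+: runs of whitespace become one tab

print(has_non_acgt, a_runs, first_codon, repr(tabbed))
# True ['AA', 'AAA'] ATG 'chr11\t1000\t2000'
```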


Notepad++ and regular expressions

^[ACGT].*\r\n replace with (nothing)

^(.{20}).*\r\n replace with \1\r\n

^>.*\r\n replace with (nothing)
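The first two replacements on this slide can be tried in Python before running them in Notepad++. A minimal sketch on a hypothetical record with CRLF line endings; note that a header line longer than 20 characters is also truncated by the second pattern, so headers are usually removed first.

```python
import re

text = (">example_header repeatMasking=none\r\n"
        "ACGTACGTACGTACGTACGTACGT\r\n"
        "TTTTGGGGCCCCAAAATTTTGGGG\r\n")

# ^[ACGT].*\r\n -> (nothing): delete every line that starts with a nucleotide
headers_only = re.sub(r"^[ACGT].*\r\n", "", text, flags=re.MULTILINE)

# ^(.{20}).*\r\n -> \1\r\n: truncate every line to its first 20 characters
first20 = re.sub(r"^(.{20}).*\r\n", r"\1\r\n", text, flags=re.MULTILINE)

print(repr(headers_only))
print(first20.split("\r\n")[1:3])
```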


\r\n replace with (nothing)

> replace with \r\n>

repeatMasking=none replace with \r\n

^>.*\r\n replace with (nothing)

.*(.{20})$ replace with \1
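The replacements on this slide form a pipeline: join all lines, put each record on its own line, split the header from the sequence at "repeatMasking=none", delete the headers, and keep the last 20 bases. A sketch of the same steps in Python on a hypothetical Table Browser export; it uses "\n" instead of "\r\n" because Python's `$` (with `re.MULTILINE`) only treats "\n" as a line end, and the literal steps use plain `str.replace`.

```python
import re

# Hypothetical FASTA export (headers and sequences invented), "\n" line endings.
raw = (
    ">hg19_refGene_NM_X_0 range=chr1:1-30 strand=+ repeatMasking=none\n"
    "ACGTACGTAC\nGTACGTACGT\nAAAAACCCCC\n"
    ">hg19_refGene_NM_X_1 range=chr1:41-70 strand=+ repeatMasking=none\n"
    "TTTTTGGGGG\nCCCCCAAAAA\nGGGGGTTTTT\n"
)

s = raw.replace("\n", "")                  # 1. line breaks -> nothing: one long line
s = s.replace(">", "\n>")                  # 2. > -> line break + >: one record per line
s = s.replace("repeatMasking=none", "\n")  # 3. split header from sequence
s = re.sub(r"^>.*\n", "", s, flags=re.M)   # 4. delete header lines
s = re.sub(r".*(.{20})$", r"\1", s, flags=re.M)  # 5. keep the last 20 bases
last20 = [line for line in s.split("\n") if line]
print(last20)  # ['GTACGTACGTAAAAACCCCC', 'CCCCCAAAAAGGGGGTTTTT']
```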


Sequence Logo

http://icbi.at/logo
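A sequence logo is built from the per-position base frequencies (the frequency plot of exercise 1.5) and the information content of each position. A generic sketch of the standard calculation, not the code of the tool above; it omits the small-sample correction that logo tools often apply, and the aligned sites are hypothetical.

```python
import math
from collections import Counter

def frequency_matrix(seqs):
    """Per-position base frequencies for equal-length, aligned sequences."""
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sequences must have the same length"
    matrix = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        matrix.append({b: counts.get(b, 0) / len(seqs) for b in "ACGT"})
    return matrix

def information_content(column):
    """Bits at one position: log2(4) minus the Shannon entropy of the column."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy

sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGA", "AGGTAAGG"]  # hypothetical aligned sites
pfm = frequency_matrix(sites)
for pos, col in enumerate(pfm, start=1):
    ic = information_content(col)
    # letter height in the logo = base frequency * information content of the column
    heights = {b: round(p * ic, 2) for b, p in col.items() if p > 0}
    print(pos, round(ic, 2), heights)
```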


KEGG


Protein domains

UniProt, PROSITE, InterPro, Pfam, CDD (NCBI Conserved Domain Database), SMART
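PROSITE describes many motifs as patterns that translate directly into regular expressions. A sketch covering only the core syntax (`x` any residue, `[..]` allowed residues, `{..}` forbidden residues, `(n)` or `(n,m)` repeats, `<`/`>` anchors, elements separated by `-`); the test uses the N-glycosylation pattern N-{P}-[ST]-{P} (PS00001) on a hypothetical peptide.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    regex = []
    for element in pattern.rstrip(".").split("-"):
        rep = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)  # e.g. x(2,4)
        if m:
            element, rep = m.group(1), "{" + m.group(2) + "}"
        if element.startswith("<"):          # N-terminal anchor
            regex.append("^")
            element = element[1:]
        anchor_end = element.endswith(">")   # C-terminal anchor
        if anchor_end:
            element = element[:-1]
        if element == "x":
            core = "."
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element  # single residue or [..] class is already valid regex
        regex.append(core + rep)
        if anchor_end:
            regex.append("$")
    return "".join(regex)

print(prosite_to_regex("N-{P}-[ST]-{P}."))  # N[^P][ST][^P]
```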


Gene Ontology

The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism.

3 organizing principles:
• cellular component (e.g. mitochondrion)
• biological process (e.g. lipid metabolism)
• molecular function (e.g. hydrolase activity)

Each entry in GO has a unique numerical identifier of the form GO:nnnnnnn and a term name.

Evidence codes:
ISS Inferred from Sequence Similarity
IEP Inferred from Expression Pattern
IMP Inferred from Mutant Phenotype
IGI Inferred from Genetic Interaction
IPI Inferred from Physical Interaction
IDA Inferred from Direct Assay
RCA Inferred from Reviewed Computational Analysis
TAS Traceable Author Statement
NAS Non-traceable Author Statement
IC Inferred by Curator
ND No biological Data available

Directed acyclic graph (DAG) with different levels and 2 relations (part_of, is_a)
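Because GO is a DAG, a term can have several parents, and an annotation to a term implies all of its ancestors along is_a and part_of edges (the "true path rule"). A sketch of that upward traversal on a hypothetical mini-ontology (the term names are placeholders, not real GO terms), plus a check of the GO:nnnnnnn identifier format, e.g. GO:0005739 (mitochondrion).

```python
import re
from collections import deque

# Hypothetical mini-ontology: child term -> list of (parent term, relation).
PARENTS = {
    "term_D": [("term_B", "is_a"), ("term_C", "part_of")],
    "term_B": [("term_A", "is_a")],
    "term_C": [("term_A", "is_a")],
    "term_A": [],
}

def ancestors(term, parents=PARENTS):
    """All terms reachable upward from `term` via is_a or part_of edges (breadth-first)."""
    seen, queue = set(), deque([term])
    while queue:
        for parent, _relation in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def is_go_id(identifier: str) -> bool:
    """True for identifiers of the form GO: followed by exactly 7 digits."""
    return re.fullmatch(r"GO:\d{7}", identifier) is not None

print(sorted(ancestors("term_D")), is_go_id("GO:0005739"))
```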


Orthologs

Homologs: A – B – C

Orthologs: B1 – C1

Paralogs: C1 – C2 – C3

Inparalogs: C2 – C3

Outparalogs: B2 – C1

Xenologs: A1 – AB1

Protein A


Ortholog prediction
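One widely used heuristic for ortholog prediction is the reciprocal best hit (RBH): genes a and b are called orthologs if b is a's best-scoring hit in genome B and a is b's best-scoring hit in genome A (InParanoid, for example, starts from such two-way best hits and then adds in-paralogs). A sketch assuming similarity scores (e.g. BLAST bit scores) are already available as dictionaries; the gene names and scores are hypothetical, and ties keep the first hit seen.

```python
def best_hits(scores):
    """scores: {(query, subject): score}. Returns {query: best-scoring subject}."""
    best = {}
    for (q, s), score in scores.items():
        if q not in best or score > best[q][1]:
            best[q] = (s, score)
    return {q: s for q, (s, _score) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where each gene is the other's best hit."""
    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)

human_vs_mouse = {("hPC", "mPC"): 2000.0, ("hPC", "mX"): 300.0, ("hY", "mX"): 500.0}
mouse_vs_human = {("mPC", "hPC"): 1990.0, ("mX", "hY"): 480.0, ("mX", "hPC"): 310.0}
print(reciprocal_best_hits(human_vs_mouse, mouse_vs_human))  # [('hPC', 'mPC'), ('hY', 'mX')]
```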


Ortholog databases

• YOGY (eukarYotic OrtholoGY) is a web-based resource that integrates 5 independent resources (Sanger)

• COG: Clusters of Orthologous Groups of proteins, and KOG for 7 eukaryotic genomes (NCBI)

• InParanoid (Stockholm Bioinformatics Centre)

• HomoloGene (NCBI)

• OrthoMCL uses the Markov Cluster (MCL) algorithm (University of Pennsylvania)
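The Markov Cluster algorithm alternates two steps on a column-stochastic similarity matrix: expansion (matrix squaring, i.e. random walks of length 2) and inflation (raising entries to a power and renormalizing), until the matrix stops changing; larger inflation values give smaller clusters. A toy dense-matrix sketch of plain MCL; OrthoMCL applies MCL to a graph built from all-against-all BLAST similarities, and building that graph is not shown here.

```python
def normalize_columns(m):
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]

def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster nodes of a small weighted graph; returns sorted lists of node indices."""
    n = len(adjacency)
    # add self-loops, then make every column sum to 1
    m = normalize_columns([[adjacency[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        # expansion: square the matrix
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # inflation: elementwise power, then renormalize columns
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # each attractor row (non-zero row) lists the members of one cluster
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)

# Hypothetical graph: two disconnected triangles (nodes 0-2 and 3-5)
A = [[1 if (i < 3) == (j < 3) and i != j else 0 for j in range(6)] for i in range(6)]
print(mcl(A))  # [[0, 1, 2], [3, 4, 5]]
```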


Multiple sequence alignment (ClustalW)

Progressive alignment along a guide tree

Jalview
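In progressive alignment, a guide tree built from pairwise distances decides the order in which sequences (and then groups) are aligned. ClustalW builds this tree with neighbour-joining; the sketch below swaps in UPGMA only because it is shorter to write, and the species distances are hypothetical.

```python
def upgma(names, dist):
    """Guide tree (nested tuples) from a symmetric distance matrix, using UPGMA."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), _ = min(d.items(), key=lambda kv: kv[1])  # closest pair (a < b)
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        # distance to the merged cluster = size-weighted average of the old distances
        for c in clusters:
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        del d[(a, b)]
        clusters[next_id] = ((ta, tb), na + nb)
        next_id += 1
    return next(iter(clusters.values()))[0]

names = ["human", "mouse", "rat", "yeast"]
dist = [[0.0, 0.30, 0.32, 0.80],
        [0.30, 0.0, 0.10, 0.80],
        [0.32, 0.10, 0.0, 0.80],
        [0.80, 0.80, 0.80, 0.0]]
print(upgma(names, dist))  # ('yeast', ('human', ('mouse', 'rat')))
```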


Exercise 2-1: REGULATORY GENOMICS

Pyruvate Carboxylase as example

Ensembl BioMart
1.1 For the human transcript NM_000920 (pyruvate carboxylase), find the official gene symbol, the number of exons, the Ensembl transcript ID, the Ensembl gene ID, the 3'UTR sequence (as a FASTA file) and the length of the 3'UTR.
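The 3'UTR length can be read off the downloaded FASTA file once the wrapped sequence lines are joined. A minimal FASTA reader sketch; the record below is hypothetical (it is not the real PC 3'UTR), and the header is kept as an opaque string rather than parsed.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}; line breaks inside sequences are removed."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records

utr = read_fasta(">hypothetical_3UTR\r\nACGTAC\r\nGGT\r\n")
for name, seq in utr.items():
    print(name, len(seq))  # hypothetical_3UTR 9
```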

microRNA target prediction
1.2 Is there a sequence within the 3'UTR of PC that is complementary to positions 2-8 (the seed region) of the microRNA hsa-mir-182?
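A seed match is the 3'UTR sequence that base-pairs with miRNA positions 2-8, i.e. the reverse complement of those seven nucleotides written as DNA. A sketch that searches for exact 7-mer seed matches only (no 6-mer/8-mer variants, no G:U wobble); the miRNA and UTR sequences are hypothetical, so paste the mature miRNA sequence from miRBase and the 3'UTR from exercise 1.1 instead.

```python
def seed_site(mirna: str) -> str:
    """DNA sequence in a 3'UTR that pairs with miRNA positions 2-8 (1-based)."""
    seed = mirna.upper().replace("U", "T")[1:8]                 # positions 2-8
    return seed.translate(str.maketrans("ACGT", "TGCA"))[::-1]  # reverse complement

def find_seed_matches(utr: str, mirna: str) -> list[int]:
    """0-based start positions of exact seed matches in the 3'UTR."""
    site, utr = seed_site(mirna), utr.upper()
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]

mirna = "AUGCAUUGCUAGCUAGCUAGCU"  # hypothetical mature miRNA
print(seed_site(mirna), find_seed_matches("GGGCAATGCATTT", mirna))  # CAATGCA [3]
```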

UCSC Genome Browser
1.3 Find the positions of the transcription start site and the transcription end of pyruvate carboxylase (NM_000920) in the hg19 assembly.
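UCSC tables such as refGene store txStart as a 0-based coordinate and txEnd as an end-exclusive coordinate, while the browser displays 1-based positions; on the minus strand transcription starts at the right-hand end. A small conversion sketch with hypothetical coordinates:

```python
def transcript_ends(tx_start: int, tx_end: int, strand: str) -> tuple[int, int]:
    """1-based (TSS, TES) from UCSC txStart (0-based) and txEnd (end-exclusive)."""
    first_base, last_base = tx_start + 1, tx_end  # half-open 0-based -> closed 1-based
    if strand == "+":
        return first_base, last_base
    return last_base, first_base  # minus strand: TSS is the rightmost base

print(transcript_ends(1000, 5000, "+"), transcript_ends(1000, 5000, "-"))
# (1001, 5000) (5000, 1001)
```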


Exercise 2-1: REGULATORY GENOMICS

Find splicing signals
1.4 Get sequences (+10 bp/-10 bp) around the intron-exon and exon-intron borders of pyruvate carboxylase using the UCSC Table Browser and Notepad++.
1.5 Construct a sequence logo and a frequency plot for each case. Can you identify (regulatory) sequence motifs?
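If exon coordinates are available (for example the exonStarts/exonEnds fields of a UCSC table, 0-based starts and end-exclusive ends), the 20 bp windows around each border can also be computed directly. A sketch assuming a plus-strand transcript with hypothetical coordinates; the first exon start and the last exon end are skipped because they are not splice sites, and on the minus strand the roles of the two lists swap and the windows must be reverse-complemented.

```python
def splice_border_windows(exon_starts, exon_ends, flank=10):
    """Plus-strand windows (0-based, half-open) with `flank` bp on each side of every border.

    Returns (intron_exon, exon_intron) lists of (start, end) tuples.
    """
    # intron-exon: last `flank` intron bases + first `flank` bases of the next exon
    intron_exon = [(s - flank, s + flank) for s in exon_starts[1:]]
    # exon-intron: last `flank` exon bases + first `flank` bases of the following intron
    exon_intron = [(e - flank, e + flank) for e in exon_ends[:-1]]
    return intron_exon, exon_intron

acceptors, donors = splice_border_windows([100, 300, 600], [150, 400, 700])
print(acceptors, donors)  # [(290, 310), (590, 610)] [(140, 160), (390, 410)]
# with a chromosome sequence string `chrom`, each window is chrom[start:end]
```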

Regulatory motifs (transcription factor binding sites)
1.6 We know from chromatin immunoprecipitation (ChIP-seq) experiments in a mouse cell line that the transcription factor Pparg binds near the pyruvate carboxylase gene and hence potentially regulates its transcription (ppar.wig). Show the binding region as a custom track in the UCSC Genome Browser and extract its sequence.
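Before uploading the custom track, the enriched region can be located directly in the signal file. A sketch assuming ppar.wig uses the UCSC wiggle format (variableStep or fixedStep sections with 1-based positions); the signal values and the 5.0 threshold below are hypothetical, and a real analysis would pick the threshold from the data.

```python
def parse_wig(text):
    """Yield (chrom, start, end, value), 1-based inclusive, from variableStep/fixedStep wiggle text."""
    chrom, mode, span, pos, step = None, None, 1, 0, 1
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        if fields[0] in ("variableStep", "fixedStep"):
            mode = fields[0]
            opts = dict(f.split("=", 1) for f in fields[1:])
            chrom, span = opts["chrom"], int(opts.get("span", 1))
            if mode == "fixedStep":
                pos, step = int(opts["start"]), int(opts.get("step", 1))
            continue
        if mode == "variableStep":
            start, value = int(fields[0]), float(fields[1])
        else:  # fixedStep: one value per line, positions advance by `step`
            start, value = pos, float(fields[0])
            pos += step
        yield chrom, start, start + span - 1, value

def enriched_regions(records, threshold):
    """Merge adjacent or overlapping intervals whose value is >= threshold."""
    regions = []
    for chrom, start, end, value in records:
        if value < threshold:
            continue
        if regions and regions[-1][0] == chrom and start <= regions[-1][2] + 1:
            regions[-1][2] = max(regions[-1][2], end)
        else:
            regions.append([chrom, start, end])
    return [tuple(r) for r in regions]

wig = ('track type=wiggle_0 name="example"\n'
       "variableStep chrom=chr1 span=25\n"
       "1000 1.0\n1025 8.5\n1050 9.0\n1075 0.5\n")
print(enriched_regions(parse_wig(wig), threshold=5.0))  # [('chr1', 1025, 1074)]
```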


Exercise 2-2: PROTEIN FUNCTION

Identify function/processes/pathways for a protein
2.1 What is the function of pyruvate carboxylase, and in which pathways and processes is this enzyme involved? Show the pathway maps and find the enzyme ID (EC number) using KEGG. Identify the functional domains and the Gene Ontology annotation of the protein sequence using UniProt, PROSITE and Pfam.

Find orthologs and perform a multiple sequence alignment
2.2 Find ortholog protein sequences in Mus musculus, Rattus norvegicus and Saccharomyces cerevisiae, perform a multiple sequence alignment using ClustalW, and visualize it with Jalview.