Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Jan 14, 2016


Darleen Lindsey
Page 1: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Regulation of Alternative Splicing

Jihye Kim

Oral Preliminary Exam (May 7, 2007)

Page 2: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Outline

• Alternative Splicing Overview
• Goal : Investigate “regulation” of AS
• Method : Association Rule Mining
• Part I : Finding association rules of cis-regulatory elements involved in alternative splicing
• Part II : Cis-regulatory Motif Combinations Associated with Tissue-specific Alternative Splicing
• Summary
• Future Work

Page 3: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Splicing

• Introns are removed and flanking exons are concatenated

• Spliceosome

- snRNPs and other proteins

[image from http://fig.cox.miami.edu/~cmallery/150/gene/c7.17.11.spliceosome.jpg]

Page 4: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Splice Sites

• Recognized by the spliceosome
• Splice sites are too weak to predict intron location accurately

[image from http://web-books.com/MoBio/Free/Ch5A4.htm]

5’ 3’

Page 5: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Splicing Factors and Binding Sites

• Assist the spliceosome in identifying splice sites
• Splicing factors
  – SR (serine/arginine-rich) proteins
• Exonic and intronic enhancers and silencers (cis-acting)
  – ESE (A/G rich motifs), ESS (hnRNP), ISE (G triples, UGCAUG), ISS

[Source from Katherina Kechris in Rocky’05 Conference]

Exon Exon 2

Page 6: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Alternative Splicing

• Affects over 70% of human genes
• Major mechanism to generate protein diversity
• Highly relevant to disease
  – 15% of disease-causing mutations affect splicing [Krawczak 1992]

[Krawczak 1992] Krawczak, M., Reiss, J., and Cooper, D.N. 1992 Hum. Genet. 90: 41-54

protein

Pre-mRNA

mRNA

Page 7: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Types of Alternative Splicing

[Source from Cartegni et al. 2002]

Cassette Exon

Page 8: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Investigating Alternative Splicing

• Traditionally, align ESTs and mRNAs to genomic sequences
• Recently, microarray technology (splice arrays)
  – Exon skipping is measured
  – Hard to measure other types of AS

Page 9: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Previous Work on AS Regulation

• Most methods
  – use only sequence data
  – focus on the effect of individual motifs
• Brain-specific exon skipping [Brudno 2001]
  – 25 brain-specific cassette exons from literature
  – Over-representation of UGCAUG in downstream intron
• RESCUE-ESE [Fairbrother 2002]
  – Frequent hexamers in exons with weak splice sites
  – 10 ESE motifs show enhancer activity in experiment

[Brudno 2001] Brudno M., Gelfand M.S., et al., 2001 NAR 29(11):2338-2348
[Fairbrother 2002] Fairbrother WG., et al., 2002 Science 297(5583):1007-13

Page 10: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

What We Have Done So Far

• Investigate cis-regulatory motifs that influence the amount of AS or tissue-specific AS
  [Jihye Kim, Sihui Zhao, Steffen Heber, “Finding association rules of cis-regulatory elements involved in alternative splicing”, Proceedings of the 45th annual southeast regional conference (ACM-SE) pp. 232 – 237]
  [Jihye Kim, Sihui Zhao, Steffen Heber, “Cis-regulatory Motif Combinations Associated with Tissue-specific Alternative Splicing”, 7th workshop on Algorithms in Bioinformatics (WABI 2007) (submitted)]
  – Use mouse splice array data
  – Apply Association Rule Mining
  – Investigate motif combinations involved in tissue-specific AS

Page 11: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

AS Datasets in Mouse

• Dataset
  – Splice Array [Pan 2004] with 6 probes
  – 3126 exon skipping genes in mouse
  – %ASex : percentage of exon skipping in 10 tissues

[Pan 2004] Pan, Q., et al., 2004 Mol Cell 16(6):929-942

Aim I-I : representing data context

Page 12: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Association Rule Mining

• By Agrawal et al. in 1993
• Initially used for Market Basket Analysis

• An association rule is a pattern that states when X occurs, Y occurs with certain probability

• X : antecedent (left-hand-side, lhs), Y : consequent (right-hand-side, rhs)

• Goal: Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf)

X => Y

Page 13: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Rule Strength Measures

• Given a rule X => Y,
  – Support = Pr(X∧Y)
  – Confidence = Pr(Y | X)
  – Lift = Pr(X∧Y) / (Pr(X) Pr(Y))
    • Dependency of lhs and rhs
    • Generally, lhs and rhs have positive dependency if lift > 1.0
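
The three measures can be computed directly from a list of transactions. A minimal Python sketch, assuming each transaction is a set of items; the function names and the toy transactions are illustrative and do not come from the slides:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Pr(Y | X), computed as support(X and Y) / support(X)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """Pr(X and Y) / (Pr(X) Pr(Y)); above 1.0 suggests positive dependency."""
    joint = support(transactions, set(lhs) | set(rhs))
    return joint / (support(transactions, lhs) * support(transactions, rhs))


# Hypothetical toy transactions, only for illustration.
toy = [{"a", "b"}, {"a"}, {"a", "b", "c"}, {"c"}]
print(support(toy, {"a", "b"}))       # 2 of 4 transactions contain both items
print(confidence(toy, {"a"}, {"b"}))  # 2 of the 3 transactions with "a" also have "b"
print(lift(toy, {"a"}, {"b"}))
```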

Page 14: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

ARM Example

Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana

Cart 2 : Beer, Nuts, Tissue, Diaper

Cart 3 : Apple, Beer

Cart 4 : Jam, Beer, Diaper

Cart 5 : Bread, Butter, Tissue, Jam

Page 15: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

ARM Example

Min supp = 0.5 Min conf = 0.7

Frequent Itemset = itemset whose support ≥ 0.5

Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana

Cart 2 : Beer, Nuts, Tissue, Diaper

Cart 3 : Apple, Beer

Cart 4 : Jam, Beer, Diaper

Cart 5 : Bread, Butter, Tissue, Jam

Page 16: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

ARM Example

Min supp = 0.5 Min conf = 0.7

Frequent Itemsets (support)

Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana

Cart 2 : Beer, Nuts, Tissue, Diaper

Cart 3 : Apple, Beer

Cart 4 : Jam, Beer, Diaper

Cart 5 : Bread, Butter, Tissue, Jam

Page 17: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

ARM Example

Min supp = 0.5 Min conf = 0.7

Frequent Itemsets (support)

Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana

Cart 2 : Beer, Nuts, Tissue, Diaper

Cart 3 : Apple, Beer

Cart 4 : Jam, Beer, Diaper

Cart 5 : Bread, Butter, Tissue, Jam

Bread(2/5 < 0.5)

Page 18: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

ARM Example

Min supp = 0.5 Min conf = 0.7

Frequent Itemsets (support)

Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana

Cart 2 : Beer, Nuts, Tissue, Diaper

Cart 3 : Apple, Beer

Cart 4 : Jam, Beer, Diaper

Cart 5 : Bread, Butter, Tissue, Jam

Beer (0.8), Jam (0.6),

Diaper (0.6)

{Beer, Diaper} (0.6)

Page 19: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

ARM Example

Min supp = 0.5 Min conf = 0.7

Frequent Itemsets

Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana

Cart 2 : Beer, Nuts, Tissue, Diaper

Cart 3 : Apple, Beer

Cart 4 : Jam, Beer, Diaper

Cart 5 : Bread, Butter, Tissue, Jam

Beer (0.8), Jam (0.6),

Diaper (0.6)

{Beer, Diaper} (0.6)

Association Rules (confidence)

Page 20: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

ARM Example

Min supp = 0.5 Min conf = 0.7

Frequent Itemsets

Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana

Cart 2 : Beer, Nuts, Tissue, Diaper

Cart 3 : Apple, Beer

Cart 4 : Jam, Beer, Diaper

Cart 5 : Bread, Butter, Tissue, Jam

Beer (0.8), Jam (0.6),

Diaper (0.6)

{Beer, Diaper} (0.6)

Association Rules (confidence)

Beer => Jam (2/4 < 0.7)

Page 21: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

ARM Example

Min supp = 0.5 Min conf = 0.7

Frequent Itemsets

Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana

Cart 2 : Beer, Nuts, Tissue, Diaper

Cart 3 : Apple, Beer

Cart 4 : Jam, Beer, Diaper

Cart 5 : Bread, Butter, Tissue, Jam

Beer (0.8), Jam (0.6),

Diaper (0.6)

{Beer, Diaper} (0.6)

Association Rules (confidence)

Beer => Diaper (0.75)
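
The cart example can be reproduced with a brute-force enumeration. This is only a sketch: it checks itemsets of size 1 and 2, which is enough here because no larger itemset reaches the minimum support, and the variable names are illustrative. It also reports the reverse rule Diaper => Beer, which passes the same thresholds although the slide lists only Beer => Diaper.

```python
from itertools import combinations

carts = [
    {"Milk", "Bread", "Diaper", "Beer", "Jam", "Banana"},
    {"Beer", "Nuts", "Tissue", "Diaper"},
    {"Apple", "Beer"},
    {"Jam", "Beer", "Diaper"},
    {"Bread", "Butter", "Tissue", "Jam"},
]
MIN_SUPP, MIN_CONF = 0.5, 0.7


def count(itemset):
    """Number of carts that contain every item of itemset."""
    return sum(1 for cart in carts if itemset <= cart)


# Step 1: frequent itemsets of size 1 and 2.
items = sorted(set().union(*carts))
frequent = {}
for size in (1, 2):
    for combo in combinations(items, size):
        itemset = frozenset(combo)
        supp = count(itemset) / len(carts)
        if supp >= MIN_SUPP:
            frequent[itemset] = supp

# Step 2: rules lhs => rhs from each frequent pair.
rules = {}
for itemset in frequent:
    if len(itemset) != 2:
        continue
    for lhs in itemset:
        (rhs,) = itemset - {lhs}
        conf = count(itemset) / count(frozenset([lhs]))
        if conf >= MIN_CONF:
            rules[(lhs, rhs)] = conf

print(frequent)
print(rules)
```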

Page 22: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Apriori Algorithm

• Most popular algorithm

• Two steps:
  – Find all itemsets that satisfy min_supp (frequent itemsets)
    • Any subset of a frequent itemset is also frequent
    • Find all 1-item frequent itemsets; then all 2-item frequent itemsets, and so on
  – Generate rules
    • A => B is an association rule if Confidence(A => B) ≥ min_conf
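
The two steps can be sketched in Python as follows. This is a simplified, unoptimized illustration of the level-wise idea, not the exact candidate-generation procedure of the original paper; the function names and toy transactions are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_supp):
    """Return {frequent itemset: support}, built level by level."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    singles = {frozenset([i]) for t in transactions for i in t}
    level = {s: supp(s) for s in singles}
    level = {s: v for s, v in level.items() if v >= min_supp}
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        level = {c: supp(c) for c in candidates}
        level = {c: v for c, v in level.items() if v >= min_supp}
        frequent.update(level)
        k += 1
    return frequent


def generate_rules(frequent, min_conf):
    """Return (lhs, rhs, confidence) for every rule meeting min_conf."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                conf = s / frequent[lhs]  # lhs is frequent because itemset is
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


# Hypothetical toy transactions.
toy = [{"A", "B", "C"}, {"A", "B"}, {"A", "B", "D"}, {"C", "D"}]
freq = apriori(toy, min_supp=0.5)
found = generate_rules(freq, min_conf=0.7)
print(freq)
print(found)
```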

Page 23: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Part I : Finding association rules of cis-regulatory elements involved in alternative splicing

[Proceedings of the 45th annual southeast regional conference (ACM-SE) Winston-Salem, North Carolina pp. 232 – 237]

Page 24: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

K-mers Around Cassette Exon (items)

• Pre-mRNA sequences
  – Transcripts from NCBI
  – BLAT to align transcripts to mouse genome
  – 200 bp from 7 regions around cassette exon
  – 2565 genes in total
• Items (6-mers) : AAAAAA to TTTTTT in regions 1 … 7

Aim I-I : representing data context
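
One way to turn the region sequences of a gene into items is to tag every hexamer with its region number. The sketch below assumes the "region_hexamer" naming that appears in the later result slides (e.g. 7_TGAAGA); the function name and the short input sequences are made up for illustration.

```python
def kmer_items(region_seqs, k=6):
    """Return the set of region-tagged k-mers for one gene.

    region_seqs maps a region number (1..7) to that region's sequence.
    """
    items = set()
    for region, seq in region_seqs.items():
        for i in range(len(seq) - k + 1):
            items.add(f"{region}_{seq[i:i + k]}")
    return items


# Hypothetical, very short region sequences (real regions are 200 bp).
gene = {1: "AAAAAAT", 7: "TGAAGAA"}
items = kmer_items(gene)
print(sorted(items))
```

Each gene then becomes one transaction, represented by its set of tagged hexamers.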

Page 25: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

ARM in Finding AS Motif Rule

• Items : all possible hexamers (motifs)
• Transactions : 2565 AS genes
• Goal : finding motif association rules in AS genes (e.g., AGGATA => TTAGCT)
• By Apriori algorithm [Agrawal 1993]
  – Find all frequent hexamers
  – Generate hexamer rules

[Agrawal 1993] Agrawal R., Imielinski T., Swami AN., 1993 SIGMOD 22(2):207-216

Page 26: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

ARM Example

[Example]

Seq 1 : ACGATTAGG

Seq 2 : GAATAGG

Seq 3 : TGCAGG

Seq 4 : GGATTAGG

Seq 5 : CAGAT

Min support = 0.5

Min confidence = 0.7

Page 27: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

ARM Example

[Example]

Seq 1 : ACGATTAGG

Seq 2 : GAATAGG

Seq 3 : TGCAGG

Seq 4 : GGATTAGG

Seq 5 : CAGAT

Min support = 0.5

Min confidence = 0.7

- Frequent 3-mer sets (support): AGG (0.8),

Page 28: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

ARM Example

[Example]

Seq 1 : ACGATTAGG

Seq 2 : GAATAGG

Seq 3 : TGCAGG

Seq 4 : GGATTAGG

Seq 5 : CAGAT

Min support = 0.5

Min confidence = 0.7

- Frequent 3-mer sets (support): AGG (0.8), GAT (0.6), TAG (0.6), {AGG,TAG} (0.6)

Page 29: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

ARM Example

[Example]

Seq 1 : ACGATTAGG

Seq 2 : GAATAGG

Seq 3 : TGCAGG

Seq 4 : GGATTAGG

Seq 5 : CAGAT

Min support = 0.5

Min confidence = 0.7

- Frequent 3-mer sets (support): AGG (0.8), GAT (0.6), TAG (0.6), {AGG,TAG} (0.6)

- Rules (confidence): AGG => GAT, conf = 2 / 4 = 0.5 < minconf

Page 30: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

ARM Example

[Example]

Seq 1 : ACGATTAGG

Seq 2 : GAATAGG

Seq 3 : TGCAGG

Seq 4 : GGATTAGG

Seq 5 : CAGAT

Min support = 0.5

Min confidence = 0.7

- Frequent 3-mer sets (support): AGG (0.8), GAT (0.6), TAG (0.6), {AGG,TAG} (0.6)

- Rules (confidence): AGG => TAG (0.75), TAG => AGG (1.0)
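
The numbers in this sequence example can be checked with a few lines of Python. The sketch treats each sequence as a transaction whose items are its distinct 3-mers and only looks at single 3-mers and pairs; the helper names are illustrative.

```python
from collections import Counter
from itertools import combinations

seqs = ["ACGATTAGG", "GAATAGG", "TGCAGG", "GGATTAGG", "CAGAT"]
MIN_SUPP, MIN_CONF = 0.5, 0.7


def kmers(seq, k=3):
    """Distinct k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


transactions = [kmers(s) for s in seqs]
n = len(transactions)

# Count in how many sequences each 3-mer, and each pair of 3-mers, occurs.
single = Counter(m for t in transactions for m in t)
pair = Counter(p for t in transactions for p in combinations(sorted(t), 2))

frequent_single = {m: c / n for m, c in single.items() if c / n >= MIN_SUPP}
frequent_pair = {p: c / n for p, c in pair.items() if c / n >= MIN_SUPP}

# Rules in both directions from every frequent pair.
rules = {}
for (a, b), _ in frequent_pair.items():
    for lhs, rhs in ((a, b), (b, a)):
        conf = pair[(a, b)] / single[lhs]
        if conf >= MIN_CONF:
            rules[(lhs, rhs)] = conf

print(frequent_single)
print(frequent_pair)
print(rules)
```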

Page 31: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Motif Association Rules from AS Genes

[Figure: regions 1 to 7 around the cassette exon]

Frequent 6-mers, Minsup = 0.05 (129 genes)

- 7_TGAAGA, 7_GAAGAA (ASF/SF2, SRp55)

- 6_TTTTCT, 6_AATAAA, …

- Among 6,000 6-mers, 1/3 are in AEDB

- Candidates of regulatory motifs

Association Rules, Minconf = 0.4

- 7_AAAAAT => 7_TGAAGA, 7_AAAGGA => 7_AGAAGA,

- 7_GAAAAA => 7_AAGAAG, 7_CTGCCT => 7_CTGGAG,

- 7_AGGAAA => 7_AAGAAG, 7_AATAAA => 7_AAGAAG

- Candidates of regulatory combinations for AS

Aim I-II : finding motif association rules for all AS genes

Page 32: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Clustering by AS Pattern in 10 Tissues

• Hypothesis : Motif combinations “cause” AS profile
• Cluster genes based on AS profile. We use
  – Euclidean distance / Correlation
  – Average linkage clustering
• Frequent 6-mers in a cluster are motif candidates

Aim I-III : finding motif association rules for cluster

Page 33: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Association Rules from Clusters

[Figure: correlation-based clusters; regions 1 to 7 around the cassette exon]

• Lift(X => Y) > 2.0
• Comparison with outside the cluster (p-value < 2.13e-10)
• Association rules are candidates of motif combinations for the corresponding AS pattern

Aim I-III : finding motif association rules for cluster

Page 34: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Part II : Cis-regulatory Motif Combinations Associated with Tissue-specific Alternative Splicing

[7th workshop on Algorithms in Bioinformatics (WABI 2007) (submitted)]

Page 35: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Finding Motifs Involved in Tissue-Specific AS

• Items :
  – hexamers in gene regions and
  – exon skipping rate in tissues
• Transactions :
  – 2565 genes from Pan’s data set
• Goal : find associations, e.g. AGGATA in cassette exon => High exon skipping in Brain
• We focus on complex rules, e.g. {AGGATA in cassette exon, CCTGCG in downstream intron} => High exon skipping in Brain

Aim II-I : finding motif association rules for tissue-specific AS

Page 36: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

AS profile items

• Use quartiles to convert numeric %ASex values to categorical AS profile items
  – BrainLow : the first %ASex quartile in Brain
  – BrainHigh : the last %ASex quartile in Brain
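
A minimal sketch of this discretization for one tissue, using `statistics.quantiles` from the Python standard library (Python 3.8+). The exact quartile convention and the handling of ties used in the study are not stated on the slide, so the default method and the inclusive comparisons below are assumptions; the %ASex values are made up.

```python
from statistics import quantiles


def profile_items(tissue, values):
    """Label each %ASex value as '<tissue>Low', '<tissue>High', or None."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    items = []
    for v in values:
        if v <= q1:
            items.append(f"{tissue}Low")    # first quartile
        elif v >= q3:
            items.append(f"{tissue}High")   # last quartile
        else:
            items.append(None)              # middle half yields no item
    return items


# Hypothetical %ASex values for eight genes in brain.
brain = [5, 10, 20, 30, 40, 50, 60, 80]
labels = profile_items("Brain", brain)
print(labels)
```

Genes in the middle two quartiles simply contribute no AS profile item for that tissue.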

Page 37: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Motif Combination ARM Example

[Sequence] + [AS profile]

Seq 1 : ACGATTAGG + BH, HH

Seq 2 : GAATAGG + BH, HL

Seq 3 : TGCAGG + BH, HH

Seq 4 : GGATTAGG + BL, HH

Seq 5 : CAGAT + BH, HL

Min support = 0.5

Min confidence = 0.7

BH : BrainHigh, BL : BrainLow, HH : HeartHigh, HL : HeartLow
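
Combining both item types, each transaction holds the 3-mers of a sequence plus its AS profile items, and rules are restricted to motifs on the lhs and profile items on the rhs. The sketch below assumes that sequence i is paired with profile line i and, for brevity, considers only single 3-mers on the lhs; the resulting rules are what this code computes under the stated thresholds, not a transcription of the slide.

```python
seqs = ["ACGATTAGG", "GAATAGG", "TGCAGG", "GGATTAGG", "CAGAT"]
profiles = [{"BH", "HH"}, {"BH", "HL"}, {"BH", "HH"}, {"BL", "HH"}, {"BH", "HL"}]
MIN_SUPP, MIN_CONF = 0.5, 0.7


def kmers(seq, k=3):
    """Distinct k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# One transaction per gene: (motif items, AS profile items).
transactions = [(kmers(s), p) for s, p in zip(seqs, profiles)]
n = len(transactions)
motifs = sorted(set().union(*(m for m, _ in transactions)))
labels = sorted(set().union(*profiles))

rules = {}
for motif in motifs:
    c_x = sum(1 for m, _ in transactions if motif in m)
    for label in labels:
        c_y = sum(1 for _, p in transactions if label in p)
        c_xy = sum(1 for m, p in transactions if motif in m and label in p)
        supp, conf = c_xy / n, c_xy / c_x
        if supp >= MIN_SUPP and conf >= MIN_CONF:
            lift = c_xy * n / (c_x * c_y)
            rules[(motif, label)] = (supp, conf, lift)

for (motif, label), (supp, conf, lift) in rules.items():
    print(f"{motif} => {label}: support={supp}, confidence={conf}, lift={lift}")
```

A lift filter (as used later with Min_lift) would separate rules whose rhs is merely common from rules with a positive dependency.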

Page 38: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Motif Combination ARM Example

Page 39: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Tissue-Specific AS Motif Combinations

• With strict thresholds
  – Min_supp = 0.01, Min_conf = 0.5, Min_lift = 1.2
  – MinLen of lhs = 2 (for complex rule)
• Rule appearance
  – lhs : hexamers, rhs : AS profile items
• 197 association rules are found in total
• 27 complex rules are found
  – lhs : combinations of 34 frequent hexamers; rhs : AS profile items in tissues
  – All rules have >1.9 lift
  – 23 rules show motif combinations in different regions

Aim II-I : finding motif association rules for tissue-specific AS

Page 40: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Antecedent Consequent Support Confidence Lift

{X4_GCTGGA, X4_TGCTGG} {IntestineLow} 0.016 0.519 2.006

{X4_GCTGGA, X4_TGCTGG} {LungLow} 0.016 0.506 1.961

{X4_TGCTGG, X4_CTGGAG} {IntestineLow} 0.011 0.539 2.083

{X4_TGCTGG, X4_CTGGAG} {LungLow} 0.010 0.5 1.937

{X5_TTTTTA, X7_AGAGGA} {HeartHigh} 0.010 0.510 2.043

{X1_AGCAGC, X5_TTTTTA} {MuscleHigh} 0.010 0.54 2.220

{X1_GAGCAG, X3_TTTTAA} {MuscleHigh} 0.010 0.510 2.096

{X1_GAGCAG, X3_TTCTTT} {LiverHigh} 0.013 0.508 2.048

{X4_AGAAGA, X5_TTATTT} {SalivaryLow} 0.011 0.528 2.066

{X4_AGAAGA, X5_TTATTT} {HeartLow} 0.011 0.528 2.075

{X4_AGAAGA, X5_TTATTT} {KidneyLow} 0.011 0.528 2.023

{X4_AGAAGA, X5_TTATTT} {LiverLow} 0.011 0.528 2.041

{X3_ATTTTT, X6_TTCCTG} {SalivaryHigh} 0.011 0.509 2.031

{X3_TTGTTT, X6_TGTCTC} {LiverHigh} 0.011 0.5 2.017

{X2_GCCTGG, X3_CCTCTG} {LiverLow} 0.011 0.542 2.092

{X2_GTGGGG, X5_TTGTTT} {MuscleHigh} 0.013 0.516 2.120

{X5_ATTTTA, X6_TGCTGT} {SalivaryHigh} 0.010 0.510 2.034

{X5_TCTTTT, X6_TTGTCT} {SalivaryHigh} 0.010 0.634 2.530

{X3_TCTGTT, X6_TTGTCT} {HeartHigh} 0.012 0.527 2.110

{X5_TTTTTA, X6_TTGTCT} {HeartHigh} 0.014 0.507 2.032

{X3_CTCTTT, X5_TTAAAA} {KidneyHigh} 0.010 0.5 2.042

{X2_GGGTGG, X5_TTATTT} {SalivaryHigh} 0.011 0.510 2.032

{X5_TCTTTT, X6_TTTTCA} {IntestineHigh} 0.011 0.5 2.007

{X3_TTTATT, X6_TTTCCT} {IntestineHigh} 0.014 0.522 2.094

{X5_TCTTTT, X5_TTATTT, X5_TTTTTA} {HeartHigh} 0.010 0.5 2.004

{X5_TTCTTT, X5_TATTTT, X5_TTTTCT} {SalivaryHigh} 0.011 0.527 2.104

{X3_TATTTT, X3_ATTTTT, X5_TTGTTT} {BrainHigh} 0.011 0.510 2.084

[Figure: regions 1 to 7 around the cassette exon]

Aim II-I : finding motif association rules for tissue-specific AS

{5_TTTTTA, 7_AGAGGA} => {HeartHigh}
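
Because Lift = Pr(X∧Y) / (Pr(X) Pr(Y)) = Confidence / Pr(Y), the frequency of each consequent can be recovered from a table row as Confidence / Lift. Since the AS profile items are quartile-based, this value should be close to 0.25, which gives a quick consistency check. The sketch below uses four rows copied from the table; small deviations from 0.25 are expected from rounding and from how quartile boundaries were handled (an assumption, not stated on the slides).

```python
# (consequent, confidence, lift) taken from four rows of the table.
rows = [
    ("IntestineLow", 0.519, 2.006),
    ("LungLow", 0.5, 1.937),
    ("SalivaryHigh", 0.634, 2.530),
    ("BrainHigh", 0.510, 2.084),
]

# Implied Pr(rhs) = confidence / lift.
implied = {rhs: conf / lift for rhs, conf, lift in rows}
for rhs, p in implied.items():
    print(f"{rhs}: implied Pr(rhs) = {p:.3f}")
```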

Page 41: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

AS Profile of Motif Combinations

Aim II- II : analyzing motif combination

[Figure: AS profiles of motif combinations; regions 1 to 7 around the cassette exon]

Page 42: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Summary of Graphs

• In some cases, genes with one motif do not show any different AS profile from all AS genes

• However, genes containing all motifs of a combination often show significantly changed exon skipping levels

• Combination of cis-regulatory motifs can influence AS profile in tissues

Page 43: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Comparison with AEDB

• AEDB in EBI
  – Transcript regulatory sequences from literature
  – 292 enhancers and silencers
• >60% of extracted frequent hexamers are part of AEDB motifs
• >97% of hexamers involved in complex rules are part of AEDB motifs
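
One plausible reading of "part of AEDB motifs" is that a hexamer occurs as a substring of a known regulatory sequence. The sketch below implements that reading only as an assumption; the motif list is a hypothetical placeholder, not real AEDB content, and the function name is illustrative.

```python
def in_known_motif(hexamer, known_motifs):
    """True if hexamer occurs as a substring of any known motif."""
    return any(hexamer in motif for motif in known_motifs)


# Hypothetical placeholder sequences, NOT actual AEDB entries.
known_motifs = ["GGTGAAGAAC", "TTTTCTCC"]
# Hexamers with any region prefix (e.g. "7_") already stripped.
hexamers = ["TGAAGA", "GAAGAA", "CCTGCG"]

matched = [h for h in hexamers if in_known_motif(h, known_motifs)]
fraction = len(matched) / len(hexamers)
print(matched, fraction)
```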

Page 44: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Summary

• Association rule mining (ARM) applied

• Finding motif association rules for AS

• Finding motif association rules for AS clusters

• Finding motif combinations for tissue-specific AS

Page 45: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Future Work

Improve method

• Improve motif representation, e.g.
  – variable motif length, gapped k-mers
  – results from motif finding tools
• Improve AS profile representation
• Add more features, e.g.
  – position and distance between motifs
  – splice site
  – exon / intron length
  – conservation, gene information
• Statistical analysis
  – Thresholds
  – Multiple testing

Page 46: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Future Work

• Systematic analysis of simple & complex motifs
• Other data sources
  – Human splice array [Johnson 2003]
  – ESTs
• Investigate discovered motifs
  – Apply motif discovery tools
  – Analyze genome occurrence
  – Analyze gene and protein structure
• Build predictive model and apply it (if I have enough time)
• Experimental verification

[Johnson 2003] Science. 2003 Dec 19;302(5653):2141-4

Page 47: Regulation of Alternative Splicing Jihye Kim Oral Preliminary Exam (May 7, 2007)

Acknowledgements

• Dr. Steffen Heber

• Dr. Eric A. Stone

• Dr. Zhao-Bang Zeng

• Dr. Barbara Sherry

• Sihui Zhao

• Li Zhang

• Hyunmin Kim

THANK YOU