Protein Mutations and Pathways in Cancer: Toward Modular & Combinatorial Therapy
Less war!! More science!!
Chris Sander, Computational & Systems Biology
Memorial Sloan-Kettering Cancer Center, New York
International Conference on Bioinformatics, Asia-Pacific Bioinformatics Network
Cancer
Simplicity of phenotype
Diversity of implementation
Modular therapy!
Combinatorial therapy!
Cancer Genomics
Functional consequences of somatic mutations
Molecular alterations in pathway context
Somatic mutations in cancer: what are the functional consequences?
Variant  Annotation                         Top Spec/Cons  Probability (%)
G719S    in lung cancer; somatic mutation   Yes            99
G724S    in lung cancer                     Yes            100
E734K    in lung cancer                                    74
L747F    in lung cancer                                    98
R748P    in lung cancer                                    99
Q787R    in lung cancer                     Yes            73
T790M    in lung cancer                     Yes            98
L833V    in lung cancer                     Yes            96
V834L    in lung cancer                                    98
L858R    in lung cancer; somatic mutation                  100
L861Q    in lung cancer                                    99
G873E    in lung cancer                     Yes            78
R962G    in dbSNP:17337451                                 100
D761Y    in lung cancer, MSKCC                             96
Assessing the functional consequences of mutations
EGFR_human
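The variant labels in the EGFR table use the usual one-letter notation <wild-type residue><position><mutant residue>, so T790M replaces threonine 790 with methionine. The following is a minimal sketch, assuming plain single-residue substitutions only (no insertions, deletions or stop codons). It parses the labels and keeps the rows whose Probability (%) value is at or above a cutoff. The helper name `parse_variant` and the 95% cutoff are illustrative choices, not part of OMA or of the original slides.

```python
import re
from typing import NamedTuple

# Hypothetical helper pattern: <wild-type><position><mutant>, 20 standard residues only.
VARIANT_RE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


class Substitution(NamedTuple):
    wild_type: str
    position: int
    mutant: str


def parse_variant(label: str) -> Substitution:
    """Split a label such as 'T790M' into wild type, position and mutant."""
    match = VARIANT_RE.match(label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    wt, pos, mut = match.groups()
    return Substitution(wt, int(pos), mut)


# (variant, Probability (%)) pairs copied from the EGFR table.
EGFR_ROWS = [
    ("G719S", 99), ("G724S", 100), ("E734K", 74), ("L747F", 98),
    ("R748P", 99), ("Q787R", 73), ("T790M", 98), ("L833V", 96),
    ("V834L", 98), ("L858R", 100), ("L861Q", 99), ("G873E", 78),
    ("R962G", 100), ("D761Y", 96),
]

# Keep rows at or above an illustrative 95% cutoff, ordered by sequence position.
high_confidence = sorted(
    (parse_variant(v) for v, p in EGFR_ROWS if p >= 95),
    key=lambda s: s.position,
)
for sub in high_confidence:
    print(f"{sub.wild_type}{sub.position}{sub.mutant}")
```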
CEO algorithm: Combinatorial Entropy Optimization
Boris Reva, MSKCC
Input (unordered alignment):
G E K Q E S S S S Y E P K E E F A Q C V L L
G E S L E E A S V N G P F Q Y F Y T V E C L
G E S S E V A A Q N V P M L W F Y Q R H V M
G E Q V E S S E S Q E P H E E F Y Q I R T L
W E S K E E N A V N V P H Q K F F T V L T M
K E T N E V P W F K K P M R E F Y S A W G L
E E Q S E S A E S Q Q P E E P F Y Q I L E L
G E K N E V E A F K L P F R E F Y S V Q R V
H E R V E S A A S N V P M E T F Y Q I A E L
W E E K E E F A V Y I P L Q P F L T F G R L
R E C H E V K A Q Y V P M L E F Y Q V K P W
G E T N E E E A F N V P R R V F F S V S N L
G E S P E E N F V N V P H Q Y F Y T V E P M
T E N P E V E L F K V P F R V F F S L S H Y
S G W K E E L A V N Q P V Q E F E T F E I E
G E A S E V E H Q N V P H L K F Y Q E G P P
R E A Q E S Q A S N V P M E T F Y Q V R T L
Output (sequences reordered into subfamilies):
S G W K E E L A V N Q P V Q E F E T F E I E
W E E K E E F A V Y I P L Q P F L T F G R L
G E S P E E N F V N V P H Q Y F Y T V E P M
G E S L E E A S V N G P F Q Y F Y T V E C L
W E S K E E N A V N V P H Q K F F T V L T M
T E N P E E E L F K V P F R V F F S L S H Y
K E T N E E P W F K K P M R E F Y S A W G L
G E T N E E E A F N V P R R V F F S V S N L
G E K N E E E A F K L P F R E F Y S V Q R V
E E Q S E S A E S Q Q P E E P F Y Q I L E L
G E Q V E S S E S Q E P H E E F Y Q I R T L
G E K Q E S S S S Y E P K E E F A Q C V L L
R E A Q E S Q A S N V P M E T F Y Q V R T L
H E R V E S A A S N V P M E T F Y Q I A E L
R E C H E V K A Q Y V P M L E F Y Q V K P W
G E S S E V A A Q N V P M L W F Y Q R H V M
G E A S E V E H Q N V P H L K F Y Q E G P P
Defining subfamilies and specificity residues
[Figure: input alignment → clustering → output sub-families 1-4, with specificity residues and conserved residues marked]
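The input/output picture can be made concrete with the toy alignment from this slide. The following is a minimal sketch. It assumes the output is split into four subfamilies of 5, 4, 5 and 3 consecutive sequences. Those boundaries are inferred from the residues the rows share and are not stated on the slide. The code labels a column "conserved" when it holds one residue type throughout, "specificity" when each subfamily is internally uniform but the column as a whole is not, and "mixed" otherwise. This all-or-nothing rule is a simplification; CEO itself scores columns with a graded combinatorial entropy.

```python
# Toy alignment from the "Output" panel: 17 sequences x 22 columns.
CLUSTERED = [
    "SGWKEELAVNQPVQEFETFEIE", "WEEKEEFAVYIPLQPFLTFGRL", "GESPEENFVNVPHQYFYTVEPM",
    "GESLEEASVNGPFQYFYTVECL", "WESKEENAVNVPHQKFFTVLTM", "TENPEEELFKVPFRVFFSLSHY",
    "KETNEEPWFKKPMREFYSAWGL", "GETNEEEAFNVPRRVFFSVSNL", "GEKNEEEAFKLPFREFYSVQRV",
    "EEQSESAESQQPEEPFYQILEL", "GEQVESSESQEPHEEFYQIRTL", "GEKQESSSSYEPKEEFAQCVLL",
    "REAQESQASNVPMETFYQVRTL", "HERVESAASNVPMETFYQIAEL", "RECHEVKAQYVPMLEFYQVKPW",
    "GESSEVAAQNVPMLWFYQRHVM", "GEASEVEHQNVPHLKFYQEGPP",
]

# Assumption: subfamily boundaries inferred from the toy data (sizes 5, 4, 5, 3).
SUBFAMILIES = [range(0, 5), range(5, 9), range(9, 14), range(14, 17)]


def label_columns(seqs, subfamilies):
    """Return 'conserved', 'specificity' or 'mixed' for each alignment column."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same aligned length")
    labels = []
    for i in range(length):
        whole_column = {s[i] for s in seqs}
        within_families = [{seqs[j][i] for j in fam} for fam in subfamilies]
        if len(whole_column) == 1:
            labels.append("conserved")  # one residue type in every sequence
        elif all(len(res) == 1 for res in within_families):
            labels.append("specificity")  # uniform within each subfamily only
        else:
            labels.append("mixed")
    return labels


labels = label_columns(CLUSTERED, SUBFAMILIES)
for column, label in enumerate(labels, start=1):
    if label != "mixed":
        print(f"column {column:2d}: {label}")
```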
Minimize the contrast function, i.e. the difference between the entropies of ordered and disordered clusters of sequences of the same size.
[Figure: example clusters of the same size, ordered (entropy S) vs. disordered (entropy S′), with contrast values S − S′ = 0, −9, −3.5 and −7.5]
Q: How can one achieve the most distinctive, i.e. most informative, separation of sequences into clusters?
Goal: minimize S − S′
Optimization problem: form clusters (subfamilies) of sequences so as to minimize the combinatorial entropy difference $\Delta S$.

For each column $i$ of the alignment, one computes the combinatorial entropy $S_i$ and the reference entropy $\tilde{S}_i$:

$$S_i = \sum_{k} \ln \frac{N_k!}{\prod_{\alpha=1}^{20} N_{\alpha,k,i}!}, \qquad \tilde{S}_i = \sum_{k} \ln \frac{N_k!}{\prod_{\alpha=1}^{20} \tilde{N}_{\alpha,k,i}!},$$

where $N_k$ is the number of sequences in cluster (subfamily) $k$ and $N_{\alpha,k,i}$ is the number of residues of type $\alpha$ in column $i$ of cluster $k$. The reference counts distribute each residue type over the clusters in proportion to cluster size:

$$\tilde{N}_{\alpha,k,i} = N_k P_{\alpha,i}, \qquad P_{\alpha,i} = \frac{\sum_k N_{\alpha,k,i}}{\sum_k N_k}.$$

The entropy difference summed over all columns,

$$\Delta S = \sum_i \left( S_i - \tilde{S}_i \right),$$

measures how far a given clustering of the sequences deviates from random. It is minimal when each cluster has its own distinct type of residue.
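The formulas translate directly into a few lines of Python. The following is a minimal sketch. It assumes that every alignment column is gap-free and that factorials of the generally non-integer reference counts $\tilde{N}_{\alpha,k,i}$ are evaluated through the Gamma function, $x! = \Gamma(x + 1)$, computed with `math.lgamma`. The function names and the search strategy are illustrative assumptions. `greedy_partition` starts from a random split and moves one sequence at a time to another cluster whenever that lowers $\Delta S$. This is one straightforward way to attack the optimization problem and is not necessarily the procedure the CEO algorithm itself uses. The data are the 17 toy sequences from the clustering slide.

```python
import math
import random
from collections import Counter

# Toy alignment from the clustering slide (17 sequences x 22 columns, no gaps).
SEQS = [
    "SGWKEELAVNQPVQEFETFEIE", "WEEKEEFAVYIPLQPFLTFGRL", "GESPEENFVNVPHQYFYTVEPM",
    "GESLEEASVNGPFQYFYTVECL", "WESKEENAVNVPHQKFFTVLTM", "TENPEEELFKVPFRVFFSLSHY",
    "KETNEEPWFKKPMREFYSAWGL", "GETNEEEAFNVPRRVFFSVSNL", "GEKNEEEAFKLPFREFYSVQRV",
    "EEQSESAESQQPEEPFYQILEL", "GEQVESSESQEPHEEFYQIRTL", "GEKQESSSSYEPKEEFAQCVLL",
    "REAQESQASNVPMETFYQVRTL", "HERVESAASNVPMETFYQIAEL", "RECHEVKAQYVPMLEFYQVKPW",
    "GESSEVAAQNVPMLWFYQRHVM", "GEASEVEHQNVPHLKFYQEGPP",
]
# Assumed four-subfamily split inferred from the toy data (sizes 5, 4, 5, 3).
FOUR_SUBFAMILIES = [list(range(0, 5)), list(range(5, 9)),
                    list(range(9, 14)), list(range(14, 17))]


def column_contrast(seqs, clusters, i):
    """Return S_i - S~_i for column i; clusters are lists of sequence indices."""
    totals = Counter(seqs[j][i] for c in clusters for j in c)  # sum_k N_{a,k,i}
    n_all = sum(totals.values())  # sum_k N_k (gap-free columns assumed)
    s_obs = s_ref = 0.0
    for c in clusters:
        n_k = len(c)
        if n_k == 0:
            continue  # an empty cluster contributes ln(0!) = 0
        counts = Counter(seqs[j][i] for j in c)  # N_{a,k,i}
        log_nk_fact = math.lgamma(n_k + 1)  # ln N_k!
        s_obs += log_nk_fact - sum(math.lgamma(n + 1) for n in counts.values())
        # Reference counts N~_{a,k,i} = N_k * P_{a,i}; lgamma(x + 1) = ln x!.
        s_ref += log_nk_fact - sum(
            math.lgamma(n_k * n_a / n_all + 1) for n_a in totals.values())
    return s_obs - s_ref


def delta_s(seqs, clusters):
    """Delta S = sum over all columns of (S_i - S~_i)."""
    return sum(column_contrast(seqs, clusters, i) for i in range(len(seqs[0])))


def greedy_partition(seqs, n_clusters, seed=0):
    """Hypothetical local search: start from a random split, then move single
    sequences between clusters while Delta S keeps decreasing."""
    rng = random.Random(seed)
    labels = [rng.randrange(n_clusters) for _ in seqs]

    def to_clusters(lab):
        return [[j for j, c in enumerate(lab) if c == k] for k in range(n_clusters)]

    best = delta_s(seqs, to_clusters(labels))
    improved = True
    while improved:
        improved = False
        for j in range(len(seqs)):
            for k in range(n_clusters):
                if k == labels[j]:
                    continue
                old, labels[j] = labels[j], k
                score = delta_s(seqs, to_clusters(labels))
                if score < best - 1e-9:
                    best, improved = score, True  # keep the improving move
                else:
                    labels[j] = old  # undo the move
    return to_clusters(labels), best


print(delta_s(SEQS, [list(range(len(SEQS)))]))  # one cluster: S_i equals S~_i
print(delta_s(SEQS, FOUR_SUBFAMILIES))
clusters, score = greedy_partition(SEQS, 4)
print(score, clusters)
```

Putting all sequences in a single cluster makes the reference counts equal the observed counts, so $\Delta S = 0$. This gives a convenient baseline against which any multi-cluster split can be compared.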
Combinatorial entropy measure of specificity patterns
Specificity residues: high contrast
Globally conserved residues: low contrast
[Figure: contrast entropy difference (y-axis, 0 to −400) versus rank of residue position (x-axis, 0 to 270) for a family of 390 protein kinases; a specificity region and a conserved region are marked]
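Ranking columns by their individual contrast $S_i - \tilde{S}_i$ separates the two kinds of position. Specificity columns take large negative values, while a globally conserved column contributes exactly zero, because its observed and reference counts coincide. The following is a minimal sketch on the 17-sequence toy alignment rather than the 390-kinase family. It assumes gap-free columns, factorials taken as $\Gamma(x + 1)$, and an inferred four-subfamily split (sizes 5, 4, 5, 3). The function names are illustrative.

```python
import math
from collections import Counter

# Toy alignment from the clustering slide (17 sequences x 22 columns, no gaps).
SEQS = [
    "SGWKEELAVNQPVQEFETFEIE", "WEEKEEFAVYIPLQPFLTFGRL", "GESPEENFVNVPHQYFYTVEPM",
    "GESLEEASVNGPFQYFYTVECL", "WESKEENAVNVPHQKFFTVLTM", "TENPEEELFKVPFRVFFSLSHY",
    "KETNEEPWFKKPMREFYSAWGL", "GETNEEEAFNVPRRVFFSVSNL", "GEKNEEEAFKLPFREFYSVQRV",
    "EEQSESAESQQPEEPFYQILEL", "GEQVESSESQEPHEEFYQIRTL", "GEKQESSSSYEPKEEFAQCVLL",
    "REAQESQASNVPMETFYQVRTL", "HERVESAASNVPMETFYQIAEL", "RECHEVKAQYVPMLEFYQVKPW",
    "GESSEVAAQNVPMLWFYQRHVM", "GEASEVEHQNVPHLKFYQEGPP",
]
# Assumption: subfamily boundaries inferred from the toy data.
SUBFAMILIES = [range(0, 5), range(5, 9), range(9, 14), range(14, 17)]


def column_contrast(seqs, clusters, i):
    """S_i - S~_i for column i (gap-free columns; x! taken as Gamma(x + 1))."""
    totals = Counter(seqs[j][i] for c in clusters for j in c)
    n_all = sum(totals.values())
    contrast = 0.0
    for c in clusters:
        n_k = len(c)
        counts = Counter(seqs[j][i] for j in c)
        observed = -sum(math.lgamma(n + 1) for n in counts.values())
        reference = -sum(math.lgamma(n_k * n_a / n_all + 1) for n_a in totals.values())
        contrast += observed - reference  # the ln N_k! terms cancel
    return contrast


def rank_positions(seqs, clusters):
    """(column number, contrast) pairs sorted from most negative to highest."""
    scores = [(i + 1, column_contrast(seqs, clusters, i)) for i in range(len(seqs[0]))]
    return sorted(scores, key=lambda pair: pair[1])


ranking = rank_positions(SEQS, SUBFAMILIES)
for rank, (column, contrast) in enumerate(ranking, start=1):
    print(f"rank {rank:2d}  column {column:2d}  contrast {contrast:8.3f}")
```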
OMA - Online Mutation Analysis
www.cbio.mskcc.org/cancergenomics
www.proteinfunction.org
Functional implications of cancer mutations at the protein level
ERBB2 mutations:
L49H: no alignment data available
C311R: strong functional impact; conserved residue
N319D: likely functional impact; conserved and specificity residue