Structure Prediction and Modeling of a Eukaryotic Member of the Major Facilitator Superfamily
Gaurav Narale
Jan 13, 2016
Major Facilitator Superfamily (MFS)
• MEMBRANE TRANSPORT
• Largest family of secondary transporters known so far, with more than 1000 members identified [1].
• Members use a solute gradient to drive the translocation of substrates such as ions, sugars, amino acids, peptides and other hydrophilic solutes [2].
• Typically 400-600 amino acids long.
• 12 transmembrane α-helices, with both the N- and C-termini in the cytosol [3].
– Two six-helix halves connected by a central loop.
• Found in all three domains of life (Bacteria, Archaea and Eukarya).
Identifying Templates and Targets
• TEMPLATES: two known structures
– Lactose Permease (LacY), E. coli
– Glycerol-3-Phosphate Transporter (GlpT), E. coli
• Sequence identity between the two is negligible (~9%).
• Structural alignment with the CE (Combinatorial Extension) algorithm shows that they superimpose over most of their chain length (RMSD ~3.7 Å).
• 1st GOAL: to find a eukaryotic member of the MFS with enough sequence identity to one of the known structures to allow a reasonable alignment.
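The RMSD quoted for the CE superposition is the root-mean-square distance between equivalenced Cα atoms after the two structures have been superimposed. A minimal sketch of that last step, assuming the superposition (for example CE's own) has already been applied and the equivalenced Cα coordinates are given as matched lists; the coordinates below are made up for illustration and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two equal-length lists of matched, already superimposed
    coordinates (e.g. aligned C-alpha pairs from a structural alignment)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared Euclidean distances over all equivalenced pairs.
    total = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(a))


# Hypothetical two-residue example: every pair is displaced by 3 A along z.
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
model_ca = [(0.0, 0.0, 3.0), (3.8, 0.0, 3.0)]
print(rmsd(template_ca, model_ca))  # 3.0
```

CE also decides which residues are equivalenced and how the structures are superimposed; this sketch covers only the RMSD formula applied afterwards.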
Function and Mechanism of LacY and GlpT
Both use a solute gradient to drive translocation of their substrates:
- LacY mediates the coupled transport of lactose and H+
- GlpT catalyzes the exchange of glycerol-3-phosphate for phosphate
Alternating-Access Model
- Outward-facing conformation exposed to the extracellular side.
- Inward-facing conformation exposed to the cytoplasm.
Ribbon Representation
- Amino-terminal domain (blue).
- Carboxyl-terminal domain (green).
- Bends and other irregularities in the α-helices are indicated by deviations from an ideally straight, continuous helical ribbon.
Identifying Templates and Targets
• Lactose Permease (LacY)
– Obtained the protein PDB file from the Protein Data Bank (1PV6) and extracted the amino acid sequence in FASTA format. www.rcsb.org/pdb
– Searched for a TARGET with high sequence identity using NCBI BLAST (PSI-BLAST). www.ncbi.nlm.nih.gov
1. General search against all organisms: 2 iterations, E-value threshold 0.005; hits were mainly bacterial proteins.
2. Saved the results as a profile (PSSM).
3. More sensitive search using the original sequence as well as the saved profile as input, limited to eukaryotes: 2 iterations, threshold 0.01.
– Unable to identify a suitable target.
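Extracting the amino acid sequence from a PDB file such as 1PV6 amounts to reading its SEQRES records and converting three-letter residue names to one-letter codes. A minimal sketch, assuming a standard PDB-format file and mapping only the 20 standard residues (anything else becomes X); the function name is hypothetical and the five-residue input is a made-up fragment, not the real 1PV6 record. SEQRES lists the full deposited sequence, including residues that may be missing from the coordinates.

```python
from typing import Dict, List

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_fasta(pdb_text: str, pdb_id: str, width: int = 60) -> str:
    """Build one FASTA entry per chain from the SEQRES records of a PDB file."""
    chains: Dict[str, List[str]] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]           # column 12: chain identifier
        residues = line[19:].split()  # columns 20 onward: residue names
        chains.setdefault(chain_id, []).extend(
            THREE_TO_ONE.get(res, "X") for res in residues
        )
    entries = []
    for chain_id, residues in chains.items():
        seq = "".join(residues)
        body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
        entries.append(f">{pdb_id}:{chain_id}\n{body}")
    return "\n".join(entries)


SAMPLE = "SEQRES   1 A    5  MET TYR TYR LEU LYS\n"
print(seqres_to_fasta(SAMPLE, "1PV6"))
```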
Identifying Templates and Targets
• Glycerol-3-Phosphate Transporter (GlpT)
– Obtained the protein PDB file from the Protein Data Bank (1PW4) and extracted the amino acid sequence in FASTA format. www.rcsb.org/pdb
– Searched for a TARGET with high sequence identity using NCBI BLAST. www.ncbi.nlm.nih.gov
1. General search against all organisms: 2 iterations, threshold 0.005
2. Obtained a suitable TARGET: Glucose-6-Phosphate Translocase, Homo sapiens
3. Used BLink to identify several eukaryotic "close targets" for use in multiple sequence alignments.
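The thresholds used in these searches are E-value cutoffs; in PSI-BLAST the threshold is presumably the inclusion E-value below which a hit contributes to the profile (PSSM) for the next iteration. A minimal sketch of applying such a cutoff to saved hits, assuming they were exported in BLAST+ tabular format (`-outfmt 6`, whose 11th default column is the E-value); the function name and both hit rows are invented for illustration, not results from this search.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    subject: str
    pident: float
    evalue: float


def hits_below_threshold(tabular: str, threshold: float) -> List[Hit]:
    """Hits from BLAST+ -outfmt 6 text with E-value below threshold, best first."""
    hits = []
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Default columns: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        hit = Hit(subject=cols[1], pident=float(cols[2]), evalue=float(cols[10]))
        if hit.evalue < threshold:
            hits.append(hit)
    return sorted(hits, key=lambda h: h.evalue)


# Hypothetical hits: one passes a 0.005 cutoff, one does not.
EXAMPLE = (
    "query\tsubjA\t31.5\t420\t280\t10\t5\t420\t1\t410\t2e-40\t150\n"
    "query\tsubjB\t22.0\t300\t220\t15\t30\t320\t12\t300\t0.02\t40.1\n"
)
print(hits_below_threshold(EXAMPLE, 0.005))
```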
Multiple sequence alignment
• Only template and target: initial review
• Both templates, target and close targets
– 15 proteins similar to the target, selected from different species, to get a better alignment
– Only template and target extracted
• Around 30% similarity between template and target
• Well-distributed alignment
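Percentages such as the ~30% above depend on how they are computed. A minimal sketch of one common definition of percent identity: identical positions divided by the columns where both aligned sequences have a residue. Dividing by the alignment length or by the shorter sequence instead gives different numbers, and "similarity" scores additionally count conservative substitutions, which this sketch does not. The function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


# Toy example: 4 gap-free columns, 3 of them identical.
print(percent_identity("AC-GT", "ACTGA"))  # 75.0
```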
Alignment using FUGUE (numbers in parentheses give the template residue number at the start of each row)

                       10        20        30        40        50
hs1pw4a (  5)  fkpaphkarlpaaeidptYrrlrwqIflGIffGyaAYylVRkNFALAMpy
g6pt           -------------MAAQGYGYYRTVIFSAMFGGYSLYYFNRKTFSFVMPS

                       60        70        80        90       100
hs1pw4a ( 55)  L-veqgfsrgDLGfALSGISiAygfSkfimgsvSdrsnPrvfLPaGLilA
g6pt           LVEEIPLDKDDLGFITSSQSAAYAISKFVSGVLSDQMSARWLFSSGLLLV

                      110       120       130       140       150
hs1pw4a (104)  AavMlfMGfvpwATssiavMfvlLflCGwfQGmGwpPCgrTmvhwwsqke
g6pt           GLVNIFFAWSSTV----PVFAALWFLNGLAQGLGWPPCGKVLRKWFEPSQ

                      160       170       180       190       200
hs1pw4a (154)  rggivsVwncAhNvggGiPPllFllGmawfndwhAALYmPAfcAilvAlf
g6pt           FGTWWAILSTSMNLAGGLGPILATILAQSY-SWRSTLALSGALCVVVSFL

                      210       220       230       240       250
hs1pw4a (204)  AfamMrdTpqsCglppiee-----ykndtakqifmqyVlpnklLwyIAiA
g6pt           CLLLIHNEPADVGLRNLDPMPSEGKKGSLKEESTLQELLLSPYLWVLSTG

                      260       270       280       290       300
hs1pw4a (262)  NvfVyLLRYGiLDwSPtylkevKhfaldkSSwAYflYEyagipGTllCgw
g6pt           YLVVFGVKTCCTDWGQFFLIQEKGQSALVGSSYMSALEVGGLVGSIAAGY

                      310       320       330       340       350
hs1pw4a (312)  msdkv----------frgnrGaTGvfFMtlVtiaTivywmnpagNptvdm
g6pt           LSDRAMAKAGLSNYGNPRHGLLLFMMAGMTVSMYLFRVTVTSDSPKLWIL

                      360       370       380       390       400
hs1pw4a (352)  iCmivIGflIyGPvmLIglHAleLApkkAagtAagfTglfGylgGSvaAs
g6pt           VLGAVFGFSSYGPIALFGVIANESAPPNLCGTSHAIVGLMANVGGFL-AG

                      410       420       430       440       450
hs1pw4a (402)  aiVGytvdffgwdgGfmvMigGSilAvilLivVmigekrrheqllqelvp
g6pt           LPFSTIAKHYSWSTAFWVAEVICAASTAAFFLLRNIRTKMGRVSKKAE--
MPSA: only template and target

P_1P4W         FKPAPHKARLPAAEIDPTYRRLRWQIFLGIFFGYAAYYLVRKNFALAMPYLVEQG-FSRG
GLUCOSE6HUMAN  -------------MAAQGYGYYRTVIFSAMFGGYSLYYFNRKTFSFVMPSLVEEIPLDKD

P_1P4W         DLGFALSGISIAYGFSKFIMGSVSDRSNPRVFLPAGLILAAAVMLFMGFVPWATSSIAVM
GLUCOSE6HUMAN  DLGFITSSQSAAYAISKFVSGVLSDQMSARWLFSSGLLLVGLVNIFFAWS----STVPVF

P_1P4W         FVLLFLCGWFQGMGWPPCGRTMVHWWSQKERGGIVSVWNCAHNVGGGIPPLLFLLGMAWF
GLUCOSE6HUMAN  AALWFLNGLAQGLGWPPCGKVLRKWFEPSQFGTWWAILSTSMNLAGGLGPILATI-LAQS

P_1P4W         NDWHAALYMPAFCAILVALFAFAMMRDTPQSCGLP-----PIEEYKNDTAKQIFMQYVLP
GLUCOSE6HUMAN  YSWRSTLALSGALCVVVSFLCLLLIHNEPADVGLRNLDPMPSEGKKGSLKEESTLQELLL

P_1P4W         NKLLWYIAIANVFVYLLRYGILDWSPTYLKEVKHFALDKSSWAYFLYEYAGIPGTLLCGW
GLUCOSE6HUMAN  SPYLWVLSTGYLVVFGVKTCCTDWGQFFLIQEKGQSALVGSSYMSALEVGGLVGSIAAGY

P_1P4W         MSDKVFRGN--------RGATGVFFMTLVTIATIVYWMNPAGN--PTVDMICMIVIGFLI
GLUCOSE6HUMAN  LSDRAMAKAGLSNYGNPRHGLLLFMMAGMTVSMYLFRVTVTSDSPKLWILVLGAVFGFSS

P_1P4W         YGPVMLIGLHALELAPKKAAGTAAGFTGLFGYLGGSVAASAIVGYTVDFFGWDGGFMVMI
GLUCOSE6HUMAN  YGPIALFGVIANESAPPNLCGTSHAIVGLMANVGGFLAGLPFSTIAKHYSWSTAFWVAEV

P_1P4W         GGSILAVILLIVVMIGEKRRHEQLLQELVP
GLUCOSE6HUMAN  ICAASTAAFFLLRNIRTKMGRVSKKAE---
Extracted template-target

P_1PW4        -------FKPAPHKARLPAAEIDPTYRRLRWQIFLGIFFGYAAYYLVRKNFALAMPYLVE
gi|2765461|e  --------------------MAAQGYGYYRTVIFSAMFGGYSLYYFNRKTFSFVMPSLVE

P_1PW4        QGFS---RGDLGFALSGISIAYGFSKFIMGSVSDRSNPRVFLPAGLILAAAVMLFMGFVP
gi|2765461|e  EIPLD--KDDLGFITSSQSAAYAISKFVSGVLSDQMSARWLFSSGLLLVGLVNIFFAWSS

P_1PW4        WATSS--IAVMFVLLFLCGWFQGMGWPPCGRTMVHWWSQKERGGIVSVWNCAHN--VGGG
gi|2765461|e  TVP------VFAALWFLNGLAQGLGWPPCGKVLRKWFEPSQFGTWWAILSTSMN--LAGG

P_1PW4        IPP-------LLFLLGMAWFN-----------DWHAALYMPAFCAILVALFAFAMMRDTP
gi|2765461|e  LGP-------ILATILAQSYS------------WRSTLALSGALCVVVSFLCLLLIHNEP

P_1PW4        QSCGLPPIEEYKNDT-------------------AKQIFMQYVLPNKLLWYIAIANVFVY
gi|2765461|e  ADVGLRNLDPMPSEG--------------KKGSLKEESTLQELLLSPYLWVLSTGYLVVF

P_1PW4        LLRYGILDWSPTYLKEVKHFALDK-SSWAYFLYEYAGIPGTLLCGWMSDKVFR-------
gi|2765461|e  GVKTCCTDWGQFFLIQEKGQSALV-GSSYMSALEVGGLVGSIAAGYLSDRAMAKAGLSNY

P_1PW4        -GNRGATGVFFMTLVTIATIVYWMNPAG---------------NPTVDMICMIVIGFLIY
gi|2765461|e  GNPRHGLLLFMMAGMTVSMYLFRVTVTSD-----------S--PKLWILVLGAVFGFSSY

P_1PW4        GP-VMLIGLHALELAPKKAAGTAAGFTGLFGYLGGSVAASAIVGYTVDF-FGWDGGFMVM
gi|2765461|e  GP-IALFGVIANESAPPNLCGTSHAIVGLMANVG-GFLAGLPFSTIAKH-YSWSTAFWVA

P_1PW4        IGGSILAVILLIVVMIGEKRRHEQLLQELVP-----------------------------
gi|2765461|e  EVICAASTAAFFLLRNIRTKMGRVSKKAE-------------------------------
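Extracting the template-target pair from the multiple alignment means keeping two rows and, optionally, discarding columns in which both rows are gaps (such columns were only needed for the other sequences). A minimal sketch, assuming the alignment is held as a dict of equal-length gapped strings; the function name, sequence names and toy rows are hypothetical.

```python
from typing import Dict, Tuple


def extract_pair(msa: Dict[str, str], name_a: str, name_b: str,
                 gap: str = "-") -> Tuple[str, str]:
    """Pull two rows out of a multiple alignment and drop columns that are
    gaps in both, so the pair stays aligned without empty columns."""
    row_a, row_b = msa[name_a], msa[name_b]
    if len(row_a) != len(row_b):
        raise ValueError("rows must have equal length")
    kept = [(x, y) for x, y in zip(row_a, row_b) if not (x == gap and y == gap)]
    return "".join(x for x, _ in kept), "".join(y for _, y in kept)


msa = {
    "template": "AC--GT-K",
    "target":   "A---GTRK",
    "other":    "ACDEGTRK",  # only this row occupies columns 3-4
}
print(extract_pair(msa, "template", "target"))  # ('ACGT-K', 'A-GTRK')
```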
Checking the alignment in MODELLER
Using the chk_align.top script
_aln.pos 210 220 230 240 250 260 270
1PW4     MRDTPQSCGLPPIEEYKND/T-----AKQIFMQYVLPNKLLWYIAIANVFVYLLRYGILDWSPTYLKE
G6PT     IHNEPADVGLRNLDPMPSE-GKKGSLKEESTLQELLLSPYLWVLSTGYLVVFGVKTCCTDWGQFFLIQ
_consrvd     *   **                      *  *    **        *        **    *
Problem near chain break
_aln.pos 210 220 230 240 250 260 270
1PW4     MRDTPQSCGLPPIEEYKND/----TAKQIFMQYVLPNKLLWYIAIANVFVYLLRYGILDWSPTYLKEV
G6PT     IHNEPADVGLRNLDPMPSEGKKGSLKEESTLQELLLSPYLWVLSTGYLVVFGVKTCCTDWGQFFLIQE
_consrvd     *   **                     *  *    **        *        **    *
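The `_consrvd` row marks alignment columns where both sequences carry the same residue. A minimal sketch that rebuilds such a row from two aligned strings, treating '-' gaps and '/' chain-break markers as never conserved; the function name is hypothetical and this mimics the star convention of the check output rather than reproducing MODELLER's own code. Applied to the first 1PW4/G6PT block, it marks the same eleven identical columns.

```python
def conservation_line(a: str, b: str, non_residues: str = "-/") -> str:
    """Row with '*' under columns where the two aligned rows share a residue."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    marks = ["*" if x == y and x not in non_residues else " "
             for x, y in zip(a, b)]
    return "".join(marks).rstrip()


tpl = "MRDTPQSCGLPPIEEYKND/T-----AKQIFMQYVLPNKLLWYIAIANVFVYLLRYGILDWSPTYLKE"
tgt = "IHNEPADVGLRNLDPMPSE-GKKGSLKEESTLQELLLSPYLWVLSTGYLVVFGVKTCCTDWGQFFLIQ"
line = conservation_line(tpl, tgt)
print(tpl)
print(tgt)
print(line)
```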
MODELLER Runs
• Used the extracted template-target alignment
• Template sequence extracted from the structure using Insight
• Missing residues in the structure appear as chain breaks
• Parameters:
– OUTPUT_CONTROL = 1 1 1 1 1
– STARTING_MODEL = 1
– ENDING_MODEL = 5
– LIBRARY_SCHEDULE = 4
– MD_LEVEL = 'refine_1'
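MODELLER reads the template-target alignment in PIR format, where '/' in a sequence marks a chain break. A minimal sketch that writes a two-entry PIR file from aligned strings, assuming the ten colon-separated header fields described in the MODELLER manual (type, code, start residue, start chain, end residue, end chain, name, source, resolution, R-factor) and FIRST/LAST as whole-chain placeholders; the function names, codes and toy sequences are hypothetical and the header layout should be checked against the MODELLER version in use.

```python
def pir_entry(code: str, aligned_seq: str, header: str, width: int = 75) -> str:
    """One PIR entry: '>P1;code', a colon-separated header line, then the
    gapped sequence wrapped to `width` characters and terminated by '*'."""
    seq = aligned_seq + "*"
    lines = [f">P1;{code}", header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines)


def write_pir(template_code: str, template_seq: str, chain: str,
              target_code: str, target_seq: str,
              start: str = "FIRST", end: str = "LAST") -> str:
    """Two-entry PIR alignment: template with coordinates, then target."""
    if len(template_seq) != len(target_seq):
        raise ValueError("aligned sequences must have equal length")
    # structureX = template with X-ray coordinates; ten fields, nine colons.
    tpl_header = f"structureX:{template_code}:{start}:{chain}:{end}:{chain}::::"
    # sequence = target to be modeled; name/source left empty.
    tgt_header = f"sequence:{target_code}:::::::0.00: 0.00"
    return (pir_entry(template_code, template_seq, tpl_header) + "\n\n"
            + pir_entry(target_code, target_seq, tgt_header) + "\n")


# Toy alignment; real use would pass the full extracted template/target rows.
ali = write_pir("1pw4", "AC-GT", "A", "g6pt", "ACDGT")
print(ali)
```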
PROSA 2 Runs
• Used to evaluate models
• Models with the best MODELLER scores were compared using PROSA
• Z score used for initial comparison
• Energy graph used to locate major violations
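The PROSA energy graph plots knowledge-based residue energies averaged over a sliding window along the sequence; stretches where the averaged energy rises above zero are the usual candidates for misfolded or misaligned regions. A minimal sketch of that smoothing-and-flagging step, assuming per-residue energies are already available as a list (PROSA computes them from its own potentials, which are not reproduced here); the function names, window size and energies are illustrative.

```python
from typing import List, Optional, Tuple


def window_average(energies: List[float], window: int) -> List[float]:
    """Mean energy in a window centred on each residue (shrinks at the ends)."""
    half = window // 2
    smoothed = []
    for i in range(len(energies)):
        lo, hi = max(0, i - half), min(len(energies), i + half + 1)
        segment = energies[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def positive_stretches(profile: List[float],
                       first_resnum: int = 1) -> List[Tuple[int, int]]:
    """(start, end) residue numbers of consecutive runs with energy > 0."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value <= 0 and start is not None:
            runs.append((start + first_resnum, i - 1 + first_resnum))
            start = None
    if start is not None:
        runs.append((start + first_resnum, len(profile) - 1 + first_resnum))
    return runs


# Hypothetical per-residue energies (not PROSA output) with one positive bump.
energies = [-1.0, -1.0, -1.0, 2.0, 3.0, 2.0, -1.0, -1.0, -1.0]
profile = window_average(energies, window=3)
print(positive_stretches(profile))  # [(4, 6)]
```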
Model Selection Criteria
• MODELLER log file
– Minimum energy
– Number of violations
– Number of really bad violations
– Location of violations with respect to alignment and structure
• PROSA 2 log file
– Z score closest to template
– Peaks and troughs in graph relative to template
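These criteria can be combined into a simple ranking. A minimal sketch, assuming one plausible ordering (PROSA Z score closest to the template's first, MODELLER energy as tie-breaker); the slides do not state how the criteria were weighted, so the ordering and the function name are assumptions. The values are the four loop models and the template Z score reported for Loop Modeling Run 1.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    ids: str         # "ID1, ID2" pair from the MODELLER log
    energy: float    # "Current energy" from the MODELLER log
    prosa_z: float   # PROSA Z score


def rank_models(models: List[Model], template_z: float) -> List[Model]:
    """Order models by |Z - Z_template|, then by lower energy."""
    return sorted(models, key=lambda m: (abs(m.prosa_z - template_z), m.energy))


run1 = [
    Model("1, 5", 192.0, -6.60),
    Model("3, 2", 387.0, -6.57),
    Model("4, 2", 363.0, -6.76),
    Model("5, 4", 242.0, -6.30),
]
for m in rank_models(run1, template_z=-7.3):
    print(m.ids, m.energy, m.prosa_z)
```

Ranking by energy alone would instead put model 1, 5 first, so the choice of primary criterion changes which model is picked.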
Adjusting the alignment
• Structures obtained from MODELLER compared in Insight
• Alignment violations clearly visible
• Criteria for modifying the alignment:
– Unequal number of residues in a loop
– Unsatisfied structural similarity constraints
– Residues violating constraints, as reported by MODELLER
1st run - adjustment in Insight
Loop Modeling
• MODELLER Run 2
• Loop Modeling Run 1
Loop modeling
• Generated models based on the adjusted alignment
• 25 models obtained
• Models selected based on minimum energy and constraint violations
• Parameters:
– OUTPUT_CONTROL = 1 1 1 1 1
– STARTING_MODEL = 1
– ENDING_MODEL = 5
– LIBRARY_SCHEDULE = 2
– MD_LEVEL = 'refine_3'
– DO_LOOPS = 1
– LOOP_ENDING_MODEL = 5
– LOOP_MD_LEVEL = 'refine_3'
Loop Modeling Run 1: Best 4 Models Picked
ID1, ID2   Current energy   PROSA Z score
1, 5       192              -6.60
3, 2       387              -6.57
4, 2       363              -6.76
5, 4       242              -6.3
(Z score of template: -7.3)
-------------------------------------------------------------------------------------------------
Feature 25 : Phi/Psi pair of dihedral restraints
List of the RVIOL violations larger than : 6.5000
# ICSR RESNO1/2 ATM1/2 INDATM1/2 FEAT restr viol rviol RESTR VIOL RVIOL
7 1360 45D 46K C N 368 370 -68.99 -70.20 30.80 2.20 -62.90 150.55 19.23
7 46K 46K N CA 370 371 109.62 140.40 -40.80
8 1361 46K 47D C N 377 379 173.18 54.50 123.21 12.43 -63.30 132.44 18.20
8 47D 47D N CA 379 380 7.79 40.90 -40.00
9 1362 47D 48D C N 385 387 -138.58 -63.30 76.02 11.52 -63.30 76.02 11.52
9 48D 48D N CA 387 388 -29.45 -40.00 -40.00
12 1369 103F 104A C N 811 813 -69.81 -68.20 21.24 1.77 -62.50 165.18 26.73
12 104A 104A N CA 813 814 124.12 145.30 -40.90
13 1370 104A 105A C N 816 818 -169.75 -62.50 107.58 21.02 -62.50 107.58 21.02
13 105A 105A N CA 818 819 -49.29 -40.90 -40.90
ID1, ID2 : 1 5   Current energy : 192.1849
# RESTRAINT_GROUP NUM NUMVI NUMVP RMS_1 RMS_2 MOL.PDF S_i
-------------------------------------------------------------------------------------------------
25 Phi/Psi pair of dihedral restraints: 64 44 11 36.170 140.638 79.036 1.000
Violations - MODELLER log file
1st loop model - violations in Insight
Residue 46 Residue 104
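The residues highlighted in Insight can also be pulled directly from the MODELLER violation listing. A minimal sketch, assuming the row layout of the Loop Modeling Run 1 listing: each primary row (15 fields, atom pair C/N) names the two residues of the restrained dihedral and ends with RVIOL, while the 10-field continuation rows (atom pair N/CA) are skipped. Since a phi dihedral of residue i involves C(i-1) and N(i), the second residue is reported. The function name is hypothetical; the rows are copied from that listing.

```python
from typing import List, Tuple

LOG_ROWS = """\
7 1360 45D 46K C N 368 370 -68.99 -70.20 30.80 2.20 -62.90 150.55 19.23
7 46K 46K N CA 370 371 109.62 140.40 -40.80
8 1361 46K 47D C N 377 379 173.18 54.50 123.21 12.43 -63.30 132.44 18.20
8 47D 47D N CA 379 380 7.79 40.90 -40.00
9 1362 47D 48D C N 385 387 -138.58 -63.30 76.02 11.52 -63.30 76.02 11.52
9 48D 48D N CA 387 388 -29.45 -40.00 -40.00
12 1369 103F 104A C N 811 813 -69.81 -68.20 21.24 1.77 -62.50 165.18 26.73
12 104A 104A N CA 813 814 124.12 145.30 -40.90
13 1370 104A 105A C N 816 818 -169.75 -62.50 107.58 21.02 -62.50 107.58 21.02
13 105A 105A N CA 818 819 -49.29 -40.90 -40.90
"""


def flagged_residues(rows: str, cutoff: float = 6.5) -> List[Tuple[str, float]]:
    """(residue, RVIOL) for primary dihedral rows whose RVIOL exceeds cutoff."""
    flagged = []
    for line in rows.splitlines():
        cols = line.split()
        if len(cols) != 15:  # skip continuation rows and blank lines
            continue
        residue, rviol = cols[3], float(cols[-1])
        if rviol > cutoff:
            flagged.append((residue, rviol))
    return flagged


print(flagged_residues(LOG_ROWS))
print(flagged_residues(LOG_ROWS, cutoff=20.0))
```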
Loop Model Run 1 - adjustment
Loop Modeling 2
• Refinement of Loop Model 1
• Loop Modeling 2
• MODELLER Run 3
Loop Modeling Run 2: Best 5 Models
ID1, ID2   Current energy   PROSA Z score
5, 1       237.4322         -5.82
3, 1       222.2522         -6.27
1, 1       195.7286         -6.32
2, 4       226.8002         -6.09
2, 2       198.0359         -6.15
-------------------------------------------------------------------------------------------------
Feature 25 : Phi/Psi pair of dihedral restraints
List of the RVIOL violations larger than : 6.5000
# ICSR RESNO1/2 ATM1/2 INDATM1/2 FEAT restr viol rviol RESTR VIOL RVIOL
3 1430 45D 46K C N 368 370 -103.79 -118.00 33.92 1.76 -62.90 154.80 22.53
3 46K 46K N CA 370 371 169.89 139.10 -40.80
4 1431 46K 47D C N 377 379 -95.02 -70.90 59.16 2.00 -63.30 119.95 16.85
4 47D 47D N CA 379 380 -155.68 150.30 -40.00
5 1432 47D 48D C N 385 387 -63.33 -70.90 31.08 1.19 -63.30 160.16 19.77
5 48D 48D N CA 387 388 120.16 150.30 -40.00
9 1441 103F 104A C N 811 813 -122.41 -134.00 20.39 1.24 -62.50 166.47 30.50
9 104A 104A N CA 813 814 163.78 147.00 -40.90
10 1442 104A 105A C N 816 818 -64.90 -68.20 29.69 2.28 -62.50 156.71 25.57
10 105A 105A N CA 818 819 115.80 145.30 -40.90
# RESTRAINT_GROUP NUM NUMVI NUMVP RMS_1 RMS_2 MOL.PDF S_i
-------------------------------------------------------------------------------------------------
4 Stereochemical improper torsion pot: 156 1 2 1.943 1.943 16.723 1.000
25 Phi/Psi pair of dihedral restraints: 67 40 11 34.260 132.074 73.358 1.000
Violations - MODELLER log file
ID1, ID2 : 1 1   Current energy : 195.7286
Loop Model Violation Sites
Refinements in Final Model
• Some regions could be realigned and refined further, taking their restraint violations into account.
• Other tools, such as PROCHECK, could be used in addition to MODELLER and PROSA for further insight into model quality (e.g. stereochemistry).
• Structural alignment of the model with other known transporter structures might also help.