The Magic Fit
Goals:
1. Introduction to NCBI's resources and educational tools.
2. Use NCBI Entrez to search for a gene sequence or a protein
sequence.
3. Use NCBI Blast to search for homologous sequences.
4. Use PDB to download protein coordinates.
5. Use Swiss-PDB viewer to analyze protein structures.
6. Use Swiss-PDB viewer to overlay related structures.
California Grade Seven Science Content Standards
Genetics
2d. Students know plant and animal cells contain many thousands
of different genes.
2e. Students know DNA is the genetic material of living
organisms.
Evolution
3a. Students know genetic variation [is a] cause of evolution
and diversity of organisms.
Investigation and Experimentation
7a. Select and use appropriate tools and technology
(computers).
7b. Use a variety of print and electronic resources (including
the World Wide Web) to collect information and evidence.
7c. Communicate the logical connection among science concepts,
data collected and conclusions drawn.
7d. Construct appropriately labeled diagrams to communicate
scientific knowledge.
7e. Communicate the steps and results from an investigation in
written reports and oral presentations.
California Grades 9 to 12 Biology and Life Sciences Content
Standards
Cell Biology
1d. Students know the central dogma of molecular biology.
1h. Students know most macromolecules are synthesized from
precursors.
4. Genes are a set of instructions encoded in the DNA sequence
of each organism that specify the sequence of amino acids in
proteins.
Evolution
8f. Students know how to use DNA or protein sequence comparisons
to show probable evolutionary relationships.
Note: Make a folder on your desktop to save your search results
and files during this activity.
Search for a MAP kinase gene sequence, FUS3 from Saccharomyces
cerevisiae, using Entrez.
1. Go to http://www.ncbi.nlm.nih.gov.
2. Click on the All Databases link at the top.
3. Enter Fus3 in the search box and click on Go.
You will see a number on the left side of each database
indicating the number of hits for the search. We will use the
Nucleotide database to search for the FUS3 gene sequence.
4. Click on Nucleotide.
You will see a selection of 20 hits/results listed on the page
from a total of 78 items. More results are listed on adjacent
pages.
You will also see a list of organisms related to the hits on the
top right side of the page.
At the top of the page before the hits, there is a list of three
genes related to our search from NCBI Gene. Our gene for FUS3 from
Saccharomyces cerevisiae is the second one on the list. However,
let's try to refine our results to reduce the number of hits.
5. Click on the Limits tab. Select Title under Fields and Genomic
DNA/RNA under Molecule. Click on Go. The number of hits should go
down to 7.
6. Click on the History tab. You will see a list of the two searches
we have done so far under Most Recent Queries.
7. In the search box, clear any text and enter the following
text (do not hit Go at this point):
"Saccharomyces cerevisiae"[Organism]
Saccharomyces cerevisiae is the name of our organism (baker's
yeast). The quotation marks tell NCBI to look for the words together
as one unit. The word Organism in brackets tells NCBI to limit the
query to only that organism.
8. Left click the number for the latest search result (Search
Fus3 Field:Title Limits:Genomic DNA/RNA) at the top of the list in
Most Recent Queries.
9. In the menu that appears, click on AND. You will see AND (#2)
added to the search box after your text.
Selecting AND tells NCBI to combine the search results for the
new text query in the search box and the query you select from the
history and only return hits that meet both criteria.
10. Click on Go. You should see only a single result.
We have now refined our query to give us exactly what we wanted.
The result is the coding sequence for FUS3 without
upstream/downstream regions or introns.
11. Click on M31132 (accession number) for the result to browse
for further information.
Try the options in the Display and Show drop down menus to see
the possibilities.
Further down on the page, you will see information such as the GI
number, some links, the locus for the gene, the number of base pairs,
the organism, the authors and publication reference (including a link
to PubMed), the related protein sequence and, finally, the coding
sequence (CDS) for the gene.
An important step in the analysis of genome information is
deciphering the complete coding potential, or protein coding
sequence (CDS) region, of each gene. A CDS is a sequence of
nucleotides that corresponds to the sequence of amino acids in a
protein. A typical CDS starts with ATG and ends with a stop codon.
A CDS can be a subset of an open reading frame (ORF) [1].
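The description of a typical CDS can be turned into a small check. The sketch below is only an illustration: the function name looks_like_cds and the toy sequences are made up for this example, and it tests nothing beyond the rules stated above (starts with ATG, ends with a stop codon, reads in whole codons with no stop codon in between).

```python
# Stop codons of the standard genetic code, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def looks_like_cds(seq):
    """Return True if seq has the shape of a typical CDS.

    Hypothetical helper for illustration: it only checks the start
    codon, the final stop codon, and that no stop codon appears
    earlier in the same reading frame.
    """
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    # Split the sequence into consecutive three-letter codons.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    starts_ok = codons[0] == "ATG"
    ends_ok = codons[-1] in STOP_CODONS
    no_early_stop = not any(c in STOP_CODONS for c in codons[1:-1])
    return starts_ok and ends_ok and no_early_stop


# Made-up toy sequences, not taken from FUS3.
print(looks_like_cds("ATGGCTAAATGA"))   # ATG GCT AAA TGA
print(looks_like_cds("ATGTAAGCTTGA"))   # stop codon TAA appears too early
```

You could paste the CDS copied from the M31132 record into the same function to see whether it follows the typical pattern.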
12. Click back on your browser to return to the results page.
Copy the sequence identification number in the first line
(gi|171532). We will use this for the BLAST search.
Note: You can also get the GI number from the gene information
page but the format is not correct to use in a BLAST search.
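The same kind of refined query can also be written as a single search string and sent to NCBI's E-utilities search service instead of being built up through the Limits and History tabs. The sketch below only builds the web address and does not contact NCBI. It is an approximation of the search above: it combines the Title limit with the organism limit and leaves out the Genomic DNA/RNA molecule limit. The function name build_esearch_url is made up for this example.

```python
from urllib.parse import urlencode

# Address of NCBI's E-utilities search service (ESearch).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(gene, organism, db="nuccore"):
    """Build an ESearch URL for a gene name limited to one organism.

    Hypothetical helper for illustration. "nuccore" is the E-utilities
    name for the Nucleotide database.
    """
    # Same idea as the guided search: the gene name must be in the title,
    # AND the record must belong to the quoted organism.
    term = f'{gene}[Title] AND "{organism}"[Organism]'
    return EUTILS_ESEARCH + "?" + urlencode({"db": db, "term": term})


url = build_esearch_url("Fus3", "Saccharomyces cerevisiae")
print(url)
```

Opening the printed address in a browser should return a short XML document listing matching record IDs; the exact count may differ from the numbers quoted in this handout because the database grows over time.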
Performing a BLAST search
BLAST = Basic Local Alignment Search Tool
1. Go to NCBI homepage. Click on BLAST at the top.
2. Under Basic BLAST, select nucleotide blast.
3. In the box labeled Enter Query Sequence/ Enter accession
number, gi, or FASTA sequence, paste the GI number you copied for
the gene or enter it manually (See figure below).
4. Enter a name for the job. You can keep the default name or
assign your own.
5. Choose Human Genomic + Transcript for the database.
We want to find related genes/transcripts in the Human
Genome/Transcriptome.
6. Under Program Selection, select More dissimilar sequences
(discontiguous megablast).
Food for thought: Selecting Highly similar sequences (megablast)
under Program Selection will give you zero results for this search.
The message on screen reads No significant similarity found. Why?
(The answer is on the last page of this guided tour.)
7. Click on BLAST.
8. A status page is displayed while the search is running.
9. Finally, the results page lists the hits with their sequence
alignments. MAPK1 is the gene we are interested in.
Search for Fus3 protein sequence using Entrez and BLAST human
proteome for similar proteins.
1. Use the steps above to search for the Fus3 protein
sequence.
2. BLAST it (GI|536007) against the human protein database (protein
blast) to search for similar proteins. Enter Homo sapiens under
Organism to restrict the search to human proteins. You should find
MAPK1 (also known as Erk2) as the top hit.
Take a look at the sequence alignment for Fus3 and Erk2 on the
results page. From the BLAST results and sequence alignment we know
that the two proteins share 50% sequence identity and 68% sequence
similarity. Let's download the protein coordinates for Erk2 and Fus3
from PDB and look at the structures.
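To see where numbers like 50% identity and 68% similarity come from, here is a minimal sketch that counts identical and similar positions in a pair of aligned sequences. Everything in it is an assumption made for illustration: the two short aligned strings are invented (they are not Fus3 or Erk2), and the SIMILAR_GROUPS table is a toy grouping of chemically related amino acids. BLAST itself decides similarity with a substitution scoring matrix, not with this table.

```python
# Toy groups of chemically similar amino acids (illustration only;
# BLAST uses a substitution matrix to decide what counts as similar).
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "ST", "NQ"]


def is_similar(a, b):
    """True if two residues are identical or fall in the same toy group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) for two aligned strings.

    Gaps are written as '-'. Percentages are taken over all alignment
    columns, gaps included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(aligned_a, aligned_b))
    identical = sum(1 for a, b in pairs if a == b and a != "-")
    similar = sum(1 for a, b in pairs if "-" not in (a, b) and is_similar(a, b))
    return 100 * identical / len(pairs), 100 * similar / len(pairs)


# Invented example alignment with one gap.
identity, similarity = identity_and_similarity("MKV-LDE", "MRVALEE")
print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```

Similarity is always at least as high as identity, because every identical position also counts as similar.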
Downloading PDB files for Fus3 and Erk2
You can do this by searching for Fus3 under Structure on Entrez
or directly from the PDB website. We will use the RCSB Protein Data
Bank (PDB) database.
RCSB = Research Collaboratory for Structural Bioinformatics;
more information at http://home.rcsb.org/.
1. Go to http://www.pdb.org.
2. Enter Fus3 in the search box and click on Site Search.
You will see 7 structure hits for your search. We will look at
Crystal structure of non-phosphorylated Fus3 at the bottom of the
page (PDB ID: 2b9f).
Each structure in the PDB is represented by a 4-character
identifier of the form [0-9][a-z,0-9][a-z,0-9][a-z,0-9]. For
example, 4HHB and 9INS are the identification codes for the PDB
entries for hemoglobin and insulin. Many of the PDB web pages,
including the PDB home page, allow you to enter a PDB ID and
retrieve information for the corresponding structure. Historically,
30% of queries to the PDB sites are of this type [2].
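The identifier pattern described above (one digit followed by three letters or digits) can be checked with a short regular expression. This is a sketch for illustration only; the function name is_pdb_id is made up, and upper- and lower-case letters are both accepted because the IDs in this handout appear in both forms.

```python
import re

# One digit, then three characters that are each a letter or a digit.
PDB_ID_PATTERN = re.compile(r"[0-9][a-z0-9]{3}", re.IGNORECASE)


def is_pdb_id(text):
    """True if text has the 4-character PDB ID form described above."""
    return PDB_ID_PATTERN.fullmatch(text) is not None


for candidate in ["2b9f", "1erk", "4HHB", "9INS", "Fus3"]:
    print(candidate, is_pdb_id(candidate))
```

Note that Fus3 is rejected: it is a protein name, not a PDB ID, because it does not start with a digit.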
3. Click on 2b9f.
Browse the page to look at the available information such as
title, author information, date the structure information was
deposited in the database, experimental method, molecule, source
and related structures.
4. Click on Display Files in the left panel and then click on
PDB File to open the file.
Take a look at the contents of a typical PDB file. The left
column identifies the type of information in the right column. PDB
files contain a Header, Title, Compound information (protein name,
source, experimental data gathering technique), author, journal,
remarks, etc. The last part is the list of 3-D coordinates for each
atom in the protein and for related heteroatoms, such as those from
water.
5. Save the file in your folder on your desktop. You can save
the file from this page by using Save As, or you can go back and use
the Download Files feature on the page for 2b9f. Note the location
of the file on your desktop.
Download the PDB file for Erk2 using the steps above. You will
find only one structure that is not in a complex: Structure of
Signal-Regulated Kinase (1erk) from Rat. This structure is fine for
our purpose.
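The coordinate section of a PDB file is laid out in fixed columns, so a program can read it by cutting each line at known positions. The sketch below pulls out the alpha-carbon (CA) atoms, which are the atoms used later for the structural fit. The four sample lines are invented for illustration and are not copied from 2b9f or 1erk, and the function name ca_coordinates is made up for this example.

```python
def ca_coordinates(pdb_lines):
    """Return (residue name, residue number, (x, y, z)) for each CA atom.

    Uses the fixed column positions of ATOM records in the PDB format:
    atom name in columns 13-16, residue name in 18-20, residue number
    in 23-26, and x, y, z in columns 31-38, 39-46 and 47-54.
    """
    atoms = []
    for line in pdb_lines:
        # ATOM lines hold protein atoms; HETATM lines (e.g. water) are skipped.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residue = line[17:20]
            number = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((residue, number, xyz))
    return atoms


# Invented sample records in PDB column layout (not real coordinates).
sample = [
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00 20.00",
    "ATOM      2  CA  MET A   1      12.560   6.300  -6.250  1.00 20.00",
    "HETATM    3  O   HOH A 101      20.000  10.000   5.000  1.00 30.00",
    "ATOM      4  CA  GLU A   2      14.000   7.500  -5.000  1.00 20.00",
]

for atom in ca_coordinates(sample):
    print(atom)
```

To try it on a real file, read the lines of the 2b9f.pdb file you saved (for example with open("2b9f.pdb").readlines()) and pass them to the same function.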
Looking at the 3-D structures of Fus3 and Erk2
Download Swiss-PDB Viewer (DeepView) from http://spdbv.vital-it.ch/
and extract it into your folder on the desktop. The download link is
in the left panel of the site. Once extracted, the viewer is ready
for use.
1. Double-click on spdbv application to open the viewer.
2. Select the File menu and Open PDB File to open the pdb file
for Fus3 (2b9f.pdb) from your folder on the desktop.
Take a look at the 3-D structure of Fus3. From the Display menu,
select Render in 3D and Render in Solid 3D. Try different color
options under the Color menu (suggestion: Color -> act on ribbon AND
Color -> Secondary Structure). Try the features in the Control Panel
(under the Wind menu): compare left clicks vs. right clicks under
different columns. A left-click selects individual amino acid
residues; a right-click selects all. Right-click under the columns
labeled show, side and label to see what happens. I find it easier
to look at the backbone structure without the sidechains.
3. Select the File menu and Open PDB File to open the pdb file
for Erk2 (1erk.pdb) from your folder on the desktop.
Now that you have two structures open, the control panel needs to
know which structure you are working with at the moment. Check or
uncheck the Visible box at the top of the control panel to show or
hide a structure.
Left-click the name 1erk to select the proper structure.
4. Remove sidechains/labels from the view. Render in 3D and
Solid 3D.
5. Select Fit menu -> Magic Fit -> CA only -> Layers 2b9f and 1erk
-> OK.
Watch the structures align in space. Center the structures on
screen using the button under the File menu. Compare the two
structures for similarity.
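How well two overlaid structures agree is commonly summarized as a root-mean-square deviation (RMSD): the typical distance between paired CA atoms after the structures are placed on top of each other. The sketch below is a simplified illustration, not a description of how Swiss-PDB Viewer performs its fit. It only slides each structure so that its center sits at the origin and then measures the RMSD. A real structural fit also has to find the best rotation and decide which residues pair up, and both of those steps are left out here. The coordinates are invented.

```python
import math


def centroid(points):
    """Average position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def centered(points):
    """Shift the points so that their centroid sits at the origin."""
    c = centroid(points)
    return [tuple(p[i] - c[i] for i in range(3)) for p in points]


def rmsd(points_a, points_b):
    """Root-mean-square distance between paired points."""
    if len(points_a) != len(points_b):
        raise ValueError("both structures need the same number of points")
    total = sum(
        sum((a[i] - b[i]) ** 2 for i in range(3))
        for a, b in zip(points_a, points_b)
    )
    return math.sqrt(total / len(points_a))


# Invented CA positions: structure_b is structure_a moved by 5 along each axis.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
structure_b = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (7.0, 5.0, 5.0)]

print("before centering:", rmsd(structure_a, structure_b))
print("after centering: ", rmsd(centered(structure_a), centered(structure_b)))
```

A small RMSD after fitting means the backbones follow nearly the same path in space, which is what you are judging by eye when you compare Fus3 and Erk2 in the viewer.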
Answer for BLAST search:
The Highly similar sequences option does not work for our BLAST
search because yeast genes have relatively few introns compared with
human genes. The introns in the human sequence make the two
sequences dissimilar, and selecting this option makes the search
highly stringent. The More dissimilar sequences option, on the other
hand, allows for discontinuity between the sequences (discontiguous
sequences).
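One ingredient of that stringency can be shown with a toy example. A BLAST nucleotide search begins by looking for short stretches ("words") that match between the query and the database. Megablast looks for long words (28 bases by default), while discontiguous megablast uses much shorter ones (11 by default) and tolerates some mismatches inside them. The sketch below uses scaled-down word sizes of 8 and 4 on two invented sequences that differ at every sixth base. It only tests for exact shared words and does not model the mismatch-tolerant templates of discontiguous megablast. The function name has_shared_word is made up for this example.

```python
def has_shared_word(seq_a, seq_b, word_size):
    """True if the two sequences share at least one exact word of this size."""
    words_a = {seq_a[i:i + word_size] for i in range(len(seq_a) - word_size + 1)}
    return any(
        seq_b[i:i + word_size] in words_a
        for i in range(len(seq_b) - word_size + 1)
    )


# Invented sequences: seq_b is seq_a with a base changed at positions 6, 12 and 18.
seq_a = "ACGTTGCAAGCTTAGGCATC"
seq_b = "ACGTTACAAGCGTAGGCCTC"

print("long words (8): ", has_shared_word(seq_a, seq_b, 8))
print("short words (4):", has_shared_word(seq_a, seq_b, 4))
```

With long words the search never gets a starting point, even though the two sequences are 85% identical. With short words it does, and the alignment can then be extended from there.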
References:
1. Furuno M et al. CDS annotation in full-length cDNA sequence.
Genome Res. June 2003. [PMID: 12819146]
2. RCSB PDB help:
http://www.rcsb.org/robohelp_f/#site_navigation/introduction_to_site_navigation.htm
(or use the Help link on the PDB website).
Kandarp Shah (UCI GK-12), 3/5/2009