Page 1: ((abbas bio info soft copy...))

1. PLASMAPPER

AIM: To generate and annotate high-quality circular plasmid maps.

DESCRIPTION: A particular feature of PlasMapper is its capacity to automatically

identify and label the plasmid control sequences found in both eukaryotic and

prokaryotic vectors using its own database of common plasmid sequences and

common plasmid subsequences. PlasMapper is also able to generate plasmid maps of

sufficient quality and resolution that they may be used directly in publications or

presentations. The underlying concept behind PlasMapper is to make plasmid

annotation trivially simple and to make the sharing of plasmid images and plasmid

data as easy as possible for as many computer platforms as possible.

SOURCE: wishart.biology.ualberta.ca/PlasMapper

METHOD:

1. Collect the sequence to be mapped by PlasMapper, in FASTA format, from the NCBI home page.

2. Open the source website: wishart.biology.ualberta.ca/PlasMapper

3. Paste the sequence in FASTA format into the input box on the home page of the website.

4. Keep the default settings and click ‘graphic map’ to get the result.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

PlasMapper uses sequence pattern matching and BLAST alignment to automatically

identify and label common promoters, terminators, cloning sites, restriction sites,

reporter genes, affinity tags, selectable marker genes, replication origins and open

reading frames.

2. RESTRICTION MAPPING USING THE BIOEDIT TOOL

AIM: To perform restriction mapping of the given sequence using the BioEdit tool.

DESCRIPTION:

BioEdit is a biological sequence alignment editor written for Windows 95/98/NT. An
intuitive multiple document interface with many convenient features makes alignment,
manipulation and viewing of sequences relatively quick and easy on a desktop computer.
Several sequence manipulation and analysis options, together with fully automated links
to local and web-based analysis programs, provide an integrated working environment in
which sequences can be viewed, aligned and analyzed from a single application with
simple point-and-click operations.

SOURCE:

http://www.mbio.ncsu.edu/BioEdit/page2.html

METHOD:

1) Collect the sequence to be mapped, in FASTA format, from NCBI.

2) Open the source website: www.mbio.ncsu.edu/BioEdit/page2.html

3) Download the BioEdit tool from the source website.

4) Open the query sequence in the tool.

5) Select the sequence, then edit it as needed and run the restriction mapping by

clicking the restriction mapping option.

6) Save the result page containing the mapped sequence.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Restriction mapping of the given sequence was carried out. The map gives the number of
cuts made by various restriction enzymes, such as BsmI and XcaI, and shows the location
of the restriction sites of each enzyme. This tool is used in recombinant DNA technology
to find the cutting sites of restriction enzymes present in a particular sequence.
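
The idea behind a restriction map can be sketched in a few lines of Python. This is only an illustration and not how BioEdit itself works: the two enzymes (EcoRI and BamHI) and the demo sequence are assumptions chosen because their recognition sites are well known, and they are not the enzymes reported for NM_006272.2.

```python
import re

# Recognition sequences of two well-known enzymes (illustrative choice only,
# not the enzymes BioEdit reported for the practical's sequence).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def find_sites(sequence, enzymes=ENZYMES):
    """Return {enzyme: [1-based start positions of each recognition site]}."""
    seq = sequence.upper()
    hits = {}
    for name, site in enzymes.items():
        # A lookahead is used so that overlapping occurrences are reported too.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={site})", seq)]
    return hits


# Hypothetical demo sequence.
demo = "ttGAATTCaaaGGATCCccGAATTC"
for enzyme, positions in find_sites(demo).items():
    print(enzyme, "cuts:", len(positions), "at", positions)
```

Both sites used here are palindromic, so scanning one strand is enough; for non-palindromic sites the reverse complement would also have to be scanned.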

3. PRIMER DESIGNING

AIM: To design primers for the given query sequence using the ‘PRIMER 3’

primer design tool.

DESCRIPTION:

Primer 3 is a tool used to choose primers for PCR reactions. Primer 3’s design

is heavily based on earlier implementations of similar programs: Primer 0.5 and

Primer v2. Primer 3 can also design hybridization probes and sequencing primers.

SOURCE:

http://biotools.umassmed.edu/bioapps/primer3_www.cgi

METHOD:

1) Collect the sequence for which primers are to be designed, in FASTA format, from the

NCBI home page.

2) Open the source website: biotools.umassmed.edu/bioapps/primer3_www.cgi

3) Paste the sequence in FASTA format into the input box on the home page of the

website.

4) Keep the default settings and click ‘pick primers’ to get the result.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Primers were designed using the Primer 3 tool. A left primer and a right primer were

designed; some other oligos were also reported in the output.
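
Two quantities commonly checked for a candidate primer are its GC content and an approximate melting temperature. The sketch below uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough estimate for short oligos and is not the calculation Primer 3 performs. The example primer is hypothetical.

```python
def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Rough melting temperature in degrees C by the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement, as needed to write a right primer 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical 14-mer, used only to exercise the functions.
primer = "ATGCGTACGTTAGC"
print(gc_content(primer), wallace_tm(primer), reverse_complement(primer))
```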

4. SEQUENCE RETRIEVAL

NCBI

AIM: To retrieve the nucleotide sequence for the given accession number from the NCBI

nucleotide sequence database.

DESCRIPTION:

Methods for determining DNA sequences were first described in 1972. Since then, a wealth of

sequence information has been obtained and deposited in several essential centralized

locations. These generalized databases include:

Genbank

EMBL

DDBJ

Databases and database analysis tools allow a researcher to probe for a desired

sequence. The National Center for Biotechnology Information (NCBI) is part of

the United States National Library of Medicine (NLM), a branch of the National Institutes

of Health. The NCBI has had responsibility for making available the GenBank DNA

sequence database since 1992. GenBank coordinates with individual laboratories and

other sequence databases such as those of the European Molecular Biology Laboratory

(EMBL) and the DNA Database of Japan (DDBJ).

SOURCE : http://www.ncbi.nlm.nih.gov/

METHOD:

1. The NCBI home page was opened using the website address.

2. On the home page, the nucleotide option was selected to retrieve a nucleotide sequence.

3. The accession number, or the name of the gene of interest, was entered in the search

box.

4. The ‘Go’ button next to the search bar was clicked.

5. The page containing the results matching the query was displayed.

6. The required record was opened by clicking on the link provided on the result page.

7. The sequence of interest was selected, copied to Notepad and saved.
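
The same retrieval can also be scripted. The sketch below is an alternative to the manual steps above, assuming the NCBI E-utilities `efetch` service is used; the helper names are hypothetical. Only the URL is built at import time, and `fetch_fasta` needs network access when it is actually called.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build the efetch URL for one accession number."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accession):
    """Download the record as FASTA text (requires an internet connection)."""
    with urlopen(efetch_url(accession)) as response:
        return response.read().decode("utf-8")


print(efetch_url("NM_006272.2"))
```

Calling `fetch_fasta("NM_006272.2")` would return the text that the manual procedure copies into Notepad.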

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

The nucleotide and protein sequences have been retrieved using the NCBI sequence database.

EMBL

AIM: To retrieve the nucleotide sequence for the given accession number from the EMBL

nucleotide sequence database.

DESCRIPTION: The European Molecular Biology Laboratory (EMBL) is a molecular biology

research institution supported by 20 countries comprising nearly all of western

Europe and Israel. The cornerstones of EMBL's mission are: to perform basic research

in molecular biology, to train scientists, students and visitors at all levels, to offer vital

services to scientists in the member states, to develop new instruments and methods in

the life sciences, and to actively engage in technology transfer.

SOURCE: http://www.ebi.ac.uk/embl/

METHOD:

1. The EMBL home page was opened using the website address.

2. The accession number, or the name of the gene of interest, was entered in the search

box.

3. The ‘Go’ button next to the search bar was clicked.

4. The page containing the results matching the query was displayed.

5. The required record was opened by clicking on the link provided on the result page.

6. The sequence of interest was selected, copied to Notepad and saved.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

The nucleotide sequence has been retrieved using the EMBL sequence database.

Swissprot

AIM: To retrieve the protein sequence for the given accession number from the Swiss-Prot protein sequence database.

DESCRIPTION:

Swiss-Prot is a manually curated biological database of protein sequences. Swiss-Prot was
created in 1986 by Amos Bairoch during his PhD and developed by the Swiss Institute of
Bioinformatics and the European Bioinformatics Institute. Swiss-Prot and its automatically
curated supplement TrEMBL have joined with the Protein Information Resource protein
database to produce the UniProt Knowledgebase, the world's most comprehensive catalogue
of information on proteins.[2] As of 3 April 2007, UniProtKB/Swiss-Prot release 52.2
contains 263,525 entries and UniProtKB/TrEMBL release 35.2 contains 4,232,122 entries.

SOURCE: http://www.ebi.ac.uk/swissprot/

METHOD:

1. The PIR home page was opened using the website address.

2. The accession number, or the name of the gene of interest, was entered in the search

box.

3. The ‘Go’ button next to the search bar was clicked.

4. The page containing the results matching the query was displayed.

5. The required record was opened by clicking on the link provided on the result page.

6. The sequence of interest was selected, copied to Notepad and saved.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

The sequence for the given accession number has been retrieved from the Swiss-Prot

protein database.

5. SEQUENCE FORMAT CONVERSION

SQUIZZ

AIM: To convert the given sequence from NCBI (GenBank) format to EMBL format using SQUIZZ as

the format conversion tool.

DESCRIPTION:

The tools available for the analysis of biological data (sequences) require data in different
formats. To change the same data into different formats, so that it is acceptable to different
sequence analysis tools, we need sequence format conversion tools. Different tools are
available on the web.

SQUIZZ allows the verification of a sequence or sequence alignment format and its conversion
into the following formats:

CLUSTAL

EMBL

FASTA

GCG

GDE

GENBANK

NBRF

MSF

PHYLIP

SOURCE: http://bioweb.pasteur.fr/sequenal/interface/squizz.html

METHOD:

1. The home page of the sequence conversion tool was opened by typing “sequence

conversion tool” in the Google search bar.

2. The sequence format conversion hyperlink was then clicked on the page that opened.

3. The SQUIZZ hyperlink was clicked to open the SQUIZZ page.

4. A nucleotide sequence in NCBI format was pasted into the ‘Actual data here’ field.

5. SQUIZZ was run.

6. The sequence was converted into the chosen format using the ‘Convert into format’ option.

7. Results in changed format were obtained and saved to notepad.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

The given nucleotide sequence was converted from GenBank to EMBL format using the SQUIZZ

sequence format conversion tool.

READSEQ

AIM: To convert the given sequence from EMBL format to FASTA format using the READSEQ

format conversion tool.

DESCRIPTION:

The sequence format conversion tool takes a DNA or amino acid sequence in a given format
as input. The input format is detected automatically, and the sequence can be converted
into the following formats:

CLUSTAL

EMBL

FASTA

GCG

GDE

GENBANK

NBRF

MSF

PHYLIP

In the present exercise we converted EMBL format to FASTA format using the READSEQ

conversion tool.
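
Automatic detection of the input format can be illustrated with a small heuristic that looks at the first non-empty line of the file. This is a hypothetical sketch covering only a few of the formats listed above; it is not the detection logic that READSEQ actually uses.

```python
def guess_format(text):
    """Guess a sequence file format from its first non-empty line."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "unknown"
    first = lines[0]
    # NBRF/PIR headers look like ">P1;CODE", so test them before plain FASTA.
    if first.startswith(">") and len(first) > 3 and first[3] == ";":
        return "nbrf"
    if first.startswith(">"):
        return "fasta"
    if first.startswith("LOCUS"):
        return "genbank"
    if first.startswith("ID   "):
        return "embl"
    if first.startswith("CLUSTAL"):
        return "clustal"
    return "unknown"


print(guess_format(">NM_006272.2 Homo sapiens S100 calcium binding protein B\nACGT"))
```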

SOURCE:

http://bioweb.pasteur.fr/sequenal/interface/readseq.cgi

METHOD:

1. The home page of the sequence conversion tool was opened by typing “sequence

conversion tool” in the Google search bar.

2. The sequence format conversion hyperlink was then clicked on the page that opened.

3. The READSEQ hyperlink was clicked to open the READSEQ page.

4. A protein sequence in EMBL format was pasted into the ‘Actual data here’ field.

5. READSEQ was run.

6. The sequence was converted into FASTA format using the ‘Convert into format’ option.

7. The result in the converted format was obtained and saved to Notepad.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

The given protein sequence was converted from EMBL to FASTA format using the READSEQ

sequence format conversion tool.
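
What an EMBL to FASTA conversion involves can be sketched as follows. This is a minimal illustration, not READSEQ's implementation: it reads only the ID, DE and SQ parts of an EMBL flat file, and the demo record is hypothetical.

```python
def embl_to_fasta(embl_text, width=60):
    """Convert one EMBL flat-file record to FASTA text."""
    name, description, residues, in_seq = "", [], [], False
    for line in embl_text.splitlines():
        if line.startswith("//"):          # end-of-record marker
            break
        if in_seq:
            # Sequence lines hold blocks of letters followed by a running count.
            residues.append("".join(ch for ch in line if ch.isalpha()))
        elif line.startswith("ID"):
            name = line[2:].split(";")[0].strip()
        elif line.startswith("DE"):
            description.append(line[2:].strip())
        elif line.startswith("SQ"):
            in_seq = True                  # sequence starts on the next line
    seq = "".join(residues).upper()
    header = ">" + " ".join([name] + description).strip()
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + body) + "\n"


# Hypothetical record used only for illustration.
demo = """ID   DEMO1; SV 1; linear; mRNA; STD; HUM; 12 BP.
DE   Hypothetical demo record
SQ   Sequence 12 BP; 3 A; 3 C; 3 G; 3 T; 0 other;
     acgtacgtac gt                                                            12
//
"""
print(embl_to_fasta(demo), end="")
```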

FMTSEQ

AIM: To convert the given sequence from EMBL format to CLUSTAL format using the FMTSEQ

format conversion tool.

DESCRIPTION:

FMTSEQ converts sequences between 22 sequence format types, including:

CLUSTAL

EMBL

FASTA

GCG

GDE

GENBANK

NBRF

MSF

PHYLIP

SOURCE:

http://evol.biology.mcmaster.ca/seqanal/tmp/fmt.seq/A27358120711907/fmtseq.out

METHOD:

1. The home page of the sequence conversion tool was opened by typing “sequence

conversion tool” in the Google search bar.

2. The sequence format conversion hyperlink was then clicked on the page that opened.

3. The FMTSEQ hyperlink was clicked to open the FMTSEQ page.

4. A nucleotide sequence in EMBL format was pasted into the ‘Actual data here’ field.

5. FMTSEQ was run.

6. The sequence was converted into CLUSTAL format using the ‘Convert into format’ option.

7. The result in the converted format was obtained and saved to Notepad.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

The given nucleotide sequence was converted from EMBL to CLUSTAL format using the FMTSEQ

sequence format conversion tool.

SREFORMAT

AIM: To convert the given sequence from NCBI format to PIR format using SREFORMAT as the

format conversion tool.

DESCRIPTION:

SreFormat allows the user to convert a sequence from one format to another.

It accepts sequences in the following formats:

CLUSTAL

EMBL

FASTA

GCG

GDE

GENBANK

NBRF

MSF

PHYLIP

SOURCE: http://bioweb.pasteur.fr/sequenal/interface/SreFormat.html

METHOD:

1. The home page of the sequence conversion tool was opened by typing “sequence

conversion tool” in the Google search bar.

2. The sequence format conversion hyperlink was then clicked on the page that opened.

3. The SreFormat hyperlink was clicked to open the SreFormat page.

4. A protein sequence in NCBI format was pasted into the ‘Actual data here’ field.

5. SreFormat was run.

6. The sequence was converted into PIR format using the ‘Convert into format’ option.

7. The result in the converted format was obtained and saved to Notepad.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

The given protein sequence was converted from NCBI to PIR format using the SreFormat sequence

format conversion tool.

SMS

AIM: To convert the given sequence in GenBank format to the FASTA format.

DESCRIPTION:

The MVIEW tool is used to convert the given sequence into any required format.
Using this tool, sequences in any of the input formats can be converted into the
output formats listed below. The tool has an input option and an output option.

INPUT OPTION: Pearson/FASTA, MSF (GCG), CLUSTALW, MaxHom/HSSP, Plain, Multa: MULTAS/MULTAL, Mips: MIPS-ALN

OUTPUT OPTION: HTML, GCG/MSF, Pearson/FASTA, PIR, RDB table for storage/manipulation in relational database form

METHODOLOGY:

A. The given sequence in FASTA format is pasted in the box provided.

B. PIR format is selected from the options provided.

C. An e-mail ID is provided when it is required.

D. The tool is run and the result obtained is saved.

INPUT:

Seq name: Rattus norvegicus

Accession number: NM_053814

WEB PAGE

OUTPUT:

INTERPRETATION: The GenBank-formatted query sequence was converted into FASTA format

using the sequence conversion tool SMS.
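
A GenBank to FASTA conversion can be sketched in Python as shown below. This is an illustrative assumption about what such a conversion involves, not the code behind the web tool: it keeps only the VERSION, DEFINITION and ORIGIN sections, and the demo record is hypothetical.

```python
def genbank_to_fasta(gb_text, width=70):
    """Convert one GenBank flat-file record to FASTA text."""
    version, definition, residues, section = "", [], [], None
    for line in gb_text.splitlines():
        if line.startswith("//"):                  # end-of-record marker
            break
        if line and not line[0].isspace():         # a new top-level keyword
            section = line.split()[0]
            value = line[len(section):].strip()
        else:                                      # continuation of the section
            value = line.strip()
        if section == "DEFINITION":
            definition.append(value)
        elif section == "VERSION" and not version and value:
            version = value.split()[0]
        elif section == "ORIGIN":
            # Drop the position numbers and spaces, keep only the bases.
            residues.append("".join(ch for ch in value if ch.isalpha()))
    seq = "".join(residues).upper()
    header = ">" + " ".join([version] + [d for d in definition if d]).strip()
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + body) + "\n"


# Hypothetical record used only for illustration.
demo = """LOCUS       DEMO1                     12 bp    mRNA    linear   PRI 01-JAN-2000
DEFINITION  Hypothetical demo record.
VERSION     DEMO1.1
ORIGIN
        1 acgtacgtac gt
//
"""
print(genbank_to_fasta(demo), end="")
```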

6. ORF FINDER

AIM: To find the open reading frames on the direct and the reverse strands.

DESCRIPTION:

ORF Finder searches for open reading frames (ORFs) in the DNA sequence you

enter. The program returns the range of each ORF, along with its protein translation. Use

ORF Finder to search newly sequenced DNA for potential protein encoding segments. ORF

Finder supports the entire IUPAC alphabet and several genetic codes.

SOURCE:

www.bioinformatics.org/sms2/

INPUT: ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

By using the ORF Finder tool we have found the open reading frame for the Oryza sativa gene of

accession no: EF 183474.
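
The search that an ORF finder performs can be sketched as a scan of all three reading frames on both strands for an ATG followed in frame by a stop codon. This is a minimal sketch assuming the standard genetic code and ATG-only starts; the function names and the demo sequence are hypothetical, and the web tool offers more options than this.

```python
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_codons=2):
    """Return (strand, frame, start, end) tuples for ATG...stop ORFs.

    Positions are 1-based on the strand that was scanned, and the end
    position includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame + 1, start + 1, i + 3))
                    start = None                   # look for the next ORF
    return orfs


# Hypothetical demo sequence: one short ORF (ATG AAA TAG) in frame 3.
print(find_orfs("CCATGAAATAGGG"))
```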

HOMOLOGY SEARCH

The term sequence analysis in biology implies subjecting a DNA or peptide sequence to

sequence alignment, sequence database searches, repeated sequence searches or other bioinformatics

methods on a computer.

In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for

comparing primary biological sequence information, such as the amino-acid sequences

of different proteins or the nucleotides of DNA sequences. A BLAST search enables a

researcher to compare a query sequence with a library or database of sequences, and

identify library sequences that resemble the query sequence above a certain threshold.

The BLAST program can either be downloaded and run as a command-line utility

"blastall" or accessed for free over the web. The BLAST web server, hosted by the

NCBI, allows anyone with a web browser to perform similarity searches against

constantly updated databases of proteins and DNA that include most of the newly

sequenced organisms. BLAST is actually a family of programs (all included in the blastall

executable). The following are some of the programs, ranked mostly in order of importance:

Nucleotide-nucleotide BLAST (blastn): This program, given a DNA query, returns the most

similar DNA sequences from the DNA database that the user specifies.

Protein-protein BLAST (blastp): This program, given a protein query, returns the most

similar protein sequences from the protein database that the user specifies.

Nucleotide 6-frame translation-protein (blastx) :

This program compares the six-frame conceptual translation products of a nucleotide query

sequence (both strands) against a protein sequence database.

Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx) :

This program is the slowest of the BLAST family. It translates the query nucleotide sequence

in all six possible frames and compares it against the six-frame translations of a nucleotide

sequence database. The purpose of tblastx is to find very distant relationships between

nucleotide sequences.

Protein-nucleotide 6-frame translation (tblastn) :

This program compares a protein query against the six-frame translations of a nucleotide

sequence database.
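
The six-frame conceptual translation used by blastx, tblastn and tblastx can be illustrated as follows. This is a sketch assuming the standard genetic code only; it is not BLAST's own code, and the demo sequence is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(dna):
    """Return the translations of frames +1..+3 and -1..-3."""
    dna = dna.upper()
    rc = dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    frames = {}
    for k in range(3):
        frames[f"+{k + 1}"] = translate(dna[k:])
        frames[f"-{k + 1}"] = translate(rc[k:])
    return frames


for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```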

NUCLEOTIDE BLAST

Search a nucleotide database using a nucleotide query

AIM: To search a nucleotide database for sequences similar to a nucleotide query.

DESCRIPTION :

BLAST is one of the most widely used bioinformatics programs, because it addresses a

fundamental problem and the algorithm emphasizes speed over sensitivity.

To run, BLAST requires two inputs: a query sequence (also called the target

sequence) and a sequence database. BLAST will find subsequences in the query that are

similar to subsequences in the database.

Nucleotide-nucleotide BLAST (blastn) :

This program, given a DNA query, returns the most similar DNA sequences from the DNA

database that the user specifies.

METHOD:

1. Go to NCBI home page.

2. Click on Blast.

3. Click on nucleotide blast.

4. Paste a query sequence in FASTA format.

5. Choose nucleotide collection (nr/nt) in the database.

6. Run blast.

7. Select the most similar sequence which has maximum identity percentage and least

‘e’ value.

SOURCE: http://www.ncbi.nlm.nih.gov/blast/Blast.cgi

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

By using nucleotide blast we are able to get nucleotide sequence with maximum similarity.

The accession number for homologous sequence is: AY532754
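
The selection rule used in the method (highest identity percentage and lowest E-value) can be applied automatically when the hits are saved in BLAST's 12-column tabular format. The sketch below ranks by lowest E-value first and breaks ties by highest identity, which is one reasonable reading of that rule. The hit lines are made-up placeholders, not results from this practical.

```python
def best_hit(tabular_text):
    """Pick the best hit from 12-column tab-separated BLAST output.

    Column order: query, subject, % identity, alignment length, mismatches,
    gap opens, q.start, q.end, s.start, s.end, E-value, bit score.
    """
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                              # skip blanks and comments
        cols = line.split("\t")
        hits.append({"subject": cols[1],
                     "identity": float(cols[2]),
                     "evalue": float(cols[10])})
    # Lowest E-value wins; ties are broken by the higher identity.
    return min(hits, key=lambda h: (h["evalue"], -h["identity"]))


# Made-up hits, for illustration only.
demo = "\n".join([
    "query1\thitA\t98.50\t200\t3\t0\t1\t200\t1\t200\t1e-100\t370",
    "query1\thitB\t99.00\t200\t2\t0\t1\t200\t1\t200\t1e-100\t372",
    "query1\thitC\t100.00\t50\t0\t0\t1\t50\t1\t50\t2e-20\t95",
])
print(best_hit(demo)["subject"])
```

Note that hitC has 100% identity but over a short alignment, which is why the E-value is given priority here.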

PROTEIN BLAST

Search a Protein database using a protein query

AIM: To search a protein database for sequences similar to a protein query.

DESCRIPTION :

BLAST is one of the most widely used bioinformatics programs, because it addresses a

fundamental problem and the algorithm emphasizes speed over sensitivity.

To run, BLAST requires two inputs: a query sequence (also called the target

sequence) and a sequence database. BLAST will find subsequences in the query that are

similar to subsequences in the database.

Protein-protein BLAST (blastp):

This program, given a protein query, returns the most similar protein sequences from the

protein database that the user specifies

METHOD:

1. Go to NCBI home page.

2. Click on Blast.

3. Click on protein blast.

4. Paste a query sequence in FASTA format.

5. Choose protein collection (nr) in the database.

6. Run blast.

7. Select the most similar sequence which has maximum identity percentage and least

‘e’ value.

SOURCE:

http://www.ncbi.nlm.nih.gov/blast.cgi#24657901

INPUT: ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

By using protein blast we are able to get the protein sequence with maximum similarity.

The accession number for homologous sequence is: NP_563915

BLASTX

Search a protein database using a translated nucleotide query

AIM: To search a protein database for sequences similar to a translated nucleotide query.

DESCRIPTION :

BLAST is one of the most widely used bioinformatics programs, because it addresses a

fundamental problem and the algorithm emphasizes speed over sensitivity.

To run, BLAST requires two inputs: a query sequence (also called the target

sequence) and a sequence database. BLAST will find subsequences in the query that are

similar to subsequences in the database.

Nucleotide 6-frame translation-protein (blastx)

This program compares the six-frame conceptual translation products of a nucleotide query

sequence (both strands) against a protein sequence database.

METHOD:

1. Go to NCBI home page.

2. Click on Blast.

3. Click on blastx.

4. Paste a query EST sequence in FASTA format.

5. Choose non-redundant protein sequences (nr) in the database.

6. Run blast.

7. Select the most similar sequence which has maximum identity percentage and least

‘e’ value.

SOURCE:

http://www.ncbi.nih.gov/blast/Blast.cgi

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

By using blastx we are able to get protein sequence with maximum similarity.

The accession number for homologous sequence is: AAT40013

tBLASTN

Search a translated nucleotide database using a protein query

AIM: To search a translated nucleotide database for sequences similar to a protein query.

DESCRIPTION :

BLAST is one of the most widely used bioinformatics programs[2], because it

addresses a fundamental problem and the algorithm emphasizes speed over

sensitivity.

Protein-nucleotide 6-frame translation (tblastn)

This program compares a protein query against the six-frame translations of a

nucleotide sequence database.

METHOD:

1. Go to NCBI home page.

2. Click on Blast.

3. Click on tblastn.

4. Paste a query sequence in FASTA format.

5. Choose nucleotide collection (nr/nt) in the database.

6. Run blast.

7. Select the most similar sequence which has maximum identity percentage and

least ‘e’ value.

SOURCE:

http://www.ncbi.nlm.nih.gov/blast/Blast.cgi

INPUT: ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

By using tblastN we are able to get translated nucleotide sequence with maximum similarity.

The accession number for homologous sequence is: NM_1135932

tBLASTX

Search a translated nucleotide database using a translated nucleotide query

AIM: To search a translated nucleotide database for sequences similar to a translated nucleotide query.

DESCRIPTION :

BLAST is one of the most widely used bioinformatics programs[2], because it addresses a fundamental problem and the algorithm emphasizes speed over sensitivity.

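To run, BLAST requires two inputs: a query sequence (also called the target sequence) and a sequence database. BLAST will find subsequences in the query that are similar to subsequences in the database.

Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx)

This program is the slowest of the BLAST family. It translates the query nucleotide sequence in all six possible frames and compares it against the six-frame translations of a nucleotide sequence database. The purpose of tblastx is to find very distant relationships between nucleotide sequences.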

METHOD:

1. Go to NCBI home page.

2. Click on Blast.

3. Click on tblastx.

4. Paste a query sequence in FASTA format.

5. Choose nucleotide collection (nr/nt) in the database.

6. Run blast.

7. Select the most similar sequence which has maximum identity percentage and least

‘e’ value.

SOURCE:

http://www.ncbi.nlm.nih.gov/blast/Blast.cgi

INPUT: ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

By using tblastX we are able to get translated nucleotide sequence with maximum similarity.

The accession number for homologous sequence is: NM_001080098

FASTA

FASTA stands for FAST-All, reflecting the fact that it can be used for a fast
protein comparison or a fast nucleotide comparison. It is a DNA and protein sequence
alignment software package first described (as FASTP) by David J. Lipman and
William R. Pearson in 1985. The program achieves a high level of sensitivity for
similarity searching at high speed. The sensitivity comes from performing optimized
searches for local alignments using a substitution matrix. The high speed is achieved
by using the observed pattern of word hits to identify potential matches before
attempting the more time-consuming optimized search. The trade-off between speed and
sensitivity is controlled by the ktup parameter, which specifies the size of the word.
Increasing the ktup decreases the number of background hits. Not every word hit is
investigated; instead, the program initially looks for segments containing several
nearby hits.
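
The word-hit stage described above can be sketched as follows: every word of length ktup in the query is indexed, the subject is scanned for the same words, and hits are counted per diagonal (query position minus subject position). Diagonals with several hits mark the segments worth examining further. This is a simplified illustration of the idea, not the FASTA program's code, and the sequences are hypothetical.

```python
from collections import Counter, defaultdict


def word_hits_by_diagonal(query, subject, ktup=2):
    """Count exact ktup-word matches on each diagonal (i - j)."""
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)        # word -> query positions
    diagonals = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], ()):
            diagonals[i - j] += 1
    return diagonals


# The subject contains the query shifted by two positions, so all hits
# fall on diagonal -2; a larger ktup gives fewer hits.
print(word_hits_by_diagonal("ACGT", "TTACGT", ktup=2))
print(word_hits_by_diagonal("ACGT", "TTACGT", ktup=3))
```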

General FASTA Programs:

Tool: Description

FASTA-protein: Sequence similarity searching against protein databases using FASTA.

FASTA-nucleotide: Sequence similarity searching against nucleotide databases using FASTA.

FASTA – PROTEIN

AIM: To find protein sequences similar to the given query protein sequence, supplied
in any format, using the FASTA-Protein tool.

DESCRIPTION:

This tool performs sequence similarity searching against protein databases using
FASTA. The FASTA programs provide sequence similarity searching against nucleotide
and protein databases. FASTA can be very specific when identifying long regions of
low similarity, especially for highly diverged sequences. Sequence similarity searching
can also be conducted against proteome or genome databases using the FASTA program.

SOURCE: http://www.ebi.ac.uk/fasta33/

METHOD:

1. Type EBI in Google search (www.ebi.ac.uk/Fasta33).

2. Click on European Bioinformatics Institute.

3. Click on sequence similarity and analysis.

4. Click on FASTA.

5. Click on FASTA Protein.

6. Paste or browse a protein sequence in any format in the sequence

submission box.

7. Click on Run FASTA3.

INPUT-

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Most similar sequence to the query protein sequence was obtained.

FASTA- NUCLEOTIDE

AIM: To find nucleotide sequences similar to the given query nucleotide sequence,
supplied in any format, using the FASTA-Nucleotide tool.

DESCRIPTION:

This tool performs sequence similarity searching against nucleotide databases using
FASTA. The FASTA programs provide sequence similarity searching against nucleotide
and protein databases. FASTA can be very specific when identifying long regions of
low similarity, especially for highly diverged sequences. Sequence similarity searching
can also be conducted against complete proteome or genome databases using the FASTA
program.

SOURCE:

http://www.ebi.ac.uk/fasta33/

METHOD:

1. Type EBI in Google search (www.ebi.ac.uk/Fasta33).

2. Click on European Bioinformatics Institute.

3. Click on sequence similarity and analysis.

4. Click on FASTA.

5. Click on FASTA Nucleotide.

6. Paste or browse a nucleotide sequence in any format in the sequence

submission box.

7. Click on Run FASTA3.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Most similar sequence to the query nucleotide sequence was obtained.

MULTIPLE SEQUENCE ALIGNMENT

Biologists often find a protein with approximately the same sequence in different
species, suggesting that the proteins have a closely related biological function and that
the genes encoding these proteins have come from a common genetic source. If we align these
genes, we find that some are alike and some are almost identical.

As with aligning a pair of sequences, the difficulty of aligning a group of sequences varies
considerably, being much greater as the degree of sequence similarity decreases. When the
amount of sequence variation is great, it is difficult to find an optimal alignment of the
sequences because so many combinations of substitutions, insertions and deletions, each
predicting a different alignment, are possible.

Three commonly used programs for multiple sequence alignment are:

Clustal W

T-Coffee

Multalin

MULTIPLE SEQUENCE ALIGNMENT USING CLUSTAL W

AIM: To align three sequences using Clustal W.

DESCRIPTION:

Clustal W is a more recent version of Clustal, with the W standing for “weighting” to
represent the ability of the program to provide weights to the sequence and program
parameters. The program is designed to provide an adequate alignment of a large number of
more closely related sequences and a reliable indication of the domain structure of the
sequences. Once an alignment has been made, a phylogenetic tree can be made by the
neighbour-joining method.
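
The neighbour-joining method starts from a matrix of pairwise distances between the aligned sequences. The sketch below computes one simple distance, the proportion of differing positions (ignoring columns where either sequence has a gap), from an existing alignment. It covers only this first step and is an illustration, not Clustal W's own distance calculation; the aligned sequences are hypothetical.

```python
def p_distance(a, b):
    """Fraction of differing positions between two aligned sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Return {(name1, name2): distance} for every pair of sequences."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n])
            for m in names for n in names}


# Hypothetical three-sequence alignment.
alignment = {"s1": "ACGT-A", "s2": "ACGTTA", "s3": "ACCT-A"}
matrix = distance_matrix(alignment)
print(matrix[("s1", "s2")], matrix[("s1", "s3")], matrix[("s2", "s3")])
```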

METHOD:

1. Select more than two protein or nucleotide sequence from NCBI in FASTA format.

2. Copy the sequences and save them in FASTA format.

3. Type Clustal W in the Google search bar.

4. Click on multiple sequence alignment - Clustal W.

5. Submit the sequence in enter sequence box.

6. Click on execute multiple alignment.

7. Copy and save the result on notepad.
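
Before submitting, it can help to check that the saved file really holds more than two FASTA records. The following is a minimal sketch, assuming the sequences are kept together as FASTA text; the function name parse_fasta and the example records are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A '>' line starts a new record; the rest of the line is its header.
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical example records, for illustration only.
example = """>seq1 hypothetical example
MALWMRLLPL
LALLALWGPD
>seq2 hypothetical example
MALWTRLLPL
>seq3 hypothetical example
MALLTRLRPL
"""

records = parse_fasta(example)
if len(records) <= 2:
    raise SystemExit("a multiple alignment needs more than two sequences")
print({header: len(seq) for header, seq in records.items()})
```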

SOURCE: http://www.ebi.ac.uk/Tools/clustalw2/index.html

OUTPUT:

RESULTS:

Multiple sequence alignment for three insulin proteins was performed using Clustal W.

MULTIPLE SEQUENCE ALIGNMENT USING T-COFFEE

AIM: To align three sequences using T-Coffee.

DESCRIPTION:

T-Coffee is an advanced multiple sequence alignment program that uses a system of sequence

position weights to generate a multiple sequence alignment that is the most consistent with

the pairwise alignments of all the component sequences (T-Coffee stands for Tree-based

Consistency Objective Function for alignment Evaluation). T-Coffee is better than

Clustal W at reproducing known alignments of related proteins but is much slower.

METHOD:

1. Select more than two protein or nucleotide sequences from NCBI in FASTA format.

2. Copy the sequences and save them in FASTA format.

3. Type T-Coffee in the Google search bar.

4. Click on Multiple Sequence Alignment - T-Coffee.

5. Submit the sequence in enter sequence box.

6. Run the program.

7. Copy and save the result on notepad.

SOURCE: http://www.ch.embnet.org/software/TCoffee.html

INPUT: ACCESSION NO: NM_008083 NM_001081278 NM_012519

OUTPUT:

RESULTS:

Multiple sequence alignment for three insulin proteins was performed using T-Coffee.

MULTIPLE SEQUENCE ALIGNMENT USING MULTALIGN

AIM: To align three sequences using Multalign.

DESCRIPTION:

Multalign performs a simultaneous alignment of two or more DNA or protein sequences.

It introduces a certain number of gaps into the pairwise aligned sequences to find the minimal

global distance. The program is based on a generalization of the algorithm of Waterman,

Smith and Beyer by Krüger and Osterburg.

METHOD:

1. Select more than two protein or nucleotide sequences from NCBI in FASTA format.

2. Copy the sequences and save them in FASTA format.

3. Type Multalign in the Google search bar.

4. Click on Multiple Sequence Alignment - Multalign.

5. Submit the sequence in enter sequence box.

6. Run the program.

7. Copy and save the result on notepad.

SOURCE: http://bioinfo.genopole-toulouse.prd.fr/multalin/multalin.html

INPUT: ACCESSION NO - NM 14646 NM 001122899 NM 010704

OUTPUT:

INTERPRETATION:

Multiple sequence alignment for three insulin proteins was performed using Multalign.

GENE PREDICTION

With the advent of whole genome sequencing projects, it has become routine to scan

genomic DNA sequences to find genes, particularly those that encode proteins. Computational

methods for gene prediction work by searching through sequences to locate the regions most

likely to encode proteins. Predicting protein-encoding genes is generally easier in

prokaryotes than in eukaryotic organisms because prokaryotes generally lack introns and

because several quite highly conserved sequences are found in the promoter region and

around the start sites of transcription and translation.
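
The idea of searching a sequence for likely protein-coding stretches can be illustrated with a naive open reading frame (ORF) scan. This is only a minimal sketch and not the method used by any of the programs described in this section; the function name find_orfs, the min_codons cutoff and the toy sequence are assumptions made for illustration, and only the forward strand is scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


toy = "CCATGAAATTTGGGTAACC"  # hypothetical sequence
for start, end, frame in find_orfs(toy):
    print(frame, start, end, toy[start:end])
```

Because intron-free prokaryotic genes are continuous, a scan of this kind already finds candidate genes there, whereas eukaryotic genes interrupted by introns need the splice-aware methods described in this section.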

Three commonly used programs for gene prediction are:-

Webgene

GeneMark

Genscan

GENE PREDICTION USING WEBGENE

AIM:-To predict the features of a eukaryotic gene using WebGene.

DESCRIPTION:-

WebGene is a collection of web-based tools for the analysis and prediction of

gene structure in eukaryotic DNA sequences. The tools used in this exercise are

Gene builder and Gen view2 (protein coding gene prediction), Repeat view (repeated

element mapping), CpG island (CpG island prediction), Splice view (splicing signal

prediction), HC polyA (polyA signal prediction), HCtata (TATA signal prediction)

and AUG_evaluator (start codon prediction).

SOURCE:-

http://www.itb.cnr.it/sun/webgene/

METHOD:-

1. Sequence of human insulin was retrieved from NCBI and saved in note pad.

2. WebGene was typed in the Google search bar.

3. Webgene home page was opened.

I. Gene builder:-

Gene builder bar was clicked.

Shown parameters on webpage were set.

Sequence was pasted in the given box.

Analysis was run

Results were saved.

II. Repeat view

Repeat View bar was clicked.

Shown parameters on webpage were set.

Sequence was pasted in the given box.

Analysis was run

Results were saved

III. CpG island

CpG bar was clicked.

Shown parameters on webpage were set.

Sequence was pasted in the given box.

Analysis was run

Results were saved.

IV. Splice View

Splice view bar was clicked.

Shown parameters on webpage were set.

Sequence was pasted in the given box.

Analysis was run

Results were saved.

V. HC polyA

HC polyA bar was clicked.

Shown parameters on webpage were set.

Sequence was pasted in the given box.

Analysis was run

Results were saved.

VI . Hctata

HCtata bar was clicked.

Shown parameters on webpage were set.

Sequence was pasted in the given box.

Analysis was run

Results were saved

VII . Gen view2

Gene view2 bar was clicked.

Shown parameters on webpage were set.

Sequence was pasted in the given box.

Analysis was run

Results were saved.

VIII . AUG_evaluator

AUG_evaluator bar was clicked.

Shown parameters on webpage were set.

Sequence was pasted in the given box.

Analysis was run

Results were saved.

Input gene builder

OUTPUT: Gene Builder

OUTPUT: Repeat-View

OUTPUT: CpG island

OUTPUT: Splice View

OUTPUT: HC polyA

OUTPUT: Hctata

OUTPUT: Gen view2

OUTPUT: AUG_evaluator

RESULTS AND INTERPRETATION:

Eight programs of Webgene were run for human insulin gene to predict:

Gene builder- protein coding gene.

Repeat view- repeated element mapping.

CpG island- CpG island.

Splice view- Splicing signal.

HcpolyA- for PolyA.

Hctata- for TATA signal prediction.

Genview- protein coding gene.

AUG_evaluator- start codon.
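
Of the signals listed above, the CpG island is the simplest to check by hand. A commonly quoted rule of thumb (Gardiner-Garden and Frommer) treats a region of at least 200 bp as a CpG island when its G+C content exceeds 50% and its observed/expected CpG ratio exceeds 0.6. The sketch below computes those two quantities; the function name and toy sequences are hypothetical, and the thresholds actually used by the WebGene CpG tool are not stated in this manual.

```python
def cpg_stats(dna):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA string."""
    dna = dna.upper()
    n = len(dna)
    c = dna.count("C")
    g = dna.count("G")
    cpg = dna.count("CG")  # number of CpG dinucleotides
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count is (C count * G count) / length.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


# Toy sequences, far shorter than a real island; for illustration only.
for seq in ("CGCGCGCG", "ATATATAT"):
    print(seq, cpg_stats(seq))
```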

GENE PREDICTION USING GENEMARK

AIM:-To predict the features of a prokaryotic gene using GeneMark.

DESCRIPTION:-

The GeneMark.hmm algorithm was designed to improve

gene prediction quality in terms of finding exact gene boundaries. High gene

finding accuracy has been reported for GeneMark. The program also uses a specially

derived ribosome binding site pattern to refine predictions of translation initiation

codons.

SOURCE:-

http://exon.gatech.edu/Genmark/genmark_prok_gms_plus.cgi

METHOD:-

1. Sequence of prokaryotic gene was retrieved from NCBI and saved in note pad.

2. GeneMark was typed in the Google search bar.

3. The GeneMark home page was opened.

4. Sequence was pasted in box.

5. Analysis was done.

6. Results were saved.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

The GeneMark program was run to predict genes of prokaryotes. The result was saved.

GENE PREDICTION USING GENSCAN

AIM:-To predict the features of a eukaryotic gene using GENSCAN.

DESCRIPTION:-

GENSCAN is an example of an approach to gene prediction which integrates

multiple types of information, including splice signal sensors, compositional properties of

coding and non-coding DNA and in some cases database homology searching, in order to

predict entire gene structures (sets of spliceable exons) in genomic sequences. GENSCAN uses

distinct, explicit, empirically derived sets of model parameters to capture differences in gene

structure and composition between distinct C+G compositional regions (isochores) of the

human genome. It also has the capacity to predict multiple genes in a sequence, to deal with

partial as well as complete genes, and to predict consistent sets of genes occurring on either or

both DNA strands.
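
The C+G composition referred to above can be measured directly from a sequence. The following is a minimal sketch that reports the G+C fraction in consecutive windows; the function name, window size and toy sequence are hypothetical, and it does not reproduce how the program itself assigns a sequence to a compositional class.

```python
def gc_windows(dna, window=10, step=10):
    """Yield (start, G+C fraction) for successive windows along the sequence."""
    dna = dna.upper()
    for start in range(0, len(dna) - window + 1, step):
        chunk = dna[start:start + window]
        gc = sum(base in "GC" for base in chunk) / window
        yield start, gc


toy = "GGGGCCCCAATTAATTAATT"  # hypothetical sequence: GC-rich then AT-rich
for start, gc in gc_windows(toy):
    print(start, gc)
```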

SOURCE :-

http://genome.dkfz-heidelberg.de/cgi-bin/GENSCAN/genscan.call.cgi

METHOD:-

1. Sequence of human insulin was retrieved from NCBI and saved in note pad.

2. GENSCAN was typed in the Google search bar.

3. The GENSCAN home page was opened.

4. Sequence was pasted in box.

5. Analysis was done.

6. Results were saved.

INPUT: ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION: The GENSCAN program was run to predict genes of eukaryotes.

The result was saved.

PATTERNS AND PROFILE SEARCH OF PROTEINS

AIM: To search patterns and profiles of given protein sequences using various ExPASy

tools.

DESCRIPTION:

ExPASy (Expert Protein Analysis System) is a proteomics server of the Swiss

Institute of Bioinformatics (SIB) which analyzes protein sequences and structures and

two-dimensional electrophoresis. The server functions in collaboration with the EBI. ExPASy also

produces the protein sequence knowledge bases UniProt and Swiss-Prot.

For the prediction of patterns and profiles of proteins, ExPASy provides tools like

1. ELM

2. FingerPRINTScan

3. Motif Scan

4. Proscan

5. PRATT

Profiles are numerical representations of a multiple sequence alignment. Profiles help find the

similarities between these sequences and help in the identification and analysis of distantly related

proteins.

Patterns also represent the common characteristics of a protein family but do not contain

any weighting information. Thus, the user can specify what kind of patterns should be

searched for, and how many sequences should match a pattern for it to be reported; there are options

for pattern conservation, restrictions, number of pattern symbols, flexible spacers, etc.
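
To make the idea of a profile as a numerical representation of an alignment concrete, the sketch below builds a simple per-column residue frequency table from a toy alignment and uses it to score a candidate segment. It is a minimal illustration only: the function names and sequences are hypothetical, and real profile tools use log-odds weights and gap penalties rather than raw frequencies.

```python
from collections import Counter


def build_profile(aligned):
    """Return one {residue: frequency} dictionary per alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        profile.append({res: n / len(aligned) for res, n in counts.items()})
    return profile


def score(profile, segment):
    """Sum the column frequencies of each residue in an equal-length segment."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, segment))


toy_alignment = ["ACDE", "ACDF", "GCDE", "ACNE"]  # hypothetical
profile = build_profile(toy_alignment)
print(score(profile, "ACDE"))  # close to the family consensus
print(score(profile, "GCNF"))  # built from the rarer residues
```

A pattern, by contrast, would only record which residues are allowed at each position, without the frequencies.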

Prosite

AIM: To perform profile and pattern search using Prosite tool.

DESCRIPTION: PROSITE consists of documentation entries describing protein

domains, families and functional sites as well as associated patterns and profiles to

identify them.
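
A PROSITE pattern can be read as a regular expression. The sketch below translates the basic pattern syntax (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) for repeats, < and > for the sequence ends) into a Python regular expression and scans a toy sequence with the N-glycosylation pattern N-{P}-[ST]-{P}. The function name and the test sequence are hypothetical, and less common syntax (such as > inside square brackets) is not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")  # patterns are conventionally ended by a period
    parts = []
    for element in pattern.split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # excluded residues
        element = element.replace("(", "{").replace(")", "}")    # repeat counts
        element = element.replace("x", ".")                      # any residue
        element = element.replace("<", "^").replace(">", "$")    # sequence ends
        parts.append(element)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
hits = [(m.start(), m.group()) for m in re.finditer(regex, "AANGSAANPTA")]
print(regex, hits)
```

Note that re.finditer reports non-overlapping matches only, so overlapping sites would need a lookahead-based search.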

SOURCE: http://www.expasy.ch/prosite/

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.

2. The retrieved protein sequence is pasted on the Prosite submission form.

3. Click the scan button

4. The tool Prosite was run and the results viewed by clicking on Rich view and saved.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Using the Prosite tool we are able to predict the secondary structure for

cellulase; AAA23226B. The tool has shown the number of disulphide bridges, active sites and

other details of protein structure.

ELM

AIM: To perform profile and pattern search using ELM tool.

DESCRIPTION:

ELM stands for Eukaryotic Linear Motif search and is a resource for finding functional

sites in proteins. It can find Pfam domain, signal peptide, coiled coil prediction,

transmembrane helix as well as loop, helix and strand prediction.

SOURCE:

http://elm.eu.org/

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.

2. The retrieved protein sequence is pasted on the ELM submission form.

3. The e-mail id was entered.

4. The tool ELM was run and the results viewed and saved.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Using the ELM tool we are able to find the number of helices, strands and loops which are present in

the secondary structure of chitinase, Accession No. AAA32461.

FingerPRINTScan

AIM: To perform profile and pattern search using FingerPRINTScan tool.

DESCRIPTION:

The FingerPRINTScan tool scans a protein sequence against the PRINTS protein

fingerprint database. It reports the number of motifs matched to the query sequence, and their length

and position.

SOURCE:

http://www.bioinf.man.ac.uk/fingerPRINTScan/

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.

2. The retrieved protein sequence is pasted on the FingerPRINTScan submission

form.

3. The e-mail id was entered.

4. The tool FingerPRINTScan was run and the results viewed and saved.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INFERENCE:

The FingerPRINTScan tool was used to find the number of motifs and their positions in the sequence.

Motif Scan

AIM: To perform profile and pattern search using Motif Scan tool.

DESCRIPTION:

Motif or family comparisons are more sensitive because motifs represent a higher

level generalization of the features that are important for a given structural or functional

feature. This tool scans a sequence against protein profile databases [including PROSITE].

SOURCE:

http://mybits.icb.sib.ch/cgi-bin/motif-scan

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.

2. The retrieved protein sequence is pasted on the Motif Scan submission form.

3. The e-mail id was entered.

4. The tool Motif Scan was run and the results viewed and saved.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Using Motif Scan we are able to find the number of helices, strands and loops which are present in the secondary

structure of chitinase, Accession No. AAA32461.

PROSCAN

AIM: To perform profile and pattern search using PROSCAN tool.

DESCRIPTION:

This tool, developed and run by PBIL at the University of Lyon, France, scans a sequence

against PROSITE and allows mismatches as well. It can give information regarding

phosphorylation, amidation or any other specific characteristic of the given sequence.

SOURCE:

http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_prosite.html

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.

2. The retrieved protein sequence is pasted on the PROSCAN submission form.

3. The e-mail id was entered.

4. The tool PROSCAN was run and the results viewed and saved.

INPUT:

ACCESSION NO –IH4P B

CHAIN B , CRYSTAL PROTEIN

OUTPUT:

INTERPRETATION:

Using the PROSCAN tool, the functional sites of a protein sequence can be found. The results are viewed and saved.

VISUALIZATION OF PROTEIN STRUCTURE BY USING RASMOL

AIM: To visualize the structure of a protein using the visualization tool RasMol.

DESCRIPTION:

RasMol 2 is a molecular graphics program intended for the visualization of proteins,

nucleic acids and small molecules. The program is aimed at display, teaching and generation

of publication quality images. RasMol runs on Microsoft Windows, Apple Macintosh, UNIX

and VMS systems. The UNIX and VMS systems require an 8, 24 or 32 bit colour X Windows

display (X11R4 or later). The program reads in a molecule co-ordinate file and interactively

displays the molecule on the screen in a variety of colour schemes and molecular

representations. Currently available representations include depth-cued wireframes, ‘Dreiding’

sticks, spacefilling (CPK) spheres, ball and stick, solid and strand biomolecular ribbons, atom

labels and dot surfaces.

SOURCES:

1. http://wbiomed.curtin.edu.au/teach/biochem/help/download.html

2. http://mc2.cchem.berkeley.edu/rasmol/v2.6/

protein structure (.pdb) http://www.pdb.org/pdb/home/home.do

METHOD:

1. The NCBI website is opened.

2. The given accession no. is entered and searched. The nucleotide sequence is

obtained from the CoreNucleotide database.

3. The PDB ID for the given sequence is collected from the CDS section of the

sequence record, e.g. 2MM1.

4. The PDB website is opened.

5. The PDB ID is entered and searched.

6. The .pdb.gz file is downloaded from the options on the left of the page.

7. The .pdb file is extracted from the .pdb.gz file.

8. The .pdb file is opened using RasMol.

9. The structure is viewed with different Display options like Wireframe, Backbone,

Sticks, Spacefill, Ball & Stick, Ribbons, Strands and Cartoons that are available in

RasMol.

10. In the RasMol command line, commands like “select helix” and

“colour yellow” are used to view the helix structure in the molecule.

11. Several other commands can also be used, like “set picking distance”, “set picking

angle”, “set picking torsion”, etc.
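
The extraction in step 7 and the counts asked for in the interpretation can also be obtained outside RasMol. The sketch below is a minimal illustration that decompresses gzip data in memory and counts atoms, water (HOH) molecules and HELIX/SHEET records from the fixed-column PDB text. The function name and the toy records (including their coordinates) are hypothetical, and files with several models would be counted once per model.

```python
import gzip


def summarize_pdb(pdb_text):
    """Count atoms, water molecules and HELIX/SHEET records in PDB-format text."""
    summary = {"atoms": 0, "waters": 0, "helices": 0, "sheet_strands": 0}
    for line in pdb_text.splitlines():
        record = line[:6].strip()       # record name occupies columns 1-6
        if record in ("ATOM", "HETATM"):
            if line[17:20] == "HOH":    # residue name occupies columns 18-20
                summary["waters"] += 1
            else:
                summary["atoms"] += 1
        elif record == "HELIX":
            summary["helices"] += 1
        elif record == "SHEET":
            summary["sheet_strands"] += 1
    return summary


# Toy records with made-up coordinates, for illustration only.
toy_pdb = "\n".join([
    "HELIX    1   1 SER A    3  GLU A   18  1",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C",
    "HETATM  101  O   HOH A 201      10.000  10.000  10.000  1.00 20.00           O",
])

# Stand-in for a downloaded .pdb.gz file: compress, then extract as in step 7.
compressed = gzip.compress(toy_pdb.encode("ascii"))
extracted = gzip.decompress(compressed).decode("ascii")
print(summarize_pdb(extracted))
```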

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

1. This protein has total of …atoms.

2. This protein has ..helix structure with …atoms.

3. This protein has no sheets or loops…

4. This protein has ..HOH molecules.

Picture with ligands

SECONDARY STRUCTURE PREDICTION

AIM: Secondary structure prediction of the given protein sequences using Expasy tools.

DESCRIPTION:

ExPASy (Expert Protein Analysis System) is a proteomics server of the Swiss

Institute of Bioinformatics (SIB) which analyzes protein sequences and structures and

two-dimensional electrophoresis. The server functions in collaboration with the EBI. ExPASy also

produces the protein sequence knowledge bases UniProt and Swiss-Prot.

For the prediction of secondary structure of proteins, ExPASy provides tools like

1. GOR

2. HNN

3. SOPMA

4. JPred

GOR

AIM: To predict secondary structure of a given protein using GOR tool from Expasy.

DESCRIPTION:

GOR predicts the secondary structure of a given amino acid by looking at a window

of 8 amino acids before and 8 after the position of interest. This program (named after

Garnier, Osguthorpe and Robson) is in its fourth version.
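
The window described above (8 residues before and 8 after the position of interest, 17 in total) can be sketched as follows. This only shows how the windows are cut out, not how GOR scores them; the function name and the padding character used at the sequence ends are assumptions made for illustration.

```python
def residue_windows(sequence, flank=8, pad="-"):
    """Yield (position, window) with `flank` residues on each side of each position."""
    # Pad both ends so that every residue gets a full-length window.
    padded = pad * flank + sequence + pad * flank
    for i in range(len(sequence)):
        yield i, padded[i:i + 2 * flank + 1]


toy = "MSELEKAMVALIDVFHQYS"  # hypothetical fragment
for position, window in residue_windows(toy):
    print(position, toy[position], window)
```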

SOURCE:

http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.

2. The retrieved protein sequence is pasted on the GOR4 submission form.

3. The e-mail id was entered.

4. The tool GOR4 was run and the results viewed and saved.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Using the GOR tool we are able to predict the secondary structure for chitinase, Accession no.

IH4P B. The tool has shown the number of helices, alpha helices and beta bridges and other

details of protein structure.

HNN

AIM: To predict secondary structure of a given protein using HNN tool from Expasy.

DESCRIPTION:

Hierarchical Neural Networks can be used to predict protein structure. The protein

sequence is translated into patterns by shifting a window of n adjacent residues (typical value

of n = 13-21) through the protein.

SOURCE:

http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_nn.html

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.

2. The retrieved protein sequence is pasted on the HNN submission form.

3. The e-mail id was entered.

4. The tool HNN was run and the results viewed and saved.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Using the HNN tool we are able to predict the secondary structure for chitinase, Accession

number JH4P B.

The tool has predicted the number of helices, alpha helices and beta bridges and other details of

protein structure.

SOPMA

AIM: To predict secondary structure of a given protein using SOPMA tool from Expasy.

DESCRIPTION:

SOPMA (Self-Optimized Prediction Method with Alignment) is a secondary structure

prediction program that uses multiple alignments. SOPMA correctly predicts 69.5% of amino acids for

a three-state description of the secondary structure (alpha helix, beta sheet and coil).
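
An accuracy figure of this kind is the percentage of residues whose predicted state matches the observed one. The sketch below shows that calculation for three-state strings (H for helix, E for sheet, C for coil); the function name and the toy strings are hypothetical and are not taken from SOPMA output.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C) matches the observed state."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("state strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


predicted = "HHHHEEEECC"  # hypothetical prediction
observed = "HHHCEEECCC"   # hypothetical reference assignment
print(q3_accuracy(predicted, observed))
```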

SOURCE:

http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.

2. The retrieved protein sequence is pasted on the SOPMA submission form.

3. The e-mail id was entered.

4. The tool SOPMA was run and the results viewed and saved.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Using the SOPMA tool we are able to predict the secondary structure for chain B, Accession

number JH4P B.

The tool has predicted the number of helices, alpha helices and beta bridges and other details of

protein structure.

JPred

AIM: To predict secondary structure of a given protein using JPred tool from Expasy.

DESCRIPTION:

JPred is a consensus method for predicting the secondary structure of proteins, provided by the University of Dundee.

SOURCE:

http://www.compbio.dundee.ac.uk/~www-jpred/

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.

2. The retrieved protein sequence is pasted on the JPred submission form.

3. The e-mail id was entered.

4. The tool JPred was run and the results viewed and saved.

INPUT:

ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Using the JPred tool we are able to predict the secondary structure for chitinase, Accession

number AA32461.

TO CONSTRUCT THE PHYLOGENETIC RELATIONSHIP

BETWEEN DIFFERENT ORGANISMS

AIM: To draw the phylogenetic tree of the given sequences using the software Phylodraw.

DESCRIPTION:

The sequences whose phylogenetic relationship is to be determined are retrieved from NCBI by

keyword search or by accession number. The tool Phylodraw, available on the net, is used

for drawing the phylogenetic tree. The input format is Dialign, which is obtained by doing a

multiple sequence alignment using the Dialign tool.

Phylodraw and Dialign are the tools used for drawing this phylogenetic tree.

SOURCES:

Dialign: http://bibiserv.techfak.uni-bielefeld.de/dialign/submission.html

Phylodraw: http://pearl.cs.pusan.ac.kr/phylodraw/

NCBI: www.ncbi.nlm.nih.gov

METHOD :

1. The sequences with the following accession numbers are retrieved from the NCBI

biological database.

2. The sequences are used as the input in the Dialign tool for multiple sequence alignment.

3. The output of Dialign is used as the input in the Phylodraw tool.

4. Phylodraw is the tool used to draw phylogenetic trees. It has the following types of trees.

a. Unrooted tree

b. Rooted tree

c. Radial tree

d. Slanted cladogram

e. Rectangular cladogram

f. Phylogram.

5. The results are displayed in Radial tree, Slanted cladogram, Rectangular cladogram and

Phylogram tree formats.

INPUT:ACCESSION NO –NM_006272.2

Homo sapiens S100 calcium binding protein B

PHYLODRAW INPUT:

OUTPUT:

The Phylodraw tool is used to draw the phylogenetic tree of genetically related species. It

can display the trees in various formats.

The tree formats that are displayed are:

a. Radial tree

b. Slanted cladogram

c. Rectangular cladogram

d. Phylogram

INTERPRETATION:

The phylogenetic tree for the sequences has been drawn using the Phylodraw tool with

the result of Dialign as the input.
