Databases II Sucheta Tripathy,
• Biological databases
  ◦ MetaBase (a database of biological databases)
  ◦ http://metadatabase.org/
• Bibliographic databases
• Chemical databases
• Numerous other databases
Types of Databases
• Sequence databases
  ◦ Nucleotide
  ◦ Protein
• Structure databases
• Genome databases
• Transcriptome databases
• Model organism databases
  ◦ PlasmoDB, TAIR, FlyBase, etc.
Biological Databases
http://en.wikipedia.org/wiki/List_of_biological_databases
• NA and protein databases
• Animal and plant databases
• Ensembl Genome Project
• TIGR database
• Biotechnological databases
• Databases for species identification and classification
• Database retrieval and deposition schemes
• Literature search databases
Topics To be taught
Nucleotide Databases
Nucleotide Databases (TIGR)
Founded Celera Genomics to fund shotgun sequencing.
Created a synthetic organism called Mycoplasma laboratorium.
Nucleotide Databases (INSDC)
• GenBank
• DDBJ
• EBI
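Data type             | DDBJ                  | EMBL-EBI                          | NCBI
Next generation reads | Sequence Read Archive | European Nucleotide Archive (ENA) | Sequence Read Archive
Capillary reads       | Trace Archive         | European Nucleotide Archive (ENA) | Trace Archive
Annotated sequences   | DDBJ                  | European Nucleotide Archive (ENA) | GenBank
Samples               | BioSample             | European Nucleotide Archive (ENA) | BioSample
Studies               | BioProject            | European Nucleotide Archive (ENA) | BioProject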
Nucleotide Databases (INSDC)
http://www.insdc.org/documents/feature-table
http://asia.ensembl.org/Help/Movie?id=210
http://ensemblgenomes.org/
Ensembl Genome Project
www.ensembl.org
• GBrowse
• UCSC Genome Browser
• VISTA Browser
• Ensembl browser
• Integrative Genomics Viewer (IGV)
Genome Browsers
• Encyclopedia of Life (www.eol.org)
• Education + EOL (http://education.eol.org)
http://indiabiodiversity.org
Plant and Animal Databases
Trusted comprehensive information on every species on earth.
Has about 2 million pages and each page catalogues a species.
Community driven.
Encyclopedia Of Life
India Biodiversity
17 countries out of XXX contain 70% of biodiversity: “Megadiverse”
• The Barcode of Life (BOLD Systems)
• International Barcode of Life Projects
• V3 has been released; V2 will be maintained until December 2012.
Data Portal; Barcode Cluster; Data Collection
• Animal identification
  ◦ COI (cytochrome c oxidase subunit I)
• Fungi identification
  ◦ ITS (internal transcribed spacer)
• Plant identification
  ◦ rbcL (ribulose bisphosphate carboxylase, large subunit)
  ◦ matK (maturase K)
Barcode Of Life
Barcode of life
http://www.youtube.com/watch?v=ZImiXgU6bCk&feature=related
Data Retrieval and deposition schemes
• GenBank
• Entrez
• CoreNucleotide
• dbEST
• dbGSS
• NCBI E-utilities
Data Retrieval and deposition schemes
• NCBI E-utilities
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=60&datetype=edat&retmax=100&usehistory=y
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=biomol+trna[prop]
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=70000:90000[molecular+weight]
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=structure&id=19923,12120
Ref: http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESearch
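The example URLs above all follow one pattern: a base address, a tool name (esearch, esummary) and a query string of parameters. The sketch below rebuilds them in Python and reads the ID list out of an ESearch XML response. The helper names build_eutils_url and parse_esearch_ids are illustrative and not part of any NCBI library, the https scheme is an assumption, and SAMPLE_RESPONSE is a hand-written stand-in rather than real NCBI output.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base address taken from the URLs above; the https scheme is an assumption.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_eutils_url(tool, **params):
    """Return the URL for an E-utility such as 'esearch' or 'esummary'."""
    # urlencode escapes brackets, colons and commas and turns spaces into '+'.
    return f"{EUTILS_BASE}{tool}.fcgi?{urlencode(params)}"


def parse_esearch_ids(xml_text):
    """Extract the hit count and the list of record IDs from ESearch XML."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count"))
    ids = [node.text for node in root.findall("./IdList/Id")]
    return count, ids


# The same queries as in the URLs above, expressed as parameters.
pubmed_url = build_eutils_url(
    "esearch", db="pubmed", term="cancer", reldate=60,
    datetype="edat", retmax=100, usehistory="y",
)
trna_url = build_eutils_url("esearch", db="nucleotide", term="biomol trna[prop]")
summary_url = build_eutils_url("esummary", db="structure", id="19923,12120")

# Hypothetical response, written by hand only to exercise the parser.
SAMPLE_RESPONSE = """<eSearchResult>
  <Count>2</Count>
  <IdList><Id>111</Id><Id>222</Id></IdList>
</eSearchResult>"""

print(pubmed_url)
print(trna_url)
print(summary_url)
print(parse_esearch_ids(SAMPLE_RESPONSE))
```

To run a real query, the generated URL can be fetched with urllib.request.urlopen and the response text passed to parse_esearch_ids. No network call is made here so that the sketch stays self-contained.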
Data Retrieval and deposition schemes
• BankIt: a single sequence or a simple group of sequences; web-based
• Sequin: simple to complex submissions; < 10,000 sequences
• tbl2asn: template file, sequence file, feature table
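Of the three tbl2asn inputs, the feature table is the one most often written by script. The sketch below produces a minimal tab-delimited, five-column feature table of the kind tbl2asn reads. The function name format_feature_table, the sequence ID seq1, and the gene and product values are all hypothetical, chosen only for illustration, and a real submission needs further qualifiers and checks.

```python
def format_feature_table(seq_id, features):
    """Render features as a tab-delimited five-column feature table.

    `features` is a list of (start, stop, key, qualifiers) tuples, where
    `qualifiers` is a list of (name, value) pairs.
    """
    lines = [f">Feature {seq_id}"]
    for start, stop, key, qualifiers in features:
        # Columns 1-3: start, stop and feature key.
        lines.append(f"{start}\t{stop}\t{key}")
        for name, value in qualifiers:
            # Columns 4-5: qualifier name and value; columns 1-3 stay empty.
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"


# Hypothetical annotation for a 300 bp sequence called "seq1".
example_features = [
    (1, 300, "gene", [("gene", "abcD")]),
    (1, 300, "CDS", [("product", "hypothetical protein")]),
]

table_text = format_feature_table("seq1", example_features)
print(table_text)
```

The text would normally be saved alongside the sequence file and passed to tbl2asn together with the template file.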
• PubMed
  ◦ 22.1 million records
  ◦ eTBLAST
• CABI
• Scopus
• Google Scholar
Bibliographic databases
• Database
• Nucleic Acids Research
• BMC Genomics
• Bioinformatics
• Nature
• Cell
• Plant Cell
Database journals