
Pharmacoinformatics Database basics(sree)

Jan 23, 2018


Sreekanth Gali
Transcript
Page 1: Pharmacoinformatics Database basics(sree)

Pharmacoinformatics

Page 2: Pharmacoinformatics Database basics(sree)

Pharmacoinformatics is an emerging field that draws upon both bioinformatics and cheminformatics.

The scientific or research aspect deals with the use of technology in drug discovery, while the service aspect deals with monitoring patients who are taking a drug.

The scope for jobs is essentially with companies involved in drug research and clinical research.

The National Institute of Pharmaceutical Education and Research (NIPER) in Punjab appears to offer the only structured course in this area at the postgraduate and Ph.D. level.

The Bioinformatics Institute of India in NOIDA, Uttar Pradesh also claims to offer a Ph.D. in this area.

Because this is an emerging field, placements are not clear, and companies would probably view pharmacoinformatics on par with cheminformatics and bioinformatics.

Most pharma and biotech companies are adopting a wait-and-watch policy and don't have a full-fledged department. IBM, Sun Microsystems and Oracle are significant players in the biosystems domain.

Page 3: Pharmacoinformatics Database basics(sree)

Pharmacoinformatics

Agenda:

• Database Design
• Information Management
• Drug Information Services

Page 4: Pharmacoinformatics Database basics(sree)

Database Design:

• Structure of databases
• Sequence databases
• Relational databases
• Sequence analysis
• Software resources
• Sequence alignment
• Database searches
• Phylogenetic analysis

Page 5: Pharmacoinformatics Database basics(sree)

Fundamentals of Database Design

Page 6: Pharmacoinformatics Database basics(sree)

Agenda

• Introduction and participants' needs
• We will review "what is a database"
• Understand the difference between data and information
• What is the purpose of a database system
• How to select a database system
• Database definitions and fundamental building blocks

Page 7: Pharmacoinformatics Database basics(sree)

Agenda (2)

• Database development: the first steps
• Quality control issues
• Data entry considerations

Page 8: Pharmacoinformatics Database basics(sree)

What is a database

A database is any organized collection of data. Some examples of databases you may encounter in your daily life are:
– a telephone book
– T.V. Guide
– airline reservation system
– motor vehicle registration records
– papers in your filing cabinet
– files on your computer hard drive
– banking

Page 9: Pharmacoinformatics Database basics(sree)

Data vs. information: What is the difference?

What is data?
– Data can be defined in many ways. Information science defines data as unprocessed information.

What is information?
– Information is data that have been organized and communicated in a coherent and meaningful manner.
– Data is converted into information, and information is converted into knowledge.
– Knowledge: information evaluated and organized so that it can be used purposefully.

Page 10: Pharmacoinformatics Database basics(sree)

Why do we need a database?

• Keep records of our:
– Clients
– Staff
– Volunteers
• Keep a record of activities and interventions
• Keep sales records
• Develop reports
• Perform research
• Longitudinal tracking

Page 11: Pharmacoinformatics Database basics(sree)

What is the ultimate purpose of a database management system?

It is to transform:

Data -> Information -> Knowledge -> Action

Page 12: Pharmacoinformatics Database basics(sree)

More about database definition

What is a database? Quite simply, it's an organized collection of data. A database management system (DBMS) such as Access, FileMaker, Lotus Notes, Oracle or SQL Server provides you with the software tools you need to organize that data in a flexible manner. It includes tools to add, modify or delete data from the database, ask questions (or queries) about the data stored in the database, and produce reports summarizing selected contents.
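The add, modify, delete, query and report operations described above can be sketched with SQLite through Python's standard `sqlite3` module. This is only an illustration: SQLite is not one of the products named on the slide, and the `contact` table, its columns and the sample rows are hypothetical, loosely modelled on the telephone-book example.

```python
import sqlite3

# In-memory database; table name, columns and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")

# Add data.
conn.executemany(
    "INSERT INTO contact (name, phone) VALUES (?, ?)",
    [("Asha", "555-0101"), ("Ravi", "555-0102"), ("Meena", "555-0103")],
)

# Modify data.
conn.execute("UPDATE contact SET phone = ? WHERE name = ?", ("555-0199", "Asha"))

# Delete data.
conn.execute("DELETE FROM contact WHERE name = ?", ("Ravi",))

# Ask a question (query) about the stored data.
rows = conn.execute("SELECT name, phone FROM contact ORDER BY name").fetchall()

# Produce a small report summarizing the selected contents.
report = "\n".join(f"{name}: {phone}" for name, phone in rows)
print(report)
```

The parameter placeholders (`?`) keep the data separate from the SQL text, which is the usual way to pass values with `sqlite3`.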

Page 13: Pharmacoinformatics Database basics(sree)

For example: Databases in Bioinformatics

• Outlook contacts
• Aspira Association MIS
• KidTrax
• GIS-GPS systems

Page 14: Pharmacoinformatics Database basics(sree)

Example: 2

Page 15: Pharmacoinformatics Database basics(sree)

What is a database?

A collection of...
– structured
– searchable (index) -> table of contents
– updated periodically (release) -> new edition
– cross-referenced (hyperlinks) -> links with other db
...data

Includes also associated tools (software) necessary for db access, db updating, db information insertion, db information deletion...

Page 16: Pharmacoinformatics Database basics(sree)

Types of Databases

Non-relational databases
Non-relational databases place information in field categories that we create so that information is available for sorting and disseminating the way we need it. The data in a non-relational database, however, is limited to that program and cannot be extracted and applied to a number of other software programs, or other database files within a school or administrative system. The data can only be "copied and pasted."
Example: a spreadsheet

Relational databases
In relational databases, fields can be used in a number of ways (and can be of variable length), provided that they are linked in tables. A relational database is developed based on a database model that provides for logical connections among files (known as tables) by including identifying data from one table in another table.
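As a minimal sketch of "including identifying data from one table in another table", the example below links two tables through a shared key and joins them. It assumes SQLite via Python's `sqlite3` module; the `client` and `activity` tables and their rows are hypothetical, chosen to echo the clients and activities mentioned earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# activity.client_id carries the identifying value from client.client_id,
# which is the logical connection between the two tables.
conn.executescript("""
CREATE TABLE client (
    client_id INTEGER PRIMARY KEY,
    name      TEXT
);
CREATE TABLE activity (
    activity_id INTEGER PRIMARY KEY,
    client_id   INTEGER REFERENCES client(client_id),
    description TEXT
);
INSERT INTO client VALUES (1, 'Client A');
INSERT INTO client VALUES (2, 'Client B');
INSERT INTO activity VALUES (10, 1, 'intake interview');
INSERT INTO activity VALUES (11, 1, 'follow-up');
INSERT INTO activity VALUES (12, 2, 'intake interview');
""")

# Join the tables on the shared key to combine data from both.
joined = conn.execute("""
    SELECT client.name, activity.description
    FROM activity
    JOIN client ON activity.client_id = client.client_id
    ORDER BY activity.activity_id
""").fetchall()

for name, description in joined:
    print(name, "-", description)
```

Each client name is stored once, in `client`; the `activity` rows only repeat the key. That is the practical difference from a spreadsheet, where the name would be retyped on every row.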

Page 17: Pharmacoinformatics Database basics(sree)

Data structure

In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.

Data structures are used in almost every program or software system.

Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks. For example, B-trees are particularly well-suited for implementation of databases, while compiler implementations usually use hash tables to look up identifiers.

Principle:
• Data structures are generally based on the ability of a computer to fetch and store data at any place in its memory, specified by an address: a bit string that can itself be stored in memory and manipulated by the program.
• The implementation of a data structure usually requires writing a set of procedures that create and manipulate instances of that structure.

Page 18: Pharmacoinformatics Database basics(sree)

Common data structures

• Array: an array is a systematic arrangement of objects, usually in rows and columns.
• Linked list: a linked list (or more clearly, "singly-linked list") is a data structure that consists of a sequence of nodes, each of which contains a reference (i.e., a link) to the next node in the sequence.
• Hash table: a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., their telephone number).
• Heap: a heap is a specialized tree-based data structure that satisfies the heap property.
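The four structures listed above can be tried out in a few lines of Python. This is a sketch, not a reference implementation: the `Node` class and `linked_list_values` helper are hypothetical names written for this example, Python's built-in `dict` stands in for a hash table, and the standard `heapq` module provides a binary min-heap on top of a list. All sample values are made up.

```python
import heapq
from dataclasses import dataclass
from typing import Optional

# Array: objects arranged in rows and columns (here a list of lists).
grid = [[1, 2, 3],
        [4, 5, 6]]
cell = grid[1][2]  # row 1, column 2


@dataclass
class Node:
    """One node of a singly-linked list: a value plus a link to the next node."""
    value: str
    next: Optional["Node"] = None


def linked_list_values(head):
    """Follow the links from head to the end and collect the values."""
    values = []
    node = head
    while node is not None:
        values.append(node.value)
        node = node.next
    return values


head = Node("A", Node("T", Node("G")))
bases = linked_list_values(head)

# Hash table: keys (names) mapped to values (telephone numbers).
phone_book = {"Asha": "555-0101", "Ravi": "555-0102"}
ravi_number = phone_book["Ravi"]

# Heap: the smallest item is always the next one popped (min-heap property).
heap = []
for priority in [5, 1, 4, 2]:
    heapq.heappush(heap, priority)
smallest_first = [heapq.heappop(heap) for _ in range(len(heap))]
```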

Page 19: Pharmacoinformatics Database basics(sree)

• B-tree: a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic amortized time.
• Red-black tree: a type of self-balancing binary search tree, a data structure used in computing science, typically used to implement associative arrays. It organizes pieces of comparable data, such as text fragments or numbers.
• Trie: a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings.
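To make the trie description concrete, here is a small sketch of a prefix tree used as an associative array with string keys. The class and method names (`TrieNode`, `Trie`, `insert`, `get`, `keys_with_prefix`) are hypothetical, and the integer values stored under each key are arbitrary placeholders.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> child node
        self.is_key = False  # True if a stored key ends at this node
        self.value = None


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):
        """Store value under key, creating one node per character as needed."""
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True
        node.value = value

    def _find(self, key):
        """Return the node reached by following key, or None if the path is missing."""
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def get(self, key, default=None):
        node = self._find(key)
        return node.value if node is not None and node.is_key else default

    def keys_with_prefix(self, prefix):
        """Return all stored keys that start with prefix, in sorted order."""
        start = self._find(prefix)
        if start is None:
            return []
        found = []
        stack = [(prefix, start)]
        while stack:
            key, node = stack.pop()
            if node.is_key:
                found.append(key)
            for ch, child in node.children.items():
                stack.append((key + ch, child))
        return sorted(found)


trie = Trie()
for number, name in enumerate(["PDBe", "PDBsum", "Pfam"], start=1):
    trie.insert(name, number)
```

Keys that share a prefix share the nodes for that prefix, so `trie.keys_with_prefix("PDB")` only walks the subtree below "PDB". Note that "PDB" itself was never inserted, so `trie.get("PDB")` returns the default.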

Page 20: Pharmacoinformatics Database basics(sree)

Language support:

• Most assembly languages and some low-level languages (e.g., BCPL) generally lack support for data structures.
• Many high-level programming languages, and some higher-level assembly languages (e.g., MASM), on the other hand, have special syntax or other built-in support for certain data structures.
• Most programming languages are also supported by standard libraries that implement the most common data structures, e.g., the C++ Standard Template Library, the Java Collections Framework, and Microsoft's .NET Framework.

Page 21: Pharmacoinformatics Database basics(sree)

Sequence database:

In the field of bioinformatics, a sequence database is a large collection of computerized ("digital") nucleic acid sequences, protein sequences, or other sequences stored on a computer. A database can include sequences from only one organism (e.g., a database for all proteins in Saccharomyces cerevisiae), or it can include sequences from all organisms whose DNA has been sequenced.

Ex: Protein structure database. In biology, a protein structure database is a database that is modeled around the various experimentally determined protein structures. The aim of most protein structure databases is to organize and annotate the protein structures, providing the biological community access to the experimental data in a useful way.

Page 22: Pharmacoinformatics Database basics(sree)

Examples of protein structure databases include (in alphabetical order):

• Database of Macromolecular Movements: describes the motions that occur in proteins and other macromolecules, particularly using movies.
• JenaLib: the Jena Library of Biological Macromolecules is aimed at a better dissemination of information on three-dimensional biopolymer structures with an emphasis on visualization and analysis.
• MODBASE: a database of three-dimensional protein models calculated by comparative modeling.
• PDBe: the European resource for the collection, organisation and dissemination of data on biological macromolecular structures, and a member of the Worldwide Protein Data Bank.
• OCA: a browser-database for protein structure/function. OCA integrates information from KEGG, OMIM, PDBselect, Pfam, PubMed, SCOP, SwissProt, and others.
• OPM: provides spatial positions of protein three-dimensional structures with respect to the lipid bilayer.
• PDB Lite: derived from OCA, PDB Lite was provided to make it as easy as possible to find and view a macromolecule within the PDB.
• PDBsum: provides an overview of macromolecular structures in the PDB, giving schematic diagrams of the molecules in each structure and of the interactions between them.
• PDBTM: the Protein Data Bank of Transmembrane Proteins, a selection of the PDB.
• PDBWiki: a community annotated knowledge base of biological molecular structures.
• Protein: the NIH protein database, a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from SwissProt, PIR, PRF, and PDB.
• Proteopedia: the collaborative, 3D encyclopedia of proteins and other molecules. A wiki that contains a page for every entry in the PDB (>50,000 pages), with a Jmol view that highlights functional sites and ligands. Offers an easy-to-use scene-authoring tool so you don't have to learn Jmol script language to create customized molecular scenes. Custom scenes are easily attached to "green links" in descriptive text that display those scenes in Jmol.
• SCOP: the Structural Classification of Proteins, a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.
• SWISS-MODEL Repository: a database of annotated protein models calculated by homology modeling.
• TOPSAN: the Open Protein Structure Annotation Network, a wiki designed to collect, share and distribute information about protein three-dimensional structures.

Retrieved from "http://en.wikipedia.org/wiki/Protein_structure_database"

Page 23: Pharmacoinformatics Database basics(sree)

Sequence analysis

Def: The term "sequence analysis" in biology implies subjecting a DNA or peptide sequence to sequence alignment, sequence databases, repeated sequence searches, or other bioinformatics methods on a computer.

Sequence analysis in molecular biology and bioinformatics is an automated, computer-based examination of characteristic fragments, e.g. of a DNA strand. It basically includes the following topics:
• The comparison of sequences in order to find similarity and dissimilarity in compared sequences (sequence alignment)
• Identification of gene structures, reading frames, distributions of introns and exons, and regulatory elements
• Finding and comparing point mutations or single nucleotide polymorphisms (SNPs) in an organism in order to get a genetic marker
• Revealing the evolution and genetic diversity of organisms
• Function annotation of genes

In chemistry, sequence analysis comprises techniques used to determine the sequence of a polymer formed of several monomers. In molecular biology and genetics, the same process is called simply "sequencing".

In marketing, sequence analysis is often used in analytical customer relationship management applications, such as NPTB models (Next Product to Buy).
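The simplest form of the sequence comparison listed among the molecular biology topics above is a position-by-position check of two sequences that are already aligned and of equal length. The sketch below reports each position where they differ, which is how candidate point mutations would show up. The function name `point_differences` and both sequences are hypothetical, made up for illustration.

```python
def point_differences(reference, sample):
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumes the two sequences are already aligned and have equal length;
    positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Made-up example sequences.
reference = "ATGGCCATT"
sample = "ATGGTCATA"
differences = point_differences(reference, sample)
print(differences)
```

Real sequences usually also differ by insertions and deletions, which is why an alignment step normally comes first.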

Page 24: Pharmacoinformatics Database basics(sree)

Sequence Analysis in Molecular Biology:

• Sequence alignment is a way of arranging DNA, RNA, or protein sequences to identify regions of similarity. It generally falls into two types:
– Pairwise alignment: alignment between two sequences
– Multiple alignment: alignment between more than two sequences
Existing methods for pairwise alignment include the Needleman-Wunsch algorithm, the Smith-Waterman algorithm, and BLAST.
Existing methods for multiple alignment include ClustalW, PROBCONS, MUSCLE, MAFFT, DIALIGN, T-Coffee, POA, and MANGO.
• Motif finding
• Motif prediction

Methodology

The tasks that lie in the space of sequence analysis are often non-trivial to resolve and require the use of relatively complex approaches. Of the many types of methods used in practice, the most popular include:
• Artificial Neural Network
• Hidden Markov Model
• Support Vector Machine
• Clustering
• Bayesian Network
• Regression Analysis
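As an illustration of the pairwise alignment methods named on this slide, the sketch below computes the global alignment score of the Needleman-Wunsch algorithm by dynamic programming. It returns only the score, not the aligned strings. The scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example, not values given in the slides, and the function name is hypothetical.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]


identical = needleman_wunsch_score("ACGT", "ACGT")
one_gap = needleman_wunsch_score("ACGT", "AGT")
```

The table has (len(a) + 1) x (len(b) + 1) cells, so time and memory grow with the product of the sequence lengths. That cost is one reason heuristic tools such as BLAST are used for searches against large databases.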

Page 25: Pharmacoinformatics Database basics(sree)

List of Computational Chemistry Software – Resources

• Bioinformatics Software
• Cheminformatics Software
• LIMS Software
• Computer-Assisted Molecular Modeling Software
• CADD - Biopolymer Modeling Software
• CADD - General Modeling Software
• CADD - Conformational Search Software
• CADD - General Tools
• CADD - Molecular Mechanics/Dynamics Software
• CADD - Quantum Chemistry Software
• CADD - Display Software
• Structural Chemistry Software
• Structural Chemistry Software for X-ray Analysis
• Structural Chemistry Software for IR Analysis
• Structural Chemistry Software for MS Analysis
• Structural Chemistry Software for NMR Analysis
• General Software Tools

Page 26: Pharmacoinformatics Database basics(sree)

Lists of Software for Bioinformatics:

Sequence Databases, e.g.:
• AceDB: genome database.
• BioCyc: databases that provide electronic reference sources on the pathways and genomes of different organisms.
• Biopendium: brings together information on sequence, structure and function relationships for all gene products in the public domain.
• CAMELEON: a set of multiple sequence alignment tools with links to databases of known 3D structural fragments.
• ERGO Light: a curated database of public and proprietary genomic DNA, with connected similarities, functions, pathways, functional models, clusters and more.
• Expasy: the site contains a 2-D gel data database, search engine and links to several gel databases throughout the world.
• GAIA 22: a Chromosome 22 specific version of the GAIA database. GAIA is a data analysis and storage system for genomic sequence and its annotation. As a data analysis engine it accepts raw genomic sequence and automatically adds significant annotation.
• GeneCards: a database of human genes, their products and their involvement in diseases.
• GENESEQ: was a database of protein and nucleic acid sequences extracted from world-wide patent documents.
• GeneWorks: was an integrated sequence analysis and database searching
• ISYS(TM): the National Center for Genome Resources' new product that integrates independent bioinformatic software tools and databases.
• OligoMaster: a multi-user oligonucleotide cataloguing application designed to help biologists manage and organise their oligonucleotide collections, available in versions for Windows, Macintosh and Linux.
• PhyloPat: provides phylogenetic pattern analysis of eukaryotic genes.
• ProteinCenter(™): integrates the contents of a large number of public protein sequence databases and your experimental systems biology data.
• Relibase: a web-based tool for searching and analysing protein-ligand structures in the PDB.

Page 27: Pharmacoinformatics Database basics(sree)

ResNet is a comprehensive database of molecular networks and protein interactions, derived from automatic analysis of the whole of PubMed; the Rosetta Resolver System provides high-capacity data storage, retrieval and analysis of gene expression data. The system is ideal for life science research organizations that need to assess compound specificity or toxicity, identify new genes or therapeutic targets, or compare and analyze large databases of expression profiles; SGD is a scientific database of the molecular biology and genetics of the yeast Saccharomyces cerevisiae, which is commonly known as baker's or budding yeast; SRS is a database integration and biological information search system. It is capable of querying 400 different molecular biology, bibliographic, compound data, genetic and medical databases via a single interface; Software Solution for BioMedicine (SSBM) offers high-speed analysis of both public and proprietary genetic databases within the security of the corporate firewall; Vector NTI is a Macintosh- and Windows-based molecular biology support system.

Pathway Analysis Tools

Structure Prediction and Analysis Tools

Sequence Analysis Tools

Sequence Management Tools

Visualization Tools

Page 28: Pharmacoinformatics Database basics(sree)

Sequence Analysis Tools: Software resources:

AAT - Analysis and Annotation Tool, used to identify genes by comparing cDNA and protein sequence databases.

ABI PRISM; AcaClone pDRAW32; AGCT; AlleleID; Antheprot protein analysis software; Array Designer 4; arraySCOUT™ is a gene expression data analysis application; Artemis is a free genome viewer and annotation tool; Asterias is a suite of freely accessible web-based genomic data analysis programs; Bio Image is a life sciences software information company which carries a wide variety of electrophoresis image analysis software for Windows, Power Mac, and UNIX; BioinformatiX is enterprise software which provides an environment for the analysis of microarray data; BioRainbow Analysis Tools are a collection of software tools for binding site prediction, weight matrix search, regulatory sequence analysis, microarray analysis, footprint; bioSCOUT® is a comprehensive and customizable bioinformatics package; BioTools offers three primary bioinformatics products: GeneTool for DNA sequence analysis, PepTool for protein sequence analysis, and ChromaTool for chromatogram analysis; BlockSearch is a quantitative method for the elucidation of unknown protein functions; Bosque (http://bosque.udec.cl) is a distributed software environment oriented to manage the computational resources involved in typical phylogenetic analyses; Clann: software for investigating phylogenomic information using supertrees; CURVES, by Richard Lavery and Heinz Sklenar, is a very useful nucleic acid helical analysis program; DNADynamo is general purpose software for DNA and protein sequence analysis; DNASIS is a robust sequence analysis software package that delivers industry standard functionality; DNPTrapper is a shotgun sequencing assembly editing tool, specifically designed for finishing and analysis of repeated regions; EuGene and SAm is a menu-based DNA and protein sequence analysis package; Genchek, developed by Ocimum Biosolutions, is a comprehensive, LIMS-based, user-friendly nucleotide and polypeptide sequence analysis tool with a backend relational database; Genehound(™) offers a new, innovative, and exciting approach to identifying coding regions in prokaryotic genomes; GeneInform is an easy-to-operate gene expression management and analysis tool that saves cost and time by facilitating the collection, storage, analysis, and sharing of gene expression data;

Page 29: Pharmacoinformatics Database basics(sree)

Gene Inspector(™) 1.5: a powerful and versatile combination of an electronic laboratory notebook and sequence analysis package for biologists; GeneLinker products are the easiest way for researchers to start analyzing gene expression data; GeneJockey is a program for editing, manipulation, and analysis of nucleic acid and protein sequences; GENEMARK is a gene-finding tool available from the Georgia Institute of Technology that uses an algorithm based on non-homogeneous Markov chain models; GENEPARSER is a coding region recognition program from the University of Colorado that uses potential similarity between query sequence and known amino acid sequences; GeneSifter™ is a Web-based microarray analysis system that combines data management and analytical functions with integrated, current gene annotation from databases such as Unigene and LocusLink; GeneSolve is a single-user desktop software package for analyzing nucleic acid sequence information; GeneStudio Pro from GeneStudio, Inc. (http://www.genestudio.com) is a newly developed suite of molecular biology programs for Windows; GeneWorks is an integrated sequence analysis and database searching package for the Macintosh, previously marketed by Oxford Molecular Group; GenomeBrowser is a powerful software tool that simplifies the process of analysis, annotation, and manipulation of genetic sequences; Genie, from LBNL, is a gene finder based on generalized hidden Markov models to locate multi-exon genes. Etc…

Page 30: Pharmacoinformatics Database basics(sree)

Relational Database

Definition:
– Data stored in tables that are associated by shared attributes (keys).
– Any data element (or entity) can be found in the database through the name of the table, the attribute name, and the value of the primary key.
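The definition can be sketched with Python's built-in sqlite3 module. This is only an illustration: the gene and product tables, their columns and every row are hypothetical names chosen for the sketch, not part of the slides. It shows two tables associated by a shared key (gene_id) and a lookup that needs only a table name, an attribute name and a primary-key value.

```python
import sqlite3

# Hypothetical schema: allowed table -> attribute names (used as a whitelist).
SCHEMA = {
    "gene": {"gene_id", "symbol"},
    "product": {"product_id", "gene_id", "name"},
}

def build_demo_db():
    """Create an in-memory database with two tables linked by a shared key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene (
            gene_id INTEGER PRIMARY KEY,
            symbol  TEXT NOT NULL
        );
        CREATE TABLE product (
            product_id INTEGER PRIMARY KEY,
            gene_id    INTEGER NOT NULL REFERENCES gene(gene_id),
            name       TEXT NOT NULL
        );
        INSERT INTO gene VALUES (1, 'GENE_A'), (2, 'GENE_B');
        INSERT INTO product VALUES
            (10, 1, 'protein A1'), (11, 1, 'protein A2'), (12, 2, 'protein B1');
    """)
    return conn

def lookup(conn, table, attribute, key_column, key_value):
    """Find one data element from table name, attribute name and key value."""
    # Identifiers cannot be bound as SQL parameters, so check them first.
    if table not in SCHEMA or not {attribute, key_column} <= SCHEMA[table]:
        raise ValueError("unknown table or attribute")
    row = conn.execute(
        f"SELECT {attribute} FROM {table} WHERE {key_column} = ?", (key_value,)
    ).fetchone()
    return row[0] if row else None

def products_of(conn, symbol):
    """Follow the shared key gene_id from the gene table to the product table."""
    rows = conn.execute(
        "SELECT p.name FROM product AS p "
        "JOIN gene AS g ON g.gene_id = p.gene_id "
        "WHERE g.symbol = ? ORDER BY p.product_id",
        (symbol,),
    ).fetchall()
    return [r[0] for r in rows]
```

For example, lookup(conn, "gene", "symbol", "gene_id", 2) retrieves a single value, while products_of(conn, "GENE_A") uses the shared key to collect related rows from the second table.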

Page 31: Pharmacoinformatics Database basics(sree)

Relational Database Definitions

Entity: Object, concept or event (subject)

Attribute: a characteristic of an entity

Row or Record: the specific characteristics of one entity

Table: a collection of records

Database: a collection of tables
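One way to see how these terms nest is to model them with plain Python objects. The Compound class and its attribute names are assumptions made up for this sketch; a real DBMS stores the same ideas as tables on disk rather than as Python lists.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Compound:
    """Entity type: the subject being described (hypothetical example)."""
    compound_id: int   # attribute
    name: str          # attribute
    formula: str       # attribute

# A record (row) holds the specific attribute values of one entity.
aspirin = Compound(1, "aspirin", "C9H8O4")
caffeine = Compound(2, "caffeine", "C8H10N4O2")

# A table is a collection of records of the same entity type.
compound_table = [aspirin, caffeine]

# A database is a collection of tables.
database = {"compound": compound_table}

# The attribute names are shared by every record in the table.
attribute_names = [f.name for f in fields(Compound)]
```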

Page 32: Pharmacoinformatics Database basics(sree)
Page 33: Pharmacoinformatics Database basics(sree)
Page 34: Pharmacoinformatics Database basics(sree)
Page 35: Pharmacoinformatics Database basics(sree)
Page 36: Pharmacoinformatics Database basics(sree)
Page 37: Pharmacoinformatics Database basics(sree)
Page 38: Pharmacoinformatics Database basics(sree)
Page 39: Pharmacoinformatics Database basics(sree)
Page 40: Pharmacoinformatics Database basics(sree)
Page 41: Pharmacoinformatics Database basics(sree)
Page 42: Pharmacoinformatics Database basics(sree)
Page 43: Pharmacoinformatics Database basics(sree)
Page 44: Pharmacoinformatics Database basics(sree)
Page 45: Pharmacoinformatics Database basics(sree)
Page 46: Pharmacoinformatics Database basics(sree)
Page 47: Pharmacoinformatics Database basics(sree)
Page 48: Pharmacoinformatics Database basics(sree)
Page 49: Pharmacoinformatics Database basics(sree)
Page 50: Pharmacoinformatics Database basics(sree)
Page 51: Pharmacoinformatics Database basics(sree)
Page 52: Pharmacoinformatics Database basics(sree)
Page 53: Pharmacoinformatics Database basics(sree)
Page 54: Pharmacoinformatics Database basics(sree)
Page 55: Pharmacoinformatics Database basics(sree)
Page 56: Pharmacoinformatics Database basics(sree)
Page 57: Pharmacoinformatics Database basics(sree)
Page 58: Pharmacoinformatics Database basics(sree)
Page 59: Pharmacoinformatics Database basics(sree)
Page 60: Pharmacoinformatics Database basics(sree)
Page 61: Pharmacoinformatics Database basics(sree)
Page 62: Pharmacoinformatics Database basics(sree)
Page 63: Pharmacoinformatics Database basics(sree)
Page 64: Pharmacoinformatics Database basics(sree)

Overview of Phylogenetic Analysis

• Phylogenetic analysis is the process you use to determine the evolutionary relationships between organisms.

• The results of an analysis can be drawn in a hierarchical diagram called a cladogram or phylogram (phylogenetic tree).

• The branches in a tree are based on the hypothesized evolutionary relationships (phylogeny) between organisms.

• Each member in a branch, also known as a monophyletic group, is assumed to be descended from a common ancestor.

• Originally, phylogenetic trees were created using morphology, but now, determining evolutionary relationships includes matching patterns in nucleic acid and protein sequences.

Example:

A phylogenetic tree is constructed from mitochondrial DNA (mtDNA) sequences for the family Hominidae. This family includes gorillas, chimpanzees, orangutans, and humans.
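As a rough illustration of building a tree from sequence patterns, the sketch below clusters four short aligned sequences with UPGMA, one simple distance-based method. The slides do not say which method was used for the Hominidae tree, so UPGMA is an assumption here, and the sequences and taxon names are invented toy data, not real mtDNA.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def upgma(seqs):
    """Cluster sequences with UPGMA; return the topology as nested tuples."""
    # Each key is a (sub)tree; its value lists the taxa it contains.
    clusters = {name: [name] for name in seqs}
    dist = {
        frozenset((m, n)): p_distance(seqs[m], seqs[n])
        for m, n in combinations(seqs, 2)
    }

    def cluster_distance(c1, c2):
        # UPGMA: mean of all pairwise distances between the two clusters.
        pairs = [(m, n) for m in clusters[c1] for n in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        members = clusters.pop(c1) + clusters.pop(c2)
        clusters[(c1, c2)] = members   # merge the closest pair
    return next(iter(clusters))

def to_newick(tree):
    """Write the nested-tuple topology in Newick notation (no branch lengths)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"

# Toy aligned sequences (made up for this sketch).
toy_sequences = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTACGTAT",
    "taxon_C": "ACGTTCGAAT",
    "taxon_D": "TCGATCGAGG",
}
tree = upgma(toy_sequences)
newick = to_newick(tree) + ";"
```

With these toy sequences taxon_A and taxon_B differ at only one position, so they are joined first. Real analyses start from a proper multiple alignment and usually apply a substitution model before clustering.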

Searching NCBI for Phylogenetic Data

The NCBI taxonomy Web site includes phylogenetic and taxonomic information from many sources. These sources include the published literature, Web databases, and taxonomy experts. And while the NCBI taxonomy database is not a phylogenetic or taxonomic authority, it can be useful as a gateway to the NCBI biological sequence databases.

Page 65: Pharmacoinformatics Database basics(sree)
Page 66: Pharmacoinformatics Database basics(sree)
Page 67: Pharmacoinformatics Database basics(sree)
Page 68: Pharmacoinformatics Database basics(sree)
Page 69: Pharmacoinformatics Database basics(sree)
Page 70: Pharmacoinformatics Database basics(sree)
Page 71: Pharmacoinformatics Database basics(sree)

Principles of data organization

Database -- a collection of related structured information about entities

File -- a collection of records

Record -- a set of fields

Field -- a single characteristic of an entity

Character -- a symbol used in a data field
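The hierarchy from file down to character can be walked with Python's csv module. The file contents below (drug_id, name, dose_mg and both rows) are placeholder values invented for the sketch.

```python
import csv
import io

# A small "file" held in memory; its contents are hypothetical.
FILE_TEXT = "drug_id,name,dose_mg\n1,drug_a,100\n2,drug_b,250\n"

def read_records(text):
    """Parse the file into records; each record maps field names to values."""
    return list(csv.DictReader(io.StringIO(text)))

records = read_records(FILE_TEXT)    # file: a collection of records
first_record = records[0]            # record: a set of fields
name_field = first_record["name"]    # field: one characteristic of the entity
characters = list(name_field)        # characters: the symbols in that field
```

Note that csv returns every field as text, so a numeric field such as dose_mg must be converted explicitly before doing arithmetic on it.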

Page 72: Pharmacoinformatics Database basics(sree)
Page 73: Pharmacoinformatics Database basics(sree)
Page 74: Pharmacoinformatics Database basics(sree)
Page 75: Pharmacoinformatics Database basics(sree)
Page 76: Pharmacoinformatics Database basics(sree)
Page 77: Pharmacoinformatics Database basics(sree)
Page 78: Pharmacoinformatics Database basics(sree)
Page 79: Pharmacoinformatics Database basics(sree)
Page 80: Pharmacoinformatics Database basics(sree)
Page 81: Pharmacoinformatics Database basics(sree)
Page 82: Pharmacoinformatics Database basics(sree)
Page 83: Pharmacoinformatics Database basics(sree)
Page 84: Pharmacoinformatics Database basics(sree)
Page 85: Pharmacoinformatics Database basics(sree)

Selecting a Database Management System

Database management systems (or DBMSs) can be divided into two categories -- desktop databases and server databases.

Generally speaking, desktop databases are oriented toward single-user applications and reside on standard personal computers (hence the term desktop).

Server databases contain mechanisms to ensure the reliability and consistency of data and are geared toward multi-user applications.

Page 86: Pharmacoinformatics Database basics(sree)
Page 87: Pharmacoinformatics Database basics(sree)

Selecting a database system: Needs Analysis

The needs analysis process will be specific to your organization but, at a minimum, should answer the following questions:

• How many records will we warehouse and for how long?
• Who will be using the database and what tasks will they perform?
• How often will the data be modified? Who will make these modifications?
• Who will be providing IT support for the database?
• What hardware is available? Is there a budget for purchasing additional hardware?
• Who will be responsible for maintaining the data?
• Will data access be offered over the Internet? If so, what level of access should be supported?

Page 88: Pharmacoinformatics Database basics(sree)

Some Definitions

A file: a group or collection of similar records, like INST6031 Fall Student File, American History 1850-1866 file, Basic Food Group Nutrition File

A record book: a "rolodex" of data records, like address lists, inventory lists, classes or thematic units, or groupings of other unique records that are combined into one list (found in AppleWorks, FileMaker Pro software).

A field: one category of information, e.g., Name, Address, Semester Grade, Academic topic

A record: one piece of data, e.g., one student's information, a recipe, a test question

A layout: a design for a database that contains field names and possibly graphics.

Database glossary

Page 89: Pharmacoinformatics Database basics(sree)

Tables comprise the fundamental building blocks of any database. If you're familiar with spreadsheets, you'll find database tables extremely similar. Take a look at this example of a table from a sample database:

The table above contains the employee information for our organization -- characteristics like name, date of birth and title. Examine the construction of the table and you'll find that each column of the table corresponds to a specific employee characteristic (or attribute in database terms). Each row corresponds to one particular employee and contains his or her information. That's all there is to it! If it helps, think of each one of these tables as a spreadsheet-style listing of information.

Fundamental building blocks
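A table of this kind might be declared as follows with Python's sqlite3 module. The column names follow the characteristics mentioned in the text (name, date of birth, title); the employee_id column and both rows are placeholders added for the sketch, not data from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each column is one employee attribute.
conn.execute("""
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        date_of_birth TEXT,
        title         TEXT
    )
""")

# Each row is one particular employee (placeholder values).
conn.executemany(
    "INSERT INTO employees (name, date_of_birth, title) VALUES (?, ?, ?)",
    [
        ("Example Person A", "1970-01-01", "Analyst"),
        ("Example Person B", "1980-06-15", "Manager"),
    ],
)
conn.commit()

cursor = conn.execute("SELECT * FROM employees ORDER BY employee_id")
column_names = [d[0] for d in cursor.description]   # the attributes
rows = cursor.fetchall()                            # the records
```

Printing column_names followed by rows gives the same spreadsheet-style listing the text describes: one header per attribute, one line per employee.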

Page 90: Pharmacoinformatics Database basics(sree)

Where do we start?

Let's explore your "paper system"
– Client intake forms
– Job application form
– Funders' reports

Database modeling:
– Define required fields from "forms" or required reports
– Avoid repetition
– Keep it simple
– Identify a unique identifier or primary key
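The modeling steps above can be sketched by taking rows as they might appear on a paper intake log, where client details repeat on every visit, and loading them into two tables. The form layout, table names and all values are hypothetical; the point is that each client is stored once under a primary key and visits refer to it.

```python
import sqlite3

# Hypothetical rows copied from a paper form: client details repeat per visit.
paper_rows = [
    ("C001", "Client One", "2018-01-05", "intake"),
    ("C001", "Client One", "2018-01-19", "follow-up"),
    ("C002", "Client Two", "2018-01-07", "intake"),
]

def load(conn, rows):
    """Split the repeated form data into a client table and a visit table."""
    conn.executescript("""
        CREATE TABLE client (
            client_id TEXT PRIMARY KEY,      -- unique identifier
            name      TEXT NOT NULL
        );
        CREATE TABLE visit (
            visit_id   INTEGER PRIMARY KEY,
            client_id  TEXT NOT NULL REFERENCES client(client_id),
            visit_date TEXT NOT NULL,
            purpose    TEXT NOT NULL
        );
    """)
    for client_id, name, visit_date, purpose in rows:
        # Store each client only once, however many visits they have.
        conn.execute("INSERT OR IGNORE INTO client VALUES (?, ?)",
                     (client_id, name))
        conn.execute(
            "INSERT INTO visit (client_id, visit_date, purpose) VALUES (?, ?, ?)",
            (client_id, visit_date, purpose),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, paper_rows)
client_count = conn.execute("SELECT COUNT(*) FROM client").fetchone()[0]
visit_count = conn.execute("SELECT COUNT(*) FROM visit").fetchone()[0]
```

Three form rows become two client records and three visit records, so a change of name has to be made in one place only.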

Page 91: Pharmacoinformatics Database basics(sree)

Some Quality Control Considerations

Remember "garbage in – garbage out". Some examples and how to prevent this.

Quality management encompasses three distinct processes: quality planning, quality control, and quality improvement

Quality Planning in relation to database systems design:
– Who will perform data entry?
– Training? On-line help?
– How will data entry be performed?

Page 92: Pharmacoinformatics Database basics(sree)

Data entry considerations

Define "must" enter fields – no record is complete unless such and such is entered;

Make data entry foolproof. Example: grade level can be entered in several ways (8, 8th, or eight). By using a pull-down menu with the correct data format these mistakes can be avoided.
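A pull-down menu restricts input in the user interface; the same two rules can also be enforced in the database itself. The sketch below uses SQLite constraints through Python's sqlite3 module. The student table and the allowed range of 1 to 12 are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id  INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,              -- "must" enter field
        grade_level INTEGER NOT NULL            -- "must" enter field
            CHECK (typeof(grade_level) = 'integer'
                   AND grade_level BETWEEN 1 AND 12)
    )
""")

def try_insert(name, grade_level):
    """Return True if the record was accepted, False if a rule rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO student (name, grade_level) VALUES (?, ?)",
                (name, grade_level),
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert("Student A", 8)         # well-formed value
rejected_text = try_insert("Student B", "8th")  # free text is refused
rejected_missing = try_insert("Student C", None)  # required field left empty
```

Constraints like these act as a second line of defence: even if an entry form lets a bad value through, the incomplete or malformed record never reaches the table.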

Page 93: Pharmacoinformatics Database basics(sree)

Data Entry – additional considerations

Barcode scanners
– USB or
– Wireless attached to a Palm or Pocket PC

Pocket PC
– WiFi 802.11g, Bluetooth
– Wireless networks (real-time on demand systems)

Page 94: Pharmacoinformatics Database basics(sree)

PEOPLE THAT WORK WITH DATABASES

• System Analysts
• Database Designers
• Application Developers
• Database Administrators
• End Users

Page 95: Pharmacoinformatics Database basics(sree)

System Analysts

• communicate with each prospective database user group in order to understand its
– information needs
– processing needs

• develop a specification of each user group's information and processing needs

• develop a specification integrating the information and processing needs of the user groups

• document the specification

Page 96: Pharmacoinformatics Database basics(sree)

Database Designers

• choose appropriate structures to represent the information specified by the system analysts

• choose appropriate structures to store the information in a normalized manner in order to guarantee integrity and consistency of data

• choose appropriate structures to guarantee an efficient system

• document the database design

Page 97: Pharmacoinformatics Database basics(sree)

Application Developers

• implement the database design
• implement the application programs to meet the program specifications
• test and debug the database implementation and the application programs
• document the database implementation and the application programs

Page 98: Pharmacoinformatics Database basics(sree)

Database Administrators

• Manage the database structure
• Manage data activity
• Manage the database management system
– generate database application performance reports
– investigate user performance complaints
– assess need for changes in database structure or application design
– modify database structure
– evaluate and implement new DBMS features
– tune the database

• Establish the database data dictionary
– data names, formats, relationships
– cross-references between data and application programs
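Part of a data dictionary (data names, formats and relationships) can be read back from the database's own catalog. The sketch below does this for SQLite using its table_info and foreign_key_list pragmas; the compound and assay tables are hypothetical, and other DBMSs expose the same information through their own catalog views.

```python
import sqlite3

def data_dictionary(conn):
    """Collect column names, declared types and foreign keys for every table."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    result = {}
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # foreign_key_list rows: (id, seq, table, from, to, on_update, ...)
        links = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        result[table] = {
            # (data name, format, required?, part of primary key?)
            "columns": [(c[1], c[2], bool(c[3]), bool(c[5])) for c in columns],
            # (local column, referenced table, referenced column)
            "relationships": [(fk[3], fk[2], fk[4]) for fk in links],
        }
    return result

# Hypothetical schema used only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE compound (
        compound_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE assay (
        assay_id    INTEGER PRIMARY KEY,
        compound_id INTEGER NOT NULL REFERENCES compound(compound_id),
        result      REAL
    );
""")
dictionary = data_dictionary(conn)
```

The cross-references between data and application programs mentioned above are not stored in the database catalog, so they still have to be documented by hand.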

Page 99: Pharmacoinformatics Database basics(sree)

End Users

• Parametric end users constantly query and update the database. They use canned transactions to support standard queries and updates.

• Casual end users occasionally access the database, but may need different information each time. They use sophisticated query languages and browsers.

• Sophisticated end users have complex requirements and need different information each time. They are thoroughly familiar with the capabilities of the DBMS.
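A canned transaction can be pictured as a fixed, pre-tested operation in which the user supplies only parameters, in contrast to the free-form queries written by casual or sophisticated users. The sketch below is one possible shape for such an operation using Python's sqlite3 module; the stock table, the dispense function and the item names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock (
        item     TEXT PRIMARY KEY,
        quantity INTEGER NOT NULL CHECK (quantity >= 0)
    );
    INSERT INTO stock VALUES ('item_a', 10), ('item_b', 3);
""")

def dispense(item, amount):
    """Canned transaction: the SQL is fixed, the user supplies only values."""
    try:
        with conn:  # commits on success, rolls back on any error
            cursor = conn.execute(
                "UPDATE stock SET quantity = quantity - ? WHERE item = ?",
                (amount, item),
            )
            if cursor.rowcount != 1:
                raise LookupError(f"unknown item: {item}")
        return True
    except (sqlite3.IntegrityError, LookupError):
        # Either stock would go negative or the item does not exist.
        return False

def quantity_of(item):
    """Canned query: look up the current quantity of one item."""
    row = conn.execute(
        "SELECT quantity FROM stock WHERE item = ?", (item,)
    ).fetchone()
    return row[0] if row else None
```

A parametric user would only ever call dispense("item_a", 4) through a form or button, whereas a sophisticated user could write a new SELECT against the stock table for each question.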

Page 100: Pharmacoinformatics Database basics(sree)
Page 101: Pharmacoinformatics Database basics(sree)
Page 102: Pharmacoinformatics Database basics(sree)
Page 103: Pharmacoinformatics Database basics(sree)
Page 104: Pharmacoinformatics Database basics(sree)
Page 105: Pharmacoinformatics Database basics(sree)
Page 106: Pharmacoinformatics Database basics(sree)
Page 107: Pharmacoinformatics Database basics(sree)
Page 108: Pharmacoinformatics Database basics(sree)
Page 109: Pharmacoinformatics Database basics(sree)