Oracle Life Sciences User Group Meeting – Reston, VA 2004
Having a BLAST Data Mining in Oracle 10g:
Implementing A Bioinformatics Target Database
John Burke, Ph.D.
UCB Research, Inc.
Preview
UCB Discovery Research
Designing the Target Database
Building the Target Database
Looking Forward
UCB Discovery Research
UCB Pharma
Discovery Research
[Figure: Structure, Chemistry, Biology, with a chemical structure drawing]
Discovery Research Sites
Lille
Cambridge
Braine-l’Alleud
?
Bioinformatics, Proteomics, and Genomics
Protein db
LS-Graph
Mascot
Protein Prospector
Biotools
Swiss-Prot
Genbank
...
MALDI-TOF
Q-TOF
SIMS + ProteinMine
SAN
Custom on Oracle 10g
GeneXpress
Spotfire
Sequencher and Omiga
GCG and Seqweb
Human genome browser (UCSC)
Unigene
TIGR
Proteome PSD
GeneXpress
Proteinscape
Designing the Target Database
General Requirements
Purpose: to store and manage target discovery research information efficiently and effectively
Typical Queries
• Find all targets similar to this protein with size x in gate y or therapeutic area z.
• Find all targets with a specified (or unknown) function.
• Find all targets scheduled to be reviewed on a specified date.
• Find all projects and targets managed by a given person.
• Find all targets from Affy study x, or literature search, cell line y or species z.
Critical Factors in Choosing Oracle 10g
Oracle already a UCB standard
Confidence in Oracle product and support
Smaller resource requirement
Shorter development time
Inclusion of BLAST in database
• No need to build interface between DB and BLAST
• No need to move data from DB to BLAST
• Ability to execute other queries combined with BLAST
Subroutine   Description
blastp       Compares an amino acid query sequence against a protein sequence database.
blastn       Compares a nucleotide query sequence against a nucleotide sequence database.
blastx       Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. You could use this option to find potential translation products of an unknown nucleotide sequence.
tblastn      Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
tblastx      Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
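The "all reading frames" wording for blastx, tblastn, and tblastx refers to three codon offsets on the forward strand plus three on the reverse complement. The following is a minimal sketch of how those six frames can be derived, in plain Java. The class and method names are hypothetical, the codon-to-amino-acid translation step is left out, and nothing here is taken from Oracle's BLAST implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Oracle's BLAST implementation.
class ReadingFrames {

    // Reverse complement of a DNA string (A<->T, C<->G); any other symbol becomes N.
    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (Character.toUpperCase(dna.charAt(i))) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default:  sb.append('N');
            }
        }
        return sb.toString();
    }

    // Split a strand into whole codons starting at the given offset (0, 1, or 2).
    static List<String> codons(String strand, int offset) {
        List<String> out = new ArrayList<>();
        for (int i = offset; i + 3 <= strand.length(); i += 3) {
            out.add(strand.substring(i, i + 3));
        }
        return out;
    }

    // Six frames: offsets 0..2 on the forward strand, then 0..2 on the reverse complement.
    static List<List<String>> sixFrames(String dna) {
        List<List<String>> frames = new ArrayList<>();
        String forward = dna.toUpperCase();
        String reverse = reverseComplement(forward);
        for (int offset = 0; offset < 3; offset++) frames.add(codons(forward, offset));
        for (int offset = 0; offset < 3; offset++) frames.add(codons(reverse, offset));
        return frames;
    }
}
```

Calling `ReadingFrames.sixFrames("ATGCCGTAA")` returns six codon lists; a translated search would map each codon to an amino acid before comparing against protein sequences.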
System Architecture
Application Server 10g
Web Client
OS: Windows XP
Platform: HP Workstation
Web Client
Web Client
Oracle Database 10g
OS: Solaris 8
Platform: Sun Enterprise 250
Building the Target Database
Oracle System Components Installed
Application Server Tier
• JSP Pages
• Jakarta Struts Framework
• BC4J
• Java Beans
• Portal
EIS Tier
• Oracle 10g Database
• BLAST Data Mining
JSP Model 2 Architecture – MVC Pattern
Web Browser
Servlet (Controller)
JSP (View)
User Action
Response
Redirect
Instantiates
Java Beans (Model)
Data
Oracle 10g Database (Database Server)
Web Container (Application Server)
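The division of labor in the diagram (controller receives the user action, instantiates a bean, and the view renders it) can be sketched in plain Java. This is only an analogy under stated assumptions: it uses no servlet or JSP API, every class, method, and parameter name is hypothetical, and the database step is replaced by a request parameter.

```java
import java.util.Map;

// Model: a plain Java Bean holding data for the view. The field name is hypothetical.
class TargetBean {
    private String geneSymbol;
    public String getGeneSymbol() { return geneSymbol; }
    public void setGeneSymbol(String geneSymbol) { this.geneSymbol = geneSymbol; }
}

// View: stands in for a JSP; it only renders what the bean exposes.
class TargetView {
    static String render(TargetBean bean) {
        return "<p>Target: " + bean.getGeneSymbol() + "</p>";
    }
}

// Controller: stands in for the servlet; it reads the user action,
// instantiates the bean, and hands it to the view.
class TargetController {
    static String handle(Map<String, String> requestParams) {
        TargetBean bean = new TargetBean();
        // A real controller would populate the bean from the database here.
        bean.setGeneSymbol(requestParams.getOrDefault("genesymbol", "unknown"));
        return TargetView.render(bean);
    }
}
```

The point of the split is that the view never touches request handling or data access, so either side can change without the other.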
Page Flow
Classes - Jakarta Struts Framework
An Issue with SQL in Java
Nested IN-Clause Statement failed in Java.
OraclePreparedStatement pstmt = (OraclePreparedStatement) conn.prepareStatement(
    "Select genesymbol from proteins where proteinid " +
    " IN(Select proteinid from projects_proteins where project_projectid " +
    " IN(Select projectid from projects where status LIKE :1))");
Identical SELECT statement worked in SQL Plus.
Equivalent statement implemented as Stored Procedure.
An Issue with SQL in Java
PROJECTS_PROTEINS.PROTEIN_PROTEINID
AND PROJECTS_PROTEINS.PROJECT_PROJECTID = PROJECTS.PROJECTID
AND PROJECTS.PROJECTID = THERAPEUTIC_AREAS.PROJECTID
AND (PROJECTS.status = query OR query IS NULL)
AND (THERAPEUTIC_AREAS.AREA_NAME = areaName OR areaName IS NULL);
JSP interacts with Database via Stored Procedures
Use of stored procedures:
Centralizes SQL, facilitating reuse
Allows the DBA to tune SQL statements
Leverages Oracle’s dependency tracking mechanism
Provides greater security since the JSP user is unable to directly modify base tables
Provides precompiled code
Offers better performance
• Stored procedures load once into the shared pool and remain there unless they become paged out.
BLASTN Stored Procedure
PROCEDURE "BLASTNTARGETS" IS
  -- Using the default parameters in Blast
  CURSOR blastn_cursor is
    select * from TABLE(BLASTN_MATCH(
      (select seq_data from targets),
      CURSOR(select genesymbol, clonesequence from cdnas, genes
             where genes.cdnaid = cdnas.cdnaid))) t
    where t.score > 25;
  -- Fetch variables: declarations are not shown on the slide; types here are assumed
  T_SEQ_ID VARCHAR2(4000);
  SCORE    NUMBER;
  EXPECT   NUMBER;
BEGIN
  -- open the cursor
  OPEN blastn_cursor;
  -- delete the rows in the blastn table
  DELETE FROM BLASTN;
  LOOP
    FETCH blastn_cursor INTO T_SEQ_ID, SCORE, EXPECT;
    EXIT WHEN blastn_cursor%NOTFOUND;
    INSERT INTO BLASTN VALUES (T_SEQ_ID, SCORE, EXPECT);
  END LOOP;
  CLOSE blastn_cursor;
END blastntargets;
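For context, here is a minimal sketch of how the Java tier might invoke this procedure and read back the rows it writes, using standard JDBC. The slides do not show this code: the class and method names are hypothetical, and the column order of the BLASTN table (T_SEQ_ID, SCORE, EXPECT) is an assumption taken from the INSERT statement in the procedure.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical data-access helper; not taken from the slides.
class BlastnDao {
    // JDBC escape syntax for calling a stored procedure that takes no parameters.
    static final String CALL_BLASTN = "{call BLASTNTARGETS}";
    // Assumes BLASTN columns are ordered T_SEQ_ID, SCORE, EXPECT, as in the procedure's INSERT.
    static final String SELECT_HITS = "SELECT * FROM BLASTN";

    // Runs the procedure, then reads back the hits it stored in the BLASTN table.
    static List<String> refreshAndListHits(Connection conn) throws SQLException {
        try (CallableStatement call = conn.prepareCall(CALL_BLASTN)) {
            call.execute();
        }
        List<String> hits = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(SELECT_HITS)) {
            while (rs.next()) {
                // One tab-separated line per hit: sequence id, score, expect value.
                hits.add(rs.getString(1) + "\t" + rs.getDouble(2) + "\t" + rs.getDouble(3));
            }
        }
        return hits;
    }
}
```

Running it requires an open `java.sql.Connection` to a database where the procedure and the BLASTN table exist; the try-with-resources blocks close the statements even if the call fails.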
An Issue with 10g AS
Attempts to deploy application to AS gave server error.
Identifying proper expertise and mode of resolution proved difficult.
Teamwork ultimately solved problem.
• Oracle Life Sciences
• Oracle Customer Service
• OLSUG membership
• Oracle Consulting Practice
Solution SIMPLE, but of course NOT OBVIOUS
Request Page
BLAST Query Page
BLAST Result Page
Request Page
Query Page
Query Result Page
Looking Forward
Short Term
Additional Features and Improvements
• Data input page
• Sexy new name for Target Database
• Integrated BLAST and query
• Report pages
• Integration with other systems
Long Term
Bioinformatics Portal
Integrated Knowledge Base
UCB Team
MIS / Research
Prasoon Kejriwal, Cambridge
David Wei, Cambridge
Bob Johnson, Cambridge
Didier Generet, Braine
Didier Chalon, Braine
Karl Nocka, Cambridge
Bob Coopersmith, Cambridge
Zhidong Zhang, Cambridge
Rich Fisher, Cambridge
Pierre Chatelain, Braine