Motif Space Database Design Kiranjit Sidhu

Dec 19, 2015

Motif Space Database Design

Kiranjit Sidhu

Outline

- Schema Design
- Content of Database
- Functionality
- Future Plans

Sample PDB File

- Each PDB file is represented as a text file (~60K lines)
- Inefficient for pattern matching
- A relational database is required for the most efficient solution
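
To illustrate why a flat PDB text file is awkward to query, here is a minimal Python sketch that slices one fixed-width ATOM record into typed fields, the kind of step needed before rows can go into a relational table. The column positions follow the standard PDB ATOM record layout; the `Atom` type, the `parse_atom_line` helper, and the sample coordinates are illustrative assumptions, not taken from the slides (the original extraction used Perl scripts).

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """Hypothetical row type; field names are assumptions for this sketch."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Slice one fixed-width ATOM record into typed fields."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return Atom(
        serial=int(line[6:11]),       # cols 7-11
        name=line[12:16].strip(),     # cols 13-16
        res_name=line[17:20].strip(), # cols 18-20
        chain_id=line[21],            # col 22
        res_seq=int(line[22:26]),     # cols 23-26
        x=float(line[30:38]),         # cols 31-38
        y=float(line[38:46]),         # cols 39-46
        z=float(line[46:54]),         # cols 47-54
    )


# Made-up sample record, assembled piece by piece to show the column layout.
sample = (
    "ATOM  "     # cols 1-6: record name
    "    1"      # cols 7-11: atom serial number
    " "
    " N  "       # cols 13-16: atom name
    " "          # col 17: alternate location
    "MET"        # cols 18-20: residue name
    " "
    "A"          # col 22: chain identifier
    "   1"       # cols 23-26: residue sequence number
    "    "       # cols 27-30: insertion code and padding
    "  27.340"   # cols 31-38: x
    "  24.430"   # cols 39-46: y
    "   2.614"   # cols 47-54: z
)

atom = parse_atom_line(sample)
print(atom.chain_id, atom.res_name, atom.res_seq)
```

Once every line is reduced to a typed tuple like this, finding a pattern becomes an indexed lookup on columns rather than a scan over tens of thousands of text lines per file.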

Structure of Database

The DB is divided into two major components:
- Protein Data
- Motif (Occurrence) Data

Protein Data
- Obtained from PDB (Protein Data Bank) files
- Derived data

Motif Data
- Obtained from Luke’s FFSM technique
- Derived data

Schema Design

Schema Design - Protein

Schema Design - Motif

Tools Used

Obtaining Data:
- Perl scripts

Database:
- SQL Server 2000 and SQL Server 2005
- T-SQL (bulk import of data)

Obtaining Data

PDB File -> (Extract) -> CSV File -> (Import) -> Temp Tables (T-SQL) -> (Convert and Derive, T-SQL Procedures) -> Final DB
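
The flow above can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not the project's actual pipeline: Python's built-in `sqlite3` stands in for SQL Server, an in-memory CSV stands in for the extracted file, and the table names `residue_staging` and `residue`, their columns, and the PDB id `1ABC` are all hypothetical.

```python
import csv
import io
import sqlite3

# Extract: pretend these rows were pulled out of a PDB file (made-up values).
extracted = [
    ("1ABC", "A", 1, "MET", "27.340", "24.430", "2.614"),
    ("1ABC", "A", 2, "GLY", "26.266", "25.413", "2.842"),
]

# CSV file: written to an in-memory buffer instead of disk for this sketch.
buf = io.StringIO()
csv.writer(buf).writerows(extracted)
buf.seek(0)

conn = sqlite3.connect(":memory:")

# Import: load the CSV as-is into an untyped temp (staging) table.
conn.execute(
    "CREATE TABLE residue_staging "
    "(pdb_id TEXT, chain_id TEXT, res_seq TEXT, res_name TEXT, x TEXT, y TEXT, z TEXT)"
)
conn.executemany(
    "INSERT INTO residue_staging VALUES (?, ?, ?, ?, ?, ?, ?)", csv.reader(buf)
)

# Convert and derive: cast to proper types while copying into the final table.
conn.execute(
    "CREATE TABLE residue "
    "(pdb_id TEXT, chain_id TEXT, res_seq INTEGER, res_name TEXT, "
    "x REAL, y REAL, z REAL, PRIMARY KEY (pdb_id, chain_id, res_seq))"
)
conn.execute(
    "INSERT INTO residue "
    "SELECT pdb_id, chain_id, CAST(res_seq AS INTEGER), res_name, "
    "CAST(x AS REAL), CAST(y AS REAL), CAST(z AS REAL) FROM residue_staging"
)
conn.execute("DROP TABLE residue_staging")

rows = conn.execute(
    "SELECT res_seq, res_name, x FROM residue ORDER BY res_seq"
).fetchall()
print(rows)
```

Loading raw text into a staging table first keeps the bulk import fast and simple; type conversion and derived values are then handled set-wise inside the database.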

Uploading Protein Data

- Input dataset: ~70,000 PDB/chain combinations
- Entries in tables: e.g., approx. 800 million rows in the proteinchaindistance table
- Initial version imported 10 PDB files in 1 day
- Current version: under 3 minutes
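
To give a feel for how a derived distance table can reach hundreds of millions of rows, the sketch below computes pairwise Euclidean distances for a handful of points and does the back-of-envelope arithmetic on the figures quoted above. The slides do not describe the layout of proteinchaindistance, so the one-row-per-unordered-pair model, the `pairwise_distances` and `pair_count` helpers, and the toy coordinates are assumptions made purely for illustration.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Return (i, j, distance) for every unordered pair of points."""
    return [
        (i, j, math.dist(a, b))
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
    ]


def pair_count(n: int) -> int:
    """Number of unordered pairs among n points: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Toy coordinates (made up) chosen so the distances are easy to check by hand.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
rows = pairwise_distances(coords)
print(rows)

# Figures from the slide: ~800 million rows over ~70,000 PDB/chain combinations.
avg_rows_per_chain = 800_000_000 / 70_000
print(round(avg_rows_per_chain))  # about 11,429 rows per chain on average
print(pair_count(150), pair_count(152))  # pair counts that bracket that average
```

Under this assumed model, the quoted totals would correspond to chains of roughly 150 points each; the real table may use a distance cutoff or a different granularity.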

Current Functionality

- Protein (PDB) data has been completely uploaded into both:
  - Production database (MotifSpace)
  - Development database (MotifSpaceDev)
- Protein structure can be visualized using data from the database (data available)
- Data can be obtained from the server using SOAP or web services
- Basic queries, such as: which PDBs does a specific motif occur in?
- Histograms to compute statistics
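
The two kinds of basic query mentioned above might look like the following. This is a hedged sketch only: the schema diagrams are not reproduced in this transcript, so the `motif_occurrence` table, its columns, and the sample rows are hypothetical, and Python's `sqlite3` is used in place of SQL Server so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical occurrence table: one row per motif hit in a PDB chain.
conn.execute(
    "CREATE TABLE motif_occurrence (motif_id INTEGER, pdb_id TEXT, chain_id TEXT)"
)
conn.executemany(
    "INSERT INTO motif_occurrence VALUES (?, ?, ?)",
    [
        (1, "1ABC", "A"),
        (1, "1ABC", "B"),
        (1, "2XYZ", "A"),
        (2, "2XYZ", "A"),
    ],
)

# Query 1: which distinct PDBs does motif 1 occur in?
pdbs = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT pdb_id FROM motif_occurrence "
        "WHERE motif_id = ? ORDER BY pdb_id",
        (1,),
    )
]
print(pdbs)

# Query 2: histogram of how many distinct PDBs each motif appears in.
histogram = dict(
    conn.execute(
        "SELECT motif_id, COUNT(DISTINCT pdb_id) "
        "FROM motif_occurrence GROUP BY motif_id"
    )
)
print(histogram)
```

Both questions reduce to a filter or a `GROUP BY` over the occurrence table, which is the kind of pattern matching that is slow against raw PDB text files.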

Demo