Introduction to Databases Vetle I. Torvik


Jan 02, 2016


Introduction to Databases

Vetle I. Torvik


DNA was the 20th century - Databases are the 21st century

Quantum leaps in the evolution of human brain power
– Way-back-when: information in books - phone books, dictionaries, lab notebooks, journals
– Recently: information at your fingertips
– Now: scientific discovery at your fingertips
• data mining bio-informatics databases
• data mining text databases


How do you find a good movie?

New releases only?

Browsing shelves by category (comedy, action, drama, foreign, etc.)?

Browsing through a book at Blockbuster
– by titles alphabetically?
– by actors alphabetically?
– by category?
– by year?


A step up...

querying a database


Now imagine this…

Visualizing the entire movie database in ONE figure across ALL dimensions
– year, category, actor, director, popularity, rating, length, language, country, awards, etc.

and drilling down to find your movie(s)

PS: You don’t have to imagine...


Why not do the same in the scientific literature?


Benefits of DBs

Over paper books… a quantum leap
– Speed, space, less drudgery

Over spreadsheets… another quantum leap
– Maintenance (less redundancy, etc.)
– Currency (accuracy, up-to-date, on-demand)
– Access (across time and space, sharing)
– Security (recovery, restrict others’ access)
– Facilitates data mining: encode meaning, inferences, pooling/sharing, visualization


A Database system to store, retrieve, and manipulate data consists of 4 parts

– Data - collection of linked data files
– Hardware - for storage and execution
– Software - DB management system (e.g., Access, MySQL, Filemaker, Oracle)
– Users - DB administrator, data administrator, application programmers, end users

A Database – an electronic repository for persistent data


Relational DBMSs

Dominate the market

Data is perceived by users as tables only

• representing, manipulating, and enforcing integrity of data so that operations function correctly

• no duplicate records, rows and columns are unordered, each entry has a single value

SQL = “structured query language”

• a standard language for querying databases

• independent of how the data is stored/accessed
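
As a minimal sketch of what such a query looks like, the following uses Python's built-in sqlite3 module with an in-memory database. The table name `movie`, its columns, the helper `titles_in_category`, and the sample rows are all hypothetical, made up for illustration; they are not part of the lecture.

```python
import sqlite3

# In-memory database; the table, columns, and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, category TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO movie (title, category, year) VALUES (?, ?, ?)",
    [
        ("Movie A", "comedy", 1999),
        ("Movie B", "drama", 2003),
        ("Movie C", "comedy", 2005),
    ],
)


def titles_in_category(connection, category, min_year):
    """Return titles in a category released in or after min_year, alphabetically."""
    rows = connection.execute(
        "SELECT title FROM movie WHERE category = ? AND year >= ? ORDER BY title",
        (category, min_year),
    )
    return [title for (title,) in rows]


comedies = titles_in_category(conn, "comedy", 2000)
print(comedies)
```

The SELECT statement only states which rows are wanted; it says nothing about files, indexes, or scan order, which is the sense in which SQL is independent of how the data is stored.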


Database design - a subjective exercise

Entity/Relationship diagramming
– identify entities or “things that can be distinctly identified”
• e.g. movie, category, individual (director, actor)
– identify relationships
• e.g. a movie has one director, zero or more actors, belongs to one category
– draw the diagram

Then “normalize” the database
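
One possible way to turn the movie example into tables is sketched below with Python's sqlite3 module. Design is subjective, so this is only one reading of the relationships: the table and column names are assumptions, "one director" and "one category" become required foreign keys on `movie`, and "zero or more actors" becomes a separate linking table.

```python
import sqlite3

# Hypothetical schema for the movie example; all names are illustrative.
SCHEMA = """
CREATE TABLE category (
    category_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE individual (
    individual_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
-- A movie belongs to one category and has one director.
CREATE TABLE movie (
    movie_id    INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES category(category_id),
    director_id INTEGER NOT NULL REFERENCES individual(individual_id)
);
-- A movie has zero or more actors: one row per (movie, actor) pair.
CREATE TABLE movie_actor (
    movie_id INTEGER NOT NULL REFERENCES movie(movie_id),
    actor_id INTEGER NOT NULL REFERENCES individual(individual_id),
    PRIMARY KEY (movie_id, actor_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(table_names)
```

Keeping category names and individuals in their own tables means each name is stored once and referenced by id, which is the kind of redundancy that normalization aims to remove.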


Ontologies - the basis upon which the truth of the world is viewed

E.g. a movie has one director, zero or more actors, belongs to one category

makes databases a bit more intelligent

allows for making inferences
– “the artist formerly known as Prince” - without an artist name, nobody can make any name-related inferences about him…


Metadata - data about the data

It would be nice if SQL knew that actors and directors are both individuals so that (e.g.) querying movies by actor = director makes sense (and this type of query could be optimized)
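
The actor = director query can be sketched if the designer encodes that knowledge in the schema, by pointing both roles at one shared table of individuals. The example below assumes such a layout; the table names, column names, and rows are hypothetical, and SQL itself still does not "know" the meaning, it only compares ids.

```python
import sqlite3

# Hypothetical tables and rows; directors and actors share the individual table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE individual (individual_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, title TEXT, director_id INTEGER);
CREATE TABLE movie_actor (movie_id INTEGER, actor_id INTEGER);
INSERT INTO individual VALUES (1, 'Person A'), (2, 'Person B');
INSERT INTO movie VALUES (10, 'Movie X', 1), (11, 'Movie Y', 2);
INSERT INTO movie_actor VALUES (10, 1), (10, 2), (11, 1);
""")

# Movies in which the director also appears as an actor.
QUERY = """
SELECT m.title, i.name
FROM movie AS m
JOIN movie_actor AS ma ON ma.movie_id = m.movie_id
JOIN individual AS i ON i.individual_id = m.director_id
WHERE ma.actor_id = m.director_id
ORDER BY m.title
"""

self_directed = conn.execute(QUERY).fetchall()
print(self_directed)
```

The comparison `ma.actor_id = m.director_id` is meaningful only because both columns refer to the same table; that shared reference is a small piece of metadata supplied by the design.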


Data mining

Searching for novel patterns, rules or relationships in data, e.g.:
– correlations
– classification
– clustering
– visualization

Versus traditional statistics: hypothesis testing


Data mining - correlations

Searching through many possible pairs of associations to find novel ones, e.g.:
– phenotypes versus genotypes
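
A minimal sketch of such a pairwise search, assuming the Pearson correlation coefficient as the measure of association (the slide does not name one). The column names, the toy values, and the 0.9 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy measurements, one value per subject in each column.
data = {
    "genotype_marker": [0, 0, 1, 1, 2, 2],
    "phenotype_height": [1.0, 1.1, 1.5, 1.6, 2.0, 2.1],
    "phenotype_other": [3.0, 1.0, 2.0, 3.0, 1.0, 2.0],
}


def strong_pairs(columns, threshold=0.9):
    """Check every pair of columns and keep those with |r| >= threshold."""
    found = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found


pairs = strong_pairs(data)
print(pairs)
```

When the number of columns is large, some pairs will look strongly correlated by chance alone, so candidates found this way would normally be checked on independent data.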


Data mining - classification

find rules that discriminate between predefined categories
– e.g., breast cancer diagnosis
– RULE #1: IF the following conditions hold ALL true at the SAME TIME, THEN the case is: “intra-ductal carcinoma”
– CONDITIONS:
• The volume of the calcifications is more than 0.03 cm^3.
• AND The total number of calcifications is greater than 10.
• AND The variation in shape is moderate or marked.
• AND The irregularity in size of calcifications is marked.
• AND The variation of the density of calcifications is moderate or marked.
• AND There is no ductal orientation.
• AND The number of calcifications per cm^3 is less than 20.
• AND A comparison with previous exams shows a change in the number or character of calcifications or it is newly developed.
– RULE #2: ...
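
RULE #1 is a conjunction of conditions, so it can be written as a single boolean expression. The sketch below is an illustration only: the `Case` record, its field names, and the string labels for the graded findings are assumptions, while the thresholds are the ones listed above.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One case description; field names and label strings are hypothetical."""
    volume_cm3: float
    total_calcifications: int
    shape_variation: str       # assumed labels such as "mild", "moderate", "marked"
    size_irregularity: str
    density_variation: str
    ductal_orientation: bool
    calcifications_per_cm3: float
    changed_or_new: bool       # change versus previous exams, or newly developed


def rule_1(case):
    """Return True when every condition of RULE #1 holds at the same time."""
    return (
        case.volume_cm3 > 0.03
        and case.total_calcifications > 10
        and case.shape_variation in ("moderate", "marked")
        and case.size_irregularity == "marked"
        and case.density_variation in ("moderate", "marked")
        and not case.ductal_orientation
        and case.calcifications_per_cm3 < 20
        and case.changed_or_new
    )


# A made-up case that satisfies all eight conditions.
example = Case(0.05, 12, "marked", "marked", "moderate", False, 15.0, True)
print(rule_1(example))
```

Because the conditions are joined by AND, changing any single field so that its condition fails is enough for the rule not to fire.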


Data mining - clustering

organizing information by naturally occurring groups, e.g.:
– cluster languages by similarity of words to assess their evolution
– organizing webpages into themes by word usage (e.g., www.vivisimo.com)
– grouping genes by expression level in DNA microarrays to find a subset of differentially expressed genes
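
As a small sketch of the gene example, the code below splits genes into two groups by a single expression value using one-dimensional k-means with k = 2. The choice of k-means is an assumption made for illustration (the slides do not name a clustering method), and the gene names and values are hypothetical.

```python
def kmeans_1d(levels, iterations=10):
    """Split a {name: value} mapping into two groups with 1-D k-means (k = 2)."""
    # Start the two centers at the smallest and largest observed values.
    centers = [min(levels.values()), max(levels.values())]
    groups = [[], []]
    for _ in range(iterations):
        # Assignment step: each name goes to the nearer center.
        groups = [[], []]
        for name, value in levels.items():
            nearest = 0 if abs(value - centers[0]) <= abs(value - centers[1]) else 1
            groups[nearest].append(name)
        # Update step: move each center to the mean of its group.
        centers = [
            sum(levels[n] for n in g) / len(g) if g else c
            for g, c in zip(groups, centers)
        ]
    return groups


# Hypothetical expression levels for five genes.
expression = {
    "gene_a": 0.9,
    "gene_b": 1.1,
    "gene_c": 1.0,
    "gene_d": 5.0,
    "gene_e": 5.2,
}

low, high = kmeans_1d(expression)
print(low, high)
```

Real microarray data has one value per gene per sample rather than a single number per gene, so a practical analysis would cluster vectors; the one-dimensional case is used here only to keep the idea visible.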


Data mining - clustering


Data mining - visualization

Looking for patterns across multiple dimensions and levels of resolution, e.g.:
– scientific collaboration behavior across time and subjects
– map of power outage over time (what was the chain of events causing a major outage?)


Data mining begins at home

Your lab notebook is a database. Can you data mine your lab notebook?