CTFS Asia Region Workshop 2014 Shameema Esufali Suzanne Lao Data coordinators and technical resources for the network shameemaesufali@gmail.com laoz@si.edu

Jan 29, 2016


Meghan Cooper
CTFS Asia Region Workshop 2014

Shameema Esufali, Suzanne Lao

Data coordinators and technical resources for the network

shameemaesufali@gmail.com, laoz@si.edu

CTFS Workshop: Relational database basics

Why relational databases?

Why MySQL?

What about R?

From an input sheet to a database

What is a database?
Why do we need to convert our input sheet / Excel spreadsheet to a database?
What are the advantages and disadvantages?
How does a data entry program help?

Input form / Excel sheet

How best to store data for:
Accuracy
Easy retrieval


Relational Theory

To work with MySQL it is necessary to understand the basics of relational theory, i.e. how and why data is stored and managed in a relational database.

The guiding principle behind a relational database is to store each piece of data once and only once.

What is a Relation?

A table. Columns are fields (attributes) of data related to other fields on the same row (tuple).

Primary Key

Uniquely identifies each row of a table; no two rows may share the same key value.

Tells you what the row contains. E.g. if treeid is the primary key, then the row has information about that tree.
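The uniqueness rule can be tried out with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the `tree` table and its columns are assumptions made for illustration, not the workshop's actual schema.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical tree table: treeid is the primary key.
conn.execute("CREATE TABLE tree (treeid INTEGER PRIMARY KEY, tag TEXT, species TEXT)")
conn.execute("INSERT INTO tree VALUES (1, '1234', 'SHORME')")

# A second row with the same treeid violates the primary key.
try:
    conn.execute("INSERT INTO tree VALUES (1, '5678', 'SHORTR')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The key tells us which tree the row describes.
row = conn.execute("SELECT tag, species FROM tree WHERE treeid = 1").fetchone()
print(duplicate_rejected, row)
```

The database itself refuses the duplicate, so the rule does not depend on the care of whoever enters the data.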

Candidate Primary Key

Any attribute, or set of attributes taken together, that could serve as the primary key.

Must uniquely identify a row of data.
Each part of the key must be essential to unique identification.
No redundancy.


Foreign Key

A foreign key is a column in a table that matches the primary key column of another table. Its function is to link the basic data of two entities on demand, i.e. when two tables are joined using the common key.
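As a rough illustration of the link a foreign key provides, the sketch below again uses Python's sqlite3 module in place of MySQL. The table and column names (`genus`, `species`, `genusid`) are assumptions. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Parent table: genusid is the primary key.
conn.execute("CREATE TABLE genus (genusid INTEGER PRIMARY KEY, genus TEXT)")
# Child table: genusid is a foreign key pointing at the parent.
conn.execute(
    """CREATE TABLE species (
           speciesid INTEGER PRIMARY KEY,
           species   TEXT,
           genusid   INTEGER REFERENCES genus(genusid))"""
)

conn.execute("INSERT INTO genus VALUES (1, 'Acacia')")
conn.execute("INSERT INTO species VALUES (1, 'melanoceras', 1)")  # parent exists

# A child row pointing at a genus that does not exist is rejected.
try:
    conn.execute("INSERT INTO species VALUES (2, 'triloba', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```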

First Normal Form

One piece of information per column. No repeated rows. Eliminate fused data, e.g. Code1,Code2.

Wrong!

Tag   Species  Code
1234  SHORME   A, BA

Right

Tag   Species  Code
1234  SHORME   A
1234  SHORME   BA
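Splitting a fused column can be sketched in a few lines of Python. The function name `to_first_normal_form` is hypothetical, and the sketch assumes the codes are separated by commas, as in the example above.

```python
# One row with fused codes, as in the "Wrong!" table.
fused_rows = [("1234", "SHORME", "A, BA")]


def to_first_normal_form(rows):
    """Return one (tag, species, code) row per individual code."""
    out = []
    for tag, species, codes in rows:
        for code in codes.split(","):
            out.append((tag, species, code.strip()))
    return out


normalized = to_first_normal_form(fused_rows)
for r in normalized:
    print(r)
```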

Second Normal Form

Each column depends on the entire primary key.

Wrong

Tag   Census  Species  Seedsize  X     Y     DBH
1234  1       SHORTR   Medium    11.3  15.4  12

Right

Tag   Species  Seedsize  X     Y
1234  SHORTR   Medium    11.3  15.4

In the first table the key is Tag plus Census, but Species, Seedsize, X and Y depend on Tag alone, so they belong in a table keyed by Tag only.

Third Normal Form

Each column depends ONLY on the primary key, i.e. there are no transitive dependencies.

Wrong

Tag   Species  Seedsize  X     Y
1234  SHORTR   Medium    11.3  15.4

Right

Tag   Species  X     Y
1234  SHORTR   11.3  15.4

Seedsize depends on Species rather than directly on Tag, so it is removed from the tree table.
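The benefit of removing the transitive dependency can be sketched as follows, again with sqlite3 standing in for MySQL. The table layout, the second tree (tag 1235) and the change to 'Large' are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Seedsize is a property of the species, so it is stored once here.
    CREATE TABLE species (species TEXT PRIMARY KEY, seedsize TEXT);
    -- The tree table keeps only columns that depend on the tag.
    CREATE TABLE tree (
        tag     TEXT PRIMARY KEY,
        species TEXT REFERENCES species(species),
        x REAL,
        y REAL);
    INSERT INTO species VALUES ('SHORTR', 'Medium');
    INSERT INTO tree VALUES ('1234', 'SHORTR', 11.3, 15.4);
    INSERT INTO tree VALUES ('1235', 'SHORTR', 2.0, 7.5);  -- hypothetical second tree
    """
)

# Correcting the seed size touches one row, however many trees there are.
conn.execute("UPDATE species SET seedsize = 'Large' WHERE species = 'SHORTR'")

rows = conn.execute(
    """SELECT t.tag, s.seedsize
       FROM tree t JOIN species s ON s.species = t.species
       ORDER BY t.tag"""
).fetchall()
print(rows)
```

Because the value is stored once, two trees of the same species cannot end up with conflicting seed sizes.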

Fourth Normal Form

The table must contain no more than one multi-valued dependency.

Tag   DBH  Code
1234  10   A
1234  11   A
1234  11   BA


Entity Relationship diagram (ERD)

Shows in a diagram how entities (tables) are related to one another.

One to one
One to many
Many to many

One to one

Extension of the number of attributes in a single table.
Rarely required.

Example: Tree <-> More tree attributes


One to Many

Most common

Requires two tables.

Linked by Foreign Key

Parent -> Child

Example: Family -> Genus -> Species


Many to many

Need to break down to one to many

Requires three tables

Associative table provides common key

Diagram labels: Measurement Code, Tree Code, Measurement
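A minimal sketch of the three-table pattern is shown below, using sqlite3 in place of MySQL. The table names `measurement`, `code` and `measurement_code`, their columns and the ID values are assumptions; the DBH and code values are borrowed from the Fourth Normal Form example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE measurement (measurementid INTEGER PRIMARY KEY, tag TEXT, dbh REAL);
    CREATE TABLE code (codeid INTEGER PRIMARY KEY, code TEXT UNIQUE);
    -- Associative table: one row per (measurement, code) pair.
    CREATE TABLE measurement_code (
        measurementid INTEGER REFERENCES measurement(measurementid),
        codeid        INTEGER REFERENCES code(codeid),
        PRIMARY KEY (measurementid, codeid));
    INSERT INTO measurement VALUES (1, '1234', 10), (2, '1234', 11);
    INSERT INTO code VALUES (1, 'A'), (2, 'BA');
    INSERT INTO measurement_code VALUES (1, 1), (2, 1), (2, 2);
    """
)

# Two one-to-many joins reassemble the many-to-many relationship.
rows = conn.execute(
    """SELECT m.dbh, c.code
       FROM measurement m
       JOIN measurement_code mc ON mc.measurementid = m.measurementid
       JOIN code c ON c.codeid = mc.codeid
       ORDER BY m.dbh, c.code"""
).fetchall()
print(rows)
```

Each measurement can carry several codes and each code can apply to several measurements, yet every code and every measurement is stored only once.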


Reassembling data

Data was broken down into tables to preserve integrity

How can we put it together to derive information?

Use Structured Query Language (SQL) to JOIN tables using a common attribute

Joins

Two tables may be joined when they share at least one common attribute.

The Primary Key of the Parent table is stored in the Child table as a cross reference. This is called a Foreign Key.

Genus (Parent; GenusID is the Primary Key)

GenusID  Genus      FamilyID
1        Acacia     4
2        Acalypha   3
3        Adelia     3
4        Aegiphila  3
5        Alchornea  3

Species (Child; GenusID is the Foreign Key)

SpeciesID  Species        GenusID
1          melanoceras    1
2          diversifolia   2
3          macrostachya   2
4          triloba        3
5          panamensis     4
6          costaricensis  5
7          latifolia      5

Table joined on Foreign Key GenusID

SpeciesID  Species        GenusID  ⇿  GenusID  Genus      FamilyID
1          melanoceras    1        ⇿  1        Acacia     4
2          diversifolia   2        ⇿  2        Acalypha   3
3          macrostachya   2        ⇿  2        Acalypha   3
4          triloba        3        ⇿  3        Adelia     3
5          panamensis     4        ⇿  4        Aegiphila  3
6          costaricensis  5        ⇿  5        Alchornea  3
7          latifolia      5        ⇿  5        Alchornea  3

The GenusID in the Species table is used to pick up information for the corresponding Genus. The join looks for the Genus row with the matching Primary Key.
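The join described here can be reproduced with a short sketch. It loads the Genus and Species rows from the slides into an in-memory SQLite database (used as a stand-in for MySQL); the lowercase table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genus (genusid INTEGER PRIMARY KEY, genus TEXT, familyid INTEGER)")
conn.execute(
    "CREATE TABLE species (speciesid INTEGER PRIMARY KEY, species TEXT, "
    "genusid INTEGER REFERENCES genus(genusid))"
)

# Rows copied from the Genus and Species tables in the slides.
conn.executemany(
    "INSERT INTO genus VALUES (?, ?, ?)",
    [(1, "Acacia", 4), (2, "Acalypha", 3), (3, "Adelia", 3),
     (4, "Aegiphila", 3), (5, "Alchornea", 3)],
)
conn.executemany(
    "INSERT INTO species VALUES (?, ?, ?)",
    [(1, "melanoceras", 1), (2, "diversifolia", 2), (3, "macrostachya", 2),
     (4, "triloba", 3), (5, "panamensis", 4), (6, "costaricensis", 5),
     (7, "latifolia", 5)],
)

# Each species row picks up the genus row whose primary key matches its foreign key.
rows = conn.execute(
    """SELECT s.speciesid, s.species, g.genus, g.familyid
       FROM species s
       JOIN genus g ON g.genusid = s.genusid
       ORDER BY s.speciesid"""
).fetchall()
for r in rows:
    print(r)
```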


Extend to join many tables

With SQL you can join as many tables as you need in order to get the set of information you want. The previous example can be extended to include Family, which is a parent table of Genus, and/or extended in the other direction to include Tree, which is a child of Species, as long as there is a linking attribute. This attribute is called a Foreign Key.
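Extending the join to a third table might look like the sketch below (sqlite3 standing in for MySQL). The slides give only FamilyID values, so the family names 'Family 3' and 'Family 4' are placeholders, and the `family` table itself is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE family (familyid INTEGER PRIMARY KEY, family TEXT);
    CREATE TABLE genus (genusid INTEGER PRIMARY KEY, genus TEXT,
                        familyid INTEGER REFERENCES family(familyid));
    CREATE TABLE species (speciesid INTEGER PRIMARY KEY, species TEXT,
                          genusid INTEGER REFERENCES genus(genusid));
    -- Placeholder family names: the slides do not name the families.
    INSERT INTO family VALUES (3, 'Family 3'), (4, 'Family 4');
    INSERT INTO genus VALUES (1, 'Acacia', 4), (2, 'Acalypha', 3), (3, 'Adelia', 3),
                             (4, 'Aegiphila', 3), (5, 'Alchornea', 3);
    INSERT INTO species VALUES (1, 'melanoceras', 1), (2, 'diversifolia', 2),
                               (3, 'macrostachya', 2), (4, 'triloba', 3),
                               (5, 'panamensis', 4), (6, 'costaricensis', 5),
                               (7, 'latifolia', 5);
    """
)

# Species -> Genus -> Family: each JOIN follows one foreign key.
counts = conn.execute(
    """SELECT f.family, COUNT(*) AS n_species
       FROM species s
       JOIN genus g  ON g.genusid  = s.genusid
       JOIN family f ON f.familyid = g.familyid
       GROUP BY f.familyid, f.family
       ORDER BY f.familyid"""
).fetchall()
print(counts)
```

A Tree table could be added in the same way, with one more JOIN on its species foreign key.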
