Environmental Data Analysis with MatLab Lecture 2: Looking at Data


Dec 14, 2015



Roy Endicott
Transcript
Page 1

Environmental Data Analysis with MatLab

Lecture 2: Looking at Data

Page 2

SYLLABUS

Lecture 01 Using MatLab
Lecture 02 Looking at Data
Lecture 03 Probability and Measurement Error
Lecture 04 Multivariate Distributions
Lecture 05 Linear Models
Lecture 06 The Principle of Least Squares
Lecture 07 Prior Information
Lecture 08 Solving Generalized Least Squares Problems
Lecture 09 Fourier Series
Lecture 10 Complex Fourier Series
Lecture 11 Lessons Learned from the Fourier Transform
Lecture 12 Power Spectra
Lecture 13 Filter Theory
Lecture 14 Applications of Filters
Lecture 15 Factor Analysis
Lecture 16 Orthogonal Functions
Lecture 17 Covariance and Autocorrelation
Lecture 18 Cross-correlation
Lecture 19 Smoothing, Correlation and Spectra
Lecture 20 Coherence; Tapering and Spectral Analysis
Lecture 21 Interpolation
Lecture 22 Hypothesis Testing
Lecture 23 Hypothesis Testing continued; F-Tests
Lecture 24 Confidence Limits of Spectra, Bootstraps

Page 3

purpose of the lecture

get you started

looking critically at data

Page 4

Objectives when taking a first look at data

Understand the general character of the dataset.

Understand the general behavior of individual parameters.

Detect obvious problems with the data.

Page 5

Tools for Looking at Data covered in this lecture

reality checks

time plots

histograms

rate information

scatter plots

Page 6

Black Rock Forest Temperature

I downloaded the weather station data from the International Research Institute (IRI) for Climate and Society at Lamont-Doherty Earth Observatory, which is the data center used by the Black Rock Forest Consortium for its environmental data. About 20 parameters were available, but I downloaded only hourly averages of temperature. My original file, brf_raw.txt, had time in a format that I thought would be hard to work with, so I wrote a MatLab script, brf_convert.m, that converted it into time in days and wrote the results into the file that I gave you.

Page 7

format conversion

calendar date/time → days from start of first year of data

0100-0159 2 Jan 1997 → 1.042

a sequential time variable is needed for data analysis, but

format conversions provide an opportunity for error to creep into the dataset
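Because format conversions are where errors creep in, it helps to make the conversion explicit and easy to check. The sketch below is in Python (standard library only) rather than the MatLab of brf_convert.m, whose code is not shown. The helper name days_since_start_of_year is hypothetical, and labelling each hourly block by its start time (01:00 for 0100-0159) is an assumption that happens to reproduce the 1.042 in the example.

```python
from datetime import datetime

def days_since_start_of_year(stamp: datetime, first_year: int) -> float:
    """Days elapsed since 00:00 on 1 January of first_year (hypothetical helper)."""
    origin = datetime(first_year, 1, 1)
    return (stamp - origin).total_seconds() / 86400.0

# Assumption: the hourly block "0100-0159 2 Jan 1997" is labelled by its start, 01:00.
t = days_since_start_of_year(datetime(1997, 1, 2, 1, 0), first_year=1997)
print(round(t, 3))  # 1.042
```

Working through datetime arithmetic rather than hand-counting days per month means leap years are handled automatically, which is one of the easier places for a home-made conversion to go wrong.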

Page 8

Reality Checks

properties that your experience tells you the data must have

check your expectations against the data

Page 9

Reality Checks: What do you expect the data to look like?

hourly measurements

thirteen years of data

location in New York (moderate climate)

Page 10

take a moment ...

to sketch a plot of what you expect the data to look like

Page 11

Reality Checks: What do you expect the data to look like?

hourly measurements

thirteen years of data

location in New York (moderate climate)

time increments by 1/24 day per sample

about 24*365*13 = 113880 lines of data

temperatures in the -20 to +35 deg C range

diurnal and seasonal cycles

Page 12

Does time increment by 1/24 day per sample?

D(1:5,:)

time, days   temperature, deg C
0            17.2700
0.0417       17.8500
0.0833       18.4200
0.1250       18.9400
0.1667       19.2900

1/24 = 0.0417

Yes

Page 13

Are there about 24*365*13 = 113880 lines of data?

length(D)

110430

Yes
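The first two reality checks can be automated so they are rerun every time the file is regenerated. This hypothetical Python helper, check_sampling, assumes the time column is available as a list of values in days. It returns the row count and every index where the step to the next sample is not 1/24 day within a tolerance; the 1e-3 day tolerance is an arbitrary choice that accommodates the four-decimal rounding in the printout above.

```python
def check_sampling(times, expected_step=1.0 / 24.0, tol=1e-3):
    """Reality check on a time column (days).

    Returns (number of rows, indices i where times[i+1] - times[i]
    differs from expected_step by more than tol).
    """
    irregular = [i for i in range(len(times) - 1)
                 if abs(times[i + 1] - times[i] - expected_step) > tol]
    return len(times), irregular

expected_rows = 24 * 365 * 13      # hourly samples over thirteen years: 113880

# First five times from D(1:5,:), as printed (rounded to four decimals).
rows, irregular = check_sampling([0.0, 0.0417, 0.0833, 0.1250, 0.1667])
print(rows, irregular)             # 5 []
```

Applied to the full column, the count of 110430 against the expected 113880 is about 97%: 3450 fewer rows, or roughly 144 days of hourly samples (the estimate also ignores leap days). The indices in irregular show where those rows went missing.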

Page 14

temperatures in the -20 to +35 deg C range?

diurnal and seasonal cycles?

Page 15

[Figure: Black Rock Forest temperature; annotations: annual cycle, cold spikes, hot spike, data drop-outs, -20 to +35 range]

Temperatures in the -20 to +35 deg C range? Mostly.

Diurnal and seasonal cycles? Certainly seasonal.
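Spikes outside the expected range are easier to examine when they are listed rather than only seen on a plot. A minimal sketch, using a hypothetical helper out_of_range whose defaults are the -20 and +35 deg C limits from the reality check; it only flags values and leaves the decision about what to do with them to you.

```python
def out_of_range(temps, low=-20.0, high=35.0):
    """Indices of temperatures (deg C) outside [low, high]; the limits themselves pass."""
    return [i for i, x in enumerate(temps) if x < low or x > high]

# Hypothetical values, for illustration only.
print(out_of_range([17.27, -45.0, 18.42, 38.5]))   # [1, 3]
```

Flagging rather than deleting matters here because a cold spike might be a real cold snap or an instrument fault, and only a closer look at the surrounding samples can tell them apart.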

Page 16

Data drop-outs are common in datasets

the instrument wasn’t working for a while …

They take two forms:

missing rows of the table

data set to some default value; 0, n/a and -999 are all common
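Both forms of drop-out can be searched for directly. The hypothetical helper find_dropouts below assumes times in days with a nominal 1/24-day step and values that arrive as raw text tokens. Only -999 is treated as a numeric sentinel by default, because 0 deg C is a legitimate temperature; add 0 to sentinels only for parameters where it cannot occur.

```python
import math

def find_dropouts(times, raw_values, step=1.0 / 24.0, tol=1e-3, sentinels=(-999.0,)):
    """Locate both forms of drop-out.

    gaps:    indices i where the jump to the next time exceeds one sample
             (rows missing from the table)
    flagged: indices whose raw text is not a number (e.g. 'n/a'), is NaN,
             or equals one of the sentinel default values
    """
    gaps = [i for i in range(len(times) - 1)
            if times[i + 1] - times[i] > step + tol]
    flagged = []
    for i, token in enumerate(raw_values):
        try:
            x = float(token)
        except ValueError:           # text such as 'n/a'
            flagged.append(i)
            continue
        if math.isnan(x) or x in sentinels:
            flagged.append(i)
    return gaps, flagged

# Hypothetical rows: one 'n/a', one -999, and two missing hours after the third row.
times = [0.0, 1 / 24, 2 / 24, 5 / 24, 6 / 24]
raw = ["17.27", "n/a", "-999", "18.94", "19.29"]
print(find_dropouts(times, raw))     # ([2], [1, 2])
```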

Page 17

[Figure: 50 days of data from winter and 50 days of data from summer; annotations: cold spike, diurnal cycle, data drop-out]

Page 18

Histograms

determine range of the majority of data values

quantifies the frequency of occurrence of data at different data values

easy to spot over-represented and under-represented values

Page 19

MatLab code for Histogram

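Lh = 100;                                   % number of bins
dmin = min(d);                              % smallest value
dmax = max(d);                              % largest value
bins = dmin+(dmax-dmin)*[0:Lh-1]'/(Lh-1);   % Lh equally spaced bin centers
dhist = hist(d, bins)';                     % counts, as a column vector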
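For readers working outside MatLab, the same binning can be sketched in Python with the standard bisect module. As in the MatLab lines, the Lh bin centers run from min(d) to max(d), and each sample goes to its nearest center, with the outermost bins collecting everything beyond them; this matches how hist uses a vector of bin centers. Ties exactly on a bin edge go to the upper bin here, and MatLab's tie handling may differ. The function name histogram is hypothetical.

```python
import bisect

def histogram(d, Lh=100):
    """Counts in Lh bins whose centers are equally spaced from min(d) to max(d).

    Returns (bins, counts); edges lie halfway between neighbouring centers,
    so each sample is counted at its nearest center.
    """
    if Lh < 2:
        raise ValueError("Lh must be at least 2")
    dmin, dmax = min(d), max(d)
    bins = [dmin + (dmax - dmin) * k / (Lh - 1) for k in range(Lh)]
    edges = [(a + b) / 2 for a, b in zip(bins, bins[1:])]
    counts = [0] * Lh
    for x in d:
        counts[bisect.bisect_right(edges, x)] += 1
    return bins, counts

bins, counts = histogram([0, 1, 1, 2, 2, 2, 3, 4], Lh=5)
print(counts)   # [1, 2, 3, 1, 1]
```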

Page 20

[Figure: Histogram of Black Rock Forest temperatures; axes: temperature, ºC and counts]

Page 21

[Figure, panels A) and B): Alternate ways of displaying a histogram; axes: temperature, ºC and counts]

Page 22

Moving-Window Histograms

Series of histograms, each on a relatively short time interval of data

Advantage: Shows the way that the frequency of occurrence of data varies with time

Disadvantage: Each histogram is computed using less data, and so is less accurate

Page 23

[Figure: Moving-Window Histogram of Black Rock Forest temperatures; axes: time, days (0 to 5000) and temperature, C (-60 to 40)]

Page 24

good use of FOR loop

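N = length(d);                    % number of samples
offset = 1000;                    % samples per window
Lw = floor(N/offset)-1;           % number of windows
Dhist = zeros(Lh, Lw);            % one histogram per column
for i = [1:Lw]
    j = 1+(i-1)*offset;           % first sample of window i
    k = j+offset-1;               % last sample of window i
    Dhist(:,i) = hist(d(j:k), bins)';
end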
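The moving-window loop translates directly. In this hypothetical Python version, Dhist is a list whose i-th entry plays the role of the MatLab column Dhist(:,i). Windows are non-overlapping runs of offset samples, and, as in the MatLab code, Lw = floor(N/offset) - 1 windows are computed, so the last full window (and any partial remainder) is left out. The bin centers are passed in, as bins is in the MatLab version.

```python
import bisect

def moving_window_histogram(d, bins, offset=1000):
    """One histogram per non-overlapping window of `offset` samples.

    Dhist[i] holds the counts for window i, at the given bin centers.
    """
    N = len(d)
    Lw = N // offset - 1                        # floor(N/offset) - 1, as in MatLab
    edges = [(a + b) / 2 for a, b in zip(bins, bins[1:])]
    Dhist = []
    for i in range(Lw):
        j = i * offset                          # 0-based first sample of window i
        counts = [0] * len(bins)
        for x in d[j:j + offset]:
            counts[bisect.bisect_right(edges, x)] += 1
        Dhist.append(counts)
    return Dhist

d = [0] * 3 + [1] * 3 + [2] * 3 + [9] * 3
print(moving_window_histogram(d, bins=[0, 1, 2], offset=3))
# [[3, 0, 0], [0, 3, 0], [0, 0, 3]]  (the fourth window is dropped by the -1)
```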

Page 25

Rate Information

how fast a parameter is changing with time

or with distance

Page 26

finite-difference approximation to derivative
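The equation on this slide did not survive extraction. Written to agree with the MatLab dddt line for the derivative, which divides d(2:N)-d(1:N-1) by t(2:N)-t(1:N-1), the approximation is presumably the forward difference:

```latex
\left.\frac{\mathrm{d}d}{\mathrm{d}t}\right|_{t_i} \approx \frac{d_{i+1} - d_i}{t_{i+1} - t_i}, \qquad i = 1, \dots, N-1
```

There are N-1 such differences for N samples, one per interval between consecutive samples.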

Page 27

Page 28

MatLab code for derivative

N = length(d);                                   % number of samples
dddt = (d(2:N)-d(1:N-1))./(t(2:N)-t(1:N-1));     % N-1 forward differences, one per interval
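The same forward difference in Python, as a hypothetical helper forward_difference. Like dddt in the MatLab line, it returns N-1 values, so rate i belongs to the interval between samples i and i+1 rather than to either sample.

```python
def forward_difference(t, d):
    """Finite-difference estimate of dd/dt: N-1 values, one per interval."""
    if len(t) != len(d):
        raise ValueError("t and d must have the same length")
    return [(d[i + 1] - d[i]) / (t[i + 1] - t[i]) for i in range(len(d) - 1)]

# Uneven time steps are handled because each difference uses its own interval.
print(forward_difference([0, 1, 2, 4], [0, 2, 6, 6]))   # [2.0, 4.0, 0.0]
```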

Page 29

[Figure: hypothetical storm event; panels: discharge, cfs vs. time, days, and d/dt discharge, cfs / day vs. time, days; annotations: rain, draining of land]

note that dd/dt is negative for more of the time than it is positive

Page 30

Page 31

Hypothesis: rate of change in discharge correlates with amount of discharge

logic

a river is bigger when it has high discharge

a big river flows faster than a small river

a river that flows faster drains away water faster (might only be true after the rain has stopped)

Page 32

MatLab Script

purpose: make two separate plots, one for times of increasing discharge, one for times of decreasing discharge

pos = find(dddt>0);   % intervals where discharge is increasing
neg = find(dddt<0);   % intervals where discharge is decreasing
- - -
plot(d(pos),dddt(pos),'k.');
- - -
plot(d(neg),dddt(neg),'k.');
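Splitting the points by the sign of the rate can be sketched the same way. The hypothetical split_by_sign pairs d[i] with dddt[i], which is what d(pos) and dddt(pos) do in the script: every index found in dddt is also a valid index into the longer d, and it picks the discharge at the start of that interval. Rates of exactly zero fall into neither group, as with find(dddt>0) and find(dddt<0). Plotting is left out; the function only returns the (discharge, rate) pairs for each group.

```python
def split_by_sign(d, dddt):
    """Return (pos, neg): (discharge, rate) pairs for rising and falling intervals."""
    pos = [(d[i], r) for i, r in enumerate(dddt) if r > 0]
    neg = [(d[i], r) for i, r in enumerate(dddt) if r < 0]
    return pos, neg

# Hypothetical discharge with unit time steps, so dddt is just successive differences.
d = [1, 3, 4, 4, 2]
dddt = [2, 1, 0, -2]
print(split_by_sign(d, dddt))   # ([(1, 2), (3, 1)], [(4, -2)])
```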

Page 33

Page 34

Atlantic Rock Dataset

I downloaded rock chemistry data from PetDB’s website at www.petdb.org. Their database contains chemical information about ocean floor igneous and metamorphic rocks. I extracted all samples from the Atlantic Ocean that had the following chemical species: SiO2, TiO2, Al2O3, FeOtotal, MgO, CaO, Na2O and K2O. My original file, rocks_raw.txt, included a description of the rock samples, their geographic location and other textual information. However, I deleted everything except the chemical data, saving the result in the file rocks.txt, so it would be easy to read into MatLab. The order of the columns is as given above, and the units are weight percent.
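Because rocks.txt holds only numbers, reading it needs no text handling. This Python sketch assumes whitespace-separated rows with one value per species, in the column order listed above, and no header line; the exact file layout is not described, so treat that as an assumption. The names read_rocks and SPECIES are hypothetical.

```python
SPECIES = ["SiO2", "TiO2", "Al2O3", "FeOtotal", "MgO", "CaO", "Na2O", "K2O"]

def read_rocks(text):
    """Parse rows of eight numbers (weight percent) into one list per species."""
    columns = {name: [] for name in SPECIES}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                          # skip blank lines
        if len(fields) != len(SPECIES):
            raise ValueError(f"expected {len(SPECIES)} values, got {len(fields)}: {line!r}")
        for name, field in zip(SPECIES, fields):
            columns[name].append(float(field))
    return columns

# Made-up values, for illustration only.
sample = "50.1 1.2 15.3 9.8 7.5 11.2 2.6 0.2\n49.0 1.5 14.9 10.4 8.1 11.8 2.4 0.1\n"
rocks = read_rocks(sample)
print(rocks["SiO2"], rocks["K2O"])   # [50.1, 49.0] [0.2, 0.1]
```

For the real file, something like `with open("rocks.txt") as f: rocks = read_rocks(f.read())` would apply; the row-length check turns a stray text line left over from rocks_raw.txt into an immediate error instead of a silently shifted column.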

Page 35

Using scatter plots to look for correlations among pairs of the eight chemical species: 8! / [2! (8-2)!] = 28 plots
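The count of 28 is the number of unordered pairs drawn from eight items, C(8, 2). Enumerating the pairs explicitly gives the checklist of scatter plots to make. This sketch uses itertools.combinations and math.comb (Python 3.8 or later); the list name SPECIES is hypothetical.

```python
from itertools import combinations
from math import comb

SPECIES = ["SiO2", "TiO2", "Al2O3", "FeOtotal", "MgO", "CaO", "Na2O", "K2O"]

# Every unordered pair of species, each once: one scatter plot per pair.
pairs = list(combinations(SPECIES, 2))

print(len(pairs), comb(8, 2))   # 28 28
print(pairs[0])                 # ('SiO2', 'TiO2')
```

Plotting y against x carries the same information as x against y, which is why ordered pairs (8 × 7 = 56) are not needed.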

Page 36

[Figure, panels A) to D): four interesting scatter plots; axis labels include Al2O3, TiO2, SiO2, K2O, FeO and MgO]