Environmental Data Analysis with MatLab
Dec 18, 2015
Goals
Make you comfortable with the analysis of numerical data through practice
Teach you a set of widely-applicable data analysis techniques
Provide the strategies for applying what you’ve learned to your own datasets
software
MatLab
available on-line
SYLLABUS
Lecture 01 Using MatLab
Lecture 02 Looking At Data
Lecture 03 Probability and Measurement Error
Lecture 04 Multivariate Distributions
Lecture 05 Linear Models
Lecture 06 The Principle of Least Squares
Lecture 07 Prior Information
Lecture 08 Solving Generalized Least Squares Problems
Lecture 09 Fourier Series
Lecture 10 Complex Fourier Series
Lecture 11 Lessons Learned from the Fourier Transform
Lecture 12 Power Spectra
Lecture 13 Filter Theory
Lecture 14 Applications of Filters
Lecture 15 Factor Analysis
Lecture 16 Orthogonal functions
Lecture 17 Covariance and Autocorrelation
Lecture 18 Cross-correlation
Lecture 19 Smoothing, Correlation and Spectra
Lecture 20 Coherence; Tapering and Spectral Analysis
Lecture 21 Interpolation
Lecture 22 Hypothesis testing
Lecture 23 Hypothesis Testing continued; F-Tests
Lecture 24 Confidence Limits of Spectra, Bootstraps
Today’s Lecture
Part 1: Starting to Look at Data
Part 2: Using MatLab
Part 1
Starting to Look at Data
advice
even before looking at the data, articulate the properties that you expect them to have
and then critically examine them in light of your expectations
Case Study: Stream flow of the Hudson River
Albany
Watershed: 14,000 sq mi (36,260 km2)
source: Wikipedia
Hudson River
discharge
amount of water per unit time
that passes a specific point on the river bank
measured in m3/s
What properties would you expect discharge to have?
water flows in one direction – down hill
discharge positive
stream flow fairly steady over minutes to hours
more variable over days and weeks
stream flow increases after a period of rain
What about the typical size of discharge?
a slab of water of volume w×h×v flows by per unit time (w: width, h: depth, v: flow speed)
h = 10 m, w = 100 m, v = 1 m/s
discharge = w×h×v = 1000 m3/s
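The estimate above is simple enough to check at the command line. This is only a sketch: reading w as the channel width and h as the water depth is an assumption, and the three numbers are the round values from the slide, not measurements.

```matlab
% Back-of-the-envelope discharge estimate, using the round numbers on the slide.
% Interpreting w as width and h as depth is an assumption.
w = 100;            % channel width, in m
h = 10;             % water depth, in m
v = 1;              % flow speed, in m/s
discharge = w*h*v   % volume of water passing per unit time, in m^3/s
```

Leaving the semicolon off the last line makes MatLab display the result.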
What might a plot of discharge vs. time look like?
Try to sketch one.
Include units on both axes
Imagine that there are a few days of rain during the time period of your sketch
actual discharge for Hudson River at Albany
(time in days after Jan 1, 2002)
What properties would you expect precipitation to have?
precipitation is a positive quantity
time scale of rain very short – minutes to hours
rainy days and dry days
heavy rain is a few inches of rain in a day
actual precipitation in Albany NY
do the graphs meet your expectations?
pattern of peaks similar but not exact
highest discharge
highest precipitation
Why? Rain at Albany is an imperfect proxy for rain over the watershed
shape of peaks different
longer pulse with steep rise and slow decline
short pulse
Why? Rain takes time to drain from the land
predict discharge from precipitation
rain takes time to flow from the land to the river
the discharge today depends upon precipitation over the last few days
now for an example of an advanced data analysis method
(which we will eventually get to in this course) that helps explore this relationship
its mathematical expression: discharge d is a running average of precipitation p
physical idea: discharge is delayed since rainwater takes time to flow from the land to the river
$d_i = \sum_{j} w_j \, p_{i-j+1}$
d_i: discharge on day i (today)
p_{i-j+1}: precipitation on the present day (j = 1) and on past days (j > 1)
w_j: weights in the running average
example: $d_5 = w_1 p_5 + w_2 p_4 + w_3 p_3 + \dots$
so the details behind the idea that rainwater takes time to drain from the land
are captured by the weights
w1 w2 w3 w4 ...
only recent precipitation affects discharge
weights decline exponentially with time in the past
weights determined by trial and error
[Figure: weights plotted against time j in the past; annotations c, T1 and T2]
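A minimal sketch of the running-average idea follows. The precipitation values, the number of weights M, and the exponential form w_j = c·exp(-(j-1)/T) with its constants c and T are all assumptions chosen for illustration; the slides say only that the weights decline exponentially and were found by trial and error.

```matlab
% Sketch: discharge as a running average of present and past precipitation.
% All numbers are made-up illustrative values, not the Hudson River data.
p = [0, 0, 2, 5, 1, 0, 0, 0, 0, 0]';   % hypothetical daily precipitation
N = length(p);

M = 5;                      % assumed number of days with non-negligible weight
c = 1;                      % assumed overall scale of the weights
T = 2;                      % assumed decay time, in days
w = c*exp(-((1:M)'-1)/T);   % w(1) multiplies today's rain, w(2) yesterday's, ...

% d(i) = w(1)*p(i) + w(2)*p(i-1) + ... , using only days that are in the record
d = zeros(N,1);
for i = 1:N
    for j = 1:min(i,M)
        d(i) = d(i) + w(j)*p(i-j+1);
    end
end

% the same running average computed by convolution, truncated to N days
dc = conv(p, w);
dc = dc(1:N);
```

A plot of p and d against day number shows the short rain pulse turning into a longer discharge pulse with a slow decline, which is the qualitative behavior described above.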
Part 2: Using MatLab
purpose of the lecture
get you started using
MatLab
as a tool for analyzing data
MatLab Fundamentals
The Command Window
Place where you type commands and MatLab displays answers and other information. For example:
>> date
ans =
22-Mar-2011
(>> is the prompt; you type date, and MatLab replies with the two lines that follow)
Files and Folders
Provide a way to organize your data and data analysis products
- use meaningful and predictable names
- design a folder hierarchy that helps you keep track of things
Example: file/folder structure used by text
eda (main folder)
    ch01, ch02, ch03, . . . (chapter folders)
        file, file, . . . (chapter files)
        sec02_01, . . . (section folders)
            file, file, file (section files)
Commands for Navigating Folders
pwd                          displays current folder
cd c:/menke/docs/eda/ch01    change to a folder in a specific place
cd ..                        change to the parent folder
cd ch01                      change to the named folder that is within the current one
dir                          display all the files and folders in the current folder
Simple Arithmetic
a=3.5; b=4.1; c=a+b; c
c =
7.6000
you type this
MatLab replies with this
A more complicated formula
a=3; b=4; c = sqrt(a^2 + b^2); c
c =
5
Another complicated formula
n=2; x=3; x0=1; L=5; c = sin(n*pi*(x-x0)/L); c
c =
0.5878
MatLab Script
Commands stored in a file with an extension ‘.m’ (an m-file) that can be run as a unit.
Advantages:
- Speeds up repetitive tasks
- Can be checked over for correctness
- Documents what you did
Disadvantages:
- Hides what’s actually going on.
Example of a MatLab Script
in m-file eda01_03.m (lines beginning with % are comments)
% eda01_03
% example of simple algebra,
% c=a+b with a=3.5 and b=4.1
a=3.5;
b=4.1;
c=a+b;
c
Running a MatLab Script
>> eda01_03
c =
7.6000
(you type the script name; MatLab replies with the value of c)
Vectors and Matrices
r = [2, 4, 6];                                  % row vector
c = [1, 3, 5]';                                 % column vector
M = [ [1, 4, 7]', [2, 5, 8]', [3, 6, 9]' ];     % 3-by-3 matrix built from its columns
Transpose Operator
Swap rows and columns of an array, so that the column vector [ 1; 2; 3; 4 ] becomes the row vector [ 1, 2, 3, 4 ] (and vice versa)
Standard mathematical notation: a^T
MatLab notation: a'
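A short sketch of the transpose operator applied to the four-element vector used above. The variable names are chosen only for illustration.

```matlab
a = [1; 2; 3; 4];     % 4-by-1 column vector
r = a';               % transpose: 1-by-4 row vector [1, 2, 3, 4]
sa = size(a);         % [4 1]
sr = size(r);         % [1 4]
b = r';               % transposing again recovers the column vector
same = isequal(a, b); % true
```

For real-valued arrays this is all there is to it. For complex arrays, a' also takes the complex conjugate, while a.' transposes without conjugating.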
Vector Multiplication
Let’s define some vectors and matrices
a = [1, 3, 5]';
b = [3, 4, 5]';
M = [ [1, 0, 2]', [0, 1, 0]', [2, 0, 1]' ];
N = [ [1, 0,-1]', [0, 2, 0]', [-1,0, 3]' ];
Inner (or Dot) Product
s = a'*b;
Outer (or Tensor) Product
T = a*b';
Product of a Matrix and a Vector
c = M*a;
Product of a Matrix and a Matrix
P = M*N;
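The four products can be tried together in one short script. In this sketch the second vector is named b and given the values 3, 4, 5, which is an assumption; the matrices M and N are the ones defined on the slide.

```matlab
a = [1, 3, 5]';                               % column vector
b = [3, 4, 5]';                               % column vector (name b is assumed)
M = [ [1, 0, 2]', [0, 1, 0]', [2, 0, 1]' ];   % 3-by-3 matrix
N = [ [1, 0,-1]', [0, 2, 0]', [-1,0, 3]' ];   % 3-by-3 matrix

s = a'*b;    % inner product: (1-by-3)*(3-by-1) gives a scalar
T = a*b';    % outer product: (3-by-1)*(1-by-3) gives a 3-by-3 matrix
c = M*a;     % matrix times vector gives a 3-by-1 vector
P = M*N;     % matrix times matrix gives a 3-by-3 matrix
```

The sizes are the thing to watch: the inner dimensions of the two factors must agree, and the result takes the outer dimensions.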
Element Access
s = a(2); t = M(2,3); b = M(:,2);
c = M(2,:)'; T = M(2:3,2:3);
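To see what each access returns, here is a sketch that assumes a is the column vector [1, 3, 5]' and M is the 3-by-3 matrix holding 1 through 9 row by row (the slide does not say which a and M it intends).

```matlab
a = [1, 3, 5]';
M = [1 2 3; 4 5 6; 7 8 9];

s = a(2);          % second element of a: 3
t = M(2,3);        % row 2, column 3 of M: 6
b = M(:,2);        % all rows of column 2, a column vector: [2; 5; 8]
c = M(2,:)';       % all columns of row 2, transposed to a column: [4; 5; 6]
T = M(2:3,2:3);    % lower-right 2-by-2 block: [5 6; 8 9]
```

The colon on its own means "every index along this dimension", and m:n means the range from m to n.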
Looping
A loop is a mechanism to repeat a group of commands several times, each time with a different value of a variable.
Generally speaking, MatLab vector arithmetic is rich enough that loops usually can be avoided.
However, some people – and especially beginners - find loops to be clearer than the equivalent non-loop commands. If you’re one of them, USE LOOPS, at least at the start.
Example of a FOR Loop
a=[1, 2, 3, 4, 3, 2, 1]';
b=[3, 2, 1, 0, 1, 2, 3]';
N=length(a);
Dot product using vector arithmetic
c = a'*b;
Dot product using loop
c = 0;
for i = [1:N]
    c = c+a(i)*b(i);
end
Another Example of a FOR Loop
without looping
N = fliplr(M);
with looping
for i = [1:3]
    for j = [1:3]
        N(i,4-j) = M(i,j);
    end
end
Matrix Inverse
B = inv(A);
Slash and Backslash Operators
c = A\b; D = B/A;
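A small sketch of what these operators compute. The matrices A and C and the vector b hold made-up values for illustration; C is used in place of the slide's B so that it is not confused with the inverse.

```matlab
A = [2 1; 1 3];     % hypothetical invertible matrix
b = [3; 5];         % hypothetical right-hand side
C = [1 2; 3 4];     % hypothetical second matrix

B  = inv(A);        % explicit inverse, so that A*B is the identity
x1 = B*b;           % solves A*x = b by way of the inverse
x2 = A\b;           % backslash solves A*x = b directly
D1 = C*B;           % C times the inverse of A
D2 = C/A;           % slash solves D*A = C directly
```

Up to rounding error, x1 matches x2 and D1 matches D2. The slash and backslash forms are generally preferred when the inverse itself is not needed.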
Loading Data Files
I downloaded stream flow data from the US Geological Survey’s National Water Information Center for the Neuse River near Goldsboro, NC, for the time period 01/01/1974 to 12/31/1985. These data are in the file neuse.txt. It contains two columns of data: time (in days starting on January 1, 1974) and discharge (in cubic feet per second, cfs). The data set contains 4383 rows of data. I also saved information about the data in the file neuse_header.txt.
A text file of tabular data is very easy to load into MatLab
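D = load('neuse.txt');   % D has one row per day and two columns
t = D(:,1);              % first column: time in days
d = D(:,2);              % second column: discharge in cfs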
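For readers without neuse.txt at hand, the same loading pattern can be tried on a tiny file created on the spot. The file name and the three rows of numbers below are made up for illustration.

```matlab
% write a tiny two-column, tab-separated text file so there is something to load
fname = fullfile(tempdir, 'example_flow.txt');   % hypothetical file name
fid = fopen(fname, 'w');
fprintf(fid, '%d\t%f\n', [1 2 3; 120.5 98.0 143.2]);   % one (time, value) pair per line
fclose(fid);

D = load(fname);     % D is a 3-by-2 matrix
t = D(:,1);          % first column: time
d = D(:,2);          % second column: discharge
nrows = size(D,1);   % number of rows read

delete(fname);       % clean up the temporary file
```

Checking size(D) right after loading is a quick way to confirm that the file had the number of rows and columns you expected.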
A Simple Plot of Data
plot(t,d);
A Somewhat Better Controlled Plot
plot(t,d,'k-','LineWidth',2);      % plot black lines of width 2
set(gca,'LineWidth',2);            % make the axes thicker (after plot, which otherwise resets axes properties)
title('Neuse River Hydrograph');   % title at top of figure
xlabel('time in days');            % label x axis
ylabel('discharge in cfs');        % label y axis
Writing a Data File
example: convert discharge from cfs to m3/s
f=35.3146;                           % cubic feet per cubic meter
dm = d/f;                            % discharge in m3/s
Dm(:,1)=t;
Dm(:,2)=dm;
dlmwrite('neuse_metric.txt',Dm,'\t');   % tab-separated output file
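A self-contained sketch of the convert-and-write step, followed by reading the file back as a check. The data values and the file name are made up; the 'precision' option is set explicitly here because the default number of digits that dlmwrite writes may be too few for some data. Recent MatLab releases recommend writematrix in place of dlmwrite, though dlmwrite is what the slide uses.

```matlab
f  = 35.3146;                       % cubic feet per cubic meter
t  = [1; 2; 3];                     % made-up times, in days
d  = [353.146; 706.292; 35.3146];   % made-up discharge, in cfs
dm = d/f;                           % discharge in m^3/s (about 10, 20 and 1)
Dm = [t, dm];                       % two-column table: time, metric discharge

fname = fullfile(tempdir, 'example_metric.txt');   % hypothetical file name
dlmwrite(fname, Dm, 'delimiter', '\t', 'precision', '%.6f');

Dcheck = load(fname);                     % read the file back
err = max(abs(Dcheck(:) - Dm(:)));        % should be tiny
delete(fname);                            % clean up the temporary file
```

Reading a file back immediately after writing it is a cheap test that the output has the layout you intended.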
Finding and Using Documentation
The MatLab Web Site is one place where you can get a description of syntax, functions, etc.
Can be very useful in finding exactly what you want if you’ve only found something close to what you want!
Example 1: the LENGTH command
[screenshot of the documentation page; two more pages below]
Example 2: the SUM command
Some commands have long, complicated explanations. But that’s because they can be applied to very complicated data objects. Their application to a vector is usually short and sweet.
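As an illustration of "short and sweet", here is a sketch of length and sum applied to a vector, plus their behavior on a small matrix. The values are made up.

```matlab
a = [1, 2, 3, 4, 3, 2, 1]';
N = length(a);       % number of elements in the vector: 7
s = sum(a);          % sum of the elements: 16
m = s/N;             % the mean, built from the two commands

M = [1 2 3; 4 5 6];  % 2-by-3 matrix
L = length(M);       % for a matrix, the size of the largest dimension: 3
cs = sum(M);         % for a matrix, the sum of each column: [5 7 9]
```

The matrix cases are where the longer explanations in the documentation come from; for a vector the two commands do exactly what their names suggest.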
Scripting Advice
#1
Think about what you want to do before starting to type in code!
Block out on a piece of scratch paper the necessary steps
Without some forethought, you can code for an hour, and then realize that what you’re doing makes no sense at all.
#2
Sure, cannibalize a program to make a new one …
But keep a copy of the old one …
And make sure the names are sufficiently different that you won’t confuse the two …
#3
Be consistent in the use of variable names
amin, bmin, cmin, minx, miny, minz (guaranteed to cause trouble)
Don’t use variable names that can be too easily confused, e.g. xmin and minx.
(Especially important because it can interact disastrously with MatLab’s automatic creation of variables. A misspelled variable becomes a new variable.)
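A sketch of how a misspelling goes unnoticed. The variable names are hypothetical; xmim is a deliberate typo for xmin.

```matlab
xmin = 0;
xmax = 10;

% intended: update xmin to 2
% actual:   the typo silently creates a brand-new variable, with no error
xmim = 2;

range_x = xmax - xmin;            % still 10, because xmin was never changed
typo_exists = exist('xmim','var'); % 1: the misspelled name is now a variable
```

Typing who or whos at the prompt lists all current variables, which is one way to spot an unexpected name.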
#4
Build code in small sections, and test each section thoroughly before going on to the next.
Make lots of plots to check that vectors look sensible.
#5
Test code on smallish simple datasets before running it on a large complicated dataset
Build test datasets with known properties. Test whether your code gives the right answer!
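One way to follow this advice, sketched with a deliberately simple case: the code under test is a loop that sums a vector, and the test dataset is the integers 1 through N, whose sum is known to be N(N+1)/2. Both the dataset and the code under test are chosen only for illustration.

```matlab
N = 100;
x = (1:N)';                 % test dataset with a known property: its sum is N*(N+1)/2

% code under test: sum the elements with a loop
s = 0;
for i = 1:N
    s = s + x(i);
end

expected = N*(N+1)/2;       % 5050
if s == expected
    disp('test passed');
else
    disp('test FAILED');
end
```

The same pattern scales up: before running an analysis on real data, run it on synthetic data for which you can work out the right answer by hand.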
#6
Don’t be too clever!
Inscrutable code is very prone to error.