
Data Standards Workflow: raw data, scripts, database. Store raw data in subversion to keep track of history. Stored files (netcdf) accessible through the web.

Dec 14, 2015


Melvin Biglin
Transcript
Page 1:

Data Standards Workflow

Raw data, Scripts, Database

Store raw data in subversion to keep track of history

Stored files (netcdf) accessible through the web

Extract, Transform, Load

Charts & Maps

Tools and websites

Provide

Add meta information

Script to convert raw data into netcdf

OpenEarthRawData

OpenEarth OPeNDAP

OpenEarthTools

Page 3:

Transform

• Add metadata
• Store in netcdf
• Save script in subversion

Page 4:

Add metadata

• Use the INSPIRE metadata form to store information about the dataset.
• http://www.inspire-geoportal.eu/inspireEditor.htm
• Click "launch editor"

Transform

Page 5:

Turn validation on

Transform – add metadata

validation

Page 6:

Location in subversion

micore

File identification

Transform – add metadata

Page 7:

History of your data.

Transform – add metadata

quality

Page 8:

Please fill in limitations of use.

Transform – add metadata

constraints

Page 9:

Store in course/Pcnumber/inspire_description.xml

Transform – add metadata

Save metadata file
1. Save metadata file (local)
2. Add to subversion (local)
3. Commit => metadata into subversion (remote)

Page 10:

Transform

• Add metadata
• Store in netcdf
• Save script in subversion

Page 11:

Store in netcdf

• What's netcdf?
• Write a script to transform data into netcdf
• Using CF convention

Transform

Page 12:

What is netcdf

• Data format defined by Unidata
• Data store used for coverage data and multidimensional data
• CF Metadata convention

Transform – store in netcdf - netcdf

Page 13:

What is netcdf

[Figure: data cube with axes X, Y, Z and T]

• An array based data structure for storing multidimensional data
• N-dimensional coordinate systems
  • X coordinate (e.g. longitude)
  • Y coordinate (e.g. latitude)
  • Z coordinate (e.g. altitude)
  • Time dimension
  • … other dimensions
• Variables – support for multiple variables
  • Temperature, humidity, pressure, salinity, etc.
• Geometry – implicit or explicit
  • Regular grid (implicit)
  • Irregular grid
  • Points

Transform – store in netcdf - netcdf

Page 14:

Storing Multidimensional Data

Table form (32 numbers):

X Y Z Q
1 1 1 0.5
1 1 2 0.3
1 2 1 0.6
1 2 2 0.1
2 1 1 0.4
2 1 2 0.2
2 2 1 0.9
2 2 2 0.3

Array form (14 numbers): coordinates X = 1 2, Y = 1 2, Z = 1 2, plus the values of Q (columns along X, rows along Y):

Z = 1:
0.5 0.4
0.6 0.9

Z = 2:
0.3 0.2
0.1 0.3

Transform – store in netcdf - netcdf
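
The 32 versus 14 count on this slide can be checked with a short MATLAB sketch. It assumes the table is complete and sorted with Z varying fastest, as on the slide; the variable names (tbl, Q, n_table, n_array) are illustrative and not part of the course material.

```matlab
% Long table from the slide: one row per point, columns X, Y, Z, Q (8 x 4 = 32 numbers).
tbl = [1 1 1 0.5
       1 1 2 0.3
       1 2 1 0.6
       1 2 2 0.1
       2 1 1 0.4
       2 1 2 0.2
       2 2 1 0.9
       2 2 2 0.3];

% Array form: one coordinate vector per dimension plus the values.
x = unique(tbl(:, 1));
y = unique(tbl(:, 2));
z = unique(tbl(:, 3));

% The rows run with Z fastest and X slowest. reshape fills the first index
% fastest, so the reshaped array is ordered (z, y, x); permute reorders it to (x, y, z).
Q = permute(reshape(tbl(:, 4), [numel(z), numel(y), numel(x)]), [3 2 1]);

n_table = numel(tbl);                                  % every number in the table
n_array = numel(x) + numel(y) + numel(z) + numel(Q);   % coordinates plus values
fprintf('table: %d numbers, array: %d numbers\n', n_table, n_array);
```

With only two points per axis the saving is modest; because the coordinates are stored once per axis instead of once per row, the gap widens as the grid grows.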

Page 15:

Data Model

Data model for netcdf and others.

Also usable for hdf, opendap, grib, etc. See the java library for details

Transform – store in netcdf - netcdf

Page 16:

ArcGIS

ArcGIS also reads and writes netcdf files.

Transform – store in netcdf – netcdf - applications

Page 17:

Your favorite text editor

xml representation of a netcdf file

Transform – store in netcdf - netcdf

Page 18:

Other Tools

NCO
# diff
ncdiff -v time file1.nc file2.nc
# compression & packing
ncpdq -4 -L 9 in.nc out.nc # Deflated packing (~80% lossy compression)
# selecting variables by regex
ncks -v '^Q..' in.nc # Q01--Q99, QAA--QZZ, etc.

IDV
Very useful

Web hyperslabs, cool!

Not so stable.

Transform – store in netcdf - netcdf

Page 19:

Data Standards Workflow

Raw data, Scripts, Database

Store raw data in subversion to keep track of history

Stored files (netcdf) accessible through the web

Extract, Transform, Load

Charts & Maps

Tools and websites

Provide

Add meta information

Script to convert raw data into netcdf

OpenEarthRawData

OpenEarth OPeNDAP

OpenEarthTools

Page 20:

Store in netcdf

• What's netcdf?
• Write a script to transform data into netcdf
• Using CF convention

Transform – store in netcdf - script

Page 21:

Write script

• Read raw data
  • Read header line
  • Read data
  • Read all data
  • Create function to read all data
  • Use function in Matlab
• Raw data into empty netcdf file
  • Create empty netcdf file
  • Add dimensions and variables
  • Store variables
• Read values

Transform – store in netcdf - script

Page 22:

Reading raw data into memory

• Use one of the following matlab functions to read the file data into an array
  • fscanf

Transform – store in netcdf - script

Page 23:

Example: Transect.txt file

1999 58
-135 3531 -130 3541 -125 3631 -120 4171 -115 6221 -110 8231 -105 9841 -100 10971 -95 12171 -90 12951
…
200 -2415 210 -2995 220 -3595 99999999999 99999999999
2000 58
-135 3531 -130 3541 -125 3631 -120 4171 -115 6221 -110 8231 -105 9841 -100 10971 -95 12171 -90 12951

Header line: year, number of points

Points: X Z X Z …. 9999999

Location: OpenEarthRawData\course\example\raw

Transform – store in netcdf - script

Page 24:

Read header line

>> fid = fopen('..\raw\transect.txt')
fid =
    15

>> header = fscanf(fid, '%d', 2)
header =
    2000
    58

>> year = header(1)
year =
    2000

>> npoint = header(2)
npoint =
    58

Transform – store in netcdf - script

Page 25:

Read data

% read header
header = fscanf(fid, '%d', 2);
year = header(1);
% store year in time
time(i) = year;
npoint = header(2);
% read data
data = fscanf(fid, '%d', npoint*2);
data = reshape(data, [2, npoint]);
% use column vectors
data = data';

Step 1:
>> % read data
>> data = fscanf(fid, '%d', npoint*2)
data =
    -150
    3741
    -140
    3581
    -135
    ...

Step 2:
>> data = reshape(data, [2, npoint])
data =
  Columns 1 through 7
    -150  -140  -135  -130  ...
    3741  3581  3531  3541  ...

Step 3:
>> % use column vectors
>> data = data'
data =
    -150  3741
    -140  3581
    -135  3531
    ...

Transform – store in netcdf - script
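
The three steps on this slide can be tried without the course file. The sketch below uses only the first three points printed on the slide as a stand-in for what fscanf returns; the names raw and pairs are illustrative.

```matlab
% Values in the order fscanf returns them: x1, z1, x2, z2, ...
% (first three points shown on the slide).
raw = [-150; 3741; -140; 3581; -135; 3531];
npoint = numel(raw) / 2;

% reshape fills column by column, so each column becomes one (x, z) pair ...
pairs = reshape(raw, [2, npoint]);

% ... and the transpose gives one point per row: column 1 = x, column 2 = z.
data = pairs';

x = data(:, 1);
z = data(:, 2);
disp(data)
```

Reshaping to [2, npoint] and then transposing matters here: reshaping directly to [npoint, 2] would put the first half of the stream in column 1 and the second half in column 2, which mixes x and z values.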

Page 26:

Read all data

% preallocate all data
% (time, coastward)
transectseries = NaN(3, 58);
coastward_distance = NaN(58, 1);
time = NaN(3, 1);
% open file and get file id
fid = fopen('..\raw\transect.txt');
i = 1;
while (~feof(fid))
    % read header
    header = fscanf(fid, '%d', 2);
    year = header(1);
    % store year in time
    time(i) = year;
    npoint = header(2);
    % read data
    data = fscanf(fid, '%d', npoint*2);
    data = reshape(data, [2, npoint]);
    % use column vectors
    data = data';
    % store data in transect series
    transectseries(i,:) = data(:,2);
    coastward_distance(:) = data(:,1);
    % skip the rest of the current line
    fgetl(fid);
    i = i + 1;
end
% close the file again
fclose(fid);

Transform – store in netcdf - script

Page 27:

Create a function

function transect = readtransect(filename)
% preallocate all data
% (time, coastward)
transectseries = NaN(3, 58);
coastward_distance = NaN(58, 1);
time = NaN(3, 1);
% open file and get file id
fid = fopen(filename);
i = 1;
while (~feof(fid))
    % read header
    header = fscanf(fid, '%d', 2);
    year = header(1);
    % store year in time
    time(i) = year;
    npoint = header(2);
    % read data
    data = fscanf(fid, '%d', npoint*2);
    data = reshape(data, [2, npoint]);
    % use column vectors
    data = data';
    % store data in transect series
    transectseries(i,:) = data(:,2);
    coastward_distance(:) = data(:,1);
    % skip the rest of the current line
    fgetl(fid);
    i = i + 1;
end
% close the file again
fclose(fid);
transect = struct('series', transectseries, ...
    'distance', coastward_distance, 'time', time);
end

Transform – store in netcdf - script

Page 28:

Use the new function

>> data = readtransect('..\raw\transect.txt')

data =

    series: [3x58 double]
    distance: [58x1 double]
    time: [3x1 double]

Transform – store in netcdf - script

Page 29:

Loading data into netcdf

• What does a netcdf file look like
• Required meta information

Transform – store in netcdf - script

Page 30:

Netcdf file: transect.nc

netcdf transect {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward) ;
        coastward_distance:unit = "metre" ;
    float year(time) ;
        year:unit = "year" ;
    float height(time, coastward) ;
        height:unit = "metre" ;
data:
    coastward_distance = -135, -130,…, 150, 160, 170, 180, 190, 200, 210, 220 ;
    year = 1999, 2000, 2001 ;
    height = 353, 354, … -142, -146, -170, -206, -232, -273, -309, -346, -375, -388, … -32, … -92, -110, -127, -143, -156, -177, -211, -259, -303, -334 ;
}

Transform – store in netcdf - script

Page 31:

Create an empty netcdf file

>> nc_create_empty(outputfile)
>> nc_dump(outputfile)
netcdf transect.nc {
dimensions:
variables:
}

Transform – store in netcdf - script

Page 32:

Add dimensions

>> nc_add_dimension(outputfile, 'coastward', 58)
>> nc_add_dimension(outputfile, 'time', 3)
>> nc_dump(outputfile)
netcdf transect.nc {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
}

help nc_add_dimension

Transform – store in netcdf - script

Page 33:

Add variables

coastwardVariable = struct(...
    'Name', 'coastward_distance', ...
    'Nctype', 'float', ...
    'Dimension', {{'coastward'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'metre') ...
    );
nc_addvar(outputfile, coastwardVariable);
timeVariable = struct(...
    'Name', 'year', ...
    'Nctype', 'float', ...
    'Dimension', {{'time'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'year') ...
    );
nc_addvar(outputfile, timeVariable);
heightVariable = struct(...
    'Name', 'height', ...
    'Nctype', 'float', ...
    'Dimension', {{'time', 'coastward'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'metre') ...
    );
nc_addvar(outputfile, heightVariable);
nc_dump(outputfile)

help nc_addvar

Transform – store in netcdf - script

Page 34:

Result

netcdf transect.nc {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward), shape = [58]
        coastward_distance:unit = "metre"
    float year(time), shape = [3]
        year:unit = "year"
    float height(time,coastward), shape = [3 58]
        height:unit = "metre"
}

Transform – store in netcdf - script

Page 35:

Store variables

nc_varput(outputfile, 'height', data.series)
nc_varput(outputfile, 'year', data.time)
nc_varput(outputfile, 'coastward_distance', data.distance)

help nc_varput

Transform – store in netcdf - script
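
For readers without the nc_* toolbox functions used on the slides, roughly the same create, define and store steps can be sketched with MATLAB's built-in nccreate, ncwrite and ncread. This is an alternative, not the course's method. The data below is synthetic, the file is a scratch file rather than transect.nc, and the variables are left at the default double type to keep the round trip exact.

```matlab
% Synthetic stand-in for the struct returned by readtransect (values are made up).
data.series   = rand(3, 58);               % height(time, coastward)
data.distance = linspace(-135, 220, 58)';  % 58x1
data.time     = [1999; 2000; 2001];        % 3x1

ncfile = [tempname '.nc'];                 % scratch file, an assumption of this sketch

% The built-in functions list dimensions fastest-varying first, so a variable
% that a dump shows as height(time, coastward) is declared as
% {'coastward', 'time'} and written as a 58x3 array.
nccreate(ncfile, 'coastward_distance', 'Dimensions', {'coastward', 58});
nccreate(ncfile, 'year',   'Dimensions', {'time', 3});
nccreate(ncfile, 'height', 'Dimensions', {'coastward', 58, 'time', 3});

ncwrite(ncfile, 'coastward_distance', data.distance);
ncwrite(ncfile, 'year', data.time);
ncwrite(ncfile, 'height', data.series');   % transpose 3x58 to 58x3

height = ncread(ncfile, 'height');         % read back as 58x3
```

The transpose is the main thing to watch: the slides pass data.series as a 3x58 array, whereas the built-in functions expect the array in the reversed dimension order.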

Page 36:

Result: Netcdf file transect.nc

netcdf transect {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward) ;
        coastward_distance:unit = "metre" ;
    float year(time) ;
        year:unit = "year" ;
    float height(time, coastward) ;
        height:unit = "metre" ;
data:
    coastward_distance = -135, -130,…, 150, 160, 170, 180, 190, 200, 210, 220 ;
    year = 1999, 2000, 2001 ;
    height = 353, 354, … -142, -146, -170, -206, -232, -273, -309, -346, -375, -388, … -32, … -92, -110, -127, -143, -156, -177, -211, -259, -303, -334 ;
}

Transform – store in netcdf - script

Page 37:

Read values

surface(nc_varget(outputfile, 'height')')

[Figure: surface plot of height; horizontal axes from 1 to 3 and from 0 to 60, vertical axis from -5000 to 15000]

Transform – store in netcdf - script

Page 38:

Store in netcdf

• What's netcdf?
• Write a script to transform data into netcdf
• Using CF convention

Transform – store in netcdf - convention

Page 39:

CF convention

Standard used by USGS, NOAA, ArcGIS, GDAL

Climate and Forecast (CF) Convention
http://www.unidata.ucar.edu/software/netcdf/docs/conventions.html

Initially developed for
• Climate and forecast data
• Atmosphere, surface and ocean model-generated data
• Also used for observational datasets
• CF is the most widely used convention for geospatial netCDF data.

Transform – store in netcdf - convention

Page 40:

Improve output

• Store extra attributes
  • Title
  • Author
  • Standard_name

Transform – store in netcdf - convention
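
As a rough illustration of storing such attributes, the sketch below uses MATLAB's built-in ncwriteatt on a scratch file. The title and author strings are placeholders, and 'altitude' is only one possible CF standard name for the height variable; whether it fits the real dataset is an assumption. Note that CF expects the attribute name units, where the slides use unit.

```matlab
% Scratch file with one variable, so the sketch runs on its own
% (the course file would be transect.nc).
ncfile = [tempname '.nc'];
nccreate(ncfile, 'height', 'Dimensions', {'coastward', 58, 'time', 3});

% Global attributes go on the root group '/'. The values are placeholders.
ncwriteatt(ncfile, '/', 'title', 'Example transect series');
ncwriteatt(ncfile, '/', 'author', 'Course participant');

% Variable attributes. 'altitude' is an illustrative CF standard name.
ncwriteatt(ncfile, 'height', 'standard_name', 'altitude');
ncwriteatt(ncfile, 'height', 'units', 'm');

% Print the header to check the result.
ncdisp(ncfile)
```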

Page 41:

Transform

• Add metadata
• Store in netcdf
• Save script in subversion

Page 42:

Transform – save script

Save script
1. Save script (local, using matlab https://repos.deltares.nl/repos/OpenEarthRawData/course/PCnr/scipts/)
2. Add to subversion (local)
3. Commit => script into subversion (remote)