Intro to GEOSTAT (course overview, software installation and some examples)
Tomislav Hengl (ISRIC - World Soil Information, Wageningen University) & Dylan E. Beaudette (USDA-NRCS Soil Scientist, California Soil Resource Lab)
GEOSTAT course, 11-17 April 2011, Canberra
Who is who
- Organizers:
  1. Augusto Sanabria, Geospatial & Earth Monitoring Division, Geoscience Australia
  2. Alan Welsh, Centre for Mathematics & Its Applications, Australian National University
- Lecturers:
  - Tomislav (Tom) Hengl, senior researcher (ISRIC, Wageningen, NL)
  - Dylan E. Beaudette, post-doctoral researcher (NRCS, USA)
- Guest lectures:
  1. John Maindonald, Centre for Mathematics & Its Applications (Australian National University, Canberra)
  2. Graham Williams, Senior Director and Chief Data Miner (Australian Taxation Office, Canberra)
A “crash course” is a compressed version of a training course for people who already have full agendas. It is really intended for people who pick up new methods/tools quickly and are highly motivated to learn (PhD students?). This means no long questions, no going back, no deep discussion, no complaints about the speed/programme of the course... and of course: no promises that you will manage to master these tools in such a short time.
Types of R courses
- You lose time, we lose time.
- You lose time (we practice teaching).
- We lose time.
- You run a similar course in 1–2 years.
Did you do your homework?
Kabacoff, R.I., 2009. Data Analysis and Graphics with R. Manning Publications, 375 p.
Hengl, T., 2009. A Practical Guide to Geostatistical Mapping. University of Amsterdam (lulu.com).
Beaudette, D., 2009. Open Source Software Tools for Soil Scientists. University of California at Davis.
“To build a better world we need to replace the patchwork of lucky breaks and arbitrary advantages that today determine success – the fortunate birth dates and the happy accidents of history – with a society that provides opportunities for all.”
Malcolm Gladwell in “Outliers”.
FOSS and academic work
What is R?
- the open source implementation of the S language for statistical computing, created by Ross Ihaka and Robert Gentleman (now maintained by the R Development Core Team);
- why “R”? The name was selected for two reasons: (1) precedence: “R” is the letter before “S”, and (2) coincidence: both of the creators’ names start with the letter “R”;
- it is a computer language developed to simplify statistical computing/programming;
- widely recognized as one of the fastest growing and most comprehensive statistical computing tools.
Approximate Dates             | 1990-94                    | 1994-97                           | 1997-
------------------------------+----------------------------+-----------------------------------+--------------------------------------
Recruitment                   | some student participation | demonstrated interest             | semi-purposive, by invitation
Division of Labour            | none                       | developing                        | semi-formal
Hierarchy                     | none                       | original developers, contributors | differential participation
Principal Mode of Cooperation | direct collaboration       | anarchic voluntarism              | partly distinct roles + voluntarism
Planning                      | none                       | implicit                          | partial
Decision-Making               | joint                      | individual                        | modified consensus
Resolution of Disagreements   | discussion                 | largely unnecessary               | discussion, preemption, avoidance
Principal Goal                | personal development       | reproduce and improve S           | various, partly conflicting
Table 1: Stages in the development of the R Project.
[Figure 3 image: left panel plots Number of CRAN Packages against Date and R Version (110 packages at R 1.3 on 2001-06-21, rising to 1952 at R 2.9 on 2009-09-17); right panel plots Residuals against Date (2002 to 2010).]
Figure 3: The number of packages on CRAN (left panel) has grown roughly exponentially, with residuals from the exponential trend (right panel) showing a recent decline in the rate of growth. The number of packages for R version 1.6 is not shown because the count was taken only two days after that for version 1.5, and therefore indicated just one additional package. (An earlier version of the graph in the left panel appeared in Fox, 2008.) Sources of data: https://svn.r-project.org/R/branches/ and (for version 2.9) http://cran.r-project.org/web/checks/check_summary.html.
The R Journal Vol. 1/2, December 2009 ISSN 2073-4859
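The exponential trend and the residuals mentioned in the Figure 3 caption can be sketched roughly as below. This is a minimal illustration, not necessarily how the original figure was produced: it assumes the dates, R versions and package counts printed on the figure pair up in the order they appear, and it fits a straight line to the log counts by ordinary least squares. The names `CRAN` and `fit_exponential` are made up for this example, and Python is used only to keep the sketch self-contained; in R the same fit would typically be a call to `lm()` on the log counts.

```python
from datetime import date
from math import log

# (release date, R version, number of CRAN packages), paired in the order the
# axis labels and counts appear on the figure. The pairing is an assumption.
CRAN = [
    (date(2001, 6, 21), "1.3", 110),
    (date(2001, 12, 17), "1.4", 129),
    (date(2002, 6, 12), "1.5", 162),
    (date(2003, 5, 27), "1.7", 219),
    (date(2003, 11, 16), "1.8", 273),
    (date(2004, 6, 5), "1.9", 357),
    (date(2004, 10, 12), "2.0", 406),
    (date(2005, 6, 18), "2.1", 548),
    (date(2005, 12, 16), "2.2", 647),
    (date(2006, 5, 31), "2.3", 739),
    (date(2006, 12, 12), "2.4", 911),
    (date(2007, 4, 12), "2.5", 1000),
    (date(2007, 11, 16), "2.6", 1300),
    (date(2008, 3, 18), "2.7", 1427),
    (date(2008, 10, 18), "2.8", 1614),
    (date(2009, 9, 17), "2.9", 1952),
]


def fit_exponential(records):
    """Least-squares fit of log(count) = a + b * years.

    Returns the intercept a, the growth rate b (per year) and the
    residuals on the log scale, one per record.
    """
    t0 = records[0][0]
    years = [(d - t0).days / 365.25 for d, _, _ in records]
    logs = [log(n) for _, _, n in records]
    n = len(records)
    mean_t = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, logs))
    b = sxy / sxx
    a = mean_y - b * mean_t
    residuals = [y - (a + b * t) for t, y in zip(years, logs)]
    return a, b, residuals


a, b, residuals = fit_exponential(CRAN)
doubling_time_years = log(2) / b

print(f"growth rate: {b:.3f} per year, doubling time: {doubling_time_years:.2f} years")
for (d, version, _), r in zip(CRAN, residuals):
    print(f"{d}  R {version}  residual {r:+.3f}")
```

A negative residual for the most recent versions means the counts fall below the fitted exponential, which is what the caption describes as a decline in the rate of growth.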
Why make scripts?
- Roger Bivand: “Because S (and its implementation R) is a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities, existing functions can be modified.” This is what is referred to as statistical programming: in R we all become programmers (but much faster than with C++ or Java).
- The basic approach to using R is to generate scripts that define the data processing steps (workflows?).
- Documenting the analysis process is a “good thing”, so programming scripts are not just a burden, certainly for users doing original research and repetitive work, arguably for student classes too.
- Point-and-click operations are for little children!
Do you speak R?
After some time you discover that most of the things you want to do, you can do in R; the only question is how.
Well, first, you have to learn how to speak’n’write R.
Some important facts
- R was first released in 1997;
- the majority of the development is (still) done by Prof. Brian D. Ripley;
- at the moment, there are more than 2000 contributed packages!
- according to Google Trends, R-project.org has a community of about 200–350k active users;
- in 2003, a group of researchers (International Workshop on Distributed Statistical Computing) decided to add spatial functionality to R;
- ... now it is time to use it more broadly (MSc- and PhD-level modules, projects, reports and scientific documents).
“Once methodological problems start being perceived or even defined in terms of what one’s favorite software does well, then the software has stopped being a tool, and has become a crutch, and at worst a shackle.”