Data Quality
and
Error Analysis in GIS
Joshua Greenfeld, PhD, LS Professor emeritus, NJIT
Professor, Israel Institute of Technology
Data Quality and Error Analysis in GIS (c) Dr. J. Greenfeld 1
ABSTRACT
One of the major challenges of GIS is dealing with the
uncertainty and the assessment of the quality of spatial
information.
The challenge is to assess the quality of spatial
information not just the quality of spatial data.
Many professionals are involved in providing GIS
services. Surveying is only one of them.
For surveying to make a mark on the GIS industry and
become a prominent stakeholder in GIS, it has to offer
some expertise that most other professionals cannot.
Unfortunately, the ability to collect spatial data is becoming
a common skill, and the surveyor's positioning expertise is
not as unique as it used to be.
ABSTRACT
There is one area in which surveyors have an advantage over
other GIS professionals: their propensity and ability to
understand and quantify spatial errors and accuracies.
In surveying, the uncertainty and quality assessment is
mostly confined to positioning or positional accuracies.
The quality of surveying results is typically assessed on the
basis of measurement accuracy and the propagation of
these accuracies into other computed quantities.
In GIS, uncertainty and quality issues are much
broader. In addition to positional accuracy there are
attribute accuracy, completeness of the data, sources and
lineage of the data, logical consistency, fuzziness of the
spatial phenomenon, currency of the data and other
uncertainty issues.
Objective
The objective of this seminar is to enable surveyors to
understand the broader issues of accuracy assessment
beyond positional accuracies.
It will outline the extended definition of uncertainty and
quality as it applies to GIS.
It will include an overview of the errors and uncertainties
that could impact the quality of spatial data.
This will be followed by discussing the impact of errors in
spatial data on spatial information.
The ISO geospatial standards will be reviewed as well.
Finally, some practical tools and examples of numerical
and statistical assessment of uncertainty and quality of
spatial information will be discussed and demonstrated.
Importance of Quality
Gain confidence in geodata
Reduce users' complaints
Achieve customer satisfaction
Minimize consequential costs caused by decisions
or actions based on erroneous data
No unified definition of data quality
1. Data Quality refers to the degree of excellence
exhibited by the data in relation to the portrayal of the
actual phenomena. GIS Glossary
2. The state of completeness, validity, consistency,
timeliness and accuracy that makes data appropriate
for a specific use. Government of British Columbia
3. The totality of features and characteristics of data
that bears on their ability to satisfy a given purpose; the
sum of the degrees of excellence for factors related to
data. Glossary of Quality Assurance Terms
No unified definition of data quality
4. Information Quality : the fitness for use of
information; information that meets the requirements of
its authors, users, and administrators. (Martin Eppler)
5. Data quality: The processes and technologies
involved in ensuring the conformance of data values to
business requirements and acceptance criteria.
6. ISO/PAS 26183:2006 defines product data quality as
a measure of the accuracy and appropriateness of
product data, combined with the timeliness with which
those data are provided to all the people who need
them.
And more…
Error and Uncertainty in GIS
• One of the major problems currently existing within GIS is
the aura of accuracy surrounding digital geographic data
• Often hardcopy map sources include a map reliability rating
or confidence rating in the map legend
• This rating helps the user in determining the fitness for use
for the map
• However, rarely is this information encoded in the digital
conversion process
• Often because GIS data is in digital form and can be
represented with a high precision it is considered to be
totally accurate
Spatial Accuracy (Horizontal Accuracy)
Circular error is based on the sample
standard deviation of d_i, the difference
between the data set coordinate value and
the coordinate value determined by an
independent check survey of higher accuracy
for the same point.
The standard deviation for the horizontal coordinate r is:
$$ s_r = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}} $$
Where:
$$ r_i = \sqrt{x_i^2 + y_i^2}, \qquad d_i = r_{data_i} - r_{check_i}, \qquad \bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} $$
$\bar{d}$ = the mean discrepancy
n = total number of points checked
NSSDA horizontal accuracy is:
$$ Accuracy_r = 2.4477 \cdot s_r $$ (95% confidence level)
The standard deviation for the z coordinate direction is:
$$ s_z = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}} $$
where:
$$ d_i = z_{data_i} - z_{check_i}, \qquad \bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} $$
$\bar{d}$ = the mean discrepancy
n = total number of points checked
NSSDA vertical accuracy is:
$$ Accuracy_z = 1.96 \cdot s_z $$ (95% confidence level)
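As a minimal sketch, the horizontal and vertical computations can be written in Python following the slide formulas as stated (sample standard deviation of the discrepancies, multiplied by 2.4477 or 1.96). The function names and the sample coordinates are hypothetical and not part of the seminar material; the published NSSDA standard remains the authoritative formulation.

```python
import math

def sample_std(discrepancies):
    """Sample standard deviation (n - 1 in the denominator)."""
    n = len(discrepancies)
    mean = sum(discrepancies) / n
    return math.sqrt(sum((d - mean) ** 2 for d in discrepancies) / (n - 1))

def horizontal_accuracy(data_xy, check_xy):
    """2.4477 * s_r, with d_i = r_data_i - r_check_i and r = sqrt(x^2 + y^2)."""
    d = [math.hypot(x, y) - math.hypot(xc, yc)
         for (x, y), (xc, yc) in zip(data_xy, check_xy)]
    return 2.4477 * sample_std(d)

def vertical_accuracy(data_z, check_z):
    """1.96 * s_z, with d_i = z_data_i - z_check_i."""
    d = [z - zc for z, zc in zip(data_z, check_z)]
    return 1.96 * sample_std(d)

# Hypothetical check survey: three horizontal points, four elevations.
acc_r = horizontal_accuracy([(3, 4), (6, 8), (5, 12)], [(0, 4), (0, 11), (0, 13)])
acc_z = vertical_accuracy([10.1, 9.9, 10.2, 9.8], [10.0, 10.0, 10.0, 10.0])
```

Both values are in the units of the input coordinates and are meant to be read at the 95% confidence level.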
Well-Defined Points
Features that can be identified within 1/3 of the
maximum expected uncertainty for the data set.
Acceptable features:
Small scale                Large scale
Road/rail intersections    Center of utility access cover
Small isolated shrubs      Sidewalk/curb/gutter intersections
Corners of structures      Monuments
Check survey points should have accuracies within one-third of the data set's intended accuracy (95% CL)
Check Point Location (assuming rectangle area)
Spaced at intervals of at least 10% of the diagonal.
At least 20% of the points are located in each quadrant.
Check points may be distributed more densely in the vicinity
of important features
When data exist for only a portion of the data set, confine
test points to that area.
When the distribution of error is likely to be nonrandom, it
may be desirable to locate check points to correspond to
the error distribution.
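The two numeric rules above (spacing of at least 10% of the diagonal, at least 20% of the points in each quadrant) can be checked automatically. The sketch below assumes a rectangular, axis-aligned data set extent; the function name and the sample points are hypothetical.

```python
import math
from itertools import combinations

def check_point_layout(points, xmin, ymin, xmax, ymax):
    """Test the spacing and quadrant rules for a rectangular extent."""
    diagonal = math.hypot(xmax - xmin, ymax - ymin)
    # Smallest distance between any two check points.
    min_spacing = min(math.dist(p, q) for p, q in combinations(points, 2))
    # Count points per quadrant (split at the centre of the rectangle).
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    counts = [0, 0, 0, 0]
    for x, y in points:
        counts[(x >= xmid) + 2 * (y >= ymid)] += 1
    shares = [c / len(points) for c in counts]
    return {
        "spacing_ok": min_spacing >= 0.10 * diagonal,
        "quadrants_ok": min(shares) >= 0.20,
    }

# Hypothetical layouts on a 100 x 100 extent.
good = check_point_layout([(25, 25), (75, 25), (25, 75), (75, 75), (50, 50)],
                          0, 0, 100, 100)
bad = check_point_layout([(10, 10), (11, 10), (12, 12), (13, 11), (90, 90)],
                         0, 0, 100, 100)
```

The remaining guidelines (denser sampling near important features, nonrandom error) are judgment calls and are not covered by this check.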
Positional Accuracy Evaluation
of Orthophotos in New Jersey
Point    Accuracy (ft)
1        4.25
2        4.07
3        2.28
4        3.98
5        4.18
ATTRIBUTE ACCURACY
Defined as the closeness of attribute values to their true value
Note that while location does not change with time, attributes often do
Attribute accuracy must be analyzed in different ways depending on the nature of the data
For continuous attributes (surfaces) such as on a DEM or TIN:
accuracy is expressed as measurement error (e.g. ±1 m)
ATTRIBUTE ACCURACY
For categorical attributes such as classified polygons:
Are the categories appropriate, sufficiently detailed and defined?
Is a polygon classified as A really A, or should it be B?
How heterogeneous is the polygon (e.g. 70% A and 30% B)?
How well are A and B defined (e.g. soil classifications)?
The center area may be definitely A, but more like B at the edges
ATTRIBUTE ACCURACY
How to test attribute accuracy?
prepare a misclassification matrix and calculate the degree of correctness
Examples:
The Kappa coefficient
Map Producer’s accuracy
Map User’s accuracy
The Kappa coefficient
[Figure: Dataset A, Dataset B, and the comparison of A to B]
Observed counts (O):
        A       B
A       O_AA    O_AB    O_Ar
B       O_BA    O_BB    O_Br
        O_Ac    O_Bc    Σ
Percentages (P), each observed count divided by Σ:
        A       B
A       P_AA    P_AB    P_Ar
B       P_BA    P_BB    P_Br
        P_Ac    P_Bc    1
$$ P_0 = P_{AA} + P_{BB} $$
$$ P_e = P_{Ac} P_{Ar} + P_{Bc} P_{Br} $$
$$ Kappa = \frac{P_0 - P_e}{1 - P_e} $$
O – Observed
P – Percentage
The Kappa coefficient
[Figure: Dataset A, Dataset B, and the comparison of A to B]
Observed counts:
        R       B
R       58      6       64
B       7       28      35
        65      34      99
Percentages:
        R       B
R       0.586   0.061   0.646
B       0.071   0.283   0.354
        0.657   0.343   1
$$ P_0 = 0.586 + 0.283 = 0.869 $$
$$ P_e = 0.657 \times 0.646 + 0.343 \times 0.354 = 0.546 $$
$$ Kappa = \frac{0.869 - 0.546}{1 - 0.546} = 0.711 $$
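The worked example above can be reproduced with a short Python sketch. The function name is hypothetical; the input is the matrix of observed counts from the slide, and the formula is generalized to any number of classes by summing the products of matching row and column shares.

```python
def kappa(matrix):
    """Return (P0, Pe, Kappa) for a square matrix of observed counts."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    # Observed agreement: share of counts on the diagonal.
    p0 = sum(matrix[i][i] for i in range(k)) / n
    # Chance agreement: sum of row share times column share per class.
    row_share = [sum(row) / n for row in matrix]
    col_share = [sum(matrix[r][c] for r in range(k)) / n for c in range(k)]
    pe = sum(r * c for r, c in zip(row_share, col_share))
    return p0, pe, (p0 - pe) / (1 - pe)

# Observed counts from the example (classes R and B).
p0, pe, k_value = kappa([[58, 6], [7, 28]])
```

Computed from the unrounded counts, the three values agree with the slide figures (0.869, 0.546 and 0.711) to three decimal places.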
How to interpret Kappa
Kappa is always less than or equal to 1.
A value of 1 implies perfect agreement and values less
than 1 imply less than perfect agreement.
In rare situations, Kappa can be negative. This is a sign
that the two observers agreed less than would be
expected just by chance.
A possible interpretation of Kappa. The agreement is:
Kappa 0.0 to 0.2: Poor
Kappa 0.2 to 0.4: Fair
Kappa 0.4 to 0.6: Moderate
Kappa 0.6 to 0.8: Good
Kappa 0.8 to 1.0: Very good
Other Accuracy Assessment
Assume we have a 9-cell land cover (LC) map, one from 1980 and one from 2000, with three categories: A, B, and C.
1980 LC      2000 LC      Cross Tabulated Grid
B B A        A B A        BA BB AA
B B C        B C C        BB BC CC
B A C        A A B        BA AA CB
The cross tabulation can be quantified into a matrix, often called a confusion matrix (rows: 2000 map, columns: 1980 map):
        A    B    C
A       2    2    0
B       0    2    1
C       0    1    1
The matrix shows the agreements
between the 1980 and 2000 maps. As
an example, 2 cells remained A (AA),
1 cell was C and is now B (CB), etc.
Other Accuracy Assessment
Sum up the rows and columns. But
what do these numbers tell us?
        A    B    C    Total
A       2    2    0    4
B       0    2    1    3
C       0    1    1    2
Total   2    5    2
The bottom row tells us that there
were two cells that were A, five B,
and two C.
The rightmost column tells us that we mapped 4 cells as A, 3 as B, and 2 as C.
Adding up the diagonal cells says that 5 cells were right.
The overall agreement between maps is:
Σd_ii / n = 5/9 ≈ 0.556 (about 56%)
User and Producer Accuracy
The total correspondence of our example is about 56%. But
that only tells us part of the story. What if we were
really interested in classification B? Were there
changes in classification B? Even here, there are two
different ways of interpreting that question:
If I were interested in mapping all the areas of B,
how well did I get them all? This is called the Map
Producer’s Accuracy. That is, how well did we
produce a map of classification B?
If I were to use the map to find B, how successful
would I be? This is called the Map User’s Accuracy.
That is, how much confidence should a user of the map
have in a given classification?
User and Producer Accuracy
Map user’s accuracy = the total number correct within
a row divided by the total number in the whole row.
Map producer’s accuracy = the total number
correct within a column divided by the
total number in the whole column.
Example of classification B
Map user’s accuracy = 2/3 = 67%
Map producer’s accuracy = 2/5 = 40%
        A    B    C    Total
A       2    2    0    4
B       0    2    1    3
C       0    1    1    2
Total   2    5    2
User and Producer Accuracy
How can we use the above results?
A user’s accuracy of 67% means that if we were to use this map and look
for the classification of B, we would be correct 67% of
the time.
A producer’s accuracy of 40% means that the map captured only 40% of all
the B’s that were out there.
This also gives us some indication of the nature of
the errors. For instance, it appears that we confused
classification A with classification B (we said on two
occasions that B was A). By understanding the
nature of the errors, perhaps we can go back, look
over our process and correct that mistake.
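The cross tabulation and the three accuracy figures discussed above can be sketched in Python. The function names are hypothetical, and the sketch assumes that the 1980 grid is the one containing five B cells, with matrix rows taken from the 2000 map and columns from the 1980 map.

```python
from collections import Counter

CLASSES = ["A", "B", "C"]
LC_1980 = ["BBA", "BBC", "BAC"]  # assumed 1980 grid (five B cells)
LC_2000 = ["ABA", "BCC", "AAB"]  # assumed 2000 grid (four A cells)

def cross_tabulate(mapped, reference, classes):
    """Confusion matrix: rows = mapped class, columns = reference class."""
    pairs = Counter((m, r)
                    for m_row, r_row in zip(mapped, reference)
                    for m, r in zip(m_row, r_row))
    return [[pairs[(m, r)] for r in classes] for m in classes]

def accuracy_report(matrix, classes):
    """Overall agreement plus user's (row) and producer's (column) accuracy."""
    n = sum(sum(row) for row in matrix)
    overall = sum(matrix[i][i] for i in range(len(classes))) / n
    users = {c: matrix[i][i] / sum(matrix[i]) for i, c in enumerate(classes)}
    producers = {c: matrix[i][i] / sum(row[i] for row in matrix)
                 for i, c in enumerate(classes)}
    return overall, users, producers

matrix = cross_tabulate(LC_2000, LC_1980, CLASSES)
overall, users, producers = accuracy_report(matrix, CLASSES)
```

A class with an empty row or column would cause a division by zero, so a production version would need to guard against that case.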
LOGICAL CONSISTENCY
Refers to the degree of adherence to logical rules of
data structures (conceptual, logical or physical),
attribution and relationships. It includes:
Conceptual consistency; adherence to rules of the
conceptual schema
Domain consistency; adherence of values to the value
domain
Format consistency; degree to which data is stored in
accordance with the physical structure of the dataset
LOGICAL CONSISTENCY
Topological consistency; correctness of the explicitly
encoded topological characteristics of a dataset. For
example:
• If there are polygons, do they close?
• Is there exactly one label within each polygon?
• Are there nodes wherever arcs cross, or do arcs
sometimes cross without forming nodes?
COMPLETENESS
Refers to the presence and absence of features, their attributes and
relationships in spatial data, compared with what is
defined in the data model or what exists in the real world.
Error of commission – data present in a data set that
is not present in the data model or the real world
Error of omission – data that is present in the data
model or the real world but absent from the dataset.
Affected by rules of selection, generalization and scale
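Commission and omission can be illustrated with simple set differences between the features in a dataset and those in a reference (the data model or the real world). This is a sketch only: the function name and feature identifiers are hypothetical, and expressing both rates relative to the number of reference features is an assumption rather than something stated in the seminar.

```python
def completeness(dataset_ids, reference_ids):
    """List excess (commission) and missing (omission) features."""
    dataset, reference = set(dataset_ids), set(reference_ids)
    commission = dataset - reference   # in the dataset, not in the reference
    omission = reference - dataset     # in the reference, not in the dataset
    return {
        "commission": sorted(commission),
        "omission": sorted(omission),
        # Assumption: both rates are relative to the reference count.
        "commission_rate": len(commission) / len(reference),
        "omission_rate": len(omission) / len(reference),
    }

# Hypothetical building identifiers.
report = completeness(["b1", "b2", "b3", "b9"],
                      ["b1", "b2", "b3", "b4", "b5"])
```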
LINEAGE
A record of the data sources and of the operations
which created the database
How was it digitized, from what documents?
When was the data collected?
What agency collected the data?
What steps were used to process the data?
• precision of computational results
Is often a useful indicator of accuracy
An Example of Data Quality Elements
and Sub-elements for Buildings
Quality element: Completeness
Quality sub-element: Commission error
Description by example: Buildings with area less than 4 m² are presented in the Building Polygon layer of the 1:1000 data set.
Quality sub-element: Omission error
Description by example: Buildings with area equal to or larger than 4 m² are absent from the Building Polygon layer.
An Example of Data Quality Elements
and Sub-elements for Buildings
Quality element: Positional accuracy
Quality sub-element: Horizontal accuracy
Description by example: RMSE of a building polygon based on a comparison of the horizontal coordinates of all the nodes of its footprint in the GIS with the corresponding reference values.
An Example of Data Quality Elements
and Sub-elements for Buildings
Quality element: Positional accuracy
Quality sub-element: Vertical accuracy
Description by example: RMSE of a building polygon based on a comparison of the vertical coordinates of all the nodes of its footprint in the GIS with the corresponding reference values.
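The horizontal and vertical RMSE of a building polygon can be sketched as follows. The seminar does not spell out the RMSE formula, so the common definition (square root of the mean squared node discrepancy) is assumed here; the function names and node coordinates are hypothetical.

```python
import math

def horizontal_rmse(nodes, reference):
    """RMSE of the horizontal (x, y) discrepancies over all nodes."""
    squared = [(x - xr) ** 2 + (y - yr) ** 2
               for (x, y, _), (xr, yr, _) in zip(nodes, reference)]
    return math.sqrt(sum(squared) / len(squared))

def vertical_rmse(nodes, reference):
    """RMSE of the vertical (z) discrepancies over all nodes."""
    squared = [(z - zr) ** 2
               for (_, _, z), (_, _, zr) in zip(nodes, reference)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical footprint nodes (x, y, z) in the GIS and in the reference.
gis_nodes = [(0.3, 0.4, 10.1), (10.0, 0.0, 10.0),
             (10.0, 10.0, 9.9), (0.0, 10.0, 10.0)]
ref_nodes = [(0.0, 0.0, 10.0), (10.0, 0.0, 10.0),
             (10.0, 10.0, 10.0), (0.0, 10.0, 10.0)]
rmse_h = horizontal_rmse(gis_nodes, ref_nodes)
rmse_v = vertical_rmse(gis_nodes, ref_nodes)
```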
An Example of Data Quality Elements
and Sub-elements for Buildings
Quality element: Attribute accuracy
Quality sub-element: Classification correctness
Description by example: Whether a building or related feature is correctly classified as one (or more) building-related features.
Quality sub-element: Non-quantitative attribute correctness
Description by example: The Name of a building polygon may be correct or wrong in a GIS.
Quality sub-element: Quantitative attribute correctness
Description by example: The value of the field "Building Top Level" of a Building Polygon may be correct or wrong.
An Example of Data Quality Elements
and Sub-elements for Buildings
Quality element: Logical consistency
Quality sub-element: Conceptual consistency
Description by example: A tower is described as being under its podium.
Quality sub-element: Domain consistency
Description by example: The feature code of a building polygon is outside the following given classes: BR BAR BUP, IBP, OSP, PWP, TSP.
An Example of Data Quality Elements
and Sub-elements for Buildings
Quality element: Logical consistency
Quality sub-element: Format consistency
Description by example: Building names in title case, such as "Hong Kong Airport", are consistent, while a name such as "HONG KONG Airport" is not consistent in format.
Quality sub-element: Topological consistency
Description by example: When the outline of a building polygon is closed, the topology is consistent; when the outline is not closed, the topology is not consistent.
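The domain, format and topological examples for buildings lend themselves to simple automated checks. The sketch below is illustrative only: the function names are hypothetical, the code list is a subset of the codes named in the example, and Python's `str.title()` is used as a stand-in for a title-case rule (it can misjudge names containing apostrophes or acronyms).

```python
# Assumption: a subset of the feature codes named in the example.
ALLOWED_CODES = {"BUP", "IBP", "OSP", "PWP", "TSP"}

def domain_consistent(feature_code):
    """The feature code must belong to the allowed value domain."""
    return feature_code in ALLOWED_CODES

def format_consistent(name):
    """The building name must be written in title case."""
    return name == name.title()

def topologically_consistent(outline):
    """The outline must be a closed ring: first vertex equals last vertex."""
    return len(outline) >= 4 and outline[0] == outline[-1]

closed_ring = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
open_ring = [(0, 0), (10, 0), (10, 10), (0, 10)]
checks = {
    "domain": domain_consistent("XYZ"),
    "format_good": format_consistent("Hong Kong Airport"),
    "format_bad": format_consistent("HONG KONG Airport"),
    "closed": topologically_consistent(closed_ring),
    "open": topologically_consistent(open_ring),
}
```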
Uncertainties Measured Based on
Various Mathematical Theories
[Diagram: uncertainty divided into three types, with the measures and the mathematical theories used for each]
Types of uncertainty: Imprecision; Ambiguity; Vagueness
Measures: Confidence region model (Shi 1994); Entropy (Shannon 1948); Hartley’s measure (1928); Discord measure, Confusion measure and non-specificity measure; U-uncertainty; Fuzzy measure; Fuzzy topology measure
Theories: Probability and statistical theory; Evidence theory; Fuzzy sets, Probability and Fuzzy topology
A framework for modeling uncertainties
in spatial data and analysis
[Diagram]
Stages: Real World; Data type; Classification of spatial data; Description of Uncertainty; Uncertainty modeling in spatial analysis and query; Control of Uncertainties; Visualization of Uncertainties
Data types: Object; Field
Spatial data classes: Point; Line; Polygon; 3D objects; Raster Image; DEM surface; Hybrid DEM
Other box labels: Positional Uncertainty; Uncertain Topology; Uncertainty from Multi-data source; Uncertainty of Remote Sensing data; Errors in DEM; Interpolation; Uncertainty in spatial analysis; Uncertain spatial Query; Geometric Correction and image fusion; Processing and the uncertain control of Spatial data; Visualization and the distribution of Uncertainty Information
Error model of point – Error Ellipse
[Figure: error ellipse showing the X axis, the U and V axes, the standard deviations Sx and Sy, the semi-axes Su and Sv, and the rotation angle t]
The transformation equation between U,V and X,Y is:
$$ \begin{bmatrix} U \\ V \end{bmatrix} = \begin{bmatrix} \sin t & \cos t \\ -\cos t & \sin t \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} $$
t is the rotation angle from the Y axis to the axis of largest error (U).
Su is the semi-major axis of the ellipse (largest error).
Sv is the semi-minor axis of the ellipse (least error).
Sx is the standard deviation in X of coordinate x
Sy is the standard deviation in Y of coordinate y
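Error model of point – Error Ellipse
[Figure: error ellipse showing the X axis, the U and V axes, Sx, Sy, Su, Sv and the rotation angle t]
$$ \tan 2t = \frac{2 S_{XY}}{S_Y^2 - S_X^2} $$
$$ K = \sqrt{(S_X^2 - S_Y^2)^2 + 4 S_{XY}^2} $$
$$ S_u^2 = \frac{S_X^2 + S_Y^2 + K}{2} $$
$$ S_v^2 = \frac{S_X^2 + S_Y^2 - K}{2} $$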
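A minimal Python sketch of the error ellipse computation follows. It assumes the inputs are the variances of X and Y and their covariance, and it uses `atan2` to resolve the quadrant of 2t so that t points along the axis of largest error; the function name and test values are hypothetical.

```python
import math

def error_ellipse(var_x, var_y, cov_xy):
    """Return (t, Su, Sv): rotation from the Y axis, semi-major, semi-minor."""
    # tan 2t = 2 Sxy / (Sy^2 - Sx^2); atan2 keeps the correct quadrant.
    t = 0.5 * math.atan2(2.0 * cov_xy, var_y - var_x)
    k = math.sqrt((var_x - var_y) ** 2 + 4.0 * cov_xy ** 2)
    s_u = math.sqrt((var_x + var_y + k) / 2.0)
    # max() guards against a tiny negative value caused by rounding.
    s_v = math.sqrt(max((var_x + var_y - k) / 2.0, 0.0))
    return t, s_u, s_v

# Equal variances with positive covariance: major axis at 45 degrees.
t1, su1, sv1 = error_ellipse(2.0, 2.0, 1.0)
# Uncorrelated, larger variance in Y: major axis along the Y axis (t = 0).
t2, su2, sv2 = error_ellipse(1.0, 4.0, 0.0)
```

The angle is returned in radians and measured from the Y axis toward the X axis, which matches the definition of t given with the error ellipse.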
Error model of line - Epsilon band
Assumptions:
1. each error effect relevant to a particular digital line in a
GIS can be treated as a random variable, perturbing the
true line to obtain the observed line.
2. the processes of generating a digital line in a GIS can be
treated as being independent.
The bandwidth is determined from a statistical function of
those positional errors on the line accumulated from the
first stage to the final stage of data capture.
[Figure: the measured line and the true line]
Error model of a polygon
The area S of the polygon is computed from:
$$ S = \frac{1}{2} \sum_{i=1}^{n} [x_i (y_{i+1} - y_{i-1})] = \frac{1}{2} \sum_{i=1}^{n} [x_i \, \Delta y_{i-1,i+1}] $$
The differential of the area is given as:
$$ dS = \frac{1}{2} \sum_{i=1}^{n} [\Delta y_{i-1,i+1} \, dx_i - \Delta x_{i-1,i+1} \, dy_i] $$
where $\Delta y_{i-1,i+1} = y_{i+1} - y_{i-1}$ and $\Delta x_{i-1,i+1} = x_{i+1} - x_{i-1}$
Error model of a polygon
For simplicity, assume all coordinate accuracies are equal
to σo and the covariance is 0; we get:
$$ \sigma_S^2 = \frac{1}{4} \sum_{i=1}^{n} [\Delta y_{i-1,i+1}^2 + \Delta x_{i-1,i+1}^2] \, \sigma_o^2 = \frac{1}{4} \sum_{i=1}^{n} [l_{i-1,i+1}^2] \, \sigma_o^2 $$
Where: l_{i-1,i+1} is the distance between points P_{i-1} and P_{i+1}
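The area formula and its simplified variance can be sketched in Python. The function names and the sample square are hypothetical; the sketch assumes equal, uncorrelated coordinate accuracies σo, as in the simplification above, and vertices listed in order around the polygon.

```python
import math

def polygon_area(vertices):
    """S = 1/2 * sum of x_i * (y_{i+1} - y_{i-1}), indices taken cyclically."""
    n = len(vertices)
    return 0.5 * sum(
        vertices[i][0] * (vertices[(i + 1) % n][1] - vertices[i - 1][1])
        for i in range(n)
    )

def area_std(vertices, sigma_o):
    """sigma_S = 1/2 * sigma_o * sqrt(sum of squared distances l_{i-1,i+1})."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        (x_prev, y_prev), (x_next, y_next) = vertices[i - 1], vertices[(i + 1) % n]
        total += (x_next - x_prev) ** 2 + (y_next - y_prev) ** 2
    return 0.5 * sigma_o * math.sqrt(total)

# Hypothetical 10 x 10 square with 0.1 coordinate accuracy.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
area = polygon_area(square)
sigma_area = area_std(square, 0.1)
```

The area is positive for counter-clockwise vertex order and negative for clockwise order; the standard deviation does not depend on the order.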
What is a standard?
Standards are documented agreements containing
technical specifications or other precise criteria to be
used consistently as rules, guidelines, or definitions of
characteristics, to ensure that materials, products,
processes and services are fit for their purpose.
(as defined by ISO)
Examples of Everyday Standards
Traffic Signals – Road Signs
VISA / Mastercard: standards allow people to use a single card to obtain cash in the local currency around the world
Commerce/Manufacturing/Industry
World War II - Allied supplies and facilities were severely strained due to the incompatibility of tools, replacement parts, and equipment. The establishment of international standards helped to increase compatibility.
The Importance of Standards (when standards do not exist)
Disasters (fire, flood, …)
Great Baltimore Fire of 1904 - fire engines from different
regions arrived to help put out the fire, but they had
different hose coupling sizes that did not fit the Baltimore
hydrants. The fire burned for over 30 hours and resulted in the destruction
of 1526 buildings covering 17 city blocks.
Metric System vs US Customary System
The Need for Standards in Geographic Information
To ensure common understanding through a common set of
terminology
To promote/enable interoperability
To support the establishment of geospatial infrastructures at
local, regional, and global levels
To promote data and information sharing/exchange
Types of geospatial standards
Data Classification
e.g., Vegetation Classification
Data Content
e.g., Digital Geospatial Metadata, Spatial Schema
Data Symbology or Presentation
e.g., Digital Geologic Map Symbolization
Data Transfer
Data Usability
e.g., Geospatial Positioning Accuracy
Evaluating and Reporting Quality Evaluation
Results [ISO 19114]
5 step process on quality evaluation:
1. Identify an applicable data quality element, data quality subelement,
and data quality scope
2. Identify a data quality measure
3. Select and apply a data quality evaluation method
4. Determine the data quality result
5. Determine conformance
Inputs shown in the diagram: dataset as specified by the scope; product specification or user requirements (work item 19131); conformance quality level
Outputs shown in the diagram: report data quality result (quantitative); report data quality result (pass / fail)
Standards referenced in the diagram: ISO 19113; ISO 19114
Metadata Example
Without…
With…
Metadata need Example
WQPW-ID   DIN   Pb
PB-31     .34   .012
HK-14     .12   .023
PB12      35    034
PB-12     .35   .034
WA-3      .28   .001
PB-4      .23   .022
PB-5      .21   .013
HUH?
The Standard
Metadata has four major roles:
Availability- information needed to determine the
sets of data that exist for a geographic location.
Fitness for use- information needed to determine if a
set of data meets a specific need.
Access- information needed to acquire an identified
set of data.
Transfer- information needed to process and use a
set of data
Information that can be found in Metadata
• Title, Abstract, Publication Date (Section 1: Identification Information)
• Data Accuracy and Completeness (Section 2: Data Quality Information)
• Data Form: Vector or Raster? (Section 3: Spatial Data Organization Information)
• Projection or Geographic Reference System (Section 4: Spatial Reference Information)
• What Values Are Associated with Geodata? (Section 5: Entity and Attribute Information)
• How Do You Get It? Cost? (Section 6: Distribution Information)
• How Current Is the Documentation? (Section 7: Metadata Reference Information)
The Value of Metadata
Organize and maintain an organization’s investment in data
Provide information to data catalogs and clearinghouses
Provide information to aid data transfer
Food for thought... Nothing happens overnight: get used to thinking of the long-term benefits
of metadata. $$$
Documentation = defense
The Standard: don't judge a book by its cover
Metadata resources
The FGDC (Federal Geographic Data Committee): interagency committee that
coordinates federal geo-data activities.
The Content Standard for Digital Geospatial Metadata (CSDGM)
• The current US Federal Metadata standard
• Often referred to as the 'FGDC Metadata Standard'
• Has been implemented in federal, state, and local governments
The International Organization for Standardization (ISO) has developed and
approved an international metadata standard, ISO 19115 – Geographic
Information Metadata
Metadata resources
• The objective of this International Standard is to provide a clear
procedure for the description of digital geographic datasets so that users
will be able to determine whether the data in a holding will be of use to
them and how to access the data. By establishing a common set of
metadata terminology, definitions and extension procedures, this
standard will promote the proper use and effective retrieval of geographic
data.
• Supplementary benefits of this standard for metadata are to facilitate the
organization and management of geographic data and to provide
information about an organization’s database to others.
• This standard for the implementation and documentation of metadata
furnishes those unfamiliar with geographic data the appropriate
information to characterize their geographic data and it makes possible
dataset cataloguing enabling data discovery, retrieval and reuse.
Graphical Representation of the:
US Geological Survey Biological Resources Division
DRAFT Content Standard for Biological Metadata
Based on: The Federal Geographic Data Committee’s Content Standard for Digital Geospatial
Metadata, June 8, 1994, version 1.0
Prepared by Susan Stitt, Center for Biological Informatics
[Diagram: Metadata and its seven sections]
1. Identification Information
2. Data Quality Information
3. Spatial Data Organization Information
4. Spatial Reference Information
5. Entity and Attribute Information
6. Distribution Information
7. Metadata Reference Information
Legend: Mandatory; Mandatory if Applicable; Optional; Biological Items Added
Best Practices for Writing Quality Metadata
Writing Principles
Write simply but completely
Document for a general audience
Adopt a consistent style
Avoid using jargon
Define technical terms
Best Practices for Writing Quality Metadata
In Practice
State clearly what your data are not
Find, evaluate, and reuse good examples
See examples from FGDC workbook
Mine the Clearinghouse for other examples
Use keywords as indicators of the contents of a dataset
Use a thesaurus or controlled vocabulary when possible
Best Practices for Writing Quality Metadata
In Practice (continued)
Use subtitles to define and clarify long passages
Quantify assessments wherever possible
Use “None” and “Unknown” carefully
Format date: YYYYMMDD
Avoid using confusing symbols & conventions:
! @ # % { } | / \ < > ~
Unnecessary carriage returns, tabs, indents, etc.