
Jan 17, 2016


Merilyn Harmon
Lecture 18: Data Quality Issues

Ch. 14


Introduction

• Spatial data and analysis standards are important because of the range of organizations producing and using spatial data, and the amount of data transferred between these organizations.

• There are several types of standards:
– Data standards
– Interoperability standards
– Analysis standards
– Professional and certification standards


Introduction (continued)

• National and international standards organizations are important in defining and maintaining geospatial standards:

– Federal Geographic Data Committee (FGDC), which focuses on the national spatial data infrastructure (www.fgdc.gov)

– International Spatial Data Standards Commission, which is a clearing house and gateway for international standards

– Open Geospatial Consortium (OGC), which is developing interoperability standards. Web Map Service (WMS) standards are an example.


GIS Certification

• What kind of certification is available?

• Two primary options:

– Geographic Information Systems Professional (GISP) is based on your work and volunteering experience.

– ESRI Technical Certifications are test based.

• The third option is a university based certification.



The Geospatial Competency Model



GIS Professional Certification

URISA is the founding member of the GIS Certification Institute, the organization that administers professional certification for the field and is dedicated to advancing the industry.

The minimum number of points needed to become a certified GIS Professional, as detailed in the three point schedules, is 150 points. Thus, all applicants are expected to document achievements valued at a minimum of 150 points. To ensure that applicants have a broad foundation, specific minimums in each of the three achievement categories must be met or exceeded. These minimums are as follows:

• Education: 30 points

• Experience: 60 points

• Contributions: 8 points

The additional 52 points can be counted from any of the three categories.



A Sample of University Certificates

• UMM – undergraduate

• USM – undergraduate/graduate

• UM – graduate

• Penn State

• University of Denver

• University of Southern California

• George Mason University


Spatial Data Standards

• Data – measurements and observations

• Data quality – a measure of the fitness for use of data for a particular task (Chrisman, 1994).

• It is the responsibility of the user to ensure that the data are fit for the task.

• Metadata – data about the data


Spatial Data Standards

• Spatial Data Standards – methods for structuring, describing and delivering spatially-referenced data.

• Media Standards – the physical form of the data (CD/download etc).

• Format Standards – specify data file components and structures. These standards aid in data transfer.

• Spatial Data Accuracy Standards – document the quality of the positional and attribute accuracy.

• Document Standards – define how we describe spatial data.


GIS Is Not Perfect

A GIS cannot perfectly represent the world for many reasons, including:

• The world is too complex and detailed.

• The data structures or models (raster, vector, or TIN) used by a GIS to represent the world are not discriminating or flexible enough.

• We make decisions (how to categorize data, how to define zones) that are not always fully informed or justified, and are always biased.

• It is impossible to make a perfect representation of the world, so uncertainty is inevitable

• Uncertainty degrades the quality of a spatial representation


Concepts Related to Data Quality

• Related to individual data sets:
– Errors – flaws in data.
– Accuracy – the extent to which an estimated value approaches the true value.
– Precision – the recorded level of detail of your data.
– Bias – the systematic variation of the data from reality.
  • Personal bias
  • Instrument bias
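As a rough illustration of how two of these concepts can be quantified, the sketch below computes the bias (mean signed error) and an accuracy measure (root-mean-square error) for repeated measurements of a known value. The function name, the choice of RMSE as the accuracy measure, and the sample readings are assumptions made for this example.

```python
import math

def bias_and_rmse(measured, true_value):
    """Return (bias, rmse) for repeated measurements of a known true value."""
    errors = [m - true_value for m in measured]
    # Bias: the systematic part of the error, i.e. the mean signed error.
    bias = sum(errors) / len(errors)
    # RMSE: one common summary of how far the estimates fall from the truth.
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse

# Made-up elevation readings (metres) against an assumed true value of 8850 m.
readings = [8852.0, 8853.0, 8851.0, 8852.0]
bias, rmse = bias_and_rmse(readings, 8850.0)
print(f"bias = {bias:.2f} m, RMSE = {rmse:.2f} m")
```

Every reading here is too high, so the bias is positive; an instrument with random error only would give a bias near zero while still having a non-zero RMSE.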



Concepts Related to Data Quality

• Related to source data:
– Resolution – the smallest feature in the data set that can be displayed.
– Generalization – simplification of objects in the real world to produce scale models and maps.


Resolution and generalization of raster datasets


Figure 10.3 Scale-related generalization


Data Sets Used for Analysis

Must be:
– Complete – spatially and temporally
– Compatible – same scale, units of measure, measurement level
– Consistent – both within and between data sets
– Applicable for the analysis being performed


Sources of Error (Uncertainty) in GIS


A Conceptual View of Uncertainty

[Diagram: Real World → Conception → Source Data, Measurements & Representation → Data Conversion and Analysis → Result, with error propagation from stage to stage]


Uncertainty in The Conception of Geographic Phenomena

Many spatial objects are not well defined or their definition is to some extent arbitrary, so that people can reasonably disagree about whether a particular object is x or not. There are at least four types of conceptual uncertainty:

• Spatial uncertainty
• Vagueness
• Ambiguity
• Regionalization problems

Spatial uncertainty

• Spatial uncertainty occurs when objects do not have a discrete, well-defined extent.

• They may have indistinct boundaries.

• They may have impacts that extend beyond their boundaries.

• They may simply be statistical entities.

• The attributes ascribed to spatial objects may also be subjective.

Vagueness (obscureness)

• Vagueness occurs when the criteria that define an object as x are not explicit or rigorous.

• For example:

– In a land cover analysis, how many oaks (or what proportion of oaks) must be found in a tract of land to qualify it as oak woodland?

– What incidence of crime (or resident criminals) defines a high crime neighborhood?


Ambiguity

Ambiguity occurs when y is used as a substitute, or indicator, for x because x is not available.

• The link between direct indicators and the phenomena for which they substitute is straightforward and fairly unambiguous.

• Indirect indicators tend to be more ambiguous and opaque.

• Of course, indicators are not simply direct or indirect; they occupy a continuum. The more indirect they are, the greater the ambiguity.

Regionalization problems

• Regional geography is largely founded on the creation of a mosaic of zones that make it easy to portray spatial data distributions.

• A uniform zone is defined by the extent of a common characteristic, such as climate, landform, or soil type.

• Functional zones are areas that delimit the extent of influence of a facility or feature, for example, how far people travel to a shopping center or the geographic extent of support for a football team.

• Regionalization problems occur because zones are artificial.


Uncertainty in the measurement of geographic phenomena

Error occurs in physical measurement of objects. This error creates further uncertainty about the true nature of spatial objects.

• Physical measurement error
• Digitizing error
• Error caused by combining data sets with different lineages


Physical measurement error

Instruments and procedures used to make physical measurements are not perfectly accurate. For example, a survey of Mount Everest might find its height to be 8,850 meters, with an accuracy of plus or minus 5 meters.

• In addition, the earth is not a perfectly stable platform from which to make measurements. Seismic motion, continental drift, and the wobbling of the earth's axis cause physical measurements to be inexact. (Examples: GPS error, remote sensing error.)


Digitizing error

• A great deal of spatial data has been digitized from paper maps.

• Digitizing, or the electronic tracing of paper maps, is prone to human error.

– Lines may be drawn too far, not far enough, or missed entirely. Errors caused by digitizing mistakes can be partially, but not completely, fixed by software.

– Additional error occurs because adjacent data digitized from different maps may not align correctly. This problem can also be partially corrected through a software technique called rubbersheeting.


Digitizing Error

Any digitized map requires:

• Considerable post-processing
– Check for missing features
– Connect lines
– Remove spurious polygons

• Some of these steps can be automated


Error caused by combining data sets with different lineages

• Data sets produced by different agencies or vendors may not match because different processes were used to capture or automate the data.

– For example, buildings in one data set may appear on the opposite side of the street in another data set.

– Error may also be caused by combining sample and population data or by using sample estimates that are not robust at fine scales.

– "Lifestyle" data are derived from shopping surveys and provide business and service planners with up-to-date socioeconomic data not found in traditional data sources like the census. Yet the methods by which lifestyle data are gathered and aggregated to zones or are compared to census data may not be scientifically rigorous.


Uncertainty in the representation of geographic phenomena

• Representation is closely related to measurement.

• Representation is not just an input to analysis, but sometimes also the outcome of it. For this reason, we consider representation separately from measurement.

– The world is infinitely complex, but computer systems are finite.

– Representation is all about the choices that are made in capturing knowledge about the world.

• Uncertainty in earth model: ellipsoid models, datum, projection types

• Uncertainty in the raster data model (structure)

• Uncertainty in the vector data model (structure)

Uncertainty in the raster data structure

• The raster structure partitions space into square cells of equal size (also called pixels).

• Spatial objects x, y, and z emerge from cell classification, in which Cell A1 is classified as x, Cell A2 as y, Cell A3 as z, and so on, until all cells are evaluated.

• A spatial object x can be defined as a set of contiguous cells classified as x.

• Commonly, a cell is not purely one thing or another, but might contain some x, some y, and maybe a bit of z within its area.

• These impure cells are termed mixed pixels or "mixels."

• Because a cell can hold only one value, a mixel must be classified as if it were all one thing or another. Therefore, the raster structure may distort the shape of spatial objects.


Error in raster

• Because of the distortions due to flattening, cells in a raster can never be perfectly equal in size on the Earth's surface.

• When information is represented in raster form, all detail about variation within cells is lost, and instead the cell is given a single value. Common rules for assigning that value are largest share, central point (e.g., USGS DEM), and mean value (e.g., remote sensing imagery).

[Figure: three example cells, each covering areas with values 8 and 6, evaluated by the largest share, central point, and mean value rules]

Mean value for the three example cells:

8 × (1/6) + 6 × (5/6) = 6.33
8 × (3/4) + 6 × (1/4) = 7.5
8 × (1/7) + 6 × (6/7) = 6.29
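The largest share and mean value rules can be sketched in a few lines of Python. Each cell is described here as a mapping from value to the fraction of the cell it covers; that representation and the function names are assumptions made for this example, and the fractions are the ones used in the mean value arithmetic above.

```python
def largest_share(fractions):
    """Assign the cell the value that covers the largest part of its area."""
    return max(fractions, key=fractions.get)

def mean_value(fractions):
    """Assign the cell the area-weighted mean of the values inside it."""
    return sum(value * share for value, share in fractions.items())

# Three example cells: {value: fraction of the cell covered by that value}.
cells = [
    {8: 1 / 6, 6: 5 / 6},
    {8: 3 / 4, 6: 1 / 4},
    {8: 1 / 7, 6: 6 / 7},
]

for cell in cells:
    print(largest_share(cell), round(mean_value(cell), 2))
```

The central point rule is not shown because it depends on where the cell centre falls, which the area fractions alone do not record.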


Figure 10.8 Problems with remotely sensed imagery: (left) example of a satellite image with cloud cover (A), shadows from topography (B), and shadows from cloud cover (C); (right) an urban area showing a building leaning away from the camera. Source: Ian Bishop (left) and Google UK (right)

Uncertainty in the vector data structure

• Socioeconomic data (facts about people, houses, and households) are often best represented as points.

• For various reasons (to protect privacy, to limit data volume), data are usually aggregated and reported at a zonal level, such as census tracts or ZIP Codes.

• This distorts the data in two ways:

– First, it gives them a spatially inappropriate representation (polygons instead of points);

– Second, it forces the data into zones whose boundaries may not respect natural distribution patterns.


Map representation error

| Map scale | Ground distance, accuracy, or resolution (corresponding to 0.5 mm map distance) |
|---|---|
| 1:1,250 | 0.625 m |
| 1:2,500 | 1.25 m |
| 1:5,000 | 2.5 m |
| 1:10,000 | 5 m |
| 1:24,000 | 12 m |
| 1:50,000 | 25 m |
| 1:100,000 | 50 m |
| 1:250,000 | 125 m |
| 1:1,000,000 | 500 m |
| 1:10,000,000 | 5 km |
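Each row of the table follows from multiplying the 0.5 mm map distance by the scale denominator. A minimal sketch of that conversion, with a hypothetical function name:

```python
def ground_distance_m(scale_denominator, map_distance_mm=0.5):
    """Ground distance in metres represented by a map distance at a given scale."""
    # Map millimetres times the scale denominator gives ground millimetres.
    return scale_denominator * map_distance_mm / 1000.0

for denominator in (1_250, 24_000, 250_000, 10_000_000):
    print(f"1:{denominator:,} -> {ground_distance_m(denominator)} m")
```

The same function can be used to ask how much ground detail is lost for any other line width or scale, for example a 0.25 mm line on a 1:50,000 map.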


Uncertainty in the data conversion and analysis of geographic phenomena

Uncertainties in data lead to uncertainties in the results of analysis. Data conversion and spatial analysis methods can create further uncertainty:

• Data conversion error
• Georeferencing and resampling
• Projection and datum conversions
• The ecological fallacy
• The modifiable areal unit problem (MAUP)
• Classification errors


• The ecological fallacy: the mistake of assuming that an overall characteristic of a zone is also a characteristic of any location or individual within the zone.

• The Modifiable Areal Unit Problem (MAUP): the results of data analysis are influenced by the number and sizes of the zones used to organize the data. The Modifiable Areal Unit Problem has at least three aspects:

1. The number, sizes, and shapes of zones affect the results of analysis.

2. The number of ways in which fine-scale zones can be aggregated into larger units is often great.

3. There are usually no objective criteria for choosing one zoning scheme over another.
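Both effects can be seen in a toy example. The sketch below aggregates the same eight made-up values, arranged along a line, into two different zoning schemes; the data, the one-dimensional layout, and the function name are assumptions chosen only to keep the illustration small.

```python
def zone_means(values, zone_sizes):
    """Mean of each zone when consecutive values are grouped by zone_sizes."""
    means, start = [], 0
    for size in zone_sizes:
        zone = values[start:start + size]
        means.append(sum(zone) / len(zone))
        start += size
    return means

# Eight hypothetical household values along a street.
values = [1, 1, 1, 9, 9, 1, 1, 1]

print(zone_means(values, [4, 4]))     # two zones of four households
print(zone_means(values, [3, 2, 3]))  # three zones with different boundaries
```

With two zones both means are 3.0 and the street looks uniform; with three zones the means are 1.0, 9.0, and 1.0 and a distinct cluster appears (the MAUP). In the first scheme no household actually has the value 3, so reading the zone mean as a property of each household would be an ecological fallacy.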

Ecological Fallacy Example

Source: http://www.gistutor.com/concepts/24-intermediate-concept-tutorials/57-ecological-fallacy-in-gis.html

MAUP Example

Source: http://www.indiana.edu/~gisci/courses/g438/lectures/gis_census.html


Classification error and quality check

Selecting ROIs

[Figure: regions of interest (ROIs) selected for the classes Alfalfa, Cotton, Grass, and Fallow]

Classification Result

Background: ETM+, 7/15/01

Top image: IKONOS, Oct. 2000


Confusion Matrix

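Rows are classification results; columns are ground truth.

| | Grass | Alfalfa | Cotton | Chili | Fallow (corn) | Total | User accuracy (%) |
|---|---|---|---|---|---|---|---|
| Grass | 110 | 22 | 0 | 0 | 0 | 132 | 83.3 |
| Alfalfa | 5 | 105 | 0 | 0 | 0 | 110 | 95.5 |
| Cotton | 0 | 0 | 945 | 5 | 0 | 950 | 99.5 |
| Chili | 0 | 0 | 50 | 42 | 0 | 92 | 45.7 |
| Fallow | 0 | 0 | 0 | 0 | 484 | 484 | 100 |
| Total | 115 | 127 | 995 | 47 | 484 | 1768 | |
| Producer accuracy (%) | 95.6 | 82.7 | 95.0 | 89.4 | 100 | | |

The diagonal (correctly classified pixels) sums to 1686.

$$\text{Overall accuracy} = \frac{1686}{1768} = 95.4\%$$

$$\text{Kappa index} = \frac{1686 - (132 \times 115 + 110 \times 127 + 950 \times 995 + 92 \times 47 + 484 \times 484)/1768}{1768 - (132 \times 115 + 110 \times 127 + 950 \times 995 + 92 \times 47 + 484 \times 484)/1768} = 92.4\%$$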
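The accuracy measures can be recomputed directly from the pixel counts. The sketch below is a minimal Python version, assuming rows hold the classification results and columns hold the ground truth; the function and variable names are illustrative.

```python
classes = ["Grass", "Alfalfa", "Cotton", "Chili", "Fallow"]
# Rows: classification results; columns: ground truth (pixel counts).
matrix = [
    [110, 22, 0, 0, 0],
    [5, 105, 0, 0, 0],
    [0, 0, 945, 5, 0],
    [0, 0, 50, 42, 0],
    [0, 0, 0, 0, 484],
]

def accuracy_report(m):
    """Return (user, producer, overall, kappa) for a square confusion matrix."""
    n = sum(sum(row) for row in m)
    row_totals = [sum(row) for row in m]
    col_totals = [sum(col) for col in zip(*m)]
    diagonal = [m[i][i] for i in range(len(m))]
    user = [d / r for d, r in zip(diagonal, row_totals)]          # per labeled class
    producer = [d / c for d, c in zip(diagonal, col_totals)]      # per true class
    overall = sum(diagonal) / n
    # Agreement expected by chance, from the row and column totals.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n
    kappa = (sum(diagonal) - chance) / (n - chance)
    return user, producer, overall, kappa

user, producer, overall, kappa = accuracy_report(matrix)
for name, u, p in zip(classes, user, producer):
    print(f"{name:8s} user {u:6.1%}  producer {p:6.1%}")
print(f"overall {overall:.1%}, kappa {kappa:.3f}")
```

With these counts the sketch gives an overall accuracy of about 95.4% and a kappa of about 0.924; the Alfalfa row yields a user accuracy of about 95.5% (105 of 110 pixels).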

Bases of Confusion Matrix

• Producer accuracy is a measure indicating the probability that the classifier has labeled an image pixel into Class A given that the ground truth is Class A.

• User accuracy is a measure indicating the probability that a pixel is Class A given that the classifier has labeled the pixel into Class A.

• Overall accuracy is total classification accuracy.

• Kappa index (another parameter for overall accuracy) is a more useful index for evaluating accuracy.

– Errors of commission represent pixels that belong to another class but are labeled as belonging to the class.

– Errors of omission represent pixels that belong to the ground truth class but that the classification technique has failed to classify into the proper class.


Error Propagation

[Diagram: Real World → Conception → Measurement & Representation → Data Conversion and Analysis → Result, with error propagation from stage to stage]

• The errors in the input will propagate to the output of the operation.

• Error propagation measures the impacts of error (uncertainty) in data on the results of GIS operations.


Finding and Modeling Errors

• Checking for errors
– Visual inspection during data editing and cleaning.
– Attributes can be checked by using annotation, line colors and patterns.
– Double digitizing.
– Statistical analysis may identify extreme values of attributes.
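One simple statistical check for extreme attribute values is the interquartile range rule, sketched below. The slides do not name a specific method, so the rule, the multiplier of 1.5, the function name, and the sample values are all assumptions for illustration.

```python
import statistics

def flag_extremes(values, k=1.5):
    """Return values lying more than k interquartile ranges outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical attribute column, e.g. building heights in metres,
# where 120 is likely a keying error for 12.
heights = [12, 14, 13, 15, 14, 13, 120]
print(flag_extremes(heights))
```

A flagged value is only a candidate error; it still has to be checked against the source, since genuine extremes do occur.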


Finding and Modeling Errors

• Error modeling
– 1. Epsilon modeling
  • Based on a method of line generalization, and adapted by Blakemore.
  • It places an error band around a digitized line, describing the probable distribution of error.
  • Error distribution is subject to debate:
    – Normal curve
    – Piecewise quartile distribution
    – Bimodal
  • The epsilon band can be used in analyses to improve the confidence of the user in the result.
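A minimal sketch of how an epsilon band might be used in practice: a point that falls within distance epsilon of a digitized line cannot be assigned confidently to either side of it. The fixed-width band, the function names, and the coordinates are assumptions for this example; the sketch does not model the error distribution inside the band.

```python
import math

def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_epsilon_band(p, line, epsilon):
    """True if p lies inside the band of half-width epsilon around the polyline."""
    return any(distance_to_segment(p, a, b) <= epsilon
               for a, b in zip(line, line[1:]))

boundary = [(0, 0), (10, 0), (10, 10)]  # hypothetical digitized line
print(within_epsilon_band((5, 1), boundary, epsilon=2))  # inside the band
print(within_epsilon_band((5, 5), boundary, epsilon=2))  # clearly away from the line
```

Points inside the band are the ones whose containment is uncertain; points outside it can be classified with more confidence.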


Figure 10.17 Point-in-polygon categories of containment. Source: Blakemore (1984)


Finding and Modeling Errors

• Error modeling
– 2. Monte Carlo simulation – used in overlays.
  • Simulates input data error by adding random noise to the line coordinates of the map data.
  • Each input is assumed to be characterized by an estimate of positional error.
  • This changes the shape of the line.
  • The process is repeated multiple times and the randomized data put through the GIS analyses.
  • Output:
    – A number
    – A map
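The procedure can be sketched with a much simpler analysis than an overlay: the area of one polygon. The sketch below adds Gaussian noise to every vertex, recomputes the area on each run, and summarizes the spread. The Gaussian error model, the positional error of 1 m, the polygon, and the function names are assumptions for this example.

```python
import random
import statistics

def polygon_area(vertices):
    """Planar polygon area from the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def simulate_area(vertices, sigma, runs=500, seed=42):
    """Perturb each vertex with N(0, sigma) noise and collect the resulting areas."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    areas = []
    for _ in range(runs):
        noisy = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                 for x, y in vertices]
        areas.append(polygon_area(noisy))
    return statistics.mean(areas), statistics.stdev(areas)

# Hypothetical 100 m x 100 m parcel with an assumed positional error of 1 m.
parcel = [(0, 0), (100, 0), (100, 100), (0, 100)]
mean_area, sd_area = simulate_area(parcel, sigma=1.0)
print(f"area = {mean_area:.0f} m^2, standard deviation = {sd_area:.0f} m^2")
```

Here the output is "a number" (an area with its spread); producing "a map" would mean keeping the perturbed geometries and summarizing them cell by cell or feature by feature.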


Figure 10.18 Simulating effects of DEM error and algorithm uncertainty on derived stream networks


Managing GIS Error

• To manage errors we must track and document them.

• The concepts introduced earlier (accuracy, precision, resolution, generalization, bias, compatibility, completeness, and consistency) provide a checklist of quality indicators.

• These should be documented for each data layer.


Managing GIS Error

• Data quality information can be used to create a data lineage.

• A data lineage is a record of the data history that presents essential information about the development of the data.

• This becomes the metadata.


Living with uncertainty

• Uncertainty is inevitable and easier to find.

• Use metadata to document the uncertainty.

• Use sensitivity analysis to find the impacts of input uncertainty on output.

• Rely on multiple sources of data.

• Be honest and informative in reporting the results of GIS analysis.

• The US Federal Geographic Data Committee lists five components of data quality: attribute accuracy, positional accuracy, logical consistency, completeness, and lineage (for details see www.fgdc.gov).


Basics of FGDC

• Federal Geographic Data Committee (FGDC) metadata answers the who, what, where, when, how and why questions of geospatial data.

• The data structure and elements defined for FGDC metadata are described fully in the “Content Standard for Digital Geospatial Metadata” (CSDGM).


SEVEN SECTIONS OF FGDC

The Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) organizes a metadata record into seven main sections:
– Identification Information
– Data Quality Information
– Spatial Data Organization Information
– Spatial Reference Information
– Entity and Attribute Information
– Distribution Information
– Metadata Reference Information
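A completeness check against the seven sections can be sketched as follows. The record is modeled as a plain Python dictionary keyed by section name, which is an assumption for this example; it is not the CSDGM encoding itself, and the sample record is made up.

```python
CSDGM_SECTIONS = [
    "Identification Information",
    "Data Quality Information",
    "Spatial Data Organization Information",
    "Spatial Reference Information",
    "Entity and Attribute Information",
    "Distribution Information",
    "Metadata Reference Information",
]

def missing_sections(record):
    """List the main sections that are absent or empty in a metadata record."""
    return [name for name in CSDGM_SECTIONS if not record.get(name)]

# Hypothetical, partly filled-in record for a roads layer.
record = {
    "Identification Information": {"title": "County roads", "scale": "1:24,000"},
    "Data Quality Information": {"positional_accuracy_m": 12},
    "Spatial Reference Information": {},
}
print(missing_sections(record))
```

A check like this only confirms that each section is present; whether the content of a section is adequate still needs the questions listed for that section.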

Identification Information

• What is the name of the dataset?
• What is the subject or theme of the information included?
• What is the scale of the dataset?
• What are the attributes of the dataset?
• Where is the geographic location of the dataset?
• Who developed the dataset?
• Who provided the source material for the dataset?
• Who will publish the dataset?
• When were the features of the dataset identified?
• How are the features of the dataset depicted?
• Why was the dataset created?
• Are there restrictions on accessing or using the data?
• Are external files available that are related to the dataset?

Source: http://www.maine.gov/megis/policies/megisfgdc.rtf


Data Quality Information

• How reliable are the data?
• What are their limitations or inconsistencies?
• What is the positional and attribute accuracy?
• Is the dataset complete?
• Were the consistency and content of the data verified?
• Where can the sources of the data be located?
• What processes were applied to these sources and by whom?


Spatial Data Organization

• What spatial data model was used to encode the spatial data?

• How many and what kind of spatial objects are included in the dataset?

• Are methods other than coordinates, such as street addresses, used to encode locations?


Spatial Reference

• Are coordinate locations encoded using longitude and latitude?

• What map projection is used?

• What horizontal datum and/or vertical datum are used?

• What parameters should be used to convert the data to another coordinate system?


Entity and Attribute Information

• What geographic information (roads, houses, elevation, temperature, etc.) is described?

• How is this information coded?

• What do the codes mean?

• What source was used for defining the attributes or codes (e.g., the Cowardin classification)?


Distribution

• From whom can the data be obtained?

• What formats are available?

• What media are available?

• Are the data available online?

• What is the price of the data?


Metadata Reference

• When were the metadata compiled, and by whom?

• When was the metadata record created?

• Who is the responsible party?

• When were the metadata last updated?