Hamish James Statistics New Zealand Open data and data curation

Outline

1. Setting the scene

2. Open data

3. How open data and data curation are related


Quick definitions

[Diagram: information (structured or unstructured; digital or analogue); data; open data; data curation]


Defining data

Data consists of sets of structured values that can be organised, analysed and manipulated by a software application or some other means of calculation. This includes data collected directly through surveys and administrative systems, as well as data created or compiled by aggregating or reanalysing other sources. A defining characteristic of data is that it is machine-readable.
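To illustrate what "machine-readable" means in practice, here is a minimal sketch that parses a small table of structured values and aggregates it with Python's standard `csv` module. The column names and figures are invented for illustration; they are not Statistics New Zealand data.

```python
import csv
import io
from collections import defaultdict

# Invented example table: every row has the same named fields, so software
# can organise and analyse the values without a person interpreting them.
RAW_CSV = """region,year,count
North,2006,120
North,2013,135
South,2006,80
South,2013,95
"""


def totals_by_region(text: str) -> dict[str, int]:
    """Sum the 'count' column for each region in a CSV string."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(text)):
        totals[row["region"]] += int(row["count"])
    return dict(totals)


if __name__ == "__main__":
    print(totals_by_region(RAW_CSV))  # {'North': 255, 'South': 175}
```

The same values embedded in a scanned image or a prose report would hold the same information, but a program could not organise or analyse them this way.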


Open data, data curation

Open data is a philosophy based on the idea that data is more valuable if more people can use it, and that technology has made the cost of sharing data negligible

Data curation is a field of research and work focusing on the long-term management of data, built on the argument that the opportunity cost of losing data is high

Open data highlights benefits

Data curation worries about costs


[Diagram: data, knowledge, value]


Focus of open data activities

• Data collected and held by governments

• Data collected or generated through publicly funded research

• http://wiki.opengovdata.org/index.php?title=OpenDataPrinciples


Reasons to make data open

• The underlying purposes of making publicly funded data more accessible are to:
  • inform decision making by government, businesses and communities
  • increase transparency and accountability in government decision making
  • assist informed participation by the public in government decision making
  • promote economic development through the innovative application of data collected for one purpose to other tasks
  • gain greater value from research data


Barriers to reuse of government data

• Agency culture (reluctance or hostility to data sharing)
• Funding constraints
• Ensuring data confidentiality
• Shared ownership
• Poor dissemination practices


Open Government Data Principles

• Government data shall be considered open if it is made public in a way that complies with the principles below:

1. Complete. All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.

2. Primary. Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.

3. Timely. Data is made available as quickly as necessary to preserve the value of the data.

4. Accessible. Data is available to the widest range of users for the widest range of purposes.

5. Machine processable. Data is reasonably structured to allow automated processing.

6. Non-discriminatory. Data is available to anyone, with no requirement of registration.

7. Non-proprietary. Data is available in a format over which no entity has exclusive control.

8. License-free. Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.
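A few of these principles can be checked mechanically from a description of a dataset. The following is a minimal sketch, assuming a hypothetical metadata record with `file_format`, `requires_registration` and `licence` fields; the field names and the short format lists are illustrative assumptions, and principles such as Complete, Primary and Timely still need human judgement.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative format lists only (assumptions, not an official register).
# "xls" is structured, so machine processable, but the format is controlled
# by a single vendor, so it is not treated as non-proprietary here.
STRUCTURED_FORMATS = {"csv", "json", "xml", "xls"}
NON_PROPRIETARY_FORMATS = {"csv", "json", "xml"}


@dataclass
class DatasetRecord:
    """Hypothetical metadata describing one published dataset."""
    name: str
    file_format: str
    requires_registration: bool
    licence: Optional[str]  # None: no copyright, patent or similar terms attached


def check_principles(record: DatasetRecord) -> dict[str, bool]:
    """Evaluate only the principles that can be tested from metadata alone."""
    fmt = record.file_format.lower()
    return {
        "machine processable": fmt in STRUCTURED_FORMATS,
        "non-discriminatory": not record.requires_registration,
        "non-proprietary": fmt in NON_PROPRIETARY_FORMATS,
        "license-free": record.licence is None,
    }


if __name__ == "__main__":
    example = DatasetRecord("example-table", "XLS", False, None)
    print(check_principles(example))
    # {'machine processable': True, 'non-discriminatory': True,
    #  'non-proprietary': False, 'license-free': True}
```

A check like this can only flag obvious gaps; a "yes" on every line does not by itself make a dataset open.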


Characteristics of open data

Open data:
• Free and open access to the data
• Freedom to redistribute the data
• Freedom to reuse the data
• No restriction of the above based on who someone is (e.g. their nationality) or their field of endeavour (e.g. commercial or non-commercial)

cf. http://www.okfn.org/about/


Creative Commons licence conditions

• Attribution
• Share-alike
• No derivative works
• Non-commercial
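These four conditions combine into the six standard Creative Commons licences: Attribution is part of every one, Non-commercial is optional, and a licence can carry Share-alike or No derivative works but not both. The sketch below enumerates those combinations and flags which fit the characteristics of open data on the earlier slide. Treating NC and ND as incompatible with openness follows the Open Knowledge Foundation's Open Definition, under which CC BY and CC BY-SA qualify as open; the function names are hypothetical.

```python
from itertools import product


def cc_licences() -> list[str]:
    """List the six standard Creative Commons licence codes."""
    codes = []
    # Attribution (BY) is in every licence; Non-commercial (NC) is optional;
    # Share-alike (SA) and No derivative works (ND) cannot be combined.
    for non_commercial, extra in product([False, True], [None, "SA", "ND"]):
        conditions = ["BY"]
        if non_commercial:
            conditions.append("NC")
        if extra is not None:
            conditions.append(extra)
        codes.append("CC " + "-".join(conditions))
    return codes


def is_open(code: str) -> bool:
    """True if the licence allows reuse (no ND) by anyone for any purpose (no NC)."""
    conditions = code.removeprefix("CC ").split("-")
    return "NC" not in conditions and "ND" not in conditions


if __name__ == "__main__":
    print(cc_licences())
    # ['CC BY', 'CC BY-SA', 'CC BY-ND', 'CC BY-NC', 'CC BY-NC-SA', 'CC BY-NC-ND']
    print([code for code in cc_licences() if is_open(code)])
    # ['CC BY', 'CC BY-SA']
```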


Linked data

• Linked data uses semantic web approaches (especially RDF) to describe data and make it accessible to machines – a web of linked data

• RDF ‘triples’ are used to describe things:
  • Subject – predicate – object
  • Hamish – is a – presenter
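The triple idea can be sketched without any RDF library: below, each statement is a plain (subject, predicate, object) tuple, and a query is a pattern in which `None` matches anything. The `http://example.org/` URIs and the `worksFor` predicate are placeholders invented for this sketch; `RDF_TYPE` is the standard RDF "is a" predicate. A real linked-data publication would use an RDF toolkit, shared vocabularies and dereferenceable URIs.

```python
from typing import Optional

Triple = tuple[str, str, str]

# rdf:type is the standard RDF predicate for "is a".
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # placeholder namespace, invented for this sketch

GRAPH: set[Triple] = {
    (EX + "Hamish", RDF_TYPE, EX + "Presenter"),  # Hamish is a presenter
    (EX + "Hamish", EX + "worksFor", EX + "StatisticsNZ"),
    (EX + "StatisticsNZ", RDF_TYPE, EX + "GovernmentAgency"),
}


def match(graph: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return triples matching the pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    # Everything the graph says about Hamish:
    for triple in match(GRAPH, s=EX + "Hamish"):
        print(triple)
```

Because subjects and objects are shared identifiers, a second dataset that uses the same URI for Statistics New Zealand could be merged into this set and queried together with it, which is the "web of linked data" in miniature.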


Linking Open Data dataset cloud


What is missing?


46


46: Age, in years, of a person, as at 7 March 2006 (Census 2006)

Data needs context
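One way to keep that context attached to a value is to publish it alongside the number. The sketch below bundles the value 46 with the annotations from the slide in a small dataclass; the field names are hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """A value bundled with the context needed to interpret it (hypothetical fields)."""
    value: int
    variable: str          # what was measured
    unit: str              # unit of measure
    population_unit: str   # what kind of thing the value describes
    reference_date: date   # the "as at" date
    source: str            # collection that produced the value

    def describe(self) -> str:
        when = f"{self.reference_date.day} {self.reference_date:%B %Y}"
        return (f"{self.variable} of a {self.population_unit}: {self.value} "
                f"{self.unit}, as at {when} ({self.source})")


if __name__ == "__main__":
    age = Observation(46, "Age", "years", "person", date(2006, 3, 7), "Census 2006")
    print(age.value)       # 46 on its own says nothing about what it measures
    print(age.describe())  # Age of a person: 46 years, as at 7 March 2006 (Census 2006)
```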


Examples

“Which town or city in the UK has the highest proportion of students?”

“Which town or city in the UK is home to one or more university campuses whose registered full or part time (non-distance) students divided by the local population gives the largest percentage?”

http://digitalcuration.blogspot.com/2010/03/linked-data-and-reality.html
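The second, more precise question can be written directly as a calculation, which makes each of its assumptions explicit. The sketch below assumes hypothetical inputs for each town: whether it has a university campus, the number of registered full- or part-time non-distance students, and the local population. The town names and figures are invented placeholders, not real UK statistics.

```python
from dataclasses import dataclass


@dataclass
class Town:
    """Hypothetical inputs for the precisely worded question."""
    name: str
    has_university_campus: bool
    students: int          # registered full/part-time, non-distance students
    local_population: int


def student_percentage(town: Town) -> float:
    """Students divided by local population, as a percentage."""
    return 100 * town.students / town.local_population


def highest_student_share(towns: list[Town]) -> Town:
    """Town with the largest percentage among those with a university campus.

    Raises ValueError if no town has a campus.
    """
    candidates = [t for t in towns if t.has_university_campus]
    return max(candidates, key=student_percentage)


if __name__ == "__main__":
    # Invented placeholder figures, not real UK data.
    towns = [
        Town("Town A", True, 20_000, 100_000),
        Town("Town B", True, 9_000, 30_000),
        Town("Town C", False, 0, 5_000),
    ]
    best = highest_student_share(towns)
    print(best.name, student_percentage(best))  # Town B 30.0
```

Each field is a decision the short version of the question leaves open: which students count, which population is the denominator, and which towns qualify.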


Re/use of data depends on:
• Technology to render it: hardware, formats, software
• Documentation to explain it: standards, meaning, interpretation


[Diagram: data, knowledge, value; technology to render data; documentation to explain it]


What is missing? Context

• Data is not self-describing

• Who provides the description?

• What does it cost to provide the description?

• How much of the description is held as tacit knowledge?
  • Expert’s personal knowledge
  • Rules and meaning encoded into the data and software


Data curation

• Data curation involves:
  • Data management
  • Adding value to data
  • Data sharing for re-use
  • Data preservation for later re-use

http://www.dcc.ac.uk/news/what-makes-data-curation
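The slide names preservation without saying how it is done. As one illustration (not drawn from the talk), a routine preservation task is fixity checking: record a checksum when data is deposited and recompute it later to detect silent corruption or unintended change. The sketch below uses SHA-256 from Python's standard `hashlib`; the function names are hypothetical, and a real repository would also store, log and schedule these checks.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute a file's SHA-256 digest, reading in chunks to handle large files."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str) -> bool:
    """True if the file still matches the checksum recorded at deposit."""
    return sha256_of(path) == recorded_digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_file = Path(tmp) / "dataset.csv"
        data_file.write_bytes(b"region,count\nNorth,120\n")
        recorded = sha256_of(data_file)        # stored at deposit time
        print(fixity_ok(data_file, recorded))  # True
        data_file.write_bytes(b"region,count\nNorth,121\n")  # simulated change
        print(fixity_ok(data_file, recorded))  # False
```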


Digital Curation Centre


DDI Alliance


Open data brings benefits and risks

Open data means more users, which:
• highlights data curation failures
• justifies data curation costs
• brings pressure for more user support
• expands the expert community
• increases the risk of poor analysis


Complementary ideas

• Actively curated data will:
  • Remain technologically accessible
  • Be easier to understand (and therefore use)

• Data curation will benefit from data being made more open:
  • Data that is in active use tends to remain usable
  • Widely used data is better understood than isolated data


Thank you

Hamish James

Manager, Information Management

[email protected]

04 931 4237