Digital Cholera

Peter Wells - @peterkwells

OpenTech - June 2015

Cholera

The last time I was in this building I went to a talk on an early example of data analysis and data visualisation.

John Snow famously traced a fatal cholera epidemic in Soho in 1854 to a local water pump.

Because of cholera in the pump the water was not safe to use.

Read more about John Snow: http://en.wikipedia.org/wiki/John_Snow_%28physician%29

Cholera and infrastructure

The Soho outbreak started at a water pump, but it could have been a water reservoir.

The cholera bacteria would spread and contaminate the water downstream. An entire set of water infrastructure could have been contaminated.

The water would not have been safe to use. Yet water is essential to life.

Image CC-BY-2.0 by Woodley Wonderworks: https://www.flickr.com/photos/wwworks/

Safe water

As a society we invest in water infrastructure. We have:

- inspections
- alerting systems
- purification
- education

We put more focus at the top of the infrastructure, on water producers and distributors, than we do on water users.

The goal is to make water that’s safe for people to use.

A Doctor from the World Health Organisation

Get to the digital...

Open Addresses

Organisations have to buy lists of UK addresses. Licensing is complicated, the quality isn’t great, and the data doesn’t meet all the needs.

It’s hard to build new services.

Open Addresses explored whether it was possible to build a new UK address list, to make things simpler and make addresses more widely used.

Addressing needs

Denmark saw a 1000% increase in the number of organisations using address data after making that data simpler to use.

We discovered other needs and benefits:

- people who move into new houses need their addresses to be published faster
- people name their houses and need other people to know about it
- people need it to be easier to enter addresses on websites
- (I could go on…)

More and better services that would make life a little bit easier

Getting addresses

As well as understanding the needs we had to find data.

There are 26-40m addresses in the UK.

The Land Registry publishes over 18 million addresses in the Price Paid Dataset. Sounds great!

Aside: we also did some neat stuff on mathematical inference for addresses. Check out www.openaddressesuk.org...

Land Registry says no...

Image from Owen Boswarva: http://mapgubbins.tumblr.com/post/107499166390/it-was-all-a-dream-land-registrys-price-paid

Third Party Rights are complex and can be fatal

Address datasets can include third-party database rights:

1. if the data was directly copied from an existing address database
2. if an existing list of addresses (obtained through another route) was corrected or validated based on an existing address database

Unauthorised use of third party rights creates risk for both data publishers and consumers.

The service can simply… stop.

Third party rights, they’re everywhere!

As we inspected other datasets we saw similar issues with unauthorised rights:

- websites for data capture that used third party address products
- datasets that had been cleansed with third party address products
- a clean website followed by automated back-end validation

Even with submission guidelines, provenance tracking and takedown policies, the legal position for Open Addresses was really complex.

We made a :(

Lightbulb

It is complicated to determine if unauthorised third party rights exist. You need to inspect the data and how it was produced.

Image by Richard Rutter: https://www.flickr.com/photos/clagnut/

Safe water - a reprise

As a society we invest in water infrastructure:

- inspections
- alerting systems
- purification
- education

We put more focus at the top of the infrastructure, on water producers and distributors, than we do on water users.

The goal is to make water that’s safe for people to use.

Image CC-BY-2.0 by Woodley Wonderworks: https://www.flickr.com/photos/wwworks/

A Doctor from the World Health Organisation

Digital cholera

Copyright is a good thing (don’t believe me? ask a musician) so I’m using a harsh metaphor, but the metaphor is useful.

Don’t take away my copyright!

Digital cholera

The water may be infected with cholera.

Therefore we inspect it to see if the water is safe to use.

Land Registry address data may be infected with digital cholera.

Therefore we inspect it to see if the data is safe to use.

We learnt it wasn’t, so we didn’t….

Digital cholera

Not just about unauthorised third party rights.

Inappropriate releases of personal data.

Incomplete data.

Incorrect data.

Remember it’s a metaphor.

Digital cholera

Can we learn more from how society learnt to deal with cholera in water?

Alerting system?

We’ve told Land Registry of the problem(s).

We’ve published articles to alert others.

We’re here.

Should this be better?

Purification?

Tricky. There is no equivalent of a purification tablet.

We need to cleanse data infrastructure of digital cholera or we need to rebuild it.

It is simplest if the data is kept pure by whoever creates and maintains it.

Just as with water.

Education

The ODI already have a wealth of education material and are including the thinking and learning from Open Addresses in some future work:

Send more of your ideas here: http://theodi.org/who-owns-our-data-infrastructure?

Water is essential to life so we invest in maintaining our water infrastructure to make water safe to use.

Data gives us more and better services. It is essential to life. We need to invest in maintaining useful data infrastructure to make data safe to use.

Image by Don Graham: https://www.flickr.com/photos/23155134@N06/

If we don’t look after our data infrastructure we risk simply ending up with some rusty and unused data pumps….
