Digital Cholera Peter Wells - @peterkwells OpenTech - June 2015
Aug 08, 2015
Cholera
The last time I was in this building I went to a talk on an early example of data analysis and data visualisation.
John Snow famously traced a fatal cholera epidemic in Soho in 1854 to a local water pump.
Because of cholera in the pump the water was not safe to use.
Read more about John Snow: http://en.wikipedia.org/wiki/John_Snow_%28physician%29
Cholera and infrastructure
The Soho outbreak started at a water pump, but it could have been a water reservoir.
The cholera bacteria would spread and contaminate the water downstream. An entire set of water infrastructure could have been contaminated.
The water would not have been safe to use. Yet water is essential to life.
Image CC-BY-2.0 by Woodley Wonderworks: https://www.flickr.com/photos/wwworks/
Safe water
As a society we invest in water infrastructure. We have:
- inspections
- alerting systems
- purification
- education
We put more focus at the top of the infrastructure, on water producers and distributors, than we do on water users.
The goal is to make water that’s safe for people to use.
A Doctor from the World Health Organisation
Get to the digital...
Open Addresses
Organisations have to buy lists of UK addresses. Licensing is complicated, the quality isn’t great, and the data doesn’t meet all the needs.
It’s hard to build new services.
Open Addresses explored whether it was possible to build a new UK address list, to make things simpler and make addresses more widely used.
Addressing needs
Denmark had a 1000% increase in the organisations that use address data by making address data simpler to use.
We discovered other needs and benefits:
- people who move into new houses need their addresses to be published faster
- people name their houses and need other people to know about it
- people need it to be easier to enter addresses on websites
- (I could go on…)
More and better services that would make life a little bit easier
Getting addresses
As well as understanding the needs, we had to find data.
There are 26-40m addresses in the UK.
The Land Registry publishes over 18 million addresses in the Price Paid Dataset. Sounds great!
Aside: we also did some neat stuff on mathematical inference for addresses. Check out www.openaddressesuk.org...
Land Registry says no...
Image from Owen Boswarva: http://mapgubbins.tumblr.com/post/107499166390/it-was-all-a-dream-land-registrys-price-paid
Third Party Rights are complex and can be fatal
Address datasets can include third-party database rights:
1. if the data was directly copied from an existing address database
2. if an existing list of addresses (obtained through another route) was corrected or validated based on an existing address database
Unauthorised use of third party rights creates risk for both data publishers and consumers.
The service can simply... stop.
Third party rights, they’re everywhere!
As we inspected other datasets we saw similar issues with unauthorised rights:
- websites for data capture that used third party address products
- datasets that had been cleansed with third party address products
- a clean website followed by automated back-end validation
Even with submission guidelines, provenance tracking and takedown policies the legal position for Open Addresses was really complex.
We made a :(
Lightbulb
It is complicated to determine if unauthorised third party rights exist. You need to inspect the data and how it was produced.
Image by Richard Rutter: https://www.flickr.com/photos/clagnut/
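To make "inspect the data and how it was produced" concrete, here is a minimal sketch in Python. It is not how Open Addresses did it and it is certainly not a legal test. It assumes each dataset carries a recorded list of provenance steps, and it flags the two cases described earlier: data copied from an existing address database, and data corrected or validated against one. The names `Step`, `Dataset`, `risky_steps` and `is_safe_to_use` are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Step(Enum):
    """Hypothetical labels for how a dataset was produced."""
    COLLECTED_DIRECTLY = "collected_directly"
    COPIED_FROM_THIRD_PARTY_DB = "copied_from_third_party_db"
    VALIDATED_AGAINST_THIRD_PARTY_DB = "validated_against_third_party_db"


# Assumption: these are the steps that may bring third-party database
# rights with them, mirroring the two cases listed in the talk.
RISKY = {
    Step.COPIED_FROM_THIRD_PARTY_DB,
    Step.VALIDATED_AGAINST_THIRD_PARTY_DB,
}


@dataclass
class Dataset:
    name: str
    provenance: List[Step] = field(default_factory=list)


def risky_steps(dataset: Dataset) -> List[Step]:
    """Return the recorded steps that may carry third-party rights."""
    return [step for step in dataset.provenance if step in RISKY]


def is_safe_to_use(dataset: Dataset) -> bool:
    """Treat a dataset as safe only if its provenance is known and clean.

    A dataset with no recorded provenance is treated as unsafe,
    because we cannot inspect how it was produced.
    """
    return bool(dataset.provenance) and not risky_steps(dataset)


survey = Dataset("doorstep_survey", [Step.COLLECTED_DIRECTLY])
web_form = Dataset(
    "web_form",
    [Step.COLLECTED_DIRECTLY, Step.VALIDATED_AGAINST_THIRD_PARTY_DB],
)
print(survey.name, is_safe_to_use(survey))
print(web_form.name, is_safe_to_use(web_form))
```

The hard part in practice is the input, not the check: someone has to find out, for every dataset, which steps actually happened.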
Safe water - a reprise
As a society we invest in water infrastructure:
- inspections
- alerting systems
- purification
- education
We put more focus at the top of the infrastructure, on water producers and distributors, than we do on water users.
The goal is to make water that’s safe for people to use.
Image CC-BY-2.0 by Woodley Wonderworks: https://www.flickr.com/photos/wwworks/
A Doctor from the World Health Organisation
Digital cholera
Copyright is a good thing (don’t believe me? ask a musician) so I’m using a harsh metaphor, but the metaphor is useful.
Don’t take away my copyright!
Digital cholera
The water may be infected with cholera.
Therefore we inspect it to see if the water is safe to use.
Land Registry address data may be infected with digital cholera.
Therefore we inspect it to see if the data is safe to use.
We learnt it wasn’t, so we didn’t…
Digital cholera
Not just about unauthorised third party rights.
Inappropriate releases of personal data.
Incomplete data.
Incorrect data.
Remember it’s a metaphor.
Digital cholera
Can we learn more from how society learnt to deal with cholera in water?
Alerting system?
We’ve told Land Registry of the problem(s).
We’ve published articles to alert others.
We’re here.
Should this be better?
Purification?
Tricky. There is no equivalent of a purification tablet.
We need to cleanse data infrastructure of digital cholera or we need to rebuild it.
It is simplest if the data is kept pure by whoever creates and maintains it.
Just as with water.
Education
The ODI already have a wealth of education material and are including the thinking and learning from Open Addresses in some future work:
Send your ideas and find out more here: http://theodi.org/who-owns-our-data-infrastructure?
Water is essential to life so we invest in maintaining our water infrastructure to make water safe to use.
Data gives us more and better services. It is essential to life. We need to invest in maintaining useful data infrastructure to make data safe to use.
Image by Don Graham: https://www.flickr.com/photos/23155134@N06/
If we don’t look after our data infrastructure we risk simply ending up with some rusty and unused data pumps….