Why Should You Trust My Data? building data infrastructure that accommodates networks of trust Matt Zumwalt datjawn.com | databindery.com @flyingzumwalt code{4}lib 2016

Why should you trust my data code4lib 2016

Jan 26, 2017


Technology

flyingzumwalt
Transcript
  • Why Should You Trust My Data?

    building data infrastructure that accommodates networks of trust

    Matt Zumwalt

    datjawn.com | databindery.com

    @flyingzumwalt

    code{4}lib 2016

    http://datjawn.com

    http://databindery.com

  • I'm interested in trust.

  • I'm interested in trust, particularly trust & trustworthiness

    when people exchange data

  • there's a rhythm to the computing world

    centralization ↔ decentralization

    client-server ↔ peer-to-peer

  • mainframes

    personal computers

    server farms

    [internet of everything]

    the cloud

    the PC revolution

    computers

    the diamond age

  • remember mainframes?

  • image credit wikipedia

    https://en.wikipedia.org/wiki/UNIVAC#/media/File:UnivacII.jpg

  • the www

  • host data

    reference each other

  • but data

  • image credit Torkild Retvedt

    https://www.flickr.com/photos/torkildr/3462606643

  • $$

    $$

    $$

    $

  • By 2019 the data created by IoE devices alone will be 49 times higher than all the traffic that moved through datacenters in 2014.

    it won't scale.

    Reference: Cisco Global Cloud Index

    http://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/Cloud_Index_White_Paper.html

  • Worldwide Storage Capacity in 2012: 2.5 zettabytes

    Total Data Center Traffic in 2016: 10.4 zettabytes per year

    Anticipated data created by Internet of Everything (IoE) devices in 2019:

    507.5 zettabytes per year

    References: NetApp, Cisco Global Cloud Index, gigaom, Washington Post

    http://siliconangle.com/blog/2012/05/21/when-will-the-world-reach-8-zetabytes-of-stored-data-infographic/

    http://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/Cloud_Index_White_Paper.html

    https://gigaom.com/2012/05/30/heres-what-our-web-addiction-looks-like-in-2016/

    https://www.washingtonpost.com/blogs/ezra-klein/post/how-big-can-the-internet-get/2012/05/30/gJQAu9OH2U_blog.html
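
    As a quick sanity check on the scale argument, the figures quoted on this slide can be compared directly. This is only arithmetic on the numbers as they appear here; the variable names are illustrative and not from the cited reports.

```python
# Figures as quoted on the slide, in zettabytes (ZB).
storage_capacity_zb = 2.5       # worldwide storage capacity, 2012
datacenter_traffic_zb = 10.4    # total data center traffic per year
ioe_data_zb = 507.5             # anticipated IoE data created per year, 2019

# How many times larger is the IoE figure than each of the others?
traffic_ratio = ioe_data_zb / datacenter_traffic_zb
storage_ratio = ioe_data_zb / storage_capacity_zb

print(f"IoE data vs data center traffic: about {traffic_ratio:.1f}x")
print(f"IoE data vs 2012 storage capacity: about {storage_ratio:.0f}x")
```

    The first ratio rounds to the "49 times" quoted on the previous slide.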

  • distributed data web

    "You can't propose that something be a universal space and at the same time keep control of it." - Tim Berners-Lee

    http://webfoundation.org/about/vision/history-of-the-web/

  • this relies on trust

  • elements of trustworthiness

    authority & reputation

    integrity & provenance

    synergy or compatibility

    consistency

    etc.

  • we've got this

    Organisms have been solving these problems for eons

    Humans for millennia

    Librarians for centuries

    Software developers for decades

  • git for (tabular) data

    transparency & reproducibility

    http://datjawn.com builds from the work of http://dat-data.com

    Tabular: rows & columns (i.e. spreadsheets, CSV, SQL DBs)


  • history has branches

  • initial commit

    a set of changes

    commit those changes and describe them

    Who made the changes? Why did they make them?

    When did they commit them?
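
    The who / why / when questions above map naturally onto a commit record. The sketch below is a minimal illustration of that idea and not dat's or git's actual storage format: the `Commit` class, its field names, and the sample rows are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class Commit:
    parent: Optional[str]  # id of the previous commit; None for the initial commit
    changes: list          # hypothetical change records, e.g. rows added
    author: str            # who made the changes
    message: str           # why they made them
    timestamp: str         # when they committed them (ISO 8601)

    def commit_id(self) -> str:
        # The id is a hash of the full record, so altering any field
        # (or the parent it points to) yields a different id.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


initial = Commit(
    parent=None,
    changes=[{"op": "add", "row": {"id": 1, "name": "Ada"}}],
    author="alice",
    message="initial commit",
    timestamp="2016-03-08T10:00:00Z",
)
second = Commit(
    parent=initial.commit_id(),
    changes=[{"op": "add", "row": {"id": 2, "name": "Grace"}}],
    author="bob",
    message="add a second row",
    timestamp="2016-03-08T11:30:00Z",
)
print(second.parent == initial.commit_id())
```

    Because each commit names its parent by hash, the history forms a chain that can later fork into branches.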

  • more changes

    commit those changes

  • different changes committed to a different branch

  • other changes on another branch

  • merge two branches

  • get a specific version

    prove it's identical

    know who made it
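
    One common way to prove that a retrieved version is identical to the published one is to compare content hashes. The sketch below assumes SHA-256 and uses hypothetical helper names (`content_id`, `verify`); it covers only the integrity check. Knowing who made a version would additionally need something like a digital signature, which is not shown.

```python
import hashlib
import hmac


def content_id(data: bytes) -> str:
    """Identify a version by the SHA-256 hash of its bytes."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_id: str) -> bool:
    """Return True only if the received bytes hash to the expected id."""
    return hmac.compare_digest(content_id(data), expected_id)


published = b"id,name\n1,Ada\n"
version_id = content_id(published)  # shared alongside the dataset

received_ok = b"id,name\n1,Ada\n"
received_bad = b"id,name\n1,Eve\n"
print(verify(received_ok, version_id))
print(verify(received_bad, version_id))
```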

  • Files are data. They have histories.

    Metadata are data. They have histories too.

    Whatever the data, the same patterns apply.

  • How does this get replicated?

  • client-server approach

  • peer to peer approach
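
    A rough sketch of what peer-to-peer replication can look like when objects are named by their content hash: each peer works out which objects it lacks and pulls only those, checking every object on arrival. The `Peer` class and its methods are hypothetical and do not describe dat's actual protocol.

```python
import hashlib


class Peer:
    def __init__(self, name: str):
        self.name = name
        self.objects = {}  # content hash -> bytes

    def add(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.objects[key] = data
        return key

    def missing_from(self, other: "Peer") -> set:
        # Hashes the other peer holds that this peer does not.
        return set(other.objects) - set(self.objects)

    def pull(self, other: "Peer") -> int:
        wanted = self.missing_from(other)
        for key in wanted:
            data = other.objects[key]
            # Never trust the sender: recompute the hash before storing.
            if hashlib.sha256(data).hexdigest() != key:
                raise ValueError(f"corrupt object {key}")
            self.objects[key] = data
        return len(wanted)


a, b = Peer("a"), Peer("b")
a.add(b"row 1")
a.add(b"row 2")
b.add(b"row 2")
b.add(b"row 3")

pulled_by_b = b.pull(a)  # b fetches "row 1"
pulled_by_a = a.pull(b)  # a fetches "row 3"
print(pulled_by_b, pulled_by_a, set(a.objects) == set(b.objects))
```

    Neither peer acts as the server here; either side can start the exchange, and both end up with the same set of objects.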

  • the tide has already shifted

  • Stop building server-side applications.

    Assume that data are anywhere and/or everywhere.

    Assume that your software will be run in many places.

    Erase your distinctions between server and client.

    Let data grow branches - build trees (i.e. Merkle DAGs)

    Stop thinking of data as singular.

    Stop thinking of datasets as monolithic.

    Embrace redundancy & replication.

    Understand that trustworthiness and authority are dynamic.

    Broaden your sense of now.

    Appreciate provenance.

    there are no servers, there is only the web
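
    To make "build trees (i.e. Merkle DAGs)" concrete, here is a minimal sketch of a binary Merkle tree, the simplest kind of Merkle DAG. The function names are hypothetical, and duplicating the last hash on odd-sized levels is one convention among several, chosen here only for brevity.

```python
import hashlib


def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(chunks: list) -> str:
    """Hash each chunk, then hash pairs of hashes until one root remains."""
    level = [h(chunk) for chunk in chunks]
    if not level:
        return h(b"")
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels by repeating the last hash
        level = [
            h((level[i] + level[i + 1]).encode("ascii"))
            for i in range(0, len(level), 2)
        ]
    return level[0]


rows = [b"1,Ada", b"2,Grace", b"3,Barbara"]
root = merkle_root(rows)

edited = [b"1,Ada", b"2,Grace", b"3,Margaret"]
print(root == merkle_root(rows))    # same data, same root
print(root == merkle_root(edited))  # any change produces a different root
```

    Two peers holding the same root hash hold the same data, wherever that data happens to be stored.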

  • Meet the dat jawn team on Wednesday

    Matt Zumwalt

    datjawn.com | databindery.com

    @flyingzumwalt

    code{4}lib 2016

    http://datjawn.com

    http://databindery.com