Richard Akerman NRC-CISTI Presented at Access 2009, Oct. 1, 2009 Will We Command Our Data? From the Petascale to the Personal

Will We Command Our Data? From the Petascale to the Personal

Nov 12, 2014



Richard Akerman

 
Transcript
Page 1: Will We Command Our Data?  From the Petascale to the Personal

Richard Akerman

NRC-CISTI

Presented at Access 2009, Oct. 1, 2009

Will We Command Our Data? From the Petascale to the Personal

Page 2: Will We Command Our Data?  From the Petascale to the Personal

Overview

• Definitions / Assumptions
• How Big is Data?
• Four Sources of Data
• Drivers
• Activities

Page 3: Will We Command Our Data?  From the Petascale to the Personal

Definitions / Assumptions

• Petabyte = 1000 Terabytes
• data = datasets
• “data is”
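The decimal definition on this slide can be made concrete with a small conversion sketch. The unit table and the helper names `to_bytes` and `convert` are illustrative assumptions, not part of the talk; they use decimal (SI) prefixes so that 1 PB = 1000 TB, as stated above.

```python
# Decimal (SI) storage units, matching the slide's "Petabyte = 1000 Terabytes".
# The table and helper names are illustrative assumptions, not from the talk.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}


def to_bytes(value, unit):
    """Convert a quantity expressed in a decimal unit to bytes."""
    return value * UNITS[unit]


def convert(value, from_unit, to_unit):
    """Convert a quantity from one decimal unit to another."""
    return to_bytes(value, from_unit) / UNITS[to_unit]


print(convert(1, "PB", "TB"))  # 1000.0
```

Binary prefixes (1 PiB = 1024 TiB) would give different figures; the sketch follows the slide's decimal convention only.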

Page 4: Will We Command Our Data?  From the Petascale to the Personal

How Big is Data?

http://www.instructables.com/file/FA9N61CF54HJ6GG/

Page 5: Will We Command Our Data?  From the Petascale to the Personal

How Big is Data?

http://www.flickr.com/photos/doctorow/2731870631/

Page 6: Will We Command Our Data?  From the Petascale to the Personal

How Big is Data?

http://en.wikipedia.org/wiki/File:Postduif.jpg

Page 7: Will We Command Our Data?  From the Petascale to the Personal

Four Sources of Data

• Research data
• Government data
• Library data
• Personal data

Page 8: Will We Command Our Data?  From the Petascale to the Personal

General Drivers

• Since 2000, a convergence of factors:
  – Value of sharing
  – Ease of sharing
  – Level of sharing (machine level)

Page 9: Will We Command Our Data?  From the Petascale to the Personal

Specific Drivers: Research Data

• OECD Principles and Guidelines for Access to Research Data from Public Funding (April 2007)

• The Toronto Statement on prepublication data sharing (September 2009)

Page 10: Will We Command Our Data?  From the Petascale to the Personal

OECD Principles

• “Open access to research data from public funding should be easy, timely, user-friendly and preferably Internet-based.”

http://www.flickr.com/photos/ben-zvan-photography/468487548/

Page 12: Will We Command Our Data?  From the Petascale to the Personal

Specific Drivers: Open Government Data

• UK Power of Information Task Force Report (March 2009)
  – Modernise data publishing and reuse http://poit.cabinetoffice.gov.uk/poit/category/data-final/
  – “public information held by for example the police, health bodies and local authorities is often not available. This is bad for democratic expression, the economy and citizen customers.”
• Data.gov (May 2009)
• UK PM Brown meets with Sir Tim Berners-Lee (Sept. 2009)

Page 13: Will We Command Our Data?  From the Petascale to the Personal

Specific Drivers: Library Data

• ILS Customer Bill-of-Rights, John Blyberg (November 2005)

• “Berkeley Accord” (March 2008)

Page 14: Will We Command Our Data?  From the Petascale to the Personal

Specific Drivers: Personal Data

• Wired cover feature “Living by numbers” (July 2009)
  – “Know Thyself: Tracking Every Facet of Life, from Sleep to Mood to Pain, 24/7/365”
  – “Numbers are making their way into the smallest crevices of our lives. We have pedometers in the soles of our shoes and phones that can post our location as we move around town. We can tweet what we eat into a database and subscribe to Web services that track our finances. There are sites and programs for monitoring mood, pain, blood sugar, blood pressure, heart rate, … and prayers.”

Page 15: Will We Command Our Data?  From the Petascale to the Personal

Why Libraries

• Advocates
• Exemplars
• Experts

Page 16: Will We Command Our Data?  From the Petascale to the Personal

Research Data: DataCite

• http://www.datacite.org/
• “DOIs for data”
• “The long term vision of the partnership is to support researchers by providing methods for them to locate, identify, and cite research datasets with confidence.”
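To illustrate what “DOIs for data” could look like in practice, here is a minimal sketch that turns a DOI into a resolver link and a citation string. The record, the placeholder DOI prefix `10.1234`, the function names, and the creator/year/title/publisher/identifier layout are all assumptions made for this example; they are not taken from the talk or from DataCite's own tooling.

```python
from urllib.parse import quote


def doi_url(doi):
    """Return a resolver URL for a DOI via the public doi.org resolver."""
    return "https://doi.org/" + quote(doi, safe="/")


def cite_dataset(creator, year, title, publisher, doi):
    """Format a dataset citation (layout is an assumption for this sketch)."""
    return f"{creator} ({year}): {title}. {publisher}. {doi_url(doi)}"


# Hypothetical dataset record; 10.1234 is a placeholder prefix, not a real one.
citation = cite_dataset(
    "Doe, J.", 2009, "Example Survey Data", "Example Data Centre",
    "10.1234/example-dataset",
)
print(citation)
```

The point of the sketch is that a persistent identifier gives a dataset the same "locate, identify, and cite" handle that articles already have.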

Page 17: Will We Command Our Data?  From the Petascale to the Personal

Research Data: Gateway to Data Sets

• NRC-CISTI, Gateway to (Canadian) Scientific Data Sets

• http://cisti-icist.nrc-cnrc.gc.ca/eng/services/cisti/scientific-data/data-sets/
  – e.g. Canadian Astronomy Data Centre (CADC), Large Synoptic Survey Telescope (LSST)

Page 18: Will We Command Our Data?  From the Petascale to the Personal

Government Data: Canada - Federal

• http://geogratis.cgdi.gc.ca/
• StatsCan Data Liberation Initiative (DLI)
• Ontario Data Documentation, Extraction Service and Infrastructure Initiative (ODESI)
  – “The project will target Statistics Canada datasets... The files will be marked-up using DDI, an international, XML-based metadata tagging system which allows data resource discovery, distributed access, extraction and analysis.”
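As a rough illustration of how XML-based metadata supports discovery and extraction, the sketch below reads a study title and variable labels from a small DDI-Codebook-style fragment. The fragment is written for this example and heavily simplified (real DDI files use XML namespaces and many more elements), and the function name `describe` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, DDI-Codebook-style fragment invented for this example.
# Real DDI documents are namespaced and far richer than this.
SAMPLE = """
<codeBook>
  <stdyDscr>
    <citation><titlStmt><titl>Example Household Survey</titl></titlStmt></citation>
  </stdyDscr>
  <dataDscr>
    <var name="AGE"><labl>Age of respondent</labl></var>
    <var name="PROV"><labl>Province of residence</labl></var>
  </dataDscr>
</codeBook>
"""


def describe(xml_text):
    """Return the study title and a {variable name: label} mapping."""
    root = ET.fromstring(xml_text.strip())
    title = root.findtext("stdyDscr/citation/titlStmt/titl")
    variables = {v.get("name"): v.findtext("labl") for v in root.iter("var")}
    return title, variables


title, variables = describe(SAMPLE)
print(title)
for name, label in variables.items():
    print(f"{name}: {label}")
```

Because the variable names and labels are machine-readable, a catalogue can index them for search without opening the data file itself.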

Page 19: Will We Command Our Data?  From the Petascale to the Personal

Government Data: Municipal - Vancouver

• http://data.vancouver.ca/

Page 20: Will We Command Our Data?  From the Petascale to the Personal

Government Data: Municipal - SF

• San Francisco http://datasf.org/

Page 22: Will We Command Our Data?  From the Petascale to the Personal

APIs vs raw data

• APIs
  – Always serve up latest data
  – Control over access
  – Tracking/stats
  – Advanced/complex functionality on top of the data
• Raw data
  – Unconstrained / can do things never imagined by API
  – Hard to track / version
  – Can lose metadata
  – Allows choice of computing
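The trade-offs listed above can be sketched with a toy example. Everything here is hypothetical: the `DatasetAPI` class, the `raw_dump` helper, and the sample records are invented for illustration and do not describe any real open-data service.

```python
import copy


class DatasetAPI:
    """Toy API front end: serves the live data and counts requests."""

    def __init__(self, records):
        self._records = records   # reference to the publisher's live data
        self.request_count = 0    # simple tracking/stats

    def query(self, **filters):
        """Return live records whose fields equal all the given filters."""
        self.request_count += 1
        return [r for r in self._records
                if all(r.get(k) == v for k, v in filters.items())]


def raw_dump(records):
    """Toy bulk download: an independent snapshot taken at export time."""
    return copy.deepcopy(records)


# Made-up sample records.
live = [{"name": "Park A", "city": "Exampleville"}]
api = DatasetAPI(live)
snapshot = raw_dump(live)

# The publisher updates the data after the dump was taken.
live.append({"name": "Park B", "city": "Exampleville"})

print(len(api.query(city="Exampleville")))  # 2: the API serves the latest data
print(len(snapshot))                        # 1: the raw copy is now out of date
print(api.request_count)                    # 1: the publisher can track usage

# The raw copy allows analysis the API never offered, on any machine.
print(sorted(len(r["name"]) for r in snapshot))
```

The API side stays current and measurable but only answers the questions `query` was designed for; the snapshot is unconstrained but is unversioned and untracked once it leaves the publisher.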

Page 23: Will We Command Our Data?  From the Petascale to the Personal

Personal Data: Daytum

• http://www.daytum.com/

Page 24: Will We Command Our Data?  From the Petascale to the Personal

Personal Data: Total Recall

• http://totalrecallbook.com/ (Sept. 2009)

Page 25: Will We Command Our Data?  From the Petascale to the Personal

Richard Akerman

© 2009 Government of Canada

Licensed under Creative Commons

Thank You

http://creativecommons.org/licenses/by-nc-sa/2.5/ca/