Data Science Publication for NSF Polar Cyberinfrastructure
Dr. Brand Niemann, Director and Senior Data Scientist/Data Journalist, Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup
November 4, 2014


Dec 27, 2015


Marilyn Brown
Transcript
Page 1:

Data Science Publication for NSF Polar Cyberinfrastructure

Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist

Semantic Community
http://semanticommunity.info/

http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup

November 4, 2014

Page 2:

Preface

• Some prep work is already underway (if you scour the Open Science Codefest site you will find some) to prepare some datasets of relevance to the Polar community. We will provide some of this prepared data to interested parties ahead of the workshop in the next few weeks in case folks want to start hacking early. We will tweet under the hashtag #nsfpolardatavis.

Source: Chris Mattmann

Page 3:

Overview

• Build the knowledge base (in MindTouch) and spreadsheet (in Excel) first, which then makes it easier to “storify” the results in the Spotfire (data browser) application.
• Follow the Cross-Industry Standard Process for Data Mining (CRISP-DM):
– 1 Business Understanding (of the Hackathon),
– 2 Data Understanding (by mining the Sessions),
– 3 Data Preparation (by screen scraping and downloading),
– 4 Modeling (enough data for statistical significance?),
– 5 Evaluation (How collected? Where stored? What results? Believe them?), and
– 6 Deployment (Story and Demo).
• The documentation will be in the form of the Data Science Publication for NSF Polar Cyberinfrastructure.
• My goal is to see if I can integrate and federate these multiple data sources.

Page 4:

Data Science for Business: Data Mining Process

Source: Data Science for Business: Chapter 2, 2014

Page 5:

Data Science for NSF Polar Cyberinfrastructure: Knowledge Base

Data Science for NSF Polar Cyberinfrastructure

Page 6:

Possible Data Sets

• Scour the Open Science Codefest:
– https://github.com/NCEAS/open-science-codefest/issues/26
• The Polar Data Catalogue (YES):
– https://polardata.ca/
• BCO-DMO (YES):
– http://www.bco-dmo.org/
• Polar Hub (NO):
– http://polar.geodacenter.org/polarhub/
• The AMRC at University of Wisconsin-Madison (YES):
– ftp://amrc.ssec.wisc.edu/pub/requests/DVPC/
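One way to keep track of these candidate sources while integrating them is a small registry. The sketch below is illustrative only: the `Source` class, the `usable` field, and the `usable_by_protocol` helper are hypothetical names, and reading the YES/NO notes as "usable / not usable" is an assumption. The Open Science Codefest entry is left out because the slide gives it no YES/NO note.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Source:
    name: str
    url: str
    usable: bool  # assumption: the slide's YES/NO note means "usable data found"


# Names, URLs, and YES/NO notes copied from the slide.
SOURCES = [
    Source("Polar Data Catalogue", "https://polardata.ca/", True),
    Source("BCO-DMO", "http://www.bco-dmo.org/", True),
    Source("Polar Hub", "http://polar.geodacenter.org/polarhub/", False),
    Source(
        "AMRC at University of Wisconsin-Madison",
        "ftp://amrc.ssec.wisc.edu/pub/requests/DVPC/",
        True,
    ),
]


def usable_by_protocol(sources):
    """Group the names of usable sources by URL scheme (http, https, ftp)."""
    grouped = {}
    for source in sources:
        if source.usable:
            scheme = urlparse(source.url).scheme
            grouped.setdefault(scheme, []).append(source.name)
    return grouped


if __name__ == "__main__":
    for scheme, names in usable_by_protocol(SOURCES).items():
        print(scheme, names)
```

Grouping by protocol is just one possible view; it hints at which sources need an FTP client rather than a web download.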

Page 7:

Open Science Codefest: NASA/NSF/NSIDC Data Sets

• NASA Antarctic Master Directory
– A master directory for Antarctic data sets
– http://gcmd.gsfc.nasa.gov/KeywordSearch/Keywords.do?Portal=amd&KeywordPath=Parameters|CRYOSPHERE&MetadataType=0&lbnode=mdlb2
• NSF ACADIS Gateway
– NSF data repository for arctic/polar data
– https://www.aoncadis.org/home.htm
• NSIDC Arctic Data Explorer
– National Snow and Ice Data Center repository
– http://nsidc.org/acadis/search/

Source: Link to presentation given at Open Science Codefest

Page 8:

Polar Data Catalogue: Home Page

https://polardata.ca/

Page 9:

Polar Data Catalogue: Collections

https://polardata.ca/pdcsearch/

Page 10:

Polar Data Catalogue: Search

https://polardata.ca/pdcsearch/

Page 11:

Polar Data Catalogue: Canadian Lake Ice Database

https://polardata.ca/pdcsearch/PDC_Metadata_Data_Download.ccin?action=downloadPDCData&ccin_ref_number=1821&fileLoc=/pdc/ccin/1821/lakeice/CCIN1821_20030925_CID_BDCG_Ver2003_1.zip

11 MB MDB (Microsoft Access database) file

Page 12:

Polar Data Catalogue: Sea Ice Thickness in Southern Beaufort Sea

https://polardata.ca/pdcsearch/PDC_Metadata_Data_Download.ccin?action=downloadPDCData&ccin_ref_number=11470&fileLoc=/pdc/brea/11470/CCIN11470_20130827_GIS_DATA_Sea_Ice_Thickness_2012.zip

Downloaded 5 files (1.3 MB): 4 text files and 1 ZIP (Shape)
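A quick way to check what a downloaded archive like this contains is to count its members by file type. The following is a minimal sketch using only the Python standard library; `summarize_zip` and `make_demo_zip` are hypothetical helper names, and the demo archive uses placeholder file names rather than the real contents of the Polar Data Catalogue download.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_zip(zip_source):
    """Count archive members by lowercase file extension, skipping folders."""
    with zipfile.ZipFile(zip_source) as archive:
        names = [n for n in archive.namelist() if not n.endswith("/")]
    return Counter(PurePosixPath(n).suffix.lower() for n in names)


def make_demo_zip():
    """Build an in-memory stand-in archive (placeholder names, not real data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for i in range(4):
            archive.writestr(f"notes_{i}.txt", "placeholder text")
        archive.writestr("shapes.zip", b"placeholder bytes")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Pass a path to a real downloaded file instead of the demo buffer.
    print(summarize_zip(make_demo_zip()))
```

`summarize_zip` accepts either a file path or a file-like object, so the same call works on the downloaded ZIP once it is saved locally.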

Page 13:

Polar Data Catalogue: Spreadsheet

http://semanticommunity.info/@api/deki/files/31201/NSFPolarCI.xlsx?origin=mt-web

Canadian Lake Ice Database
Sea Ice Thickness in Southern Beaufort Sea

Page 14:

BCO-DMO

http://www.bco-dmo.org/

Tutorial PDF

Page 15:

Data Access Tutorial 2014 OCB PI Summer Workshop

• How to Submit Data
• Data access: TEXT-BASED SEARCH scenario 1:
– You have a general idea of what you are looking for.
• Data access: MAP BROWSE scenario 2:
– You are interested in data from a particular geographic region.
• Data access: MAP KEYWORD SEARCH scenario 3:
– You are interested in data of a particular type from a particular geographic area.
• Data access: MAP SEMANTIC SEARCH scenario 4:
– You have an idea what you are looking for, but you do not know the Program, Project, or Deployment name.
• Glossary of Terms
• Acknowledgments
• Follow BCO-DMO

http://www.bco-dmo.org/files/bcodmo/OCB-Tutorial.pdf

My Question: Could Spotfire do all of this?

Page 16:

BCO-DMO Datasets

http://www.bco-dmo.org/datasets

Page 17:

BCO-DMO MapServer Geospatial Interface

http://mapservice.bco-dmo.org/mapserver/maps-ol/index.php

Page 18:

Polar Hub: A Global Hub for Polar Data Discovery

http://polar.geodacenter.org/polarhub/

My Question: Where is the Data?

Page 19:

The AMRC at University of Wisconsin-Madison

Index of /pub/requests/DVPC/

Name                 Size     Date Modified          Note
[parent directory]
Ant_IR_area/                  9/30/14, 8:18:00 PM
Ant_IR_netCDF/                9/30/14, 8:37:00 PM
AWS_dat_MAY_2014/             9/30/14, 3:23:00 PM    Text Files: Spotfire?
AWS_q10_MAY_2014/             9/30/14, 3:24:00 PM    Text Files: Spotfire?
AWS_q1h_MAY_2014/             9/30/14, 3:25:00 PM    Text Files: Spotfire?
AWS_q3h_MAY_2014/             9/30/14, 3:27:00 PM    Text Files: Spotfire?
AWS_r_MAY_2014/               9/30/14, 3:22:00 PM    Text Files: Spotfire?
readme.txt           4.6 kB   10/6/14, 6:37:00 PM

ftp://amrc.ssec.wisc.edu/pub/requests/DVPC/
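The directory listing above could also be retrieved and filtered programmatically. This is a rough sketch with Python's standard `ftplib`, assuming the server still accepts anonymous logins and the path is unchanged since 2014. `fetch_names` and `aws_directories` are hypothetical helper names, and the offline demo uses the entry names copied from the slide.

```python
from ftplib import FTP

HOST = "amrc.ssec.wisc.edu"
PATH = "/pub/requests/DVPC/"

# Entry names copied from the slide (trailing slashes dropped), so the
# filtering step can be tried without network access.
SLIDE_NAMES = [
    "Ant_IR_area",
    "Ant_IR_netCDF",
    "AWS_dat_MAY_2014",
    "AWS_q10_MAY_2014",
    "AWS_q1h_MAY_2014",
    "AWS_q3h_MAY_2014",
    "AWS_r_MAY_2014",
    "readme.txt",
]


def aws_directories(names):
    """Keep only the AWS_* entries, which the slide flags as text files."""
    return sorted(name for name in names if name.startswith("AWS_"))


def fetch_names(host=HOST, path=PATH):
    """List the directory over anonymous FTP (needs network; server may have changed)."""
    with FTP(host) as ftp:
        ftp.login()  # anonymous login is an assumption
        ftp.cwd(path)
        return ftp.nlst()


if __name__ == "__main__":
    # Swap SLIDE_NAMES for fetch_names() to query the live server.
    print(aws_directories(SLIDE_NAMES))
```

Keeping the filter separate from the network call means the candidate text-file folders can be picked out from either a live listing or a saved one.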

Page 20:

Data Science for NSF Polar Cyberinfrastructure: Spreadsheet Knowledge Base

http://semanticommunity.info/@api/deki/files/31201/NSFPolarCI.xlsx?origin=mt-web