Open DATA METI: All Content As Big Data
Dr. Brand Niemann
Director and Senior Enterprise Architect – Data Scientist
Semantic Community
http://semanticommunity.info/
AOL Government Blogger
http://gov.aol.com/bloggers/brand-niemann/
March 15, 2013
http://semanticommunity.info/A_Japan_METI_Open_Data_Dashboard/Open_DATA_METI
– Does this deal with the data elements themselves in the data sets, so you can search for data elements that you want to integrate with other data elements and find their definitions (metadata) to know if they are the same or similar enough to be semantically integrated?
• Answer from John Erickson, Director, Web Science Operations, Tetherless World Constellation (RPI):
– No. DCAT deals with the initial problems of where dataset catalogs and datasets themselves are from and what they contain. Loosely speaking, it does for catalogs and datasets what Dublin Core did for publications: it provides a succinct vocabulary that providers can rely on for describing their datasets, and consumers can rely on for finding. DCAT has already been used as the basis for the schema.org "datasets" extension as a way to make discovery of datasets easier using popular search engines.
– Articulating the actual vocabularies used in published datasets is waaaay beyond the scope of DCAT, in part because DCAT is not restricted to datasets published as linked data. Some work including http://healthdata.tw.rpi.edu are looking at ways to communicate standard vocabularies used in published linked data...
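To make the scope of the answer above concrete, here is a minimal sketch of a DCAT-style dataset description, written as a JSON-LD-like Python dictionary. The property names (`dct:title`, `dcat:keyword`, `dcat:distribution`, `dcat:downloadURL`, `dcat:mediaType`) come from the DCAT and Dublin Core vocabularies; the helper name `make_dcat_dataset` and the example.org dataset are hypothetical and used only for illustration. Note that the record says where the dataset is and what it is about, but nothing about the data elements (columns) inside it.

```python
import json


def make_dcat_dataset(title, description, keywords, download_url, media_type):
    """Build a minimal DCAT-style dataset description (illustrative helper)."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
        # A distribution describes one downloadable form of the dataset.
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": download_url,
                "dcat:mediaType": media_type,
            }
        ],
    }


# Hypothetical dataset; the URL is a placeholder, not a real catalog entry.
record = make_dcat_dataset(
    title="Example energy statistics",
    description="Placeholder description of a published spreadsheet.",
    keywords=["energy", "statistics"],
    download_url="http://example.org/data/energy.csv",
    media_type="text/csv",
)

print(json.dumps(record, indent=2))
```

The description stops at the catalog level: deciding whether a column in this file matches a column in another file still requires metadata that DCAT does not carry.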
All the work with Data Catalogs does not really help with data integration.
"The data warehouse does what it does well and is not going to go anywhere. But it is not architected very well for the future. Our job, as IT, revolves entirely around one thing -- data integration."
– Introduced as OMB Chief of Data Analytics & Reporting at the Big Data Technology Symposium, March 13, 2013.
– Said “new Digital Government Strategy is treating all content as data.”
– Dominic Sale joined OMB’s Office of E-Government and Information Technology in 2008 as a portfolio manager for several government-wide IT initiatives. At OMB, Dominic played a lead role in implementing and operating major initiatives such as the IT Dashboard, and he is currently heavily involved in implementing the Federal CIO’s 25-Point IT Management Reforms. Prior to arriving at OMB, Dominic began his Federal career as a program analyst in the OCIO at the Department of Transportation. In his prior life as a contractor at both BAE Systems and BearingPoint, Dominic managed EA, capital planning and security initiatives at DOL, NLRB, FDA, and Census. He has also worked on a variety of federal programs, at agencies such as the IRS, US Postal Service, US Mint, US Patent and Trademark Office, and the National Park Service.
– All the work with Data Catalogs does not really help with data integration.
– Big Data Spells New Architecture.
– Big Data is the new software.
– New Digital Government Strategy is treating all content as data.
• The Open DATA METI Data Catalog has been turned into data in spreadsheets and statistical visualizations in Spotfire.
• This simplifies the complex WordPress & CKAN interface, which requires many extra mouse clicks and provides no faceted search.
• Google Chrome provides Japanese language translation of the metadata, but not of the data columns in the spreadsheets.
• This process provides the beginning of a Unified Data Architecture and Ecosystem for Data Integration using the View Data function in Spotfire 5.
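The idea of turning a data catalog into data can be sketched as follows. This is a minimal illustration, not the process actually used for the Open DATA METI dashboard: it assumes catalog entries shaped like CKAN package records (`name`, `title`, and a `resources` list with `format` and `url`), and the sample entries, URLs, and function names are hypothetical. Each resource becomes one spreadsheet row, and a simple count by field gives the kind of facet the catalog interface lacks.

```python
import csv
import io
from collections import Counter

# Hypothetical catalog entries in a CKAN-like shape (not real METI records).
SAMPLE_PACKAGES = [
    {
        "name": "energy-white-paper",
        "title": "Energy White Paper",
        "resources": [
            {"format": "xls", "url": "http://example.org/energy.xls"},
            {"format": "pdf", "url": "http://example.org/energy.pdf"},
        ],
    },
    {
        "name": "industry-statistics",
        "title": "Industry Statistics",
        "resources": [
            {"format": "XLS", "url": "http://example.org/industry.xls"},
        ],
    },
]

FIELDS = ["dataset", "title", "format", "url"]


def flatten_packages(packages):
    """Turn nested catalog entries into flat rows, one per resource."""
    rows = []
    for pkg in packages:
        for res in pkg.get("resources", []):
            rows.append({
                "dataset": pkg.get("name", ""),
                "title": pkg.get("title", ""),
                # Normalize case so "xls" and "XLS" land in the same facet.
                "format": res.get("format", "").upper(),
                "url": res.get("url", ""),
            })
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text that a spreadsheet tool can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def facet_counts(rows, field):
    """Count rows per distinct value of one field (a simple facet)."""
    return Counter(row[field] for row in rows)


rows = flatten_packages(SAMPLE_PACKAGES)
csv_text = to_csv(rows)
format_facets = facet_counts(rows, "format")

print(csv_text)
print(dict(format_facets))
```

The resulting CSV can be loaded into a spreadsheet or a visualization tool; translating or integrating the data columns inside the linked files is a separate step that this sketch does not cover.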