Federal Big Data Working Group Meetup
Dr. Brand Niemann
Director and Senior Data Scientist
Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup
September 8, 2014
Mission Statement
• Federal: Supports the Federal Big Data Initiative, but is not endorsed by the Federal Government or its Agencies;
• Big Data: Supports the Federal Digital Government Strategy, which is "treating all content as data", so big data = all your content;
• Working Group: Data Science Teams composed of Federal Government and Non-Federal Government experts producing big data products (How was the data collected, Where is it stored, What are the results, and Does the data story persuade?); and
• Meetup: The world's largest network of local groups to revitalize local community and help people around the world self-organize, like the MOOCs (Massive Open Online Courses) being considered by the White House to reduce the cost of higher education.
Co-organizers: Brand Niemann and Katherine Goodier
What Are We Doing?
• Leadership of the Semantic Data Science Team that produced Semantic Medline running on the Yarc Data Graph Appliance.
• Founding and co-organizing of the Federal Big Data Working Group Meetup.
• A graduate class prepared for GMU entitled “Practical Data Science for Data Scientists”.
• Using the Cross Industry Standard Process for Data Mining (CRISP-DM; Shearer, 2000) to build a Data Science Knowledge Base.
• Mining of the Data Science and Digital Earth scientific journals for the CODATA International Workshop on Big Data for International Scientific Programmes, June 8-9, in Beijing.
• Participation in the Data FAIRport (Findable, Accessible, Interoperable, and Reusable) with “Data Publication in Data Browsers”.
• Providing data stories that persuade and presentation materials for public education conferences like the COM.BigData Conference, August 4-6, in Washington, DC.
How Are We Doing It?
• Federating Use Cases: Data Science (Brand Niemann); Environmental and Earth Science (Joan Aron); and Astronomy (Kirk Borne)
• Federating Data Publications: Structured Scientific Content (Papers, journals, books, reports, etc.); Data FAIRports (Findable, Accessible, Interoperable, and Reusable); and Data Stories That Persuade (Claims and Evidence)
• Federating Solutions & Technologies: Hand-Crafted by Individuals and Teams (Mary Galvin, STEM); Data Mining Standards and Products (Brand Niemann, Data Publications in Data Browsers); Machine Processing (Fredrik Salvesen, Semantic Data Publications on Yarc Data Graph Appliance); Reading and Reasoning (Katherine Goodier and Chuck Rehberg, Semantic Insights on Elsevier Content Text Mining); and Data Curation at Scale (Alan Wagner, Tamr on 1000s of Spreadsheets)
NIH Data Commons
Dr. Phil Bourne (7/30/2014): Rules, Credit/Not Money, & More Offline
http://semanticommunity.info/Data_Science/Data_Science_for_RDA#Slide_50_The_Power_of_the_Commons
My Note: Registries, Repositories, Clearinghouses, Portals, GitHubs, Data Commons, & Data FAIRports to MindTouch and Spotfire
• The Fourth Paradigm of Science (1):
– First Paradigm. Observation, descriptions of natural phenomena, and experimentation.
– Second Paradigm. Theoretical science such as Newton’s laws of motion and Maxwell’s equations.
– Third Paradigm. Simulation and modelling, such as in astronomy.
– Fourth Paradigm. Data-intensive science that exploits the large volumes of data in new ways for scientific exploration, such as the International Virtual Observatory Alliance in astronomy.
• The Fourth Question of Big Data for Science (2):
– How was the data collected?
– Where is the data stored?
– What are the data results?
– Does the data story persuade?
(1) Bell, G., Hey, T., & Szalay, A. (2009) Beyond the data deluge, Science 323, 6 March 2009, pp. 1297-1298.
(2) de Waard, Anita (2014) About Stories, that Persuade With Data, Federal Big Data Working Group Meetup, 20 May, 41 slides.
August 11th SilverLine Metro More Ontology Experts (Baclawski, Guerino, Morosoff, & Goodier)
• How Was the Meetup?
– If you could perfect your meeting A/V, it would be even more awesome! Nevertheless, lots on enterprise ontology on the way for archivists everywhere, including LOC, SharePoint experts, etc.
– Another good meeting with more pieces of the Big Data puzzle being placed on the table and related to each other.
– Fuse was a bust, but the presentations were great! Thanks Brand and Katherine!
– Brand was a Star at the Comstar conference!
• We Listen and Respond:
– We are involved in two research collaborations (one with Columbia Univ. and the other with Harvard) that are investigating the use of Semantic MEDLINE and literature-based discovery to elucidate statistical correlations found in the EHR. (Tom Rindflesch)
http://www.meetup.com/Federal-Big-Data-Working-Group/events/199040882/
Current Activities
• Ongoing analytics with OpenFDA data for Dr. Taha Kass-Hout, FDA’s first Chief Health Informatics Officer (CHIO), August 11:
– Keynote at AFCEA Bethesda’s Health IT Day, December 2, Bethesda North Marriott Hotel and Conference Center.
• Follow-up meeting with Bob Chadduck and Fouad Ramia on the September 8th Joint Meetup with NSF and Professor Alex Szalay on the JHU DIBBs Project, August 19th:
– Big Data Science for Astronomy use case (ontology, graph computing and SciDB.org) with Professor Borne.
• Follow-up meeting with Professor Jens Pohl and Peter Morosoff, August 20:
– Your pioneering work in trying to convince various branches of the Government to gain control of their data and exploit the information that can be extracted from the data is quite remarkable and certainly inspirational.
• The Federal Trade Commission, Big Data: A Tool for Inclusion or Exclusion?, September 15, Washington, DC:
– This workshop is free and open to the public. Registration will begin at 8:00 a.m. A live webcast of the workshop will also be available on the day of the event. The submission deadline for pre-workshop comments is August 15, 2014, but the comment period will be held open until October 15, 2014.
• 2014 IEEE International Conference on Big Data, October 27-30, Washington, DC:
– Submitted Paper and NIST Workshop Proposals.
DGI’s Annual Big Data Conference, October 9, Washington, DC Reagan Building
• Session title: Challenges and Solutions for Big Data in the Public Sector
• Moderator: Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community, and Co-organizer, Federal Big Data Working Group Meetup
• Panelists:
– Dr. Kirk Borne, Professor of Astrophysics and Computational Science, George Mason University
– Dr. Tom Rindflesch, Information Research Specialist at Cognitive Science Branch, National Institutes of Health (NIH)
http://www.digitalgovernment.com/Events/Conferences/Government-Big-Data-Conference--Expo.shtml
• October 6: Wolfram Language (Invited) and Michael Daconta, Build a Knowledge Base with my (experimental) software EzKb
• November 3: Georgetown Massive Data Institute (Invited)
• December 1: NSF GEO/EarthCube and ICER (Integrative and Collaborative Education & Research)
FASTER
• Faster Administration of Science and Technology Education and Research (FASTER) Community of Practice (CoP):
– FASTER’s goal is to enhance collaboration and accelerate agencies’ adoption of advanced IT capabilities developed by Government-sponsored IT research. FASTER hosts Expedition and Emerging Technology workshops as well as monthly meetings with invited guest speakers to achieve this goal.
– NITRD created FASTER for Federal agency CIOs and/or their advanced technology specialists. FASTER seeks to accelerate deployment of promising research technologies; share protocol information, standards, and best practices; and coordinate and disseminate technology assessment and testbed results. The Federal CIO Council under the leadership of the Office of Management and Budget (OMB) coordinates the use of IT systems. NITRD coordinates federally supported IT research under the leadership of OSTP (with OMB participation). FASTER, supported by the NITRD NCO, communicates with OMB and the Federal CIO Council concerning IT R&D matters that are of general interest to Federal agencies.
– FASTER is responding to the Open Government Directive by using the technologies of the Social Data Web (e.g., Linked Open Data and the Semantic Web).