Federal Big Data Working Group Meetup
Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup
March 18, 2014
Mission Statement
• Federal: Supports the Federal Big Data Initiative, but is not endorsed by the Federal Government or its Agencies;
• Big Data: Supports the Federal Digital Government Strategy, which is "treating all content as data", so big data = all your content;
• Working Group: Data Science Teams composed of Federal Government and Non-Federal Government experts producing big data products (How was the data collected, Where is it stored, and What are the results?); and
• Meetup: The world's largest network of local groups to revitalize local community and help people around the world self-organize, like the MOOCs (Massive Open Online Courses) being considered by the White House.
• Dr. George Strawn, Director, NITRD/NCO and co-chair of the Federal Big Data Senior Steering Work Group:
– Public access mandated for "scientific results" supported by the U.S. government
– Federal agencies have submitted their "initial plans" for public access to scientific data to OSTP
– Digital Object Architecture:
  • An "hour glass" for data? (As the Internet was an hour glass for networks: TCP/IP at the narrow point; many applications above, many implementations below)
– One result will be to make the scientific record into a first class scientific object
Agenda
• 6:30 p.m. Tutorials (Proposed GMU Course) and Refreshments
– Continue Data Science Tutorial: Class 4 and Graph Databases and Bigdata SYSTAP Literature Survey of Graph Databases
• 7:00 p.m. Introductions and Announcements (10 seconds per individual depending on the size of the group)
• 7:15 p.m. Featured Presentation/Demonstration (where did you get the data, where did you store the data, and what were your results?)
– Bryan Thompson, Chief Scientist of SYSTAP, LLC will speak about their SYSTAP open source graph database platform. Highlights will include support for highly available replication clusters as well as their recent work with accelerated graph processing on GPUs at 3 billion traversed edges per second.
– See CSHALS 2014: Tech Talk and Poster in Wiki
• 8:30 p.m. Networking/Individual Demos (talk among yourselves and look at one another's work)
• 9:00 p.m. Continue Your Conversations Elsewhere (We need to clear out of the
– Network Analytics and Visualization of Big Data Privacy Workshop Tweets, Dr. Marc A. Smith, Chief Social Scientist, Connected Action Consulting Group, and Remarks by the President on Review of Signals Intelligence, Dr. Kate Goodier, Information Architect, Xcelerate Solutions
• Seventh Meetup: April 15, 6:30 p.m.
– DARPA Big Mechanism, Mike Megginson, Northrop Grumman, and Fredrik Salvesen, YarcData (in planning)
• Eighth Meetup: May 6, 6:30 p.m.
– Federating Big Data for Big Innovation, Dr. Jeanne Holm, Data.gov Evangelist
• Ninth Meetup: May 18, 6:30 p.m.
– The Science Behind Data Science, Ruhollah Farchtchi, Director of Big Data, UNISYS
• 2nd Cloud, SOA, Semantics and Data Science Conference, June (in planning)
Overview
• Practical Data Science for Data Scientists:
– 2/11 Specific Data Science Tools and Applications 1
– Chapters 7 & 8
• Data Science for VIVO & Information Visualization MOOC (no time to cover):
– 7 Weeks of Course Work with Sci2 Tools
– Forming Teams to Work with Clients for Next 7 Weeks
• NodeXL and Sci2 for Data Science (no time to cover):
– NodeXL: A free, open-source template for Microsoft® Excel® that makes it easy to explore network graphs.
– Sci2: A modular tool for science of science research & practice on
• Chapter 7:
– How do companies extract meaning from the data they have? In this chapter we hear from two people with very different approaches to that question: William Cukierski from Kaggle and David Huffaker from Google.
• Chapter 8:
– This is the most difficult chapter in the book for me to teach since I do not understand the Python code at the end and have never built a Recommendation Engine myself. I would welcome some help here.
Present and Discuss Team Homework Exercise
• Get the Data: Go to Yahoo! Finance and download daily data from a stock that has at least eight years of data, making sure it goes from earlier to later. If you don’t know how to do it, Google it.
– Yahoo: http://finance.yahoo.com/q/hp?s=%5EO...torical+Prices (CSV)
– See Spotfire Web Player and File
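The "Get the Data" step above can be sketched in Python, the language used in the book's own code. This is a minimal illustration, not the exercise's official solution. It assumes the downloaded CSV has a `Date` column in `YYYY-MM-DD` format and a `Close` column, and that rows may arrive newest first. The sample data and the function names `load_closes_oldest_first` and `covers_years` are hypothetical and exist only for this example.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the assumed layout (newest row first).
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume
2014-03-14,10.0,10.5,9.8,10.2,1000
2010-06-01,7.0,7.2,6.9,7.1,900
2006-03-13,5.0,5.1,4.9,5.0,800
"""


def load_closes_oldest_first(csv_text):
    """Return (date, close) pairs sorted from earliest to latest."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (datetime.strptime(r["Date"], "%Y-%m-%d").date(), float(r["Close"]))
        for r in reader
    ]
    # The exercise asks for data that "goes from earlier to later".
    rows.sort(key=lambda pair: pair[0])
    return rows


def covers_years(rows, years=8):
    """True if the first-to-last date span is at least `years` years."""
    first, last = rows[0][0], rows[-1][0]
    try:
        target = first.replace(year=first.year + years)
    except ValueError:
        # Start date is Feb 29 and the target year is not a leap year.
        target = first.replace(year=first.year + years, day=28)
    return last >= target


prices = load_closes_oldest_first(SAMPLE_CSV)
print(prices[0], prices[-1], covers_years(prices))
```

To use it on a real download, read the file's text and pass it to `load_closes_oldest_first`. If the column names differ from the assumed `Date` and `Close`, adjust the keys accordingly.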
• DARPA wants to help the DoD get to the essence of cause and effect for cancer from reading the medical literature.
• The Federal Big Data Working Group Meetup has also been doing that with Semantic Medline - YarcData and Euretos BRAIN (Bio Relations and Intelligence Network).
– See the video for Cancer Immunotherapy (21 minutes), which Science magazine named the biggest breakthrough of 2013 at the end of that year and which Dr. Tom Rindflesch (the inventor of Semantic Medline) identified from Semantic Medline as a very important breakthrough in early 2013!