FDA Data Innovation Lab and Predictive Analytics Meetup
Dr. Brand Niemann, Director and Senior Data Scientist/Data Journalist
Semantic Community
http://semanticommunity.info/
http://www.meetup.com/Federal-Big-Data-Working-Group/
http://semanticommunity.info/Data_Science/Federal_Big_Data_Working_Group_Meetup
October 6, 2014
Agenda
• 6:30 p.m. Welcome and Introduction – Report on Recent Meeting with Dr. Taha Kass-Hout, FDA’s First Chief Health Informatics Officer (CHIO), and FDA Data Science Data Publication Tutorial:
– Interest in our Meetup on OpenFDA, July 7th
– Keynote at AFCEA Bethesda’s Health IT Day, December 2nd
• 7:00 p.m. Brooke Aker, Big Data Lens, Predictive Analytics for OpenFDA and Other Examples
• 7:45 p.m. Brief Member Introductions and Inter-American Development Bank Open Data Portal and FDA Examples
OpenFDA
• OpenFDA, a new initiative to provide unprecedented access to FDA data and highlight projects in the public and private sector that use these data to further scientific research, educate the public, and save lives.
• OpenFDA is an initiative of FDA’s Office of Informatics and Technology Innovation to provide a new level of access to a number of public high-value FDA datasets via RESTful APIs and structured raw file download. Currently, the project is in an early-development stage, with an alpha release of two datasets planned for spring 2014 and a larger public release later in the year. Additionally, openFDA will provide a platform for the community to interact with each other and FDA domain experts with the goal of spurring innovation around FDA data and creating new partnerships and opportunities between the public and private sector (BOLDING BY ME).
– Presidential Innovation Fellow: Sean Herron is a Presidential Innovation Fellow serving at FDA
[email protected] | @seanherron
http://www.hhs.gov/idealab/innovate/openfda/
OpenFDA History
• OpenFDA is the first innovation created by Taha Kass-Hout, MD, MS, upon joining FDA as the first Chief Health Informatics Officer in March 2013.
• Dr. Kass-Hout launched the project by obtaining a Presidential Innovation Fellow to focus on policy and programmatic issues in July 2013.
• In August 2013, a research and development contract was awarded to Iodine, Inc. to build the site.
• The public cloud environment was determined in September 2013, and Dr. Kass-Hout’s team solicited agency and user input into policies, first priority datasets, and desirable technical characteristics of openFDA.
• In December 2013, FDA established the Office of Informatics and Technology Innovation (OITI) under Dr. Kass-Hout’s leadership.
• OpenFDA launched in Beta mode on June 2, 2014.
• By September 2014, medical device reports, enforcement reports, and drug adverse event reports were available.
• There were over 4.5 million data calls, over 40,000 visitors to openFDA from all over the world, dozens of press articles, and several websites that use openFDA in their own public offerings.
• During Fiscal Year 2015, additional datasets and harmonization will be added.
FDA's Path Forward for Open Data and Next Generation Sequencing
• Utility NGS (Next Generation Sequencing) in the Internet cloud: FDA is facing growing NGS needs for processing internal genome sequencing data as well as the NGS data from industry submissions. The NGS initiative is planning and developing a cloud-based Big Data platform and analytics for robust, secure, and controlled data storage, analysis, and collaboration, and potentially for sharing public-access genome sequencing information.
• NGS is a Big Data Initiative.
https://open.fda.gov/update/fda-path-forward-for-open-data-and-next-generation-sequencing/
• 1. Start with the examples
• 2. Know the limitations
• 3. Know why the data is sometimes messy
• 4. Make sure you check out the reference
• 5. Learn the Lucene query syntax
• 6. Don’t forget about count
• 7. Use the openfda fields!
• 8. Use .exact to count for phrases
• 9. Beware of null values
• 10. Watch for changes
– We’ll be adding additional data to this endpoint whenever a new Quarterly Data File is posted.
• My Note: Bulk data downloads I used!
https://open.fda.gov/update/ten-things-to-know-about-adverse-events/
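Tips 5, 6, and 8 above (Lucene query syntax, count, and .exact) can be sketched as a small URL builder. This is a minimal illustration, not taken from the slides: the endpoint https://api.fda.gov/drug/event.json and the field names patient.drug.medicinalproduct and patient.reaction.reactionmeddrapt are assumptions based on the public openFDA documentation, and build_query is a hypothetical helper name. The sketch only builds the request URL; it does not call the API.

```python
from urllib.parse import urlencode

# Assumed drug adverse event endpoint (from public openFDA docs, not the slides).
BASE_URL = "https://api.fda.gov/drug/event.json"


def build_query(search=None, count=None, limit=None):
    """Build an openFDA-style query URL (hypothetical helper).

    search: a Lucene-style expression, e.g. field:"term"
    count:  a field to tally; appending ".exact" counts whole phrases
            instead of individual words
    limit:  maximum number of records to return
    """
    params = {}
    if search:
        params["search"] = search
    if count:
        params["count"] = count
    if limit is not None:
        params["limit"] = limit
    return BASE_URL + "?" + urlencode(params)


# Count reported reactions (as whole phrases) for reports mentioning aspirin.
url = build_query(
    search='patient.drug.medicinalproduct:"aspirin"',
    count="patient.reaction.reactionmeddrapt.exact",
)
print(url)
```

Without the .exact suffix, a count on a text field would tally single words, so a multi-word reaction term would be split across several buckets.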
Data Science Data Publications for Big Data Analytics
• New Government Data Science Best Practices:
– Digital Government Strategy
– Open Research Data Policy
– Agency: HHS IdeaLab, NIH Data Commons, FDA Innovation Lab
– White House NITRD Big Data Initiative and NSF Agency Strategic Plan: Data Science, Data Infrastructure, and Data Publications
• New Government Data Science Publication Examples:
– Federal Data Center Consolidation 2014
– Performance.gov
– FDA Data and FDA Data Innovation Lab
– National Science Board Science & Engineering Indicators
Data Science Data Publication for Federal Data Center Consolidation 2014: Data Journalism
• In 2011 and 2012, I published three stories on the Federal Data Center Consolidation Initiative because of the poor quality and incompleteness of the data. It was one of the first non-federal applications of analytics I did after leaving government service. I decided to revisit the data for this and was pleased to find that the quality and completeness had improved considerably, so I imported the new spreadsheet into Spotfire and explored the results in multiple dynamically linked adjacent visualizations.
• Of the 3,665 data centers in the data set now, only 976 have been closed since the beginning of the program and 2,689 are yet to be closed in 2014-2015! The vast majority of these (2,254) belong to the Department of Agriculture.
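As a quick check on the figures above, the short sketch below confirms that the closed and open counts add up to the total and computes the shares. The slide does not say whether "these" refers to all data centers or only to those still open, so the Department of Agriculture share is computed both ways; the variable names are illustrative only.

```python
# Figures quoted on the slide.
total = 3665
closed = 976
still_open = 2689
agriculture = 2254

# The closed and still-open counts should account for every data center.
assert closed + still_open == total

closed_share = round(closed / total * 100, 1)
usda_share_of_open = round(agriculture / still_open * 100, 1)
usda_share_of_total = round(agriculture / total * 100, 1)

print(f"Closed so far: {closed_share}% of all data centers")
print(f"Agriculture: {usda_share_of_open}% of those still open")
print(f"Agriculture: {usda_share_of_total}% of all data centers")
```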
Data Science Data Publications for FDA: Data Science Data Mining Process
• Recall OpenFDA Knowledge Base for previous visualization and analytics:
– Brooke Aker, Biplab Pal, and Brand Niemann.
• Mined HealthData.gov for FDA data and built linked data spreadsheets (17) for Spotfire:
– See next slides.
• Mined FDA Site Map for data:
– Found Two: Data Standards and FDA Drug Approvals & Databases.
– Downloaded and inventoried files (41) (ZIP, CSV & XLS) for Spotfire.
– Used for FDA Data Innovation Lab Visualization Gallery.
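The "downloaded and inventoried files" step above could be sketched as a count of files by extension. This is an illustration only, not the process actually used for the slides: inventory_by_extension is a hypothetical helper, and the demo runs on a throwaway folder with made-up file names rather than the FDA downloads.

```python
import tempfile
from collections import Counter
from pathlib import Path


def inventory_by_extension(root):
    """Count files under root (recursively), grouped by lowercase extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower().lstrip(".") or "(none)"
            counts[ext] += 1
    return counts


# Demo on a temporary folder; the file names are invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.zip", "b.csv", "sub/c.XLS", "sub/d.csv"]:
        target = Path(tmp) / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    demo_counts = inventory_by_extension(tmp)

for ext, n in sorted(demo_counts.items()):
    print(f"{ext}: {n}")
```

Lowercasing the extension keeps files such as .XLS and .xls in the same bucket, which matters when downloads come from several sources.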
FDA Data Innovation Lab Visualization Gallery: File Folder
My Note: Some folders contain multiple files!
Suggestions
• Help the FDA Data Innovation Lab with data publication gallery and wall posters.
• Help the FDA Data Innovation Lab with their Open Data Lab Day.
• Organize Joint Meetups and promote use of the FDA Data Innovation Lab.
• Help form Data Science Teams to work on FDA big data problems.
Open Data Portal for the Inter-American Development Bank: Comments
• Another good meeting last night.
• Thank you for organizing this meetup, very helpful! Special thanks to Brand for all the info you shared. I'm looking forward to future ones!
• Terrific and innovative data visualizations can make a big impact indeed.
• This week was very good - exposure to interesting beta products (Semantic Insights, this week) as well as new approaches to visualization techniques are always things to which I look forward. When I get to see an illustration of the concept of "cognitive load" in visualizations the way it was shown in this session (with Sankey diagrams), it makes it an even better session. Great stuff! And I get to play around with a new data set - even better!
Open Data Portal for the Inter-American Development Bank: Annette Hester
• Thanks for hosting me last night. It was a pleasure to share ideas with such a knowledgeable group.
• We would be delighted if you or any in your group took time to understand the database and compare it to traditional graphs and other visualizations. As I mentioned, the easiest way to do so would be using the first data graph, Energy Flows (http://www.iadb.org/eic/database). It is a Sankey Graph with a twist. You can find similar products at:
– http://www.iea.org/Sankey/
– https://flowcharts.llnl.gov/
– www.energyliteracy.com
– http://www.sankey-diagrams.com/tag/ghg/
• And if you google energy flow charts you will find quite a variety.
• The more I look at energy data and what we have published, the better I feel about our database. I look forward to the results of your investigation. Please do keep in touch… and do feel free to post this note on the meetup website.
Inter-American Development Bank Open Data Portal Examples, Etc.
• Please post your interest in providing a visualization example(s) and explanations to our Meetup site
• Also feel free to use the FDA data or any other data you are working with in visualizations and explanations.
– NSB Science & Engineering Indicators
– FDA Data Innovation Lab Visualization Gallery