Why Data Science is Something You Should Care About
Presented @ South Dakota Code Camp 2012
Ryan Swanstrom @swgoof
Jan 15, 2015
About Ryan Swanstrom
Find me on the web
http://twitter.com/swgoof
http://linkedin.com/in/ryanswanstrom
http://datascience101.wordpress.com/
Data Science
"[ability to] obtain, scrub, explore, model and interpret data, blending hacking, statistics, and machine learning."
definition by Hilary Mason, Chief Scientist @ Bit.ly
Data Science
http://www.drewconway.com/zia/?p=2378
Big Data
Any dataset where the size or speed of incoming data causes difficulties in processing
● Volume
● Velocity
● Variety
Hadoop
"[...] a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models."
Apache Hadoop Website
● HDFS - Hadoop Distributed File System
● MapReduce
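To make the MapReduce bullet concrete, here is a minimal single-machine sketch of the programming model in plain Python, using the classic word-count example. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real cluster would run the map and reduce steps in parallel over data stored in HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data science"]))
```

The point of the model is that the mapper and reducer only ever see one line or one key at a time, which is what lets a framework spread the work across many machines.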
Why Do You Care?
McKinsey Global Big Data Report
● Shortage of 140k - 190k people with deep analytical skills by 2018
● Shortage of 1.5M managers & analysts who can put big data to use
Indeed Data Science Job Listings
http://www.indeed.com/jobtrends?q=Data-science&relative=1
Now That You Care, What Skills?
1. Machine Learning
2. Statistics
3. Storytelling (Communication)
4. Big Data
5. Algorithms
6. Curiosity
College and University
http://datascience101.wordpress.com/2012/04/09/colleges-with-data-science-degrees/
http://whatsthebigdata.com/2012/08/09/graduate-programs-in-big-data-and-data-science/
College and University
Pros
● Credentials
● Experts
● Familiar
● Widely Accepted
● Structured
Cons
● Expensive
● Not Individualized
● School
● Lengthy
● Inflexible
● Not Real World
Corporate Training
General Assembly - not really corporate training, but it looks really good
Corporate Training
Pros
● Short Timeframe
● Experts
● Certificates
● Business-Savvy
● Real World
● Structured
Cons
● Expensive
● Not Individualized
● Product Focused
● Sales Pitch
MOOCs (Massive Open Online Courses)
Pros
● Free
● Experts
● Flexible
Cons
● No Credentials
● Single Course
● No Programs (Yet)
Blogs/Wikis/Other
Pros
● Free
● Very Specific
● Short
● Lots of them
Cons
● Quality?
● No Credentials
● No Structure
● Too many!
Blogs/Wikis/Other
The Problem
● What content is good?
● What order should I cover the content?
● Where do I find new content?
● Who can help me understand?
Data Science 201 - coming soon
http://www.datascience201.com
Helping you find the best data science learning content!
Thank You