Welcome to Big Data Saravanan Subburayal Principal Architect
Jun 10, 2015
Agenda
• What is Big data?
• Some BIG facts
• Objective
• Sources
• 3 V’s of Big data
• 3 + 1 V’s of Big data
• Technologies
• Opportunities
• Major Players
• Questions
• Conclusion
What is Big data?
Data
Big Data
Size does matter
Not enough
Some BIG facts
• 90% of the data in the world today has been created in the last two years alone
• IDC forecast: the global universe of data will double every two years, reaching 40,000 exabytes, or 40 trillion GB, by 2020
• The Large Hadron Collider near Geneva, Switzerland, will produce about 15 petabytes of data per year.
• Ancestry.com, the genealogy site, stores around 2.5 petabytes of data.
• The Internet Archive stores around 2 petabytes of data, and is growing at a rate of 20 terabytes per month.
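The IDC figures above can be sanity-checked with a little arithmetic. The sketch below assumes decimal (SI) units, where 1 GB is 10^9 bytes and 1 EB is 10^18 bytes; the function names are made up for illustration.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; the
# slides do not say whether the forecast uses decimal or binary units).
GB = 10**9
EB = 10**18


def exabytes_to_gigabytes(exabytes: int) -> int:
    """Convert a whole number of exabytes to gigabytes."""
    return exabytes * (EB // GB)


def projected_size(start: float, years: float) -> float:
    """Size after `years` if the data volume doubles every two years."""
    return start * 2 ** (years / 2)


# 40,000 exabytes expressed in gigabytes.
print(f"{exabytes_to_gigabytes(40_000):,} GB")

# A volume that doubles every two years is four times larger after four years.
print(projected_size(10, 4))
```

Under these assumptions, 40,000 exabytes works out to 40 trillion GB, which matches the forecast as quoted.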
Some BIG facts – What happens every day?
• The New York Stock Exchange generates about one terabyte of new trade data
• Zynga processes 1 petabyte of content
• 30 billion pieces of content are added to Facebook
• 2 billion videos are watched on YouTube
• 2.5 quintillion bytes of data are created
Some BIG facts – What happens every minute?
Courtesy: http://practicalanalytics.files.wordpress.com
Big data – Objective
Effectively store, manage and analyze all the data to create meaningful information out of it
Big data – Sources
Big data – 3 V’s of Big data
Courtesy: bigdatablog.emc.com
Big data – 3 + 1 V’s of Big data
Courtesy: http://www.datasciencecentral.com/
Big data - Volume
Courtesy: http://www.datasciencecentral.com/
Volumes are in:
• Terabytes
• Petabytes
• Exabytes
• Zettabytes
Big data - Volume
Courtesy: http://www.datasciencecentral.com/
Name                 Value
1 Gigabyte (GB)      1,073,741,824 bytes
1 Terabyte (TB)      1,024 GB
1 Petabyte (PB)      1,048,576 GB
1 Exabyte (EB)       1,073,741,824 GB
1 Zettabyte (ZB)     1,099,511,627,776 GB
1 Yottabyte (YB)     1,125,899,906,842,624 GB
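The values in this table follow from each unit being 1,024 times the previous one (the binary convention). A minimal sketch that regenerates them, with a hypothetical helper name:

```python
# Units in increasing order; each is 1,024 times the one before it.
UNITS = ["GB", "TB", "PB", "EB", "ZB", "YB"]

# One binary gigabyte in bytes: 1,024 * 1,024 * 1,024.
BYTES_PER_GB = 1024 ** 3


def gigabytes_in(unit: str) -> int:
    """Number of binary gigabytes in one `unit` (1 TB = 1,024 GB, and so on)."""
    return 1024 ** UNITS.index(unit)


print(f"1 GB = {BYTES_PER_GB:,} bytes")
for unit in UNITS[1:]:
    print(f"1 {unit} = {gigabytes_in(unit):,} GB")
```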
Big data - Velocity
Courtesy: http://www.datasciencecentral.com/
• Live Stream
• Real time
• Batch
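The difference between batch and stream processing can be sketched with a simple average. This is an illustration only, with made-up readings and function names: the batch version waits for all the data, while the streaming version updates its answer as each value arrives.

```python
from typing import Iterable, Iterator, Sequence


def batch_average(readings: Sequence[float]) -> float:
    """Batch: all data is collected first, then processed in one pass."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Stream: update the result per reading, without storing the data."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


readings = [10.0, 20.0, 30.0, 40.0]  # hypothetical sensor values
print(batch_average(readings))
print(list(streaming_average(readings)))
```

Both approaches end at the same final value; the streaming version also has an up-to-date answer after every reading.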
Big data - Variety
Courtesy: http://www.datasciencecentral.com/
• Structured (Tables)
• Unstructured (Tweets, SMSes)
• Semi-structured (Log files, RFID)
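A rough sketch of how the three kinds of data differ when they are read by a program. All sample data below is invented, and the log format is an assumption chosen for illustration.

```python
import csv
import io
import re

# Structured: fixed columns, as in a database table.
table = io.StringIO("id,name,city\n1,Asha,Chennai\n")
rows = list(csv.DictReader(table))

# Semi-structured: a log line follows a loose pattern that a regex can recover.
log_line = "2015-06-10 09:15:02 ERROR disk /dev/sda1 is 91% full"
match = re.match(r"(\S+) (\S+) (\w+) (.*)", log_line)
log_record = dict(zip(["date", "time", "level", "message"], match.groups()))

# Unstructured: free text has no schema; here only the hashtags are extracted.
tweet = "Loving the #BigData session today! #analytics"
hashtags = re.findall(r"#(\w+)", tweet)

print(rows[0]["city"])
print(log_record["level"], "-", log_record["message"])
print(hashtags)
```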
Big data - Veracity
Source: McKinsey, Gartner, Twitter, Cisco, EMC, SAS, IBM, MEPTEC, QAS
• Veracity, the trustworthiness of data, is often overlooked
• It is now considered as important as the 3 V’s of Big Data
• The effort needed to clean up data is often not given enough importance
• Poor data quality costs the U.S. economy around $3.1 trillion a year
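As a small illustration of what data clean-up can involve, the sketch below drops incomplete records and duplicates. The record layout (name and email fields) and the rules are assumptions made for this example, not a standard.

```python
def clean_records(records):
    """Drop records with missing fields and duplicate emails (case-insensitive)."""
    seen = set()
    cleaned = []
    for record in records:
        email = (record.get("email") or "").strip().lower()
        name = (record.get("name") or "").strip()
        if not email or not name:
            continue  # incomplete record
        if email in seen:
            continue  # duplicate of an earlier record
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


raw = [
    {"name": " Asha ", "email": "ASHA@example.com"},
    {"name": "Asha", "email": "asha@example.com "},  # duplicate
    {"name": "", "email": "x@example.com"},          # missing name
    {"name": "Ravi", "email": None},                 # missing email
]
print(clean_records(raw))
```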
Big data Technologies
Technologies & Solution providers:
• Storage (Microsoft SQL Server, Apache Hadoop, MongoDB)
• Processing (MapReduce, Impala)
• Analytics (SAS, R, Business Intelligence)
• Integration (Flume, Sqoop)
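To give a feel for the MapReduce processing model listed above, here is a word count written in plain Python. It only mimics the map, shuffle and reduce steps on one machine; it does not use the Hadoop API, and the function names are made up for this sketch.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


documents = ["big data is big", "data is everywhere"]  # sample input
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(pairs))
print(word_counts)
```

In a real cluster the map and reduce steps would run in parallel on many machines, each working on its own slice of the data.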
Big data - Opportunities
• Storage
• Processing
• Analytics
• Integration
• Solution
Big data – Major Players
Big data – Questions?
Big data – Thank you !!!