Introduction to Big Data
An analogy between Sugar Cane & Big Data
Jean-Marc Desvaux – March 2012
Image Source: MicFarris.com
Image Source: alternative-energy-fuels.com
Dec 20, 2014
Session Abstract:
What is Big Data? Where does it apply? What are the technologies behind it? Is it going to replace your RDBMS? …
Big Data: it’s all Silicon Valley is talking about. It’s the new buzzword after ‘cloud’.
“Everybody is speaking of it and many are convinced it is the only way forward. As always, such dramatic statements are not only dangerous but serve to put some people off the concept.”
Source: Tom Kyte’s Big Data Are you ready ? presentation
What is Big Data ?
Big Data is data that exceeds the processing capacity of conventional database systems.
It’s too big, too fast or does not fit the structures of database architectures. To gain value from this type of data you need an alternative way to process it.
Why is this happening? Data is growing faster than computers are getting bigger.
A catch-all term. Includes social network data, Web logs, MP3s, unstructured Web page content, XML, GPS tracking data, vehicle telemetry, financial market data and many more…
Can be characterized by the 3 Vs:
Image Source: Tom Kyte’s Big Data Are you ready ? presentation
Volume: Data growing faster than machines are getting bigger. Data sources adding up.
Velocity: Rate of acquisition and desired rate of consumption.
Variety: Extends beyond structured data, includes unstructured data of all varieties.
Image Source: Tom Kyte’s Big Data Are you ready ? presentation
Where does Big Data apply?
The value of Big Data to an organisation falls into two main categories:
Analytical Use
Enabling new products and services
Analytical Use
To reveal insights previously hidden because they were hard to record and exploit.
An edge over classic analytics based on sampling and on more “static”, predetermined reports.
It promotes an investigative approach to data and puts the data scientist and analyst in the spotlight.
Hal Varian, chief economist at Google: “I keep saying that the sexy job in the next 10 years will be statisticians”
Some terms linked to the Analytical Use of Big Data
Sentiment Analysis: Mining the Web in real time and getting a quick read of what people are thinking.
Named-entity recognition (NER), also known as entity identification and entity extraction, is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. (e.g. does “Big B” in a tweet stand for Big Brother or for Amitabh Bachchan?)
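The two terms above can be illustrated with a deliberately tiny sketch. It assumes a hand-made word lexicon for sentiment and a hand-made gazetteer with context cue words for resolving “Big B”; the word lists, category labels and function names are all hypothetical, and real systems rely on much larger lexicons or trained models.

```python
import re

# Hypothetical toy lexicon (assumption); real lexicons are far larger.
POSITIVE = {"love", "great", "good", "amazing"}
NEGATIVE = {"hate", "bad", "awful", "boring"}

# Hypothetical gazetteer (assumption): surface form -> candidate entities,
# each with a category and a set of context cue words.
GAZETTEER = {
    "big b": [
        ("Big Brother", "TV_SHOW", {"episode", "eviction", "housemates"}),
        ("Amitabh Bachchan", "PERSON", {"film", "actor", "bollywood"}),
    ],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Count positive words minus negative words (> 0 reads as positive)."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def resolve_entities(text):
    """Find known surface forms and pick the candidate whose cue words
    overlap most with the text. Ties go to the first candidate listed.
    Plain substring matching is used, which is crude but enough for a sketch."""
    tokens = set(tokenize(text))
    lowered = text.lower()
    found = []
    for surface, candidates in GAZETTEER.items():
        if surface in lowered:
            name, category, _ = max(candidates, key=lambda c: len(c[2] & tokens))
            found.append((name, category))
    return found


tweet = "Loved the new film, Big B is a great actor"
print(sentiment_score(tweet), resolve_entities(tweet))
```

The same surface form resolves differently when the surrounding words change, which is the point of the “Big B” example: classification depends on context, not on the string alone.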
Product/Service Enabler
Some products and services cannot exist unless they are backed by Big Data technologies:
- Need to scale
- Need a fast feedback loop on complex analytics
Highly successful Web startups that pioneered Big Data technologies through R&D to enable new types of products are a good example: Google, Yahoo, Amazon, Facebook.
Sectors with Fast Adoption and High Potential
Financial Sector
Telecommunications
Government
Health
Retail
Big Data Sources: Internal & Data Marketplaces.
Internal sources
Time Attendance logs
RFID sensor logs
Security Logs
Vehicle GPS tracking
Machinery/Telemetry Logs
Pictures & videos
Enterprise Social Networks
Service Forum/Discussions
….
Mostly anything unstructured or simply structured
Source: DataSift.com
External Sources (feeders/data marketplaces)
Examples: Infochimps.com, DataSift.com, datamarket.azure.com
An Enterprise Architecture for Big Data
An analogy with a Sugar Cane Factory
A Sugar Factory
SUGAR CANE FIELDS
ACQUIRE (HARVEST)
EXTRACT/SHRED
EVAPORATE/DISTILL/BOIL
DRY/STORE/SUGAR
= VALUE / BOTTOM LINE
An Enterprise Big Data Factory
ACQUIRE (HARVEST)
ORGANIZE (EXTRACT)
ANALYSE (SHRED/DISTILL/BOIL)
BUSINESS INTELLIGENCE (DECIDE)
= VALUE / BOTTOM LINE
DATA SOURCES (RDBMS & Data Marketplaces)
HDFS (Hadoop Distributed FS)
NoSQL Database (Hadoop Distributed FS)
RDBMS Enterprise Applications
Map Reduce (Hadoop)
Big Data Connectors
RDBMS Connectors
Data Warehousing / RDBMS stores
Analytic Applications: the sweet part (sugar/rum)
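To make the Map Reduce block of the factory more concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps in plain Python. It is not Hadoop code: the function names and the sample log lines are assumptions for illustration, and Hadoop distributes these same steps across many machines reading from HDFS.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (key, 1) pair for every token in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run the three steps in sequence on an iterable of records."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical web log lines standing in for data harvested into HDFS.
logs = ["GET /home", "GET /cart", "POST /cart"]
counts = map_reduce(logs)
print(counts)
```

Because each record is mapped independently and each key is reduced independently, both phases can run in parallel on separate machines, which is what lets this style of processing scale.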
Some Factories & architectures from vendors
Greenplum (EMC2): An Example of a Turnkey Factory Solution
Another “Turnkey Factory” Example from Oracle: Targeting high-end Analytics
ACQUIRE (HARVEST)
ORGANIZE (EXTRACT)
ANALYSE (SHRED/DISTILL/BOIL)
BUSINESS INTELLIGENCE (DECIDE)
Image Source: Tom Kyte’s Big Data Are you ready ? presentation
Of course, you can also build your own factory using the widely available open-source software on which most turnkey factories are built.
The Microsoft way
Technologies behind Big Data
Factory blocks & screws used for engineering solutions
Will NoSQL kill SQL?!
Will it turn the RDBMS into a legacy data store?
Not at all.
We need the RDBMS to store high-value data and for its feature-rich approach (feature first).
NoSQL (scale first) is not a superset of RDBMS technologies (a bit like Einstein’s relativity to Newton’s physics).
Remember: NoSQL is not “No SQL” but “Not Only SQL”.
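The “feature first” versus “scale first” contrast can be sketched as follows. The first half uses Python’s built-in sqlite3 module as a stand-in for an RDBMS (typed tables, a join and an aggregate in one declarative query). The second half uses three in-memory dictionaries as hypothetical nodes of a key-value store, with keys spread by a hash. Table names, keys and the node count are assumptions for illustration only.

```python
import sqlite3
import zlib

# Feature first: schema, constraints, joins and aggregates in one query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")
totals = conn.execute(
    "SELECT c.name, SUM(o.amount) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()

# Scale first: only put/get by key, but keys spread over nodes by hash,
# so capacity grows by adding nodes. Three dicts stand in for three nodes.
NODES = [dict() for _ in range(3)]


def node_for(key):
    """Pick a node deterministically from the key's CRC32 hash."""
    return NODES[zlib.crc32(key.encode("utf-8")) % len(NODES)]


def put(key, value):
    node_for(key)[key] = value


def get(key):
    return node_for(key).get(key)


put("click:user1:2012-03-01T10:00:00", {"url": "/home"})
put("click:user2:2012-03-01T10:00:05", {"url": "/cart"})
print(totals, get("click:user1:2012-03-01T10:00:00"))
```

Neither half can simply replace the other: the key-value side has no join or aggregate, and the SQL side has no built-in way to spread its tables over many nodes in this sketch.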
Big Data future
Rise of Data Marketplaces
Data Science tools development: More powerful & expressive toolsets for analysis
Emerging tools for streaming data processing (Twitter Storm, Yahoo S4, StreamBase): Real-time enablement / Live BI
Further cloud-enablement
Ease of integration to Enterprise Sources
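As an illustration of the streaming item in the list above, the sketch below keeps a live count of events over a sliding time window, the kind of computation behind “Live BI”. It does not use the API of Storm, S4 or StreamBase. The class name, the 60-second window and the sample hashtags are assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count events per key over the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, key) pairs, oldest first
        self.counts = Counter()  # live count per key inside the window

    def add(self, timestamp, key):
        """Record one event, then drop everything that left the window."""
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # An event is out of the window once it is window_seconds old.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n=1):
        """Return the n most frequent keys currently in the window."""
        return self.counts.most_common(n)


# Hypothetical stream of (seconds, hashtag) events.
window = SlidingWindowCounter(window_seconds=60)
for ts, tag in [(0, "#bigdata"), (10, "#hadoop"), (30, "#hadoop"), (70, "#hadoop")]:
    window.add(ts, tag)
print(window.top(1))
```

Each event is processed once as it arrives and old events are discarded, so the answer is always current without re-reading stored data, in contrast to a batch job.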
Conclusion
To leverage Big Data you need something like a Sugar Factory. It can be a very entry-level factory (Excel – Azure Source) or a more complex one. The more complex and complete the factory, the more value at the end of the processing chain.
To turn Big Data technologies from developer-centric solutions into enterprise solutions, they must be combined with SQL solutions into a single proven infrastructure that meets the manageability and security requirements of enterprises.
The challenge for enterprises is to simplify Big Data integration/engineering and leverage it where possible to improve their processes at tactical and strategic levels.
Architects & DBAs will be able to choose between data store technologies and will need to understand where one is better than the other.
Big Data has to be part of the Enterprise Applications ecosystem, where it will be turned into value.
Thank you.