Big Data & Data Mining
WELCOME TO OUR PRESENTATION
Submitted By
Supervised By
CONTENTS
• Problem Definition
• Purpose
• What is …?
• Challenges with data
• Big data algorithms
• How To Produce The Big Data
• Big Data Characteristics
• Applications of Data Mining
• FIELD OF BIG DATA
• Variety (Complexity)
• Real-time/Fast Data
• Real-Time Analytics/Decision Requirement
• A Single View to the Customer
• What’s driving Big Data
• Benefits
Big Data consists of large-volume, complex, growing data sets drawn from numerous independent sources. With the rapid development of networking, data storage, and data-gathering capacity, Big Data is now expanding quickly in all science and engineering domains, including the animal, genetic, and biomedical sciences. This paper elaborates a HACE theorem that characterizes the features of the Big Data revolution and proposes a Big Data processing model from the data mining perspective.
Problem Definition:
Mining Big Data requires carefully designed algorithms that analyze model correlations between distributed sites and fuse decisions from multiple sources to obtain the best model from the data. Developing a safe and sound information-sharing protocol is a major challenge. Supporting Big Data mining also requires high-performance computing platforms, which impose systematic designs to unleash the full power of Big Data. Big Data is an emerging trend, and the need for Big Data mining is rising in all science and engineering domains.
Purpose:
What is …?
Data Mining
• The computational process of discovering patterns in large data sets
Big Data
• Data characterized by three attributes: volume, variety, and velocity
• The term for a collection of data sets so large and complex that it becomes difficult to process
• Data grows exponentially, in both structured and unstructured forms
Data
• Any set of characters that has been gathered and translated for some purpose, usually analysis. It can be any character, including text and numbers, pictures, sound, or video. Data that is not put into context means nothing to a human or a computer.
How much data exists?
• 2.5 quintillion bytes of data are created every day
• IBM: 90 percent of the data in the world today was produced within the past two years
• Forms of data?
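The daily volume quoted above is easier to grasp after a unit conversion. The following is a minimal sketch, assuming the short-scale quintillion (10^18) and decimal units; the constant and function names are illustrative only and do not come from the slides.

```python
# Convert the quoted daily data volume into other units.
# Assumption: "quintillion" is short scale, i.e. 10**18.
BYTES_PER_DAY = 2.5e18
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def to_exabytes(n_bytes: float) -> float:
    """Return decimal exabytes (1 EB = 10**18 bytes)."""
    return n_bytes / 1e18


def bytes_per_second(n_bytes_per_day: float) -> float:
    """Return the average creation rate in bytes per second."""
    return n_bytes_per_day / SECONDS_PER_DAY


print(to_exabytes(BYTES_PER_DAY))               # 2.5 exabytes per day
print(bytes_per_second(BYTES_PER_DAY) / 1e12)   # roughly 28.9 terabytes per second
```

Under these assumptions, the quoted figure corresponds to about 2.5 exabytes per day, or close to 29 terabytes every second on average.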
Data Mining Challenges with Big Data
• Big Data Mining Platform
• Big Data Semantics and Application Knowledge
    I. Information Sharing and Data Privacy
    II. Domain and Application Knowledge
• Big Data Mining Algorithm
    I. Local Learning and Model Fusion for Multiple Information Sources
    II. Mining from Sparse, Uncertain, and Incomplete Data
    III. Mining Complex and Dynamic Data
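The idea of local learning and model fusion listed above can be illustrated with a small sketch: each site trains a model on its own data only, and the local decisions are then combined. This is one simple fusion scheme (majority voting), not necessarily the one the presentation has in mind; the toy threshold classifier, the function names, and the site data are all hypothetical.

```python
from collections import Counter
from statistics import mean


def train_local_model(samples):
    """Fit a toy one-feature classifier on a single site's data.

    samples: list of (value, label) pairs with labels 0 or 1.
    The model predicts 1 when the value exceeds the midpoint between
    the two class means (assumes class 1 has the larger mean).
    """
    mean0 = mean(v for v, y in samples if y == 0)
    mean1 = mean(v for v, y in samples if y == 1)
    threshold = (mean0 + mean1) / 2
    return lambda value: 1 if value > threshold else 0


def fuse_by_majority_vote(models, value):
    """Combine the local decisions; the most frequent label wins."""
    votes = Counter(model(value) for model in models)
    return votes.most_common(1)[0][0]


# Three hypothetical sites; raw data never leaves its site.
site_data = [
    [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)],     # local threshold 5.0
    [(0.5, 0), (1.5, 0), (6.5, 1), (7.5, 1)],     # local threshold 4.0
    [(3.0, 0), (4.0, 0), (10.0, 1), (12.0, 1)],   # local threshold 7.25
]
models = [train_local_model(data) for data in site_data]
print(fuse_by_majority_vote(models, 6.0))  # two of the three sites vote 1
```

Only the trained models are shared in this sketch, which is one way to limit how much raw data has to cross site boundaries.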
Data Mining Algorithm
• Decision tree induction classification algorithms
• Evolutionary based classification algorithms
• Partitioning based clustering algorithms
• Hierarchical based clustering algorithms
• Model based clustering algorithms
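As an illustration of the partitioning-based clustering family listed above, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The slides do not name a specific algorithm, so the choice of k-means is an assumption. Seeding the centroids with the first k points is a further assumption made to keep the sketch deterministic; the sample points are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster equal-length tuples with Lloyd's algorithm.

    Centroids are seeded with the first k points (an assumption for
    determinism; real implementations usually seed randomly).
    Returns (centroids, assignments).
    """
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignments


# Hypothetical data: two well-separated groups of points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

Each iteration scans every point once, so the cost per pass grows linearly with the number of points.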
How To Produce The Big Data
Big Data Types
• Enterprise Data
• Transactions
• Public Data
• Social Media
• Sensor Data
Big Data Characteristics
• Data has grown tremendously.
• Big Data starts with large-volume, heterogeneous, autonomous sources with distributed and decentralized systems.
Applications of Data Mining
• Marketing
    o Analysis of consumer behavior
    o Advertising campaigns
    o Targeted mailings
• Finance
    o Creditworthiness of clients
    o Performance analysis of finance investments
• Manufacturing
    o Optimization of resources
    o Optimization of manufacturing processes
FIELD OF BIG DATA
Variety (Complexity)
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
    o Social Network, Semantic Web (RDF), …
• Streaming Data
    o You can only scan the data once
• A single application can be generating/collecting many types of data
• Big Public Data (online, weather, finance, etc.)
To extract knowledge, all these types of data need to be linked together.
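The streaming constraint noted above, that the data can be scanned only once, calls for one-pass algorithms. A minimal sketch follows, using Welford's online algorithm to keep a running mean and variance without storing the stream. The class name, the generator, and the readings are hypothetical and chosen only for illustration.

```python
class RunningStats:
    """Welford's online algorithm for count, mean, and variance."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new value into the statistics in constant time."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values have arrived."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


def sensor_stream():
    """Hypothetical stream: a generator can be consumed only once."""
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)  # each reading is seen once and then discarded

print(stats.count, stats.mean, stats.variance)
```

Memory use stays constant regardless of stream length, because only three numbers are kept between readings.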
Real-time/Fast Data
Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.
• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Mobile devices (tracking all objects all the time)
• Sensor technology and networks (measuring all kinds of data)
Real-Time Analytics/Decision Requirement
Customer: Influence Behavior
• Product Recommendations that are Relevant & Compelling
• Friend Invitations to join a Game or Activity that expands business
• Preventing Fraud as it is Occurring & preventing more proactively
• Learning why Customers Switch to competitors and their offers, in time to Counter
• Improving the Marketing Effectiveness of a Promotion while it is still in Play
A Single View to the Customer
Customer
• Social Media
• Gaming
• Entertain
• Banking
• Finance
• Our Known History
• Purchase
5 Vs of Big Data
Volume
• Data quantity
Velocity
• Data speed
Variety
• Data types
Veracity
• Authenticity
Value
• Statistical
• Events
What’s driving Big Data
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets

- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More of a real-time
Benefits
Cost & management
• Economies of scale, “out-sourced” resource management
Reduced time to deployment
• Ease of assembly, works “out of the box”
Scaling
• On-demand provisioning, co-locate data and compute
Reliability
• Massive, redundant, shared resources
Sustainability
• Hardware not owned
ANY QUESTIONS?