Concept of Big Data Presented by MTech-CE(Boys Group)
A Big Data Concept

Aug 19, 2014

Dharmesh Tank

 
Page 1: A Big Data Concept

Concept of Big Data

Presented by MTech-CE (Boys Group)

Page 2: A Big Data Concept

What is Data

The word "data" is the plural of "datum", which comes from the Latin verb dare, "to give"; a datum is therefore "something given".

Data as an abstract concept can be viewed as the lowest level of abstraction from which information and then knowledge are derived.

Information in raw or unorganized form (such as letters, numbers, or symbols) that refers to, or represents, conditions, ideas, or objects. Data is limitless and present everywhere in the universe.

Computers: Symbols or signals that are input, stored, and processed by a computer, for output as usable information.

Page 3: A Big Data Concept

Type of Data

Relational Data (Tables/Transaction/Legacy Data)

Text Data (Web)

Semi-structured Data (XML)

Graph Data: Social Network, Semantic Web (RDF), …

Streaming Data: you can only scan the data once
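The "scan only once" constraint on streaming data can be illustrated with a single-pass computation. The sketch below is an assumption-laden example, not part of the original slides: it uses Welford's method to keep a running mean and variance without storing the stream, and the class name `RunningStats` and the sample readings are hypothetical.

```python
class RunningStats:
    """Single-pass mean and variance (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new value into the statistics; the value is not stored."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance; 0.0 until at least one value has been seen.
        return self._m2 / self.count if self.count else 0.0


def summarize(stream):
    """Consume an iterable exactly once and return its running statistics."""
    stats = RunningStats()
    for value in stream:
        stats.update(value)
    return stats


# Hypothetical sample stream; a generator can only be iterated once.
readings = (x for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
stats = summarize(readings)
print(stats.count, stats.mean, stats.variance)
```

Memory use stays constant no matter how long the stream is, which is the property that matters when the data cannot be rescanned.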

Page 4: A Big Data Concept

Big Data Definition

Big data is a massive volume of both structured and unstructured data that is so large that it's difficult to process with traditional database and software techniques.

Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

Big data is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it.

Page 5: A Big Data Concept

Walmart handles more than 1 million customer transactions every hour.

Facebook handles 40 billion photos from its user base.

Decoding the human genome originally took 10 years to process; now it can be achieved in one week.

Google processes 20 PB a day (2008)

Wayback Machine has 3 PB + 100 TB/month (3/2009)

Facebook has 2.5 PB of user data + 15 TB/day (4/2009)

eBay has 6.5 PB of user data + 50 TB/day (5/2009)

Where Is the Big Data?

Page 6: A Big Data Concept

Data Units

Big Data is data growing faster than Moore’s law

1 Byte - 8 Bits

1 Kilobyte (KB) - 10^3 Bytes

1 Megabyte (MB) - 10^6 Bytes

1 Gigabyte (GB) - 10^9 Bytes

1 Terabyte (TB) - 10^12 Bytes

Page 7: A Big Data Concept

Big Big Big Data

Petabyte (PB) - 10^15 Bytes

Exabyte (EB) - 10^18 Bytes

Zettabyte (ZB) - 10^21 Bytes

Yottabyte (YB) - 10^24 Bytes

Xenottabyte (XB) - 10^27 Bytes

Shilentnobyte (SB) - 10^30 Bytes

Domegrottebyte (DB) - 10^33 Bytes

(The last three names are informal and are not SI prefixes.)
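The decimal units listed above can be turned into a small conversion helper. This is a minimal sketch, assuming powers of 1000 as in the slides and covering only the units up to the yottabyte; the function names `to_bytes` and `humanize` are made up for illustration.

```python
# Decimal units from the slides, each 1000 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value, unit):
    """Convert a value expressed in the given decimal unit to bytes."""
    exponent = 3 * UNITS.index(unit)
    return value * 10 ** exponent


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


daily = to_bytes(20, "PB")        # the "20 PB a day" figure quoted earlier
print(daily, "bytes =", humanize(daily))
```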

Page 8: A Big Data Concept

Characteristics of Big Data

Page 9: A Big Data Concept

Volume

Data volume: 44x increase from 2009 to 2020

From 0.8 zettabytes to 35 zettabytes

Data volume is increasing exponentially
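The volume figures can be checked with a few lines of arithmetic. The sketch below assumes steady compound growth between the two endpoints, which the slides do not state; it only shows what yearly rate the quoted numbers would imply under that assumption.

```python
# Endpoints quoted on the slide.
start_zb, end_zb = 0.8, 35.0
start_year, end_year = 2009, 2020

# Overall growth factor (the slide rounds this to 44x).
growth_factor = end_zb / start_zb

# Implied yearly rate, ASSUMING constant compound growth over the period.
years = end_year - start_year
annual_rate = growth_factor ** (1 / years) - 1

print(f"growth factor: {growth_factor:.2f}x over {years} years")
print(f"implied annual growth: {annual_rate:.1%}")
```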

Page 10: A Big Data Concept

Variety

Various formats, types, and structures

Text, numerical, images, audio, video, sequences, time series, social media data, multi-dim arrays, etc…

Static data vs. streaming data

A single application can be generating/collecting many types of data

Page 11: A Big Data Concept

Velocity

Data is being generated fast and needs to be processed fast

Online Data Analytics

Late decisions mean missed opportunities

Examples

E-Promotions: based on your current location, your purchase history, and what you like, send promotions right now for the store next to you

Healthcare monitoring: sensors monitor your activities and body; any abnormal measurement requires an immediate reaction
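The healthcare monitoring example can be sketched as a check that runs on every reading as it arrives, rather than on a stored batch. Everything specific here is hypothetical: the function name `monitor`, the thresholds, the window size, and the sample heart-rate values are illustrative assumptions, not clinical guidance.

```python
from collections import deque


def monitor(readings, low=50, high=120, window=3):
    """Yield (index, moving average) as soon as the average leaves [low, high]."""
    recent = deque(maxlen=window)  # keeps only the last `window` readings
    for t, value in enumerate(readings):
        recent.append(value)
        avg = sum(recent) / len(recent)
        if not low <= avg <= high:
            yield t, avg  # react immediately instead of waiting for a batch


# Hypothetical sensor readings (beats per minute).
heart_rate = [72, 75, 74, 130, 140, 150, 90, 80]
alerts = list(monitor(heart_rate))
for t, avg in alerts:
    print(f"reading {t}: moving average {avg:.1f} is out of range")
```

Averaging over a short window trades a little latency for fewer false alarms from a single noisy reading.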

Page 12: A Big Data Concept

Big Data(3-V)

Page 13: A Big Data Concept

Some Make It 4 V’s

Page 14: A Big Data Concept

Harnessing Big Data

OLTP: Online Transaction Processing (DBMSs)

OLAP: Online Analytical Processing (Data Warehousing)

RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)
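The difference between the first two workloads can be shown on a toy scale. This is only a sketch using Python's built-in `sqlite3` module with a hypothetical `sales` table: OLTP appears as many small committed writes, OLAP as one aggregate read over all rows. RTAP, which applies analytics to data as it arrives, is not covered here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, store TEXT, amount REAL)"
)

# OLTP-style work: many small writes, each committed as its own transaction.
for store, amount in [("north", 20.0), ("south", 35.5), ("north", 12.5)]:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO sales (store, amount) VALUES (?, ?)", (store, amount)
        )

# OLAP-style work: one read-only query that aggregates over every row.
totals = dict(
    conn.execute("SELECT store, SUM(amount) FROM sales GROUP BY store")
)
print(totals)
```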

Page 15: A Big Data Concept

Layout

Page 16: A Big Data Concept

Who’s Generating Big Data

Social media and networks (all of us are generating data)

Scientific instruments (collecting all sorts of data)

Mobile devices (tracking all objects all the time)

Sensor technology and networks (measuring all kinds of data)

Page 17: A Big Data Concept

Implementation of Big Data

Parallel DBMS technologies

Proposed in the late eighties

Matured over the last two decades

Multi-billion dollar industry: Proprietary DBMS Engines intended as Data Warehousing solutions for very large enterprises

MapReduce

Pioneered by Google

Popularized by Yahoo! (Hadoop)

Page 18: A Big Data Concept

MetaData Management of Big Data

Page 19: A Big Data Concept

Comparison

MapReduce:

Data-parallel programming model

An associated parallel and distributed implementation for commodity clusters

Popularized by open-source Hadoop

Used by Yahoo!, Facebook, Amazon, and the list is growing …

Parallel DBMS technologies:

Popularly used for more than two decades

Research Projects: Gamma, Grace, …

Commercial: Multi-billion dollar industry but access to only a privileged few

Relational Data Model

Indexing

Familiar SQL interface

Advanced query optimization

Well understood and studied

Page 20: A Big Data Concept

MapReduce Advantages

Automatic Parallelization:

Depending on the size of the RAW INPUT DATA, instantiate multiple MAP tasks

Similarly, depending upon the number of intermediate <key, value> partitions, instantiate multiple REDUCE tasks

Run-time:

Data partitioning

Task scheduling

Handling machine failures

Managing inter-machine communication

Completely transparent to the programmer / analyst / end user
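The MAP and REDUCE roles described above can be sketched with a word count that runs in a single Python process. This is a simulation for illustration only, not the Hadoop API: the function names are hypothetical, and the grouping step written out here as `shuffle` is what a real runtime performs across machines on the programmer's behalf.

```python
from collections import defaultdict


def map_phase(document):
    """MAP: emit a <word, 1> pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key (handled by the runtime in a real system)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """REDUCE: combine all values that share one key."""
    return key, sum(values)


def map_reduce(documents):
    intermediate = []
    for doc in documents:  # conceptually one MAP task per input split
        intermediate.extend(map_phase(doc))
    grouped = shuffle(intermediate)
    # Conceptually one REDUCE call per intermediate key.
    return dict(reduce_phase(k, v) for k, v in grouped.items())


splits = ["big data is big", "data is everywhere"]  # hypothetical input splits
counts = map_reduce(splits)
print(counts)
```

The programmer writes only `map_phase` and `reduce_phase`; partitioning, scheduling, and failure handling are the parts the slides describe as transparent.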

Page 21: A Big Data Concept

Big dataset (Hadoop)

Page 22: A Big Data Concept

Why Hadoop

Big Data analytics and the Apache Hadoop open source project are rapidly emerging as the preferred solution to address business and technology trends that are disrupting traditional data management and processing.

Page 23: A Big Data Concept

Hadoop Adoption in Industry

Page 24: A Big Data Concept

What Is Hadoop?

Page 25: A Big Data Concept

Challenge in Big Data

Big Data Integration is Multidisciplinary

Less than 10% of the Big Data world is genuinely relational

Meaningful data integration in the real, messy, schema-less and complex Big Data world of databases and the semantic web, using multidisciplinary and multi-technology methods

The Linked Open Data Ripper

Mapping, Ranking, Visualization, Key Matching, Snappiness

Demonstrate the Value of Semantics: let data integration drive DBMS technology

Large volumes of heterogeneous data, like linked data and RDF

Page 26: A Big Data Concept

Provocations for Big Data

1. Automating Research Changes the Definition of Knowledge

2. Claims to Objectivity and Accuracy are Misleading

3. Bigger Data are not always Better data

4. Not all Data are equivalent

5. Just because it is accessible doesn’t make it ethical

6. Limited access to big data creates new digital divides

Page 27: A Big Data Concept

Who is collecting all Big Data

Web Browsers

Search Engines

Page 28: A Big Data Concept

Who is collecting all Big Data

Smartphones & Apps

Apple’s iPhone (Apple O/S)

Samsung, HTC, Nokia, Motorola (Android O/S)

RIM Corp’s BlackBerry (BlackBerry O/S)

Tablet Computers & Apps

Apple’s iPad

Samsung’s Galaxy

Amazon’s Kindle Fire

Page 29: A Big Data Concept

Who is collecting for what?

Credit Card Companies

What data are they getting?

Restaurant check

Grocery Bill

Airline ticket

Hotel Bill

Page 30: A Big Data Concept

Why are they collecting all this data?

Target Marketing

To send you catalogs for exactly the merchandise you typically purchase.

To suggest medications that precisely match your medical history.

To “push” television channels to your set instead of your “pulling” them in.

To send advertisements on those channels just for you!

Targeted Information

To know what you need before you even know you need it, based on past purchasing habits!

To notify you of your expiring driver’s license or credit cards or last refill on a Rx, etc.

To give you turn-by-turn directions to a shelter in case of emergency.

Page 31: A Big Data Concept

Future Enhancement

Smartphones and tablets outsold desktop and laptop computers in 2011. There are more Smartphones in the U.S. in 2012 than people!

The phone in your pocket has more programmable memory, more storage and more capability than several large IBM computers.

It takes dozens of microprocessors running 100 million lines of code to get a premium car out of the driveway, and this software is only going to get more complex. In fact, the cost of software and electronics accounts for 30-40% of the car's price.

Page 32: A Big Data Concept

Conclusion

Big Data and Big Data Analytics – Not Just for Large Organizations

It Is Not Just About Building Bigger Databases

Moving Processing to the Data Source Yields Big Dividends

Choose the Most Appropriate Big Data Scenario

Complete data scenario whereby entire data sets can be properly managed and factored into analytical processing, complete with in-database or in-memory processing and grid technologies.

Targeted data scenarios that use analytics and data management tools to determine the right data to feed into analytic models, for situations where using the entire data set isn’t technically feasible or adds little value.

Page 33: A Big Data Concept

Closing Thought

Big data is not just about helping an organization be more successful – to market more effectively or improve business operations.

High-performance analytics is designed to support big data initiatives, with in-memory, in-database and grid computing options.

Organizations can benefit from cloud computing, where big data analytics is delivered as a service and IT resources can be quickly adjusted to meet changing business demands.

On-demand delivery gives customers the option to push big data analytics off-premises, greatly reducing the time, capital expense and maintenance associated with on-premises deployments.

Page 34: A Big Data Concept

Thank you