Data Types, Characteristics and Uses

big data - Data Types

Oct 04, 2015

himanshu_93

data types
Transcript
  • Data Types, Characteristics and Uses

  • Big Data Technology

  • Big Data Everywhere! Lots of data is being collected and warehoused:
    - Web data, e-commerce
    - Purchases at department/grocery stores
    - Bank/credit card transactions
    - Social networks

  • How much data?
    - Google processes 20 PB a day (2008)
    - Wayback Machine has 3 PB + 100 TB/month (3/2009)
    - Facebook has 2.5 PB of user data + 15 TB/day (4/2009)
    - eBay has 6.5 PB of user data + 50 TB/day (5/2009)
    - CERN's Large Hadron Collider (LHC) generates 15 PB a year

    640K ought to be enough for anybody.
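
    To put the figures on this slide in perspective, here is a rough back-of-the-envelope sketch of how long each store would take to double at the quoted growth rate. It assumes decimal units (1 PB = 1000 TB) and a constant daily growth rate, neither of which the slide states; the names `holdings` and `days_to_double` are made up for illustration.

```python
# Rough arithmetic on the slide's 2009 figures.
# Assumptions (not from the slide): 1 PB = 1000 TB, and growth stays constant.
TB_PER_PB = 1000

# name -> (stored data in PB, growth in TB per day), as quoted on the slide
holdings = {
    "Facebook (4/2009)": (2.5, 15.0),
    "eBay (5/2009)": (6.5, 50.0),
}


def days_to_double(stored_pb: float, growth_tb_per_day: float) -> float:
    """Days until the stored volume doubles at a constant daily growth rate."""
    return stored_pb * TB_PER_PB / growth_tb_per_day


doubling = {name: days_to_double(pb, rate) for name, (pb, rate) in holdings.items()}

for name, days in doubling.items():
    print(f"{name}: about {days:.0f} days to double")
```

    Under these assumptions both stores would double in well under a year, which is the point of the slide: the absolute numbers matter less than how fast they grow.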

  • The Earthscope

    The Earthscope is the world's largest science project. Designed to track North America's geological evolution, this observatory records data over 3.8 million square miles, amassing 67 terabytes of data. It analyzes seismic slips in the San Andreas fault, sure, but also the plume of magma underneath Yellowstone and much, much more. (http://www.msnbc.msn.com/id/44363598/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI)

  • Types of Data
    - Relational data (tables/transactions/legacy data)
    - Text data (Web)
    - Semi-structured data (XML)
    - Graph data: social networks, Semantic Web (RDF), ...
    - Streaming data: you can only scan the data once
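
    The "scan the data once" constraint on streaming data can be illustrated with a single-pass statistic. The sketch below uses Welford's online algorithm to keep a running mean and variance without storing the stream; the `sensor_readings` generator and its values are hypothetical stand-ins for a real feed.

```python
from typing import Iterable, Iterator, Tuple


def running_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, population variance) after one pass over the stream."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0
    return count, mean, variance


def sensor_readings() -> Iterator[float]:
    # Hypothetical stream: a generator can be consumed only once,
    # just like data arriving from a live feed.
    for value in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
        yield value


count, mean, variance = running_stats(sensor_readings())
print(count, mean, variance)
```

    Memory use stays constant no matter how long the stream runs, which is why one-pass algorithms of this kind are a common fit for streaming data.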

  • Big Data Analysis Example

    Big data can generate significant financial value across sectors.

  • Who is collecting all of this data?
    - Government agencies
    - Big pharmaceutical companies
    (Hey, I didn't say which government!)

  • Who is collecting all this data?
    - Consumer products companies
    - Big box stores

  • Who is collecting what?
    - Credit card companies
    What data are they getting?
    - Restaurant check
    - Grocery bill
    - Airline ticket
    - Hotel bill

  • Why are they collecting all this data?
    Target marketing:
    - To send you catalogs for exactly the merchandise you typically purchase.
    - To suggest medications that precisely match your medical history.
    - To push television channels to your set instead of your pulling them in.
    - To send advertisements on those channels just for you!
    Targeted information:
    - To know what you need before you even know you need it, based on past purchasing habits!
    - To notify you of your expiring driver's license or credit cards or last refill on an Rx, etc.
    - To give you turn-by-turn directions to a shelter in case of emergency.

  • What to do with these data?
    - Aggregation and statistics: data warehouse and OLAP
    - Indexing, searching, and querying: keyword-based search, pattern matching (XML/RDF)
    - Knowledge discovery: data mining, statistical modeling
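
    As a small illustration of the indexing and keyword-based search item, the sketch below builds an inverted index (word to document ids) and answers queries that require every keyword to match. The sample documents and the names `build_index` and `search` are made up for the example; real search systems add tokenization, ranking, and storage concerns that are left out here.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical mini-corpus, keyed by document id.
docs = {
    1: "big data is everywhere",
    2: "streaming data arrives fast",
    3: "graph data models social networks",
}


def build_index(documents: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercase word to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(docs)
print(search(index, "data"))
print(search(index, "streaming data"))
```

    Building the index costs one pass over the documents; after that, each query only touches the posting sets for its own keywords instead of rescanning all the text.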

  • Where Is This Big Data Coming From?
    - 12+ TBs of tweet data every day
    - 25+ TBs of log data every day
    - ? TBs of data every day

  • With Big Data, We've Moved into a New Era of Analytics

  • The number of organizations who see analytics as a competitive advantage is growing.

  • Four Characteristics of Big Data
    - Collectively analyzing the broadening Variety
    - Responding to the increasing Velocity
    - Cost-efficiently processing the growing Volume
    - Establishing the Veracity of big data sources
    Supporting figures:
    - 30 billion RFID sensors and counting
    - 1 in 3 business leaders don't trust the information they use to make decisions
    - 50x growth from 2010 to 2020, to 35 ZB
    - 80% of the world's data is unstructured

    [Chart: unstructured data 80%, structured data 20%]

  • The 5 Key Big Data Use Cases

    © 2013 IBM Corporation

  • Big Data Exploration: Needs
    - Struggling to manage and extract value from the growing 3 Vs of data in the enterprise; need to unify information across federated sources
    - Inability to relate raw data collected from system logs, sensors, clickstreams, etc., with customer and line-of-business data managed in enterprise systems
    - Risk of exposing unsecured personally identifiable information (PII) and/or privileged data due to lack of information awareness

  • Big Data Exploration: Value & Diagram
    [Diagram: Application/Users]
    - Find, visualize and understand all big data to improve business knowledge
    - Greater efficiencies in business processes
    - New insights from combining and analyzing data types in new ways
    - Develop new business models with resulting increased market presence and revenue

  • Enhanced 360° View of the Customer: Needs
    - Need a deeper understanding of customer sentiment from both internal and external sources
    - Desire to increase customer loyalty and satisfaction by understanding what meaningful actions are needed
    - Challenged getting the right information to the right people to provide customers what they need to solve problems, cross-sell and up-sell

  • Security/Intelligence Extension: Needs

  • Operations Analysis: Needs
    Benefits:
    - Gain real-time visibility into operations, customer experience, transactions and behavior
    - Proactively plan to increase operational efficiency
    Business challenges:
    - Complexity and rapid growth of machine data
    - Difficult to capture even a small fraction of machine data for better decisions
    - Inability to analyze machine data and combine it with enterprise data for a full-view analysis
    - Identify and investigate anomalies
    - Monitor end-to-end infrastructure to proactively avoid service degradation or outages

    Obviously, there are many other forms and sources of data. Let's start with the hottest topic associated with Big Data today: social networks. Twitter generates about 12 terabytes of tweet data every single day. Now, keep in mind, these numbers are hard to count on, so the point is that they're big, right? So don't fixate on the actual number, because they change all the time, and realize that even if these numbers are out of date in 2 years, it's at a point where it's too staggering to handle exclusively using traditional approaches.

    Facebook over a year ago was generating 25 terabytes of log data every day (Facebook log data reference: http://www.datacenterknowledge.com/archives/2009/04/17/a-look-inside-facebooks-data-center/ ) and probably about 7 to 8 terabytes of data that goes up on the Internet.

    Google, who knows? Look at Google Plus, YouTube, Google Maps, and all that kind of stuff. So that's the left hand of this chart: the social network layer.

    Now let's get back to instrumentation: there are massive amounts of proliferated technologies that allow us to be more interconnected than at any time in the history of the world, and it isn't just P2P (people to people) interconnections, it's M2M (machine to machine) as well. Again, with these numbers, who cares what the current number is. I try to keep them updated, but the point is that even if they are out of date, it's almost unimaginable how large these numbers are. Over 4.6 billion camera phones that leverage built-in GPS to tag the location of your photos, purpose-built GPS devices, smart meters. If you recall the bridge that collapsed in Minneapolis a number of years ago in the USA, it was rebuilt with smart sensors inside it that measure the contraction and flex of the concrete based on weather conditions, ice build-up, and so much more. So I didn't realise how true it was when Sam P launched Smart Planet: I thought it was a marketing play.

    But truly the world is more instrumented, interconnected, and intelligent than it's ever been, and this capability allows us to address new problems and gain new insight never before thought possible, and that's what the Big Data opportunity is all about!

    Jervin: there is so much that we can do with Big Data. Look at (VOLUME/VARIETY) the amount of data that we can use to boost our ANALYTIC IQ.

    It is also CRITICAL: while Big Data gives lots of opportunity, there is a VERACITY component that relates to TRUST in the source of the data. How do we TRUST and GOVERN that data? Next is VELOCITY (the speed of data that arrives at your doorstep). What are you going to do, and how long does it take for you to REACT to it?

    I think we can all relate to Volume when describing Big Data. Of course all of the numbers on this slide were out of date the moment I saved them, but you get the point. I think back 7 years ago when I used to maintain a TB Club for data warehouse customers; today I have 1 TB in my pocket.

    Big Data gives us the opportunity to include different kinds of data in our analysis, thereby boosting our analytics IQ.

    Veracity is another characteristic of Big Data; this goes to whether you can trust the source of the data, or understand it. It's critical: if you are going to reach out into emails, call center, Tweets, Facebook, and more, you're going to have to trust the source.

    One of the biggest differentiators for the IBM Big Data platform is around the final V, Velocity. This is about how fast data arrives at the organization's doorstep, but more: what are you going to do about it, and how long does it take? You get some details in the next slide.

    In this environment, organizations using analytics are gaining real competitive advantage.

    - 57% increase from 2010 to 2011 in respondents who say analytics creates a competitive advantage

    Source: IBM IBV/MIT Sloan Management Review Study 2011. Copyright Massachusetts Institute of Technology 2011.

    Big data has four key characteristics. The first is volume. Of course this may seem obvious, but it is more complex than you may think. Yes, the volume of data is growing. Experts predict that the volume of data in the world will grow to 25 Zettabytes in 2020. That same phenomenon affects every business: their data is growing at the same exponential rate too. But it isn't just the volume of data that is growing. It's the number of sources of that data. And that leads to the third characteristic of big data, variety, which we will cover later.

    The velocity at which data is created and integrated keeps accelerating. We've moved from batch to a real-time business. Data comes at you at a record or a byte level, not always in bulk. And the demands of the business have increased as well, from an answer next week to an answer in a minute. And the world is also becoming more instrumented and interconnected. The volume of data streaming off those instruments is exponentially larger than it was even 2 years ago.

    Variety presents an equally difficult challenge. The growth in data sources has fuelled the growth in data types. In fact, 80% of the world's data is unstructured. Yet most traditional methods apply analytics only to structured information.

    And finally we have veracity. How can you act upon information if you don't trust it? Establishing trust in big data presents a huge challenge as the sources and the variety grow.

    Our product management, engineering, marketing, CTPs, and other teams have all been working together to better understand the big data market. We've done surveys, met with analysts and studied their findings, met in person with customers and prospects (over 300 meetings), and are confident that we found market sweet spots for big data. These 5 use cases are our sweet spots. These will resonate with the majority of prospects that you meet with. In the coming slides we'll cover each of these in detail: we'll walk through the need, the value and a customer example.

    Enterprise-wide, InfoSphere Data Explorer addresses the ongoing challenge of information silos, a challenge that isn't going away any time soon. Each of the systems in your enterprise was designed to serve a critical function, whether it's managing customer data, managing your supply chain, securing sensitive content or any of a myriad of different functions. Systems such as CRM, ECM, supply chain management, e-mail and others are necessary to perform these specific functions. Each of these systems is a silo with its own login, user interface and way of delivering information. The problem is that almost no one in your organization can rely on only one of these silos for the information they need to do their job.

    Velocity delivers business value by enabling everyone, from management through knowledge workers to front-line employees, to access all of the information they need in a single view, regardless of format or where it is managed. Rather than wasting time accessing each silo separately, Velocity enables them to navigate seamlessly across all available sources, and provides the added advantage of cross-repository visibility. Information is secured so that users only see the content that they are permitted to view when logged directly into the target application.

    In addition, Velocity gives users the ability to comment, tag and rate content, as well as create shared folders for content they would like to share with other users.

    All of this user feedback and social content is then fed back into Velocity's relevance analytics to ensure that the most valuable content is presented to users. The result is better decisions, more efficient operations, better understanding of customers, and innovation.

    IBM IOD 2011

    What is Operations Analysis? It's using big data technologies to enable a new generation of applications that analyze large volumes of multi-structured, often in-motion machine data and gain insight from it, which in turn improves business results.

    What are the drivers for an Operations Analysis use case? In its raw format, businesses are unable to leverage machine data:
    - Growing at exponential rates
    - Comes in large volumes, variety of formats, often in-motion
    - Needs to be combined with existing enterprise data
    - Requires complex analysis and correlation across different types of data sets
    - Requires unique visualization capabilities based on data type and industry/application

    Organizations want to leverage machine data to improve business results and decision-making
