Copyright 2013 by Data Blueprint
Demystifying Big Data
Date: May 14, 2013
Time: 2:00 PM ET/11:00 AM PT
Presenter: Peter Aiken, Ph.D.
• Every century, a new technology - steam power, electricity, atomic energy, or microprocessors - has swept away the old world with a vision of a new one. Today, we seem to be entering the era of Big Data – Michael Coren
Get Social with Us!
• Live Twitter Feed: @datablueprint, @paiken, #dataed
• Like Us: www.facebook.com/datablueprint
• Join the Group: Data Management & Business Intelligence
Demystifying Big Data 2.0
Developing the Right Approach for Implementing Big Data Techniques
Presented by Peter Aiken, Ph.D.
Peter Aiken, PhD
• 30+ years of experience in data management
• Multiple international awards & recognition
• Founder, Data Blueprint (datablueprint.com)
• Associate Professor of IS, VCU (vcu.edu)
• Past President, DAMA International (dama.org)
• 9 books and dozens of articles
• Experienced w/ 500+ data management practices in 20 countries
• Multi-year immersions with organizations as diverse as the US DoD, Nokia, Deutsche Bank, Wells Fargo, and the Commonwealth of Virginia
Outline
• Big Data Context: Why the Big Deal about Big Data?
• Big Data Challenges: Historical Perspective
• Big Data Challenges: Today
• Big Data Approach: Crawl, Walk, Run
• Design Principles: Foundational & Technical
• Take Aways and Q&A
Why the Big Deal about Big Data?
• We are at an inflection point: The sheer volume of data generated, stored, and mined for insights has become economically relevant to businesses, government, and consumers (McKinsey)
• We believe the same important principles still apply:
– What problem are you trying to solve for your business? Your solution needs to fit your problem
– Doing data for (big) data’s sake is not going to solve any problems
– Risk of spending a lot of money on chasing Big Data that will realize little to no returns - especially at this hype cycle stage
• Gartner: High-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision-making, insight discovery and process optimization.
• IBM: Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.
• NY Times: Shorthand for advancing trends in technology that open the door to a new approach to understanding the world and making decisions.
• McKinsey: Large pools of data that can be brought together and analyzed to discern patterns and make better decisions
Big Data Characteristics generally include:
1. Volume: The amount of data
2. Velocity: The speed of data going in and out
3. Variety: The range of data types & sources
4. Variability: Many options or variable interpretations confound analysis

Q: Would it be more useful to refer to "big data techniques"?
Big Data Gartner Hype Cycle
Some Big Data Limitations
• Data analysis struggles with social cognition
• Data struggles with context
• Data creates bigger haystacks
• Big data has trouble with big problems
• Data favors memes over masterpieces
• Data obscures values
David Brooks, New York Times: http://www.nytimes.com/2013/02/19/opinion/brooks-what-data-cant-do.html?_r=0
Business Information Market: $1.1 Trillion a Year
• Enterprises spend an average of $38 million on information/year
• Small and medium sized businesses on average spend $332,000

– Track what is being purchased and how often
– Coupons based on purchasing history
– Targeted communications, campaigns & special offers
– Social media for additional interactions
– Personalize consumer interactions
• Customer purchase history influences product placements
– Retailers rapidly respond to consumer demands
– Product placements, planogram optimization, etc.
The European Union last year approved a new rule mandating that all trades must exist for at least a half-second - in this instance 1,200 orders and 215 actual trades
#3 VARIETY: Range of Data Types & Sources
Increasingly, individuals make use of data-producing gadgets to perform services for them
#4 VARIABILITY: Many options or variable interpretations confound analysis
Historyflow: Wikipedia entry for the word “Islam”
Take Aways: Big Data Challenges Today
• Fact: Big Data techniques are innovative but “Big Data” is not
• Challenges are both foundational and technical, today as well as in the 1600s
• Technology continues to advance rapidly (4 Vs)
• Challenges associated with Big Data are not new:
– Well-known foundational data management issues
– Need to align data and business with rapidly changing environment
– Duplication, accessibility, availability
– Foundational business issues
Myth #6: Big Data provides all the Answers
Fact:
• Big Data does not mean the end of scientific theory
• Be careful or you’ll end up with spurious correlations
– Don’t just go fishing for correlations and hope they will explain the world
• To get to the WHY of things, you need ideas, hypotheses and theories
• Having more data does not substitute for thinking hard, recognizing anomalies and exploring deep truths
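The warning about fishing for correlations can be made concrete with a small simulation. The sketch below is illustrative only and does not come from the presentation: it generates independent random variables (the variable counts, the sample size of 20, the seed, and the helper name `max_abs_correlation` are all assumptions) and reports the strongest pairwise correlation found. Because every variable is pure noise, any strong correlation it turns up is spurious by construction.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def max_abs_correlation(n_variables, n_observations, seed=0):
    """Strongest pairwise |r| found among unrelated random variables."""
    rng = random.Random(seed)
    # Each column is independent Gaussian noise (hypothetical data).
    columns = [
        [rng.gauss(0, 1) for _ in range(n_observations)]
        for _ in range(n_variables)
    ]
    return max(abs(pearson(a, b)) for a, b in combinations(columns, 2))


if __name__ == "__main__":
    # More variables means more pairs to search, so more chances for noise
    # to look like a relationship.
    for n_variables in (2, 10, 50):
        strongest = max_abs_correlation(n_variables, n_observations=20)
        print(f"{n_variables:>3} variables -> strongest |r| = {strongest:.2f}")
```

With a fixed seed, the search over 50 variables includes the pair examined in the 2-variable case, so the strongest correlation found can only stay the same or grow as more variables are added. That is the sense in which a hypothesis is needed before the search, not after it.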
+ of data/day
2. Velocity: 60 GB of data per second
3. Variety: 8.5 billion devices connected
4. Variability: Sponsor data, athlete data, etc.
5. Vitality: Data Art project “Emoto”
6. Virtual: Social media
• Based on my 6 V analysis, do I need a Big Data solution or does my current BI solution address my business opportunity?
– Do the 6 Vs indicate general Big Data characteristics?
– What are the limitations of my current BI environment? (Technology constraint)
– What are my budgetary restrictions? (Financial constraint)
– What is my current Big Data knowledge base? (Knowledge constraint)
• MUST have both Foundational and Technical practice expertise
• Data Strategy
• Data Governance
• Data Architecture
• Data Education
• Data Quality
• Data Integration
• Data Platforms
• BI/Analytics
• Needs to be actionable
• Generally well understood by business
• Document what has been learned
• Perfect results are not necessary
• Reiterate and refine
• Iterative process to reach decision point
• Use as feedback for next exploration
Take Aways-Approach: Crawl, Walk, Run
• Crawl:
– Identify business opportunity and determine whether you truly need a Big Data solution
• Walk:
– Apply a combination of foundational and technical data management practices. Document your insights and make sure they are actionable
• Run:
– Recycle and explore. Staying agile allows you to be exploratory.
Foundational Practice: Data Strategy
• Your data strategy must align to your organizational business strategy and operating model
• As the marketplace becomes more data-driven, a data-focused business strategy is an imperative
• You must have a data strategy before you have a Big Data strategy
Data Strategy Case Study: Enterprise Information Management Maturity
Data Strategy Considerations
• What are the questions that you cannot answer today?
• Is there a direct reliance on understanding customer behavior to drive revenue?
• Do you have information overload and are you trying to find the signal in the noise?
• Which is more important:
– Establishing value from current data assets/data reporting?
– Exploring Big Data opportunities?
Myth #7: You need Big Data for Insights
Fact:
• Distinction between Big Data and doing analytics
– Big Data is defined by the technology stack that you use
– Big Data is used for predictive and prescriptive analytics
• Use existing data for reporting, figure out bottlenecks and optimize current business model
• Understand how your data is structured, architected and stored
Foundational Practice: Data Architecture
• Common vocabulary expressing integrated requirements ensuring that data assets are stored, arranged, managed, and used in systems in support of organizational strategy [Aiken 2010]
• Most organizations have data assets that are not supportive of strategies
• Big question:
– How can organizations more effectively use their information architectures to support strategy implementation?
Data Architecture Considerations
• Does your current architecture for BI and analytics support Big Data?
• Are you getting enough value out of your current architecture?
• Can you easily integrate and share information across your organization?
• Do you struggle to extract the value from your data because it is too cumbersome to navigate and access?
• Are you confident your data is organized to meet the needs of your business?
Technical Practice: Data Integration
• A data-centric organization requires unified data
• Integrating data across organizational silos creates new insights
• It is also the biggest challenge
• Big Data techniques can be used to complement existing integration efforts
Allowing connections between RDBMS and NoSQL data is beneficial
Examples:
1. Invoices
2. Passports
3. Stock shelving
Integration Data Vault 2.0 with Big Data
Data Integration Considerations
• The complexity of your data integration challenge depends on the questions you’re trying to answer
• Integration requirements for Big Data are dependent on the types of questions you’re asking:
– Integration here may be more fuzzy than discrete
– Integration is domain-based (based on time, customer concept, geographic distribution)
• Those requirements should evolve from your strategy
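The observation that Big Data integration "may be more fuzzy than discrete" can be illustrated by contrasting an exact join with approximate matching. The following is a minimal sketch under stated assumptions, not the approach prescribed in the presentation: the customer names are hypothetical, the 0.8 similarity threshold is arbitrary, and the matching relies on `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_match(left, right, threshold=0.8):
    """Pair each left record with its most similar right record.

    Returns (left, right, score) tuples; right is None when no candidate
    reaches the threshold. The threshold is an illustrative assumption.
    """
    matches = []
    for name in left:
        best = max(right, key=lambda candidate: similarity(name, candidate))
        score = similarity(name, best)
        matches.append((name, best if score >= threshold else None, score))
    return matches


# Hypothetical records from two systems that describe overlapping customers.
crm_names = ["Jon Smith", "Globex Inc", "Initech"]
billing_names = ["John Smith", "Globex, Inc.", "Umbrella Ltd"]

# Discrete integration: only exact string equality counts as a match.
discrete = [name for name in crm_names if name in billing_names]

# Fuzzy integration: near-identical spellings are linked, unrelated ones are not.
fuzzy = fuzzy_match(crm_names, billing_names)

for left_name, right_name, score in fuzzy:
    print(f"{left_name!r} -> {right_name!r} (similarity {score:.2f})")
```

In this sample the exact join links nothing, while the fuzzy pass links the two misspelled customers and leaves the unrelated one unmatched. Where to set the threshold depends on the question being asked, which is why the requirements should come from the strategy.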
Technical Practice: Data Quality
• Quality is driven by fit for purpose considerations
• Big Data quality is different:
– Basically available
– Soft state
– Eventual consistency
• Directional accuracy is the goal
• Focus on your most important data assets and ensure your solutions address the root cause of any quality issues – so that your data is correct when it is first created
• Experience has shown that organizations can never get in front of their data quality issues if they only use the ‘find-and-fix’ approach
Data Quality Considerations
• Big Data is trying to be predictive
• What are the questions you are trying to answer?
– What level of accuracy are you looking for?
– What confidence levels?
– Example: Do I need to know exactly what the customer is going to buy or do I just need to know the range of products he/she is going to choose from?
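The example question, exact purchase versus range of products, corresponds to two different accuracy measures. The sketch below is an illustration under assumptions (the ranked predictions, product names, and function names are hypothetical and not from the presentation): it scores the same predictions once by exact match and once by whether the actual purchase falls within the top k candidates.

```python
def exact_accuracy(ranked_predictions, actual):
    """Share of cases where the top-ranked prediction is the actual purchase."""
    hits = sum(
        1 for ranked, item in zip(ranked_predictions, actual)
        if ranked[0] == item
    )
    return hits / len(actual)


def top_k_accuracy(ranked_predictions, actual, k):
    """Share of cases where the actual purchase is among the top k predictions."""
    hits = sum(
        1 for ranked, item in zip(ranked_predictions, actual)
        if item in ranked[:k]
    )
    return hits / len(actual)


# Hypothetical ranked product predictions for four customers.
predictions = [
    ["laptop", "tablet", "phone"],
    ["phone", "laptop", "tablet"],
    ["tablet", "phone", "laptop"],
    ["laptop", "phone", "tablet"],
]
purchases = ["laptop", "tablet", "phone", "phone"]

print("exact product:", exact_accuracy(predictions, purchases))
print("within top 2: ", top_k_accuracy(predictions, purchases, k=2))
```

The same predictions look weak when judged on the exact product and considerably better when judged on a range of two. Deciding which measure matters is a business question to settle before investing in more data or a more precise model.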
Myth #8: Bigger Data is Better
Fact:
• Better to have less data of good quality than more poor quality big data
• Use analysis to reduce variables and increase manageability; otherwise Big Data = Quantity over Quality
• Beware of Shiny Object Syndrome
– What problem are we trying to solve?
– The solution needs to fit the problem
• Big Data may not be your answer, it may be your problem
• Investments in foundational and technical approaches result in better outcomes for Big Data
Technical Practice: Data Platforms
• Do you want to measure critical operational process performance?
• No one data platform can answer all your questions. This is commonly misunderstood and often leads to very expensive, bloated and ineffective data platforms.
• Understand the questions that need to be asked, and how to build the right data platform or optimize an existing one
The Big Data Landscape
Copyright Dave Feinleib, bigdatalandscape.com
Data Platforms Considerations
• Most big data stacks share common components: file storage, columnar store, querying engine, etc.
• Big data stack generally looks the same until you get into appliances
– Algorithms are built into the appliances themselves (e.g. Netezza, Teradata, etc.)
• Ask these questions:
– Do you want insights on your customer’s behavior?
– Do you need real-time customer transactional information?
– Do you need historical data or just access to the latest transactions?
– Where do you go to find the single version of the truth about your customers?
Take Aways-Design Principles: Foundational & Technical
• Foundational data management principles still apply
• Beware of SOS (Shiny Object Syndrome)
• You must have a data strategy before you can have a Big Data strategy
• Fact: You don’t need Big Data to gain insights
• Big Data integration requirements evolve from your strategy
• Fact: Bigger Data is not always better
Take Aways: In Summary
• Big data techniques are innovative but “Big Data” is not
• Big Data characteristics: 6 Vs
• The Washington Post: Five Myths about Big Data (http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics)
• Gartner: Gartner’s 2013 Hype Cycle for Emerging Technologies Maps Out Evolving Relationship Between Humans and Machines (http://www.gartner.com/newsroom/id/2575515)
• The New York Times | Opinion Pages: What Data Can’t Do (http://www.nytimes.com/2013/02/19/opinion/brooks-what-data-cant-do.html?_r=1&)
• CIO.com: Five Steps for How to Better Manage Your Data (http://www.cio.com.au/article/429681/five_steps_how_better_manage_your_data/)
• Business Insider: Enterprises Aren’t Spending Wildly on ‘Big Data’ But Don’t Know If It’s Worth It Yet (http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe)
• Inc.com: Big Data, Big Money: IT Industry to Increase Spending (http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html)
• Forbes: Big Data Boosts Customer Loyalty. No, Really. (http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/)
Questions?
It’s your turn! Use the chat feature or Twitter (#dataed) to submit your questions to Peter now.
Upcoming Events
• Data-Centric Strategy & Roadmap: February 11, 2014 @ 2:00 PM ET/11:00 AM PT
• Emerging Trends in Data Jobs: March 13, 2014 @ 2:00 PM ET/11:00 AM PT
Sign up here: www.datablueprint.com/webinar-schedule or www.dataversity.net
10124 W. Broad Street, Suite C
Glen Allen, Virginia 23060
804.521.4056