Advanced Analytics Platform Deep Dive: Components, Patterns, Architecture Decisions
ISA-3637 (Tue Nov 5, 11:15 AM – 12:15 PM)
Please note IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Acknowledgements and Disclaimers
Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates.
The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
• U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM, the IBM logo, ibm.com, [IBM Brand, if trademarked], and [IBM Product, if trademarked] are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
Other company, product, or service names may be trademarks or service marks of others.
• Big Data Scale
• Investment Decisions
• Lower storage requirements
• Smarter Returns
• Analyze data before it lands – then store only what you need
• New analytic models
• Share critical information across the enterprise vs. deliver multiple copies of the data
• Traditional Infrastructure Optimization
• Product Knowledge Hub

• Content Network Distribution
• Proactive Device Management
• Network Fault Prevention
• ICTO (Energy Savings)
• Real Time Traffic Optimization
• Network Abuse from excessive data users
• Discrete on-line charging for quality of experience
• Real time automated capacity management for dropped calls
• SON Capacity Management for special events (traffic offload)
• Service Migration
AAP Capabilities

Streaming Engine
• Align diverse streams of data, identify customers, align to IDs, sense data importance
• Categorize incoming data, use window counts to aggregate atomic data or threshold violations, focus attention on monitored situations abstracted from raw events
• Use scoring models developed by the prediction engine to score observations, activities, customers, etc. in real time
• Make data ready for execution of events, e.g., designing campaign messages based on information available
• Includes TEDA and geo-spatial accelerators

Prediction / Policy Engine
• Create models using historical data sources
• Optimize outcomes by promoting the best model for a particular treatment (Champion / Challenger)
• Manage policies associated with decisions, e.g., WODM decision rules, Optim data policies, etc.
• Includes SPSS Deployment Server
• Includes SPSS location analytics

Database Server
• Provide capabilities for storage of structured, unstructured and semi-structured data
• Provide capabilities for analytics using DB functions (e.g., SPSS model development)
• Provide capabilities for data archival using archival policies
• Includes Optim / DS for archival policy execution

Insight
• Deep analysis of consumer behavior is performed to mine data for model creation
• Includes unstructured search, pattern matching using arbitrarily defined patterns, qualitative analytics, quantification of data (e.g., sentiment analysis)
• Includes Big Insights accelerators

Information Interaction
• Perform ad hoc queries, standard reports, dashboards
• Run simulation models, what-if analysis
• Geo-spatial and semantic viewing of data
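The Streaming Engine capability "use window counts to aggregate atomic data or threshold violations" can be illustrated with a small sketch. This is plain Python, not InfoSphere Streams code; the class name, the 60-second window, the threshold, and the cell identifiers are all assumptions made for illustration, and events are assumed to arrive in timestamp order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events per key over the last `window_seconds` of event time."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events = {}  # key -> deque of timestamps still inside the window

    def observe(self, key, timestamp):
        """Record one event; return True if the key's count exceeds the threshold."""
        window = self.events.setdefault(key, deque())
        window.append(timestamp)
        # Evict timestamps that have fallen out of the window.
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        return len(window) > self.threshold


# Hypothetical feed of (cell, seconds) dropped-call events.
counter = SlidingWindowCounter(window_seconds=60, threshold=2)
dropped_calls = [("cell-17", 0), ("cell-17", 10), ("cell-42", 15),
                 ("cell-17", 20), ("cell-17", 90)]
alerts = [(cell, ts) for cell, ts in dropped_calls if counter.observe(cell, ts)]
print(alerts)
```

Only the aggregated violation is passed on, which is one way to read "focus attention on monitored situations abstracted from raw events".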
Content
• Use cases to support Business Architecture
• Components to support Application Architecture
• Data Integration
• Privacy Management & Archiving
• Location & Lifestyle Analytics
• Adaptive Analytics
• Momentum and Conclusions
Mature Organizations are Looking for Instantaneous Insight from Data
Speed to insight
Total respondents n = 973
Respondents were asked how quickly business users require data to be available for analysis or within processes. Box placement reflects the prevalence of that requirement within each stage.
Stream Computing Represents a Paradigm Shift

Traditional Computing
• Historical fact finding
• Find and analyze information stored on disk
• Batch paradigm, pull model
• Query-driven: submits queries to static data

Stream Computing
• Current fact finding
• Analyze data in motion, before it is stored
• Low latency paradigm, push model
• Data-driven: bring data to the analytics

Real-time Analytics
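The pull versus push contrast above can be made concrete with a short sketch. It is a toy comparison in plain Python under stated assumptions: the readings, the limit of 70, and the names `query_over_threshold` and `ThresholdMonitor` are hypothetical and not part of any IBM product.

```python
# Pull model: data is stored first, then a query scans it on request.
stored_readings = [12, 48, 75, 91, 30]


def query_over_threshold(table, limit):
    """Batch, query-driven: scan static data when asked."""
    return [value for value in table if value > limit]


# Push model: each record is handed to the analytic as it arrives.
class ThresholdMonitor:
    def __init__(self, limit):
        self.limit = limit
        self.hits = []

    def on_record(self, value):
        """Data-driven: called once per arriving record, before any storage."""
        if value > self.limit:
            self.hits.append(value)


monitor = ThresholdMonitor(limit=70)
for reading in [12, 48, 75, 91, 30]:  # stands in for a live feed
    monitor.on_record(reading)

batch_result = query_over_threshold(stored_readings, 70)
print(batch_result, monitor.hits)
```

Both paths reach the same answer here; the difference is when the work happens. The monitor has its result as soon as the record arrives, while the query only runs after the data has landed.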
Massively scalable stream analytics

Linear Scalability
• Clustered deployments – unlimited scalability
Automated Deployment
• Automatically optimize operator deployment across nodes
Performance Optimization
• Parallel & pipeline operations
• Efficient multi-threading
Analytics on Streaming Data
• Analytic accelerators for a variety of data types
• Optimized for real-time performance

[Diagram labels: Streaming Data Sources, Source Adapters, Analytic Operators, Sink Adapters, Visualization, Streams Runtime, Deployments, Automated and Optimized Deployment, Streams Studio IDE]
Big Data in Real Time with InfoSphere Streams
[Diagram of operator types: Modify, Filter / Sample, Classify, Fuse, Annotate, Score, Windowed Aggregates, Analyze]
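As a rough illustration of how operator types such as Filter, Annotate and Classify chain together, here is a sketch using Python generators. It is not Streams SPL and does not reflect the product's operator API; the record fields, the cell-to-region lookup, and the 100 ms limit are assumptions.

```python
def filter_op(stream, predicate):
    """Pass through only the records that satisfy the predicate."""
    for record in stream:
        if predicate(record):
            yield record


def annotate_op(stream, lookup):
    """Attach reference data (here a hypothetical cell-to-region table)."""
    for record in stream:
        yield {**record, "region": lookup.get(record["cell"], "unknown")}


def classify_op(stream, limit_ms):
    """Label each record by comparing its latency with a fixed limit."""
    for record in stream:
        label = "slow" if record["latency_ms"] > limit_ms else "ok"
        yield {**record, "label": label}


source = [
    {"cell": "c1", "latency_ms": 40, "valid": True},
    {"cell": "c2", "latency_ms": 250, "valid": True},
    {"cell": "c3", "latency_ms": 90, "valid": False},
]
regions = {"c1": "north", "c2": "south"}

# Each operator consumes the previous one lazily, one record at a time.
pipeline = classify_op(
    annotate_op(filter_op(source, lambda r: r["valid"]), regions),
    limit_ms=100,
)
results = list(pipeline)
print(results)
```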
IBM Big Data Advanced Analytics Platform (AAP) Architecture
[Architecture diagram; callout letters A to G appear on the original. Recoverable labels:]
AAP Capabilities: High Performance Historical analysis; Model Based Predictive Analytics; Real-time scoring, classification, detection and action; Visualize, explore, investigate, search and report; High Performance Unstructured Data analysis; Discovery Analytics; Take action on analytics
Continuous Feed Sources (High Velocity, High Volume): Network Events, Network Policies, XDR, Customer Activities
Data Repositories: Network Topology Data, Application & Usage Data, Customer Data, Batch Data
External Data: Social, 3rd party
Data Integration (ETL): Deduplicate, Standardize, Identity Resolution, Capture Changes
Streaming Engine: Streaming Data; Sense, Identify, Align; Categorize, Count, Focus; Score, Decide
Prediction / Policy Engine: Model Creation, Outcome Optimization, Policy Management, Deploy Model
Analytics Engine: Data for Historical Analysis, Historical Data Models, In Database Mining, Structured Data, Semi Structured Data
Insight: Search, Pattern Matching, Quantitative, Qualitative; Un-Structured Data (Hadoop)
Information Interaction: Reports & Dashboards, Ad-hoc Queries, Geo/Semantic Mapping, Simulation
Actions (Open API): Event Execution, Policy Mgmt, Campaign Mgmt., Pro-active Customer Experience Management, Pro-active Network Mgmt, Real time Scoring & Decision Mgmt.
Users: Marketing, Customer Care, NOC/SOC, Network Planning, ...
Personally Sensitive
• Information that can be misused to harm a person in a financial, employment or social way (names, Social Security numbers, credit card numbers, etc.)
Network Sensitive
• Information that can be misused to breach or disable critical network communication (circuit identifiers, IP addresses, etc.)
Corporate Sensitive
• Information that can be misused to compromise the competitive position of a company (operational metrics, etc.)
6 steps that work together to achieve an acceptable and manageable level of data security
[Cycle diagram labels: Processes & Information assets, Audit, Manage, Define process, Implement Controls, Assess Risk]
Data masking requires a combination of process, templates and tools
Our approach brings together data masking infrastructure built on DataStage and ProfileStage, combined with the Masking on Demand plug-in, which uses Optim technology.
Tools: InfoSphere Analyzer, Optim, DataStage
Templates
• Masking Utilities: Incremental Autogen, Swap, Relational Group Swap, String Replacement, Universal Random
• Data Definitions: Customer ID, Name, Address, Credit Card No, Social Sec No, etc.
Reusable Processes: Identify, Select, Verify, Implement, Validate
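The slide names the masking utilities but does not define them. The sketch below shows one plausible reading of three of them (Incremental Autogen, Swap, String Replacement) in plain Python. The function names, parameters and sample values are hypothetical, and none of this reflects how Optim or DataStage actually implement masking.

```python
import random


def incremental_autogen(values, start=1000):
    """Map each distinct value to a sequential surrogate, consistently."""
    mapping = {}
    masked = []
    for value in values:
        if value not in mapping:
            mapping[value] = start + len(mapping)
        masked.append(mapping[value])
    return masked


def swap(values, seed=0):
    """Shuffle a column so values stay realistic but detach from their rows."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def string_replacement(card_number, keep_last=4, fill="X"):
    """Overwrite all but the trailing characters of a card number."""
    hidden = max(len(card_number) - keep_last, 0)
    return fill * hidden + card_number[hidden:]


# Made-up sample data for the Customer ID, Name and Credit Card No definitions.
customer_ids = ["A17", "B02", "A17"]
names = ["Ana", "Ben", "Cleo", "Dev"]

masked_ids = incremental_autogen(customer_ids)
masked_names = swap(names)
masked_card = string_replacement("4111111111111111")
print(masked_ids, masked_names, masked_card)
```

Note that the repeated customer ID maps to the same surrogate both times, so joins across masked tables would still line up. That consistency is the usual reason to prefer a generated mapping over purely random values.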
Buddies, Hangouts, Globetrotters: Areas of mobility analytics
• Individual Lifestyle and Usage profiles
• Popular Locations with specific profiles
• Who are the Buddies
• Predicting where people go

Who Are You? Homebody, Daily Grinder, Delivering the Goods, Globetrotter, Sofa Surfer

10 Top Hangouts

Mobile ID  Buddy Rank
2702       1
1256       2
8786       3
4792       4
8950       5
What are Profiles
• Lifestyle Profiles are defined by marketing analysts for specific use cases or marketing programs
• Usage Profiles are created using data mining algorithms and define how a person uses services during the day
• Location Affinity is created with algorithms and determines preferred locations for individuals throughout the day and week
• Together these uniquely define a person in relation to how the retailer or marketer might want to market to them
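The deck does not say which algorithm produces Location Affinity. As a minimal sketch, assuming affinity simply means "the location a subscriber visits most often in a given part of the week", it could be computed with a frequency count. The subscriber IDs, day-part labels and locations below are invented.

```python
from collections import Counter, defaultdict


def location_affinity(visits):
    """Return the most visited location per (subscriber, day part)."""
    counts = defaultdict(Counter)
    for subscriber, day_part, location in visits:
        counts[(subscriber, day_part)][location] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


# Hypothetical observations: (subscriber, day part, location).
visits = [
    ("sub-1", "weekday-lunch", "office"),
    ("sub-1", "weekday-lunch", "office"),
    ("sub-1", "weekday-lunch", "cafe"),
    ("sub-1", "saturday-night", "home"),
]
affinity = location_affinity(visits)
print(affinity)
```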
Creating Groups of Mobility Profiles Enables Better Prediction for Certain Groups
Profiles break down like this:
• Homebody: doesn't visit too many unique locations
• Daily Grinder: back and forth to work, quiet weekends, makes stops along the way
• Norm Peterson: inside the lines, no deviations
• Delivering the Goods: no predictable patterns, many different locales during the day
• Globe Trotter: either not in town, or keeps their phone turned off
• Rover Wanderer: spends evenings at various locations (sofa surfers, www.couchsurfing.org)
• "Other": a group that is hard to categorize
By Profile, when is it easy or difficult to predict where they will be?

Profile        Day        Time       Predictability
Daily Grinder  Thursday   Dinner     Highest
Daily Grinder  Friday     Afternoon  Lowest
Homebody       Saturday   Night      Highest
Homebody       Wednesday  Morning    Lowest

These are the two most predictable profiles, yet their predictability varies by day and time. To best communicate with Daily Grinders, contact them on Thursday afternoons, just before dinner.
Preferred Locations by Profile Type at Lunchtime on Weekdays (Central Stockholm)
[Map panels: Delivering the Goods, Night Shifters, Daily Grinders]
What analysis is available (Anonymous Data)
From the mobility profiles, summarized, anonymous analysis is available
• Summarized to ensure anonymity, analysis of popular locations by time of day and profile of subscribers is possible
• For retailers this information can help understand what types of people are nearby at lunch time
• What types of people prefer which areas. Some obvious results are Globe Trotters go to airports, Daily Grinders go to office buildings. Other non-obvious results show up also.
• Are there predictable patterns that we can use to target certain groups in the future?
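One common way to keep a summary anonymous is to count distinct subscribers per cell and suppress cells that are too small. The sketch below assumes that approach; the deck does not describe its actual method, and the minimum group size of 3, the field layout and the sample rows are all hypothetical.

```python
from collections import Counter


def summarize(observations, min_group_size=3):
    """Count distinct subscribers per (location, time slot, profile); drop small cells."""
    # Deduplicate so a subscriber seen twice in the same cell counts once.
    distinct = {(loc, slot, profile, sub) for sub, loc, slot, profile in observations}
    counts = Counter((loc, slot, profile) for loc, slot, profile, _ in distinct)
    return {cell: n for cell, n in counts.items() if n >= min_group_size}


# Hypothetical rows: (subscriber, location, time slot, profile).
observations = [
    ("s1", "airport", "lunch", "Globe Trotter"),
    ("s2", "airport", "lunch", "Globe Trotter"),
    ("s3", "airport", "lunch", "Globe Trotter"),
    ("s3", "airport", "lunch", "Globe Trotter"),  # repeat sighting, counted once
    ("s4", "office park", "lunch", "Daily Grinder"),
]
summary = summarize(observations)
print(summary)
```

The single Daily Grinder at the office park is dropped from the output because a cell of one would point at an individual. Subscriber IDs never appear in the result.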
What Makes this Possible?
• Using the power of Netezza and the modeling capabilities of SPSS, we can throw all the data at data mining algorithms and create discrete clusters of subscribers by activity and mobility
• Apply the data mining outputs to the entire subscriber base by creating detailed, specific analyses for each subscriber, refined by the mobility profiles
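The slides do not name the clustering algorithm used in SPSS. As a stand-in, the sketch below runs a plain k-means in pure Python on two invented features per subscriber, to show what "discrete clusters of subscribers by activity and mobility" might look like in miniature. The features, starting centroids and data are assumptions.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on tuples of numbers, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        assignment = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Move each centroid to the mean of its members; keep it if empty.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids


# Hypothetical features: (unique locations per week, km travelled per day).
subscribers = [(2, 1.0), (3, 2.0), (2, 1.5), (25, 60.0), (30, 80.0), (28, 70.0)]
labels, centers = kmeans(subscribers, centroids=[(0, 0), (30, 70)])
print(labels, centers)
```

In this toy data the first cluster resembles the Homebody profile and the second a high-mobility profile. At production scale the same assignment step would run inside the database, not in application code.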
IBM Big Data / Advanced Analytics Value Proposition
All Telco Data: Combine Network Data (usage, performance, capacity), Billing Call Detail Records, Subscriber, Channel, Policy, Device, Social etc.
At Scale: Ability to manage the stored Petabytes of data and incoming billions of records per day
At Speed of Business: Ability to process data and analytics in real time and close to point of origination to support emerging use cases such as Location Based Services (LBS) and Machine to Machine (M2M)
Only IBM: Only IBM can deliver the complete end-to-end technology and skills to quickly capture the new ERA value of Telco Big Data
Communities
• On-line communities, User Groups, Technical Forums, Blogs, Social networks, and more
  o Find the community that interests you …