Data. Changes. Everything. Martin Willcox [@willcoxmnk], Director Big Data Centre of Excellence (Teradata International) May 2016
Agenda
From transactions and events – to interactions and observations
The five things that you need to know about smart things
Succeeding with Big Data
Summary and conclusions
© 2016 Teradata
From transactions and events – to interactions and observations
Big Data’s three new waves
1. People interacting with things: analysis of web / clickstream data enables Google, Amazon, eBay and others to achieve “mass customisation”.
2. People interacting with people: clickstream / social / mobile interaction data enables Amazon, LinkedIn, Netflix, etc. to go social (“people who like what you like also like…”).
3. Things interacting with things: increasing instrumentation is now leading to the emergence and optimisation of “the Internet of Things”.
1. Understanding what customers are really searching for online
Up to 88% of purchasers on some ecommerce sites search for products.
75% view just the first page of results.
35% view just the top 3 results.
Search is becoming more and more important as more and more online customer journeys are made using smartphones and mobile devices.
What Does Underperforming On-site Search Behavior Look Like?
Queries that Return Zero Results
Immediate Exit After Initial Search
Multiple Search Attempts
Search as the Last Event of Session
No Conversion
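The five signals listed above can be checked mechanically against session logs. The following is a minimal sketch, assuming each session is an ordered list of events; the `Event` type, its fields and the signal names are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple, Optional, Set


class Event(NamedTuple):
    kind: str                    # hypothetical event types: "search", "view", "purchase"
    query: Optional[str] = None  # search text, for "search" events only
    results: int = 0             # number of results returned, for "search" events only


def search_signals(session: List[Event]) -> Set[str]:
    """Return the under-performance signals present in one session."""
    signals: Set[str] = set()
    searches = [i for i, e in enumerate(session) if e.kind == "search"]
    if not searches:
        return signals  # no on-site search in this session
    if any(session[i].results == 0 for i in searches):
        signals.add("zero_results")
    # Immediate exit: the very first search is also the final event.
    if searches[0] == len(session) - 1:
        signals.add("exit_after_initial_search")
    if len(searches) > 1:
        signals.add("multiple_search_attempts")
    if session[-1].kind == "search":
        signals.add("search_last_event")
    if not any(e.kind == "purchase" for e in session):
        signals.add("no_conversion")
    return signals


session = [Event("view"), Event("search", "toddler long sleeve top", 0)]
print(sorted(search_signals(session)))
```

Counting how often each signal occurs per query gives a simple way to prioritise which search terms to fix first.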
Search “toddler long sleeve top”
And zero results returned… yet
Web logs with customer search words & queries
Post search navigation and conversion
Text Analytics
1. Text-Parser (Tokenization)
2. TF-IDF
3. Cosine Similarity
4. Naïve Bayes Text Classifier
5. nPath
6. cFilter
7. Graph
8. SVM (Support Vector Machines)
Predict keywords by product
Generate file with keywords by product
Integrate keywords in content management
Update unlimited products and improve unlimited queries per day
On-site search optimisation typically results in a 2.0% or better improvement in search conversion, which often represents millions of dollars, even for medium-sized eCommerce sites.
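To illustrate the first three techniques in the list (tokenization, TF-IDF and cosine similarity), here is a minimal pure-Python sketch that ranks product descriptions against a search query. It is a generic illustration, not the platform functions named on the slide; the product texts, the whitespace tokenizer and the smoothed IDF formula are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple tokenizer: lower-case and split on whitespace."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption; other variants exist).
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Rank documents by similarity to the query, best match first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the catalogue are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    return sorted(((cosine(q, v), d) for v, d in zip(vectors, docs)), reverse=True)


products = ["kids long sleeve t-shirt", "toddler short sleeve top", "adult wool jumper"]
for score, product in rank("toddler long sleeve top", products):
    print(round(score, 3), product)
```

An exact-match search engine returns nothing for this query, whereas similarity ranking surfaces the closest products; the same scores can suggest which keywords to add to each product's content.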
2. Realising cross-selling opportunities with smart recommender systems
Recommendations account for up to 30% of sales at Amazon, 50% of connections made on LinkedIn and 75% of viewings on Netflix.
[Figure: products-by-customers diagram for Customers 1, 2 and 3, highlighting the products recommended to Customer 1 and to Customer 2]
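One simple way to produce "people who like what you like also like" recommendations is item co-occurrence counting. The sketch below assumes purchase history is available as a mapping from customer to the set of products bought; the customer and product names are hypothetical, and production recommenders typically use more sophisticated scoring.

```python
from collections import defaultdict
from itertools import permutations


def recommend(baskets, customer, top_n=3):
    """Recommend products that co-occur with what the customer already owns."""
    # co[a][b] = number of customers who bought both a and b.
    co = defaultdict(lambda: defaultdict(int))
    for items in baskets.values():
        for a, b in permutations(items, 2):
            co[a][b] += 1
    owned = baskets[customer]
    scores = defaultdict(int)
    for item in owned:
        for other, count in co[item].items():
            if other not in owned:
                scores[other] += count
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


baskets = {
    "customer_1": {"A", "B"},
    "customer_2": {"A", "B", "C"},
    "customer_3": {"B", "C", "D"},
}
print(recommend(baskets, "customer_1"))  # ['C', 'D']
print(recommend(baskets, "customer_2"))  # ['D']
```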
3. Understanding offline customer journeys: How do customers shop (physical) stores?
Which paths do customers who convert take through the store?
Which paths do customers who don’t buy take through the store?
Which departments do customers visit multiple times?
How many customers are in different parts of the store at different times of day? And how many staff?
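The first three questions above reduce to counting paths. This is a minimal sketch, assuming each store visit has already been turned into an ordered list of departments plus a flag saying whether the customer bought; how the location data is captured is not covered here, and the department names are hypothetical.

```python
from collections import Counter


def path_summary(visits):
    """Summarise store visits given as (path, converted) pairs.

    Returns three counters: paths taken by converting customers,
    paths taken by non-converting customers, and the number of
    visits in which each department was entered more than once.
    """
    converting = Counter(tuple(path) for path, bought in visits if bought)
    non_converting = Counter(tuple(path) for path, bought in visits if not bought)
    revisited = Counter(
        dept
        for path, _ in visits
        for dept, count in Counter(path).items()
        if count > 1
    )
    return converting, non_converting, revisited


visits = [
    (["entrance", "toys", "checkout"], True),
    (["entrance", "toys", "clothing", "toys", "checkout"], True),
    (["entrance", "clothing", "exit"], False),
    (["entrance", "toys", "checkout"], True),
]
converting, non_converting, revisited = path_summary(visits)
print(converting.most_common(1))
print(revisited)
```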
The five things that you need to know about smart things
1. Sensors sometimes lie
Don’t assume that sensor data is accurate, complete and consistent; do apply Information Management best practices.
2. Even when they are not lying, sensors may not tell the whole truth
Capture raw sensor data wherever possible; otherwise, understand how sensor data has been summarised.
3. Extracting useful signal requires “multi-genre” Analytics – and additional data
Plan to support a wide variety of Analytics (path / pattern / graph / time-series / text) just to prepare an ADS for modelling.
4. Sensors typically don’t measure the quantity of interest directly
Some model scoring may take place on the smart device itself; build models centrally, but be prepared to deploy them widely.
5. By itself, sensor data is frequently not actionable
Integrate your Data Lake and Data Warehouse environments, so that observation and transaction data can be joined.
#TFTTWNTAWWTASASD
Extracting useful signal from sensor data requires “multi-genre” Analytics – and additional data #3
Raw sensor data
  Data: Raw sensor data
  Analytics: N/A
  Process: Capture full-fidelity data to enable use-case specific event detection
Cleansed sensor data
  Data: Raw sensor data from adjacent sensors; Reference, Master data
  Analytics: Interpolation, neural networks, FFTs, smoothing
  Process: Interpolation of missing values, “virtual sensor” correction for drift, re-calibration, etc.
Event detection
  Data: Alerts data; “whole fleet” sensor data; Environmental data
  Analytics: Time-series, Path, Pattern, Similarity
  Process: Identification of changes of state; signature matching
Path-to / Event association
  Data: “Whole device” and “whole fleet” sensor data
  Analytics: Path, Graph, Clustering, Co-occurrence
  Process: Comparison and correlation with other system / device events
Labelled sensor data
  Data: Maintenance and Operations data
  Analytics: Text, Relational
  Process: Comparison and correlation with human observations
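The cleansing and event-detection stages above can be sketched with a few small functions. This is an illustrative example only, assuming a single sensor sampled at regular intervals with missing readings recorded as `None`; the readings and the threshold are made up, and a simple threshold crossing stands in for the richer state-change and signature-matching analytics the table names.

```python
def interpolate(values):
    """Linearly fill None gaps; leading and trailing gaps copy the nearest reading."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    if not known:
        raise ValueError("no readings to interpolate from")
    first, last = known[0], known[-1]
    for i in range(first):
        out[i] = out[first]
    for i in range(last + 1, len(out)):
        out[i] = out[last]
    for lo, hi in zip(known, known[1:]):
        step = (out[hi] - out[lo]) / (hi - lo)
        for i in range(lo + 1, hi):
            out[i] = out[lo] + step * (i - lo)
    return out


def moving_average(values, window=3):
    """Trailing moving average; early points use the readings available so far."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def state_changes(values, threshold):
    """Indices at which the signal crosses the threshold (a change of state)."""
    states = [v >= threshold for v in values]
    return [i for i in range(1, len(states)) if states[i] != states[i - 1]]


raw = [20.0, None, 22.0, 21.0, None, None, 30.0, 31.0]
clean = interpolate(raw)
print(clean)
print(moving_average(clean))
print(state_changes(clean, threshold=25.0))
```

Keeping the raw series alongside the cleansed one, as the first row of the table recommends, means the threshold or smoothing window can be changed later without losing information.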
Succeeding with Analytics
If Analytics were easy, all companies would be “data-driven” by now
Today, twice as many organizations feel that they are generating new ideas and opportunities from company data “to a great extent” (25 percent versus 12 percent in 2009)… On the other hand, just over one fifth of respondents (22 percent) said they are “very satisfied” with business outcomes driven by their analytics investments to date.
Analytics in Action: Breakthroughs and Barriers on the Journey to ROI, Accenture, 2013
“The companies that had the data they needed and used it to make decisions (instead of relying more on intuition and expertise) had the highest productivity and profitability. Specifically, the most data-driven companies had 4% higher productivity and 6% higher profits than the average in our sample, all else being equal.”
Andrew McAfee & Erik Brynjolfsson
Why aren’t more organisations more successful with Analytics?
1. Data is scattered across the organisation in silos
2. Multiple techniques, algorithms, workflows
3. Skilled and experienced resources are at a premium
4. High “failure” rate requires low cycle times
5. Crossing the chasm from the data lab – to production
Start with a presumption for integration #1
Centralise interaction and observation data in a Data Lake
Integrate transaction and event data in a Data Warehouse
Enable “run-time” integration between the Data Lake and the Data Warehouse
“Push-down” complex Analytic processing to the data
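As a toy illustration of why observation and transaction data need to be joinable, the sketch below uses an in-memory SQLite database as a stand-in for both environments. The table names, columns and rows are hypothetical; a real deployment would join across the Data Lake and the Data Warehouse rather than inside one database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up observation and transaction data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE transactions (customer_id INTEGER, order_id INTEGER, amount REAL);
        CREATE TABLE observations (customer_id INTEGER, page TEXT);
        INSERT INTO transactions VALUES (1, 100, 25.0), (2, 101, 40.0);
        INSERT INTO observations VALUES
            (1, 'search'), (1, 'product'), (2, 'search'), (3, 'search');
    """)
    return conn


def views_and_spend(conn):
    """Join page views (observations) to spend (transactions) per customer."""
    return conn.execute("""
        SELECT o.customer_id, o.page_views, COALESCE(t.spend, 0.0) AS spend
        FROM (SELECT customer_id, COUNT(*) AS page_views
              FROM observations GROUP BY customer_id) AS o
        LEFT JOIN (SELECT customer_id, SUM(amount) AS spend
                   FROM transactions GROUP BY customer_id) AS t
          ON t.customer_id = o.customer_id
        ORDER BY o.customer_id
    """).fetchall()


for row in views_and_spend(build_demo_db()):
    print(row)
```

Both sides are aggregated before the join, so a customer with several orders is not counted once per order. Customer 3 appears only in the observation data; without the join, the fact that this visitor searched but never bought would be invisible.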
Support multi-genre Analytics #2
[Figure: SNAP™ Framework architecture. A unified SQL interface sits above an integrated optimizer and integrated executor; analytic engines and functions (SQL, MapReduce, Graph, Stats, Text, Path); storage system and services (row store, column store, file store); running on YARN (Cluster Resource Management) and HDFS (Redundant Reliable Storage)]
Support for multiple processing engines – SQL, MapReduce, BSP / Graph, etc.
Support for multiple data representations
Integrated Optimisation / Execution framework enables polymorphism
Aster 7.0 (available Q3 2016) will be YARN-native – bringing complex Analytics direct to the Data Lake
Summary & Conclusions
Big Data = “New” Data + “New” Analytics
“NEW” DATA
Clickstream, Social, Call Centre Agent Notes, Machine Logs, etc.
“NEW” ANALYTICS
Path, Graph, Time-Series, Pattern Matching, Text, etc.
The machines are coming! From transactions and events - to interactions and observations
Simple computing devices are now so inexpensive that increasingly everything is instrumented and can be measured;
But data generated by machines are still just data – we can’t automatically assume that they are complete, consistent and accurate;
And the value of sensor data increases exponentially when they are integrated – with other sensor data and with transaction and event data.
Data. Changes. Everything.
“Digitization, in short, is not a great equalizer that drives all companies toward similar processes and outcomes. Instead, it's driving the leaders and laggards further apart.”
Andrew McAfee & Erik Brynjolfsson
@willcoxmnk