Close Encounters with Data Science Oct 28, 2015 Geoff Yuen, Ph.D. VP Emerging Technology, PCCW [email protected]
What’s new about data ?
• Data = values of qualitative or quantitative variables, belonging to a set of items (usually population)
• Data = often unstructured (without pre-established data model), usually raw file, different formats
Examples: chat, genome (DNA base pairs), picture
Lots of Data ≠ Insight
Data itself is not useful; we need insights !
It’s easy to get lost in your data
Jiawei Han. Abel Bliss Professor, Department of Computer Science, UIUC; “Pattern Discovery in Data Mining” Coursera online course with 75000 students 2/2015
Networks: 2G Network, 3G Network, 4G Network (2013)
Bands: 900 MHz, 1800 MHz, 2100 MHz
The O2 mobile network has hundreds of cells to measure the trends in footfall across the country (Telefonica UK)
Network Data
• Easier to use
• Further protecting anonymity
• Extrapolated to represent local population
Footfall is rendered into 200 x 200 metre grid squares
200 x 200 Grid
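The grid rendering can be sketched in a few lines of Python. This is only an illustration of the binning idea, assuming positions are already in projected metre coordinates; the function names and the `min_count` suppression rule (one plausible way to protect anonymity) are hypothetical and not taken from the Telefonica product.

```python
from collections import Counter

CELL_SIZE_M = 200  # grid square edge in metres, as on the slide


def grid_cell(x_m, y_m, cell_size=CELL_SIZE_M):
    """Map a position in metres to the index of its grid square."""
    return (int(x_m // cell_size), int(y_m // cell_size))


def footfall_by_cell(positions, min_count=1):
    """Count positions per grid square.

    Squares with fewer than min_count observations are dropped; this
    suppression rule is an assumption, not part of the original slides.
    """
    counts = Counter(grid_cell(x, y) for x, y in positions)
    return {cell: n for cell, n in counts.items() if n >= min_count}


# Made-up positions for illustration only.
positions = [(10.0, 20.0), (150.0, 199.9), (200.0, 0.0), (450.0, 610.0)]
counts = footfall_by_cell(positions)
suppressed = footfall_by_cell(positions, min_count=2)
print(counts)
print(suppressed)
```

Raising `min_count` trades spatial detail for stronger anonymity, since sparsely visited squares disappear from the output.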
Drilling into footfall demographics
…
“US has killed Osama Bin Laden”
• average of 3,000 tweets per second
• 27,900,000 tweets in 2.5 hours
• peak of 12,384,000 tweets in one hour
Viral Social Data : From 1 to 14.9 million tweets in 5 minutes (1st May 2011)
The data is the second most important thing
Jeff Leek, Assistant Professor of Biostatistics, Data Science Program, Johns Hopkins University :
Focus on the problem first …
Facebook “Likes” Predicting Personality
Facebook can predict personality based on annotated data better than humans … except for spouse
http://www.pnas.org/content/112/4/1036.full.pdf
What’s New About Analytics
• Golden Age of Analytics (1995-) : statistical machine learning has contributed many algorithms that are much more powerful than simple regression (list modified from Giovanni Seni, A9):
• 1983 CART (Tree)
• 1996 Lasso
• 1996 Bagging
• 1997 AdaBoost
• 2001 Random Forest
• 2003 Learning Ensembles
• 2004 Regularization & Boosted Lasso
• 2005-2013 Deep Belief / Deep Learning
Many ways to predict and classify structured and unstructured data now !
1. Kinect Posture Detection
Kinect detection of body segments
Goal: Estimate Pose from Depth Image
A single input depth image is segmented into a dense probabilistic body part labeling, with the parts defined to be spatially localized near skeletal joints of interest
From depth images to joint positions in 3D
Challenges
• 3 trees, each of depth 20, were trained from 1 million images
• Get 3D models for 15 bodies with a variety of weights, heights, etc.
• Synthesize mocap data for all 15 body types
• Capture and sample 500K mocap frames of people kicking, driving, dancing, etc.
Get Lots of Training Data into ‘3 trees’
Kinect's reliable detection of body segments is based on successful application of a famous analytic algorithm (random forest)
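To make the random forest idea concrete, here is a minimal pure-Python sketch: bootstrap-sampled decision trees whose leaves store class distributions, averaged into a probabilistic label. It uses the slide's figures (3 trees, depth limit 20) as defaults, but the two toy features and the "hand"/"head" labels are invented for illustration; Kinect's actual depth-image features and training pipeline are not reproduced here.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def best_split(rows, labels, n_features, rng):
    """Search a random subset of features for the lowest-impurity split."""
    best, best_score = None, gini(labels)
    for f in rng.sample(range(len(rows[0])), n_features):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between neighbouring values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth, max_depth, n_features, rng):
    split = best_split(rows, labels, n_features, rng) if depth < max_depth else None
    if split is None:
        # Leaf: keep the class distribution, i.e. a probabilistic label.
        n = len(labels)
        return {c: k / n for c, k in Counter(labels).items()}
    f, t = split
    go_left = [r[f] <= t for r in rows]
    children = []
    for side in (True, False):
        sub_rows = [r for r, g in zip(rows, go_left) if g == side]
        sub_labels = [y for y, g in zip(labels, go_left) if g == side]
        children.append(build_tree(sub_rows, sub_labels, depth + 1, max_depth, n_features, rng))
    return (f, t, children[0], children[1])


def tree_predict(node, row):
    """Walk from the root to a leaf and return its class distribution."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(rows, labels, n_trees=3, max_depth=20, seed=0):
    rng = random.Random(seed)
    n_features = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 0, max_depth, n_features, rng))
    return forest


def forest_predict_proba(forest, row):
    """Average the leaf distributions of all trees."""
    total = Counter()
    for tree in forest:
        for c, p in tree_predict(tree, row).items():
            total[c] += p / len(forest)
    return dict(total)


# Toy data: two made-up features per "pixel", two body-part labels.
rows = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.3, 0.25),
        (0.8, 0.9), (0.9, 0.7), (0.7, 0.85), (0.95, 0.8)]
labels = ["hand"] * 4 + ["head"] * 4
forest = train_forest(rows, labels)
proba = forest_predict_proba(forest, (0.2, 0.2))
print(max(proba, key=proba.get), proba)
```

Averaging leaf distributions rather than taking a hard vote is what yields a probabilistic labeling per input, in the spirit of the dense body-part labeling described above.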
Opportunities
What application areas can benefit ? Rehabilitation, motion training (martial arts, tennis, dry land training), elderly fall detection
With an aging population, fall detection and related services can be a major opportunity
• Australia : 30% of adults over 65 experience at least one fall per year; this group is predicted to increase from 14% to 23% (8.1 million) in 2050, costing $1.4 billion by 2051.
• China : 1405 mil vs 24 mil, a factor of 58 bigger !
Recommend for HK : elderly fall detection and motion training
Flyby Science is hard!
Flyby Science (typical)
Status Quo: Respond in days
Onboard analysis: Respond in minutes
NASA JPL: better flyby surface feature recognition by random forests
2. Deep learning
By 2017, 10 % of computers will be learning rather than processing (Gartner 2013)
Structured Data | Unstructured Data
Regression | Learning structure in data
Linear or Logistic | non-Linear (polynomial)
Problem specific | Knowledge specific
Big Data finally found its analytic partner : deep learning
CIFAR-10 (units: accuracy %)
Rank | Result (%) | Method | Venue
1 | 94 | Lessons learned from manually classifying CIFAR-10 | unpublished 2011
2 | 91.78 | Deeply-Supervised Nets | arXiv 2014
3 | 91.2 | Network In Network | ICLR 2014
4 | 90.68 | Regularization of Neural Networks using DropConnect | ICML 2013
5 | 90.65 | Maxout Networks | ICML 2013
6 | 90.61 | Improving Deep Neural Networks with Probabilistic Maxout Units | ICLR 2014
7 | 90.5 | Practical Bayesian Optimization of Machine Learning Algorithms | NIPS 2012
8 | 89 | ImageNet Classification with Deep Convolutional Neural Networks | NIPS 2012
9 | 88.79 | Multi-Column Deep Neural Networks for Image Classification | CVPR 2012
10 | 84.87 | Stochastic Pooling for Regularization of Deep Convolutional Neural Networks | arXiv 2013
• The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class.
• There are 50000 training images and 10000 test images.
• Classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
Error Back propagation
Parallel Error Correction
Learning Layer by Layer
The new way to train multi-layer NNs…
Train this layer first
then this layer
then this layer
then this layer
finally this layer
Each of the middle layers is trained to be an auto-encoder, so it is basically forced to learn good features from the output of the previous layer
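The layer-by-layer procedure can be sketched as follows. This is a toy illustration under several assumptions not stated in the slides: sigmoid units, tied encoder/decoder weights, plain stochastic gradient descent on squared reconstruction error, and invented binary input patterns. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(W, b, x):
    """Hidden activations of one layer for input vector x."""
    return [sigmoid(sum(wj[i] * x[i] for i in range(len(x))) + bj)
            for wj, bj in zip(W, b)]


def train_autoencoder(data, n_hidden, epochs=200, lr=0.3, seed=0):
    """Train one tied-weight sigmoid auto-encoder; return (W, b, loss history)."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden  # encoder bias
    c = [0.0] * n_in      # decoder bias
    history = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            h = encode(W, b, x)
            r = [sigmoid(sum(W[j][i] * h[j] for j in range(n_hidden)) + c[i])
                 for i in range(n_in)]
            total += sum((r[i] - x[i]) ** 2 for i in range(n_in))
            # Back-propagate the squared reconstruction error.
            d_r = [2.0 * (r[i] - x[i]) * r[i] * (1.0 - r[i]) for i in range(n_in)]
            d_h = [sum(d_r[i] * W[j][i] for i in range(n_in)) * h[j] * (1.0 - h[j])
                   for j in range(n_hidden)]
            for j in range(n_hidden):
                for i in range(n_in):
                    # Tied weights: decoder and encoder gradients add up.
                    W[j][i] -= lr * (d_r[i] * h[j] + d_h[j] * x[i])
                b[j] -= lr * d_h[j]
            for i in range(n_in):
                c[i] -= lr * d_r[i]
        history.append(total / len(data))
    return W, b, history


def pretrain_stack(data, layer_sizes):
    """Greedy pretraining: train one layer, then feed its features to the next."""
    layers, inputs = [], data
    for k, n_hidden in enumerate(layer_sizes):
        W, b, history = train_autoencoder(inputs, n_hidden, seed=k)
        layers.append((W, b, history))
        inputs = [encode(W, b, x) for x in inputs]
    return layers


# Invented binary patterns, for illustration only.
data = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1],
        [1, 1, 1, 1, 0, 0], [0, 0, 1, 1, 1, 1], [1, 1, 0, 0, 1, 1]]
layers = pretrain_stack(data, [4, 2])
for k, (_, _, history) in enumerate(layers):
    print("layer", k, "reconstruction error:", history[0], "->", history[-1])
```

In a full pipeline the pretrained weights would then initialise a multi-layer network that is fine-tuned with error back propagation; that step is omitted here.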
Deep Learning for unstructured data
• The previous paradigm for feature detection and prediction from data was based on modelling and optimization. Researchers around the world have now shown “deep learning” surpassing that performance on many problems.
“Tech 2015: Deep Learning And Machine Intelligence Will Eat The World” Forbes 12/2014
• Deep learning scales well with big data, learning a “layering of knowledge” in hidden layers without the handcrafted feature detectors that past machine learning methods required. Convergence time proof for RBM.
• Demonstrated impressive improvements in diverse areas : speech recognition, object recognition in images, targeted advertising, fraud detection, personalization
• Speech recognition : Microsoft, Google & Apple competing mobile “digital assistants” (Google Now vs Siri vs Cortana 9/2014). Digital assistants will drive mCommerce & 50% US digital purchases in 2017 (Gartner)
• Object recognition : Facebook mining user images for intentions (NYT)
• Real-time translation : Skype
• World Cup / NBA Predicting 2014 (MS)
• Others : Baidu, IBM, Yahoo, Tencent, Netflix, Adobe, NEC, Toyota
• Telco centric vendors : Wise-athena, Dataspark, Zettics
Deep learning has created breakthroughs in object and speech recognition.
But also watch other areas : sports prediction, natural language processing, churn prediction, targeted advertising, customer segmentation
2014 Survey of Deep Learning Vendor Claims
Area | Previous Accuracy | Data used to train model | Latest Accuracy | Company
Speech Recognition | 75% | 680 speakers, 10 sentences each | 94% (2013) | Google, IBM, Skype, MS
Object recognition | 70% | 1.2 mil images | 95% (2015) | Baidu, Google, Facebook
Target Advertising | <1% (Banner Ads) | 220K users | 22% | NDA
Personalization | na | 220K users | 27% | NDA
Churn Prediction (Telco) | 69% (SAS) | 300 mil CDRs, 1.8 mil users | 82% | NDA
Dealer Fraud Detection (Telco) | <40% (reactive) | 700 mil CDRs, 1.2 mil users | 80% (predictive) | NDA
• Other big companies in related efforts : Baidu, IBM, Yahoo, Tibco, Tencent, Netflix, Adobe, NEC, Toyota
Speech Recognition : the race is on
Contextual Mobile Targeting
Contextual & unstructured data analysed with machine learning technology also improve advertising accuracy (+219 %)
Customer visibility: Accuracy and Algorithm speed
Manual test of the algorithm
• Several cameras can observe the same area
• Aggregated signals with a proper threshold will perfectly match the ground truth
Algorithm speed
• Calibration: manual
• Runtime: 60 msec/frame
[Chart: ground truth (is a person in front of ATM 7) vs. aggregated signal, by minute (1-59), y-axis 0 to 1.2]
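A minimal sketch of aggregating several cameras' signals and applying a threshold is shown below. The averaging rule, the 0.5 threshold, and the per-minute scores are assumptions made for illustration; the slides do not say how the signals were actually combined.

```python
def aggregate_presence(camera_signals, threshold=0.5):
    """Average per-camera detection scores per minute, then threshold.

    camera_signals: one list of scores in [0, 1] per camera, all of
    equal length (one score per minute).
    """
    n_cameras = len(camera_signals)
    aggregated = [sum(scores) / n_cameras for scores in zip(*camera_signals)]
    return [1 if a >= threshold else 0 for a in aggregated]


def agreement(predicted, truth):
    """Fraction of minutes where the prediction equals the ground truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Made-up scores from three cameras over four minutes.
cam_a = [0.9, 0.1, 0.8, 0.2]
cam_b = [0.7, 0.3, 0.4, 0.0]
cam_c = [0.8, 0.0, 0.9, 0.6]
truth = [1, 0, 1, 0]  # hypothetical manual labels

predicted = aggregate_presence([cam_a, cam_b, cam_c])
print(predicted, agreement(predicted, truth))
```

In this toy data a single camera can be wrong in a given minute (camera b in minute 3, camera c in minute 4) while the aggregate still agrees with the labels, which is the point of combining views.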
Utilization
Average daily utilization is 10%. Highest values (20%) are on weekends, Saturdays mostly, except Chinese New Year. Lowest utilization is on the 11th of March (1%).
Recorded coverage
There is recording in 30%-90% of the hours and 10%-70% of total time. This highly correlates with daily utilization, but the weekly cycle is more obvious.
[Charts: daily utilization (y-axis 0%-25%) and a second panel (y-axis 0%-100%), both by day from 2015-02-15 to 2015-03-23; annotations: Saturdays, CNY, school holiday, data error]
Customer demography: Accuracy and Algorithm speed
Manual test of the algorithm
• Average age/gender accuracy of algorithms with 48x48 = 92%
• Our current algorithm at the desk with face size of 40x40 = 72%
• Accuracy will be improved up to 85%, using tilted face + body corpus
Algorithm speed
• Calibration: one-time
• Runtime: irrelevant
Opportunities
What application areas can benefit ?
• Internet : Baidu targeting advertising, Facebook sentiments from face photos
• Commercial : fraud detection, churn prediction, food detection, weapons detection
• Others : disability assistance, object recognition for the blind, speech recognition for the deaf, cancer tissue recognition
Specific Application Example
• Bank customers recognition
Recommend for HK : biggest market impact may be in health image processing and online education
3. Networks
How did Google beat previous search engines ?
Aside from searched content, also use url data patterns (links)*
An additional datatype can make a huge difference !
* Eric Schmidt “How Google Works”; also see http://www.economist.com/node/3171440
Genetic Basis of Diseases
Asthma : known to have multiple variant gene sequences
“ Simple Regression ” “ Multivariate Sparse Lasso Regression ”
Novel statistical method allows joint network analysis of correlated phenotypes
Eric Xing (2014)
Advantages
• Greater power to detect weak associations
• Fewer false positives
• Joint association to multiple correlated phenotypes
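For readers unfamiliar with the lasso, a minimal coordinate-descent sketch follows. It implements the plain single-trait lasso only, as an assumption-laden illustration of how the L1 penalty drives weak coefficients to exactly zero; it is not the multivariate network method credited to Eric Xing above, and the data are invented.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z
    return w


# Invented data: y depends on the first feature only.
X = [[1.0, 1.0], [2.0, -1.0], [3.0, 1.0], [4.0, -1.0]]
y = [2.0, 4.0, 6.0, 8.0]
w = lasso_cd(X, y, lam=0.1)
print(w)
```

The irrelevant second feature ends with a coefficient of exactly zero, which is the mechanism behind the "fewer false positives" advantage; the relevant coefficient is shrunk slightly below its true value of 2 as the price of the penalty.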
Asthma Trait Network
FB data only
Asian Telco data versus Facebook - 1
Analysing family relations with graphs
Asian Telco data versus Facebook - 2
Telco data only
+
Campaign Targeting using URL + Social
Data Types | Response rate
Normal / Control | 0.20%
With Social | 0.49%
Social + URL | 2.30%
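The size of the improvement is easier to see as lift over the control group. The short calculation below uses only the three response rates from the campaign table; the variable names are illustrative.

```python
# Response rates in percent, copied from the campaign targeting table.
response_rates = {
    "Normal / Control": 0.20,
    "With Social": 0.49,
    "Social + URL": 2.30,
}

baseline = response_rates["Normal / Control"]
lift = {name: rate / baseline for name, rate in response_rates.items()}

for name, factor in lift.items():
    print(f"{name}: {factor:.2f}x the control response rate")
```

Adding social data gives roughly a 2.5-fold response rate, and combining social and URL data gives roughly an 11.5-fold response rate relative to control.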
Romantic Partner Relationship Prediction
Data Types | Accuracy
SMS No. | 25%
SMS No. + CDR graphs | 75%
SMS No. + CDR & Location graphs | 85%
…
Combined Social Networking Graph
Results :
1. Improved demographic prediction : Age (45% -> 63%), Gender (45% -> 70%)
2. Inferring romantic partner from SMS/CDR
3. Inferred family relationships, colleagues & communities
Data sources: CDR, Facebook, Location, URL, Survey, Registration, Loyalty
Telco + FB data
Telco Data and Facebook combined !
• Wave 1 churners: red
• Wave 2 churners: pink
• Own customers: yellow
• Competitor customers: green
• Very active customers: blue
Finding: Wave 1 (red) Churners are contagious (followed by pinks) when local community members are less embedded in the network
Viral churn in service providers : prioritize key opinion leaders before they leave !
Capturing network properties can improve prediction
• Finding a friend of a friend in a social network requires one join operation in a relational database (RDBMS), so six degrees of separation require six joins. A graph DB can answer the same query with six simple traversals, which is fast and scales to millions
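Database | Depth (how many levels of friends of friends) | Execution Time (seconds) | Result Count
MySQL | 2 | 0.016 | ~2,500
MySQL | 3 | 30.267 | ~125,000
MySQL | 4 | 1,543.505 | ~600,000
MySQL | 5 | Not finished (days) | N/A
Neo4J (Graph db) | 2 | 0.01 | ~2,500
Neo4J (Graph db) | 3 | 0.168 | ~110,000
Neo4J (Graph db) | 4 | 1.359 | ~600,000
Neo4J (Graph db) | 5 | 2.132 | ~800,000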
• RDBMS join performance suffered beyond 2 levels because of the huge Cartesian product resulting from each join operation.
Real Life Benchmarks - A MySQL DB with 1M users and each user has 50 friends.
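The traversal approach can be illustrated with a breadth-first search over an adjacency list. This is a generic sketch of why traversal cost tracks the number of people actually reached; it is not how Neo4j is implemented, and the small friendship graph is invented.

```python
from collections import deque


def friends_within(graph, start, max_depth):
    """Return {person: hops} for everyone within max_depth hops of start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if depth[person] == max_depth:
            continue  # do not expand beyond the requested depth
        for friend in graph.get(person, ()):
            if friend not in depth:  # each person is visited only once
                depth[friend] = depth[person] + 1
                queue.append(friend)
    del depth[start]
    return depth


# Invented friendship graph as an adjacency list.
graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan"],
}

print(friends_within(graph, "ann", 2))
print(friends_within(graph, "ann", 3))
```

Because each person is expanded at most once, the work grows with the number of people and friendships reached, whereas chained self-joins generate every intermediate path before duplicates can be removed.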
How to learn network properties ?
Opportunities
What application areas can benefit ?
• marketing: recommendation, churn and loyalty
• health: family social disease inheritance, personalized medicine, health education and engagement
• education: socially assisted
Recommend for HK : digital marketing, education and health
Conclusion
Advancing technologies to derive insights from increasing types and amounts of data points to many new opportunities ahead
Questions ? Email [email protected]
Special Thanks to : Mr. William Mak