Turning Low Level Behavioural Signals Into User Profiles
Pablo Rosenman, VP Development
Jan 23, 2018
Adience
Leading the user-centric mobile revolution
- Harness Deep Learning to profile mobile app users
- Distill user/app interaction to actionable segmentation data
Adience Insights
Adience SDK
- Runs on tens of millions of devices
- Runs in the background, without interfering with the device’s
operation
- Collects raw data from the system and environment (according
to available permissions)
- Reduces dimensionality and anonymizes the data
- Sends results to the SDK Server
SDK Server
- Receives tens of millions of data submissions from the Mobile
SDK installations per day
- It should be able to scale by two orders of magnitude
- It should handle requests quickly, so as not to hang the client
(i.e. Mobile SDK)
- It should avoid losing data
SDK Server (Architecture)
- Data is sent from the mobile SDK to an Apache server running
on EC2
- SDK Server verifies validity of incoming data
- Incoming data gets written immediately (no processing) to S3
Diagram: Mobile Client -> Amazon EC2 -> Amazon S3
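The validate-then-store path in the bullets above could be sketched roughly as follows. This is an illustration only, not Adience's code: the required fields (`device_id`, `payload`), the key layout, and the `store` callable (standing in for an S3 put) are all assumptions.

```python
import json
import uuid
from datetime import datetime, timezone

REQUIRED_FIELDS = ("device_id", "payload")  # hypothetical report schema


def is_valid(raw_body: bytes) -> bool:
    """Cheap validity check: a JSON object that carries the required fields."""
    try:
        report = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    return isinstance(report, dict) and all(f in report for f in REQUIRED_FIELDS)


def handle_submission(raw_body: bytes, store, now=None) -> int:
    """Validate, then write the untouched body to storage. Returns an HTTP status."""
    if not is_valid(raw_body):
        return 400
    now = now or datetime.now(timezone.utc)
    # Hypothetical key layout: one object per submission, grouped by day.
    key = f"reports/{now:%Y/%m/%d}/{uuid.uuid4().hex}.json"
    store(key, raw_body)  # in a real deployment this would be an S3 put
    return 200
```

Keeping the handler to a validity check plus a single write is what lets the server answer quickly; all heavier processing happens later, off the request path.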
SDK Server (Scaling)
- The ELB balances the load on all the servers
- Auto Scaling will make sure there are enough servers to
handle the load
Diagram: Mobile Client -> Elastic Load Balancer -> Amazon EC2 (Auto Scaling) -> Amazon S3
Insights Workflow
- Create insights on the device’s owner when new data
arrives from the device
- Doesn’t have to be real-time (as the data arrives), but
shouldn’t be far behind
Insights Workflow (cont.)
- Data report sent by the SDK consists of:
  - Simple data points requiring simple statistical and arithmetic
    operations, for example:
    - Device model
    - OS version
  - More complex data matrices requiring matrix operations,
    for example:
    - Machine Learning features on time series data
    - Machine Learning features on photos
Insights Workflow (Architecture)
- Simple pattern for a streamlined processing server application:
  - Read input S3 filename from input SQS
  - Read the file from the input S3 bucket, and process it
  - Write results to file in output S3 bucket
  - Send output S3 filename to output SQS
Diagram: input Amazon SQS + input S3 Bucket -> EC2 Servers (Auto Scaling) -> output S3 Bucket + output Amazon SQS
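The four-step pattern above can be sketched as a small worker loop. To keep the sketch runnable without AWS, a `deque` stands in for each SQS queue and a `dict` for each S3 bucket; `process_report` is a hypothetical placeholder for the real processing step.

```python
from collections import deque


def process_report(data: bytes) -> bytes:
    """Placeholder for the real processing step (hypothetical)."""
    return data.upper()


def run_worker(in_queue, in_bucket, out_bucket, out_queue, process=process_report):
    """Drain the input queue, following the four steps of the pattern."""
    handled = 0
    while in_queue:
        filename = in_queue.popleft()          # 1. read input filename from the input queue
        result = process(in_bucket[filename])  # 2. read the file and process it
        out_bucket[filename] = result          # 3. write results to the output bucket
        out_queue.append(filename)             # 4. send the output filename onward
        handled += 1
    return handled
```

Because each stage only talks to queues and buckets, stages can be chained and scaled independently. Writing the result under a key derived from the input filename also means that a message delivered twice simply overwrites the same output.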
Insights Workflow (Architecture)
- Aggregate the data from all reports to a single device object
- Create insights from all the device’s aggregated data
- Advantages of architecture:
  - Scalability
  - Decoupling
Diagram components: Reports S3 Bucket, Amazon SQS queues, Devices Servers, Devices S3 Bucket, Deep Learning Servers (GPU), Insights Servers, Insights DynamoDB Table
Adience Events
Events SDK
- Receives events based on user interaction with the app
- Some events are sent automatically (e.g. the app was started)
- Custom events are the real driving force (e.g. the user has made an
in-app purchase for $3.99)
- Events should be sent to the Events Server
Events Server
- Receives hundreds of millions of data submissions from the
Mobile SDK installations per day
- It should be able to scale by two orders of magnitude
- It should handle requests quickly, so as not to hang the client
- Analytics engine should work on all data from the last 30 days
- Data should be enriched with the user insights
Events Server (Architecture)
- All incoming events are written to a file in the local volume
- Once every hour, we close the file in each instance and ship it
to S3
Diagram: Mobile Client -> Elastic Load Balancer -> Amazon EC2 (Auto Scaling) -> Amazon EBS -> logrotate -> Amazon S3
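The write-locally, ship-hourly idea can be sketched as below. Note the substitution: the slides use logrotate for rotation, whereas this sketch rotates inside the process so that it is self-contained. The file naming and the `ship` callable (standing in for an upload to S3) are assumptions.

```python
import json
import os
from datetime import datetime


class HourlyEventLog:
    """Append events to a local file; ship the file when the hour changes."""

    def __init__(self, directory, ship):
        self.directory = directory
        self.ship = ship  # callable(path), e.g. upload to S3 and then delete
        self.current_hour = None
        self.file = None

    def _path(self, hour):
        return os.path.join(self.directory, f"events-{hour}.log")  # hypothetical naming

    def append(self, event: dict, now: datetime):
        """Write one event as a JSON line, rotating first if a new hour began."""
        hour = now.strftime("%Y%m%d%H")
        if hour != self.current_hour:
            self.close()
            self.current_hour = hour
            self.file = open(self._path(hour), "a", encoding="utf-8")
        self.file.write(json.dumps(event) + "\n")

    def close(self):
        """Close the current file, if any, and hand it to the shipper."""
        if self.file is not None:
            self.file.close()
            self.ship(self._path(self.current_hour))
            self.file = None
            self.current_hour = None
```

Appending to a local file keeps the request path fast; the cost is that up to an hour of events lives only on the instance's volume until the file is shipped.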
Insights MapReduce
- At the end of each day, all events from that day are in the
events S3 bucket
- We add to these a “mock event” per report sent to the SDK
Server
- Eventually, we wish to compare all the app’s users in the last 30
days to a subset of those users
Insights MapReduce (cont.)
- Using Amazon EMR, we aggregate the data per app, device,
day, and event type
- Example: device 0123, on 2016-01-04, in app Blappy Fird,
purchased in-app goods worth a total of $100
Diagram: Events S3 Bucket + Mock Events S3 Bucket -> Raw2Daily (Amazon EMR) -> Daily S3 Bucket
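The per-day aggregation can be illustrated in plain Python, standing in for the EMR job: the "map" step derives a key of (app, device, day, event type) and the "reduce" step sums the values per key. The event field names and the ISO-8601 timestamp format are assumptions made for this sketch.

```python
from collections import defaultdict


def raw_to_daily(events):
    """Sum event values per (app, device, day, event type)."""
    totals = defaultdict(float)
    for e in events:
        day = e["timestamp"][:10]  # assumes ISO-8601, e.g. 2016-01-04T09:30:00
        key = (e["app"], e["device"], day, e["type"])
        totals[key] += e.get("value", 0.0)  # events without a value count as 0
    return dict(totals)


# Two purchases by the same device, in the same app, on the same day
# collapse into a single daily record.
events = [
    {"app": "Blappy Fird", "device": "0123", "timestamp": "2016-01-04T09:30:00",
     "type": "purchase", "value": 60.0},
    {"app": "Blappy Fird", "device": "0123", "timestamp": "2016-01-04T21:10:00",
     "type": "purchase", "value": 40.0},
]
daily = raw_to_daily(events)
```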
Insights MapReduce (cont.)
- Using the Daily data for the last 30 days, we run an additional
EMR to aggregate per app, device, and event type
- Example: device 0123, in app Blappy Fird, purchased in-app
goods worth a total of $1000 (in the last 30 days)
- We enrich the data by adding the device’s insights to each record
Diagram: Daily S3 Bucket + Insights DynamoDB Table -> Daily2Aggregate (Amazon EMR) -> Aggregate S3 Bucket
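A plain-Python sketch of the 30-day roll-up and the enrichment step follows. It assumes daily records keyed by (app, device, day, event type) with ISO dates, and an `insights` mapping from device to its insights; both shapes are illustrative rather than taken from the slides.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_to_aggregate(daily, insights, today, window_days=30):
    """Sum daily totals per (app, device, event type) over the window, then enrich."""
    first_day = today - timedelta(days=window_days - 1)
    totals = defaultdict(float)
    for (app, device, day, event_type), value in daily.items():
        if first_day <= date.fromisoformat(day) <= today:
            totals[(app, device, event_type)] += value
    return [
        {"app": app, "device": device, "type": event_type, "total": total,
         # Attach the device's insights to each record; empty if none are known.
         "insights": insights.get(device, {})}
        for (app, device, event_type), total in totals.items()
    ]
```

Running this over the already-aggregated Daily data, rather than over raw events, keeps the 30-day job's input small: at most one record per app, device, day, and event type.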
Insights MapReduce (cont.)
- Accessing DynamoDB per event type is costly
- We know the previous day’s users, so we save them to an in-memory cache
Diagram components: Daily S3 Bucket, Daily2Aggregate (Amazon EMR), Aggregate S3 Bucket, Insights DynamoDB Table, Insights Servers, Insights ElastiCache
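One way to picture the caching idea is a read-through cache that is pre-warmed with the previous day's users. In this sketch a `dict` stands in for ElastiCache and `fetch_from_table` for a DynamoDB read; the class and method names are hypothetical.

```python
class InsightsLookup:
    """Read-through cache in front of a slower insights table."""

    def __init__(self, cache, fetch_from_table):
        self.cache = cache                        # dict-like; stands in for ElastiCache
        self.fetch_from_table = fetch_from_table  # stands in for a DynamoDB read
        self.table_reads = 0                      # counts the expensive lookups

    def warm(self, devices_with_insights):
        """Preload the cache, e.g. with insights for the previous day's users."""
        self.cache.update(devices_with_insights)

    def get(self, device):
        """Return the device's insights, reading the table only on a cache miss."""
        if device not in self.cache:
            self.table_reads += 1
            self.cache[device] = self.fetch_from_table(device)
        return self.cache[device]
```

With the cache warmed, a device that appears under many event types costs no table reads at all, and an unknown device costs exactly one.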
Insights MapReduce (cont.)
- Using the Aggregate data for the last 30 days, we run an
additional EMR to aggregate per app, country, age, gender, and
subset type
- Example: app Blappy Fird, in the US, for males aged 25-34
who purchased in-app goods worth a total of more than
$500 (in the last 30 days), 70% are tech savvy, 40% are
commuters, etc.
Diagram: Aggregate S3 Bucket -> Aggregate2Subset (Amazon EMR) -> Subset S3 Bucket
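The subset statistics in the example above boil down to filtering the enriched records and computing, for each insight, the share of the subset that carries it. The sketch below assumes one record per device, with a `tags` list holding that device's insights; the field names are illustrative.

```python
from collections import Counter


def insight_shares(records, predicate):
    """Percentage of devices in the subset that carry each insight tag."""
    subset = [r for r in records if predicate(r)]
    if not subset:
        return {}
    # set() guards against a tag being listed twice on the same record.
    counts = Counter(tag for r in subset for tag in set(r["tags"]))
    return {tag: 100.0 * n / len(subset) for tag, n in counts.items()}


def us_males_25_34_over_500(r):
    """The subset from the example: US males aged 25-34 who spent more than $500."""
    return (r["country"] == "US" and r["gender"] == "m"
            and 25 <= r["age"] <= 34 and r["total"] > 500)
```

Calling `insight_shares(records, us_males_25_34_over_500)` returns a mapping from insight to percentage. Using `lambda r: True` as the predicate gives the same shares for all of the app's users, which is the baseline the subset is compared against.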
Insights MapReduce (cont.)
- How can we show data on apps that haven’t integrated our SDK?
- Create a mock event per app that we know is installed on the
device!
Diagram: Events S3 Bucket + Mock Events S3 Bucket -> Raw2Daily (Amazon EMR) -> Daily S3 Bucket -> Daily2Aggregate (Amazon EMR, enriched from Insights DynamoDB Table) -> Aggregate S3 Bucket -> Aggregate2Subset (Amazon EMR) -> Subset S3 Bucket
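Generating mock events could look roughly like this: each report that lists installed apps yields one synthetic event per app, so those apps flow through the same aggregation pipeline as apps that send real events. The report fields (`device`, `timestamp`, `installed_apps`) and the `"mock"` event type are assumptions for illustration.

```python
def mock_events_for_report(report):
    """One synthetic event per app that the report lists as installed."""
    return [
        {"app": app, "device": report["device"], "timestamp": report["timestamp"],
         "type": "mock", "value": 0.0}
        for app in report["installed_apps"]
    ]
```

Because a mock event has the same shape as a real one, the downstream jobs need no special case for apps without the Events SDK.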
Next Generation
SDK Server (Next Generation)
Current: Mobile Client -> Elastic Load Balancer -> Amazon EC2 (Auto Scaling) -> Amazon S3
Next generation: Mobile Client -> Amazon API Gateway -> AWS Lambda -> Amazon S3
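In the API Gateway plus Lambda variant, the per-request logic could shrink to a single handler. The sketch below assumes the Lambda proxy integration event shape (`body`, `isBase64Encoded`) and a response dict with `statusCode` and `body`. The `store` callable stands in for an S3 put, and the key layout is hypothetical.

```python
import base64
import json
import uuid


def make_handler(store):
    """Build a Lambda-style handler; `store(key, data)` stands in for an S3 put."""

    def handler(event, context):
        try:
            body = event.get("body") or ""
            if event.get("isBase64Encoded"):
                raw = base64.b64decode(body)
            else:
                raw = body.encode("utf-8")
            json.loads(raw)  # validity check only; the body is stored untouched
        except ValueError:  # malformed base64, bad encoding, or malformed JSON
            return {"statusCode": 400, "body": "invalid report"}
        store(f"reports/{uuid.uuid4().hex}.json", raw)  # hypothetical key layout
        return {"statusCode": 200, "body": "ok"}

    return handler
```

Injecting `store` keeps the handler testable without AWS; in a deployment it would wrap the S3 client call.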
Insights Workflow (Next Generation)
Current components: Reports S3 Bucket, Amazon SQS queues, Devices Servers, Devices S3 Bucket, Deep Learning Servers (GPU), Insights Servers, Insights DynamoDB Table
Next generation components: Reports S3 Bucket, Devices Lambda, Devices S3 Bucket, Amazon SQS, Deep Learning Servers (GPU), Staging S3 Bucket, Insights Lambda, Insights DynamoDB Table
Events Server (Next Generation)
Current: Mobile Client -> Elastic Load Balancer -> Amazon EC2 (Auto Scaling) -> Amazon EBS -> logrotate -> Amazon S3
Next generation: Mobile Client -> Amazon API Gateway -> AWS Lambda -> Amazon Kinesis Firehose -> Amazon S3
Bonus: ELK with Amazon
ELK with Amazon
- Server code sends logs to local ZMQ process
- ZMQ process then asynchronously sends to Kinesis
- Logstash pulls from the Kinesis stream and writes in batches to
ElasticSearch
Diagram: Server Code -> ZMQ -> Amazon Kinesis -> Logstash -> Amazon ElasticSearch
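The point of the local hop is that server code hands off a log record and returns immediately, while a separate consumer forwards records in batches. The sketch below illustrates that hand-off with an in-process `queue.Queue` and a background thread. This is a stand-in, not the ZMQ process, Kinesis stream, or Logstash setup from the slides, and `send_batch` is a hypothetical sink.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the shipper thread to finish


def start_shipper(log_queue, send_batch, batch_size=100):
    """Drain log_queue on a background thread, forwarding records in batches."""

    def run():
        batch = []
        while True:
            record = log_queue.get()
            if record is _STOP:
                break
            batch.append(record)
            if len(batch) >= batch_size:
                send_batch(batch)
                batch = []  # start a fresh list; the sent one is not reused
        if batch:
            send_batch(batch)  # flush the remainder on shutdown

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def stop_shipper(log_queue, thread):
    """Signal the shipper to stop and wait until it has flushed."""
    log_queue.put(_STOP)
    thread.join()
```

Server code only ever calls `log_queue.put(record)`, so a slow or unavailable log sink does not hold up request handling.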
We’re Hiring!
- Server Developer
- Full Stack Web Developer
- Algorithm Developer
- DevOps Engineer
THANK YOU!
[email protected]