© 2014 NIMBLE STORAGE | CONFIDENTIAL: DO NOT DISTRIBUTE
Avito
Nikolay Golov
Data Warehouse Architect
[email protected]
Evolving of Data Warehouse in Avito
1) Company IFRS accounts; 2014 and 2013 revenue converted at RUB/$ exchange rates of 39.6 and 31.9 respectively (CBR averages for 2014 and 2013)
2) In terms of page views, calculated based on LiveInternet data for Russian top-50 classifieds
3) Number of used cars sold in Russia in 2014 according to Autostat is 6.01m. Users indicated 2.4m cars as “sold on Avito”. 2.4/6.01=40%
Source: Internal data; LiveInternet is used for relative size calculation
Clear #1 in Russia
One Country, Many Cities, Five Verticals
Cities: Moscow, St. Petersburg, Novosibirsk, N. Novgorod, Kazan, Samara, Rostov, Volgograd, Voronezh, Ufa, Chelyabinsk, Omsk, Krasnoyarsk, Vladivostok, Yakutsk, Irkutsk, Khabarovsk
• 1.6x(2) Size of Russian classifieds market
• Almost 50% of Page Views from Mobile
• 40%(3) of all used cars sold in Russia in 2014 sold on Avito
• 250K – 300K listers per day, of which ca. 10% list for the first time
• RUB2,177m 2014 EBITDA(1) (+221% to 2013)
• 7.9bn Page Views in Dec-2014 across all devices, generated by 27mm people
• Ca. 9,500+ Local SMEs on subscription model
• 24.5mm Active Items on the site as of 2014 YE
• RUB4,305m 2014 Revenue(1) (+79% to 2013)
Size Relative to #2 Classified Site in Key Verticals by Page Views
• Auto: 2.0x
• Real Estate: 15.8x
• Jobs: 1.5x
• Services: 75.8x
• General: 75.1x
• All Categories: 4.8x
• By far the largest classifieds in Russia and one of the biggest general classifieds worldwide
• Larger than Yandex in search queries for many goods and services
• Top-of-mind in many categories
• Ca. 600 employees, of which ca. 150 moderators
• Profitable since Q2 2013
• Selective introduction of professional listing fees in Q1 2015
• Domofond successfully commercially launched in Q1 2015
Avito in the world
2014-2015
1. Craigslist.com (USA)
2. 58.com (China)
3. Avito.ru (Russia)
2015-2016
1. Craigslist.com (USA)
2. Avito.ru (Russia)
3. 58.com (China)
1 bln. $, 2.7 bln. $
Avito Business Development
[Chart: Weekly Page Views (m), Jan-09 to Jan-15; y-axis 0 to 2,400. Traffic growth]
Source: Google Analytics, LiveInternet, Internal data
• Q1 2010: Focus on Moscow and St. Pete
• September 2010: Target 13 additional cities
• August 2011: Target total of 28 cities
• January 2012: Avito has national coverage
• Q2 2013: Merger with Slando and Olx reaffirmed #1 position in the Russian market
• Q2 2014: Launch of Domofond, a dedicated real estate classified
• Q4 2014: Launch of a new revenue stream: Listing Fees
Chart labels: Goods C2C, +RE & Cars, +B2C, +Jobs, +Services; +Pro tools, +Listing Fees, +Vertical
Path from Investment Stage to Cash Flow Generation
Stage 1
• Position: Competing with others
• Economics: Heavy investment
• Focus: Build user base
Stage 2
• Position: Ahead of competition
• Economics: Approximately break-even
• Focus: Develop business model and build leading brand
Stage 3
• Position: x times ahead of competition
• Economics: High EBITDA margin
• Focus: Monetization enhancement; attract professional classifieds market spend
At the beginning you do not know:
… your future traffic
… all your future data sources
… your future data monetization tools
At the beginning you hope that:
… your traffic will grow
… you will connect more and more data sources
… you will launch more and more data monetization initiatives
Alternatives…
Data Lake
• Data in a natural state
• Hadoop based
• Schemaless
Our choice
• Full SQL support
• Single data model
• Normalization of all incoming data
SCHEMA ON WRITE
Data Lake: SCHEMA ON READ
[Next slide: same diagram, with SCHEMA ON READ shown three times]
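The contrast between schema on write and schema on read can be illustrated with a minimal Python sketch. It is an illustration only, not Avito's implementation: the field names in `CLICK_SCHEMA`, the `validate` helper, and both store classes are hypothetical. The schema-on-write store validates each record once at load time, while the schema-on-read store accepts anything and leaves validation to every reader.

```python
import json

# Hypothetical schema for a click event; field names are illustrative only.
CLICK_SCHEMA = {"user_id": int, "advert_id": int, "ts": str}


def validate(record, schema):
    """Return the record restricted to schema fields, or raise ValueError."""
    out = {}
    for name, expected_type in schema.items():
        if name not in record:
            raise ValueError(f"missing field: {name}")
        if not isinstance(record[name], expected_type):
            raise ValueError(f"bad type for field: {name}")
        out[name] = record[name]
    return out


class SchemaOnWriteStore:
    """Validates once, at load time; every reader sees the same typed rows."""

    def __init__(self, schema):
        self.schema = schema
        self.rows = []

    def write(self, raw):
        self.rows.append(validate(json.loads(raw), self.schema))

    def read(self):
        return list(self.rows)


class SchemaOnReadStore:
    """Stores raw text as-is; each reader brings (and applies) its own schema."""

    def __init__(self):
        self.blobs = []

    def write(self, raw):
        self.blobs.append(raw)

    def read(self, schema):
        rows = []
        for raw in self.blobs:
            try:
                rows.append(validate(json.loads(raw), schema))
            except ValueError:
                continue  # this reader chooses to skip records it cannot parse
        return rows


good = '{"user_id": 1, "advert_id": 10, "ts": "2015-01-01T00:00:00"}'
bad = '{"user_id": "abc", "advert_id": 10}'

lake = SchemaOnReadStore()
lake.write(good)
lake.write(bad)  # accepted silently; the problem surfaces only at read time

dwh = SchemaOnWriteStore(CLICK_SCHEMA)
dwh.write(good)
try:
    dwh.write(bad)
    rejected = False
except ValueError:
    rejected = True  # the malformed record never enters the warehouse
```

In the schema-on-read case, the parsing and error-handling logic has to be repeated (and kept consistent) by every consumer, which is one reading of the repeated "SCHEMA ON READ" labels on the slide.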
Avito Data Warehouse is:
• Powered by HP Vertica
• Based on Anchor Modeling methodology
• Storing changes of attributes of almost all entities
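Storing attribute changes can be sketched as follows. This is a simplified illustration under stated assumptions: SQLite (from the Python standard library) stands in for HP Vertica, and the table names `user_anchor` and `user_email`, the `changed_at` column, and the `email_as_of` helper are all hypothetical, not Avito's actual schema. Each attribute lives in its own narrow table, and every change is appended as a new row, so the value at any point in time can be recovered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Anchor: one row per entity, holding only its surrogate key.
    CREATE TABLE user_anchor (user_id INTEGER PRIMARY KEY);

    -- Attribute: one narrow table per attribute; changes are appended,
    -- existing rows are never updated.
    CREATE TABLE user_email (
        user_id    INTEGER NOT NULL REFERENCES user_anchor(user_id),
        email      TEXT    NOT NULL,
        changed_at TEXT    NOT NULL,  -- ISO date, so string order = time order
        PRIMARY KEY (user_id, changed_at)
    );
    """
)
conn.execute("INSERT INTO user_anchor VALUES (1)")
conn.executemany(
    "INSERT INTO user_email VALUES (?, ?, ?)",
    [
        (1, "old@example.com", "2013-05-01"),
        (1, "new@example.com", "2014-09-15"),
    ],
)


def email_as_of(user_id, as_of):
    """Return the email that was valid on the given ISO date, or None."""
    row = conn.execute(
        """
        SELECT email FROM user_email
        WHERE user_id = ? AND changed_at <= ?
        ORDER BY changed_at DESC
        LIMIT 1
        """,
        (user_id, as_of),
    ).fetchone()
    return row[0] if row else None
```

For example, `email_as_of(1, "2014-01-01")` looks up the value that was current at the start of 2014, while a date before the first recorded change yields `None`.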
Benefits of Anchor Modeling
• Simple rules for generating tables according to the business model
• Simple rules for generating ETL processes for those tables
• Simple rules for tuning data distribution and query performance
• Clear to analysts: looks like an ER model
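The idea of "simple rules for generating tables" can be sketched as a small generator that turns a description of the business model into DDL. Everything here is an assumption made for illustration: the `Entity` and `Tie` classes, the naming scheme (`<entity>_anchor`, `<entity>_<attribute>`, `<left>_<right>_tie`), and the generic SQL types are hypothetical and do not reproduce Avito's generator or any Vertica-specific clauses.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> SQL type


@dataclass
class Tie:
    left: str   # name of the first related entity
    right: str  # name of the second related entity


def generate_ddl(entities, ties):
    """Apply one fixed rule per model element and return CREATE TABLE statements."""
    statements = []
    for entity in entities:
        key = f"{entity.name}_id"
        # Rule 1: every entity gets an anchor table with only its key.
        statements.append(
            f"CREATE TABLE {entity.name}_anchor ({key} INTEGER PRIMARY KEY);"
        )
        # Rule 2: every attribute gets its own historized table.
        for attr, sql_type in entity.attributes.items():
            statements.append(
                f"CREATE TABLE {entity.name}_{attr} ("
                f"{key} INTEGER NOT NULL, "
                f"{attr} {sql_type} NOT NULL, "
                f"changed_at TIMESTAMP NOT NULL, "
                f"PRIMARY KEY ({key}, changed_at));"
            )
    # Rule 3: every relationship gets a tie table of two keys.
    for tie in ties:
        statements.append(
            f"CREATE TABLE {tie.left}_{tie.right}_tie ("
            f"{tie.left}_id INTEGER NOT NULL, "
            f"{tie.right}_id INTEGER NOT NULL, "
            f"PRIMARY KEY ({tie.left}_id, {tie.right}_id));"
        )
    return statements


model = [
    Entity("user", {"email": "VARCHAR(255)"}),
    Entity("advert", {"title": "VARCHAR(255)", "price": "NUMERIC(12,2)"}),
]
ddl = generate_ddl(model, [Tie("user", "advert")])
for statement in ddl:
    print(statement)
```

Because every table follows from one of three mechanical rules, the same model description could in principle also drive ETL generation, which is the second benefit listed above.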
Back office
Click stream
BI Team
Antifraud
MDM
CRM
Avito DWH - beginning
Users
Adverts
Payment
Back office
Clicks
Searches
Device
Cookie
Avito DWH - evolving
Users
Adverts
Payment
Clicks
Searches
Device
Cookie
Illicit content detection
Fraud estimation
How the data warehouse evolves
BI Team
New task
DWH
New question
new reports
new data
new functionality
Avito DWH - expanding
Users
Adverts
Payment
Clicks
Searches
Device
Cookie
Phone
Bot Net
Geo point
A/B test marker
Is Human
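One way to read the expansion above: under a one-table-per-attribute layout, as is typical of Anchor Modeling, new entities and attributes only add tables and leave existing ones untouched. The sketch below illustrates that property. The `tables_for` helper, the naming scheme, and the choice to attach `is_human` to the user entity are assumptions made for this example; the slides do not say how these items were modeled.

```python
def tables_for(model):
    """Map a model {entity: [attributes]} to the set of table names it needs."""
    tables = set()
    for entity, attributes in model.items():
        tables.add(f"{entity}_anchor")
        tables.update(f"{entity}_{attr}" for attr in attributes)
    return tables


# Hypothetical model before the expansion.
before = {"user": ["email"], "advert": ["title"], "cookie": []}

# Hypothetical expansion: a new Phone entity and a new attribute on an
# existing entity. Nothing is renamed or dropped.
after = {**before, "phone": ["number"], "user": ["email", "is_human"]}

added = tables_for(after) - tables_for(before)
removed = tables_for(before) - tables_for(after)

print("new tables:", sorted(added))
print("tables changed or dropped:", sorted(removed))
```

Here `removed` is empty: the expansion is purely additive, so existing reports and ETL jobs that read the old tables would not need to change.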
Crowdfinding
• Data samples for competition
• Algorithms
• Data scientists for hiring
Increasing User Engagement
• Personalized recommendations
• Targeted offers
Self-Serve BI
• Enterprise reporting to let business dig deep into data and promote a data-centric culture
Product Pricing
• Optimizing VAS prices for various products in various locations using advanced statistical modelling methods
Forecasting Avito Business Trends
• Reliable forecasting model based on extensive internal database and applied statistical methods
[Chart: Operating Metric Forecast, Jan-14 to Nov-15]
Increasing Content Quality
Advanced Artificial Intelligence System that performs 80% automatically
• Fraud detection
• Illicit content detection
• Item duplicate detection
• Pricing models
• Bot detection
Advertising Optimization
• Network mediation: finding the optimal composition of advertising networks to maximize revenue
• Real-time detection of user interests to provide unique targeting capabilities for advertisers
Avito DWH evolution
Metric                                2013   2014   2015
Cluster(s) size (servers)                4     10     15
Cluster(s) size (TB)                    11     26     51
ClickStream size (mln events/day)      300    560    740
Integrated systems count                 3     14     23
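The growth implied by the figures above can be checked with a few lines of arithmetic. The numbers are taken from the slide; the `metrics` dictionary layout and the `growth` helper are just one convenient way to hold and compare them.

```python
# Values as shown on the slide, keyed by year.
metrics = {
    "Cluster size (servers)": {2013: 4, 2014: 10, 2015: 15},
    "Cluster size (TB)": {2013: 11, 2014: 26, 2015: 51},
    "ClickStream size (mln events/day)": {2013: 300, 2014: 560, 2015: 740},
    "Integrated systems count": {2013: 3, 2014: 14, 2015: 23},
}


def growth(series, start, end):
    """Return the multiplicative growth factor between two years."""
    return series[end] / series[start]


for name, series in metrics.items():
    factor = growth(series, 2013, 2015)
    print(f"{name}: x{factor:.2f} from 2013 to 2015")
```

On these figures, the number of integrated systems grew fastest between 2013 and 2015, while click stream volume grew slowest.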
#SeizeTheData