8/10/2019 QCon SF 2014 Sid Anand Software Development and Architecture at LinkedIn
Software Development & Arch @ LinkedIn
Sid Anand
QCon SF 2014
@r39132
About Me
Current Life
Chief Architect @ ClipMine, a video discovery company
QCon SF Program Committee member
Dad to a very energetic 2 year old boy
Previous Life
Architect in Search and Distributed Data @LinkedIn
Cloud Data Architect @ Netflix
VP Engineering at Etsy
Software Developer at eBay
A Closer Look @ LinkedIn
Then
Created in 2002 in Reid Hoffman's living room
In its first month of operation, LinkedIn added 4,500 members!
Now
332M members in 200 countries
2 members sign up every second
>60% of members are overseas
In Q3 2014, 75% of new members came from overseas
The fastest-growing demographic is not geographic: it's students!
>10% of the user base already, and growing!
Member growth started to ramp up during 2011, when we IPO'd
2010 : 55M
2011 : 90M (IPO)
2012 : 145M
Q3 2014 : 332M
(note : numbers reflect start of year)
We added roughly the same number of users in 2010 as over the previous 6 years!
Employee growth also started to ramp up during 2011
2010 : 500
2011 : 1K (IPO)
2012 : 2100
Q3 2014 : 6K (25% in Engineering)
(note : numbers reflect start of year)
Alan Shepard
2nd man in space
5th person to walk on the moon!
1st person to hit a golf ball on the moon!
How did LinkedIn scale for company and member growth?
Software Development
Challenges
Software Development : Challenges
Circa 2011
On my first day at LinkedIn, I felt pretty excited!
Diagram: Linux Desktop (8 Core, 64GB RAM) and Mac Air
Then I tried to compile the code on my laptop!
300+ code projects in a single SVN repo
SVN checkout (world) & go-to-lunch
Needed a server-grade machine to compile it!
Ant build (world) & go-make-espresso
Almost every WAR was built from source, not from intermediate JARs
To test your code locally, you needed to locally deploy every service that your code depended on! (maybe 20)
So, yes, you needed a machine that typically lives in your data center!
Assume that your code is now:
Written
Compiled
Locally Tested
What next?
500+ developers were checking code into the master branch of the single repo!
So, someone broke master every day!
As a result:
3 hours to write, build, and locally test code
3 days to commit it!
Now (Solved)
Do what the open-source world does, with some improvements!
Break the monolithic repo into many individual Git repos!
Have WARs depend on intermediate JARs; don't build the world!
Do not deploy the world for local testing; just connect your dev machine to a test environment!
What are the improvements?
Software Development
Life Cycle
Software Development : Code Reviews
1. Alice commits code to Git
2. Alice sends a Review Board request to Bob & Cathy, owners of the files!
3. Both Bob & Cathy give ship-its
4. Alice amends her commit message with:
   RB=
   BUILD-WAR=
Software Development : Code Push (Git Push)
1. Alice pushes code to our Gitorious server, which runs the following verifications:
   1. Pre-push sanity checks! These must pass or the push is rejected!
      1. Have all owners of the changed files given ship-its?
      2. Does the code build?
   2. For JAR builds, also build upstream WARs (the WARs that depend on the changed JAR)!
   3. Run integration tests!
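The pre-push gate can be sketched in Python. This is a minimal, hypothetical model: the `OWNERS` prefix map, the `DEPENDENTS` graph, and all artifact names are made up, the ship-it list would really come from Review Board, and "upstream WARs" is read as the WARs that transitively depend on the changed JAR. The build and integration-test steps are not modeled.

```python
from collections import deque

# Hypothetical ownership map: file path prefix -> owners of files under it.
OWNERS = {
    "profile-api/": {"bob"},
    "profile-impl/": {"cathy"},
}

# Hypothetical reverse dependency graph: artifact -> artifacts that depend on it.
DEPENDENTS = {
    "profile-api.jar": ["profile-impl.jar", "search-frontend.war"],
    "profile-impl.jar": ["profile-frontend.war"],
}


def owners_of(path):
    """Every owner whose path prefix matches the changed file."""
    return {o for prefix, owners in OWNERS.items() if path.startswith(prefix) for o in owners}


def missing_ship_its(changed_files, ship_its):
    """Owners of changed files who have not yet given a ship-it."""
    required = set()
    for path in changed_files:
        required |= owners_of(path)
    return required - set(ship_its)


def has_review_line(commit_message):
    """True if the commit message carries an RB= line (value format not checked)."""
    return any(line.startswith("RB=") for line in commit_message.splitlines())


def wars_to_rebuild(changed_jar):
    """Breadth-first walk of the reverse dependency graph, collecting WARs."""
    seen, queue, wars = {changed_jar}, deque([changed_jar]), set()
    while queue:
        artifact = queue.popleft()
        for dependent in DEPENDENTS.get(artifact, []):
            if dependent in seen:
                continue
            seen.add(dependent)
            if dependent.endswith(".war"):
                wars.add(dependent)
            queue.append(dependent)
    return wars


def pre_push_check(changed_files, ship_its, commit_message):
    """Reasons to reject the push; an empty list means the push is accepted."""
    reasons = []
    missing = missing_ship_its(changed_files, ship_its)
    if missing:
        reasons.append("missing ship-its from: " + ", ".join(sorted(missing)))
    if not has_review_line(commit_message):
        reasons.append("commit message has no RB= line")
    return reasons


if __name__ == "__main__":
    files = ["profile-api/Member.java", "profile-impl/MemberDao.java"]
    msg = "Add skills field\n\nRB=12345"  # hypothetical review id
    print(pre_push_check(files, ["bob"], msg))       # ['missing ship-its from: cathy']
    print(sorted(wars_to_rebuild("profile-api.jar")))  # ['profile-frontend.war', 'search-frontend.war']
```

Walking the reverse graph transitively matters: `profile-frontend.war` depends on the changed JAR only through `profile-impl.jar`, yet it still has to be rebuilt.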
Software Development : QA Test / Staging
1. Assuming that all checks passed, the WAR is now available
2. Our system automatically deploys all WARs to test servers
3. QA verifies the new builds
Software Development : Production - Canary
1. Service owner Dave canaries the new WAR
2. Our EKG system then compares the canary machine to one control machine over 1 hour of production traffic, looking for:
   1. CPU, memory increase
   2. Fan-in/fan-out increase
   3. Error rate increase
   4. Latency increase
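The EKG comparison can be illustrated with a small canary-versus-control check. The metric fields, the per-metric tolerances, and the `compare` function are assumptions for illustration; the talk does not describe EKG's actual statistics or thresholds.

```python
from dataclasses import dataclass


@dataclass
class HostMetrics:
    """Averages collected over the comparison window (e.g. 1 hour of traffic)."""
    cpu_pct: float
    memory_mb: float
    fan_in: float          # calls received per second
    fan_out: float         # calls made to downstream services per second
    error_rate: float      # errors / requests
    p99_latency_ms: float


# Hypothetical tolerances: flag a metric when the canary exceeds the
# control by more than this relative amount.
TOLERANCE = {
    "cpu_pct": 0.10,
    "memory_mb": 0.10,
    "fan_in": 0.20,
    "fan_out": 0.20,
    "error_rate": 0.05,
    "p99_latency_ms": 0.10,
}


def compare(canary: HostMetrics, control: HostMetrics) -> dict:
    """Return {metric: relative increase} for every metric above its tolerance."""
    regressions = {}
    for name, tolerance in TOLERANCE.items():
        base = getattr(control, name)
        value = getattr(canary, name)
        if base == 0:
            # Any increase over a zero baseline counts as unbounded.
            increase = float("inf") if value > 0 else 0.0
        else:
            increase = (value - base) / base
        if increase > tolerance:
            regressions[name] = increase
    return regressions
```

A real comparison would also need to account for noise (for example, by comparing distributions rather than single averages); the ratio test only shows the shape of the report Dave reviews before promotion.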
Software Development : Production - Promotion
1. Service owner Dave reviews the EKG report
2. If it looks acceptable, he promotes the build to the rest of the cluster in all data centers
How did LinkedIn scale for company and member growth?
Architectural
Practices
LinkedIn Architecture
Diagram: Web Servers → Oracle
Prototypical Use Case
A member updates her profile with new skills, job title, and education
She also accepts a connection request from another member
Behind the scenes
Web servers commit data to Oracle
What Happens Next?
Profile Updates
She should become instantly searchable by her new skills, job title, & education!
New groups and job ads should be recommended to her
Connection Updates
The news feed should instantly reflect content updates from her new connection!
Also, based on the new connection, the PYMK (People You May Know) widget should discover a new 2nd-degree neighborhood!
LinkedIn Architecture
Diagram: Web Servers (writers) → Oracle → Databus → downstream streams to Search, Caches, Graph, Recommender Systems (PYMK, Jobs), and DW
LinkedIn : Architecture
We also have a data pipeline to capture high-throughput events that we need to count!
Databases are not a good place to do high-throughput (high-TP) atomic counting!
Kafka is!
This is typically used for ranking signals
E.g., counting member page views to determine who is hot
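A minimal sketch of the consumer side of such a counting pipeline, assuming page-view events with hypothetical fields (`viewed_member_id`, `viewer_id`, `timestamp_s`) and a hypothetical 1-hour tumbling window. An in-memory iterable stands in for the events read from a Kafka topic, so no Kafka client API is assumed.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple


class PageViewEvent(NamedTuple):
    viewed_member_id: int
    viewer_id: int
    timestamp_s: int  # epoch seconds


WINDOW_S = 3600  # hypothetical 1-hour tumbling window


def count_views(events: Iterable[PageViewEvent]) -> dict:
    """Aggregate view counts per (window start, member) as events stream in."""
    counts = defaultdict(Counter)
    for e in events:
        window = e.timestamp_s - e.timestamp_s % WINDOW_S
        counts[window][e.viewed_member_id] += 1
    return counts


def hot_members(counts: dict, window: int, k: int = 3) -> list:
    """Top-k most viewed members in one window: a simple ranking signal."""
    return [member for member, _ in counts.get(window, Counter()).most_common(k)]


if __name__ == "__main__":
    stream = [
        PageViewEvent(1, 10, 0),
        PageViewEvent(1, 11, 5),
        PageViewEvent(2, 10, 7),
        PageViewEvent(3, 10, 3800),
    ]
    counts = count_views(stream)
    print(hot_members(counts, 0))     # [1, 2]
    print(hot_members(counts, 3600))  # [3]
```

Each event is an append-only record, so the writer never contends on a shared counter row; the counting work moves to consumers that can be scaled out.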
LinkedIn Architecture
Diagram: Web Servers (writers), Oracle, Kafka, and Databus, with downstream streams feeding Search Systems, Caches, Graph Systems, Recommender Systems, and DW
LinkedIn Architecture : Single Data Center!
LinkedIn : Architecture : Multi-data Center Project
LinkedIn Architecture : Rule 1
Partition your user base across the data centers!
e.g., using Akamai GTM (Global Traffic Management)
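One way to picture a stable user-to-data-center assignment is a hash of the member id. This is only an illustration: Akamai GTM routes traffic at the DNS level using its own policies, and the DC names and hashing scheme below are assumptions, not LinkedIn's actual mapping.

```python
import hashlib

DATA_CENTERS = ["DC1", "DC2"]  # hypothetical data center names


def home_data_center(member_id: int, dcs=DATA_CENTERS) -> str:
    """Deterministically map a member to one data center.

    Uses SHA-256 rather than Python's built-in hash(), which is salted
    per process for strings, so every server computes the same mapping.
    """
    digest = hashlib.sha256(str(member_id).encode("utf-8")).digest()
    return dcs[int.from_bytes(digest[:8], "big") % len(dcs)]
```

With plain modulo, adding a data center remaps most members; consistent hashing or an explicit lookup table would limit that churn.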
Problem!
User 1 (mapped to DC1) updates his profile! How will User 2 (mapped to DC2) see it?
LinkedIn Architecture : Rule 2
Link your data centers together at the data fabric level!
Not a new concept! Cassandra has been doing it for a few years now in the OLTP database space!
LinkedIn's Sources of Truth: Oracle and Kafka
We have to make both work across multiple data centers!
Oracle is fairly easy: we use Oracle GoldenGate!
Kafka is also pretty easy!
LinkedIn : Kafka Multi-Data Center
Diagram: Kafka Data Center 1 and Kafka Data Center 2, each with a Producer writing to Kafka Local, a Consumer of Local Events, a Kafka Global cluster, and a Consumer of Global Events
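The slides do not say how events move from Kafka Local into Kafka Global. Kafka ships a cross-cluster mirroring tool (MirrorMaker) for this kind of copying, so the sketch below assumes a mirroring step that copies every data center's local events into every data center's global cluster. Plain Python lists stand in for the clusters; `DataCenter` and `mirror` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class DataCenter:
    name: str
    local: list = field(default_factory=list)    # stands in for Kafka Local
    global_: list = field(default_factory=list)  # stands in for Kafka Global
    # Source DC name -> number of that DC's local events already copied here.
    mirrored_upto: dict = field(default_factory=dict)

    def produce(self, event):
        """Producers only ever write to their own data center's local cluster."""
        self.local.append((self.name, event))


def mirror(dcs):
    """Copy new local events from every DC into every DC's global cluster.

    A per-source offset ensures each event is copied once per target,
    so running mirror() repeatedly never duplicates events.
    """
    for target in dcs:
        for source in dcs:
            start = target.mirrored_upto.get(source.name, 0)
            target.global_.extend(source.local[start:])
            target.mirrored_upto[source.name] = len(source.local)


if __name__ == "__main__":
    dc1, dc2 = DataCenter("DC1"), DataCenter("DC2")
    dc1.produce("user1 updated profile")
    dc2.produce("user2 viewed a job")
    mirror([dc1, dc2])
    print(dc2.global_)  # [('DC1', 'user1 updated profile'), ('DC2', 'user2 viewed a job')]
```

In this model the Rule 1 problem resolves cleanly: User 1's update is produced in DC1, mirrored into DC2's global cluster, and read there by a consumer of global events serving User 2, with no cross-DC web service call.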
LinkedIn Architecture : Rule 3
Don't make any web service calls between data centers!
It kills latency, which kills availability!
How did LinkedIn scale for company and member growth?
LinkedIn Search
Why is Search important to LinkedIn?
Search is a significant income driver!
332M members that recruiters pay to find! (Recruiter Search)
2M+ jobs that companies pay to list so you can find them! (Job Search)
What Makes LinkedIn Search Unique?
LinkedIn Search : Federated
We index many entities: members, jobs, companies, groups, universities, articles, slides, etc.
These are separate (vertical) search engines!
When a user enters "sr software engineer", which index should we look in?
Jobs, members, groups?
Can we simply send the request to all of the search engines and then show the most relevant results?
No!
Ranks (scores) are not comparable across verticals
What if we pick a vertical based on a user feature?
Job seeker sees jobs, recruiter sees members
Intent Detection : done by the Federator
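Picking a vertical from a user feature can be sketched as a lookup with a fallback. The role names, the optional query cues, and `detect_vertical` are hypothetical; the slides only say that the Federator performs intent detection and give the job-seeker/recruiter example.

```python
from typing import Optional

# Hypothetical mapping from a user feature (role) to the vertical to search.
ROLE_TO_VERTICAL = {
    "job_seeker": "jobs",
    "recruiter": "members",
}

# Hypothetical query cues that override the role-based default.
QUERY_CUES = {
    "jobs": ("hiring", "job", "jobs", "openings"),
    "groups": ("group", "groups", "community"),
}


def detect_vertical(query: str, role: Optional[str], default: str = "members") -> str:
    """Choose one vertical for a query: explicit cues first, then the user's role."""
    tokens = query.lower().split()
    for vertical, cues in QUERY_CUES.items():
        if any(cue in tokens for cue in cues):
            return vertical
    return ROLE_TO_VERTICAL.get(role, default)
```

Routing to a single vertical sidesteps the comparability problem: the results shown all come from one engine, so their scores share a scale.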
LinkedIn Search : Query Rewriting
Say a recruiter searches for "sr software eng"
There are 20+ ways to represent this title, e.g.:
senior swe
sr swe
senior software engineer
To solve this, we developed a title standardizer, though not every title may have a canonical form!
If a standardized title exists, we can rewrite the user query:
(title:sr AND title:software AND title:eng) OR std_title:sswe234
Query rewriting helps by expanding the search space with methods such as synonym expansion, spell correction, etc. So we need it!
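A minimal sketch of the title rewrite, assuming a hypothetical abbreviation table and a lookup from normalized titles to canonical ids (only `sswe234` comes from the slide). The output uses Lucene-style query syntax; a production standardizer would likely be model-driven rather than a hand-written dictionary.

```python
# Hypothetical standardizer table: normalized title -> canonical title id.
STANDARD_TITLES = {
    "senior software engineer": "sswe234",
}

# Hypothetical abbreviation expansions applied before the lookup.
ABBREVIATIONS = {"sr": "senior", "swe": "software engineer", "eng": "engineer"}


def normalize(title: str) -> str:
    """Lower-case the title and expand known abbreviations token by token."""
    return " ".join(ABBREVIATIONS.get(token, token) for token in title.lower().split())


def rewrite_title_query(raw: str) -> str:
    """Keep the literal title terms and, when a canonical title exists,
    OR in a match on the standardized title to widen the search space."""
    terms = " AND ".join(f"title:{t}" for t in raw.lower().split())
    std_id = STANDARD_TITLES.get(normalize(raw))
    if std_id is None:
        return terms  # no canonical form: leave the query unchanged
    return f"({terms}) OR std_title:{std_id}"


if __name__ == "__main__":
    print(rewrite_title_query("sr software eng"))
    # (title:sr AND title:software AND title:eng) OR std_title:sswe234
```

Keeping the literal terms alongside the standardized id means members whose titles failed to standardize can still match.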
LinkedIn Search : Flexible Scoring
We index many entities!
Companies, Members, Universities, etc.
We use different scoring formulas and signals for each vertical
We need a way to easily plug in different custom scorers!
LinkedIn Search : Open Source
Leading open source alternatives (e.g. Lucene, Elasticsearch, Solr) do not offer these!
Search Federation
Pluggable Query Rewriting
Pluggable and Flexible Scoring
They DO offer some distributed system management, which we will unfortunately have to re-invent
So, we created Galene, LinkedIn's new search architecture!
https://engineering.linkedin.com/search/did-you-mean-galene
Questions?
Galene Architecture
Galene Architecture : Querying
Diagram: Browser → Frontend → Federator → Vertical Broker → Vertical Search Node; the Federator also queries Other Verticals
Federator: Query Intent Detection, Result Blending
Vertical Broker: Query Rewriting (Pluggable), Scatter-gather across shards
Vertical Search Node: Lucene (optionally sharded), Scoring (Pluggable)
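Within one vertical every shard uses the same scorer, so, unlike across verticals, scores are comparable and a broker can merge per-shard top-k lists directly. The sketch below assumes toy in-memory shards that score by query-term overlap; `make_shard` and `scatter_gather` are hypothetical names, and threads stand in for RPCs to search nodes.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id)
Shard = Callable[[str, int], List[Hit]]


def make_shard(docs: Dict[str, str]) -> Shard:
    """Hypothetical in-memory shard that scores docs by query-term overlap."""
    def search(query: str, k: int) -> List[Hit]:
        terms = set(query.lower().split())
        hits = [(float(len(terms & set(text.lower().split()))), doc_id)
                for doc_id, text in docs.items()]
        return heapq.nlargest(k, [h for h in hits if h[0] > 0])
    return search


def scatter_gather(shards: List[Shard], query: str, k: int) -> List[Hit]:
    """Scatter the query to every shard in parallel, then gather and merge the
    per-shard top-k lists into a global top-k."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = pool.map(lambda shard: shard(query, k), shards)
        return heapq.nlargest(k, (hit for part in partials for hit in part))
```

Asking each shard for only its own top k is enough: any document in the global top k must also rank within the top k of the shard that holds it.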
Galene Architecture : Indexing (Offline)
Diagram: the querying path (Browser → Frontend → Federator → Vertical Broker → Vertical Search Node) plus Hadoop, a Vertical Indexer Node, and the Index Distribution Service
Offline Index Building and Distribution
Batch-oriented, built daily
Builds offline ranking and rewriting models
Rebuilds indexes when new fields are added
Galene Architecture
Offline Index Building and Distribution
BitTorrent-based Index Distribution Service
Pushes new indexes and models to running services
Galene Architecture
Diagram: adds a Vertical Live Updater, fed by Kafka, Databus, and Samza, alongside the offline Hadoop / Vertical Indexer Node / Index Distribution Service path
Online Index Updates
Online (near-real-time) indexer
Updates indexes between Hadoop builds
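The split between daily Hadoop builds and near-real-time updates can be modeled as an immutable base index plus an overlay of streamed updates. `LiveIndex`, the update tuple format, and whole-document replacement are assumptions; the `snapshot` method loosely mirrors the periodic index optimization step (folding live data into a compact, standalone index).

```python
from typing import Dict, Iterable, Optional, Tuple

# (doc_id, new document fields); None as the fields means "delete".
Update = Tuple[str, Optional[dict]]


class LiveIndex:
    """Hypothetical model: a base index from the daily Hadoop build, plus an
    in-memory overlay of near-real-time updates from the stream."""

    def __init__(self, base: Dict[str, dict]):
        self.base = dict(base)  # never mutated after construction
        self.overlay: Dict[str, Optional[dict]] = {}

    def apply(self, updates: Iterable[Update]) -> None:
        """Record streamed updates; a later update for a doc replaces an earlier one."""
        for doc_id, fields in updates:
            self.overlay[doc_id] = fields

    def get(self, doc_id: str) -> Optional[dict]:
        """Overlay wins over the base build; None means absent or deleted."""
        if doc_id in self.overlay:
            return self.overlay[doc_id]
        return self.base.get(doc_id)

    def snapshot(self) -> Dict[str, dict]:
        """Fold the overlay into a standalone copy of the index."""
        merged = dict(self.base)
        for doc_id, fields in self.overlay.items():
            if fields is None:
                merged.pop(doc_id, None)
            else:
                merged[doc_id] = fields
        return merged
```

When the next Hadoop build lands, a fresh `LiveIndex` would start from the new base with an empty overlay; in the sketch, updates applied after that build's cutoff would have to be replayed from the stream.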
Galene Architecture
Periodic Index Optimization
Snapshots live data into a compact format
Sends the ss-index to search nodes over BitTorrent