Currently on the Cloud
1 Our situation in 2014 2 How we improved 3 Sweet things 4 More sweet things 5 Future
Our situation in 2014
1st gen: “HSP” (2014), from the cybercafe / manga café / PC bang platform
2nd gen: “LGC” (cloud release)
3rd gen: “Trident” (current)
ECDH-based key exchange
Platform (Billing/AAA/Monitoring, etc.) & Game servers
Globalization Issues abroad
Loading… Loading… Loading… Loading…
Fail!
Process
Dev/QA/Sandbox/REAL…
Get VM
Get L4 binding
ACLs/storage, etc.
From hours to days…
Thus, the Game Cloud project began…
Global
In our case:
GSLB (Global Server Load Balancing)
HAProxy instead of hardware L4
Multi-team effort: client, server, cloud
We get: more flexibility, less latency
LINE Global POP
New York
Tokyo
Seoul
HK
Singapore
Beijing
Frankfurt
Network layer control High latency Fit for cloud
Global Testing in Thailand
(Chart comparing GSLB (TH, KR) with HAProxy (TH, SG, KR); axis ticks 500, 1000, 1500.)
Process
Dev
Ops Government
The structure of your organization affects the structure of your software. And vice-versa!
DevOps Small Startup CHOOSE
Process
Progressive / Easy, simple
Requirements for our new platform: we have many third parties and technology stacks involved...
Etc.
Process For ourselves
simple reliable future proof
Process Why not…
Docker Swarm or CoreOS
Mesos Kubernetes
Process
KEEP THINGS SIMPLE AND RELIABLE!
For distributed systems, minimize coordination
A good paper: https://blog.acolyer.org/2016/01/19/dcft/
• Polling • One-way dataflow • Idempotency • Commutativity • Limited trust
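To make the principles above concrete, here is a minimal sketch of an agent that polls for a desired state and reconciles toward it. The `Node` class, the `reconcile` and `merge` functions, and the service names are all hypothetical and only illustrate idempotency and commutativity; they are not taken from the platform itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical agent that converges on a desired state by polling."""
    running: set = field(default_factory=set)

    def reconcile(self, desired: set) -> list:
        """Apply the difference between desired and actual state.

        A second call with the same input produces no actions
        (idempotency), so a repeated or retried poll is harmless.
        """
        actions = []
        for name in sorted(desired - self.running):
            actions.append(("start", name))
        for name in sorted(self.running - desired):
            actions.append(("stop", name))
        self.running = set(desired)
        return actions


def merge(a: set, b: set) -> set:
    """Set union is commutative, so updates may arrive in any order."""
    return a | b


node = Node()
first = node.reconcile({"game-a", "game-b"})   # starts both services
second = node.reconcile({"game-a", "game-b"})  # repeated poll, nothing to do
```

Because the agent only pulls state and never waits for acknowledgements from peers, no coordination between nodes is required.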
LGC Story
Games planned for release were suddenly canceled but we needed to show results!
Strong “sales” efforts to release other games on the LGC platform
Putting Out Fires
The release was a success, followed by quick scaling-up, and then our first fires…
TECHNICAL
Riak fire: the system works with Riak down
OE fire: the system works with OE down
Hardware and conf fires (TDI! Soon to come!)
Full container reboot improved our design through limited trust
Domain HAP
Launch service
Configure/load balance
Expose ports
Bind URL
In one click!
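As an illustration of what such a one-click step might generate, the sketch below renders an HAProxy frontend/backend pair that binds a port, routes a domain, and load balances across instances. The function name `render_haproxy`, the naming scheme, and the example addresses are assumptions; only the HAProxy directives themselves (`bind`, `acl`, `use_backend`, `balance`, `server`) are standard.

```python
def render_haproxy(service: str, domain: str, port: int, backends: list) -> str:
    """Render an illustrative HAProxy config fragment for one service."""
    lines = [
        f"frontend {service}_in",
        "    mode http",
        f"    bind *:{port}",                             # expose the port
        f"    acl is_{service} hdr(host) -i {domain}",    # bind the URL
        f"    use_backend {service}_pool if is_{service}",
        "",
        f"backend {service}_pool",
        "    mode http",
        "    balance roundrobin",                         # load balance
    ]
    for i, addr in enumerate(backends, start=1):
        # "check" enables health checks so dead instances are skipped
        lines.append(f"    server {service}{i} {addr} check")
    return "\n".join(lines) + "\n"


cfg = render_haproxy("web", "game.example.com", 80,
                     ["10.0.0.1:8080", "10.0.0.2:8080"])
```

Generating the fragment from a small service description keeps the per-service steps (launch, configure, expose, bind) in one place.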
Monitoring
Gearbox Auto Scaling System
High availability Low cost
Why Do We Need It?
How Does It Work?
How Did We Build it?
Data Collector
Monitoring API
Predicator
Metrics
Raw Metrics
Scaler
States
Game Cloud API
Execute Scaling Gearbox
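The components named above can be pictured as a three-stage pipeline. The sketch below is an assumption about how such stages might fit together, not Gearbox's actual logic: the CPU-percentage metric, the 60% target, and the step limit are all hypothetical.

```python
import math
from statistics import mean


def collect(raw_samples: list) -> float:
    """Data Collector: reduce raw CPU samples (percent) to one metric."""
    return mean(raw_samples)


def predict(cpu_percent: float, current: int, target_percent: float = 60) -> int:
    """Predicator: instance count that would bring load near the target."""
    return max(1, math.ceil(current * cpu_percent / target_percent))


def scale(current: int, wanted: int, max_step: int = 2) -> int:
    """Scaler: move toward the wanted count, at most max_step per cycle."""
    delta = max(-max_step, min(max_step, wanted - current))
    return current + delta


cpu = collect([80, 100])       # raw metrics from the monitoring side
wanted = predict(cpu, current=4)
new_count = scale(4, wanted)   # would be passed on to the cloud API
```

Limiting the step size is one simple way to avoid oscillation when the metric is noisy.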
Challenges
Complex queries
Plenty of metrics: millions of records per day
Scalability of the auto-scaling system itself
Solutions
(Diagram: Gearbox architecture. Modules: Data Collector, Predicator, Scaler. Storage: Elasticsearch, with Strategy, Metrics, States, Predicator Log, and Scaler Log. External: Monitoring API, Game Cloud API (Execute Scaling), Admin Site. Input lists shown: "1. Strategy 2. Metrics" and "1. Strategy 2. Scaler Log".)
Knife Admin Site
Deploying a New Service
Upgrade
Configuring the Auto-Scaling Policy
Back to Jojo: what’s coming next!
Future
QUIC
SDN: ACL, IP by container, VLAN, etc.
Cloud storage
TDI
Distributed GC – link paper
DCTCP
Image GC
Future
UX
Helpers/presets
Speed
Doc/tests/guides…
Reliability
QUIC: Quick UDP Internet Connections
Cloud Storage
SDN: Software-Defined Networking
Container-specific IP, ACLs, VLANs
TDI: Test-Driven Infrastructure
Automated testing for:
hardware (firmware, version, etc.)
OS
configuration
images
backup/restore
Etc.
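A test-driven infrastructure check can be as simple as comparing the facts a host reports against what is expected. The sketch below is a hypothetical example: `check_host` and the fact names (`firmware`, `os`, `kernel`) are illustrative assumptions, not part of any existing tool.

```python
def check_host(expected: dict, actual: dict) -> list:
    """Return one readable failure per mismatching or missing fact."""
    failures = []
    for key, want in sorted(expected.items()):
        got = actual.get(key)
        if got != want:
            failures.append(f"{key}: expected {want!r}, got {got!r}")
    return failures


expected = {"firmware": "1.2", "os": "linux", "kernel": "3.18"}
actual = {"firmware": "1.1", "os": "linux"}
failures = check_host(expected, actual)  # empty list means the host passes
```

Running such checks automatically before a host joins the pool would catch hardware and configuration drift before it turns into a production fire.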
Distributed GC
http://arxiv.org/pdf/1504.02578.pdf
          Max      Avg. GC pause  Median  Std. Dev.  Mean
GC off    7.847    0.0            2.296   0.579      2.312
Blade     7.743    12.243         2.294   0.582      2.311
GC on     164.206  12.339         2.297   3.395      2.403
DCTCP: Data Center TCP
- high burst tolerance - low latency - high throughput
Added in Linux 3.18: https://kernelnewbies.org/Linux_3.18
http://simula.stanford.edu/~alizade/Site/DCTCP.html
Distributed GC: because we generate tons of Docker images
And more and more…
Riak / Choose a Safe and Simple Friend
Make a deliberate choice of consistency model
• AP • Optional CP • Index/search • CRDT • Multiple backends • User ACL support
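To show why CRDTs suit an AP store, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. This is a generic textbook construction written for illustration, not Riak's implementation; the class and replica names are assumptions.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n: int = 1) -> None:
        # Each replica only ever writes to its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # max() is commutative, associative, and idempotent, so replicas
        # converge regardless of merge order or repetition.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(2)   # written during a partition on replica a
b.increment(3)   # written concurrently on replica b
a.merge(b)
b.merge(a)
a.merge(b)       # a repeated merge changes nothing
```

Both replicas accept writes while partitioned and still agree once they exchange state, which is the trade-off an AP consistency model makes deliberately.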
SQL
NoSQL
But actually…
With permission from Kyle Kingsbury (Aphyr)
DataScript / Maintain Queries