The fastest NoSQL database Talking about Go Performance Try it while I blab ! github.com/aerospike/aerospike-server github.com/aerospike/aerospike-client-go
Jul 11, 2015
Who am I ?
Brian [email protected] [email protected]
@bbulkow
TRS-80, PC, Apple II, Vax 11/70, Wang
First product: lightpen university teaching kiosk
Palo Alto High School ('85)
Liberate / NetComputer through the boom
10B market cap in 1999, employee 32
2003-2007 "time off" (startups)
Citrusleaf / Aerospike history
42 year old first-time CEO (me)
2008 Prototype
2010 First sales, "get the band back together"
2011+ 3 rounds of funding (Draper, ALP, NEA, CNTP)
70 employees, 2 offices
Does Brian know performance?
Undergrad project: image converter, single-pass arbitrary scale and rotate with Nyquist filters
Novell: fastest AppleTalk server + router available
Starlight Networks: 150 Mb/sec video server on a P133
Liberate: HTML technology for embedded systems
Aggregate Knowledge: real-time recommendations, 2x faster in first week
Aerospike: 10x faster than existing NoSQL, 100x faster than RDBMS
Internet Technology Stack
MILLIONS OF CONSUMERS, BILLIONS OF DEVICES
APP SERVERS
DATA WAREHOUSE, INSIGHTS
WRITE CONTEXT
In-memory NoSQL
WRITE REAL-TIME CONTEXT, READ RECENT CONTENT
PROFILE STORE: cookies, email, deviceID, IP address, location, segments, clicks, likes, tweets, search terms...
REAL-TIME ANALYTICS: best sellers, top scores, trending tweets
BATCH ANALYTICS: discover patterns, segment data (location patterns, audience affinity)
Who uses Aerospike?
theTradeDesk
… to name a few!
Aerospike is High Performance
[Chart: throughput, axis from 0 to 1,700,000, for two workloads (Balanced, Read-Heavy). Series: Aerospike 3 (in-memory), Aerospike 3 (persistent), Aerospike 2, Cassandra, MongoDB, Couchbase 1.8, Couchbase 2.0]
Easy Clients ( better than JSON )
Go, Python
Also, analytics
http://www.aerospike.com/community/labs/
If it is so good, why haven't I heard of it?
Established in 2009 (newer than most)
Used in Advertising – ad exchanges, data exchanges, targeting, real-time bidding, real-time attribution.
Open Sourced in June 2014
When should I use Aerospike?
Redis, but with scale & flash
Cassandra, but fast
User data, session data, behavior, fraud…
API billing ~ retail actions ~ recommendations
Up and running in 10 minutes! (vagrant, EC2 …)
Why does Aerospike care about Go?
It's cool!
Promises performance with expressiveness (as an old C guy, Go is aimed at me)
Our customers are diving in, deploying
What about (other versions of other languages)… (sure, they're cool too!)
Go!
Some old microbenchmarks
Profilers, how to run it
War story: optimizing our Go client
( sure, we know Go isn’t JUST about performance )
Let’s talk about….
Old Microbenchmark
On Nov 22, 2009, I posted to Golang Nuts
Old Microbenchmark
Seconds (Nov 2009)
1.1 - python (CPython 2.6.2, the distro release with no tweaks)
4.6 - go (current hg release)
4.2 - ruby 1.8 (distro release)
1.1 - ruby 1.9 (distro release)
Pike said:
"I suspect the great majority of the time in your benchmark is due to Go's current rudimentary garbage collector. Tests like this generate a lot of garbage that is collected slowly. From experiments I've done, a better implementation can make a huge difference. Profiling this test shows at least 50% of the time is in the allocator and collector, as opposed to about 5% printing the string and less than 15% in the map code. A better allocator and collector would make a dramatic change.
"The short answer: the Go runtime is new and completely untuned. The libraries need work too."
Microbenchmark "T1"
    for i := 0; i < 1000000; i++ {
        x = (2 * x) + x + 1
    }
Python: 1.96 s (big integer only)
Go: 1.04 ms (2.17 s with big.Int)
Java: 5 ms (2.15 s with BigNum)
Good news: Go is right in the hunt, but easier to code
Amazon m3.xlarge (4 core E5-2670 @ 2.5 GHz), Python 2.6.9, Go 1.3.3, Java 1.7.0_71, Amazon Linux (3.16)
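The two Go numbers on the T1 slide come from two different integer types. A minimal sketch of both loops follows; the function names and the reduced big.Int iteration count are assumptions made for this illustration, not the code that produced the slide's timings.

```go
package main

import (
	"fmt"
	"math/big"
	"time"
)

// nativeT1 runs x = 2*x + x + 1 n times on a machine-word integer.
// int64 wraps silently on overflow, so for large n only the timing is meaningful.
func nativeT1(n int) int64 {
	var x int64
	for i := 0; i < n; i++ {
		x = (2 * x) + x + 1
	}
	return x
}

// bigT1 runs the same recurrence with arbitrary-precision integers,
// reusing one temporary so the loop itself allocates as little as possible.
func bigT1(n int) *big.Int {
	x := new(big.Int)
	t := new(big.Int)
	two := big.NewInt(2)
	one := big.NewInt(1)
	for i := 0; i < n; i++ {
		t.Mul(two, x) // t = 2*x
		t.Add(t, x)   // t = 2*x + x
		x.Add(t, one) // x = 2*x + x + 1
	}
	return x
}

func main() {
	start := time.Now()
	nativeT1(1000000)
	fmt.Println("native int64:", time.Since(start))

	// Assumption: fewer iterations than the slide, to keep the demo short.
	const bigIters = 20000
	start = time.Now()
	r := bigT1(bigIters)
	fmt.Println("big.Int:", time.Since(start), "result bits:", r.BitLen())
}
```

The value roughly triples each iteration, so the big.Int variant does more work per step as the number grows, while the native loop stays constant-time per step and simply wraps.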
Microbenchmarks
T5 – the 2009 benchmark
Python: 12.5 sec
Go: 12.56 sec
Java: 2.56 sec
Good news: not slower than Python!
Bad news: Holy Crap compared to Java
Microbenchmarks – the old code
T5 – the 2009 benchmark (slower CPU)
    for x := 0; x < 1000000; x++ {
        a := make(map[int]string)
        for a1 := 0; a1 < 50; a1++ {
            a[a1] = strconv.Itoa(a1)
        }
    }
12.56 seconds
Microbenchmarks – tune the map
T5 – the 2009 benchmark
    for x := 0; x < 1000000; x++ {
        a := make(map[int]string, 50)
        for a1 := 0; a1 < 50; a1++ {
            a[a1] = strconv.Itoa(a1)
        }
    }
7.80 seconds
Microbenchmarks – remove the Itoa
T5 – the 2009 benchmark
    for x := 0; x < 1000000; x++ {
        a := make(map[int]string, 50)
        for a1 := 0; a1 < 50; a1++ {
            a[a1] = "123456"
        }
    }
5.45 seconds
Microbenchmarks – singleton Map
T5 – the 2009 benchmark
    a := make(map[int]string, 50)
    for x := 0; x < 1000000; x++ {
        // a := make(map[int]string, 50)
        for a1 := 0; a1 < 50; a1++ {
            a[a1] = "123456"
        }
    }
2.03 seconds! Finally better than Java!
Microbenchmarks – Java
T5 – the 2009 benchmark
    for (int x = 0; x < 1000000; x++) {
        HashMap<Integer, String> a = new HashMap<Integer, String>();
        for (int a1 = 0; a1 < 50; a1++) {
            a.put(a1, Integer.toString(a1));
        }
    }
2.56 seconds
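To rerun the Go side of T5 on a current toolchain, the four Go variants from the preceding slides can be timed in one program. This is only a sketch: the function names, the `timeIt` helper, and the reduced outer count are assumptions for illustration, and the numbers it prints will not match the 2015 slides.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

const inner = 50 // map entries written per outer iteration, as on the slides

// freshMap allocates a new unsized map per outer iteration (the 2009 code).
func freshMap(outer int) int {
	total := 0
	for x := 0; x < outer; x++ {
		a := make(map[int]string)
		for a1 := 0; a1 < inner; a1++ {
			a[a1] = strconv.Itoa(a1)
		}
		total += len(a)
	}
	return total
}

// sizedMap passes a capacity hint so the map does not grow while filling.
func sizedMap(outer int) int {
	total := 0
	for x := 0; x < outer; x++ {
		a := make(map[int]string, inner)
		for a1 := 0; a1 < inner; a1++ {
			a[a1] = strconv.Itoa(a1)
		}
		total += len(a)
	}
	return total
}

// sizedMapConst also drops the Itoa call, storing a constant string.
func sizedMapConst(outer int) int {
	total := 0
	for x := 0; x < outer; x++ {
		a := make(map[int]string, inner)
		for a1 := 0; a1 < inner; a1++ {
			a[a1] = "123456"
		}
		total += len(a)
	}
	return total
}

// reusedMap hoists the map out of the loop (the "singleton" variant).
func reusedMap(outer int) int {
	total := 0
	a := make(map[int]string, inner)
	for x := 0; x < outer; x++ {
		for a1 := 0; a1 < inner; a1++ {
			a[a1] = "123456"
		}
		total += len(a)
	}
	return total
}

// timeIt is a hypothetical helper: wall-clock timing of one variant.
func timeIt(name string, outer int, f func(int) int) {
	start := time.Now()
	n := f(outer)
	fmt.Printf("%-14s %9d entries  %v\n", name, n, time.Since(start))
}

func main() {
	const outer = 100000 // assumption: 10x fewer than the slides, to keep the run short
	timeIt("fresh map", outer, freshMap)
	timeIt("sized map", outer, sizedMap)
	timeIt("no Itoa", outer, sizedMapConst)
	timeIt("reused map", outer, reusedMap)
}
```

Each step removes one source of allocation (map growth, string creation, the map itself), which is the progression the slides walk through.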
Any ideas?
( I haven’t figured it out yet )
Next microbenchmarks!
Float, String
Go Channels vs Java Futures … couldn't code the Java part in time!
Simple TCP echo, but with transactions
Log processing
Ruby 2.1, Go 1.4…
Your votes ?
Profilers
pprof is pretty great!
Import in all your mains, does not seem to hurt
    import _ "net/http/pprof"
Add the HTTP listener (only on flag)
    // launch http pprof listener if in profile mode
    if *profileMode {
        go func() {
            log.Println(http.ListenAndServe("localhost:6060", nil))
        }()
    }
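Put together as a complete program, the import and the flag-guarded listener might look like the sketch below. The flag name `-profile`, the `pprofRegistered` check, and the `work` loop are assumptions added for illustration; only the blank import and the `ListenAndServe` call come from the slide.

```go
package main

import (
	"flag"
	"log"
	"net/http"
	_ "net/http/pprof" // side effect: registers /debug/pprof/* on http.DefaultServeMux
)

// pprofRegistered reports whether the default mux has a handler for path.
// It is a hypothetical sanity check, not part of the pprof API.
func pprofRegistered(path string) bool {
	req, err := http.NewRequest("GET", "http://localhost"+path, nil)
	if err != nil {
		return false
	}
	_, pattern := http.DefaultServeMux.Handler(req)
	return pattern != ""
}

// work is a stand-in for the real program; it just burns some CPU.
func work() int {
	sum := 0
	for i := 0; i < 50000000; i++ {
		sum += i % 7
	}
	return sum
}

func main() {
	profileMode := flag.Bool("profile", false, "serve pprof on localhost:6060")
	flag.Parse()

	// launch http pprof listener if in profile mode
	if *profileMode {
		go func() {
			log.Println(http.ListenAndServe("localhost:6060", nil))
		}()
	}

	log.Println("pprof handlers registered:", pprofRegistered("/debug/pprof/profile"))
	log.Println("work result:", work())
}
```

Because the listener runs in a goroutine, it stops when `main` returns; a real service would keep running long enough for a profile to be collected.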
Profilers
Take a 30 second snapshot
    go tool pprof http://localhost:6060/debug/pprof/profile?seconds=xx
pprof prompt: 'top 10'
    (pprof) top 10
    Total: 3852 samples
        1187  30.8%  30.8%     1254  32.6% syscall.Syscall
         304   7.9%  38.7%      304   7.9% ExternalCode
         172   4.5%  43.2%      175   4.5% github.com/aerospike/aerospike-client-go/pkg/ripemd160._Block
         137   3.6%  46.7%      233   6.0% runtime.mallocgc
          98   2.5%  49.3%       98   2.5% runtime.futex
          79   2.1%  51.3%       86   2.2% runtime.MSpan_Sweep
          77   2.0%  53.3%       77   2.0% scanblock
          68   1.8%  55.1%       68   1.8% runtime.xchg
          46   1.2%  56.3%       46   1.2% runtime.epollwait
Profilers (pprof) web
Profilers
Good old 'oprofile', let's not forget it (especially if you can get kernel symbols, hard)
    sudo yum -y install oprofile
Start capturing
    sudo opcontrol --reset
    sudo opcontrol --no-vmlinux
    sudo opcontrol --start
Run your program
    sudo opcontrol --dump
    sudo opcontrol --shutdown
Dump your result
    sudo opreport -l --demangle=smart --debug-info
Cheat Sheet: http://www.bonsai.com/wiki/howtos/tuning/oprofile/
Profilers
opreport
    samples  %        linenr info                image name  app name    symbol name
    28106    56.5877  (no location information)  no-vmlinux  no-vmlinux  /no-vmlinux
    6216     12.5151  rand.go:76                 benchmark   benchmark   math/rand.(*Rand).Int31n
    3940      7.9327  rng.go:232                 benchmark   benchmark   math/rand.(*rngSource).Int63
    1987      4.0006  benchmark.go:255           benchmark   benchmark   main.randString
    1584      3.1892  rand.go:43                 benchmark   benchmark   math/rand.(*Rand).Int63
    1465      2.9496  rand.go:93                 benchmark   benchmark   math/rand.(*Rand).Intn
    1421      2.8610  rand.go:49                 benchmark   benchmark   math/rand.(*Rand).Int31
    354       0.7127  ripemd160block.go:45       benchmark   benchmark   github.com/aerospike/aerospike-client-go/pkg/ripemd160._Block
    349       0.7027  mgc0.c:720                 benchmark   benchmark   scanblock
    307       0.6181  malloc.goc:40              benchmark   benchmark   runtime.mallocgc
    205       0.4127  mgc0.c:1783                benchmark   benchmark   runtime.MSpan_Sweep
    138       0.2778  memmove_amd64.s:33         benchmark   benchmark   runtime.memmove
    131       0.2638  asm_amd64.s:600            benchmark   benchmark   runtime.xchg
Tuning the Aerospike Client
What does the client do?
Maintain the DHT state
Keep a connection pool
Make requests to the right servers
Box / unbox to wire protocol…
SIMPLE
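"Make requests to the right servers" boils down to a local lookup: hash the key, map the hash to a partition, and read the owning node from the client's copy of the cluster state. The sketch below illustrates only that idea. The partition count, the node addresses, and every name in it are hypothetical, and FNV-1a is swapped in as a standard-library stand-in for the RIPEMD-160 digest that shows up in the client's profiles.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// nPartitions is an assumed partition count, chosen for illustration only.
const nPartitions = 4096

// partitionMap is the client's local copy of "which node owns which partition".
type partitionMap [nPartitions]string

// partitionOf hashes a key to a partition ID.
// FNV-1a is a stand-in hash here, not the digest the real client uses.
func partitionOf(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % nPartitions)
}

// nodeFor returns the node address that should receive a request for key.
func (m *partitionMap) nodeFor(key string) string {
	return m[partitionOf(key)]
}

func main() {
	// Hypothetical node addresses, assigned round-robin to partitions.
	nodes := []string{"10.0.0.1:3000", "10.0.0.2:3000", "10.0.0.3:3000"}
	var m partitionMap
	for p := range m {
		m[p] = nodes[p%len(nodes)]
	}

	for _, k := range []string{"user:1", "user:2", "user:3"} {
		fmt.Printf("%s -> partition %d -> %s\n", k, partitionOf(k), m.nodeFor(k))
	}
}
```

With the map held in client memory, each request goes straight to one server with no proxy hop, which is why keeping the DHT state current is the first item on the list.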
Tuning the Aerospike Client
Attempt 1: run pprof
The usual dance of making life easy for the garbage collector (just like Java)
pprof worked
The hot objects showed up
Cache easily with Sized Channels
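"Cache easily with Sized Channels" refers to a common Go idiom: a buffered channel used as a fixed-size free list, so hot objects are recycled instead of becoming garbage. A minimal sketch follows; the `BufferPool` type and its methods are hypothetical names for this illustration, not the Aerospike client's actual code.

```go
package main

import "fmt"

// BufferPool recycles byte slices through a buffered ("sized") channel so
// hot-path requests do not allocate a fresh buffer each time.
type BufferPool struct {
	free chan []byte
	size int
}

// NewBufferPool keeps at most capacity idle buffers of size bytes each.
func NewBufferPool(capacity, size int) *BufferPool {
	return &BufferPool{free: make(chan []byte, capacity), size: size}
}

// Get returns a recycled buffer if one is waiting, otherwise allocates.
// It never blocks.
func (p *BufferPool) Get() []byte {
	select {
	case b := <-p.free:
		return b
	default:
		return make([]byte, p.size)
	}
}

// Put hands a buffer back. If the pool is already full, or the buffer is
// too small to reuse, it is dropped and left to the garbage collector.
func (p *BufferPool) Put(b []byte) {
	if cap(b) < p.size {
		return
	}
	select {
	case p.free <- b[:p.size]:
	default:
	}
}

func main() {
	pool := NewBufferPool(2, 1024)

	b := pool.Get()
	b[0] = 42
	pool.Put(b)

	again := pool.Get()
	// The recycled buffer comes back uncleared, so the 42 is still there.
	fmt.Println(len(again), again[0])
}
```

The channel capacity bounds how much idle memory the pool holds, and the `select` with `default` keeps both operations non-blocking. Recycled buffers are not zeroed, so callers must overwrite before reading.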
Tuning the Aerospike Client
Attempt 2: oprofile
oprofile found rand() taking time
Optimization gave nothing … not sure why not …
Currently happy with throughput
Tuning the Aerospike Client
Latency problem at customer site
User validating a server install with a quick Go client
"17 ms average latency @ 20K TPS" -- terrible!
Server measured at 0.4 ms @ 40k TPS
 -- ping ok
 -- it's the client
Where's the latency source? GC? Green Threads? Network?
 -- Profile shows low GC load
 -- Hard to measure thread latency
EC2 m3.xlarge ($0.05/hr)
4 core E5-2670 @ 2.5 GHz
Bare metal vs Virtual
CentOS 6 vs Latest Kernel
Intel SSDs vs RAM
Tuning the Aerospike Client
Go
Java
What happened?
• Not sure what happened at deployment (yet, suspect old kernel)
• A week lost by developers using MacOS, Laptop (MacOS is showing bad latency)
• C code is running slower – we think it's random fill of buffer
• Lesson: just switch to Linux 3.12-ish kernels
• Lesson: fewer lines ~ 11k Go, 17k Java
• Lesson: for network / IO, these languages are THE SAME!