Addressing Performance Issues in Titan+Cassandra
Introduction
● Nakul Jeirath
● Senior security engineer at WellAware (wellaware.us)
● WellAware: Oil & gas startup building a SaaS monitoring & analytics platform
Transitioned ~2 years ago
Titan+Cassandra Performance Factors
● Titan deployment methodology
● Cassandra tuning
● Titan JVM tuning
● Data modeling
● Indexing
  ○ Property indices
  ○ Vertex centric indices
● Query structure
● Caching
  ○ Transaction cache
  ○ Database level cache
● Titan options
Ted Wilmes - Cassandra Summit 2015:
Slides: http://www.slideshare.net/twilmes/modeling-the-iot-with-titandb-and-cassandra
Video: https://vimeopro.com/user35188327/cassandra-summit-2015/video/143695770
This talk
Our focus will be reads; check out Ted's talk for write optimization.
A Toy Example
2005 College Football Data
Source: http://coachesbythenumbers.com/sportsource-college-football-data-packages/
● Team names & conferences
● Game record with dates and scores
● Interesting questions:
  ○ Records for all teams in conference X
  ○ Top 25 ranking using record + strength of opponents
  ○ Three team loop (A beat B beat C beat A)
Toy Model
Vertex (label: team): name: Purdue, conf: Big 10
Vertex (label: team): name: IU, conf: Big 10
Edge (label: beat): date: 11/19/05, score: 41-14
gremlin> g.V().count()
==>239
gremlin> g.E().count()
==>718
Test Bench
1. Shut down Titan
2. Clear Titan DB
3. Start Titan
4. Load test dataset
Source code: https://github.com/njeirath/titan-perf-tester
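Test Runner
import java.util.Date;
import com.thinkaurelius.titan.core.TitanGraph;
import com.thinkaurelius.titan.core.TitanTransaction;
// Assumed to be the Apache Commons Math 3 class; the slide does not show imports.
import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;

public class PerfTestRunner {
    // The test query is passed in as a lambda (PerfOperation).
    public static DescriptiveStatistics test(final TitanGraph graph, int iterations, PerfOperation op) {
        DescriptiveStatistics stats = new DescriptiveStatistics();
        for (int i = 0; i < iterations; i++) {
            TitanTransaction tx = graph.newTransaction();       // start new transaction
            Date start = new Date();
            op.run(tx);                                         // run test query
            Date end = new Date();
            stats.addValue(end.getTime() - start.getTime());    // record time (ms)
            tx.rollback();                                      // roll back transaction
        }
        return stats;
    }
}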
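The measure-and-summarize pattern of the test runner can be tried without a Titan instance. The following is a minimal sketch under stated assumptions: `TimingSketch` and its methods are hypothetical names that are not part of the talk's repository, and it uses `System.nanoTime()` rather than `Date` because it is a monotonic clock.

```java
import java.util.Arrays;

// Hypothetical stand-alone harness; not part of titan-perf-tester.
final class TimingSketch {
    // Runs `op` `iterations` times and returns each elapsed time in milliseconds.
    static double[] time(int iterations, Runnable op) {
        double[] millis = new double[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();   // monotonic, unlike wall-clock Date
            op.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return millis;
    }

    // Arithmetic mean; NaN for an empty sample.
    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Median of the sample; NaN for an empty sample.
    static double median(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

Reporting the median alongside the mean helps when the first iteration is much slower than the rest, as happens with a cold cache.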
Anatomy of Gremlin Queries
● Simplest form of OLTP query
  ○ Picks one or more entry points into the graph
  ○ Traverses from those initial vertices
Example: How many games did Big 10 teams win in total?
g.V().has('conference', 'Big Ten Conference').outE('beat').count()
● Initial graph entry selection: g.V().has('conference', 'Big Ten Conference')
● Edge traversal: outE('beat')
Selecting the Entry Point
Typically won't have vertex ID(s) to select directly
Will select based on one or more vertex properties
Scanning all vertices is feasible in small graphs
Scanning becomes prohibitively expensive on large graphs
[Figure: graph with the entry vertices marked "Start from these"]
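The cost difference between scanning and an indexed lookup can be illustrated outside Titan. This is a minimal sketch and not Titan's implementation: `ToyVertex` and `EntryPointSketch` are hypothetical names, and a `HashMap` stands in for a property index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a graph vertex; not a Titan class.
final class ToyVertex {
    final String name;
    final String conference;

    ToyVertex(String name, String conference) {
        this.name = name;
        this.conference = conference;
    }
}

final class EntryPointSketch {
    // Full scan: inspects every vertex, so cost grows with graph size.
    static List<ToyVertex> scan(List<ToyVertex> all, String conference) {
        List<ToyVertex> hits = new ArrayList<>();
        for (ToyVertex v : all) {
            if (v.conference.equals(conference)) {
                hits.add(v);
            }
        }
        return hits;
    }

    // Index build: one pass that groups vertices by property value.
    static Map<String, List<ToyVertex>> buildIndex(List<ToyVertex> all) {
        Map<String, List<ToyVertex>> index = new HashMap<>();
        for (ToyVertex v : all) {
            index.computeIfAbsent(v.conference, k -> new ArrayList<>()).add(v);
        }
        return index;
    }

    // Indexed lookup: a single hash probe, independent of graph size.
    static List<ToyVertex> lookup(Map<String, List<ToyVertex>> index, String conference) {
        return index.getOrDefault(conference, Collections.emptyList());
    }
}
```

The index costs one full pass to build and must be maintained on writes, which is the same trade-off a database index makes.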
Property Index
Test query: g.V().has('conference', 'Big Ten Conference').toList()
Output:
07:45:24 WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx - Query requires iterating over all vertices [(conference = Big Ten Conference)]. For better performance, use indexes
Titan is nice enough to warn us of this issue
Creating Index on "Conference" Property
// Access graph management, create composite index, and commit
mgmt = graph.openManagement()
conf = mgmt.getPropertyKey('conference')
mgmt.buildIndex('byConference', Vertex.class).addKey(conf).buildCompositeIndex()
mgmt.commit()
// Wait for the index to become available, then reindex existing data
mgmt.awaitGraphIndexStatus(graph, 'byConference').call()
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byConference"), SchemaAction.REINDEX).get()
mgmt.commit()
Reference: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
"Conference" Index Timing Comparison (times in ms, n = 10)
                 min     max     mean    std dev   median
Without index    127.0   203.0   159.9   29.60     151.0
With index       2.0     7.0     3.2     1.69      2.5
Mean query time drops from 159.9 ms to 3.2 ms, roughly a 50x speedup (159.9 / 3.2 ≈ 49.97)
Property Indices in Titan
● Composite Index
  ○ Supports equality comparison only
  ○ Can handle combinations of properties, but each combination must be pre-defined (Ex: Name and Age)
● Mixed Index
  ○ Supports more predicates than equality (for example, range comparisons)
  ○ Can handle lookups on arbitrary combinations of indexed keys
● Titan also supports external indexing backends (Elasticsearch, Solr, Lucene), which serve mixed indices
● Reference documentation: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
High Order Vertices
Don't always want to traverse all edges incident on a vertex
Filtering based on some edge properties is desirable
Similar to vertices: inspecting each edge is feasible for low order vertices
Prohibitive on high order vertices (those with many incident edges)
[Figure: high order vertex with a subset of its edges marked "Traverse these edges"]
Vertex Centric Index
Example query:
dateFilter = lt(20051000)
g.V().has('conference', 'Big Ten Conference').as('team', 'wins', 'losses').
    select('team', 'wins', 'losses').
    by('name').
    by(__.outE().has('date', dateFilter).count()).
    by(__.inE().has('date', dateFilter).count())
Gets Big 10 team records for games played before October 2005
Notes on Vertex Centric Indices
From Titan 1.0.0 documentation: vertex-centric indices are built automatically per edge label and property key
Reference documentation: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html#vertex-indexes
Titan 0.4.4 does not automatically create vertex-centric indices
No need to create one for our example
May be necessary if a composite key query is being performed
Ex: Get Big 10 team records for games played before October 2005 and won by more than 14 points
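The idea behind a vertex centric index is to keep one vertex's incident edges sorted by an edge property, so that a range filter touches only the matching edges. The following is a minimal sketch of that idea and not Titan's storage layout: `TeamEdges` is a hypothetical class, a `TreeMap` stands in for the index on `date`, and the sample dates in the usage are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical per-vertex edge store; a TreeMap stands in for a vertex centric index on 'date'.
final class TeamEdges {
    // Unindexed form: every outgoing "beat" edge, stored as its yyyymmdd date.
    private final List<Integer> winDates = new ArrayList<>();
    // Indexed form: the same edges counted per date, kept sorted by date.
    private final NavigableMap<Integer, Integer> winsByDate = new TreeMap<>();

    void addWin(int date) {
        winDates.add(date);
        winsByDate.merge(date, 1, Integer::sum);
    }

    // Without an index: inspect every incident edge.
    int winsBeforeByScan(int cutoff) {
        int count = 0;
        for (int date : winDates) {
            if (date < cutoff) {
                count++;
            }
        }
        return count;
    }

    // With the sorted index: visit only the entries strictly before the cutoff.
    int winsBeforeByIndex(int cutoff) {
        int count = 0;
        for (int perDate : winsByDate.headMap(cutoff, false).values()) {
            count += perDate;
        }
        return count;
    }
}
```

For example, after `addWin(20050910)`, `addWin(20050917)`, and `addWin(20051119)`, both methods report 2 wins before the cutoff 20051000, but the indexed version never visits the November edge.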
Query Structuring
Order of steps in query can make a difference
Consider:
g.V().order().by(__.outE().count(), decr).has('conference', 'Big Ten Conference').values('name')
vs
g.V().has('conference', 'Big Ten Conference').order().by(__.outE().count(), decr).values('name')
Mean times: 1032.8 ms vs 42.8 ms respectively (about 24x faster when filtering first)
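Why the second ordering wins can be shown by counting how often the expensive sort key is computed. This is a minimal sketch with hypothetical names (`StepOrderSketch`, `orderThenFilter`, `filterThenOrder`); the key function stands in for the per-vertex `outE().count()` sub-traversal and the predicate for the `has(...)` filter.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToIntFunction;

final class StepOrderSketch {
    // Number of times the expensive sort key was computed; a proxy for traversal work.
    static int keyEvaluations = 0;

    // Computes one sort key per element, then sorts descending by it (stable sort).
    private static <T> List<T> orderByKeyDesc(List<T> items, ToIntFunction<T> key) {
        List<int[]> keyed = new ArrayList<>();   // pairs of {index, key}
        for (int i = 0; i < items.size(); i++) {
            keyEvaluations++;
            keyed.add(new int[] {i, key.applyAsInt(items.get(i))});
        }
        keyed.sort(Comparator.comparingInt((int[] p) -> p[1]).reversed());
        List<T> sorted = new ArrayList<>();
        for (int[] p : keyed) {
            sorted.add(items.get(p[0]));
        }
        return sorted;
    }

    private static <T> List<T> filter(List<T> items, Predicate<T> keep) {
        List<T> kept = new ArrayList<>();
        for (T item : items) {
            if (keep.test(item)) {
                kept.add(item);
            }
        }
        return kept;
    }

    // Mirrors order().by(...).has(...): keys are computed for every element.
    static <T> List<T> orderThenFilter(List<T> items, ToIntFunction<T> key, Predicate<T> keep) {
        return filter(orderByKeyDesc(items, key), keep);
    }

    // Mirrors has(...).order().by(...): keys are computed only for the survivors.
    static <T> List<T> filterThenOrder(List<T> items, ToIntFunction<T> key, Predicate<T> keep) {
        return orderByKeyDesc(filter(items, keep), key);
    }
}
```

Both methods return the same list; only the number of key evaluations differs, which is why limiting elements early pays off when each key requires its own edge traversal.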
Titan Caching
Support for database and transaction level caching
[Figure: three clients, each with its own TX cache, share one DB cache inside Titan, which sits in front of the storage backend]
Transaction Cache
Transaction starts on graph access and ends on commit or rollback
Useful for workloads accessing same data repeatedly
[Figure: teams A, B, C, and D connected by "beat" edges] Rank of team A is a count of these "beat" edges
Ex: Team Rankings
g.V().order().by(__.out().out().count(), decr).as('team', 'score', 'wins', 'losses').
    select('team', 'score', 'wins', 'losses').
    by('name').
    by(__.out().out().count()).
    by(__.outE().count()).
    by(__.inE().count()).
    limit(25)
With TX cache: 3361 ms, without TX cache: 5206 ms
/r/mildlyinteresting/
1. Texas
2. USC
3. Penn State
4. Ohio State
5. Virginia Tech
6. TCU
7. West Virginia
8. Lousianna State
9. Alabama
10. Oregon
11. Louisville
12. Georgia
13. UCLA
14. Miami (FL)

1. Texas
2. USC
3. Penn State
4. Virginia Tech
5. LSU
6. Ohio State
7. Georgia
8. TCU
9. West Virginia
10. Alabama
11. Boston College
12. Oklahoma
13. Florida
14. UCLA
http://www.collegefootballpoll.com/2005_archive_computer_rankings.html
2005 End of Season Computer Rankings
Our Query Results
Transaction Caching Gotchas
Cache Thrashing
Symptom: Queries suddenly & significantly slow down as data size increases
Solve this by tuning transaction cache size
● Globally by setting cache.tx-cache-size
● Per transaction using TransactionBuilder
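The sudden slowdown is characteristic of a bounded cache whose working set grows just past its capacity. The sketch below illustrates this with a generic least-recently-used cache; it is an assumption-laden model, not Titan's transaction cache, and `LruCache` and `ThrashingSketch` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache; stands in for a bounded transaction cache, not Titan's implementation.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);   // access order: reads refresh an entry's recency
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;   // evict the least recently used entry when over capacity
    }
}

final class ThrashingSketch {
    // Cycles over `workingSet` distinct keys for `passes` rounds and returns the cache hit count.
    static int hits(int capacity, int workingSet, int passes) {
        LruCache<Integer, String> cache = new LruCache<>(capacity);
        int hits = 0;
        for (int pass = 0; pass < passes; pass++) {
            for (int key = 0; key < workingSet; key++) {
                if (cache.get(key) != null) {
                    hits++;
                } else {
                    cache.put(key, "vertex-" + key);   // simulated backend fetch
                }
            }
        }
        return hits;
    }
}
```

With a capacity of 100, cycling three times over 100 keys yields 200 hits, while cycling over 101 keys yields none: each entry is evicted just before it is needed again. One extra element in the working set is enough to turn every access into a backend read.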
Memory Leak
Transactions are started automatically and are bound to the thread that opened them
With read only access in separate threads, transaction caches can leak
Solved by calling g.rollback() at the end of the thread execution (releases the TX cache)
Transaction Cache Settings
Transaction cache can be set up in properties files
Settings can be overridden when creating transaction using TransactionBuilder:
Example:
tx=graph.buildTransaction().vertexCacheSize(50000).start()
Other transaction settings can be found here: http://s3.thinkaurelius.com/docs/titan/1.0.0/tx.html#tx-config
Database Level Caching
Database caching helps performance across transactions:
gremlin> stats2.getValues()
==>[1016.0, 41.0, 27.0, 26.0, 24.0, 23.0, 24.0, 21.0, 18.0, 18.0]
Cold cache: first run (1016 ms). Warm cache: later runs (18 to 41 ms)
Trades consistency for speed in clusters
[Figure: Titan instances 1 to n over Cassandra nodes 1 to n; sequence shown: 1. Read, 2. Write, 3. Read]
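The read, write, read sequence can be modeled in a few lines to show where the stale value comes from. This is a minimal sketch under stated assumptions: `CachedInstance` is a hypothetical class, a shared `Map` stands in for the storage backend, and `expire()` models cache expiry rather than any specific Titan setting.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: one shared store, and instances that each keep a private read cache.
final class CachedInstance {
    private final Map<String, String> sharedStore;
    private final Map<String, String> localCache = new HashMap<>();

    CachedInstance(Map<String, String> sharedStore) {
        this.sharedStore = sharedStore;
    }

    // Serves from the local cache when possible; otherwise reads the store and caches the value.
    String read(String key) {
        return localCache.computeIfAbsent(key, sharedStore::get);
    }

    // Writes through to the store and refreshes only this instance's cache.
    void write(String key, String value) {
        sharedStore.put(key, value);
        localCache.put(key, value);
    }

    // Models cache expiry: drops all locally cached values.
    void expire() {
        localCache.clear();
    }
}
```

If instance 1 reads a key, instance 2 then writes a new value, and instance 1 reads again, instance 1 still returns the old value until its cache expires. The write never reaches the other instance's cache, which is the consistency cost of caching above a shared backend.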
Titan Options
query.batch - Whether traversal queries should be batched when executed against the storage backend. This can lead to significant performance improvement if there is a non-trivial latency to the backend.
query.fast-property - Whether to pre-fetch all properties on first singular vertex property access. This can eliminate backend calls on subsequent property access for the same vertex at the expense of retrieving all properties at once. This can be expensive for vertices with many properties
http://s3.thinkaurelius.com/docs/titan/1.0.0/titan-config-ref.html
Using "query.fast-property"
Test query: g.V().group().by('conference').by('name')
Times in ms, n = 10:
                              min     max     mean    std dev   median
query.fast-property = false   243.0   262.0   250.4   5.13      250.0
query.fast-property = true    127.0   151.0   138.1   7.23      139.5
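The test query reads two properties per vertex, so prefetching saves one backend call per vertex. The sketch below models the documented behavior of the option by counting simulated round trips; it is not Titan's internals, and `LazyVertex` with its `backendCalls` counter is a hypothetical construct.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical vertex wrapper that counts simulated backend round trips.
final class LazyVertex {
    private final Map<String, String> stored;                    // what the backend holds
    private final Map<String, String> loaded = new HashMap<>();  // what has been fetched so far
    private final boolean fastProperty;
    private boolean prefetched = false;
    int backendCalls = 0;

    LazyVertex(Map<String, String> stored, boolean fastProperty) {
        this.stored = stored;
        this.fastProperty = fastProperty;
    }

    String value(String key) {
        if (loaded.containsKey(key)) {
            return loaded.get(key);       // already fetched: no backend call
        }
        if (fastProperty && prefetched) {
            return null;                  // everything was fetched and the key is absent
        }
        backendCalls++;
        if (fastProperty) {
            loaded.putAll(stored);        // one call retrieves all properties
            prefetched = true;
        } else {
            loaded.put(key, stored.get(key));   // one call per property
        }
        return loaded.get(key);
    }
}
```

Reading `conference` and then `name` costs two calls with the option off and one with it on. The saving shrinks, and can reverse, when vertices carry many large properties that the query never reads.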
Summary
● Titan indices
  ○ Property indices - vertex/edge lookups
  ○ Vertex centric indices - edge traversals
● Generally limiting elements early in traversal is a good thing
● Caching
  ○ Database level - improve speed while potentially increasing likelihood of stale data
  ○ Transaction level - helps when repeatedly visiting elements within a transaction
● Various other options available for specific tuning needs
Thanks For Watching
Questions
Nakul Jeirath
@njeirath
Senior Security Engineer - WellAware