Addressing Performance Issues in Titan+Cassandra

Introduction

● Nakul Jeirath

● Senior security engineer at WellAware (wellaware.us)

● WellAware: Oil & gas startup building a SaaS monitoring & analytics platform

Transitioned ~2 years ago

Titan+Cassandra Performance Factors

● Titan deployment methodology
● Cassandra tuning
● Titan JVM tuning
● Data modeling
● Indexing
  ○ Property indices
  ○ Vertex centric indices
● Query structure
● Caching
  ○ Transaction cache
  ○ Database level cache
● Titan options

Ted Wilmes - Cassandra Summit 2015:
Slides: http://www.slideshare.net/twilmes/modeling-the-iot-with-titandb-and-cassandra
Video: https://vimeopro.com/user35188327/cassandra-summit-2015/video/143695770

This talk

Our focus will be reads; check out Ted's talk for write optimization

A Toy Example

http://coachesbythenumbers.com/sportsource-college-football-data-packages/

2005 College Football Data

● Team names & conferences
● Game record with dates and scores
● Interesting questions:
  ○ Records for all teams in conference X
  ○ Top 25 ranking using record + strength of opponents
  ○ Three team loop (A beat B beat C beat A)

Toy Model

Label: team
name: Purdue
conf: Big 10

Label: team
name: IU
conf: Big 10

label: beat
date: 11/19/05
score: 41-14

gremlin> g.V().count()
==>239
gremlin> g.E().count()
==>718

Test Bench

Shut down Titan
Clear Titan DB
Start Titan
Load test dataset

Source code: https://github.com/njeirath/titan-perf-tester


Test Runner

public class PerfTestRunner {
    // The test query is passed in as a lambda (op)
    public static DescriptiveStatistics test(final TitanGraph graph, int iterations, PerfOperation op) {
        DescriptiveStatistics stats = new DescriptiveStatistics();
        for (int i = 0; i < iterations; i++) {
            TitanTransaction tx = graph.newTransaction();     // Start new transaction
            Date start = new Date();
            op.run(tx);                                       // Run test query
            Date end = new Date();
            stats.addValue(end.getTime() - start.getTime());  // Record time (ms)
            tx.rollback();                                    // Rollback transaction
        }
        return stats;
    }
}
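The runner above times each iteration with Date, which has millisecond resolution. A minimal sketch of the same loop without any Titan dependency, using System.nanoTime() instead; the class and method names here (TimingSketch, time, mean) are illustrative and are not part of the titan-perf-tester repository:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the runner's timing loop; no Titan classes involved.
final class TimingSketch {
    /** Runs op the given number of times and returns each elapsed time in milliseconds. */
    static List<Double> time(int iterations, Runnable op) {
        List<Double> samples = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();          // monotonic clock, finer than Date
            op.run();                                // the operation under test
            long elapsedNanos = System.nanoTime() - start;
            samples.add(elapsedNanos / 1_000_000.0); // convert ns to ms
        }
        return samples;
    }

    /** Arithmetic mean of the samples, or NaN for an empty list. */
    static double mean(List<Double> samples) {
        return samples.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
    }
}
```

As in the original runner, the first sample will usually include warm-up cost, so it is worth looking at the individual values and not only the mean.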

Anatomy of Gremlin Queries

● Simplest form of OLTP query
  ○ Picks one or more entry points into the graph
  ○ Traverses from the initial vertices

Example:

How many games did Big Ten teams win?

g.V().has('conference', 'Big Ten Conference').outE('beat').count()

Initial graph entry selection: g.V().has('conference', 'Big Ten Conference')
Edge traversal: outE('beat').count()

Selecting the Entry Point

Typically won't have vertex ID(s) to select directly

Will select based on one or more vertex properties

Feasible to scan all vertices in small graphs

Becomes prohibitively expensive on large graphs

Start from these

Property Index

Test query: g.V().has('conference', 'Big Ten Conference').toList()

Output:

07:45:24 WARN com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx - Query requires iterating over all vertices [(conference = Big Ten Conference)]. For better performance, use indexes

Titan is nice enough to warn us of this issue
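The warning comes from a full vertex scan. As a rough mental model of what a property index changes, the sketch below compares a scan with a lookup in a value-to-vertices map. It is a conceptual analogy only: Titan stores its indexes in the storage backend, not in an in-memory HashMap, and the class and method names here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual analogy only; not how Titan implements its indexes.
final class PropertyIndexSketch {
    // A vertex reduced to the two properties used in the toy model.
    static final class Team {
        final String name;
        final String conference;
        Team(String name, String conference) {
            this.name = name;
            this.conference = conference;
        }
    }

    // Without an index: inspect every vertex, so cost grows with graph size.
    static List<String> scan(List<Team> all, String conference) {
        List<String> names = new ArrayList<>();
        for (Team t : all) {
            if (t.conference.equals(conference)) {
                names.add(t.name);
            }
        }
        return names;
    }

    // With an index: build the value -> vertices map once...
    static Map<String, List<String>> buildIndex(List<Team> all) {
        Map<String, List<String>> index = new HashMap<>();
        for (Team t : all) {
            index.computeIfAbsent(t.conference, k -> new ArrayList<>()).add(t.name);
        }
        return index;
    }

    // ...then each query is a single lookup, independent of the number of vertices.
    static List<String> lookup(Map<String, List<String>> index, String conference) {
        return index.getOrDefault(conference, new ArrayList<>());
    }
}
```

The index has to be maintained on every write, which is the usual trade for faster reads.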

Creating Index on "Conference" Property

// Access graph management, create composite index, and commit
mgmt = graph.openManagement()
conf = mgmt.getPropertyKey('conference')
mgmt.buildIndex('byConference', Vertex.class).addKey(conf).buildCompositeIndex()
mgmt.commit()

// Wait for the index to be available, then reindex
mgmt.awaitGraphIndexStatus(graph, 'byConference').call()
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byConference"), SchemaAction.REINDEX).get()
mgmt.commit()

Reference: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html

"Conference" Index Timing Comparison

Without Index (times in ms):

n: 10
min: 127.0
max: 203.0
mean: 159.9
std dev: 29.598986469134378
median: 151.0

With Index:


Represents a 49.96875x speedup

Property Indices in Titan

● Composite Index
  ○ Supports equality comparison only
  ○ Can handle combinations of properties, but they must be pre-defined (Ex: Name and Age)
● Mixed Index
  ○ Supports more than equality comparison (e.g. range conditions)
  ○ Can handle lookups on arbitrary combinations of indexed keys
● Titan also supports external indexing backends

● Reference documentation: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
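To illustrate why a composite index over a pre-defined key combination only helps when every key is constrained by equality, here is a small sketch that models such an index as a map keyed by the full (name, age) pair. This is an analogy, not Titan's implementation, and CompositeKeySketch and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Analogy for a composite index on the pre-defined combination (name, age).
final class CompositeKeySketch {
    // One entry per exact (name, age) pair, mapping to the matching vertex ids.
    private final Map<List<Object>, Set<Integer>> byNameAndAge = new HashMap<>();

    void add(int vertexId, String name, int age) {
        byNameAndAge
            .computeIfAbsent(Arrays.<Object>asList(name, age), k -> new HashSet<>())
            .add(vertexId);
    }

    // Answerable from the index: both keys are fixed, so this is one map lookup.
    Set<Integer> find(String name, int age) {
        return byNameAndAge.getOrDefault(Arrays.<Object>asList(name, age), new HashSet<>());
    }

    // Not answerable by lookup: only one key is known, so every entry is inspected.
    Set<Integer> findByNameOnly(String name) {
        Set<Integer> result = new HashSet<>();
        for (Map.Entry<List<Object>, Set<Integer>> e : byNameAndAge.entrySet()) {
            if (e.getKey().get(0).equals(name)) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }
}
```

A hash-style key also explains the equality-only restriction: there is no ordering to exploit for a range condition such as age greater than 30.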


High Order Vertices

Don't always want to traverse all edges incident on a vertex

Filtering based on some edge properties is desirable

Similar to vertices: feasible to inspect each edge for low order vertices

Prohibitive on high order vertices

Traverse these edges

Vertex Centric Index

Example query:

dateFilter = lt(20051000)
g.V().has('conference', 'Big Ten Conference').as('team', 'wins', 'losses').
    select('team', 'wins', 'losses').
    by('name').
    by(__.outE().has('date', dateFilter).count()).
    by(__.inE().has('date', dateFilter).count())

Gets Big 10 team records for games played before October 2005
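The idea behind a vertex-centric index can be sketched as keeping one vertex's incident edges sorted by the filtered property, so a date filter becomes a range read instead of a test against every edge. The sketch below uses a TreeMap as a stand-in and assumes dates stored as yyyymmdd integers, as in the query above; the class and method names are illustrative, not Titan APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Edges incident on ONE vertex, reduced to their 'date' property (yyyymmdd as int).
final class VertexCentricSketch {
    private final List<Integer> unsortedDates = new ArrayList<>();
    private final TreeMap<Integer, Integer> edgeCountByDate = new TreeMap<>();

    void addEdge(int date) {
        unsortedDates.add(date);
        edgeCountByDate.merge(date, 1, Integer::sum);
    }

    // No vertex-centric index: the filter is evaluated against every incident edge.
    int countBeforeByScan(int cutoff) {
        int count = 0;
        for (int date : unsortedDates) {
            if (date < cutoff) {
                count++;
            }
        }
        return count;
    }

    // Sorted by date: headMap(cutoff) only visits keys strictly less than cutoff.
    int countBeforeByIndex(int cutoff) {
        int count = 0;
        for (int edgesOnDate : edgeCountByDate.headMap(cutoff).values()) {
            count += edgesOnDate;
        }
        return count;
    }
}
```

With a dozen games per team the two approaches are indistinguishable; the difference only shows up on vertices with very many incident edges.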

Notes on Vertex Centric Indices

From Titan 1.0.0 documentation:

Reference documentation: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html#vertex-indexes

Titan 0.4.4 does not automatically create vertex-centric indices

No need to create one for our example

May be necessary if a composite key query is being performed

Ex: Get Big 10 team records for games played before October 2005 and won by more than 14 points


Query Structuring

Order of steps in query can make a difference

Consider:

g.V().order().by(__.outE().count(), decr).has('conference', 'Big Ten Conference').values('name')

vs

g.V().has('conference', 'Big Ten Conference').order().by(__.outE().count(), decr).values('name')

Mean times: 1032.8 ms vs 42.8 ms respectively
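The gap comes from how many elements reach the expensive step. A minimal sketch with Java streams, counting how many teams enter the sort in each ordering; the Team class and the win counts used with it are hypothetical stand-ins for the graph, and the sort key here is a plain field where the Gremlin query needs an edge count per vertex.

```java
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

final class StepOrderSketch {
    static final class Team {
        final String name;
        final String conference;
        final int wins;
        Team(String name, String conference, int wins) {
            this.name = name;
            this.conference = conference;
            this.wins = wins;
        }
    }

    // order() first: every team in the graph reaches the sort.
    static List<String> orderThenFilter(List<Team> all, String conference, AtomicInteger sorted) {
        return all.stream()
            .peek(t -> sorted.incrementAndGet())
            .sorted(Comparator.comparingInt((Team t) -> t.wins).reversed())
            .filter(t -> t.conference.equals(conference))
            .map(t -> t.name)
            .collect(Collectors.toList());
    }

    // has() first: only teams in the requested conference reach the sort.
    static List<String> filterThenOrder(List<Team> all, String conference, AtomicInteger sorted) {
        return all.stream()
            .filter(t -> t.conference.equals(conference))
            .peek(t -> sorted.incrementAndGet())
            .sorted(Comparator.comparingInt((Team t) -> t.wins).reversed())
            .map(t -> t.name)
            .collect(Collectors.toList());
    }
}
```

Both methods return the same names in the same order; only the amount of work before the filter differs.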

Titan Caching

Support for database and transaction level caching

[Figure: three clients, each with its own TX cache in Titan, sharing a single DB cache that sits in front of the storage backend]

Transaction Cache

Transaction starts on graph access and ends on commit or rollback

Useful for workloads accessing same data repeatedly

[Figure: teams A, B, C and D connected by "beat" edges]

Rank of team A is a count of these "beat" edges

Ex: Team Rankings

g.V().order().by(__.out().out().count(), decr).
    as('team', 'score', 'wins', 'losses').
    select('team', 'score', 'wins', 'losses').
    by('name').
    by(__.out().out().count()).
    by(__.outE().count()).
    by(__.inE().count()).
    limit(25)

With TX cache: 3361 ms, without TX cache: 5206 ms
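The ranking query visits the same vertices many times, since each team's out().out() walk passes through teams that other walks also touch. A rough sketch of why a transaction-level cache helps, counting simulated backend reads with and without a per-transaction map; TxCacheSketch and its members are hypothetical and only mimic the idea, not Titan's cache.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TxCacheSketch {
    private final Map<String, List<String>> backend;               // team -> teams it beat
    private final Map<String, List<String>> txCache = new HashMap<>();
    private final boolean useCache;
    int backendReads = 0;                                          // simulated storage calls

    TxCacheSketch(Map<String, List<String>> backend, boolean useCache) {
        this.backend = backend;
        this.useCache = useCache;
    }

    // Loads one team's outgoing "beat" edges, from the cache when allowed.
    private List<String> beatenBy(String team) {
        if (useCache && txCache.containsKey(team)) {
            return txCache.get(team);
        }
        backendReads++;
        List<String> beaten = backend.getOrDefault(team, Collections.<String>emptyList());
        if (useCache) {
            txCache.put(team, beaten);
        }
        return beaten;
    }

    // Same spirit as __.out().out().count(): number of two-hop "beat" paths.
    int score(String team) {
        int paths = 0;
        for (String beaten : beatenBy(team)) {
            paths += beatenBy(beaten).size();
        }
        return paths;
    }
}
```

Scoring every team in a small graph this way makes the repeated reads visible: the uncached run hits the backend once per visit, the cached run once per distinct team.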

/r/mildlyinteresting/

1. Texas
2. USC
3. Penn State
4. Ohio State
5. Virginia Tech
6. TCU
7. West Virginia
8. Louisiana State
9. Alabama
10. Oregon
11. Louisville
12. Georgia
13. UCLA
14. Miami (FL)

1. Texas
2. USC
3. Penn State
4. Virginia Tech
5. LSU
6. Ohio State
7. Georgia
8. TCU
9. West Virginia
10. Alabama
11. Boston College
12. Oklahoma
13. Florida
14. UCLA

http://www.collegefootballpoll.com/2005_archive_computer_rankings.html

2005 End of Season Computer Rankings

Our Query Results

Transaction Caching Gotchas

Cache Thrashing

Symptom: Queries suddenly & significantly slow down as data size increases

Solve this by tuning transaction cache size

● Globally by setting cache.tx-cache-size
● Per transaction using TransactionBuilder
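The sudden slowdown can be reproduced with a toy cache. The sketch below assumes a least-recently-used eviction policy, which may not match Titan's actual policy, and repeatedly cycles through a working set of keys: as long as the working set fits, only the first pass misses, but one key too many makes every access a miss. ThrashSketch is a made-up name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ThrashSketch {
    /** Cycles through workingSet distinct keys for the given number of passes; returns total cache misses. */
    static int misses(final int capacity, int workingSet, int passes) {
        // Access-ordered LinkedHashMap that evicts the least recently used entry when over capacity.
        Map<Integer, Boolean> lru = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                return size() > capacity;
            }
        };
        int misses = 0;
        for (int pass = 0; pass < passes; pass++) {
            for (int key = 0; key < workingSet; key++) {
                if (lru.get(key) == null) {      // get() also marks the entry as recently used
                    misses++;
                    lru.put(key, Boolean.TRUE);  // simulate loading the element from the backend
                }
            }
        }
        return misses;
    }
}
```

For example, with a capacity of 100 and 10 passes, a working set of 100 keys costs 100 misses while a working set of 101 keys costs 1010, which is why raising the cache size slightly above the working set can restore performance.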

Memory Leak

Transactions are started automatically and are thread aware

With read only access in separate threads, transaction caches can leak

Solved by calling g.rollback() at the end of the thread execution (releases the TX cache)

Transaction Cache Settings

Transaction cache can be set up in properties files

Settings can be overridden when creating transaction using TransactionBuilder:

Example:

tx=graph.buildTransaction().vertexCacheSize(50000).start()

Other transaction settings can be found here: http://s3.thinkaurelius.com/docs/titan/1.0.0/tx.html#tx-config


Database Level Caching

Database caching helps performance across transactions:

gremlin> stats2.getValues()
==>[1016.0, 41.0, 27.0, 26.0, 24.0, 23.0, 24.0, 21.0, 18.0, 18.0]

Trades consistency for speed in clusters

[Figure: Titan instances 1 to n in front of storage nodes 1 to n, with the sequence 1. Read, 2. Write, 3. Read; the timings are labeled cold cache and warm cache]
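The consistency trade can be sketched with two instances that each keep their own cache over a shared store: a read on one instance, a write on another, then a second read on the first returns the old value. This is a simplified model with no expiry; the class names are hypothetical, and the expiration setting mentioned in the usage note should be checked against the configuration reference.

```java
import java.util.HashMap;
import java.util.Map;

final class StaleReadSketch {
    // Shared storage backend (stands in for the Cassandra cluster).
    static final class Backend {
        final Map<String, String> rows = new HashMap<>();
    }

    // One Titan-like instance with its own database-level cache.
    static final class Instance {
        private final Backend backend;
        private final Map<String, String> dbCache = new HashMap<>();

        Instance(Backend backend) {
            this.backend = backend;
        }

        // Serves from the local cache when possible; otherwise loads and remembers the value.
        String read(String key) {
            return dbCache.computeIfAbsent(key, backend.rows::get);
        }

        // Updates the backend and this instance's cache; other instances' caches are untouched.
        void write(String key, String value) {
            backend.rows.put(key, value);
            dbCache.put(key, value);
        }
    }
}
```

In this model the stale value would live forever; Titan's configuration reference describes a cache expiration time (cache.db-cache-time) that bounds how long a cached entry is kept.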

Titan Options

query.batch - Whether traversal queries should be batched when executed against the storage backend. This can lead to significant performance improvement if there is a non-trivial latency to the backend.

query.fast-property - Whether to pre-fetch all properties on first singular vertex property access. This can eliminate backend calls on subsequent property access for the same vertex at the expense of retrieving all properties at once. This can be expensive for vertices with many properties.

http://s3.thinkaurelius.com/docs/titan/1.0.0/titan-config-ref.html
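To see why batching matters when backend latency is non-trivial, the sketch below counts round trips for fetching the same rows one at a time versus in a single multi-get. It only illustrates the general effect described for query.batch; BatchSketch, get and multiGet are hypothetical names and say nothing about how Titan issues its backend calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulated storage backend that counts how many calls it receives.
final class BatchSketch {
    private final Map<Integer, String> rows = new HashMap<>();
    int roundTrips = 0;

    BatchSketch(int size) {
        for (int i = 0; i < size; i++) {
            rows.put(i, "v" + i);
        }
    }

    // One round trip per row.
    String get(int id) {
        roundTrips++;
        return rows.get(id);
    }

    // One round trip for the whole list of ids.
    List<String> multiGet(List<Integer> ids) {
        roundTrips++;
        List<String> values = new ArrayList<>();
        for (int id : ids) {
            values.add(rows.get(id));
        }
        return values;
    }

    // Latency spent waiting on the network, assuming a fixed cost per round trip.
    static long latencyMs(int roundTrips, long perTripMs) {
        return roundTrips * perTripMs;
    }
}
```

With an assumed 2 ms per round trip, fetching 10 rows individually spends 20 ms on latency alone, while the batched call spends 2 ms.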

Using "query.fast-property"

Test query: g.V().group().by('conference').by('name')

query.fast-property = false:


query.fast-property = true:


Summary

● Titan indices
  ○ Property indices - vertex/edge lookups
  ○ Vertex centric indices - edge traversals
● Limiting elements early in a traversal is generally a good thing
● Caching
  ○ Database level - improves speed while potentially increasing the likelihood of stale data
  ○ Transaction level - helps when repeatedly visiting elements within a transaction
● Various other options available for specific tuning needs

Thanks For Watching

Questions

Nakul Jeirath
@njeirath
Senior Security Engineer - WellAware
