Multi-criteria Queries on a Cassandra Application
Jérôme Mainaud

Ippon Technologies: Multi-criteria queries on a Cassandra application

Apr 11, 2017

Page 1: Ippon Technologies: Multi-criteria queries on a Cassandra application

Multi-criteria Queries on a Cassandra Application

Jérôme Mainaud

Page 2: Ippon Technologies: Multi-criteria queries on a Cassandra application

Ippon Technologies © 2015 #CassandraSummit

Who am I

Jérôme Mainaud

➔ @jxerome

➔ Software Architect at Ippon Technologies, Paris

➔ DataStax Solution Architect Certified

Page 3: Ippon Technologies: Multi-criteria queries on a Cassandra application

Ippon Technologies

● 200 software engineers in France and the US

➔ Paris, Nantes, Bordeaux

➔ Richmond (Virginia), Washington (DC)

● Expertise

➔ Digital, Big Data and Cloud

➔ Java & Agile

● Open-source projects:

➔ JHipster

➔ Tatami …

● @ipponusa

Page 4: Ippon Technologies: Multi-criteria queries on a Cassandra application

Agenda

1. Context

2. Technical Stack

3. Data Modeling

4. Implementation

5. Results

Page 5: Ippon Technologies: Multi-criteria queries on a Cassandra application

Warning

The following slideshow features data patterns and code performed by professionals.

Accordingly, Ippon and conference organisers must insist that no one attempt to recreate any data pattern and code performed in this slideshow.

Page 6: Ippon Technologies: Multi-criteria queries on a Cassandra application

Once Upon a time an app …

Page 7: Ippon Technologies: Multi-criteria queries on a Cassandra application

Once Upon a time an app …

SaaS invoicing application

➔ A single database for all users

➔ Data isolation for each user

High volume data

➔ 1 year

➔ 500 million invoices

➔ 2 billion invoice lines

Page 10: Ippon Technologies: Multi-criteria queries on a Cassandra application

Back-end evolution

Page 11: Ippon Technologies: Multi-criteria queries on a Cassandra application

Technical Stack

Page 12: Ippon Technologies: Multi-criteria queries on a Cassandra application

Technical Stack

JHipster

➔ Spring Boot + AngularJS Application Generator

➔ Supports JPA, MongoDB

➔ and now Cassandra!

Let us generate a first version very quickly

➔ Application skeleton ready in 5 minutes

➔ Adds entity tables, objects and mappings

➔ Configuration, build, logs management, etc.

➔ Gatling Tests ready to use

http://jhipster.github.io

Page 13: Ippon Technologies: Multi-criteria queries on a Cassandra application

Technical Stack

Spring Boot

➔ Built on Spring

➔ Convention over configuration

➔ Many “starters” ready to use

Web Services

➔ CXF instead of Spring MVC REST

Cassandra

➔ DataStax Enterprise

Java 8

Page 15: Ippon Technologies: Multi-criteria queries on a Cassandra application

JHipster — Code generator

● But

➔ Cassandra was not yet supported

➔ No AngularJS and no front end

➔ CXF instead of Spring MVC

● JHipster alpha generator

➔ Secret generator used to validate concepts before writing the Yeoman generator

Page 16: Ippon Technologies: Multi-criteria queries on a Cassandra application

JHipster — Code generator

Julien Dubois, Code Generator

Page 17: Ippon Technologies: Multi-criteria queries on a Cassandra application

Cassandra Driver Configuration

Spring Boot Configuration

➔ No integration of the DataStax Java Driver in Spring Boot

➔ Created a Spring Boot autoconfiguration for the DataStax Java Driver

➔ Uses the standard YAML file

Offered to Spring Boot 1.3

➔ GitHub ticket #2064 « Add a spring-boot-starter-data-cassandra »

➔ Still open

Improved by the Community

➔ JHipster version was improved by pull requests

➔ Authentication, Load-Balancer config

Page 18: Ippon Technologies: Multi-criteria queries on a Cassandra application

Data Model

Page 19: Ippon Technologies: Multi-criteria queries on a Cassandra application

Conceptual Model

Page 20: Ippon Technologies: Multi-criteria queries on a Cassandra application

Physical Model

Page 21: Ippon Technologies: Multi-criteria queries on a Cassandra application

    create table invoice (
        invoice_id       timeuuid,
        user_id          uuid static,
        firstname        text static,
        lastname         text static,
        invoice_date     timestamp static,
        payment_date     timestamp static,
        total_amount     decimal static,
        delivery_address text static,
        delivery_city    text static,
        delivery_zipcode text static,
        item_id          timeuuid,
        item_label       text,
        item_price       decimal,
        item_qty         int,
        item_total       decimal,
        primary key (invoice_id, item_id)
    );

Table

Page 22: Ippon Technologies: Multi-criteria queries on a Cassandra application

Multi-criteria Search

Page 23: Ippon Technologies: Multi-criteria queries on a Cassandra application

Multi-criteria Search

Mandatory Criteria

➔ User (implicit)

➔ Invoice date (range of dates)

Additional Criteria

➔ Client lastname

➔ Client firstname

➔ City

➔ Zipcode

Paginated Result

Page 25: Ippon Technologies: Multi-criteria queries on a Cassandra application

Shall we use Solr?

● Integrated in DataStax Enterprise

● Atomic and Automatic Index update

● Full-Text Search

Page 26: Ippon Technologies: Multi-criteria queries on a Cassandra application

Shall we use Solr?

● We search on static columns

➔ Solr doesn’t support them

● We search partitions

➔ Solr searches rows

Page 28: Ippon Technologies: Multi-criteria queries on a Cassandra application

Shall we use secondary indexes?

● Only one index used for a query

● Hard to get good performance with them

Page 29: Ippon Technologies: Multi-criteria queries on a Cassandra application

Index Table

Use index tables

➔ Partition key: mandatory criteria and one additional criterion

○ user_id

○ invoice day (truncated invoice date)

○ additional criterion

➔ Clustering column: invoice UUID
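
The key layout above can be sketched in Java as a small value object. This is only an illustration: the class name `IndexPartitionKey` is hypothetical, and truncating the invoice date in UTC is an assumption (the slides do not say which time zone defines a "day"). It shows that two invoices of the same user, on the same day, with the same criterion value land in the same index partition.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Objects;
import java.util.UUID;

/** Hypothetical value object mirroring the partition key of an index table. */
final class IndexPartitionKey {
    final UUID userId;
    final LocalDate invoiceDay;
    final String criterion;

    IndexPartitionKey(UUID userId, Instant invoiceDate, String criterion) {
        this.userId = Objects.requireNonNull(userId);
        // Truncate the full invoice timestamp to its calendar day (UTC is an assumption).
        this.invoiceDay = invoiceDate.atZone(ZoneOffset.UTC).toLocalDate();
        this.criterion = Objects.requireNonNull(criterion);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof IndexPartitionKey)) {
            return false;
        }
        IndexPartitionKey other = (IndexPartitionKey) o;
        return userId.equals(other.userId)
                && invoiceDay.equals(other.invoiceDay)
                && criterion.equals(other.criterion);
    }

    @Override
    public int hashCode() {
        return Objects.hash(userId, invoiceDay, criterion);
    }
}
```

Because the time of day is dropped, a search for one user, one day and one criterion value reads exactly one partition.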

Page 30: Ippon Technologies: Multi-criteria queries on a Cassandra application

Index Table

Page 31: Ippon Technologies: Multi-criteria queries on a Cassandra application

Materialized view

    CREATE MATERIALIZED VIEW invoice_by_firstname AS
        SELECT invoice_id
        FROM invoice
        WHERE firstname IS NOT NULL
        PRIMARY KEY ((user_id, invoice_day, firstname), invoice_id)
        WITH CLUSTERING ORDER BY (invoice_id DESC);

New in Cassandra 3.0

Page 32: Ippon Technologies: Multi-criteria queries on a Cassandra application

Parallel Search on indexes

In-memory merge by the application

Page 33: Ippon Technologies: Multi-criteria queries on a Cassandra application

Parallel item detail queries

Result Page (id)

Page 34: Ippon Technologies: Multi-criteria queries on a Cassandra application

Search

Search on date range

➔ loop over every day in the range and stop when there are enough results for a page

Page 35: Ippon Technologies: Multi-criteria queries on a Cassandra application

Search Complexity

Query count

➔ For each day in the date range

○ 1 query per additional criterion filled (one partition per query)

➔ 1 query per item in the result page (one partition per query)

Search Complexity

➔ one partition per query

Example: 3 criteria, 7 days, 100 items per page

➔ query count ≤ 3 × 7 + 100 = 121
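
The bound above can be written as a small helper. This is a sketch, not code from the application: the class and method names are hypothetical, and the `Math.max(1, …)` term assumes that a search without any additional criterion still issues one query per day against the by-day table, as the search service shown later does.

```java
final class QueryCountSketch {

    /** Upper bound on the number of queries needed to build one result page. */
    static int maxQueryCount(int filledCriteria, int days, int pageSize) {
        // With no additional criterion, the by-day table is still queried once per day.
        int indexQueriesPerDay = Math.max(1, filledCriteria);
        // One index query per criterion and per day, plus one detail query per item on the page.
        return indexQueriesPerDay * days + pageSize;
    }

    public static void main(String[] args) {
        System.out.println(maxQueryCount(3, 7, 100)); // prints 121
    }
}
```

The bound is reached only when every day in the range has to be scanned; the loop stops earlier as soon as the page is full.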

Page 36: Ippon Technologies: Multi-criteria queries on a Cassandra application

Java: Indexes

Page 37: Ippon Technologies: Multi-criteria queries on a Cassandra application

Index — Instances

    @Repository
    public class InvoiceByLastNameRepository extends IndexRepository<String> {

        public InvoiceByLastNameRepository() {
            super("invoice_by_lastname", "lastname", Invoice::getLastName, Criteria::getLastName);
        }
    }

    @Repository
    public class InvoiceByFirstNameRepository extends IndexRepository<String> {

        public InvoiceByFirstNameRepository() {
            super("invoice_by_firstname", "firstname", Invoice::getFirstName, Criteria::getFirstName);
        }
    }

Page 38: Ippon Technologies: Multi-criteria queries on a Cassandra application

Index — Parent Class

    public class IndexRepository<T> {

        @Inject
        private Session session;

        // Name of the index table and of its indexed column
        private final String tableName;
        private final String valueName;

        // Extract the indexed value from an invoice or from the search criteria
        private final Function<Invoice, T> valueGetter;
        private final Function<Criteria, T> criteriumGetter;

        private PreparedStatement insertStmt;
        private PreparedStatement findStmt;
        private PreparedStatement findWithOffsetStmt;

        @PostConstruct
        public void init() { /* initialize PreparedStatements */ }

        // insert() and find() are shown on the following slides

Page 39: Ippon Technologies: Multi-criteria queries on a Cassandra application

Index — Insert

    @Override
    public void insert(Invoice invoice) {
        T value = valueGetter.apply(invoice);
        // Invoices without a value for this criterion are not indexed
        if (value != null) {
            session.execute(
                insertStmt.bind(
                    invoice.getUserId(),
                    Dates.toDate(invoice.getInvoiceDay()),
                    value,
                    invoice.getId()));
        }
    }

Page 40: Ippon Technologies: Multi-criteria queries on a Cassandra application

Index — Insert — Prepare Statement

    insertStmt = session.prepare(
        QueryBuilder.insertInto(tableName)
            .value("user_id", bindMarker())
            .value("invoice_day", bindMarker())
            .value(valueName, bindMarker())
            .value("invoice_id", bindMarker())
    );

Page 41: Ippon Technologies: Multi-criteria queries on a Cassandra application

Index — Insert — Date conversion

    public static Date toDate(LocalDate date) {
        // Midnight at the start of the day, in the JVM's default time zone
        return date == null ? null :
            Date.from(date.atStartOfDay(ZoneId.systemDefault()).toInstant());
    }
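
Because the conversion depends on the JVM's default time zone, the same `LocalDate` can map to different instants on differently configured servers. The sketch below is a variation on the slide's helper, not the author's code: it takes the zone as an explicit parameter (the class name `DatesSketch` and the extra parameter are assumptions made for illustration).

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Date;

final class DatesSketch {

    /** Converts a day to the instant of its midnight in the given zone; null stays null. */
    static Date toDate(LocalDate date, ZoneId zone) {
        return date == null ? null : Date.from(date.atStartOfDay(zone).toInstant());
    }
}
```

For example, 22 September 2015 converts to `2015-09-22T00:00:00Z` in UTC but to `2015-09-21T22:00:00Z` in `Europe/Paris`, so writers and readers of the index tables need to agree on one zone.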

Page 42: Ippon Technologies: Multi-criteria queries on a Cassandra application

Index — Search

    @Override
    public CompletableFuture<Iterator<UUID>> find(Criteria criteria, LocalDate day, UUID offset) {
        T criterium = criteriumGetter.apply(criteria);
        // A null result means "this criterion is not filled"; the caller filters it out
        if (criterium == null) {
            return CompletableFuture.completedFuture(null);
        }
        BoundStatement stmt;
        if (offset == null) {
            stmt = findStmt.bind(criteria.getUserId(), Dates.toDate(day), criterium);
        } else {
            stmt = findWithOffsetStmt.bind(criteria.getUserId(), Dates.toDate(day), criterium, offset);
        }
        return Jdk8.completableFuture(session.executeAsync(stmt))
            .thenApply(rs -> Iterators.transform(rs.iterator(), row -> row.getUUID(0)));
    }

Page 43: Ippon Technologies: Multi-criteria queries on a Cassandra application

Index — Search — Prepare Statement

    findWithOffsetStmt = session.prepare(
        QueryBuilder.select()
            .column("invoice_id")
            .from(tableName)
            .where(eq("user_id", bindMarker()))
            .and(eq("invoice_day", bindMarker()))
            .and(eq(valueName, bindMarker()))
            .and(lte("invoice_id", bindMarker()))
    );

Page 44: Ippon Technologies: Multi-criteria queries on a Cassandra application

Index — Search (Guava to Java 8)

    public static <T> CompletableFuture<T> completableFuture(ListenableFuture<T> guavaFuture) {
        CompletableFuture<T> future = new CompletableFuture<>();
        // Forward the Guava future's outcome to the Java 8 future
        Futures.addCallback(guavaFuture, new FutureCallback<T>() {
            @Override
            public void onSuccess(T result) {
                future.complete(result);
            }

            @Override
            public void onFailure(Throwable t) {
                future.completeExceptionally(t);
            }
        });
        return future;
    }

Page 45: Ippon Technologies: Multi-criteria queries on a Cassandra application

Java: Search Service

Page 46: Ippon Technologies: Multi-criteria queries on a Cassandra application

Service — Class

    @Service
    public class InvoiceSearchService {

        @Inject
        private InvoiceRepository invoiceRepository;

        @Inject
        private InvoiceByDayRepository byDayRepository;

        @Inject
        private InvoiceByLastNameRepository byLastNameRepository;

        @Inject
        private InvoiceByFirstNameRepository byFirstNameRepository;

        @Inject
        private InvoiceByCityRepository byCityRepository;

        @Inject
        private InvoiceByZipCodeRepository byZipCodeRepository;

Page 47: Ippon Technologies: Multi-criteria queries on a Cassandra application

Service — Search

    public ResultPage findByCriteria(Criteria criteria) {
        return byDateInteval(criteria, (crit, day, offset) -> {
            CompletableFuture<Iterator<UUID>> futureUuidIt;
            if (crit.hasIndexedCriteria()) {
                /*
                 * ... Doing multi-criteria search; see next slide ...
                 */
            } else {
                futureUuidIt = byDayRepository.find(crit.getUserId(), day, offset);
            }
            return futureUuidIt;
        });
    }

Page 48: Ippon Technologies: Multi-criteria queries on a Cassandra application

Service — Search

    // Query every index table in parallel
    CompletableFuture<Iterator<UUID>>[] futures = Stream.<IndexRepository>of(
            byLastNameRepository, byFirstNameRepository, byCityRepository, byZipCodeRepository)
        .map(repo -> repo.find(crit, day, offset))
        .toArray(CompletableFuture[]::new);

    // Once all queries are done, intersect the sorted results of the filled criteria
    futureUuidIt = CompletableFuture.allOf(futures).thenApply(v ->
        Iterators.intersection(TimeUUIDComparator.desc,
            Stream.of(futures)
                .map(CompletableFuture::join)
                .filter(Objects::nonNull)
                .collect(Collectors.toList())));
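
The slides do not show the `Iterators.intersection` helper used above. The sketch below is one possible way to intersect several sorted iterators, under stated assumptions: every input is sorted by the same comparator and contains no duplicates, and the result is built eagerly as a `List` rather than lazily as an iterator. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

final class SortedIntersection {

    /** Returns the elements present in every input; inputs must be sorted by cmp and duplicate-free. */
    static <T> List<T> intersect(Comparator<? super T> cmp, List<Iterator<T>> iterators) {
        List<T> result = new ArrayList<>();
        if (iterators.isEmpty()) {
            return result;
        }
        // Current head element of each iterator.
        List<T> heads = new ArrayList<>(iterators.size());
        for (Iterator<T> it : iterators) {
            if (!it.hasNext()) {
                return result; // an empty input makes the intersection empty
            }
            heads.add(it.next());
        }
        while (true) {
            // The last head in cmp order is the earliest element all inputs may still share.
            T candidate = Collections.max(heads, cmp);
            boolean allEqual = true;
            for (int i = 0; i < heads.size(); i++) {
                // Elements sorting before the candidate cannot be in the intersection.
                while (cmp.compare(heads.get(i), candidate) < 0) {
                    if (!iterators.get(i).hasNext()) {
                        return result;
                    }
                    heads.set(i, iterators.get(i).next());
                }
                if (cmp.compare(heads.get(i), candidate) != 0) {
                    allEqual = false;
                }
            }
            if (allEqual) {
                result.add(candidate);
                // Move every input past the matched element.
                for (int i = 0; i < heads.size(); i++) {
                    if (!iterators.get(i).hasNext()) {
                        return result;
                    }
                    heads.set(i, iterators.get(i).next());
                }
            }
        }
    }
}
```

With a single input the method simply returns that input's elements, which matches the case where only one additional criterion is filled. Each input is read once, so the cost is linear in the total number of ids returned by the index queries.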

Page 49: Ippon Technologies: Multi-criteria queries on a Cassandra application

Service — UUIDs Comparator

    /**
     * TimeUUID Comparator equivalent to Cassandra’s Comparator:
     * @see org.apache.cassandra.db.marshal.TimeUUIDType#compare()
     */
    public enum TimeUUIDComparator implements Comparator<UUID> {
        desc {
            @Override
            public int compare(UUID o1, UUID o2) {
                // Most recent timestamp first
                long delta = o2.timestamp() - o1.timestamp();
                if (delta != 0)
                    return Ints.saturatedCast(delta);
                // Same timestamp: fall back to reversed natural UUID order
                return o2.compareTo(o1);
            }
        };
    }
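
A dedicated comparator is needed because the natural order of `java.util.UUID` compares the raw bits, and a version-1 UUID stores the low 32 bits of its timestamp first. The sketch below illustrates this with hand-built UUIDs. The `timeUuid` helper is purely illustrative (it is not a driver API), and `NEWEST_FIRST` is a variant written with `Long.compare` rather than the slide's `Ints.saturatedCast`.

```java
import java.util.Comparator;
import java.util.UUID;

final class TimeUuidOrderSketch {

    /** Builds a version-1 UUID from a 60-bit timestamp (illustrative helper only). */
    static UUID timeUuid(long timestamp, long leastSigBits) {
        long timeLow = timestamp & 0xFFFFFFFFL;
        long timeMid = (timestamp >>> 32) & 0xFFFFL;
        long timeHi = (timestamp >>> 48) & 0x0FFFL;
        // Layout of the most significant bits: time_low | time_mid | version (1) | time_hi
        long mostSigBits = (timeLow << 32) | (timeMid << 16) | (1L << 12) | timeHi;
        return new UUID(mostSigBits, leastSigBits);
    }

    /** Newest first: timestamp descending, then reversed natural order as a tie-breaker. */
    static final Comparator<UUID> NEWEST_FIRST = (a, b) -> {
        int byTime = Long.compare(b.timestamp(), a.timestamp());
        return byTime != 0 ? byTime : b.compareTo(a);
    };
}
```

Take `older = timeUuid(0x7FFFFFFFL, …)` and `newer = timeUuid(0x100000000L, …)`: `NEWEST_FIRST` places `newer` first, whereas `newer.compareTo(older)` is negative, so the natural order would treat the newer UUID as the smaller one.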

Page 50: Ippon Technologies: Multi-criteria queries on a Cassandra application

Service — Days Loop

    @FunctionalInterface
    private static interface DayQuery {
        CompletableFuture<Iterator<UUID>> find(Criteria criteria, LocalDate day, UUID invoiceIdOffset);
    }

    private ResultPage byDateInteval(Criteria criteria, DayQuery dayQuery) {
        int limit = criteria.getLimit();
        List<Invoice> resultList = new ArrayList<>(limit);
        LocalDate dayOffset = criteria.getDayOffset();
        UUID invoiceIdOffset = criteria.getInvoiceIdOffset();

        /* ... Loop on days ; to be seen in next slide ... */

        return new ResultPage(resultList);
    }

Page 51: Ippon Technologies: Multi-criteria queries on a Cassandra application

Service — Days Loop

    // Walk the date range backwards, most recent day first
    LocalDate day = criteria.getLastDay();
    do {
        Iterator<UUID> uuidIt = dayQuery.find(criteria, day, invoiceIdOffset).join();
        limit -= loadInvoices(resultList, uuidIt, limit);
        // Ids left over: the page is full, the next id becomes the next page's offset
        if (uuidIt.hasNext()) {
            return new ResultPage(resultList, day, uuidIt.next());
        }
        day = day.minusDays(1);
        invoiceIdOffset = null;
    } while (!day.isBefore(criteria.getFirstDay()));
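
To see how the (day, invoice id) cursor drives pagination, here is a simplified in-memory simulation of the day loop. It is a sketch under assumptions, not the application's code: plain integers stand in for invoice TimeUUIDs, a `Map` stands in for the index tables, the names `DayLoopSketch`, `Page` and `search` are hypothetical, and the loop is assumed to restart from the cursor's day when one is given.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class DayLoopSketch {

    /** A page of ids plus the cursor where the next page starts (cursor is null on the last page). */
    static final class Page {
        final List<Integer> ids;
        final LocalDate nextDay;
        final Integer nextId;

        Page(List<Integer> ids, LocalDate nextDay, Integer nextId) {
            this.ids = ids;
            this.nextDay = nextDay;
            this.nextId = nextId;
        }
    }

    /** idsByDay maps each day to its ids sorted in descending order, like the index tables. */
    static Page search(Map<LocalDate, List<Integer>> idsByDay, LocalDate firstDay, LocalDate lastDay,
                       LocalDate dayOffset, Integer idOffset, int limit) {
        List<Integer> result = new ArrayList<>(limit);
        LocalDate day = dayOffset != null ? dayOffset : lastDay;
        Integer offset = idOffset;
        while (!day.isBefore(firstDay)) {
            for (Integer id : idsByDay.getOrDefault(day, Collections.<Integer>emptyList())) {
                if (offset != null && id > offset) {
                    continue; // already returned on a previous page
                }
                if (result.size() == limit) {
                    return new Page(result, day, id); // page full: this id starts the next page
                }
                result.add(id);
            }
            day = day.minusDays(1);
            offset = null; // the id offset only applies to the cursor's own day
        }
        return new Page(result, null, null);
    }
}
```

A caller passes `nextDay` and `nextId` from one page as `dayOffset` and `idOffset` of the next call, which mirrors the inclusive `lte("invoice_id", …)` condition of the offset query.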

Page 52: Ippon Technologies: Multi-criteria queries on a Cassandra application

Service — Invoices Loading

    private int loadInvoices(List<Invoice> resultList, Iterator<UUID> uuidIt, int limit) {
        // Start all detail queries first so they run in parallel
        List<CompletableFuture<Invoice>> futureList = new ArrayList<>(limit);
        for (int i = 0; i < limit && uuidIt.hasNext(); ++i) {
            futureList.add(invoiceRepository.findOne(uuidIt.next()));
        }
        // Then wait for each result, keeping the id order
        futureList.stream()
            .map(CompletableFuture::join)
            .forEach(resultList::add);
        return futureList.size();
    }

Page 53: Ippon Technologies: Multi-criteria queries on a Cassandra application

Results

Page 54: Ippon Technologies: Multi-criteria queries on a Cassandra application

Limits

● Exact-match search only

➔ No full-text search

➔ No “starts with” search

➔ No pattern-based search

● Requires highly discriminating mandatory criteria

➔ user_id & invoice_day

● Pagination doesn’t give the total item count

➔ Could be done at the cost of additional queries

● No sorting available

Page 55: Ippon Technologies: Multi-criteria queries on a Cassandra application

Hardware

● Hosted by Ippon Hosting

● 8 nodes

➔ 16 GB RAM

➔ Two 256 GB SSD drives in RAID 0

● 6 nodes dedicated to Cassandra cluster

● 2 nodes dedicated to the application

Page 56: Ippon Technologies: Multi-criteria queries on a Cassandra application

Application

● 5,000 concurrent users

● 9 months of data loaded

➔ Legacy system: store 1 year; search on last 3 months.

➔ Target: 3 years of history

● Real-time search results

➔ Data are immediately available

➔ Legacy system: data available next day

● Cost Killer

Page 57: Ippon Technologies: Multi-criteria queries on a Cassandra application

Q & A

Page 58: Ippon Technologies: Multi-criteria queries on a Cassandra application

PARIS, BORDEAUX, NANTES, WASHINGTON, NEW YORK, RICHMOND

[email protected] - www.ippon-hosting.com - www.ippon-digital.fr

@ippontech

01 46 12 48 48