Infinispan, Lucene, Hibernate OGM

Padova, InfoCamere
JBoss User Group

12 April 2012

Who am I?

• Team Hibernate

– Hibernate Search

– Hibernate OGM

• Infinispan

– Infinispan Core

– Infinispan Query

– JGroups

• Apache Lucene

Sanne Grinovero
Italian, Dutch, Newcastle
Red Hat: JBoss, Engineering

Infinispan

• Distributed cache

• Scalable, transactional data grid: extreme performance and cloud

• NoSQL “database”: key-value store

– How do you query a data grid?

SELECT * FROM GRID

Querying a “Grid”

Object v = cache.get("c7");

Without the key, you cannot get the value.

Is key-only access practical?

A test on my bookshelf

• Where is Hibernate Search in Action?

• Can you pass me ISBN 978-1-933988-17-7?

• Can you fetch the books on Gaudí?

How do you implement these functions on a key/value store?

• Where is Hibernate Search in Action?

• Can you pass me ISBN 978-1-933988-17-7?

• Can you find the books on Gaudí?
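To make the three questions concrete, here is a minimal sketch that uses a plain ConcurrentHashMap as a stand-in for the grid. The Book fields, the keys and the method names are assumptions made for illustration; none of this is Infinispan API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class KeyValueLibrary {

    // Hypothetical value type for this sketch only.
    public static final class Book {
        public final String title;
        public final String description;

        public Book(String title, String description) {
            this.title = title;
            this.description = description;
        }
    }

    // Key known (the ISBN): one direct lookup.
    public static Book byIsbn(Map<String, Book> store, String isbn) {
        return store.get(isbn);
    }

    // Key unknown: every entry has to be visited.
    public static List<Book> byTitle(Map<String, Book> store, String title) {
        List<Book> hits = new ArrayList<>();
        for (Book b : store.values()) {
            if (b.title.equals(title)) {
                hits.add(b);
            }
        }
        return hits;
    }

    // "Books on X": a naive substring match over all values, with no ranking.
    public static List<Book> about(Map<String, Book> store, String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        List<Book> hits = new ArrayList<>();
        for (Book b : store.values()) {
            if (b.description.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(b);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Book> store = new ConcurrentHashMap<>();
        // "isbn-1" and "isbn-2" are made-up keys.
        store.put("isbn-1", new Book("Hibernate Search in Action", "Full-text search for Java"));
        store.put("isbn-2", new Book("Barcelona", "The architecture of Gaud\u00ed"));

        System.out.println(byIsbn(store, "isbn-1").title);
        System.out.println(byTitle(store, "Barcelona").size());
        System.out.println(about(store, "gaud\u00ed").size());
    }
}
```

Only the first lookup uses the key. The other two scan every value, which is the problem the rest of the talk addresses.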

Document-based NoSQL: Map/Reduce

Infinispan is not strictly document-based, but it does offer Map/Reduce.

Yet nothing rules out using JSON, XML, YAML, Java:

public class Book implements Serializable {

    final String title;
    final String author;
    final String editor;

    public Book(String title, String author, String editor) {
        this.title = title;
        this.author = author;
        this.editor = editor;
    }
}

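Iterate & collect

import java.util.Iterator;
import org.infinispan.distexec.mapreduce.Collector;
import org.infinispan.distexec.mapreduce.Mapper;
import org.infinispan.distexec.mapreduce.Reducer;

// Emits every book whose title matches exactly.
class TitleBookSearcher implements Mapper<String, Book, String, Book> {
    final String title;

    public TitleBookSearcher(String t) {
        title = t;
    }

    public void map(String key, Book value, Collector<String, Book> collector) {
        if ( title.equals( value.title ) )
            collector.emit( key, value );
    }
}

// Keeps the first book emitted for each key.
class BookReducer implements Reducer<String, Book> {
    public Book reduce(String reducedKey, Iterator<Book> iter) {
        return iter.next();
    }
}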

Implementing these simple functions:

✔ Find “Hibernate Search in Action”?

✔ Find by code “ISBN 978-1-933988-17-7”?

✗ How many books are about “Shakespeare”?

• A correct score in full-text searches requires the frequencies of text fragments relative to the whole corpus.

• Pre-tagging is impractical and limiting
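Why corpus-wide frequencies matter can be shown with the textbook tf-idf weight. The sketch below is not Lucene's actual scoring formula; the class and method names are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class CorpusScore {

    // How often the term occurs in one (already tokenized) document.
    public static int termFrequency(List<String> doc, String term) {
        int n = 0;
        for (String t : doc) {
            if (t.equals(term)) {
                n++;
            }
        }
        return n;
    }

    // In how many documents of the corpus the term occurs at least once.
    public static int documentFrequency(List<List<String>> corpus, String term) {
        int n = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) {
                n++;
            }
        }
        return n;
    }

    // Textbook weight: tf * ln(N / df). A term found in every document scores 0.
    public static double tfIdf(List<List<String>> corpus, List<String> doc, String term) {
        int df = documentFrequency(corpus, term);
        if (df == 0) {
            return 0.0;
        }
        double idf = Math.log((double) corpus.size() / df);
        return termFrequency(doc, term) * idf;
    }

    public static void main(String[] args) {
        List<String> d1 = Arrays.asList("the", "plays", "of", "shakespeare");
        List<String> d2 = Arrays.asList("the", "works", "of", "gaudi");
        List<String> d3 = Arrays.asList("the", "history", "of", "the", "theatre");
        List<List<String>> corpus = Arrays.asList(d1, d2, d3);

        System.out.println(tfIdf(corpus, d1, "the"));
        System.out.println(tfIdf(corpus, d1, "shakespeare"));
    }
}
```

The weight of a term in one document depends on how many other documents contain it, so it cannot be computed by a mapper that only sees one entry at a time.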

Apache Lucene

• Apache™ open source project

• Integrated in countless projects

• ... including Hibernate, via Hibernate Search

• Can be clustered via Infinispan

– Performance

– Real time

– High availability

What does Lucene offer?

• Searches ranked by Similarity score

• Text analysis

– Synonyms, Stopwords, Stemming, ...

• Reusable declarative Filters

• TermVectors

• MoreLikeThis

• Faceted Search

• Fast!

Lucene: Stopwords

a, able, about, across, after, all, almost, also, am, among, an, and, any, are, as, at, be, because, been, but, by, can, cannot, could, dear, did, do, does, either, else, ever, every, for, from, get, got, had, has, have, he, her, hers, him, his, how, however, i, if, in, into, is, it, its, just, least, let, like, likely, may, me, might, most, must, my, neither, no, nor, not, of, off, often, on, only, or, other, our, own, rather, said, say, says, she, should, since, so, some, than, that, the, their, them, then, there, these, they, this, tis, to, too, twas, us, wants, was, we, were, what, when, where, which, while, who, whom, why, will, with, would, yet, you, your
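A rough idea of what stopword removal does during text analysis, sketched in plain Java rather than with Lucene's Analyzer classes. The stopword set is a small subset of the list above, and the class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordFilter {

    // A small subset of the stopword list shown on the slide.
    static final Set<String> STOPWORDS = new HashSet<>(
            Arrays.asList("a", "an", "and", "in", "of", "on", "the", "to"));

    // Lowercases, splits on anything that is not a letter or digit, drops stopwords.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Hibernate Search in Action"));
    }
}
```

The same analysis has to be applied to the indexed text and to the query text, otherwise the terms will not match.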

Filters

Faceted Search

Shall we build a nice search engine that returns its results in alphabetical order?

Who uses Lucene?

What's the catch?

• It needs an index: physical resources and administration.

– in memory

– on filesystem

– in Infinispan

• Essentially immutable segments

– Optimized for data mining / queries, not for updates.

• A world of strings and frequency vectors
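The cost of updates on immutable segments can be illustrated with a toy index in which every write appends a new segment and the newest version of a document wins at query time. This is a deliberately simplified sketch, not how Lucene implements deletes or merges; all names are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SegmentedIndex {

    // Written once, never modified: term -> ids of the documents containing it.
    public static final class Segment {
        final Map<String, Set<String>> postings;
        final Set<String> docIds;

        Segment(Map<String, Set<String>> postings, Set<String> docIds) {
            this.postings = Collections.unmodifiableMap(postings);
            this.docIds = Collections.unmodifiableSet(docIds);
        }
    }

    private final List<Segment> segments = new ArrayList<>();

    // An update never touches old segments: it appends a new one.
    public void addOrUpdate(String docId, String... terms) {
        Map<String, Set<String>> postings = new HashMap<>();
        for (String t : terms) {
            postings.put(t, Collections.singleton(docId));
        }
        segments.add(new Segment(postings, Collections.singleton(docId)));
    }

    // Newest segment first; older versions of an already seen document are ignored.
    public Set<String> search(String term) {
        Set<String> hits = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        for (int i = segments.size() - 1; i >= 0; i--) {
            Segment s = segments.get(i);
            Set<String> matching = s.postings.getOrDefault(term, Collections.<String>emptySet());
            for (String id : matching) {
                if (!seen.contains(id)) {
                    hits.add(id);
                }
            }
            seen.addAll(s.docIds);
        }
        return hits;
    }

    // Stale segments pile up until something merges them.
    public int segmentCount() {
        return segments.size();
    }

    public static void main(String[] args) {
        SegmentedIndex index = new SegmentedIndex();
        index.addOrUpdate("b1", "hibernate", "search");
        index.addOrUpdate("b1", "lucene");
        System.out.println(index.search("hibernate"));
        System.out.println(index.segmentCount());
    }
}
```

Reads stay simple and lock-free on each segment, while every update leaves stale data behind: a rough picture of why the structure favours queries over writes.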

Infinispan Query quickstart

• Enable indexing=true in the configuration

• Add the infinispan-query.jar module to the classpath

• Annotate the POJOs stored in the cache to define how they are indexed

<dependency>
  <groupId>org.infinispan</groupId>
  <artifactId>infinispan-query</artifactId>
  <version>5.1.3.FINAL</version>
</dependency>

Configuration via code

Configuration c = new Configuration()
    .fluent()
        .indexing()
        .addProperty("hibernate.search.default.directory_provider", "ram")
    .build();

CacheManager manager = new DefaultCacheManager(c);

Configuration via XML

<?xml version="1.0" encoding="UTF-8"?>
<infinispan
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:infinispan:config:5.0 http://www.infinispan.org/schemas/infinispan-config-5.0.xsd"
    xmlns="urn:infinispan:config:5.0">
  <default>
    <indexing enabled="true" indexLocalOnly="true">
      <properties>
        <property name="hibernate.search.option1" value="..." />
        <property name="hibernate.search.option2" value="..." />
      </properties>
    </indexing>
  </default>
</infinispan>

Annotations on the model

@ProvidedId @Indexed
public class Book implements Serializable {

    @Field String title;
    @Field String author;
    @Field String editor;

    public Book(String title, String author, String editor) {
        this.title = title;
        this.author = author;
        this.editor = editor;
    }
}

Running queries

SearchManager sm = Search.getSearchManager(cache);
Query query = sm.buildQueryBuilderForClass(Book.class)
    .get()
        .phrase()
        .onField("title")
        .sentence("in action")
    .createQuery();
List<Object> list = sm.getQuery(query).list();

Architecture

• Integrates Hibernate Search (engine)

– Listens to Hibernate events & transactions

• Infinispan events & transactions

– Maps Java types and model graphs to Lucene Documents

– Thin-layer design

Index mapping

Tests for Infinispan Query

https://github.com/infinispan/infinispan

org.apache.lucene.search.Query luceneQuery =
    queryBuilder.phrase()
        .onField( "description" )
        .andField( "title" )
        .sentence( "a book on highly scalable query engines" )
        .enableFullTextFilter( "ready-for-shipping" )
        .createQuery();

CacheQuery cacheQuery =
    searchManager.getQuery( luceneQuery, Book.class);

List<Book> objectList = cacheQuery.list();

Architecture: Infinispan Query

Scalability problems

• Global writer locks

• Sharing over NFS is very problematic

Queue-based clustering (filesystem)

Index stored in Infinispan

Hibernate Search quickstart

• Add the hibernate-search dependency:

<dependency>
  <groupId>org.hibernate</groupId>
  <artifactId>hibernate-search-orm</artifactId>
  <version>4.1.0.Final</version>
</dependency>

• Everything else is optional:

– How to manage the indexes

– Extension modules, custom Analyzers

– Performance tuning

– Custom type mapping

– Clustering

• JGroups

• Infinispan

• JMS

@Entity
public class Essay {
    @Id
    public Long getId() { return id; }

    public String getSummary() { return summary; }

    @Lob
    public String getText() { return text; }

    @ManyToOne
    public Author getAuthor() { return author; }
    ...

@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }

    public String getSummary() { return summary; }

    @Lob
    public String getText() { return text; }

    @ManyToOne
    public Author getAuthor() { return author; }
    ...

@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }

    @Field
    public String getSummary() { return summary; }

    @Lob
    public String getText() { return text; }

    @ManyToOne
    public Author getAuthor() { return author; }
    ...

@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }

    @Field
    public String getSummary() { return summary; }

    @Lob @Field @Boost(0.8)
    public String getText() { return text; }

    @ManyToOne
    public Author getAuthor() { return author; }
    ...

@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }

    @Field
    public String getSummary() { return summary; }

    @Lob @Field @Boost(0.8)
    public String getText() { return text; }

    @ManyToOne @IndexedEmbedded
    public Author getAuthor() { return author; }
    ...

@Entity
public class Author {
    @Id @GeneratedValue
    private Integer id;
    private String name;
    @OneToMany
    private Set<Book> books;
}

@Entity
public class Book {
    private Integer id;
    private String title;
}

A second example

@Entity @Indexed
public class Author {
    @Id @GeneratedValue
    private Integer id;
    @Field(store=Store.YES)
    private String name;
    @OneToMany @IndexedEmbedded
    private Set<Book> books;
}

@Entity
public class Book {
    private Integer id;
    @Field(store=Store.YES)
    private String title;
}

Index structure

String[] productFields = {"summary", "author.name"};
Query luceneQuery = // query builder or any Lucene Query
FullTextEntityManager ftEm = Search.getFullTextEntityManager(entityManager);
FullTextQuery query = ftEm.createFullTextQuery( luceneQuery, Product.class );
List<Product> items = query.setMaxResults(100).getResultList();
int totalNbrOfResults = query.getResultSize();

totalNbrOfResults = 8,320,000 (0.002 seconds)

Using the DSL

About the results:

• Managed POJOs: changes to the entities are applied to both Lucene and the database

• Familiar (standard) JPA pagination:

– .setMaxResults( 20 ).setFirstResult( 100 );

• Restrictions on type, polymorphic full-text queries:

– .createQuery( luceneQuery, A.class, B.class, ..);

• Projection

• Result mapping

Filters

// s is a FullTextSession
FullTextQuery ftQuery = s.createFullTextQuery( query, Product.class );
ftQuery.enableFullTextFilter( "filtroMinori" );
// setParameter belongs to the filter returned by enableFullTextFilter
ftQuery.enableFullTextFilter( "offertaDelGiorno" )
    .setParameter( "day", "20120412" );
ftQuery.enableFullTextFilter( "inStockA" )
    .setParameter( "location", "Padova" );

List<Product> results = ftQuery.list();

Using Infinispan to distribute the indexes

Clustering a “direct” use of Lucene

• Using org.apache.lucene

– Traditionally hard to distribute across multiple nodes

– On any cloud

Single node: a rough idea of performance

[Bar charts: Queries/sec (axis 0 to 25000) and Write ops/sec (axis 0 to 400), comparing Infinispan Local, FSDirectory, Infinispan D40, Infinispan D4, Infinispan 0 and RAMDirectory]

Multiple nodes: a rough idea of performance

Do writes not scale?

Tips for optimal performance

• Tune chunk_size to the way your index is actually used (avoiding fragmentation avoids read locks)

• Check the size of network packets: blob size, JGroups packets, network interface and hardware.

• Choose and configure a suitable CacheLoader
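As a back-of-the-envelope aid for the chunk_size tip, the sketch below assumes that an index file is split into fixed-size chunks, each stored as its own grid entry. The class and method names are made up, and the exact behaviour of the Infinispan Lucene directory should be checked against its documentation.

```java
public class ChunkMath {

    // Number of fixed-size chunks needed for a file of the given length (ceiling division).
    public static long chunkCount(long fileLength, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return (fileLength + chunkSize - 1) / chunkSize;
    }

    // True when the whole file fits in a single entry, so it is never fragmented.
    public static boolean fitsInOneChunk(long fileLength, int chunkSize) {
        return chunkCount(fileLength, chunkSize) <= 1;
    }

    public static void main(String[] args) {
        int chunkSize = 16 * 1024; // hypothetical setting, in bytes
        System.out.println(chunkCount(100_000, chunkSize));
        System.out.println(fitsInOneChunk(10_000, chunkSize));
    }
}
```

Comparing typical segment file sizes against a candidate chunk_size this way gives a first estimate of how fragmented the index would be.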

Memory requirements

• RAMDirectory: the whole index (and more) in RAM.

• FSDirectory: a good OS does an excellent job of caching I/O, often better than RAMDirectory.

• Infinispan: configurable, up to memory shared across nodes

– Flexible

– Fast

– Network vs. disk

Modules for scalable cloud deployments

One Infinispan to rule them all

– Store Lucene indexes

– Hibernate second level cache

– Application managed cache

– Datagrid

– EJB, session replication in AS7

– As a JPA “store” via Hibernate OGM

Ingredients for the cloud

• JGroups DISCOVERY protocol

– MPING

– TCP_PING

– JDBC_PING

– S3_PING

• Choose a CacheLoader

– Database-based, jclouds, Cassandra, ...

Near future

• Simplify write scalability

• Auto-tuning of the clustering parameters: ergonomics!

• Parallel searching: multi-core + multi-node

• A component of

– http://www.cloudtm.eu

JPA for NoSQL

NoSQL: flexibility has a cost

• Programming model

– one per product :-(

• no schema => app driven schema

• query (Map Reduce, specific DSL, ...)

• data structure transpires

• Transaction

• durability / consistency

Example: Infinispan

Distributed Key/Value store

• (or Replicated, local only efficient cache, invalidating cache)

Each node is equal

• Just start more nodes, or kill some

No bottlenecks

• by design

Cloud-network friendly

• JGroups

• And “cloud storage” friendly too!

The ABC of Infinispan

map.put( "user-34", userInstance );

map.get( "user-34" );

map.remove( "user-34" );

It's a ConcurrentMap!

map.put( "user-34", userInstance );

map.get( "user-34" );

map.remove( "user-34" );

map.putIfAbsent( "user-38", another );
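What the ConcurrentMap contract adds over a plain Map is the atomic check-and-write. The sketch below uses java.util.concurrent.ConcurrentHashMap as a stand-in for a cache; the register helper and the user values are assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentMapBasics {

    // Atomically claims the key: returns whichever value ends up stored under it.
    public static String register(ConcurrentMap<String, String> map, String key, String user) {
        String previous = map.putIfAbsent(key, user);
        return previous == null ? user : previous;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();

        System.out.println(register(map, "user-38", "another"));   // first writer wins
        System.out.println(register(map, "user-38", "latecomer")); // existing value is kept

        // Conditional variants only act when the current value matches.
        System.out.println(map.replace("user-38", "another", "renamed"));
        System.out.println(map.remove("user-38", "another"));
    }
}
```

With a plain get followed by put, two callers could both see the key as missing and overwrite each other. putIfAbsent performs the check and the write as one atomic step.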

A few more details on Infinispan

● Support for Transactions (XA)

● CacheLoaders

  ● Cassandra, JDBC, Amazon S3 (jclouds), ...

● Tree API for JBossCache compatibility

● Lucene integration

  ● Two-fold

● Some Hibernate integrations

  ● Second level cache

  ● Hibernate Search indexing backend

Obiettivi di Hibernate OGM

• Encourage new data usage patterns

• Familiar environment

• Ease of use

• easy to jump in

• easy to jump out

• Push NoSQL exploration in enterprises

• “PaaS for existing API” initiative

What is it?

• JPA front end to key/value stores

– Object CRUD (incl polymorphism and associations)

– OO queries (JP-QL)

• Reuses

– Hibernate Core

– Hibernate Search (and Lucene)

– Infinispan

• Is not a silver bullet

– not for all NoSQL use cases

Entities as serialized blobs?

• Serialize objects into the (key) value

– store the whole graph?

– maintain consistency with duplicated objects

– guaranteed identity a == b

– concurrency / latency

– structure change and (de)serialization, class definition changes

OGM’s approach to schema

• Keep what’s best from relational model

– as much as possible

– tables / columns / pks

• Decorrelate object structure from data structure

• Data stored as (self-described) tuples

• Core types limited

– portability
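The difference between a serialized blob and a self-described tuple can be sketched with a map of column names to core-typed values. The key format, the Author fields and the class name are assumptions made for illustration; this is not Hibernate OGM's actual storage layout.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TupleStore {

    // Hypothetical entity used only in this sketch.
    public static final class Author {
        public final Integer id;
        public final String name;

        public Author(Integer id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Stand-in for the grid: "table#pk" -> tuple (column name -> core-typed value).
    private final Map<String, Map<String, Object>> grid = new ConcurrentHashMap<>();

    // Made-up key convention for this example.
    static String key(String table, Object pk) {
        return table + "#" + pk;
    }

    // The object is flattened into columns instead of being serialized as a whole.
    public void save(Author a) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", a.id);
        tuple.put("name", a.name);
        grid.put(key("Author", a.id), tuple);
    }

    // The entity is rebuilt from the tuple; the class can evolve without breaking stored data.
    public Author load(Integer id) {
        Map<String, Object> tuple = grid.get(key("Author", id));
        if (tuple == null) {
            return null;
        }
        return new Author((Integer) tuple.get("id"), (String) tuple.get("name"));
    }

    // What is actually stored: readable without the Author class on the classpath.
    public Map<String, Object> rawTuple(Integer id) {
        return grid.get(key("Author", id));
    }

    public static void main(String[] args) {
        TupleStore store = new TupleStore();
        store.save(new Author(1, "Ada"));
        System.out.println(store.rawTuple(1));
        System.out.println(store.load(1).name);
    }
}
```

Because the stored value only contains column names and core types, it stays readable by other tools and survives changes to the Java class definition.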

• Hibernate Search indexes entities

• Store Lucene indexes in Infinispan

• JP-QL to Lucene query transformation

• Works for simple queries

• Lucene is not a relational SQL engine

What's next?

• MongoDB

• EHCache / Terracotta

• Redis

• Voldemort

• Neo4J

• Dynamo

• ... Git? Spreadsheet? ... CapeDwarf?

@Infinispan
@Hibernate
@SanneGrinovero

http://infinispan.org
http://in.relation.to
http://jboss.org
