Padova, InfoCamere JBoss User Group, 12 April 2012
Infinispan, Lucene, Hibernate OGM

May 06, 2015

JBug Italy

Sanne Grinovero - JBug Milano
Transcript
Page 1: Infinispan, Lucene, Hibernate OGM

Padova, InfoCamere JBoss User Group

12 April 2012

Page 2: Infinispan, Lucene, Hibernate OGM

Who am I?

• Team Hibernate

– Hibernate Search

– Hibernate OGM

• Infinispan

– Infinispan Core

– Infinispan Query

– JGroups

• Apache Lucene

Sanne Grinovero

Italian, Dutch, Newcastle

Red Hat: JBoss, Engineering

Page 3: Infinispan, Lucene, Hibernate OGM

Infinispan

• Distributed cache

• Scalable, transactional data grid: extreme performance and cloud

• NoSQL "database": key-value store

– How do you query a data grid?

SELECT * FROM GRID

Page 4: Infinispan, Lucene, Hibernate OGM
Page 5: Infinispan, Lucene, Hibernate OGM

Querying a "Grid"

Object v = cache.get("c7");

Page 6: Infinispan, Lucene, Hibernate OGM

Without the key, you cannot get the value.
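The point of this slide can be illustrated with a plain-JDK stand-in. The sketch below is not the Infinispan API: it uses a ConcurrentHashMap as a hypothetical cache, and the class name, keys and sample values are made up for illustration. Lookup by key is a single call, while lookup by any other attribute has to visit every entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Plain-JDK stand-in for a key/value cache; this is not the Infinispan API.
final class KeyOnlyAccess {
    static final class Book {
        final String title;
        Book(String title) { this.title = title; }
    }

    // Fast path: the key is known, so one lookup is enough.
    static Book byKey(Map<String, Book> cache, String key) {
        return cache.get(key);
    }

    // Slow path: without the key, every entry has to be inspected.
    static List<String> keysByTitle(Map<String, Book> cache, String title) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Book> e : cache.entrySet()) {
            if (e.getValue().title.equals(title)) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    // Hypothetical sample data.
    static Map<String, Book> sampleCache() {
        Map<String, Book> cache = new ConcurrentHashMap<>();
        cache.put("c7", new Book("Hibernate Search in Action"));
        cache.put("c8", new Book("Some Other Book"));
        return cache;
    }
}
```

On a grid spread over many nodes, the second method would mean touching every node, which is what the following slides try to avoid.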

Page 7: Infinispan, Lucene, Hibernate OGM

Is access by key alone practical?

Page 8: Infinispan, Lucene, Hibernate OGM

A test on my bookshelf

• Where is Hibernate Search in Action?

• Can you pass me ISBN 978-1-933988-17-7?

• Can you fetch the books about Gaudí?

Page 9: Infinispan, Lucene, Hibernate OGM
Page 10: Infinispan, Lucene, Hibernate OGM
Page 11: Infinispan, Lucene, Hibernate OGM
Page 12: Infinispan, Lucene, Hibernate OGM

How can these functions be implemented on a key/value store?

• Where is Hibernate Search in Action?

• Can you pass me ISBN 978-1-933988-17-7?

• Can you find the books about Gaudí?

Page 13: Infinispan, Lucene, Hibernate OGM

Document-based NoSQL: Map/Reduce

Infinispan is not strictly document based, but it does offer Map/Reduce.

Still, nothing rules out the use of JSON, XML, YAML, or Java:

public class Book implements Serializable {
   final String title;
   final String author;
   final String editor;

   public Book(String title, String author, String editor) {
      this.title = title;
      this.author = author;
      this.editor = editor;
   }
}

Page 14: Infinispan, Lucene, Hibernate OGM

Iterate & collect

// Mapper, Reducer and Collector come from org.infinispan.distexec.mapreduce
class TitleBookSearcher implements Mapper<String, Book, String, Book> {
   final String title;
   public TitleBookSearcher(String t) { title = t; }
   // Emit every book whose title matches exactly
   public void map(String key, Book value, Collector<String, Book> collector) {
      if ( title.equals( value.title ) )
         collector.emit( key, value );
   }
}

class BookReducer implements Reducer<String, Book> {
   // Keep the first value emitted for each key
   public Book reduce(String reducedKey, Iterator<Book> iter) {
      return iter.next();
   }
}
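To see what the two phases do without a running grid, here is a minimal single-JVM sketch using only the JDK. It is an illustration under assumptions, not the Infinispan implementation: LocalMapReduce, mapPhase and reducePhase are hypothetical names, and a real grid would run the map phase on each node against its local entries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Single-JVM illustration of the map and reduce phases (hypothetical names).
final class LocalMapReduce {
    static final class Book {
        final String title;
        Book(String title) { this.title = title; }
    }

    // Map phase: visit every entry and emit (key, book) for each title match.
    static Map<String, List<Book>> mapPhase(Map<String, Book> cache, String title) {
        Map<String, List<Book>> emitted = new LinkedHashMap<>();
        for (Map.Entry<String, Book> e : cache.entrySet()) {
            if (title.equals(e.getValue().title)) {
                emitted.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return emitted;
    }

    // Reduce phase: collapse the values emitted under each key to the first one.
    static Map<String, Book> reducePhase(Map<String, List<Book>> emitted) {
        Map<String, Book> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Book>> e : emitted.entrySet()) {
            result.put(e.getKey(), e.getValue().get(0));
        }
        return result;
    }
}
```

Note that the map phase still reads every entry: the work is spread across nodes, but nothing is skipped.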

Page 15: Infinispan, Lucene, Hibernate OGM

Implementing these simple functions:

✔ Find "Hibernate Search in Action"?

✔ Find by code "ISBN 978-1-933988-17-7"?

✗ How many books about "Shakespeare"?

• A correct score in full-text searches requires the frequencies of text fragments relative to the whole corpus.

• Pre-tagging is impractical and limiting
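Why corpus-wide frequencies matter can be shown with the textbook tf-idf weight, tf × log(N / df). The sketch below is a simplified illustration and an assumption on my part: it is not Lucene's actual similarity formula, and all names in it are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Textbook tf-idf; a simplification, not Lucene's scoring formula.
final class TfIdfSketch {
    // Naive whitespace tokenizer, lower-casing the input.
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase().split("\\s+"));
    }

    // Term frequency: occurrences of the term in one tokenized document.
    static long termFrequency(List<String> doc, String term) {
        return doc.stream().filter(term::equals).count();
    }

    // Document frequency: number of documents in the corpus containing the term.
    static long documentFrequency(List<List<String>> corpus, String term) {
        return corpus.stream().filter(d -> d.contains(term)).count();
    }

    // tf * log(N / df); 0 when the term appears nowhere in the corpus.
    static double score(List<List<String>> corpus, List<String> doc, String term) {
        long df = documentFrequency(corpus, term);
        if (df == 0) {
            return 0.0;
        }
        return termFrequency(doc, term) * Math.log((double) corpus.size() / df);
    }
}
```

A term that appears in every document gets weight zero, while a rare term such as an author's name dominates the score. Neither number can be computed by looking at a single cache entry in isolation.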

Page 16: Infinispan, Lucene, Hibernate OGM

Apache Lucene

• Apache™ open source project

• Integrated in countless projects

• ... including Hibernate, via Hibernate Search

• Can be clustered via Infinispan

– Performance

– Real time

– High availability

Page 17: Infinispan, Lucene, Hibernate OGM

What does Lucene offer?

• Searches ranked by similarity score

• Text analysis

– Synonyms, stopwords, stemming, ...

• Reusable declarative filters

• TermVectors

• MoreLikeThis

• Faceted search

• Fast!

Page 18: Infinispan, Lucene, Hibernate OGM

Lucene: Stopwords

a, able, about, across, after, all, almost, also, am, among, an, and, any, are, as, at, be, because, been, but, by, can, cannot, could, dear, did, do, does, either, else, ever, every, for, from, get, got, had, has, have, he, her, hers, him, his, how, however, i, if, in, into, is, it, its, just, least, let, like, likely, may, me, might, most, must, my, neither, no, nor, not, of, off, often, on, only, or, other, our, own, rather, said, say, says, she, should, since, so, some, than, that, the, their, them, then, there, these, they, this, tis, to, too, twas, us, wants, was, we, were, what, when, where, which, while, who, whom, why, will, with, would, yet, you, your
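As a rough illustration of what stopword removal does during text analysis, here is a JDK-only sketch. It does not use Lucene's analyzer classes; StopwordFilterSketch is a hypothetical name and the stopword set is a small excerpt of the list above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// JDK-only illustration of stopword filtering (not a Lucene Analyzer).
final class StopwordFilterSketch {
    // Small excerpt of the list above; a real analyzer would use the full set.
    static final Set<String> STOPWORDS = Set.of("a", "about", "in", "of", "on", "the");

    // Lower-case, split on anything that is not a letter or digit, drop stopwords.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }
}
```

The same analysis has to be applied to both the indexed text and the query text, otherwise the terms will not line up.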

Page 19: Infinispan, Lucene, Hibernate OGM

Filters

Page 20: Infinispan, Lucene, Hibernate OGM

Faceted Search

Page 21: Infinispan, Lucene, Hibernate OGM

Shall we build a nice search engine that returns its results in alphabetical order?

Page 22: Infinispan, Lucene, Hibernate OGM

Who uses Lucene?

Nexus

Page 23: Infinispan, Lucene, Hibernate OGM

Where's the catch?

• It needs an index: physical resources and administration.

– in memory

– on filesystem

– in Infinispan

• Essentially immutable segments

– Optimized for data mining / queries, not for updates.

• A world of strings and frequency vectors
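The "strings and frequency vectors" remark refers to the shape of an inverted index. A toy version, using only the JDK and hypothetical names, might look like the sketch below; Lucene's real index format is far more elaborate (segments, compression, positions), so treat this purely as a mental model.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index: term -> (document id -> occurrences). A mental model only.
final class InvertedIndexSketch {
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();

    // Tokenize on whitespace and count each term per document.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(term, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Frequency vector for a term; empty when the term was never indexed.
    Map<Integer, Integer> frequencies(String term) {
        return postings.getOrDefault(term.toLowerCase(), Map.of());
    }
}
```

Everything that enters such a structure is reduced to string terms, which is why dates and numbers need an explicit mapping before they can be indexed.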

Page 24: Infinispan, Lucene, Hibernate OGM

Infinispan Query quickstart

• Enable indexing=true in the configuration

• Add the infinispan-query.jar module to the classpath

• Annotate the POJOs stored in the cache to define how they are indexed

<dependency>
   <groupId>org.infinispan</groupId>
   <artifactId>infinispan-query</artifactId>
   <version>5.1.3.FINAL</version>
</dependency>

Page 25: Infinispan, Lucene, Hibernate OGM

Configuration via code

Configuration c = new Configuration()
   .fluent()
      .indexing()
      .addProperty("hibernate.search.default.directory_provider", "ram")
   .build();

CacheManager manager = new DefaultCacheManager(c);

Page 26: Infinispan, Lucene, Hibernate OGM

Configuration / XML

<?xml version="1.0" encoding="UTF-8"?>
<infinispan
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="urn:infinispan:config:5.0 http://www.infinispan.org/schemas/infinispan-config-5.0.xsd"
   xmlns="urn:infinispan:config:5.0">
<default>
   <indexing enabled="true" indexLocalOnly="true">
      <properties>
         <property name="hibernate.search.option1" value="..." />
         <property name="hibernate.search.option2" value="..." />
      </properties>
   </indexing>
</default>
</infinispan>

Page 27: Infinispan, Lucene, Hibernate OGM

Annotations on the model

@ProvidedId @Indexed
public class Book implements Serializable {
   @Field String title;
   @Field String author;
   @Field String editor;

   public Book(String title, String author, String editor) {
      this.title = title;
      this.author = author;
      this.editor = editor;
   }
}

Page 28: Infinispan, Lucene, Hibernate OGM

Running queries

SearchManager sm = Search.getSearchManager(cache);
Query query = sm.buildQueryBuilderForClass(Book.class)
   .get()
   .phrase()
   .onField("title")
   .sentence("in action")
   .createQuery();
List<Object> list = sm.getQuery(query).list();

Page 29: Infinispan, Lucene, Hibernate OGM

Architecture

• Integrates Hibernate Search (engine)

– Listens to Hibernate events & transactions

• Infinispan events & transactions

– Maps Java types and model graphs to Lucene Documents

– Thin-layer design

Page 30: Infinispan, Lucene, Hibernate OGM

Index mapping

Page 31: Infinispan, Lucene, Hibernate OGM

Tests for Infinispan Query

https://github.com/infinispan/infinispan

Page 32: Infinispan, Lucene, Hibernate OGM

org.apache.lucene.search.Query luceneQuery =
   queryBuilder.phrase()
      .onField( "description" )
      .andField( "title" )
      .sentence( "a book on highly scalable query engines" )
      .createQuery();

CacheQuery cacheQuery =
   searchManager.getQuery( luceneQuery, Book.class );

// Full-text filters are enabled on the CacheQuery, not in the query DSL
cacheQuery.enableFullTextFilter( "ready-for-shipping" );

List<Object> objectList = cacheQuery.list();

Page 33: Infinispan, Lucene, Hibernate OGM

Architettura: Infinispan Query

Page 34: Infinispan, Lucene, Hibernate OGM

Scalability problems

• Global writer locks

• Sharing over NFS is very problematic

Page 35: Infinispan, Lucene, Hibernate OGM

Queue-based clustering (filesystem)

Page 36: Infinispan, Lucene, Hibernate OGM

Index stored in Infinispan

Page 37: Infinispan, Lucene, Hibernate OGM
Page 38: Infinispan, Lucene, Hibernate OGM

Quickstart Hibernate Search

• Add the hibernate-search dependency:

<dependency>
   <groupId>org.hibernate</groupId>
   <artifactId>hibernate-search-orm</artifactId>
   <version>4.1.0.Final</version>
</dependency>

Page 39: Infinispan, Lucene, Hibernate OGM

Quickstart Hibernate Search

• Everything else is optional:

– How to manage the indexes

– Extension modules, custom Analyzers

– Performance tuning

– Custom type mapping

– Clustering

• JGroups

• Infinispan

• JMS

Page 40: Infinispan, Lucene, Hibernate OGM

Quickstart Hibernate Search

@Entity
public class Essay {
   @Id
   public Long getId() { return id; }

   public String getSummary() { return summary; }
   @Lob
   public String getText() { return text; }
   @ManyToOne
   public Author getAuthor() { return author; }
...

Page 41: Infinispan, Lucene, Hibernate OGM

Quickstart Hibernate Search

@Entity @Indexed
public class Essay {
   @Id
   public Long getId() { return id; }

   public String getSummary() { return summary; }
   @Lob
   public String getText() { return text; }
   @ManyToOne
   public Author getAuthor() { return author; }
...

Page 42: Infinispan, Lucene, Hibernate OGM

Quickstart Hibernate Search

@Entity @Indexed
public class Essay {
   @Id
   public Long getId() { return id; }
   @Field
   public String getSummary() { return summary; }
   @Lob
   public String getText() { return text; }
   @ManyToOne
   public Author getAuthor() { return author; }
...

Page 43: Infinispan, Lucene, Hibernate OGM

Quickstart Hibernate Search

@Entity @Indexed
public class Essay {
   @Id
   public Long getId() { return id; }
   @Field
   public String getSummary() { return summary; }
   @Lob @Field @Boost(0.8f)
   public String getText() { return text; }
   @ManyToOne
   public Author getAuthor() { return author; }
...

Page 44: Infinispan, Lucene, Hibernate OGM

Quickstart Hibernate Search

@Entity @Indexed
public class Essay {
   @Id
   public Long getId() { return id; }
   @Field
   public String getSummary() { return summary; }
   @Lob @Field @Boost(0.8f)
   public String getText() { return text; }
   @ManyToOne @IndexedEmbedded
   public Author getAuthor() { return author; }
...

Page 45: Infinispan, Lucene, Hibernate OGM

A second example

@Entity
public class Author {
   @Id @GeneratedValue
   private Integer id;
   private String name;
   @OneToMany
   private Set<Book> books;
}

@Entity
public class Book {
   @Id
   private Integer id;
   private String title;
}

Page 46: Infinispan, Lucene, Hibernate OGM

Index structure

@Entity @Indexed
public class Author {
   @Id @GeneratedValue
   private Integer id;
   @Field(store=Store.YES)
   private String name;
   @OneToMany
   @IndexedEmbedded
   private Set<Book> books;
}

@Entity
public class Book {
   @Id
   private Integer id;
   @Field(store=Store.YES)
   private String title;
}
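One way to picture the resulting index structure: each Author becomes a flat document whose embedded books contribute fields prefixed with the property name, such as books.title. The JDK-only sketch below imitates that flattening with a plain map; it does not use Hibernate Search or Lucene classes, and IndexDocumentSketch and toDocument are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Imitates how an object graph is flattened into one index document.
final class IndexDocumentSketch {
    static final class Book {
        final String title;
        Book(String title) { this.title = title; }
    }

    static final class Author {
        final int id;
        final String name;
        final List<Book> books;
        Author(int id, String name, List<Book> books) {
            this.id = id;
            this.name = name;
            this.books = books;
        }
    }

    // Field name -> values; embedded books use the "books." prefix.
    static Map<String, List<String>> toDocument(Author author) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("id", List.of(String.valueOf(author.id)));
        doc.put("name", List.of(author.name));
        List<String> titles = new ArrayList<>();
        for (Book book : author.books) {
            titles.add(book.title);
        }
        doc.put("books.title", titles);
        return doc;
    }
}
```

A query on the field books.title then matches the Author document directly, with no join at query time.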

Page 47: Infinispan, Lucene, Hibernate OGM

Query

String[] productFields = {"summary", "author.name"};
Query luceneQuery = // query builder or any Lucene Query
FullTextEntityManager ftEm = Search.getFullTextEntityManager(entityManager);
FullTextQuery query = ftEm.createFullTextQuery( luceneQuery, Product.class );
List<Product> items = query.setMaxResults(100).getResultList();
int totalNbrOfResults = query.getResultSize();

TotalNbrOfResults = 8,320,000 (0.002 seconds)

Page 48: Infinispan, Lucene, Hibernate OGM

Using the DSL

Page 49: Infinispan, Lucene, Hibernate OGM

About the results:

• Managed POJOs: changes to the entities are applied to both Lucene and the database

• Familiar (standard) JPA pagination:

– .setMaxResults( 20 ).setFirstResult( 100 );

• Type restrictions, polymorphic full-text queries:

– .createQuery( luceneQuery, A.class, B.class, ..);

• Projection

• Result mapping
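The pagination calls in the list above follow the usual JPA meaning: setFirstResult is a zero-based offset and setMaxResults caps the page size. The JDK-only sketch below mirrors that meaning on an in-memory list; it is an illustration rather than how the query engine pages internally, PaginationSketch is a hypothetical name, and non-negative arguments are assumed.

```java
import java.util.List;

// Mirrors the meaning of setFirstResult(first) and setMaxResults(max) on a list.
final class PaginationSketch {
    // Assumes first >= 0 and max >= 0.
    static <T> List<T> page(List<T> hits, int first, int max) {
        if (first >= hits.size()) {
            return List.of();
        }
        // Use long arithmetic so first + max cannot overflow.
        int end = (int) Math.min((long) hits.size(), (long) first + max);
        return hits.subList(first, end);
    }
}
```

With the values from the slide, page(hits, 100, 20) would return hits 100 through 119, or fewer if the result list is shorter.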

Page 50: Infinispan, Lucene, Hibernate OGM

Filters

// s is a FullTextSession
FullTextQuery ftQuery = s.createFullTextQuery( query, Product.class );
// enableFullTextFilter returns the filter, so parameters are set per filter
ftQuery.enableFullTextFilter( "filtroMinori" );
ftQuery.enableFullTextFilter( "offertaDelGiorno" )
   .setParameter( "day", "20120412" );
ftQuery.enableFullTextFilter( "inStockA" )
   .setParameter( "location", "Padova" );

List<Product> results = ftQuery.list();

Page 51: Infinispan, Lucene, Hibernate OGM

Using Infinispan to distribute the indexes

Page 52: Infinispan, Lucene, Hibernate OGM

Clustering a "direct" use of Lucene

• Using org.apache.lucene

– Traditionally hard to distribute across multiple nodes

– On any cloud

Page 53: Infinispan, Lucene, Hibernate OGM

Single node: a rough idea of performance

[Figure: two bar charts, "Queries/sec" (axis 0 to 25000) and "Write ops/sec" (axis 0 to 400), comparing Infinispan Local, FSDirectory, Infinispan D40, Infinispan D4, Infinispan 0 and RAMDirectory. The bar values are not recoverable from the transcript.]

Page 54: Infinispan, Lucene, Hibernate OGM

Multiple nodes: a rough idea of performance

[Figure: two bar charts, "Queries/sec" (axis 0 to 25000) and "Write ops/sec" (axis 0 to 400), comparing Infinispan Local, FSDirectory, Infinispan D40, Infinispan D4, Infinispan 0 and RAMDirectory. The bar values are not recoverable from the transcript.]

Page 55: Infinispan, Lucene, Hibernate OGM

Writes don't scale?

Page 56: Infinispan, Lucene, Hibernate OGM

Tips for optimal performance

• Tune chunk_size to the actual usage of your index (avoid read locks by avoiding fragmentation)

• Check the size of the network packets: blob size, JGroups packets, network interface and hardware.

• Choose and configure a suitable CacheLoader
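To make the chunk_size tip more concrete, the sketch below splits the bytes of one index file into fixed-size chunks stored under separate keys. It is a JDK-only illustration under assumptions: the "fileName|chunkNumber" key layout and the class name are hypothetical and are not taken from the Infinispan Lucene directory.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrates fragmentation of one index file into chunk-sized cache entries.
final class ChunkingSketch {
    // Keys use a made-up "fileName|chunkNumber" layout.
    static Map<String, byte[]> split(String fileName, byte[] content, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        Map<String, byte[]> chunks = new LinkedHashMap<>();
        int chunkNumber = 0;
        for (int offset = 0; offset < content.length; offset += chunkSize) {
            int end = Math.min(content.length, offset + chunkSize);
            chunks.put(fileName + "|" + chunkNumber, Arrays.copyOfRange(content, offset, end));
            chunkNumber++;
        }
        return chunks;
    }
}
```

In this model a file no larger than the chunk size ends up in a single entry, while a larger file is spread over several entries that a reader must fetch consistently.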

Page 57: Infinispan, Lucene, Hibernate OGM

Memory requirements

• RAMDirectory: the whole index (and more) in RAM.

• FSDirectory: a good OS does an excellent job of caching IO, often better than RAMDirectory.

• Infinispan: configurable, up to memory shared across nodes

– Flexible

– Fast

– Network vs. disk

Page 58: Infinispan, Lucene, Hibernate OGM

Modules for scalable cloud deployments

One Infinispan to rule them all

– Store Lucene indexes

– Hibernate second level cache

– Application managed cache

– Datagrid

– EJB, session replication in AS7

– As a JPA “store” via Hibernate OGM

Page 59: Infinispan, Lucene, Hibernate OGM

Ingredients for the cloud

• JGroups DISCOVERY protocol

– MPING

– TCP_PING

– JDBC_PING

– S3_PING

• Choose a CacheLoader

– Database based, JClouds, Cassandra, ...

Page 60: Infinispan, Lucene, Hibernate OGM

Near future

• Simplify write scalability

• Auto-tuning of the clustering parameters: ergonomics!

• Parallel searching: multi/core + multi/node

• A component of

– http://www.cloudtm.eu

Page 61: Infinispan, Lucene, Hibernate OGM
Page 62: Infinispan, Lucene, Hibernate OGM

JPA for NoSQL

Page 63: Infinispan, Lucene, Hibernate OGM

NoSQL: flexibility has a cost

• Programming model

– one per product :-(

• no schema => app driven schema

• query (Map Reduce, specific DSL, ...)

• data structure transpires

• Transaction

• durability / consistency

Page 64: Infinispan, Lucene, Hibernate OGM

Example: Infinispan

Distributed Key/Value store

• (or Replicated, local only efficient cache, invalidating cache)

Each node is equal

• Just start more nodes, or kill some

No bottlenecks

• by design

Cloud-network friendly

• JGroups

• And “cloud storage” friendly too!

Page 65: Infinispan, Lucene, Hibernate OGM

The ABC of Infinispan

map.put( "user-34", userInstance );

map.get( "user-34" );

map.remove( "user-34" );

Page 66: Infinispan, Lucene, Hibernate OGM

It's a ConcurrentMap!

map.put( "user-34", userInstance );

map.get( "user-34" );

map.remove( "user-34" );

map.putIfAbsent( "user-38", another );
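Since the cache exposes the java.util.concurrent.ConcurrentMap contract, the behaviour of these calls can be tried with the JDK's own ConcurrentHashMap. The sketch below uses that class as a stand-in for a cache; the class name and string values are illustrative only.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Exercises the ConcurrentMap contract with the JDK implementation as a stand-in.
final class ConcurrentMapSketch {
    static String demo() {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
        map.put("user-34", "first");
        // Key already present: nothing is written, the existing value is returned.
        String previous = map.putIfAbsent("user-34", "second");
        // Key absent: the value is stored and null is returned.
        String absent = map.putIfAbsent("user-38", "another");
        map.remove("user-34");
        return previous + "," + absent + "," + map.get("user-34") + "," + map.get("user-38");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "first,null,null,another"
    }
}
```

The atomic check-then-write of putIfAbsent is the part that matters in a cluster, because two nodes racing on the same key cannot both win.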

Page 67: Infinispan, Lucene, Hibernate OGM

A few more details about Infinispan

● Support for Transactions (XA)

● CacheLoaders

– Cassandra, JDBC, Amazon S3 (jclouds), ...

● Tree API for JBossCache compatibility

● Lucene integration

– Two-fold

● Some Hibernate integrations

– Second level cache

– Hibernate Search indexing backend

Page 68: Infinispan, Lucene, Hibernate OGM

Goals of Hibernate OGM

• Encourage new data usage patterns

• Familiar environment

• Ease of use

• easy to jump in

• easy to jump out

• Push NoSQL exploration in enterprises

• “PaaS for existing API” initiative

Page 69: Infinispan, Lucene, Hibernate OGM

What it is

• JPA front end to key/value stores

– Object CRUD (incl. polymorphism and associations)

– OO queries (JP-QL)

• Reuses

– Hibernate Core

– Hibernate Search (and Lucene)

– Infinispan

• Is not a silver bullet

– not for all NoSQL use cases

Page 70: Infinispan, Lucene, Hibernate OGM

Entities as serialized blobs?

• Serialize objects into the (key) value

– store the whole graph?

• maintain consistency with duplicated objects

• guaranteed identity a == b

• concurrency / latency

• structure change and (de)serialization, class definition changes

Page 71: Infinispan, Lucene, Hibernate OGM

OGM's approach to schema

• Keep what's best from relational model

– as much as possible

– tables / columns / pks

• Decorrelate object structure from data structure

• Data stored as (self-described) tuples

• Core types limited

– portability
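The idea of storing entities as self-described tuples rather than serialized blobs can be sketched with plain maps. This is an illustration under assumptions, not Hibernate OGM's actual storage layout: the "table#pk" key format and all names below are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Entity <-> tuple conversion with a made-up key layout ("table#pk").
final class TupleStoreSketch {
    static final class Author {
        final int id;
        final String name;
        Author(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Key derived from the table name and primary key, as in a relational model.
    static String key(String table, Object pk) {
        return table + "#" + pk;
    }

    // Column name -> value, using only core types so any store can hold it.
    static Map<String, Object> toTuple(Author author) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", author.id);
        tuple.put("name", author.name);
        return tuple;
    }

    // Rebuild the entity from its tuple; the class can evolve independently.
    static Author fromTuple(Map<String, Object> tuple) {
        return new Author((Integer) tuple.get("id"), (String) tuple.get("name"));
    }
}
```

Because the stored value names its own columns and holds only core types, the entity class can change shape without invalidating previously stored data, which is the portability point made on the slide.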

Page 72: Infinispan, Lucene, Hibernate OGM
Page 73: Infinispan, Lucene, Hibernate OGM

Query

• Hibernate Search indexes entities

• Store Lucene indexes in Infinispan

• JP-QL to Lucene query transformation

• Works for simple queries

– Lucene is not a relational SQL engine

Page 74: Infinispan, Lucene, Hibernate OGM

What's next?

• MongoDB

• EHCache / Terracotta

• Redis

• Voldemort

• Neo4J

• Dynamo

• ... Git? Spreadsheet? ... CapeDwarf?

Page 75: Infinispan, Lucene, Hibernate OGM

Q&A

@Infinispan

@Hibernate

@SanneGrinovero

http://infinispan.org

http://in.relation.to

http://jboss.org