Alfresco Search Internals
Andy Hind, Senior Developer, Alfresco (twitter: @andy_hind)
Dec 05, 2014
1
Alfresco Search Internals
Andy Hind
Senior Developer, Alfresco
twitter: @andy_hind
2
Agenda
• Overview
• Direction
• Challenges
• Alfresco FTS
• CMIS Query Language
3
Overview
Data Modelling Options
<property name="cmis:name">
    ...
    <type>d:text</type>
    ...
    <index enabled="true">
        <tokenised>both</tokenised>
        <atomic>true</atomic>
        <stored>false</stored>
    </index>
    ...
</property>
4
Overview
Data Modelling Options
The same property definition, annotated option by option:
• <type>: the type drives analysis
• <index enabled="...">: true or false
• <tokenised>: true, false or both
• <atomic>: true or false (d:content)
• <stored>: true or false
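To make the `tokenised` options concrete, here is a minimal Python sketch. It is not Alfresco code: `index_terms` is a hypothetical helper, and a lowercase/whitespace split is assumed in place of the real type- and locale-specific analyzer.

```python
def index_terms(value, tokenised="both"):
    """Return the terms a property value would contribute to the index.

    Assumption: a lowercase/whitespace split stands in for the real analyzer.
    """
    if tokenised not in ("true", "false", "both"):
        raise ValueError("tokenised must be 'true', 'false' or 'both'")
    terms = []
    if tokenised in ("false", "both"):
        # Untokenised: the whole value is kept as a single term.
        terms.append(value)
    if tokenised in ("true", "both"):
        # Tokenised: the value is split into terms for full text matching.
        terms.extend(value.lower().split())
    return terms


print(index_terms("Annual Report.doc", "both"))
```

With `both`, the index holds the untokenised value and its tokens, so the property can serve exact matches as well as full text matches.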
9
Overview
Configuration
• IndexerAndSearcher interface and related factory
• Redirection by store protocol or value
• Factories
  • AVM
  • DM
  • Unindexed
  • All Lucene based, with options set via properties
• Analysis by data type and locale
  • alfresco/model/dataTypeAnalyzers_{locale}.properties
10
Overview
Configuration
<bean id="indexerAndSearcherFactory" class="org.alfresco.repo.service.StoreRedirectorProxyFactory">
    <property name="proxyInterface">
        <value>org.alfresco.repo.search.impl.lucene.LuceneIndexerAndSearcher</value>
    </property>
    <property name="defaultBinding">
        <ref bean="admLuceneIndexerAndSearcherFactory"></ref>
    </property>
    <property name="redirectedProtocolBindings">
        <map>
            <entry key="workspace">
                <ref bean="admLuceneIndexerAndSearcherFactory"></ref>
            </entry>
            <entry key="avm">
                <ref bean="avmLuceneIndexerAndSearcherFactory"></ref>
            </entry>
        </map>
    </property>
    <property name="redirectedStoreBindings">
        <map>
            <entry key="workspace://lightWeightVersionStore">
                <ref bean="admLuceneUnIndexedIndexerAndSearcherFactory"></ref>
            </entry>
            <entry key="workspace://version2Store">
                <ref bean="admLuceneUnIndexedIndexerAndSearcherFactory"></ref>
            </entry>
        </map>
    </property>
</bean>
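The redirection configured in this bean can be sketched in Python as a lookup. This is an illustration only: `resolve_factory` is a hypothetical name, the precedence (exact store binding, then protocol binding, then default) is an assumption, and `workspace://SpacesStore` is just an example store reference.

```python
DEFAULT_BINDING = "admLuceneIndexerAndSearcherFactory"

# Mirrors redirectedProtocolBindings in the bean definition.
PROTOCOL_BINDINGS = {
    "workspace": "admLuceneIndexerAndSearcherFactory",
    "avm": "avmLuceneIndexerAndSearcherFactory",
}

# Mirrors redirectedStoreBindings in the bean definition.
STORE_BINDINGS = {
    "workspace://lightWeightVersionStore": "admLuceneUnIndexedIndexerAndSearcherFactory",
    "workspace://version2Store": "admLuceneUnIndexedIndexerAndSearcherFactory",
}


def resolve_factory(store_ref):
    """Pick the factory bean name for a store reference such as 'workspace://SpacesStore'."""
    # Assumed precedence: an exact store binding wins over a protocol binding.
    if store_ref in STORE_BINDINGS:
        return STORE_BINDINGS[store_ref]
    protocol = store_ref.split("://", 1)[0]
    # Fall back to the default binding when the protocol is not listed.
    return PROTOCOL_BINDINGS.get(protocol, DEFAULT_BINDING)


print(resolve_factory("workspace://SpacesStore"))
print(resolve_factory("workspace://version2Store"))
```

Under these assumptions the version stores resolve to the unindexed factory even though their protocol is `workspace`.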
11
Overview
Configuration properties
lucene.maxAtomicTransformationTime=20
lucene.query.maxClauses=10000
lucene.indexer.cacheEnabled=true
lucene.indexer.maxDocIdCacheSize=10000
lucene.indexer.maxDocumentCacheSize=100
lucene.indexer.maxParentCacheSize=10000
lucene.indexer.maxIsCategoryCacheSize=-1
lucene.indexer.maxLinkAspectCacheSize=10000
lucene.indexer.maxPathCacheSize=10000
lucene.indexer.maxTypeCacheSize=10000
12
Overview
Configuration properties
lucene.indexer.mergerTargetIndexCount=5
lucene.indexer.mergerTargetOverlayCount=5
lucene.indexer.mergerTargetOverlaysBlockingFactor=1
lucene.indexer.mergerMergeBlockingFactor=1
lucene.indexer.maxDocsForInMemoryMerge=10000
lucene.indexer.maxRamInMbForInMemoryMerge=16
lucene.indexer.postSortDateTime=true
lucene.indexer.defaultMLIndexAnalysisMode=EXACT_LANGUAGE_AND_ALL
lucene.indexer.defaultMLSearchAnalysisMode=EXACT_LANGUAGE_AND_ALL
lucene.indexer.maxFieldLength=10000
13
Overview
Authorization
• Post query filter for READ
• Configuration
  • system.acl.maxPermissionCheckTimeMillis=10000
  • system.acl.maxPermissionChecks=1000
  • Also set at query time
• Read performance
  • 1.4 old model
  • 2.2/3.0 new
  • 3.4 improved read
  • system/admin/others
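The post-query READ filter and its two limits can be sketched in Python as follows. This is a simplified illustration, not the repository's implementation: `filter_readable` and `can_read` are hypothetical names, the defaults mirror the two properties above, and truncating the result set when a limit is reached is an assumption.

```python
import time


def filter_readable(results, can_read, max_checks=1000, max_time_ms=10000):
    """Post-filter query results by READ permission.

    Returns (readable, truncated); truncated is True if a limit stopped the scan.
    """
    start = time.monotonic()
    readable = []
    for checks, node in enumerate(results):
        elapsed_ms = (time.monotonic() - start) * 1000
        # Assumption: hitting either limit cuts the result set short.
        if checks >= max_checks or elapsed_ms > max_time_ms:
            return readable, True
        if can_read(node):
            readable.append(node)
    return readable, False


# Ten hits, only even-numbered nodes readable, at most four permission checks.
print(filter_readable(range(10), lambda n: n % 2 == 0, max_checks=4))
```

Because filtering happens after the query, a user who can read few of the matching nodes may exhaust the limits before a full page of results is collected.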
14
Overview
Query Support
• Lucene based
  • Lucene with Alfresco extensions (PATH, ...)
  • Alfresco FTS
  • CMIS QL + extensions
• DB based
  • XPath
  • Specific APIs (using the child association table)
    • NodeService – selecting children
    • PersonService – lookup people
15
Overview
Issues
• Factory abstraction
• Transaction vs Snapshot
• Query language abstraction
• Repo reliance on the Lucene index
• Cross locale support
• Rebuild
• Cluster (loss of consistency)
• Lucene limitations
  • Delete/add and reindexing
• DB schema for properties
• Read permission evaluation
• One big store
• Analyser configuration
• Associations
• Richer data model control - analysis
16
Direction
Query Language Abstraction
• Alfresco FTS
• CMIS QL
Parser (Alfresco FTS, CMIS QL, ...) → Abstract Query Representation → Query Engine (Lucene)
17
Direction
Query Language Abstraction
• Alfresco FTS
• CMIS QL
Parser (Alfresco FTS, CMIS QL, ...) → Abstract Query Representation → Query Engine (Lucene, SOLR, DB/SQL)
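The layering on this slide can be sketched in Python as a toy pipeline. Everything here is hypothetical: the `Equals` class, the two parsers (each handles only one trivial query form) and the engine stand-ins are illustrations of the idea, not Alfresco classes.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Equals:
    """Abstract query representation: property equals value."""
    prop: str
    value: str


def parse_fts(text):
    # Toy Alfresco FTS parser: handles only '=field:value'.
    match = re.fullmatch(r"=(\S+):(\S+)", text)
    if match is None:
        raise ValueError("unsupported FTS query: " + text)
    return Equals(match.group(1), match.group(2))


def parse_cmis(text):
    # Toy CMIS QL parser: handles only "... WHERE prop = 'value'".
    match = re.search(r"WHERE\s+(\S+)\s*=\s*'([^']*)'", text, re.IGNORECASE)
    if match is None:
        raise ValueError("unsupported CMIS query: " + text)
    return Equals(match.group(1), match.group(2))


def make_engine(name):
    # Stand-in for a real engine: it only describes what it would run.
    def execute(query):
        return f"{name} runs {query.prop} = '{query.value}'"
    return execute


PARSERS = {"fts": parse_fts, "cmis": parse_cmis}
ENGINES = {name: make_engine(name) for name in ("lucene", "solr", "db")}


def run(language, text, engine):
    # Any language can be paired with any engine via the abstract form.
    return ENGINES[engine](PARSERS[language](text))


print(run("fts", "=cmis:name:woof", "solr"))
print(run("cmis", "SELECT * FROM cmis:document WHERE cmis:name = 'woof'", "db"))
```

Both query languages reduce to the same abstract object, so adding an engine does not require touching the parsers.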
18
Direction
SOLR
• Data model integration
• Tracking – eventual consistency
  • Not suitable for RM
• Query time ACL filtering
• PATH support
• SOLR scalability and elasticity
• Faceting etc.
19
Alfresco FTS
Introduction
• CMIS QL FTS (almost)
• Google
• Lucene
• Developer/App Customisation
  • Define the default namespace (e.g. allow the user to drop cm:)
  • Disable/enable/modify certain language features
  • Define templates
  • Define the default field, simple templates for users
  • Share defines the “keywords” template as the default field
    • "%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT)"
20
Alfresco FTS
Syntax
• Term (exact/tokenised)
• Phrase
• Conjunction/Disjunction/Negation/Boosting
• Fields
• Wildcards
• Ranges
• Fuzzy matching
• Proximity
• Templates
• See http://wiki.alfresco.com/wiki/Full_Text_Search_Query_Syntax
21
Alfresco FTS
Template example
"%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT)"
=keywords:woof
=cm:name:woof =cm:title:woof =cm:description:woof =ia:whatEvent:woof =ia:descriptionEvent:woof =lnk:title:woof =lnk:description:woof =TEXT:woof
22
Alfresco FTS
Template example – relevance tuning
"%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT^2)"
=keywords:woof
=cm:name:woof =cm:title:woof =cm:description:woof =ia:whatEvent:woof =ia:descriptionEvent:woof =lnk:title:woof =lnk:description:woof =TEXT:woof^2
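The expansion shown on the two template slides can be reproduced with a short Python sketch. `expand_template` is a hypothetical helper, not the Alfresco parser; it assumes a template is a space-separated field list inside `%( ... )`, with an optional `^boost` per field.

```python
def expand_template(template, term, prefix="="):
    """Expand a '%(field field^boost ...)' template for a single term."""
    if not (template.startswith("%(") and template.endswith(")")):
        raise ValueError("expected a template of the form %(field ...)")
    clauses = []
    for entry in template[2:-1].split():
        # A field may carry its own boost, e.g. TEXT^2.
        field, _, boost = entry.partition("^")
        clause = f"{prefix}{field}:{term}"
        if boost:
            clause += "^" + boost
        clauses.append(clause)
    return " ".join(clauses)


KEYWORDS = ("%(cm:name cm:title cm:description ia:whatEvent "
            "ia:descriptionEvent lnk:title lnk:description TEXT^2)")

# Corresponds to the query =keywords:woof on the relevance tuning slide.
print(expand_template(KEYWORDS, "woof"))
```

The boost stays attached to the field it was declared on, which is how the template tunes relevance without the user typing anything extra.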
23
CMIS QL
Introduction
• Use via CMIS or the SearchService
• Read-only relational view of the repository
• Subset of SQL-92 with extensions
  • Type inheritance
  • Multi-valued properties
  • Full text search
    • CONTAINS()
    • SCORE()
  • Location
    • IN_FOLDER()
    • IN_TREE()
24
CMIS QL
Alfresco extensions
• JOIN to aspects only
  • SELECT D.*, O.* FROM cmis:document AS D JOIN cm:ownable AS O ON D.cmis:objectId = O.cmis:objectId
  • No JOIN between types/nodes yet
• Use Alfresco FTS instead of CMIS QL FTS
  • SELECT * FROM cmis:document WHERE CONTAINS('cmis:name:\'test*\'')
• Relax some constraints
  • SCORE() can be used on its own
  • Multi-valued properties (mvps) can use single-valued property (svp) syntax for IN, LIKE and comparisons
  • Queries more robust if the data model changes
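The CONTAINS() example needs the quotes of the embedded Alfresco FTS expression escaped. A minimal Python sketch of that step follows; `cmis_contains` is a hypothetical helper, and it assumes the CMIS QL rule that a backslash escapes `'` and `\` inside a string literal.

```python
def cmis_contains(fts_expression):
    """Wrap an Alfresco FTS expression in a CMIS QL CONTAINS() predicate."""
    # Escape backslashes first so the quote escapes added next are not doubled.
    escaped = fts_expression.replace("\\", "\\\\").replace("'", "\\'")
    return "CONTAINS('" + escaped + "')"


query = "SELECT * FROM cmis:document WHERE " + cmis_contains("cmis:name:'test*'")
print(query)
```

Building the predicate through one helper keeps the escaping in a single place when the FTS expression comes from user input.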
25
Learn More
wiki.alfresco.com
forums.alfresco.com
twitter: @AlfrescoECM