
Alfresco WebScript Connector for Apache ManifoldCF

Jan 15, 2015


A quick overview of Apache ManifoldCF and the latest work on the new Alfresco connector, shown at Codemotion Rome 2013
Transcript
Page 1: Alfresco WebScript Connector for Apache ManifoldCF

Apache ManifoldCF
Alfresco WebScript Repository Connector

Alfresco Meetup Rome 2013

Page 2: Alfresco WebScript Connector for Apache ManifoldCF

About me
● Open Source ECM Specialist at Sourcesense
● Author and Technical Reviewer at Packt Publishing
  ○ Alfresco 3 Web Services (2010)
  ○ GateIn Cookbook (2012)
● Alfresco Community (nickname OpenPj)
  ○ Alfresco Community Star
  ○ Alfresco Wiki Gardener
  ○ Top 10 supporter (English and Italian)
  ○ Moderator of the Italian forum
● PMC Member and Committer at the Apache Software Foundation
● JBoss Community
  ○ Content editor for jboss.org
  ○ Project Leader and Committer for PortletSwap / Blog / Wiki

Page 3: Alfresco WebScript Connector for Apache ManifoldCF

Overview
● Introducing Apache ManifoldCF
  ○ What is ManifoldCF?
  ○ Why ManifoldCF?
  ○ Architecture
  ○ Who is using ManifoldCF?
  ○ The book
● How ManifoldCF supports Alfresco
● The goal of the new connector
  ○ Architecture
  ○ Roadmap
  ○ The team
● Resources

Page 4: Alfresco WebScript Connector for Apache ManifoldCF

The story
The original ManifoldCF code base was granted by MetaCarta to the Apache Software Foundation in December 2009. The MetaCarta effort represented more than five years of successful development and testing in multiple, challenging enterprise environments. The project graduated as an Apache Top Level Project in July 2012.

Page 5: Alfresco WebScript Connector for Apache ManifoldCF

What is ManifoldCF?
Open Source crawler
● crawling model (add, change, delete)
● schedule jobs to create indexes
  ○ get contents from repositories
  ○ push contents to search servers

[Diagram: Repository 1, Repository 2, Repository 3 → Apache ManifoldCF → Search Server 1, Search Server 2, Search Server 3]

Page 6: Alfresco WebScript Connector for Apache ManifoldCF

What is ManifoldCF?

● Out of the box, it is distributed as a webapp
  ○ REST API
  ○ Authority Service
  ○ Crawler UI
● It can be embedded in any Java application

Page 7: Alfresco WebScript Connector for Apache ManifoldCF

Why ManifoldCF?
● Reliability
● Incremental
● Flexible
● Multi repositories
● Security model
● Monitoring

Page 8: Alfresco WebScript Connector for Apache ManifoldCF

Why ManifoldCF? - Reliability
Job scheduling and configuration are stored in the database to maintain the state of all executions.

[Diagram: Repository, Pull Agent Daemon, Search Server, and a Database holding configuration and scheduling]
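
The idea of keeping job state in a database so that an interrupted run can resume can be sketched as follows. This is only an illustration under stated assumptions: the `job_state` table, the function names, and the `stop_after` parameter (used to simulate an interruption) are hypothetical and are not ManifoldCF's actual schema or API.

```python
import sqlite3


def open_state(path=":memory:"):
    """Open (or create) the hypothetical job-state database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_state ("
        "job TEXT PRIMARY KEY, status TEXT NOT NULL, last_doc TEXT)"
    )
    return conn


def save_progress(conn, job, status, last_doc):
    """Persist the job status and the last document handled."""
    with conn:  # commits on success
        conn.execute(
            "INSERT OR REPLACE INTO job_state (job, status, last_doc) "
            "VALUES (?, ?, ?)",
            (job, status, last_doc),
        )


def load_progress(conn, job):
    """Return (status, last_doc), or None if the job never ran."""
    return conn.execute(
        "SELECT status, last_doc FROM job_state WHERE job = ?", (job,)
    ).fetchone()


def run_job(conn, job, docs, stop_after=None):
    """Process docs in order, resuming after the last recorded document.

    stop_after simulates an interruption after that many documents.
    """
    state = load_progress(conn, job)
    start = 0
    if state is not None and state[1] in docs:
        start = docs.index(state[1]) + 1
    processed = []
    for doc in docs[start:]:
        if stop_after is not None and len(processed) >= stop_after:
            return processed  # interrupted: state stays "running"
        processed.append(doc)
        save_progress(conn, job, "running", doc)
    save_progress(conn, job, "done", docs[-1] if docs else None)
    return processed
```

Because progress is written after every document, a second call to `run_job` with the same job name picks up where the first one stopped instead of starting over.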

Page 9: Alfresco WebScript Connector for Apache ManifoldCF

Why ManifoldCF? - Incremental
Content changesets are obtained from the repository API.

[Diagram: Repository → complete changesets → Apache ManifoldCF]
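
When a repository can report complete changesets, an incremental sync only needs to remember how far it has read. A minimal sketch, assuming a hypothetical in-memory repository with a numbered change log (`FakeRepository`, `changes_since`, and `sync` are illustrative names, not part of ManifoldCF or Alfresco):

```python
class FakeRepository:
    """Hypothetical repository that keeps a numbered change log."""

    def __init__(self):
        self._log = []  # entries are (sequence, doc_id, action)

    def record(self, doc_id, action):
        """Append a change; action is 'add', 'change' or 'delete'."""
        self._log.append((len(self._log) + 1, doc_id, action))

    def changes_since(self, sequence):
        """Return every change recorded after the given sequence number."""
        return [entry for entry in self._log if entry[0] > sequence]


def sync(repo, index, last_sequence):
    """Apply only the new changes to the index and return the new cursor."""
    for sequence, doc_id, action in repo.changes_since(last_sequence):
        if action == "delete":
            index.discard(doc_id)
        else:  # 'add' and 'change' both (re)index the document
            index.add(doc_id)
        last_sequence = sequence
    return last_sequence
```

Here `index` is just a Python `set` standing in for a search server; the returned cursor is what a real job would store between runs.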

Page 10: Alfresco WebScript Connector for Apache ManifoldCF

Why ManifoldCF? - Flexible

If the repository can't supply all the changes, ManifoldCF can discover them through crawling.

[Diagram: Repository → incomplete changesets → Apache ManifoldCF, with Change Discovery over nodes N1 and N2]
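
One way to picture change discovery is to compare a version fingerprint per document between two crawls. The sketch below assumes each crawl produces a plain dictionary from document id to version string; the function name and data shape are hypothetical and only illustrate the add / change / delete model.

```python
def discover_changes(previous, current):
    """Compare two crawl snapshots (doc_id -> version string).

    Returns three sorted lists: added, changed and deleted document ids.
    """
    added = sorted(d for d in current if d not in previous)
    changed = sorted(
        d for d in current if d in previous and current[d] != previous[d]
    )
    deleted = sorted(d for d in previous if d not in current)
    return added, changed, deleted
```

For example, `discover_changes({"n1": "v1", "n2": "v1"}, {"n1": "v2", "n3": "v1"})` reports `n3` as added, `n1` as changed, and `n2` as deleted, without the repository ever announcing those changes itself.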

Page 11: Alfresco WebScript Connector for Apache ManifoldCF

Why ManifoldCF? - Multi repositories
Jobs can retrieve contents from the following repositories:
● CMIS-compliant
● Alfresco
● IBM FileNet
● EMC Documentum
● Microsoft SharePoint
● OpenText LiveLink
● Autonomy Meridio
● Memex Patriarch
● Windows Share/DFS
● Generic JDBC
● Generic Filesystem
● Generic RSS and Web

Page 12: Alfresco WebScript Connector for Apache ManifoldCF

Why ManifoldCF? - Multi repositories
Jobs can ingest contents into the following search servers:
● Apache Solr
● ElasticSearch
● OpenSearchServer
● MetaCarta GTS

Page 13: Alfresco WebScript Connector for Apache ManifoldCF

Why ManifoldCF? - Security model

Retrieve per-content ACLs

[Diagram: the Pull Agent Daemon crawls Repository 1, 2, 3 and sends doc access tokens to the Search Server; the Authority Service consults Authority 1, 2, 3 to provide user access tokens; together they yield user-specific search results]
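
The matching of user access tokens against doc access tokens can be sketched as a simple filter. This is an assumption-laden illustration: the `allow` / `deny` field names and both function names are hypothetical, and a real deployment would apply the filter inside the search server rather than in client code.

```python
def is_visible(user_tokens, allow_tokens, deny_tokens):
    """A document is visible if no deny token matches and at least one
    allow token matches the user's access tokens."""
    user = set(user_tokens)
    if user & set(deny_tokens):
        return False
    return bool(user & set(allow_tokens))


def filter_results(results, user_tokens):
    """Keep only the ids of the documents the user is allowed to see."""
    return [
        doc["id"]
        for doc in results
        if is_visible(user_tokens, doc["allow"], doc["deny"])
    ]
```

Deny tokens are checked first in this sketch, so a user who belongs to both an allowed and a denied group does not see the document.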

Page 14: Alfresco WebScript Connector for Apache ManifoldCF

Why ManifoldCF? - Monitoring
The Crawler UI allows you to:
● configure jobs and connectors
● monitor job execution
● monitor content ingestion
  ○ status reports
    ■ document status
    ■ queue status
  ○ history reports
    ■ simple history
    ■ maximum activity
    ■ maximum bandwidth
    ■ result histogram

Page 15: Alfresco WebScript Connector for Apache ManifoldCF

Architecture - Job

[Diagram: a Job connects a Repository to a Search Server]
● Job: verbal description, crawling model, scheduling
● Repository Connector: query to retrieve contents
● Authority Connector: retrieve content ACL (ACLs)
● Output Connector: metadata mapping, content ingestion
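
How a job ties the connector roles together can be sketched with plain callables. Everything here is hypothetical: the `Job` class and its `fetch`, `acl_for`, and `push` fields only mirror the roles named on the slide and do not reproduce ManifoldCF's Java connector interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Job:
    """A job wires one repository side to one output side."""

    description: str
    # Repository connector role: yields (doc_id, content) pairs.
    fetch: Callable[[], Iterable[Tuple[str, str]]]
    # ACL lookup role: returns the access tokens for a document.
    acl_for: Callable[[str], List[str]]
    # Output connector role: ingests content plus ACLs.
    push: Callable[[str, str, List[str]], None]

    def run(self) -> int:
        """Move every fetched document to the output; return the count."""
        count = 0
        for doc_id, content in self.fetch():
            self.push(doc_id, content, self.acl_for(doc_id))
            count += 1
        return count
```

A usage example would pass a function listing repository documents as `fetch` and a function writing to a dictionary (standing in for a search index) as `push`; swapping either side does not change the job itself.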

Page 16: Alfresco WebScript Connector for Apache ManifoldCF

Who is using ManifoldCF?

Page 17: Alfresco WebScript Connector for Apache ManifoldCF

The book: ManifoldCF in Action

by Karl Wright, published by Manning

Karl is the original developer and the principal committer of Apache ManifoldCF

The book is available at http://www.manning.com/wright

Page 18: Alfresco WebScript Connector for Apache ManifoldCF

How ManifoldCF supports Alfresco
● CMIS Repository Connector based on OpenCMIS
● The current Alfresco Repository Connector only supports CML
  ○ works on any version of Alfresco 2.x, 3.x and 4.x
  ○ no support for querying Solr from Alfresco
  ○ it will die at the end of the year
  ○ please see the Alfresco Roadmap

Page 19: Alfresco WebScript Connector for Apache ManifoldCF

Alfresco Solr search subsystem
● Remote crawling of contents and ACLs into Solr
  ○ REST API for retrieving changesets from the Alfresco db
● Solr server provided by Alfresco
  ○ based on Apache Solr 1.4.1 (uhm...really!!!???)
    ● hardcoded
    ● can't be used with your own Solr instance
  ○ customers have newer versions of Solr
    ■ interested in new features (SolrCloud, sharding...)
    ■ hundreds of improvements available in 3.x and 4.x

Page 20: Alfresco WebScript Connector for Apache ManifoldCF

Alfresco Solr search subsystem

[Diagram: Solr 1.4.1 (provided by Alfresco) uses the Alfresco REST Client to read Transactions and ACL from Alfresco (alf_transaction, alf_acl_*, alf_node_*) and builds the Indexes]

Page 21: Alfresco WebScript Connector for Apache ManifoldCF

Roadmap

Page 22: Alfresco WebScript Connector for Apache ManifoldCF

Goal - 1
Create a new connector using the Alfresco REST Client
● provided and supported by Alfresco
  ○ for us it is just a Maven dependency :)
● invokes the Alfresco Solr API

Page 23: Alfresco WebScript Connector for Apache ManifoldCF

Goal - 2 - check feasibility
Create a real Enterprise alternative for managing indexes
● compatibility with the SearchService of Alfresco
● repository takes care only of contents
● indexes are managed externally
● no redundancy for indexes

Effort is needed to redirect query execution

Page 24: Alfresco WebScript Connector for Apache ManifoldCF

Goal - 3 - Security
Implement an Alfresco authority connector
  ○ manages ACLs indexing

Page 25: Alfresco WebScript Connector for Apache ManifoldCF

Goal - 4
Manage indexes using ManifoldCF against any supported search server
● Apache Solr 3.x / 4.x
● ElasticSearch
● OpenSearchServer
● MetaCarta

Page 26: Alfresco WebScript Connector for Apache ManifoldCF

Architecture

[Diagram: Alfresco (alf_transaction, alf_acl_*, alf_node_*) → Alfresco REST Client → ManifoldCF Alfresco WebScript Repository Connector → Output Connector → Search Server (Indexes)]

Page 27: Alfresco WebScript Connector for Apache ManifoldCF

The team of the new connector

● Piergiorgio Lucidi (Sourcesense + ASF)

● Maurizio Pillitu (Alfresco)

● Aingaran Pillai (Zaizi) [new entry]

● Fran Alvarez (Zaizi) [new entry]

● Abraham Ayala (Zaizi) [new entry]

Page 28: Alfresco WebScript Connector for Apache ManifoldCF

Join us!

● We are looking for developers

● this is a work in progress

● don't fork the project; feel free to join us

^__^

Page 29: Alfresco WebScript Connector for Apache ManifoldCF

Resources

● Apache ManifoldCF: http://manifoldcf.apache.org/

● The connector is hosted on GitHub: https://github.com/maoo/alfresco-webscript-manifold-connector

● it will be included in Apache ManifoldCF

Page 30: Alfresco WebScript Connector for Apache ManifoldCF

Thank you for your attention!

http://www.open4dev.com