A Content Management Primer: What I Wish I Knew
Richard Esplin
Community Technology
May 08, 2015
Patterns for Handling Content in Applications
Why Relational Won't Cut It
Solving SharePoint Type Problems With An Open Source Stack
Agenda
● Making the case for content management
● Best practices: the platform approach
● Introducing CMIS
● Live examples
What is Alfresco?
Enterprise content management platform across cloud, on-premise, or both
API for content applications that can run in the cloud, on-premise, or both
Content hub for your enterprise: tablets, cloud, on-premise, hybrid cloud sync
What is “content”?
● Data
  ● Don't mistake Code for Content
● Unstructured Data
  ● Structured data works well in a relational data store, XML store, or key-value store
● Unstructured Binary Data
  ● Unstructured non-binary data works well in source control
● Examples:
  ● Audio, Video, Images, Office Documents, Engineering Files, Reports
What is a “content-centric application”?
● Applications that access binary files
● Files are often generated collaboratively
● Often must deal with large numbers of files
● May include a mix of structured and unstructured content
● May also include business processes
A few examples
● Web site with catalogs, white papers, and videos
● Expense report review and approval
● Contract negotiation, creation, and review
● Research study authoring
● Sales / Marketing collateral creation and communication
● Course guide authoring and publishing
● Images and media in games
● Media curation, transformation, and delivery
● Legal compliance and corporate records management
Or the business is saying . . .
● I've got a ton of files,
● I've got people that produce and consume them,
● I've got systems that use them,
● I want to make it easier!
Doug Waldron (cc attribution share-alike) http://www.flickr.com/photos/dougww/922328173/
Let's build it ourselves!
Pasukaru76 (cc attribution) http://www.flickr.com/photos/pasukaru76/4277763808/
DIY approach seems simple . . .
● “This is simple stuff.”
● Grab a web-application toolkit
● Favorite front-end / presentation framework
● Store a bunch of files
● Relational Database
  ● Data Model / Metadata
  ● Comments / Ratings
  ● Tagging / Categorization
File storage options
● On disk
● Amazon S3 or an internal CAS filer
● Source code control repository
● XML database
● NoSQL document store
Relational may not cut it
● Good at text and numbers. Not so good at binary.
● Good at static table definitions. Not so good at dynamic aspects.
● Size limits.
● Random seek (streaming).
● Search: Some relational databases can index into blobs, but not all.
Once files are figured out . . .
● Ensure security
● Execute a workflow
● Transform the content between types
● Schedule a job
● Provide shared drive access
● Versioning
● Replication
● API Access
● Integrate with authoring tools
Lots of custom code!
The optimistic scenario
gobucks2 (cc attribution non-commercial share-alike) http://www.flickr.com/photos/69331170@N00/2854583096
The pessimistic scenario
http://commons.wikimedia.org/wiki/File:Professor_Lucifer_Butts.gif
Evaluating DIY reasonableness
● Number and size of documents
● Number and concurrency of users
● Number and nature of integration points
● Business process volatility and complexity
● Time and cost of
  ● Integrating all of these services / sub-systems
  ● Maintaining all of that code . . . forever
● Access to off-the-shelf alternatives
Introducing the content repository
● Content = a file + metadata
● File system
  ● Content binaries
  ● Search indexes
● Database
  ● Relations (associations)
  ● Metadata
● Repository
  ● Abstraction layer
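The split described on this slide (binaries on the file system, metadata in a database, a repository as the abstraction over both) can be sketched in a few lines of Python 3. This is a minimal illustration, not how Alfresco or any other product is implemented: `TinyRepository`, its `put` / `get` methods, the `node` table, and the choice of SHA-256 content addressing are all assumptions made for the example.

```python
import hashlib
import json
import sqlite3
import tempfile
from pathlib import Path


class TinyRepository:
    """Hypothetical sketch: binaries on the file system, metadata in a database."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(str(self.root / "metadata.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS node ("
            "id TEXT PRIMARY KEY, name TEXT, mimetype TEXT, props TEXT)"
        )

    def put(self, name, data, mimetype, **props):
        # Content-addressed storage: the SHA-256 of the bytes is the node id.
        node_id = hashlib.sha256(data).hexdigest()
        (self.root / node_id).write_bytes(data)
        self.db.execute(
            "INSERT OR REPLACE INTO node VALUES (?, ?, ?, ?)",
            (node_id, name, mimetype, json.dumps(props)),
        )
        self.db.commit()
        return node_id

    def get(self, node_id):
        # Metadata comes from the database, the binary from the file system.
        row = self.db.execute(
            "SELECT name, mimetype, props FROM node WHERE id = ?", (node_id,)
        ).fetchone()
        if row is None:
            raise KeyError(node_id)
        name, mimetype, props = row
        metadata = {"name": name, "mimetype": mimetype, **json.loads(props)}
        return (self.root / node_id).read_bytes(), metadata


repo = TinyRepository(tempfile.mkdtemp())
node_id = repo.put("report.txt", b"quarterly numbers", "text/plain", author="alice")
content, metadata = repo.get(node_id)
print(metadata["name"], metadata["author"], len(content))
```

Callers only see `put` and `get`; where the bytes and the metadata actually live is hidden behind the repository, which is the point of the abstraction layer. Everything else on the following slides (security, search, versioning, transformation) is what a real repository adds on top.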
Components of content-centric systems
● User Interface
● Persistence / Data Model / Metadata
● Business Process / Workflow
● Library Services (Upload / Download, Versioning, Check-in / Check-out)
● Security
● Search
● Scheduler
● Transformation / Rendition / Thumbnails
● Tagging / Categorization
● Authoring tool integration
● Remote API
● Transfer / Publication
● Comments
● Ratings
● Activity Streams / Notification
● Quotas
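To give a feel for what even one of these components involves, here is a rough Python 3 sketch of the check-in / check-out part of library services. It is an in-memory toy under stated assumptions: `VersionedDocument` and its methods are hypothetical names, and a real system would also need persistence, permissions, and concurrency control.

```python
class VersionedDocument:
    """Hypothetical in-memory model of check-out / check-in with version history."""

    def __init__(self, name, content):
        self.name = name
        self.versions = [content]  # index 0 holds version 1
        self.checked_out_by = None
        self._working_copy = None

    def check_out(self, user):
        # Only one user may hold the check-out at a time.
        if self.checked_out_by is not None:
            raise RuntimeError("already checked out by %s" % self.checked_out_by)
        self.checked_out_by = user
        self._working_copy = self.versions[-1]

    def update(self, user, content):
        # Edits go to the private working copy, not to the version history.
        if self.checked_out_by != user:
            raise PermissionError("%s does not hold the check-out" % user)
        self._working_copy = content

    def check_in(self, user):
        # Checking in appends a new version and releases the lock.
        if self.checked_out_by != user:
            raise PermissionError("%s does not hold the check-out" % user)
        self.versions.append(self._working_copy)
        self.checked_out_by = None
        self._working_copy = None
        return len(self.versions)


doc = VersionedDocument("contract.docx", b"draft 1")
doc.check_out("alice")
doc.update("alice", b"draft 2")
new_version = doc.check_in("alice")
print(new_version, doc.versions[-1])
```

Multiply this by every item in the list above and the cost of the do-it-yourself route becomes easier to estimate.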
Packaged systems
Open source content management
● Alfresco
● Nuxeo
● Knowledge Tree
● Magnolia
● Apache Jackrabbit
● Plone
  ● (cmis4plone)
Best Practice: The Platform Approach
Platform approach
● The common problems have been solved
● Content Platform = Repository + Services
● Find a platform that meets your needs
● Extend the platform with your own business logic
● Customize the UI that the platform provides
● Or write your own front-end using whatever language or framework makes sense
● Meets your current needs while providing a roadmap for the future
Evaluating content platforms
● Agility
  ● Applicable to a broad set of solutions vs a vertical specific solution
  ● Scale up, scale down
● Developer ergonomics
  ● Fast and friendly developer model
● Open Source
  ● Troubleshooting
  ● Bug tracking
  ● Community
● Standards compliance
  ● Easier integration
  ● Lower migration costs
  ● Developer familiarity
General architecture
[Architecture diagram. Labels: Web Applications, Knowledge Portals, Web Services, Virtual File System, High Availability, Business Process Engine, CRM, Portal Server, App Server, Desktop, Mobile, Social Media Channels, Public Alfresco Cloud, Corporate Systems. Integration points: Open Web APIs, CMIS, JSR-168, Connectors, WebDAV, CIFS, SharePoint Protocol, CMIS-based Alfresco Sync.]
What is CMIS?
● Content Management Interoperability Services
● Language-independent, vendor-neutral API for content management
● Least-common-denominator (some vendors have extensions)
● CRUD functions for nodes
● Check-in / check-out
● Associations
● Permissions (Access Control Lists)
● Policies
● Queries
● Repository Traversal
What is CMIS?
● OASIS standard
  ● 30+ ECM vendors agreed to implement
● Two parts
  ● Interoperability through standard SOAP and AtomPub bindings
    – JSON bindings coming soon
  ● SQL-based query language for rich content repositories
● Vendor specific extensions may be useful
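To make the "SQL-based query language" concrete, the sketch below builds two CMIS query strings in Python. The helper names (`cmis_quote`, `name_like_query`, `full_text_query`) are hypothetical. The backslash escaping of quotes and backslashes follows my reading of the CMIS specification, so verify it against your repository before relying on it. Full-text search via `CONTAINS()` is an optional capability, and its search expression has its own escaping rules, so the sketch simply refuses input that would need them.

```python
def cmis_quote(value):
    """Quote a string literal for a CMIS query, escaping backslash and single quote."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def name_like_query(type_id, pattern):
    # Hypothetical helper: objects of a type whose cmis:name matches a LIKE pattern.
    return "SELECT * FROM %s WHERE cmis:name LIKE %s" % (type_id, cmis_quote(pattern))


def full_text_query(type_id, text):
    # CONTAINS() is the full-text predicate; support depends on the repository.
    # Assumption: only plain words are accepted, to sidestep search-expression escaping.
    if "'" in text or "\\" in text:
        raise ValueError("this sketch only handles plain search words")
    return "SELECT * FROM %s WHERE CONTAINS('%s')" % (type_id, text)


print(name_like_query("cmis:folder", "%alf%"))
print(full_text_query("cmis:document", "name"))
```

The resulting strings can be passed to whatever client library you use, for example `repo.query(...)` in cmislib.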
Use cases
● Collaborative content creation
● Portals
● Client application integration
● Mashups
● Embedded content store
● Workflow & BPM
● Archival
● Document generation
● Digital Asset Management (DAM)
● Web Content Management (WCM)
[Diagram: a client connected to several content repositories]
The beauty of CMIS
[Diagram: Presentation Tier, Content Services Tier, and Enterprise Apps Tier, with the connections between them marked "?" and labeled REST and SOAP]
Meet CMIS
[Diagram: Content Management Interoperability Services. A client (consumer) reads and writes through the CMIS services and domain model; a content repository (provider) exposes them through a vendor mapping.]
CMIS lets you read, search, write, update, delete, version, control, … content and metadata!
Types
Document
● Content
● Renditions
● Version History
Folder
● Container
● Hierarchy
● Filing
Relationship
● Source Object
● Target Object
ACL
● Target Object
Policy
● Target Object
Described by Type Definitions
Type Definitions
Object
● Type Id
● Parent
● Display Name
● Queryable
● Controllable
Document
● Versionable
● Allow Content
Folder
Relationship
● Source Types
● Target Types
Policy
Custom Type
Property (*)
● Property Id
● Display Name
● Type
● Required
● Default Value
● …
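One way to picture how type definitions, parents, and property definitions fit together is the Python 3 sketch below. It is a simplified model for illustration only, not the CMIS data model itself: the class and field names loosely mirror the slide, and `acme:contract` and `acme:expiry` are made-up identifiers for a custom type.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PropertyDefinition:
    property_id: str
    display_name: str
    property_type: str  # e.g. "string", "datetime", "boolean"
    required: bool = False
    default_value: Optional[object] = None


@dataclass
class TypeDefinition:
    type_id: str
    display_name: str
    parent: Optional["TypeDefinition"] = None
    queryable: bool = True
    controllable: bool = False
    properties: Dict[str, PropertyDefinition] = field(default_factory=dict)

    def all_properties(self):
        """Own property definitions plus everything inherited from parent types."""
        inherited = self.parent.all_properties() if self.parent else {}
        return {**inherited, **self.properties}


document = TypeDefinition(
    "cmis:document", "Document",
    properties={"cmis:name": PropertyDefinition("cmis:name", "Name", "string", required=True)},
)
# "acme:contract" and "acme:expiry" are made-up ids for a custom type.
contract = TypeDefinition(
    "acme:contract", "Contract", parent=document,
    properties={"acme:expiry": PropertyDefinition("acme:expiry", "Expiry Date", "datetime")},
)
print(sorted(contract.all_properties()))
```

A custom type adds its own properties and inherits the rest from its parent, which is why a client that only understands `cmis:document` can still work with documents of a custom type.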
Apache Chemistry
● Open Source implementations of CMIS
● Umbrella project for all CMIS related projects within the ASF
  ● OpenCMIS (Java, client and server)
  ● cmislib (Python, client)
  ● phpclient (PHP, client)
  ● DotCMIS (.NET, client)
● De-facto reference for CMIS and used by CMIS technical committee to test 1.1 features
Examples
My setup
● Debian Mint Wheezy
● OpenJDK 1.6.0_24
● Python 2.7.2
● Alfresco Community Edition 4.0.d
● Open CMIS Workbench 0.7.0
CMIS Workbench
● Download
  ● http://chemistry.apache.org/java/developing/tools/dev-tools-workbench.html
● Connect to Alfresco
  ● http://localhost:8080/alfresco/cmisatom
● Good tool for figuring out what CMIS can do
● Check out the Groovy Console!
Python
● In the shell:

    virtualenv .
    ./bin/easy_install cmislib
    ./bin/python

● In the interpreter:

    from cmislib.model import CmisClient

    # Connect to the AtomPub binding of the local Alfresco server
    client = CmisClient("http://192.168.56.1:8080/alfresco/cmisatom", "admin", "admin")
    repo = client.defaultRepository
    repo.id
    repo.name

    # List what the repository supports and how it describes itself
    for (k, v) in repo.getCapabilities().items():
        print("%s: %s" % (k, v))
    for (k, v) in repo.getRepositoryInfo().items():
        print("%s: %s" % (k, v))

    # Create a folder under the root and inspect its properties
    root = repo.getRootFolder()
    root.name
    folder = root.createFolder('cmis-demo')
    folder.id
    folder.name
    for (k, v) in folder.properties.items():
        print("%s: %s" % (k, v))

● Continued:

    # Create a text document in the new folder
    props = {}
    props["cmis:objectTypeId"] = "cmis:document"
    doc = folder.createDocumentFromString(
        'testdoc.txt', props,
        contentString="This is a test showing how to create a text document",
        contentType='text/plain')
    doc.isCheckedOut()

    # Rename the document, then delete it
    props = {}
    props['cmis:name'] = "test-updated.txt"
    doc = doc.updateProperties(props)
    doc.name
    doc.delete()
    len(folder.getChildren())

    # Query by name pattern
    result = repo.query("select * from cmis:folder where cmis:name like '%alf%'")
    len(result)
    for i in result:
        print(i.name)

    # Full-text query
    result = repo.query("select * from cmis:document where contains('name')")
    for i in result:
        print(i.name)
PHP and Drupal
● Drupal CMIS Views
  ● http://drupal.org/project/cmis_views
● Built on Drupal CMIS
  ● http://drupal.org/project/cmis
  ● Configure a repository in settings.php
  ● Enable cmis_sync
  ● Bundles an early release of phplib
● Currently read-only
● Good for exposing unstructured data alongside a structured web page
Where to learn more
● cmis.alfresco.com includes a public CMIS server and links to CMIS resources (check out the cheat sheet)
● Read the CMIS specification
● Apache Chemistry site has clients, lightweight server, documentation
● “Getting Started with CMIS” tutorial shows how to use cURL to hit AtomPub bindings directly
● Slideshare has some CMIS related presentations from Alfresco DevCon
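For readers who would rather stay in Python than use cURL, the same idea of talking to the AtomPub binding directly can be sketched with the standard library (Python 3). The sketch builds an authenticated request for the service document and extracts repository ids from a response. The URL and the `admin` / `admin` credentials are the ones from the demo setup earlier in this deck; the `SAMPLE` document is an abbreviated, hand-written stand-in, and `service_request` and `repository_ids` are hypothetical helper names.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET

# XML namespaces used by the CMIS AtomPub service document.
NS = {
    "app": "http://www.w3.org/2007/app",
    "cmis": "http://docs.oasis-open.org/ns/cmis/core/200908/",
    "cmisra": "http://docs.oasis-open.org/ns/cmis/restatom/200908/",
}


def service_request(url, user, password):
    """Build (but do not send) a GET request with HTTP Basic authentication."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": "Basic " + token})


def repository_ids(service_xml):
    """Return the repository ids listed in an AtomPub service document."""
    root = ET.fromstring(service_xml)
    path = "app:workspace/cmisra:repositoryInfo/cmis:repositoryId"
    return [node.text for node in root.findall(path, NS)]


# Abbreviated, hand-written sample; a real service document carries much more.
SAMPLE = """<app:service xmlns:app="http://www.w3.org/2007/app"
    xmlns:cmis="http://docs.oasis-open.org/ns/cmis/core/200908/"
    xmlns:cmisra="http://docs.oasis-open.org/ns/cmis/restatom/200908/">
  <app:workspace>
    <cmisra:repositoryInfo>
      <cmis:repositoryId>demo-repo</cmis:repositoryId>
    </cmisra:repositoryInfo>
  </app:workspace>
</app:service>"""

request = service_request("http://localhost:8080/alfresco/cmisatom", "admin", "admin")
print(request.get_method(), request.full_url)
print(repository_ids(SAMPLE))
# Against a live server, urllib.request.urlopen(request).read() would supply the XML.
```

In practice a client library such as cmislib hides all of this, but seeing the raw request helps when debugging a connection.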
Questions?
Attribution and Licensing
● Copyright 2012, Alfresco Software
● Some images used in this presentation are licensed under the Creative Commons by-attribution non-commercial share-alike license.
● Original work in this presentation is licensed under the Creative Commons by-attribution license.
● Thanks to Jeff Potts for allowing me to base my presentation on his.