Building a Scalable Digital Asset Management Platform in the Cloud (MED402) | AWS re:Invent 2013

Post on 26-Jan-2015


DESCRIPTION

With the breadth of AWS services available that are relevant to digital media, organizations can readily build out complete content/asset management (DAM/MAM/CMS) solutions in the cloud. This session provides a detailed walkthrough for implementing a scalable, rich-media asset management platform capable of supporting a variety of industry use cases. The session includes a code-level walkthrough, AWS architecture strategies, and integration best practices for content storage, metadata processing, discovery, and overall library management functionality, with particular focus on the use of Amazon S3, Amazon Elastic Transcoder, Amazon DynamoDB, and Amazon CloudSearch. A customer case study highlights the successful use of Amazon CloudSearch by PBS to enable rich discovery of programming content across the breadth of its network catalog.

Transcript

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

MED402: Building a Scalable Video / Digital Asset Management (DAM) Platform in the Cloud

Michael Limcaco – Enterprise Solutions Architect (AWS)

Jonathan Rivers – Director, Technical Operations (PBS)

November 15, 2013

Agenda

• The big picture

• Architecture

• Build-out exercise

• Customer case study (PBS)

• Observations and summary

Big Picture: Enterprise Media Architecture

[Diagram labels: Media Files; Live Stream; Camera; Physical Media; RTMP; MPEG-TS; HD-SDI; Transcoders; Store output profile and file; Content Management, Discovery & Delivery; Integrated Workflow]

Big Picture: Digital Asset Management (DAM)

[Diagram labels: Media Files; Live Stream; Camera; Physical Media; RTMP; MPEG-TS; HD-SDI; Transcoders; Store output profile and file; Content Management, Discovery, & Delivery; Integrated Workflow. DAM layer: Ingest; Processing; Discovery & Delivery; Workflow Management; Storage]

Key DAM Requirements

• Ingest

• Metadata extraction

• Create renditions

• Build the catalog

• Enable rich search

• Manage storage lifecycle

• Provide efficient delivery of media assets


Why Scalable?

• Increasing volume, variety, velocity

– Collectors, cameras, sensors and sources

• Ex: UGC, raw source, mezzanine, B-roll, creative collateral

• Final content

– Formats and standards

• Transport, containers, codecs, metadata

• SD, HD, 4K … 8K

– Devices and user expectations

• Opportunities through cloud enablement

– Media platform as a service

– Multitenancy

What about Search? Ugh …

• Core elements

– Project, keyword, asset name, tags, date/time capture, timecode range, subject, format, size

• Extended structured search

– Dublin Core, XMP, MPEG-7, IPTC, EXIF, FCXML, SMPTE, MISB

• Unstructured search

– Comments, notes, transcript, closed captioning

Enough Theory …

Let’s Build a DAM in the Cloud!

The User Experience

(Notional Reference Client)

(Demo)

Architecture

[Architecture diagram. Functional blocks: DAM Web Interface; DAM Web Service; Event Handler; Mailbox (two); Metadata Processing; Rendition Processing; Catalog; DAM Storage & Archive; Delivery Cache. AWS components: AWS Elastic Beanstalk; EC2 Workers in Auto Scaling groups (two); DynamoDB; Amazon CloudSearch; S3 buckets for renditions and metadata sidecar files]


Tools Available to Us

Need           | Description                                  | AWS Service
Ingest         | Integrate with existing file-based workflows | Amazon S3
Metadata       | Process inline and sidecar files             | EC2 / Elastic Beanstalk
Renditions     | Autogenerate thumbnails and proxies          | Amazon Elastic Transcoder
Catalog part 1 | Administrative entities, simple retrieval    | Amazon DynamoDB
Catalog part 2 | Field and free-form search                   | Amazon CloudSearch
Storage        | Nearline, online, offline infinite storage   | Amazon S3, Amazon Glacier
Delivery       | Global caching and streaming footprint       | Amazon CloudFront

Catalog: A Word on Why DynamoDB

[Diagram: media containers feeding a NoSQL data model. Container-A: Header, Layer-1, Layer-2. Container-B: Header, Layer-1, Layer-2. Container-C: Header]

NoSQL Data Model

Columns: Core Elem1, Core Elem2, Elem from A, Elem from B

Name_A, Size, Some_Field

Name_B, Size, Some_Field

Name_C, Size
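The idea on this slide can be sketched in a few lines: every item shares a small core of attributes, and each container format contributes its own extra elements without a schema change. This is a minimal sketch, assuming an in-memory dict stands in for the DynamoDB table. The attribute names (`NAME`, `SIZE`, `ELEM_FROM_A`, `ELEM_FROM_B`), the sizes, and the `put_item` helper are hypothetical and are not the DynamoDB API.

```python
# In-memory stand-in for a key-value catalog table (not the DynamoDB API).
catalog = {}

# Hypothetical core elements that every asset is expected to carry.
CORE_ATTRIBUTES = ("NAME", "SIZE")


def put_item(table, item):
    """Store an item keyed by NAME; only the core attributes are required."""
    missing = [attr for attr in CORE_ATTRIBUTES if attr not in item]
    if missing:
        raise ValueError(f"missing core attributes: {missing}")
    table[item["NAME"]] = dict(item)


# Three assets from three container formats, each with different extras.
put_item(catalog, {"NAME": "Name_A", "SIZE": 1024, "ELEM_FROM_A": "Some_Field"})
put_item(catalog, {"NAME": "Name_B", "SIZE": 2048, "ELEM_FROM_B": "Some_Field"})
put_item(catalog, {"NAME": "Name_C", "SIZE": 512})

# The set of attributes in use grows with the data, not with a migration.
all_attributes = sorted({attr for item in catalog.values() for attr in item})
print(all_attributes)
```

Items that lack an optional element simply omit it, which is what makes a sparse, per-item attribute set a natural fit for heterogeneous media metadata.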

Catalog: A Word on Why CloudSearch

• Video and text

– Header fields with textual descriptions, synopsis, comments

– Tracks with speech to text, closed caption data

– Links to scripts

• Video and structured elements

– XMP dynamic media

– Sidecar files

• A managed search engine dedicated to these kinds of problems

– Case folding, stemming, stopword removal, synonyms

– Also accent normalization, UTF-8 normalization, etc.
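To make the text-processing terms above concrete, here is a toy analysis chain in plain Python. It is only an illustration of what case folding, accent normalization, stopword removal, stemming, and synonym mapping mean; it is not how Amazon CloudSearch implements them, and the stopword list, synonym table, and suffix rules are all made up for the example.

```python
import unicodedata

STOPWORDS = {"a", "an", "and", "the", "of", "in"}  # tiny illustrative list
SYNONYMS = {"film": "movie"}  # hypothetical synonym mapping


def strip_accents(text):
    """Accent normalization: decompose characters and drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_stem(token):
    """Very rough suffix stripping; real stemmers are far more careful."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Strip accents, case-fold, drop stopwords, stem, then map synonyms."""
    tokens = strip_accents(text).casefold().split()
    tokens = [t.strip(".,;:!?\"'()") for t in tokens]
    tokens = [t for t in tokens if t and t not in STOPWORDS]
    tokens = [naive_stem(t) for t in tokens]
    return [SYNONYMS.get(t, t) for t in tokens]


print(analyze("The Café Films of Downton Abbey"))
```

Running the same chain over both documents and queries is what lets a search for "cafe movie" match a synopsis that says "Café Films".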

Other Goodies

• Back-end services

– AWS CLI

– Open source decode utilities

• ExifTool

• MediaInfo

– ETL support

• Talend (representative)

• Front-end services

– Node.js + AWS Node SDK


[Detailed architecture diagram. Components: Media Content; EC2 Crawler; Amazon S3 Storage for source, renditions, and metadata sidecar files; Amazon SNS Topic; Amazon SQS Queue for rendition jobs; Amazon SQS Queue for metadata processing jobs; Rendition Workers (EC2 ASG); Metadata Workers (EC2 ASG); Elastic Transcoder for proxy / thumbnail generation; DAM Catalog (Amazon DynamoDB, Amazon CloudSearch); DAM Web Service (AWS Elastic Beanstalk); CloudFront Download Distribution]

Walkthrough

(Dual Screen)

Setup

• Amazon Simple Storage Service (S3) buckets ready to go

– External staging locations

– Internal working locations

• Amazon Simple Notification Service (SNS) + Amazon Simple Queue Service (SQS) wired up

• Catalog data models established

– Amazon DynamoDB table “catalog” created

– Amazon CloudSearch search domain “catalog” created


1. Ingest, Crawl, Notify

a. End user initiates data copy

b. EC2 worker scans Amazon S3 staging bucket

c. EC2 worker copies or moves content

d. EC2 worker broadcasts “NEW DATA” event (SNS)
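The crawl-and-notify steps above can be sketched without any AWS dependency. This is a minimal simulation, assuming plain dicts stand in for the S3 staging and working buckets, a small `Topic` class stands in for the SNS topic, and `deque` objects stand in for the two SQS queues. The class, the `crawl` function, and the message shape are hypothetical; none of this is AWS SDK code.

```python
from collections import deque


class Topic:
    """In-memory stand-in for a notification topic with queue subscribers."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, queue):
        self.subscribers.append(queue)

    def publish(self, message):
        # Fan-out: every subscribed queue receives its own copy of the message.
        for queue in self.subscribers:
            queue.append(dict(message))


def crawl(staging, working, topic):
    """Move each staged object into working storage and announce it."""
    for key in sorted(staging):  # sorted() copies the keys, so pop() is safe
        working[key] = staging.pop(key)
        topic.publish({"event": "NEW DATA", "key": key})


staging = {"01_01_SoccerF_05_A.mp4": b"..."}
working = {}
rendition_jobs, metadata_jobs = deque(), deque()

topic = Topic()
topic.subscribe(rendition_jobs)
topic.subscribe(metadata_jobs)

crawl(staging, working, topic)
print(len(rendition_jobs), len(metadata_jobs))
```

Publishing once to a topic and letting each queue receive its own copy is what allows rendition and metadata workers to scale and fail independently.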

2. Metadata Extraction

a. EC2 worker polls inbox (SQS)

b. EC2 worker pulls down media asset from Amazon S3

c. EC2 worker parses media files

d. EC2 worker pumps metadata through ETL flow to prepare for catalog insertion

e. EC2 worker inserts into catalog (Amazon DynamoDB)
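Steps c and d can be illustrated with a small parser. This is a sketch under stated assumptions: the sample report below is hand-written to resemble the "Label : value" text that a tool such as MediaInfo prints, and the `parse_report` helper and the upper-snake-case key convention are hypothetical choices, not part of the session's code.

```python
# Hand-written sample shaped like a media inspection report (an assumption).
SAMPLE_REPORT = """\
General
Complete name                            : 01_01_SoccerF_05_A.mp4
Format                                   : MPEG-4
Codec ID                                 : mp42
"""


def parse_report(text):
    """Turn 'Label : value' lines into a dict with UPPER_SNAKE_CASE keys."""
    record = {}
    for line in text.splitlines():
        label, sep, value = line.partition(" : ")
        if not sep:
            continue  # section headings and blank lines carry no value
        key = "_".join(label.split()).upper()
        record[key] = value.strip()
    return record


print(parse_report(SAMPLE_REPORT))
```

Normalizing labels into stable keys at this stage keeps the later catalog insert independent of how each decode utility formats its output.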


Preparing for Amazon DynamoDB Insert

{
  "COMPLETE_NAME": { "S": "01_01_SoccerF_05_A.mp4" },
  "FORMAT": { "S": "MPEG-4" },
  "CODEC_ID": { "S": "mp42" }
}
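The typed JSON above can be produced from a plain metadata record with a few lines of Python. This is a minimal sketch, assuming the record is a flat dict of strings and numbers; the `to_dynamodb_item` helper is hypothetical and only covers the string (`S`) and number (`N`) types.

```python
import json


def to_dynamodb_item(record):
    """Wrap plain values in DynamoDB's typed attribute-value format."""
    item = {}
    for name, value in record.items():
        if value is None or value == "":
            continue  # skip empty values rather than store empty strings
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number:
            item[name] = {"N": str(value)}  # numbers travel as strings
        else:
            item[name] = {"S": str(value)}
    return item


record = {
    "COMPLETE_NAME": "01_01_SoccerF_05_A.mp4",
    "FORMAT": "MPEG-4",
    "CODEC_ID": "mp42",
}
print(json.dumps(to_dynamodb_item(record), indent=2))
```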

Model It and Deploy to EC2! (Talend)

3. Catalog Processing

a. Store metadata record in Amazon DynamoDB

b. Reflect searchable subset to Amazon CloudSearch

c. Go crazy (HTTP GET)
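Step b, reflecting only a searchable subset of each catalog record into the search index, might look like the following. This is a sketch, not the session's code: the choice of searchable fields, the document id convention, and the `to_search_document` helper are hypothetical, and the "add" document shape with an explicit version is modeled on the JSON batch format CloudSearch accepted at the time, so treat it as an assumption to check against the service documentation.

```python
import json

SEARCHABLE_FIELDS = ("COMPLETE_NAME", "FORMAT")  # hypothetical subset


def to_search_document(item, version):
    """Project a DynamoDB-style item onto the fields worth indexing."""
    fields = {
        name.lower(): attr["S"]
        for name, attr in item.items()
        if name in SEARCHABLE_FIELDS and "S" in attr
    }
    # Assumed convention: derive the document id from the file name.
    name = fields["complete_name"].lower()
    doc_id = "".join(ch if ch.isalnum() else "_" for ch in name)
    return {
        "type": "add",
        "id": doc_id,
        "version": version,
        "lang": "en",
        "fields": fields,
    }


item = {
    "COMPLETE_NAME": {"S": "01_01_SoccerF_05_A.mp4"},
    "FORMAT": {"S": "MPEG-4"},
    "CODEC_ID": {"S": "mp42"},
}
batch = [to_search_document(item, version=1)]
print(json.dumps(batch, indent=2))
```

Keeping the full record in DynamoDB and indexing only what users actually query keeps the search documents small while the catalog remains the system of record.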


Customer Case Study (PBS)

Merlin: PBS CMS/DAM

• Code name Merlin

• Structured metadata

• 200+ web object records daily

– 29,046 web objects

• 150+ video objects daily

– 91,436 videos

• Users from over 150 stations and 30 national producers

– Frontline

– Downton Abbey

– PBS NewsHour

What’s It Do?

• Large multitenant system

– 1,200 registered users

• 250 million streams per month

• 20 million unique viewers

• 8 PB of video delivered monthly

Getting Data In

• 33 ingestible web feeds

– Content editors

– Web page listings

• Batch video ingest API

– Video content editors

– External workflow integration

• Manually entered videos

– Video content editors from all 50 states

– Number of user accounts

System Overview

[System overview diagram. Inputs: RSS; Ingest API; User Input. Components: DAM (Merlin); Workflow Service; Content API; Search Util; Amazon S3; Amazon RDS (two); Amazon CloudSearch; Amazon SWF; CDN]

Basic Workflow

• Object registered with Merlin

• Images registered and processed with ITS

– Stored in CDN-fronted Amazon S3 bucket

• Videos registered with VTS

– Jobs sent to Zencoder for processing

– Video stored in CDN-fronted Amazon S3 bucket

• Objects ready for clients

– Objects rendered for consumption in Amazon S3

– Objects registered with APIs

– Objects discoverable

Making It Discoverable

• Search util service

• Runs every hour

– Re-indexes last several hours each time

• Polls APIs

– Content API

– Modified time

• Updates Amazon CloudSearch index

– 2 primary indexes
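The hourly job with an overlapping look-back window can be sketched as a simple filter on modified time. This is a minimal sketch, assuming each object exposes a timezone-aware `modified` timestamp. The three-hour window, the object shape, and the `select_for_reindex` helper are hypothetical; the slides only say "last several hours".

```python
from datetime import datetime, timedelta, timezone

# "Last several hours"; the exact span is an assumption.
REINDEX_WINDOW = timedelta(hours=3)


def select_for_reindex(objects, now, window=REINDEX_WINDOW):
    """Return objects whose modified time falls inside the look-back window."""
    cutoff = now - window
    return [obj for obj in objects if obj["modified"] >= cutoff]


now = datetime(2013, 11, 15, 12, 0, tzinfo=timezone.utc)
objects = [
    {"id": "a", "modified": now - timedelta(minutes=30)},
    {"id": "b", "modified": now - timedelta(hours=2)},
    {"id": "c", "modified": now - timedelta(hours=5)},
]
print([obj["id"] for obj in select_for_reindex(objects, now)])
```

Because the window is longer than the run interval, consecutive runs overlap and re-submit some documents, which presumably lets a later run pick up anything an earlier run missed, provided index updates are safe to repeat.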

Search Considerations

• Hidden objects

• Rights management

• Partitioned search

– Local station search

– Results by geo

– Restrict results for international customers

• Unify and normalize existing APIs

– Flatten data model

• Users looking for programs

– Specific searches

– Suitable for structured data

Challenges

• No native time field

– Convert dates to integers

– Epoch time

• Versioning of documents

– Epoch for versioning

• Exposing two versions of most fields

– Text searchable

– Facets (copy of text version)
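The three workarounds above fit in one small helper. This is a sketch under stated assumptions: the field names (`title`, `title_facet`, `air_date`) and the `build_fields` function are hypothetical and do not come from the Merlin code base.

```python
from datetime import datetime, timezone


def to_epoch(dt):
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    return int(dt.timestamp())


def build_fields(title, air_date, modified):
    """Return (fields, version) for one search document."""
    fields = {
        "title": title,                  # text-searchable version
        "title_facet": title,            # facet version (copy of the text)
        "air_date": to_epoch(air_date),  # dates stored as integers
    }
    version = to_epoch(modified)         # newer edits get a larger version
    return fields, version


fields, version = build_fields(
    "Frontline",
    air_date=datetime(2013, 11, 12, tzinfo=timezone.utc),
    modified=datetime(2013, 11, 15, 12, 0, tzinfo=timezone.utc),
)
print(fields, version)
```

Using the modification time as the version means a re-submitted document only replaces the indexed copy when it is at least as new, and integer dates still support range queries and sorting.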

Search Consumers (PBS.org)

Site Search

Search Consumers (Video Portal)

Site Search Programs A-Z

Xbox / OTT

Summary


• Build an enterprise-scale DAM platform now

– Managed storage and archive (Amazon S3, Amazon Glacier)

– Managed database for catalog processing (Amazon DynamoDB, Amazon Relational Database Service [RDS])

– Managed search (CloudSearch)

• Application development accelerators

– Elastic Beanstalk harness (web, API, and worker roles)

– Reduced effort with the AWS CLI

• (Almost) fire and forget

AWS Marketplace Can Help

• AWS online software store

– Customers can find, research, and buy software

– Simple pricing that aligns with the EC2 usage model

– 1-click launch in minutes

– Marketplace billing integrated into your AWS account

– 1,000+ products across 24 categories

• Digital asset management related options include:

– WebDAM – centralize, store, manage and distribute collateral

– Digital asset management cloud – web-based open source DAM

– Widen – manage and distribute digital media and brand assets with user roles and permissions

– Adobe Experience Manager – unified asset management including mobile

Learn more at: http://aws.amazon.com/marketplace

“DAM!”

