METADATA MATTERS Metadata and Taxonomies for Organizing your Content - April 29, 2015
Page 1: Metadata matters

METADATA MATTERS
Metadata and Taxonomies for Organizing your Content - April 29, 2015

Page 2: Metadata matters

ABOUT AIIM
▪ AIIM (Association for Information and Image Management) is the global community of information professionals. Our mission is to help you and your organization survive and thrive in this era of Information Chaos by solving these four key business problems:
  ▪ How do we manage the risk of growing volumes of content?

▪ How do we automate our content-intensive business processes?

▪ How do we use content to better engage and collaborate?

▪ How do we gain business insight from all of this information?

▪ www.aiim.org

29-Apr-2015 | Precision Content Authoring Solutions Inc. | 2

Page 3: Metadata matters

ABOUT AIIM TORONTO


▪ The First Canadian Chapter serves
  ▪ Toronto,
  ▪ Montreal, and
  ▪ Ottawa

▪ Brings together members for education and networking

▪ Looking for volunteers to help with running the chapter

Page 4: Metadata matters

ABOUT YOUR PRESENTER

29-Apr-2015 | Ascan Information Architects Limited | 4

▪ Rob Hanna, ECMs
  ▪ President of Precision Content Authoring Solutions Inc. and a director of the AIIM First Canadian Chapter

▪ Expert in structured authoring and content management practices and technology

▪ Instructor at the University of Toronto School of Continuing Studies – Metadata and Controlled Vocabularies

Page 5: Metadata matters

WHAT IS METADATA?
And how does it relate to content?

Page 6: Metadata matters

WHAT IS CONTENT?

Data Information

Content Knowledge

Page 7: Metadata matters

METADATA DEFINED
▪ Coined in the 1960s by Jack Myers

▪ Data about Data

▪ Stuff about Stuff

▪ Essential properties stored within the content or external to the content that identify and define context, history, and management of the content

Page 8: Metadata matters

METADATA IS INFORMATION ABOUT A RESOURCE

Page 9: Metadata matters

APPLICATION OF METADATA
▪ Metadata is
  ▪ applied to all structured and unstructured content in a corpus
  ▪ either visible to the user or hidden from view
  ▪ both machine-driven and manually entered
  ▪ internal or external to the content
  ▪ mandatory, optional, or conditional
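The three requirement levels above can be illustrated with a small validation routine. This is a minimal Python sketch, not the behaviour of any particular content management system; the field names (title, author, retention_date, record_status) and the rule that a retention date becomes required once an item is declared a record are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FieldRule:
    name: str
    # One of "mandatory", "optional", or "conditional".
    requirement: str
    # For conditional fields only: decides from the whole record whether
    # the field is required. Ignored for the other requirement levels.
    condition: Optional[Callable[[dict], bool]] = None


# Hypothetical schema, for illustration only.
SCHEMA = [
    FieldRule("title", "mandatory"),
    FieldRule("author", "optional"),
    FieldRule(
        "retention_date",
        "conditional",
        # Assumed rule: required only once the item is declared a record.
        condition=lambda rec: rec.get("record_status") == "declared",
    ),
]


def missing_fields(record: dict) -> list[str]:
    """Return the names of required fields that are absent or empty."""
    missing = []
    for rule in SCHEMA:
        if rule.requirement == "mandatory":
            required = True
        elif rule.requirement == "conditional":
            required = rule.condition is not None and rule.condition(record)
        else:
            required = False
        if required and not record.get(rule.name):
            missing.append(rule.name)
    return missing


print(missing_fields({"title": "Q1 report"}))         # []
print(missing_fields({"record_status": "declared"}))  # ['title', 'retention_date']
```

Keeping the condition as a function of the whole record lets one rule express dependencies between fields without hard-coding them into the validator.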

Page 10: Metadata matters

MANY FORMS OF METADATA
▪ Corporate metadata is structured data about content
▪ Metadata is relational or hierarchical
▪ Metadata may take the form of
  ▪ Rich text or binary
  ▪ Plain text
  ▪ Controlled values/pick-lists/lookup values
  ▪ Syntax-encoded values
    ▪ date/time (e.g., yyyy-mm-dd hh:mm:ss)
    ▪ financial ($0.00, -$0.00)
    ▪ numeric: integer/floating values (#,###)
    ▪ boolean (true/false)
    ▪ special (phone numbers, postal codes, or social insurance numbers)
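Syntax encoding means a value is accepted only if it matches a known pattern. The following is a minimal Python sketch using the formats listed on the slide; the regular expressions are illustrative assumptions, and the Canadian postal code rule (letter-digit-letter, optional space, digit-letter-digit) is a deliberately simplified example of the "special" category.

```python
import re
from datetime import datetime

# Patterns follow the formats on the slide; the postal-code rule is a
# simplified assumption (it does not exclude letters Canada Post never uses).
PATTERNS = {
    "financial": re.compile(r"-?\$\d{1,3}(,\d{3})*\.\d{2}"),  # $0.00, -$0.00, $1,250.00
    "integer": re.compile(r"-?\d{1,3}(,\d{3})*"),             # 7, 1,234
    "postal_code": re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d"),     # M5V 2T6
}


def is_valid(encoding: str, value: str) -> bool:
    """Check a metadata value against its declared syntax encoding."""
    if encoding == "datetime":
        try:
            # yyyy-mm-dd hh:mm:ss; strptime also rejects impossible dates.
            datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
            return True
        except ValueError:
            return False
    if encoding == "boolean":
        return value in ("true", "false")
    pattern = PATTERNS.get(encoding)
    if pattern is None:
        raise ValueError(f"unknown encoding: {encoding}")
    return pattern.fullmatch(value) is not None


print(is_valid("datetime", "2015-04-29 18:30:00"))  # True
print(is_valid("datetime", "2015-02-30 18:30:00"))  # False (no February 30)
print(is_valid("financial", "-$0.00"))              # True
print(is_valid("integer", "12,34"))                 # False
```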

Page 11: Metadata matters

MANY ROLES OF METADATA
▪ The primary role of metadata is to facilitate the identification, retrieval, and processing of content in any medium.
▪ Secondarily, metadata may also
  ▪ appear as content to the content consumer, and
  ▪ serve as corporate structured data for analysis and business intelligence.

Page 12: Metadata matters

METADATA IS THE SOUP CAN
Content is the soup


Page 13: Metadata matters

METADATA ISN’T THE MESSAGE

▪ Twitter post (118 chars)

▪ Twitter status message metadata (1,938 chars)

{"id"=>12296272736, "text"=>"An early look at Annotations: http://groups.google.com/group/twitter-api-announce/browse_thread/thread/fa5da2608865453", "created_at"=>"Fri Apr 16 17:55:46 +0000 2010", "in_reply_to_user_id"=>nil, "in_reply_to_screen_name"=>nil, "in_reply_to_status_id"=>nil, "favorited"=>false, "truncated"=>false, "user"=>{"id"=>6253282,
"screen_name"=>"twitterapi", "name"=>"Twitter API", "description"=>"The Real Twitter API. I tweet about API changes, service issues and happily answer questions about Twitter and our API. Don't get an answer? It's on my website.", "url"=>"http://apiwiki.twitter.com", "location"=>"San Francisco, CA", "profile_background_color"=>"c1dfee", "profile_background_image_url"=>"http://a3.twimg.com/profile_background_images/59931895/twitterapi-background-new.png", "profile_background_tile"=>false, "profile_image_url"=>"http://a3.twimg.com/profile_images/689684365/api_normal.png", "profile_link_color"=>"0000ff", "profile_sidebar_border_color"=>"87bc44", "profile_sidebar_fill_color"=>"e0ff92", "profile_text_color"=>"000000", "created_at"=>"Wed May 23 06:01:13 +0000 2007", "contributors_enabled"=>true, "favourites_count"=>1, "statuses_count"=>1628, "friends_count"=>13, "time_zone"=>"Pacific Time (US & Canada)", "utc_offset"=>-28800, "lang"=>"en", "protected"=>false, "followers_count"=>100581, "geo_enabled"=>true, "notifications"=>false, "following"=>true, "verified"=>true}, "contributors"=>[3191321], "geo"=>nil, "coordinates"=>nil, "place"=>{"id"=>"2b6ff8c22edd9576",
"url"=>"http://api.twitter.com/1/geo/id/2b6ff8c22edd9576.json", "name"=>"SoMa", "full_name"=>"SoMa, San Francisco", "place_type"=>"neighborhood", "country_code"=>"US", "country"=>"The United States of America", "bounding_box"=>{"coordinates"=>[[[-122.42284884, 37.76893497],
[-122.3964, 37.76893497], [-122.3964, 37.78752897], [-122.42284884, 37.78752897]]],
"type"=>"Polygon"}}, "source"=>"web"}

An early look at Annotations: http://groups.google.com/group/twitter-api-announce/browse_thread/thread/fa5da2608865453
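The imbalance between message and metadata is easy to measure in code. This Python sketch uses a heavily trimmed subset of the status fields shown above (most fields are omitted, so the metadata count is far smaller than the 1,938 characters quoted on the slide) and compares the length of the tweet text with the length of the serialized metadata.

```python
import json

text = ("An early look at Annotations: http://groups.google.com/group/"
        "twitter-api-announce/browse_thread/thread/fa5da2608865453")

# A trimmed subset of the status metadata shown on the slide.
status = {
    "id": 12296272736,
    "text": text,
    "created_at": "Fri Apr 16 17:55:46 +0000 2010",
    "favorited": False,
    "truncated": False,
    "user": {"id": 6253282, "screen_name": "twitterapi",
             "location": "San Francisco, CA", "followers_count": 100581},
    "place": {"full_name": "SoMa, San Francisco",
              "place_type": "neighborhood", "country_code": "US"},
    "source": "web",
}

# Everything except the message itself is treated as metadata here.
metadata = {k: v for k, v in status.items() if k != "text"}
message_len = len(text)
metadata_len = len(json.dumps(metadata))
print(message_len, metadata_len, round(metadata_len / message_len, 1))
```

Even this small subset outweighs the message it describes.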

Page 14: Metadata matters

WHY METADATA MATTERS
Collection and use of metadata can be controversial when viewed out of the context of the content it carries.

Electronic Frontier Foundation, 30 December 2013

▪ They know you rang a phone sex service at 2:24 am and spoke for 18 minutes. But they don’t know what you talked about.

▪ They know you called the suicide prevention hotline from the Golden Gate Bridge. But the topic of the call remains a secret.

▪ They know you spoke with an HIV testing service, then your doctor, then your health insurance company in the same hour. But they don’t know what was discussed.

Page 15: Metadata matters

TYPES OF METADATA
The Library of Congress states that metadata consists of
• Descriptive Metadata
• Administrative Metadata, and
• Structural Metadata


Page 16: Metadata matters

DESCRIPTIVE METADATA
And how it is applied through classification

Page 17: Metadata matters

THINKING ABOUT CLASSIFICATION
▪ Classification is the ordering of entities (things or concepts) into groups or classes on the basis of their similarity
  ▪ an activity that we do every day
  ▪ metadata and controlled vocabularies are tools that can be used for classification

Page 18: Metadata matters

THINKING ABOUT CLASSIFICATION
How many words can you memorize in 20 seconds?

analyst brake market stapler seat traders alternator investor calculators scissors engine pedal
dashboard pen backers marker tape profit starter ruler prospects

Page 19: Metadata matters

THINKING ABOUT CLASSIFICATION
1. Filter out all of the noise

analyst brake market stapler
dashboard pen backer marker
seat trader alternator investor
pedal calculator scissors engine
tape profit starter ruler prospect

Page 20: Metadata matters

THINKING ABOUT CLASSIFICATION
2. Break into smaller groupings

analyst brake market stapler
dashboard pen backer marker
seat trader alternator investor
pedal calculator scissors engine
tape profit starter ruler prospect

Page 21: Metadata matters

THINKING ABOUT CLASSIFICATION
3. Organize words by similarities

dashboard alternator pedal brake seat engine starter
marker stapler scissors tape pen calculator ruler
analyst market backer investor trader profit prospect

Page 22: Metadata matters

THINKING ABOUT CLASSIFICATION
4. Classify and label groups

Car parts: dashboard, alternator, pedal, brake, seat, engine, starter
Office supplies: marker, stapler, scissors, tape, pen, calculator, ruler
Stock market: analyst, market, backer, investor, trader, profit, prospect

Page 23: Metadata matters

THINKING ABOUT CLASSIFICATION

Stock market: analyst, market, trader, investor, backer, profit, prospect
Office supplies: stapler, calculator, scissors, pen, marker, tape, ruler
Car parts: brake, seat, dashboard, engine, alternator, starter, pedal

How well did you do?
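The four steps of the exercise (filter the noise, break into groups, organize by similarity, classify and label) map naturally onto a small program. In this Python sketch the plural-stripping rule and the word-to-class lookup are hypothetical assumptions standing in for the human judgment used in the exercise; the rule is only adequate for this particular word list.

```python
from collections import defaultdict

words = ("analyst brake market stapler seat traders alternator investor "
         "calculators scissors engine pedal dashboard pen backers marker "
         "tape profit starter ruler prospects").split()

# Labels come from the exercise; membership encodes the grouping a person makes.
SCHEME = {
    "Stock market": {"analyst", "market", "backer", "investor", "trader",
                     "profit", "prospect"},
    "Office supplies": {"marker", "stapler", "scissors", "tape", "pen",
                        "calculator", "ruler"},
    "Car parts": {"dashboard", "alternator", "pedal", "brake", "seat",
                  "engine", "starter"},
}
# Words that end in "s" but must not be singularized.
INVARIANT_PLURALS = {"scissors"}


def normalize(word: str) -> str:
    """Step 1: filter out noise by reducing simple plurals to singular."""
    if word in INVARIANT_PLURALS or not word.endswith("s"):
        return word
    return word[:-1]


def classify(raw_words: list[str]) -> dict[str, list[str]]:
    """Steps 2-4: group normalized words under the label of their class."""
    label_of = {term: label for label, terms in SCHEME.items() for term in terms}
    groups: dict[str, list[str]] = defaultdict(list)
    for word in raw_words:
        term = normalize(word)
        groups[label_of.get(term, "Unclassified")].append(term)
    return dict(groups)


for label, terms in classify(words).items():
    print(f"{label}: {', '.join(terms)}")
```

Anything the scheme does not cover lands in an "Unclassified" group, which is where a real vocabulary would need review.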

Page 24: Metadata matters

THINKING ABOUT CLASSIFICATION

Vegetables: peas, endive, carrots, spinach, celery, broccoli, tomato
Computer parts: hard drive, sound card, monitor, mouse, processor, flash drive, keyboard
Instruments: violin, harp, piano, trumpet, cello, flute, guitar

Now how many words can you memorize in 20 seconds?

Page 25: Metadata matters

CONTROLLED VOCABULARIES
▪ Some metadata requires a classification scheme or a controlled list of values or terms to define it, for example:
  ▪ Film rating: G, PG, 14A, 18A, R, A
  ▪ eBay seller location:
▪ Control is exercised over modifications to the list
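A controlled list can be enforced in software by accepting only values drawn from an enumeration. The following is a minimal Python sketch using the film ratings from the slide; the function name and the decision to reject every other value are illustrative assumptions, and "control over modifications" is modelled simply by keeping the list in one place.

```python
from enum import Enum


class FilmRating(Enum):
    """Controlled list of film rating values (from the slide)."""
    G = "G"
    PG = "PG"
    R14A = "14A"  # member names must be identifiers, so "14A" gets a prefix
    R18A = "18A"
    R = "R"
    A = "A"


def parse_rating(value: str) -> FilmRating:
    """Accept only a value from the controlled list; reject free text."""
    try:
        return FilmRating(value.strip().upper())
    except ValueError:
        allowed = ", ".join(r.value for r in FilmRating)
        raise ValueError(f"{value!r} is not a controlled value ({allowed})") from None


print(parse_rating("pg").value)   # PG
print(parse_rating("14a").value)  # 14A
```

Adding or retiring a rating then means changing the enumeration, which is a single reviewable edit.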

Page 26: Metadata matters

CONTROLLED VOCABULARIES DEFINED
▪ A list of terms
▪ All terms in a controlled vocabulary must have an unambiguous, non-redundant definition. (Source: ANSI/NISO Z39.19-2005)

Controlled Vocabularies: What is a controlled vocabulary? Why use controlled vocabularies? Types of controlled vocabularies

Page 27: Metadata matters

BRIDGING BOUNDARIES - WHICH TERM IS “RIGHT”?

Accessible parking spaces

Accessible permit parking

Disabled permit parking

Designated disabled parking spaces

Handicapped parking

Disabled parking spaces

Page 28: Metadata matters

TOWARDS A COMMON VOCABULARY

Accessible parking spaces

Accessible permit parking

Disabled permit parking

Designated disabled parking spaces

Handicapped parking

Disabled parking spaces
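Moving toward a common vocabulary usually means choosing one preferred term and recording the others as non-preferred entry terms that point to it. This Python sketch assumes, purely for illustration, that "Accessible parking spaces" is the preferred term; the slides do not say which term was chosen.

```python
from typing import Optional

PREFERRED = "Accessible parking spaces"  # assumption for illustration

# Non-preferred variants found across departments ("use for" terms).
USE_FOR = {
    "Accessible permit parking",
    "Disabled permit parking",
    "Designated disabled parking spaces",
    "Handicapped parking",
    "Disabled parking spaces",
}

# Case-insensitive lookup from any known variant to the preferred term.
LOOKUP = {term.casefold(): PREFERRED for term in USE_FOR | {PREFERRED}}


def preferred_term(term: str) -> Optional[str]:
    """Return the preferred term for a known variant, or None if unknown."""
    return LOOKUP.get(term.strip().casefold())


print(preferred_term("handicapped parking"))  # Accessible parking spaces
print(preferred_term("Bike lanes"))           # None
```

Returning None for unknown terms, rather than guessing, flags candidates for vocabulary review.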

Page 29: Metadata matters

CARD SORTING
Techniques for developing controlled vocabularies

Page 30: Metadata matters

MANAGING CONTROLLED VOCABULARIES

Page 31: Metadata matters

TYPES OF CLASSIFICATION SCHEMES
▪ Subject
  ▪ Identifies content topics
▪ Organization structure
  ▪ Depicts business units
▪ Functional
  ▪ Defined by business processes

Page 32: Metadata matters

SUBJECT TAXONOMIES
▪ Describe the topic of the resource
▪ Structured from broad to narrow / general to specific
▪ Often stable over time

Page 33: Metadata matters

SUBJECT CLASSIFICATION

Source: http://popchartlab.com/products/the-very-very-many-varieties-of-beer

Page 34: Metadata matters

ORGANIZATION CLASSIFICATION
▪ Shows business unit relationships
▪ Can be used to identify:
  ▪ Ownership of content
  ▪ Maintenance responsibilities
  ▪ A person’s place in the organization
▪ Changes frequently

Page 35: Metadata matters

ORGANIZATIONAL CLASSIFICATION

Page 36: Metadata matters

FUNCTIONAL CLASSIFICATION
▪ Describes the breakdown of business processes
  ▪ Function – Activity – Task
▪ Stable unless new processes or functions are introduced
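Because a functional scheme is a three-level hierarchy, each piece of content can be filed under a Function / Activity / Task path. In this Python sketch the function, activity, and task names are hypothetical examples, not taken from the slide's source.

```python
# Hypothetical functional scheme: function -> activity -> tasks.
SCHEME = {
    "Human resources": {
        "Recruitment": ["Post job", "Interview candidates", "Make offer"],
        "Payroll": ["Process timesheets", "Issue pay statements"],
    },
    "Finance": {
        "Accounts payable": ["Approve invoice", "Issue payment"],
    },
}


def paths(scheme: dict[str, dict[str, list[str]]]) -> list[str]:
    """List every Function / Activity / Task path in the scheme."""
    return [
        f"{function} / {activity} / {task}"
        for function, activities in scheme.items()
        for activity, tasks in activities.items()
        for task in tasks
    ]


def is_valid_path(path: str) -> bool:
    """Check that a filing path uses only combinations defined in the scheme."""
    return path in set(paths(SCHEME))


for p in paths(SCHEME):
    print(p)
```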

Page 37: Metadata matters

FUNCTIONAL CLASSIFICATION

Source: http://www.iskouk.org/conf2009/papers/milne_ISKOUK2009.pdf

Page 38: Metadata matters

TAXONOMIES
▪ Types of taxonomies
  ▪ Lists
  ▪ Trees
  ▪ Hierarchies and polyhierarchies
  ▪ Matrices, and
  ▪ System maps

Page 39: Metadata matters

TAXONOMY TYPES
▪ List style taxonomy

Page 40: Metadata matters

TAXONOMY TYPES
▪ Simple tree style taxonomy

Page 41: Metadata matters

TAXONOMY TYPES
▪ Classical hierarchical style taxonomy

Page 42: Metadata matters

TAXONOMY TYPES
▪ Polyhierarchical style taxonomy
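A polyhierarchy differs from a classical hierarchy in that a term may have more than one broader term. This Python sketch stores each term's broader terms and walks upward to collect all ancestors; the example terms are hypothetical, with "Tomato" placed under both "Vegetables" and "Fruits" to show the difference.

```python
# Each term maps to its broader (parent) terms. A classical hierarchy allows
# exactly one parent per term; a polyhierarchy allows several.
BROADER = {
    "Food": [],
    "Vegetables": ["Food"],
    "Fruits": ["Food"],
    "Carrots": ["Vegetables"],
    "Tomato": ["Vegetables", "Fruits"],  # polyhierarchical term
}


def ancestors(term: str) -> set[str]:
    """Collect every broader term reachable from `term`."""
    seen: set[str] = set()
    stack = list(BROADER.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(BROADER.get(parent, []))
    return seen


def is_polyhierarchy(broader: dict[str, list[str]]) -> bool:
    """True if any term has more than one broader term."""
    return any(len(parents) > 1 for parents in broader.values())


print(sorted(ancestors("Tomato")))  # ['Food', 'Fruits', 'Vegetables']
print(is_polyhierarchy(BROADER))    # True
```

The `seen` set keeps a shared ancestor (here "Food") from being visited twice.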

Page 43: Metadata matters

TAXONOMY TYPES
▪ Matrix style taxonomy
  ▪ With 3 facets
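In a matrix (faceted) taxonomy, each item takes one value from each of several independent facets, and users combine facet values to narrow a search. This Python sketch uses three hypothetical facets (document type, department, year); the slide does not name its facets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    doc_type: str    # facet 1
    department: str  # facet 2
    year: int        # facet 3


ITEMS = [
    Item("Budget 2014", "Report", "Finance", 2014),
    Item("Budget 2015", "Report", "Finance", 2015),
    Item("Hiring policy", "Policy", "Human resources", 2015),
    Item("Expense policy", "Policy", "Finance", 2015),
]


def facet_filter(items: list[Item], **facets) -> list[Item]:
    """Keep items whose attributes match every requested facet value."""
    return [i for i in items
            if all(getattr(i, name) == value for name, value in facets.items())]


for item in facet_filter(ITEMS, department="Finance", year=2015):
    print(item.title)
```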

Page 44: Metadata matters

TAXONOMY TYPES
▪ System map style taxonomy

Page 45: Metadata matters

ADMINISTRATIVE METADATA
For managing the content

Page 46: Metadata matters

ADMINISTRATIVE METADATA
▪ Information about the metadata record itself: its creation, modification, relationship to other records, etc.
  ▪ Audit trails may capture the date and time when a file’s title was changed.
▪ Common subsets of administrative metadata are:
  ▪ Rights management: metadata that deals with intellectual property rights
  ▪ Preservation: information needed to archive / preserve a resource

Source: Understanding Metadata – NISO 2004
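The audit-trail example can be sketched as a record that appends an entry whenever its title changes. The class and field names in this Python sketch are hypothetical; real systems typically capture this automatically and store it apart from the descriptive metadata.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    timestamp: datetime
    field_name: str
    old_value: str
    new_value: str
    user: str


@dataclass
class Record:
    title: str
    audit_trail: list[AuditEntry] = field(default_factory=list)

    def set_title(self, new_title: str, user: str) -> None:
        """Change the title and log when, by whom, and from what value."""
        if new_title == self.title:
            return  # nothing changed, nothing to audit
        self.audit_trail.append(AuditEntry(
            timestamp=datetime.now(timezone.utc),
            field_name="title",
            old_value=self.title,
            new_value=new_title,
            user=user,
        ))
        self.title = new_title


doc = Record("Draft metadata policy")
doc.set_title("Metadata policy v1", user="editor1")  # hypothetical user ID
for e in doc.audit_trail:
    print(e.timestamp.isoformat(), e.field_name,
          repr(e.old_value), "->", repr(e.new_value))
```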

Page 47: Metadata matters

SEPARATION OF STATUS METADATA
▪ Much of the administrative metadata is applied automatically by the system
▪ Other administrative metadata may live with the workflow rather than the record itself


Page 48: Metadata matters

STRUCTURAL METADATA
Defining the structure of a resource

Page 49: Metadata matters

ABOUT STRUCTURAL METADATA
▪ Describes the structure of a resource

▪ Book

▪ Document

▪ Website

▪ Table of contents

▪ Site map

▪ Internal structure

Page 51: Metadata matters

XML IS EVERYWHERE
XML defines meaningful data structures for documents and data. It is a human-readable file format used to power

• manufacturing assembly lines

• medical devices

• military applications, and

• many other things.

XML is the language of the Web. It enables smart phones and web browsers.


Page 52: Metadata matters

WHAT ARE MARKUP LANGUAGES?
Markup languages
▪ pre-date desktop publishing and the Internet
▪ tell computers how to handle data
  ▪ such as how to render electronic content on a page
▪ are categorized as either
  ▪ presentation, or
  ▪ semantic markup

Page 53: Metadata matters

PRESENTATION MARKUP
▪ With electronic presentation markup, we mark up the paragraph and italicize the citation for publication
▪ This is typical of web pages using HyperText Markup Language (HTML)

The Cancer Journal: The Journal of Principles & Practice of Oncology provides an integrated view of modern oncology across all disciplines.

<p><i>The Cancer Journal: The Journal of Principles &amp; Practice of Oncology</i> provides an integrated view of modern oncology across <i>all</i> disciplines.</p>

The Cancer Journal: The Journal of Principles & Practice of Oncology provides an integrated view of modern oncology across all disciplines.

Page 54: Metadata matters

SEMANTIC MARKUP
▪ With semantic markup, we mark up the content to describe the meaning of the text

▪ Publishing stylesheets interpret the meaning from the markup and apply appropriate styles specific to the publishing context

The Cancer Journal: The Journal of Principles & Practice of Oncology provides an integrated view of modern oncology across all disciplines.

<intro><cite>The Cancer Journal: The Journal of Principles &amp; Practice of Oncology</cite> provides an integrated view of modern oncology across <em>all</em> disciplines.</intro>

The Cancer Journal: The Journal of Principles & Practice of Oncology provides an integrated view of modern oncology across all disciplines.

The Cancer Journal: The Journal of Principles & Practice of Oncology provides an integrated view of modern oncology across all disciplines.
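Publishing stylesheets are usually written in XSLT or CSS. As a language-neutral illustration, this Python sketch walks the semantic markup from the slide (with the ampersand escaped as &amp;, which XML requires) and maps each element to presentation markup. The two mappings, a web channel and a print channel that bolds emphasis, are hypothetical assumptions, not standards.

```python
import xml.etree.ElementTree as ET

SOURCE = ("<intro><cite>The Cancer Journal: The Journal of Principles &amp; "
          "Practice of Oncology</cite> provides an integrated view of modern "
          "oncology across <em>all</em> disciplines.</intro>")

# Hypothetical "stylesheets": semantic element -> presentation element.
WEB_STYLE = {"intro": "p", "cite": "i", "em": "i"}
PRINT_STYLE = {"intro": "p", "cite": "i", "em": "b"}


def render(source: str, style: dict[str, str]) -> str:
    """Rename each element per the style mapping; text is left untouched."""
    root = ET.fromstring(source)
    for elem in root.iter():
        elem.tag = style.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")


print(render(SOURCE, WEB_STYLE))
print(render(SOURCE, PRINT_STYLE))
```

The source is written once; only the mapping changes per publishing context.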

Page 55: Metadata matters

SEMANTIC MARKUP
▪ Using semantic markup, we can
  ▪ disambiguate content
  ▪ search based on meaning
  ▪ connect to other content, and
  ▪ reuse or substitute new text.
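Searching on meaning rather than appearance: because titles are tagged cite instead of just being italicized, a program can find every cited publication and ignore other emphasized text. This Python sketch uses hypothetical sample content.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content mixing citations and plain emphasis.
DOC = """<topic>
  <intro><cite>The Cancer Journal</cite> covers <em>all</em> disciplines.</intro>
  <p>See also <cite>A Second Example Journal</cite> for trial results.</p>
</topic>"""


def cited_titles(xml_text: str) -> list[str]:
    """Return the text of every <cite> element, wherever it appears."""
    root = ET.fromstring(xml_text)
    return ["".join(c.itertext()).strip() for c in root.iter("cite")]


print(cited_titles(DOC))
# ['The Cancer Journal', 'A Second Example Journal']
```

With presentation markup both the titles and the word "all" would be `<i>`, and this distinction would be lost.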

Page 56: Metadata matters

MULTI-CHANNEL PUBLISHING

▪ Supports complex, multi-channel publishing to many common output formats

▪ Add new formats or styles easily


Page 57: Metadata matters

INTELLIGENT CONTENT
▪ Content that is
  ▪ not limited to one
    ▪ purpose
    ▪ technology, or
    ▪ output
  ▪ structurally rich and semantically aware, making it
    ▪ discoverable
    ▪ reusable
    ▪ reconfigurable, and
    ▪ adaptable.

Page 58: Metadata matters

INTEROPERABILITY OF METADATA
Demonstration

Page 59: Metadata matters

Communicating the benefits

Demonstrating interoperability with business examples

Keywords: Fort York; children, soldier, history

Creator: Jose San Juan

Asset Credit: City of Toronto

Headline: A British soldier in historical red uniform salutes children at Fort York

Page 60: Metadata matters

Communicating the benefits

Demonstrate reuse with business examples

Write the Headline once using DAL or Adobe CS: “A British soldier in historical red uniform salutes children at Fort York”

Reuse the Headline during design as alt text for screen readers (to comply with AODA, the Accessibility for Ontarians with Disabilities Act)

Reuse the Headline to search for files in DAL
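Writing the Headline once and reusing it can be sketched as a single metadata record feeding two outputs: an HTML img element whose alt attribute carries the headline, and a simple search over the asset library. The asset dictionary, file name, and functions in this Python sketch are hypothetical; DAL and Adobe CS expose the field through their own interfaces, which are not modelled here.

```python
from html import escape

ASSETS = [
    {
        "file": "fort_york_001.jpg",  # hypothetical file name
        "headline": ("A British soldier in historical red uniform "
                     "salutes children at Fort York"),
        "creator": "Jose San Juan",
        "credit": "City of Toronto",
        "keywords": ["Fort York", "children", "soldier", "history"],
    },
]


def img_tag(asset: dict) -> str:
    """Reuse the Headline as alt text for screen readers."""
    return f'<img src="{escape(asset["file"])}" alt="{escape(asset["headline"])}">'


def search(assets: list[dict], query: str) -> list[str]:
    """Reuse the Headline (and keywords) to find files."""
    q = query.casefold()
    return [a["file"] for a in assets
            if q in a["headline"].casefold()
            or any(q == k.casefold() for k in a["keywords"])]


print(img_tag(ASSETS[0]))
print(search(ASSETS, "red uniform"))  # ['fort_york_001.jpg']
```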

Page 61: Metadata matters

USE OF STANDARDS
Why are they important?

Page 62: Metadata matters

“Let me tell you how dangerous it is to design a classification scheme. It’s very dangerous. I have suffered. People attribute all kinds of motives to you. Apart from that, if anything goes wrong, they will pounce upon you.”

– Melvil Dewey

Page 63: Metadata matters

Dublin Core Metadata Standard
International Press Telecommunications Council – Photo Metadata
Adobe XMP – Extensible Metadata Platform
Rules for Archival Description

Page 64: Metadata matters

DUBLIN CORE

▪ maintains a vocabulary of metadata properties and encoding schemes

▪ core set of 15 properties for use in describing resources:

Contributor, Coverage, Creator, Date, Description, Format, Identifier, Language, Publisher, Relation, Rights, Source, Subject, Title, Type
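A Dublin Core description is a set of property/value pairs drawn from the 15 core properties. This Python sketch checks that only those properties are used and serializes the record as simple XML in the Dublin Core elements namespace (http://purl.org/dc/elements/1.1/). The wrapper element name record and the sample values, taken from this presentation's title slide, are illustrative choices.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def to_dublin_core(record: dict[str, list[str]]) -> str:
    """Serialize a record; every key must be one of the 15 core properties."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core properties: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for name, values in record.items():
        for value in values:  # properties may repeat
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(to_dublin_core({
    "title": ["Metadata Matters: Metadata and Taxonomies for Organizing your Content"],
    "creator": ["Rob Hanna"],
    "date": ["2015-04-29"],
    "subject": ["Metadata", "Taxonomies"],
}))
```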

Page 65: Metadata matters

ISO METADATA STANDARDS
▪ ISO 23081 – Metadata for Records
  ▪ Recommendations for metadata required to manage records
    ▪ Metadata about the record itself
    ▪ Metadata about the business rules or policies and mandates
    ▪ Metadata about the agents
    ▪ Metadata about business activities or processes
    ▪ Metadata about records management processes

Page 66: Metadata matters

ISO 2788 – DEVELOPMENT OF MONOLINGUAL THESAURI
• Latest edition published in 1986 (withdrawn in 2011 and superseded by ISO 25964-1)

• Media- and Language-Agnostic

• Applicable across both broad and narrow subject areas and describes how to deal with multiple domains

• Intended to ensure consistency of practice across different agencies

• Provides recommendations rather than mandatory instructions

• Outlines optional procedures for many special cases where a standard approach may not be applicable


Page 67: Metadata matters

QUESTIONS?
Rob Hanna
Contact me through
• www.linkedin.com/in/singlesourceror
• [email protected]

Page 68: Metadata matters

WHO IS PRECISION CONTENT AUTHORING SOLUTIONS INC.?
▪ We help organizations across North America make their information easier to use
▪ Our solutions consist of
  ▪ Content strategy
  ▪ Detailed information architecture
  ▪ Content lifecycle design and development
  ▪ Turn-key content transformation
  ▪ Tools selection and development
  ▪ Multi-channel publishing
▪ www.precisioncontent.com
