Transcript

Rachel Lovinger @rlovinger

Confab, 22 May 2015

Image via Bond

2

©2015 All rights reserved.

• Experience Director, Content Strategy; Razorfish New York

• Co-editor of scatter/gather, a content strategy blog: http://scattergather.razorfish.com

• Author of Nimble: A Razorfish Report on Publishing in the Digital Age (June 2010): http://nimble.razorfish.com

• Twitter: @rlovinger

is

HARDCORE

Metadata = Context

Context enables Connections

How does one convey that in a concise and powerful way?

Photo by Jesse Chan-Norris

Metadata Is a Love Note to the Future

Tweet and photo by Erin Kissane, Tumblr by Austin Kleon

429 notes

82 retweets

Photo by Rachel Lovinger

• Nearly 60,000 files archived

• Mostly from 1980-1995

• Collected and curated since 1998

• Almost no metadata

Textfiles.com

Who needs a database?

Metadata Skeptic transformed into… Metadata Warrior

Photos by Jason Scott and Rachel Lovinger

Photo by Rachel Lovinger

• Me?

Photo by Rachel Lovinger

ENTERTAINMENT WEEKLY

Metadata for Journalism Products

~3 years of online content

~10 years of magazine content

Imported from text files to CMS

Semi-structured information allowed us to map the files to content types and site sections, and add some metadata (author, published date, keywords, etc.)

10 years × 50 issues per year × 100 files per issue (approx.) = 50,000 estimated articles

Once in the CMS, we could add photos, links, formatting, etc.

For the content already in the CMS, keywords had been manually typed in by authors

• 6790 “different” keywords

• Removed 12% during cleanup:

• Typos

• Redundant

• Not Useful

• Star Wars: Episode I -- The Phantom Menace

• Episode 1

• Episode I

• Phantom Menace

• Star Wars Episode I The Phantom Menace

• Star Wars Episode I: The Phantom Menace

• Star Wars prequel

• Star Wars: Episode 1 -- The Phantom Menace

• Star Wars: Episode i -- the Phantom Menace

• Star Wars: Episode I: The Phantom Menace

• Star Wars: Episode I--The Phantom Menace

• Star Wars: Episode I--The Phantom Menance

• Star Wars: Episode One -- The Phantom Menace

• Star Wars: The Phantom Menace

• Star Wars: The Phantom Menace -- Episode I

• The Phantom Menace

• The Phanton Menace
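Cleaning up variants like these is a normalize-then-map job. The sketch below is an illustration only, not EW's actual process: it assumes a hypothetical canonical term and a hand-curated alias table. Case, colon, and dash differences collapse automatically once the text is normalized; typos and rewordings still need explicit entries.

```python
import re

# Hypothetical canonical form; the talk does not say which variant EW kept.
CANONICAL = "Star Wars: Episode I -- The Phantom Menace"


def normalize(raw: str) -> str:
    """Casefold, turn punctuation into spaces, and collapse whitespace."""
    text = raw.casefold()
    text = re.sub(r"[^\w\s]", " ", text)  # ':' and '--' become spaces
    return " ".join(text.split())


# Hypothetical alias table keyed by normalized text. Case, colon and dash
# variants collapse on their own; typos and rewordings need explicit entries.
ALIASES = {
    normalize(variant): CANONICAL
    for variant in [
        CANONICAL,
        "Star Wars: Episode 1 -- The Phantom Menace",
        "Star Wars: Episode One -- The Phantom Menace",
        "Star Wars: The Phantom Menace",
        "Star Wars: The Phantom Menace -- Episode I",
        "The Phantom Menace",
        "Phantom Menace",
        "The Phanton Menace",
        "Star Wars: Episode I--The Phantom Menance",
    ]
}


def canonicalize(raw: str):
    """Return the canonical term, or None to flag the keyword for human review."""
    return ALIASES.get(normalize(raw))
```

Short forms such as "Episode I" and "Star Wars prequel" are deliberately left out of the table. They could point at more than one term, so returning None hands them to a person.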

• TAFKAP?

• The Artist

• Artist Formerly Known as Prince

• The Artist Formerly Known As Prince

• The Artist formerly known as Prince

• the Artist Formerly Known as Prince

• The Artist Formerly Known as Prince (PKA)

• The magazine came out once a week

• The website published new articles several times a day

• Plus: Over 50,000 past articles!

• How could we better use all that content?

If you like James Bond, we wanted it to be easy for you to discover everything we had.

Cover Story

Interview

Photo Gallery

Etc.

Entertainment Weekly

Journalism

IMDb-like

Information

We put our controlled vocabulary into categories, to make them more distinct and meaningful.

For example:

• Book > Product > Harry Potter and the Goblet of Fire

• Movie > Product > Harry Potter and the Goblet of Fire

• Person > Individual > Daniel Radcliffe

• Person > Individual > J.K. Rowling
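One way to model the category paths above, as a sketch: each vocabulary entry carries its media type and kind along with its name, so the book and the movie titled "Harry Potter and the Goblet of Fire" remain separate terms. The `Term` class and `lookup` helper are hypothetical illustrations, not EW's CMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One controlled-vocabulary entry: its category path plus a display name."""
    media_type: str  # e.g. "Book", "Movie", "Person"
    kind: str        # e.g. "Product", "Individual"
    name: str

    def __str__(self) -> str:
        return f"{self.media_type} > {self.kind} > {self.name}"


VOCABULARY = {
    Term("Book", "Product", "Harry Potter and the Goblet of Fire"),
    Term("Movie", "Product", "Harry Potter and the Goblet of Fire"),
    Term("Person", "Individual", "Daniel Radcliffe"),
    Term("Person", "Individual", "J.K. Rowling"),
}


def lookup(name: str) -> list[Term]:
    """All terms sharing a display name; more than one means someone must choose."""
    return sorted((t for t in VOCABULARY if t.name == name), key=str)
```

Because the category is part of the term's identity, the same title can appear under two categories without the entries merging. A shared name simply produces more than one match.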

Capsule

Movie

Review

Preview

Movie Review

DVD Review

• Relationships defined for each media type

• Managed separately from the article content

• The full set of metadata was available to all articles

• Standard relationships

• For example, for Movie:

- Lead Performers

- Director

- Writer

- Release Date

- EW Grade

- Etc.

• Select a related category for each relationship, as applicable

• Some allow multiple values

• Authors just selected the primary category

• Related metadata pulled in automatically

• Updates appeared on all articles

*Metadata categories and relationships were managed by a dedicated data librarian
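The setup described in these slides can be sketched as follows: relationships live on the category, not the article, and each article stores only its primary category. It is a minimal illustration under stated assumptions. `MOVIE_RELATIONSHIPS`, `CATEGORY_STORE`, `set_relationship`, and `Article` are hypothetical names, and which relationships accept multiple values is a guess for illustration.

```python
from dataclasses import dataclass

# Hypothetical schema for the Movie media type, using the relationships the
# slide lists: relationship name -> whether it accepts more than one value.
# Which ones are multi-valued is an assumption for illustration.
MOVIE_RELATIONSHIPS = {
    "Lead Performers": True,
    "Director": False,
    "Writer": True,
    "Release Date": False,
    "EW Grade": False,
}

# Category metadata lives apart from article text, keyed by category path,
# so one edit here shows up on every article that uses the category.
CATEGORY_STORE: dict[str, dict[str, list[str]]] = {}


def set_relationship(category: str, relation: str, values: list[str]) -> None:
    """Librarian-side edit: attach related categories to a movie category."""
    if relation not in MOVIE_RELATIONSHIPS:
        raise KeyError(f"unknown relationship: {relation}")
    if len(values) > 1 and not MOVIE_RELATIONSHIPS[relation]:
        raise ValueError(f"{relation} allows only one value")
    CATEGORY_STORE.setdefault(category, {})[relation] = list(values)


@dataclass
class Article:
    headline: str
    primary_category: str  # the only metadata the author selects

    def related_metadata(self) -> dict[str, list[str]]:
        # Resolved at read time rather than copied into the article, so
        # later edits to the category appear on every article.
        return CATEGORY_STORE.get(self.primary_category, {})
```

Resolving related metadata at read time is what lets one librarian edit reach every article. The trade-off is an extra lookup on every page view.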

• “Best Results” linked directly to an aggregated page based on the category.

• For example:

- “Cats & Dogs” vs. “The Truth About Cats & Dogs”

- The Green Mile (Movie) vs. The Green Mile (Book)

• Wal-mart sold gallon jars of Vlasic pickles for $2.97.

• A popular item – priced so low it nearly put Vlasic out of business.

• By achieving their goals, they put themselves in a position they might not survive.

See: http://www.fastcompany.com/47593/wal-mart-you-dont-know

• We wanted people to discover older content, and they did!

• By 2006, we had 16 years of magazine and web content.

• Other Time Inc. publications were interested in using our categorization system, too.

Not well-suited for our expensive and frequent database calls.

Our webservers were optimized to serve up the latest “issue” of content.

40% of Time Inc.’s database calls, but only 25% of the total traffic

A 2007 redesign removed the “third column” entirely.

The creator of Freebase (a semi-semantic UGC site for structured content, now read-only) said EW.com was way ahead of its time.

The making of a METADATA WARRIOR

Who needs a database?

“The hardest part of [recording] history is to be there when it happens.”

Photo by Rachel Lovinger

• An informal post on August 4th

• Notification sent out September 30th

• Shut down October 31st

“What happened to my web page on my husband, Bob Champine, that took me many years to put together on his career and which meant a lot to me and to the aviation community. I noticed with 9.0 I lost the left margin and the picture of him exiting the X-1. I need to restore it to the internet as it is history. Please tell me what to do. I will be glad to retype it, I just don’t want it lost to the world. I need help. Gloria Champine”

Illustration from “Fire in the Library,” MIT Technology Review

“Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever.”

• In 6 months Archive Team saved 900 GB

• Estimated 4-5 TB total

• Other people saved additional pages, but probably ¼ is gone forever

• For many people, Geocities was their first web presence

Those screenshots were automatically generated from Geocities sites rescued by Archive Team in 2009

See more at One Terabyte of Kilobyte Age Photo Op: http://oneterabyteofkilobyteage.tumblr.com/

Due to lack of metadata:

• The rescued data was less useful:

- Really bulky files

- Case-sensitive filenames were difficult to access and read

- Not saved in a web-archive format (WARC)

• The process was less efficient and more error-prone:

- Poor tracking of completed activity

- Lots of duplicated data

- Took way too long (6 months vs. 3 days)

- Could have gotten all the data in a month (estimated)
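Poor tracking and duplication are the kind of problems a small manifest helps with. This is a minimal sketch, assuming a crawler that calls it once per fetched URL: it remembers which URLs are finished and flags content whose SHA-256 hash was already saved under another address. `CrawlManifest` is hypothetical and is not Archive Team's actual tooling.

```python
import hashlib


class CrawlManifest:
    """Hypothetical record of what a rescue crawl has already saved."""

    def __init__(self) -> None:
        # URLs are stored exactly as fetched: paths were case-sensitive,
        # so lowercasing them could merge two different pages.
        self.done: dict[str, str] = {}     # URL -> SHA-256 of its content
        self.by_hash: dict[str, str] = {}  # SHA-256 -> first URL seen with it

    def needs_fetch(self, url: str) -> bool:
        """True if this URL has not been recorded as completed yet."""
        return url not in self.done

    def record(self, url: str, content: bytes) -> bool:
        """Mark url as completed; return False if its bytes duplicate earlier content."""
        digest = hashlib.sha256(content).hexdigest()
        self.done[url] = digest
        if digest in self.by_hash:
            return False
        self.by_hash[digest] = url
        return True
```

A duplicate still counts as a completed URL, so it is never fetched again, but its bytes could be stored once and referenced by hash.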

Mission: The Internet Archive’s purposes include offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format.

Photo by Ulf Benjaminsson

Save the history before it's lost forever

Offer permanent access to historical collections that exist in digital format

The Internet Archive contains web pages, texts, videos, audio files, software, and images (plus concerts and collections).

• Media Type makes it Readable or Playable

• Emulator (for software) makes it Executable

• Subject Keywords make it Findable

• Is it Accurate?

• Is it Credible?

• What is the Source? (machines or people)

• It’s a lot of Effort. Do we have enough people and time?

Additional processing takes place, depending on the type

• Description and keywords are required, but open fields

• Other metadata is optional
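A minimal check for the rule on this slide, as a sketch: description and keywords must be present, but because they are open fields, only emptiness is checked, not quality. The field names and the `missing_required` helper are hypothetical and do not reflect the Internet Archive's upload API.

```python
# Required by the rule on the slide; both are open, free-text fields.
REQUIRED_FIELDS = ("description", "keywords")


def is_blank(value) -> bool:
    """True for a missing value, whitespace-only text, or an empty list."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    return len(value) == 0


def missing_required(item: dict) -> list[str]:
    """Required fields an upload still lacks; optional fields are never checked."""
    return [name for name in REQUIRED_FIELDS if is_blank(item.get(name))]
```

Any non-empty text passes. That is why the accuracy and credibility questions raised earlier still apply even when every upload clears this check.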

• Metadata attributes determined by the community

• For user-generated content, it’s just easier for people not to add metadata.

• Internet Archive will never have enough people on staff to do it properly.

Crowdsource manual creation of metadata

Photo by Pascal

• Too small a pool of volunteers, and their drive didn’t last long

• Tools didn’t provide immediate feedback/satisfaction. They had to email their inputs and wait.

Photo by psyberartist

• 10 most common words + 10 most common 2-word phrases

• Applied to 200,000 items

• Much more scalable

• Heavily machine assisted: a person can validate data and create collections

Photo by James St. John
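The keyword approach on this slide can be sketched in a few lines. This is an illustration only: the tokenizer and the tiny stopword list are assumptions, because the slide does not say how the Internet Archive tokenized text or which words it ignored.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real run needs a fuller one.
STOPWORDS = {"a", "an", "and", "be", "for", "in", "is", "it", "of",
             "on", "or", "the", "this", "that", "to", "with", "your"}


def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric words with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def topics(text: str, n: int = 10) -> list[str]:
    """The n most common words followed by the n most common two-word phrases."""
    words = tokens(text)
    top_words = Counter(words).most_common(n)
    # Pairs are built after stopword removal, so "switch to the box"
    # yields "switch box"; this is one choice among several.
    top_pairs = Counter(zip(words, words[1:])).most_common(n)
    return [w for w, _ in top_words] + [" ".join(p) for p, _ in top_pairs]
```

Counting is a single pass over each item's text, which fits the slide's point that this scales to 200,000 items where manual tagging did not. A person can still review the output afterward.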

“Controversial, but roughly as good as a bored intern.”

Topics: switch, atari, antenna, game, cable, terminals, console, television, video, program, power supply, console unit, video computer, game program, computer system, atari game, power switch, switch box, atarivideo, screw terminals

Having the stuff is vital, the most important thing. But it’s also vital to have a system by which these things are described.

“If a person can’t get the information they need, then we’re failing.”

Photo by Rachel Lovinger

• Jason had become a metadata advocate

But I realized that…

• Content strategists who care about the long game should think like historians, archivists and futurists, too.

NATURALIS BIODIVERSITY CENTER

Metadata from the past

• Dutch leader in academic research and education on biodiversity and taxonomy.

• Has a collection of 37 million natural history objects.

Describe, understand and explore biodiversity for human wellbeing and the future of our planet.

They do this with:

• Accessible collections

• Contributions to global scientific research

• Awe of natural history

• Openly shared knowledge

• From 2010 to June 2015

• 250 staff members & 450 volunteers

• Digitizing 7 million objects in detail

• Adding metadata for the other 30 million objects

• Information is more easily discovered, studied, and used.

• Scientists worldwide can access it directly online, without assistance.

• Some of this data has never been available in digital form before.

• Scientific name

• Where it was found

• When it was found

• Who found it

“Objects [in the collection] have no scientific value without this information.” - Suzanne de Jong-Kole
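As a sketch, a registration record could keep the label text exactly as written next to the four core facts and report which facts are still missing. `SpecimenLabel`, `CORE_FIELDS`, and the field names are hypothetical and do not describe Naturalis's actual registration system.

```python
from dataclasses import dataclass
from typing import Optional

CORE_FIELDS = ("scientific_name", "locality", "collection_date", "collector")


@dataclass
class SpecimenLabel:
    verbatim_text: str                     # label transcribed exactly as written
    scientific_name: Optional[str] = None
    locality: Optional[str] = None         # where it was found
    collection_date: Optional[str] = None  # when; text, since old labels vary
    collector: Optional[str] = None        # who found it

    def missing_core_fields(self) -> list[str]:
        """Core facts that are absent or empty on this record."""
        return [name for name in CORE_FIELDS if not getattr(self, name)]

    def is_complete(self) -> bool:
        return not self.missing_core_fields()
```

Dates are stored as text rather than parsed. Centuries-old labels may say only a year, or something vaguer, and the verbatim copy keeps the original wording available for later review.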

Employees enter data, verbatim, into the collection registration system.

This allows them to retrieve the physical specimen if requested.

• Vele Handen = Many Hands

• People helped transcribe handwritten labels

• In 9 months, volunteers transcribed 200,000 labels, of which about half were usable.

The person who collected the specimen wrote the metadata on the label.

This could be a professional researcher, or a non-professional enthusiast.

Darwin’s Finches

The oldest is this Spanish pepper from 1550!

When they wrote this metadata, they had no idea that nearly half a millennium later people would be “digitizing” it.

The ‘love note’ is when you behave selflessly for a partner – or customer – that doesn’t exist yet.

A drawing Jason made in my notebook in high school, 20+ years before we ever dated.
