Transcript
2
©2015 All rights reserved.
• Experience Director, Content Strategy; Razorfish New York
• Co-editor of scatter/gather, a content strategy blog: http://scattergather.razorfish.com
• Author of Nimble: A Razorfish Report on Publishing in the Digital Age (June 2010): http://nimble.razorfish.com
• Twitter: @rlovinger
13
Metadata = Context
Context enables Connections
How does one convey that in a concise and powerful way?
16
Tweet and photo by Erin Kissane, Tumblr by Austin Kleon
429 notes
82 retweets
19
Content Strategy for Mobile by Karen McGrane
21
• Nearly 60,000 files archived
• Mostly from 1980-1995
• Collected and curated since 1998
• Almost no metadata
Textfiles.com
23
Metadata Skeptic transformed into… Metadata Warrior
Photos by Jason Scott and Rachel Lovinger
29
Semi-structured information allowed us to map the files to content types and site sections, and add some metadata (author, published date, keywords, etc.)
10 years x 50 issues per year x 100 files per issue (approx.)
50,000 estimated articles
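The estimate above is simple multiplication. A minimal sketch of the arithmetic (variable names are illustrative, not from the talk):

```python
# Back-of-the-envelope estimate of the archive size, using the slide's figures.
years = 10
issues_per_year = 50
files_per_issue = 100  # approximate, per the slide

estimated_articles = years * issues_per_year * files_per_issue
print(f"{estimated_articles:,} estimated articles")
```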
31
For the content already in the CMS, keywords had been manually typed in by authors
• 6790 “different” keywords
• Removed 12% during clean up
• Typos
• Redundant
• Not Useful
33
• Star Wars: Episode I -- The Phantom Menace
• Episode 1
• Episode I
• Phantom Menace
• Star Wars Episode I The Phantom Menace
• Star Wars Episode I: The Phantom Menace
• Star Wars prequel
• Star Wars: Episode 1 -- The Phantom Menace
• Star Wars: Episode i -- the Phantom Menace
• Star Wars: Episode I: The Phantom Menace
• Star Wars: Episode I--The Phantom Menace
• Star Wars: Episode I--The Phantom Menance
• Star Wars: Episode One -- The Phantom Menace
• Star Wars: The Phantom Menace
• Star Wars: The Phantom Menace -- Episode I
• The Phantom Menace
• The Phanton Menace
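Variants like the ones above can be partly collapsed by machine. The following is only a sketch of one possible clean-up step, not the process the team actually used: it groups keywords that differ only in case, punctuation, or spacing. The function names are made up for illustration. Typos and aliases ("Menance", "Episode 1") still land in their own groups and need a person or a controlled vocabulary to resolve.

```python
import re
from collections import defaultdict

def normalize(keyword: str) -> str:
    """Lowercase and replace every run of non-alphanumerics with one space."""
    return re.sub(r"[^a-z0-9]+", " ", keyword.lower()).strip()

def group_variants(keywords):
    """Map each normalized form to the raw keywords that share it."""
    groups = defaultdict(list)
    for kw in keywords:
        groups[normalize(kw)].append(kw)
    return dict(groups)

sample = [
    "Star Wars: Episode I -- The Phantom Menace",
    "Star Wars Episode I The Phantom Menace",
    "Star Wars: Episode i -- the Phantom Menace",
    "Star Wars: Episode I--The Phantom Menace",
    "Star Wars: Episode I--The Phantom Menance",  # typo stays separate
    "The Phanton Menace",                         # typo stays separate
]

groups = group_variants(sample)
for normalized, variants in groups.items():
    print(len(variants), normalized)
```

Normalization only removes formatting noise. Deciding that "Star Wars prequel" and "Episode I" mean the same movie is an editorial call.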
35
• TAFKAP?
• The Artist
• Artist Formerly Known as Prince
• The Artist Formerly Known As Prince
• The Artist formerly known as Prince
• the Artist Formerly Known as Prince
• The Artist Formerly Known as Prince (PKA)
37
• The magazine was published once a week
• The website published new articles several times a day
• Plus: Over 50,000 past articles!
• How could we better use all that content?
38
If you like James Bond, we wanted it to be easy for you to discover everything we had.
Cover Story
Interview
Photo Gallery
Etc.
41
We put our controlled vocabulary into categories, to make them more distinct and meaningful.
For example:
• Book > Product > Harry Potter and the Goblet of Fire
• Movie > Product > Harry Potter and the Goblet of Fire
• Person > Individual > Daniel Radcliffe
• Person > Individual > J.K. Rowling
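One way to picture the categorized vocabulary above is as terms whose identity includes their category path. This is a sketch under that assumption. The `Term` class and its field names are hypothetical, not the CMS's real data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    kind: str     # top-level type, e.g. "Book", "Movie", "Person"
    subtype: str  # e.g. "Product", "Individual"
    name: str     # the label people see

    def path(self) -> str:
        return f"{self.kind} > {self.subtype} > {self.name}"

book = Term("Book", "Product", "Harry Potter and the Goblet of Fire")
movie = Term("Movie", "Product", "Harry Potter and the Goblet of Fire")

vocabulary = {
    book,
    movie,
    Term("Person", "Individual", "Daniel Radcliffe"),
    Term("Person", "Individual", "J.K. Rowling"),
}

# Same name, different category: the two terms stay distinct.
print(book.name == movie.name, book == movie)
```

Because the category is part of the term, the book and the movie can share a title without colliding.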
43
• Relationships defined for each media type
• Managed separately from the article content
• The full set of metadata was available to all articles
44
• Standard relationships
• For example, for Movie:
- Lead Performers
- Director
- Writer
- Release Date
- EW Grade
- Etc.
• Select a related category for each relationship, as applicable
• Some allow multiple values
45
• Authors just selected the primary category
• Related metadata pulled in automatically
• Updates appeared on all articles
*Metadata categories and relationships were managed by a dedicated data librarian
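The approach on the last three slides can be sketched as a shared catalog that articles look up rather than copy. Everything here is illustrative: the dictionary layout, the `related_metadata` helper, and the sample articles are assumptions, not the actual EW.com implementation.

```python
GOBLET_MOVIE = "Movie > Product > Harry Potter and the Goblet of Fire"

# Relationships are managed in one place, separately from article content.
catalog = {
    GOBLET_MOVIE: {
        "Lead Performers": ["Person > Individual > Daniel Radcliffe"],
    },
}

# Authors only select the primary category (hypothetical sample articles).
articles = [
    {"title": "Cover Story", "category": GOBLET_MOVIE},
    {"title": "Photo Gallery", "category": GOBLET_MOVIE},
]

def related_metadata(article):
    """Pull in the related metadata for an article's primary category."""
    return catalog.get(article["category"], {})

# The data librarian adds a relationship once...
catalog[GOBLET_MOVIE]["Director"] = ["Person > Individual > Mike Newell"]

# ...and the update appears on every article in that category.
for article in articles:
    print(article["title"], related_metadata(article)["Director"])
```

Storing a reference instead of a copy is what lets a single edit by the librarian show up everywhere.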
47
• “Best Results” linked directly to an aggregated page based on the category.
• For example:
- “Cats & Dogs” vs. “The Truth About Cats & Dogs”
- The Green Mile (Movie) vs. The Green Mile (Book)
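A rough sketch of how a "Best Results" lookup could behave, assuming it matches the query against whole vocabulary names rather than substrings. The `VOCAB` list and `best_results` function are hypothetical.

```python
# Hypothetical slice of the controlled vocabulary: (category, name) pairs.
VOCAB = [
    ("Movie", "Cats & Dogs"),
    ("Movie", "The Truth About Cats & Dogs"),
    ("Movie", "The Green Mile"),
    ("Book", "The Green Mile"),
]

def best_results(query):
    """Return every category entry whose full name matches the query."""
    q = query.strip().casefold()
    return [(category, name) for category, name in VOCAB if name.casefold() == q]

print(best_results("cats & dogs"))
print(best_results("The Green Mile"))
```

A whole-name match keeps "Cats & Dogs" from pulling in "The Truth About Cats & Dogs", while the category keeps the movie and the book of "The Green Mile" as two separate aggregated pages.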
49
• Wal-Mart sold gallon jars of Vlasic pickles for $2.97.
• A popular item – priced so low it nearly put Vlasic out of business.
• By achieving their goals, they put themselves in a position they might not survive.
See: http://www.fastcompany.com/47593/wal-mart-you-dont-know
50
• We wanted people to discover older content, and they did!
• By 2006, we had 16 years of magazine and web content.
• Other Time Inc. publications were interested in using our categorization system, too.
52
Our webservers were optimized to serve up the latest “issue” of content.
40% of Time Inc.’s database calls, but only 25% of the total traffic
54
The creator of Freebase (a semi-semantic UGC site for structured content, now read-only) said EW.com was way ahead of its time.
58
“The hardest part of [recording] history is to be there when it happens.”
Photo by Rachel Lovinger
61
“What happened to my web page on my husband, Bob Champine, that took me many years to put together on his career and which meant a lot to me and to the aviation community. I noticed with 9.0 I lost the left margin and the picture of him exiting the X-1. I need to restore it to the internet as it is history. Please tell me what to do. I will be glad to retype it, I just don’t want it lost to the world. I need help. Gloria Champine”
62
Illustration from “Fire in the Library,” MIT Technology Review
63
“Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever.”
72
• In 6 months Archive Team saved 900 GB
• Estimated 4-5 TB total
• Other people saved additional pages, but probably ¼ is gone forever
• For many people, Geocities was their first web presence
76
Those screenshots were automatically generated from Geocities sites rescued by Archive Team in 2009
See more at One Terabyte of Kilobyte Age Photo Op: http://oneterabyteofkilobyteage.tumblr.com/
77
Due to lack of metadata:
• The rescued data was less useful
- Really bulky files
- Case-sensitive filenames difficult to access and read
- Not in a web-ready format (WARC)
• The process was less efficient and more error prone
- Poor tracking of completed activity
- Lots of duplication of data
- Took way too long (6 months vs. 3 days)
- Could have gotten all the data in a month (estimated)
79
Mission: The Internet Archive’s purposes include offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format.
Photo by Ulf Benjaminsson
83
Save the history before it's lost forever
Offer permanent access to historical collections that exist in digital format
84
Internet Archive contains: web pages, texts, videos, audio files, software, and images. (Plus concerts and collections)
• Media Type makes it Readable or Playable
• Emulator (for software) makes it Executable
• Subject Keywords make it Findable
86
• Is it Accurate?
• Is it Credible?
• What is the Source? (machines or people)
• It’s a lot of Effort. Do we have enough people and time?
92
• For user-generated content, it’s just easier for people not to.
• Internet Archive will never have enough people on staff to do it properly.
93
Crowdsource manual creation of metadata
Photo by Pascal
94
• Too small a pool of volunteers, and their drive didn’t last long
• Tools didn’t provide immediate feedback/satisfaction. They had to email their inputs and wait.
Photo by psyberartist
95
• 10 most common words + 10 most common 2-word phrases
• Applied to 200,000 items
• Much more scalable
• Heavily machine assisted: a person can validate data and create collections
Photo by James St. John
98
Topics: switch, atari, antenna, game, cable, terminals, console, television, video, program, power supply, console unit, video computer, game program, computer system, atari game, power switch, switch box, atari video, screw terminals
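A topic list like the one above (10 most common words plus 10 most common 2-word phrases) can be approximated with simple counting. This is a minimal sketch, not the Internet Archive's actual tooling. The tokenizer, the small stopword list, and the `extract_topics` name are all assumptions.

```python
import re
from collections import Counter

# Assumed stopword list; the talk does not say how common words were filtered.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with", "it"}

def extract_topics(text, n_words=10, n_phrases=10):
    """Return the most common words followed by the most common 2-word phrases."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    word_counts = Counter(tokens)
    # Phrases are pairs of neighbouring tokens after stopword removal,
    # so words separated only by a stopword also count as a pair.
    phrase_counts = Counter(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    words = [w for w, _ in word_counts.most_common(n_words)]
    phrases = [p for p, _ in phrase_counts.most_common(n_phrases)]
    return words + phrases

text = (
    "Connect the switch box to the antenna terminals. "
    "The switch box has a power switch."
)
print(extract_topics(text, n_words=2, n_phrases=1))
```

Counting is cheap enough to run across hundreds of thousands of items, which leaves a person free to validate the suggestions instead of typing them.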
99
Having the stuff is vital, the most important thing. But it’s also vital to have a system by which these things are described.
“If a person can’t get the information they need, then we’re failing.”
Photo by Rachel Lovinger
101
• Jason had been converted into a metadata advocate
But I realized that…
• Content strategists who care about the long game should think like historians, archivists and futurists, too.
103
• Dutch leader in academic research and education on biodiversity and taxonomy.
• Has a collection of 37 million natural history objects.
104
Describe, understand and explore biodiversity for human wellbeing and the future of our planet.
They do this with:
• Accessible collections
• Contributions to global scientific research
• Awe of natural history
• Openly shared knowledge
105
• From 2010 to June 2015
• 250 staff members & 450 volunteers
• Digitizing 7 million objects in detail
• Adding metadata for the other 30 million objects
106
• Information is more easily discovered, studied, and used.
• Scientists worldwide can access it directly online, without assistance.
• Some of this data has never been available in digital form before.
107
• Scientific name
• Where it was found
• When it was found
• Who found it
“Objects [in the collection] have no scientific value without this information.” - Suzanne de Jong-Kole
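The four fields above amount to a minimal record for each specimen. Here is a sketch of that record with a completeness check. The class, field names, and sample values are invented for illustration and do not reflect the museum's real schema.

```python
from dataclasses import dataclass, fields

@dataclass
class SpecimenLabel:
    scientific_name: str = ""
    found_where: str = ""
    found_when: str = ""
    found_by: str = ""

def missing_fields(label):
    """List the fields that are still empty on a transcribed label."""
    return [f.name for f in fields(label) if not getattr(label, f.name).strip()]

# Hypothetical examples: one complete label, one with only a name.
complete = SpecimenLabel(
    "Panthera leo",
    "(hypothetical location)",
    "(hypothetical date)",
    "(hypothetical collector)",
)
incomplete = SpecimenLabel(scientific_name="Panthera leo")

print(missing_fields(complete))
print(missing_fields(incomplete))
```

A check like this can flag which transcriptions still need attention before an object is treated as scientifically usable.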
111
• Vele Handen = Many Hands
• People helped transcribe handwritten labels
• In 9 months, people transcribed 200,000 labels, of which about half were usable.
112
The person who collected the specimen wrote the metadata on the label.
This could be a professional researcher, or a non-professional enthusiast.
115
When they wrote this metadata, they had no idea that nearly half a millennium later people would be “digitizing” it.
116
The ‘love note’ is when you behave selflessly for a partner – or customer – that doesn’t exist yet.
A drawing Jason drew in my notebook in high school, 20+ years before we ever dated.