Metadata Quality Assurance in the DLESE Community Collection
Jan 01, 2016
DLESE Community Collection
• Initial DLESE collection, continues to grow
• Approx. 4200 items
• Public cataloging tool, but the majority of items come from “known” sources and funded catalogers
Distribution of Cataloging
• DPC: 5%
• AGI: 64%
• MSU: 19%
• Other community: 12%
Data estimated as of June 2003, little change over last year
Quality Assurance Measures – Four stages
• Catalog system provides feedback on duplicate and similar entries
• Every record is reviewed by a person for metadata completeness and quality
• Additional technical checks for vocabulary and required metadata completeness
• Regular, periodic checks for URL viability, syntax and duplication/mirrors
DLESE Catalog System
• Disallows exact duplicate URLs
• Provides list of similar URLs in all stages of submission for decision to catalog or not
• Discourages overlapping records
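The duplicate and similar-URL feedback described on this slide could be sketched roughly as follows. This is a minimal illustration, not the DLESE catalog system's actual logic: the function names `host_of` and `check_submission` are hypothetical, and "similar" is assumed here to mean "same host, ignoring a leading www.".

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase host of a URL, without a leading 'www.' (assumed rule)."""
    host = (urlsplit(url.strip()).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def check_submission(url, cataloged):
    """Return (is_exact_duplicate, similar_urls) for a candidate URL.

    An exact duplicate is rejected outright; otherwise already-cataloged URLs
    on the same host are listed so the cataloger can decide whether to proceed.
    """
    url = url.strip()
    if url in cataloged:
        return True, []
    host = host_of(url)
    if not host:
        return False, []
    similar = sorted(c for c in cataloged if host_of(c) == host)
    return False, similar


cataloged = {
    "http://www.example.org/volcano/",
    "http://example.org/quakes.html",
    "http://other.example.net/",
}
print(check_submission("http://example.org/tides.html", cataloged))
```

Returning the similar URLs rather than blocking the submission leaves the decision to catalog with the person, which matches the intent of discouraging, not forbidding, overlapping records.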
Human-mediated checks - 1
• URL functional
• Appropriate URL is cataloged (granularity and duplication)
• Written description aligns with content at site
• Complete sentences, spelling
• Avoid repeating redundant information (-: (technical info, creator)
Human-mediated checks - 2
• Required metadata is present; review resource and add or amend to follow best practices
• Controlled vocabularies properly assigned: resource type, technical
• Suggested metadata reviewed for accuracy, if present
  • Keywords
  • Relation
  • Coverage
  • Standards
Pre-accessioning technical checks
• URL viability checked
• Check for missing required metadata and proper vocabularies
• Coverage errors are flagged, though some require a move to a special directory for edit and subsequent accessioning (crossing the date line)
• Upon accessioning, additional check for duplicate ID numbers and duplicate resource content
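A rough sketch of what such pre-accessioning checks might look like is given below. Everything named here is an assumption for illustration: the field names, the `REQUIRED` tuple, the `RESOURCE_TYPES` vocabulary, and the bounding-box layout are hypothetical and are not taken from the ADN framework. A box whose western longitude exceeds its eastern one is treated as crossing the date line and is flagged for manual handling rather than rejected.

```python
from collections import Counter

# Hypothetical required fields and vocabulary, for illustration only.
REQUIRED = ("title", "url", "description", "resource_type")
RESOURCE_TYPES = {"Lesson plan", "Dataset", "Map", "Visualization"}


def precheck(record):
    """Return a list of problems found in one metadata record (a dict)."""
    problems = []
    for field in REQUIRED:
        if not str(record.get(field, "")).strip():
            problems.append(f"missing required field: {field}")
    rtype = record.get("resource_type")
    if rtype and rtype not in RESOURCE_TYPES:
        problems.append(f"unknown resource type: {rtype}")
    box = record.get("coverage")
    if box:
        west, east = box["west"], box["east"]
        south, north = box["south"], box["north"]
        if not (-180 <= west <= 180 and -180 <= east <= 180):
            problems.append("coverage: longitude out of range")
        elif west > east:
            problems.append("coverage: crosses the date line, route to manual edit")
        if not (-90 <= south <= north <= 90):
            problems.append("coverage: invalid latitude range")
    return problems


def duplicate_ids(records):
    """Return the sorted record IDs that occur more than once."""
    counts = Counter(r["id"] for r in records)
    return sorted(i for i, n in counts.items() if n > 1)


record = {
    "title": "", "url": "http://example.org/", "description": "Pacific map",
    "resource_type": "Poster",
    "coverage": {"west": 170, "east": -170, "south": -10, "north": 10},
}
for problem in precheck(record):
    print(problem)
```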
Post-accessioning, ongoing checks
• Linkchecking 2x a day, reports issued twice a week or on demand
• Provides report on resource and relation URLs, indicating error type
• “Vitality” over time (too low is <50% available over 6 previous days)
• Duplication of URL or content (catches mirrors), plus alerts when a mirror URL differs from the primary URL
• Email syntax
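The “vitality” measure above can be illustrated with a small calculation. This is a sketch under stated assumptions: each link check is assumed to be stored as a `(date, ok)` pair, vitality is taken to be the fraction of successful checks over the six days before the report date, and the names `vitality` and `vitality_too_low` are hypothetical. With checks twice a day, a full window holds 12 results.

```python
from datetime import date, timedelta


def vitality(checks, today, window_days=6):
    """Fraction of successful checks in the window_days days before today.

    checks is an iterable of (date, ok) pairs; returns None if no checks
    fall inside the window.
    """
    start = today - timedelta(days=window_days)
    recent = [ok for day, ok in checks if start <= day < today]
    if not recent:
        return None
    return sum(recent) / len(recent)


def vitality_too_low(checks, today, threshold=0.5):
    """True when vitality is below the threshold (50% on the slide)."""
    score = vitality(checks, today)
    return score is not None and score < threshold


today = date(2003, 6, 10)
days = [today - timedelta(days=d) for d in range(1, 7)]
# Two checks per day; the link responded only on the two most recent days.
checks = [(d, i < 2) for i, d in enumerate(days) for _ in range(2)]
print(vitality(checks, today), vitality_too_low(checks, today))
```

Using a window rather than a single failed check avoids flagging resources that are only briefly unavailable.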
Actions taken
• Email syntax and permanent redirects fixed
• Duplications investigated
• “Vitality too low” group receives further investigation to repair
“Vitality too low” = broken link
• First try to sleuth out new URL and fix it
• If unsuccessful, send email to creator/contact inquiring about status
• If creator replies, fix as indicated
• If no reply, remove from discovery but don’t delete
• <1% of DCC collection is “broken” at any given time
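The repair steps above amount to a small decision procedure, sketched here for illustration. The `Action` enum and `next_action` function are hypothetical names, and the sketch assumes the caller decides when the wait for a reply has ended, since the slide gives no waiting period.

```python
from enum import Enum


class Action(Enum):
    FIX_URL = "update record with the new URL"
    EMAIL_CONTACT = "email creator/contact about status"
    APPLY_REPLY = "fix as the creator indicates"
    HIDE = "remove from discovery, keep the record"


def next_action(new_url_found, email_sent, reply_received):
    """Pick the next step for a record whose vitality is too low."""
    if new_url_found:
        return Action.FIX_URL
    if not email_sent:
        return Action.EMAIL_CONTACT
    if reply_received:
        return Action.APPLY_REPLY
    # No reply: hide the record from discovery but do not delete it.
    return Action.HIDE


print(next_action(new_url_found=False, email_sent=True, reply_received=False).value)
```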
Ongoing development
• New DCS will support
  • multiple frameworks (ADN, collection, anno)
  • more front-end quality controls: spell check, completeness notification during cataloging
• Suggest-a-URL to replace full public cataloging
• Ongoing cataloging training and discussion with regular catalogers