CLiMB - Columbia University: Project CLiMB (Computational Linguistics for Metadata Building). Using Computational Linguistic Techniques to Harvest Image Descriptors. Columbia University. Funded by the Andrew W. Mellon Foundation, 2002-2004.

Dec 26, 2015


Project CLiMB

Computational Linguistics for Metadata Building

Using Computational Linguistic Techniques to Harvest Image Descriptors

Columbia University
Funded by the Andrew W. Mellon Foundation, 2002-2004


Photograph courtesy of the Council of Industrial Design's Design Archive.


Increase access to digital images


CLiMB: Interdisciplinary Research at Columbia University

• Libraries
• Computer Science Department
• Center for Research on Information Access (CRIA)

Funded by the Andrew W. Mellon Foundation, 2002-2004


CLiMB Project Members

Judith Klavans, PI

Stephen Davis

Angela Giral

Patricia Renfro

Bob Wolven

Roberta Blitz

Rebecca Passonneau

Veronika Horvath

David Elson


Problems in Image Access

Traditional approach: labor-intensive, expensive


Project CLiMB

Help image catalogers provide subject access?

Harvest image descriptors from existing literature?


Can we harvest image descriptors?


CLiMB Technical Contribution

CLiMB will identify and extract
• proper nouns
• terms and phrases
from text related to an image:

By September 14, 1908, the basis of the Greenes' final design had been worked out. It featured a radically informal, V-shaped plan (that maintained the original angled porch) and interior volumes of various heights, all under a constantly changing roofline that echoed the rise and fall of the mountains behind it. The chimneys and foundation would be constructed of the sandstone boulders that comprised the local geology, and the exterior of the house would be sheathed in stained split-redwood shakes.

Source: Edward R. Bosley. Greene & Greene. London: Phaidon, 2000, p. 127.
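
To make the extraction idea concrete, here is a minimal sketch run on two sentences from the Bosley passage. It is not the CLiMB implementation: the function names and the regular-expression heuristics are assumptions made for illustration. Capitalized words that do not begin a sentence stand in, crudely, for proper nouns, and hyphenated compounds stand in for multiword terms. A real system would more likely rely on part-of-speech tagging and noun phrase chunking.

```python
import re

# Two sentences quoted from the Bosley passage on this slide.
SAMPLE = (
    "By September 14, 1908, the basis of the Greenes' final design had been "
    "worked out. The chimneys and foundation would be constructed of the "
    "sandstone boulders that comprised the local geology, and the exterior of "
    "the house would be sheathed in stained split-redwood shakes."
)


def split_sentences(text):
    """Naive split after sentence-final punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def candidate_proper_nouns(text):
    """Collect runs of capitalized words, skipping each sentence's first word."""
    found = []
    for sentence in split_sentences(text):
        words = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", sentence)
        run = []
        for word in words[1:]:  # the first word is capitalized by convention
            if word[0].isupper():
                run.append(word)
            elif run:
                found.append(" ".join(run))
                run = []
        if run:
            found.append(" ".join(run))
    return found


def hyphenated_terms(text):
    """Collect hyphenated compounds such as 'split-redwood'."""
    return re.findall(r"\b[A-Za-z]+(?:-[A-Za-z]+)+\b", text)


print(candidate_proper_nouns(SAMPLE))
print(hyphenated_terms(SAMPLE))
```

The heuristic is deliberately crude. On the full passage it would also pick up "V-shaped" as a capitalized word, which is one reason a cataloger reviews the candidates before anything is added to a record.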


CLiMB Overall Goals

The essence of CLiMB:
• Use scholars themselves as “catalogers” by employing scholarly publications
• Enhance existing descriptive metadata

The CLiMB project:
• Research: Development of richer retrieval through increased numbers of descriptors
• Practice: Development of CLiMB Toolkit


Metadata Squeeze: Basic Elements

• Digital images
• Associated text
• Target object identification (TOI)
• CLiMB Toolkit
• Evaluation


Greene & Greene Architectural Records and Papers Collection

Drawings and Archives
Avery Architectural and Fine Arts Library
Columbia University Libraries


NYDA.1960.001.00023

All Saints Episcopal Church (Pasadena, Calif.). Alterations, 1902-1903


Greene & Greene Catalog Record

Author: Greene & Greene.
Title: [Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.). Alterations.]
  Residence of Mrs. Dudley P. Allen, 1188 Hillcrest Ave., Pasadena, Cal. [graphic] : Alteration / Greene & Greene, Architects.
Published: [1917]
Physical Details: 4 sheets : various media ; 87.8 x 57.3 cm. (34 5/8 x 22 5/8 in.)
Location: Columbia University, Avery Architectural Drawings
Other Authors: Greene, Charles Sumner, 1868-1957.
  Greene, Henry Mather, 1870-1954.
Subjects: Houses
  Alterations
  Architecture--Designs and plans--United States.
  Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.)
Component Item: [1] Item no. NYDA.1960.001.03224. [AVERYimage]. Electric lighting -- floor plan, part plan of basement : Sheet no.
Component Item: [2] Item no. NYDA.1960.001.00073. [AVERYimage]. [Electric lighting] floor plan, part plan of basement.


Greene & Greene Bibliography (associated texts)

• Bosley, Edward R. Greene & Greene. London : Phaidon, 2000.
• Current, William R. Greene & Greene: architects in the residential style. Fort Worth [Tex.] : Amon Carter Museum of Western Art, [1974].
• Makinson, Randell L. Greene & Greene: architecture as fine art. Salt Lake City : Peregrine Smith, c1977.
• Makinson, Randell L. Greene & Greene: the passion and the legacy. Salt Lake City : Gibbs and Smith, c1998.
• Smith, Bruce. Greene & Greene masterworks. San Francisco : Chronicle Books, c1998.
• Strand, Janann. A Greene & Greene guide. [Pasadena, Calif. : G. Dahlstrom, 1974].


Metadata Squeeze: Basic Elements

• Digital images
• Associated text
• Target object identification (TOI)
• CLiMB Toolkit
• Evaluation


Target Object Identification (TOI)

• New methodologies and software = new concepts and terms

• “Authority” list

• Varies from collection to collection
  Greene & Greene: Project Names
  North Carolina Museum: Creator/Title


Greene & Greene Catalog Record

Author: Greene & Greene.
Title: [Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.)]
  Residence of Mrs. Dudley P. Allen, 1188 Hillcrest Ave., Pasadena, Cal. [graphic] : Alteration / Greene & Greene, Architects.
Published: [1917]
Physical Details: 4 sheets : various media ; 87.8 x 57.3 cm. (34 5/8 x 22 5/8 in.) or smaller.
Location: Columbia University, Avery Architectural Drawings
Other Authors: Greene, Charles Sumner, 1868-1957.
  Greene, Henry Mather, 1870-1954.
Subjects: Houses
  Alterations
  Architecture--Designs and plans--United States.
  Pasadena (Ca.)
  Mrs. Dudley P. Allen house, 1188 Hillcrest Avenue (Pasadena, Calif.)
Genre Or Form: Architectural drawings--American.
  Floor plans.
Component Item: [1] Item no. NYDA.1960.001.03224. [AVERYimage]. Electric lighting -- floor plan, part plan of basement : Sheet no.


TOI list for NCMA (ID number / creator / title)

57.21.1 / Lodovico Carracci / The Assumption of the Virgin
60.17.51 / Domenico Zampieri, called Domenichino / The Madonna of Loreto Appearing to St. John the Baptist, St. Eligius, and St. Anthony Abbot
52.9.168 / Bernardo Strozzi / St. Lawrence Distributing the Treasures of the Church


Deriving TOIs: Image Vendor Data


North Carolina Museum of Art

North Carolina Museum of Art: Handbook of the Collections. Rebecca Martin Nagy, editor. Raleigh: The Museum of Art, 1998.


North Carolina Museum of Art


VRA Core 3.0 Record

• Record Type = work
• Type = painting
• Title = Banquet Piece
• Measurements. Dimensions = 79.7 x 94 cm
• Material = oil on panel
• Creator = Uyl, Jan Jansz. den
• Date. Creation = ca. 1635
• Location. Current Repository = Raleigh (NC, USA) North Carolina Museum of Art
• ID Number. Current Repository = 52.9.43
• Style/Period = Dutch 17th century


Metadata Squeeze: Basic Elements

• Digital images
• Associated text
• Target object identification (TOI)
• CLiMB Toolkit
• Evaluation


CLiMB TOOLKIT: Process Flow

1. Load Text
2. Load TOI List
3. Analyze Text
4. Select Subject Access Terms
5. Review
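
The five steps of the process flow can be sketched as a small pipeline. This is a hypothetical outline, not the Toolkit's code: the `Session` class, the function names, and the substring matching used in the analysis step are all assumptions, and the sample text and TOI names are invented placeholders.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical container for one cataloging session."""
    text: str = ""
    toi_list: list = field(default_factory=list)
    candidates: dict = field(default_factory=dict)  # TOI -> sentences naming it
    selected: dict = field(default_factory=dict)    # TOI -> chosen terms


def load_text(session, text):
    """Step 1: load the associated text."""
    session.text = text


def load_toi_list(session, names):
    """Step 2: load the target object identification (TOI) list."""
    session.toi_list = list(names)


def analyze_text(session):
    """Step 3: link each TOI to the sentences that mention it (assumed method)."""
    sentences = re.split(r"(?<=[.!?])\s+", session.text)
    for toi in session.toi_list:
        session.candidates[toi] = [
            s for s in sentences if toi.lower() in s.lower()
        ]


def select_terms(session, toi, terms):
    """Step 4: keep the cataloger's terms that occur in text linked to the TOI."""
    pool = " ".join(session.candidates.get(toi, [])).lower()
    session.selected[toi] = [t for t in terms if t.lower() in pool]


def review(session):
    """Step 5: return the selections that are ready for review."""
    return {toi: terms for toi, terms in session.selected.items() if terms}


# Placeholder data, invented for illustration only.
session = Session()
load_text(session, "Example House A has a V-shaped plan. "
                   "Example House B has a redwood exterior.")
load_toi_list(session, ["Example House A", "Example House B"])
analyze_text(session)
select_terms(session, "Example House A", ["V-shaped plan", "redwood exterior"])
print(review(session))
```

In this sketch the selection step only accepts terms found in sentences already linked to the chosen TOI, so "redwood exterior" is rejected for Example House A. Whether the actual Toolkit enforces anything similar is not stated in the slides.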


CLiMB Toolkit

http://www1.cs.columbia.edu/~delson/CLiMB/gui/dev2


VRA Core 3.0 Record with CLiMB Subject Terms

• Record Type = work
• Type = painting
• Title = Banquet Piece
• Measurements. Dimensions = 79.7 x 94 cm
• Material = oil on panel
• Creator = Uyl, Jan Jansz. den
• Date. Creation = ca. 1635
• Location. Current Repository = Raleigh (NC, USA), North Carolina Museum of Art
• ID Number. Current Repository = 52.9.43
• Style/Period = Dutch 17th century
• CLiMB Subject = Dutch still life painting
• CLiMB Subject = vanitas
• CLiMB Subject = burned down candle
• CLiMB Subject = glass
• CLiMB Subject = pewter
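
One way to picture the enrichment shown on this slide is as a merge of harvested terms into an existing record. The sketch below models the record as a plain dictionary keyed by the field labels on the slide. The `add_climb_subjects` helper and the dictionary layout are assumptions for illustration, not a VRA Core serialization or the Toolkit's data model.

```python
def add_climb_subjects(record, terms):
    """Return a copy of the record with de-duplicated 'CLiMB Subject' values."""
    updated = {
        key: list(value) if isinstance(value, list) else value
        for key, value in record.items()
    }
    subjects = updated.setdefault("CLiMB Subject", [])
    for term in terms:
        if term not in subjects:  # avoid repeating a term already present
            subjects.append(term)
    return updated


# Field labels and values copied from the slide.
record = {
    "Record Type": "work",
    "Type": "painting",
    "Title": "Banquet Piece",
    "Measurements. Dimensions": "79.7 x 94 cm",
    "Material": "oil on panel",
    "Creator": "Uyl, Jan Jansz. den",
    "Date. Creation": "ca. 1635",
    "Location. Current Repository": "Raleigh (NC, USA), North Carolina Museum of Art",
    "ID Number. Current Repository": "52.9.43",
    "Style/Period": "Dutch 17th century",
}

enriched = add_climb_subjects(
    record,
    ["Dutch still life painting", "vanitas", "burned down candle",
     "glass", "pewter", "vanitas"],  # the repeated term is ignored
)
for term in enriched["CLiMB Subject"]:
    print("CLiMB Subject =", term)
```

Returning a copy leaves the original catalog record untouched, which keeps the existing descriptive metadata separate from the harvested terms until a cataloger accepts them.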


CLiMB Metadata and Cataloging Schemas

• MARC 21
  Field for CLiMB metadata: 653 - INDEX TERM--UNCONTROLLED (R)
• VRA Core Categories, Version 3.0
  Field for CLiMB metadata: SUBJECT
• Dublin Core Metadata Element Set, Version 1.1
  Field for CLiMB metadata: SUBJECT AND KEYWORDS
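
The field mapping above can be illustrated by rendering a single CLiMB term for each schema. This is a sketch under stated assumptions: the function names are hypothetical, and the MARC line uses a common textual display convention (blank indicators shown as `#`, subfield `$a` for the uncontrolled term) rather than a real MARC encoding. The VRA line mirrors the `Field = value` layout used on the record slides, and the Dublin Core line uses the `dc:subject` element.

```python
from xml.sax.saxutils import escape


def to_marc_653(term):
    """Display-style MARC 21 field 653 with blank indicators and subfield $a."""
    return f"653 ## $a {term}"


def to_vra_subject(term):
    """VRA Core 3.0 SUBJECT in the 'Field = value' layout of the slides."""
    return f"Subject = {term}"


def to_dc_subject(term):
    """Dublin Core subject element, with XML special characters escaped."""
    return f"<dc:subject>{escape(term)}</dc:subject>"


for term in ["vanitas", "pewter"]:
    print(to_marc_653(term))
    print(to_vra_subject(term))
    print(to_dc_subject(term))
```

Because 653 is repeatable, as the (R) on the slide indicates, each harvested term would get its own field rather than being packed into one.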


Metadata Squeeze: Basic Elements

• Digital images
• Associated text
• Target object identification (TOI)
• CLiMB Toolkit
• Evaluation


CLiMB Toolkit Evaluation

14 April 2004


CLiMB Toolkit Evaluation

• Evaluation with expert users
  Visual resources professionals
  Art librarians

• Overall satisfaction with interface design

• Overall satisfaction with functionality

• Identified areas for further development


Thank you!

Any further questions?

www.columbia.edu/cu/cria/climb
