Making Textual Information More Accessible Holly Miller Florida Institute of Technology
Page 1: Making Information More Accessible

Making Textual Information More Accessible

Holly Miller, Florida Institute of Technology

Page 2: Making Information More Accessible

About me

• Biochemist
• Curious about Information
• Librarian
• Informatician/Project Director/Library Director
• Asst. Dean, Scholarly Content & Faculty Engagement

Page 3: Making Information More Accessible

DOMO

Every minute:
• Facebook users share nearly 2.5 million pieces of content.
• Twitter users tweet nearly 300,000 times.
• Email users send over 200 million messages.

Page 5: Making Information More Accessible

50 million articles published by 2009

Jinha (2010) Learned Publishing 23:258.

Page 6: Making Information More Accessible

Cancer – 694,372 articles in the last 5 years

Climate change – 88,565

Species extinction – 8,453

Median # of articles read in a year – 264*

[Bar chart, logarithmic axis from 100 to 1,000,000. Axis label: Number of Articles. Bars: Cancer (694,372), Climate Change (88,565), Species Extinction (8,453), # of articles read/year (264)]

*Nature News (2014) Scientists may be reaching a peak in reading habits
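
To put the figures on this slide side by side, the short Python sketch below divides each five-year article count by the median reading rate. It assumes, purely for illustration, that a reader covers one topic only and that no new articles appear in the meantime; the function and variable names are made up for this example.

```python
# Article counts and the median reading rate, as quoted on the slide.
ARTICLES_LAST_5_YEARS = {
    "Cancer": 694_372,
    "Climate change": 88_565,
    "Species extinction": 8_453,
}
MEDIAN_ARTICLES_READ_PER_YEAR = 264


def years_to_read(article_count, per_year=MEDIAN_ARTICLES_READ_PER_YEAR):
    """Years needed to read article_count papers at a fixed yearly rate."""
    return article_count / per_year


for topic, count in ARTICLES_LAST_5_YEARS.items():
    print(f"{topic}: about {years_to_read(count):,.0f} years of reading")
```

At the median rate, five years of cancer literature alone works out to roughly 2,630 years of reading, and even the species extinction literature to about 32 years.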

Page 7: Making Information More Accessible

Too much information

Page 8: Making Information More Accessible

Example: Species Identification

Names offer a logical way to search for and index content

Page 9: Making Information More Accessible

Names are one of biology’s Controlled Vocabularies

Page 11: Making Information More Accessible

How to do it? In the past….

Georges Louis Leclerc, comte de Buffon. Histoire naturelle : générale et particulière (Oiseaux), 1799-1808

Page 12: Making Information More Accessible

FindIT - Scientific Name Recognition Algorithm

Page 13: Making Information More Accessible

The OCR Problem

Epitonium foliaceicostwm Orbigny Wrinkled-ribbed Wentletrap Southeast Florida to the Lesser Antilles.
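
One common way to cope with OCR damage like "foliaceicostwm" is to compare the scanned string against a list of known names and accept a close match. The sketch below uses Python's standard `difflib` for that. It is not how FindIT or any other tool in this talk is stated to work; the reference list is a tiny hypothetical one, the 0.9 cutoff is an arbitrary choice, and the corrected spelling "Epitonium foliaceicostum" is assumed to be the intended name.

```python
import difflib

# Hypothetical, tiny reference list; a real name index would be far larger.
KNOWN_NAMES = [
    "Epitonium foliaceicostum",
    "Epitonium lamellosum",
    "Phyllodesmium acanthorhinum",
]


def closest_known_name(ocr_text, known_names=KNOWN_NAMES, cutoff=0.9):
    """Return the best fuzzy match for an OCR-damaged name, or None."""
    matches = difflib.get_close_matches(ocr_text, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(closest_known_name("Epitonium foliaceicostwm"))
print(closest_known_name("Wrinkled-ribbed Wentletrap"))
```

The first call recovers the assumed correct spelling; the second returns `None` because the common name is not close to anything in the list. A high cutoff keeps false corrections down at the cost of missing heavily damaged names.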

Page 14: Making Information More Accessible

Phyllodesmium acanthorhinum

Source: http://ab.co/1ByZcIb Photographer: Robert Bolland

Page 15: Making Information More Accessible

Machine Learning for Species Identification

Reptilia and Batrachia. (1885-1902) by Albert C.L.G. Günther

Page 16: Making Information More Accessible

NetiNeti: Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing

The fluorescent sea slug Phyllodesmium acanthorhinum is more than just a pretty collection of colors: the creature bridged the gap for scientists trying to understand the relationship between sea slugs that feed on hydroids and those that dine on corals.

Source: http://ab.co/1ByZcIb Photographer: Robert Bolland

Akella et al. BMC Bioinformatics 2012, 13:211
http://www.biomedcentral.com/1471-2105/13/211

Page 17: Making Information More Accessible

Named Entity Recognition (NER)

to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations…

Page 18: Making Information More Accessible

How does NetiNeti work?

Named Entity Recognition (NER)

The fluorescent sea slug Phyllodesmium acanthorhinum is more than just a pretty collection of colors:

Adjective noun unknown

Page 19: Making Information More Accessible

How does NetiNeti work?

• Text is tokenized (broken into chunks)
• Prefiltering step
• Probability that token is a name is calculated (structure and context)
• Training (positive and negative examples)
• Features (letter combinations, # of vowels, part of speech)

The fluorescent sea slug Phyllodesmium acanthorhinum is more than just a pretty collection of colors:

name not a name
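
The steps on this slide can be sketched in a few lines of Python. This is a toy illustration, not NetiNeti's code: the slides do not name the classifier, so a small naive Bayes model stands in for it; the training pairs, the list of Latin-looking endings, and every function name are assumptions made for the example; and only structural features are used (no context or part-of-speech features).

```python
import math
import re
from collections import Counter, defaultdict

LATIN_ENDINGS = ("us", "um", "a", "is", "i", "ae")  # illustrative letter combinations


def tokenize(text):
    """Break text into word tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z]+", text)


def prefilter(tokens):
    """Keep only adjacent token pairs shaped like 'Genus epithet'."""
    for genus, epithet in zip(tokens, tokens[1:]):
        if genus[0].isupper() and genus[1:].islower() and epithet.islower():
            yield genus, epithet


def features(genus, epithet):
    """Structural features: length, letter combinations, number of vowels."""
    vowels = sum(ch in "aeiou" for ch in epithet)
    return {
        "genus_long": len(genus) >= 6,
        "latin_ending": epithet.endswith(LATIN_ENDINGS),
        "vowels_ge_4": vowels >= 4,
    }


class NaiveBayes:
    """Tiny naive Bayes over binary features, with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)

    def train(self, examples):
        for feats, is_name in examples:
            self.label_counts[is_name] += 1
            for item in feats.items():
                self.feature_counts[is_name][item] += 1

    def prob_name(self, feats):
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            log_p = math.log(count / total)
            for item in feats.items():
                seen = self.feature_counts[label][item]
                log_p += math.log((seen + 1) / (count + 2))
            scores[label] = log_p
        top = max(scores.values())
        weights = {label: math.exp(s - top) for label, s in scores.items()}
        return weights[True] / sum(weights.values())


# Made-up training set: positive (names) and negative (ordinary word pairs).
POSITIVE = [("Epitonium", "foliaceicostum"), ("Canis", "lupus"),
            ("Escherichia", "coli"), ("Quercus", "alba"),
            ("Panthera", "tigris"), ("Drosophila", "melanogaster")]
NEGATIVE = [("The", "creature"), ("Every", "minute"), ("Email", "users"),
            ("Twitter", "users"), ("Names", "offer"), ("Scientists", "may")]

model = NaiveBayes()
model.train([(features(g, e), True) for g, e in POSITIVE] +
            [(features(g, e), False) for g, e in NEGATIVE])


def find_names(text, threshold=0.5):
    """Return candidate pairs whose name probability reaches the threshold."""
    return [f"{g} {e}" for g, e in prefilter(tokenize(text))
            if model.prob_name(features(g, e)) >= threshold]


sentence = ("The fluorescent sea slug Phyllodesmium acanthorhinum is more "
            "than just a pretty collection of colors:")
print(find_names(sentence))
```

On the example sentence, the prefilter passes two candidates, "The fluorescent" and "Phyllodesmium acanthorhinum", and with this toy training set only the second scores above the threshold. A real system would need far more training data and the context features the slide mentions.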

Page 20: Making Information More Accessible

How well does NetiNeti work?

Page 21: Making Information More Accessible

http://gnrd.globalnames.org/

Page 22: Making Information More Accessible
Page 23: Making Information More Accessible

Connecting Biodiversity Literature to EOL

Page 24: Making Information More Accessible
Page 25: Making Information More Accessible

Questions?

The language of birds. London: Saunders and Otley, 1837. biodiversitylibrary.org/page/47512020 via Flickr

Thank You!