©2013 MFMER | slide-1 An Incremental Approach to MEDLINE MeSH Indexing Presenter: Hongfang Liu BioASQ 2013 Team Member: Mayo Clinic: Wu Stephen, James Masanz, and Hongfang Liu University of Delaware: Dongqing Zhu, Ben Carterette
An Incremental Approach to MEDLINE MeSH Indexing

Feb 24, 2016

Transcript
Page 1: An Incremental Approach to MEDLINE  MeSH  Indexing

An Incremental Approach to MEDLINE MeSH Indexing

Presenter: Hongfang Liu

BioASQ 2013

Team Members:
Mayo Clinic: Wu Stephen, James Masanz, and Hongfang Liu
University of Delaware: Dongqing Zhu, Ben Carterette

Page 2: An Incremental Approach to MEDLINE  MeSH  Indexing

Outline

• Motivation & Task
• Incremental Systems
  • MetaMap-based
  • Search-based
  • LLDA-based
• Experiment Setup
• Evaluation
• Conclusion

Page 3: An Incremental Approach to MEDLINE  MeSH  Indexing

Motivation of BioASQ Task

• Reduce human effort in MeSH indexing
  • Increasing number of new articles
  • Low consistency among annotators [Funk and Reid]
• Automatic MeSH indexing
  • Suggest MeSH terms for a given new article

Page 4: An Incremental Approach to MEDLINE  MeSH  Indexing

Motivation of Mayo’s Participation

• Information retrieval (IR)-based ontology annotation
  • Traditional approach has been information extraction-based
• Three levels of intelligence in artificial intelligence
  • Knowledge-base intelligence
  • Data intelligence
  • User intelligence

> Explore the use of topic modeling and distant supervision for ontology annotation

Page 5: An Incremental Approach to MEDLINE  MeSH  Indexing

Proposed Approaches

• MetaMap-based

• Search-based

• LLDA-based

The three approaches can work either independently or together in an incremental way.

[Figure: each approach produces DUIs]

Page 6: An Incremental Approach to MEDLINE  MeSH  Indexing

MetaMap-based System

Title: Age-period-cohort effect on mortality from cervical cancer. Abstract: to estimate the effect of age, period and birth cohort …

MetaMap, restricted to the MeSH ontology, returns CUI candidates with scores:

CUI        Score
C0007847   1000
C0302592   1000
C0998265   861
…          …

A ranked list of CUIs => a ranked list of DUIs

Parameters: Title_score, Score threshold, Top DUI
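The CUI-to-DUI step above can be sketched roughly as follows. This is a minimal illustration, not the presenters' implementation: the function name `rank_duis`, the way the title weight is applied (a multiplier on the MetaMap score), the use of the maximum score per DUI, and the `cui_to_dui` mapping with placeholder identifiers are all assumptions.

```python
from collections import defaultdict


def rank_duis(candidates, cui_to_dui, title_weight=2.0,
              score_threshold=900, top_k=10):
    """Turn scored CUI candidates into a ranked DUI list.

    candidates: iterable of (cui, metamap_score, found_in_title) tuples.
    cui_to_dui: dict mapping a CUI to a MeSH descriptor ID (assumed lookup).
    """
    dui_scores = defaultdict(float)
    for cui, score, in_title in candidates:
        if score < score_threshold:
            continue  # drop low-confidence candidates
        dui = cui_to_dui.get(cui)
        if dui is None:
            continue  # CUI has no MeSH descriptor in the lookup
        weighted = float(score) * (title_weight if in_title else 1.0)
        # Assumption: keep the best score seen for each DUI.
        dui_scores[dui] = max(dui_scores[dui], weighted)
    ranked = sorted(dui_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# CUIs and scores come from the slide; the DUI names are placeholders.
example_candidates = [
    ("C0007847", 1000, True),
    ("C0302592", 1000, False),
    ("C0998265", 861, False),
]
example_mapping = {"C0007847": "DUI_A", "C0302592": "DUI_B", "C0998265": "DUI_C"}
example_ranking = rank_duis(example_candidates, example_mapping)
```

With the placeholder settings, the candidate scored 861 falls below the threshold and the title concept is ranked first.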

Page 7: An Incremental Approach to MEDLINE  MeSH  Indexing

MetaMap-based System

• Parameter Tuning
  • Title weight: title concepts are more important
  • Score threshold: a low threshold roughly leads to high precision/recall
  • Top DUI: tradeoff between P/R

[Figure: performance plots for title weight, score threshold, and top DUI]

Page 8: An Incremental Approach to MEDLINE  MeSH  Indexing

Search-based System

• Retrieval Model
  • Terms in the retrieval formula: query term, query weight, matching function, document, Dirichlet parameter

• DUI Aggregation

  Docs (DUIs of each retrieved document):
    D01, D02, D03 …
    D08, D03, D01 …
    D02, D03, D01 …

  DUI ranked by tf * score(Q, D)
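One possible reading of the aggregation step is sketched below. The slide only states that DUIs are ranked by tf * score(Q, D); here that is interpreted as summing the retrieval scores of the top-ranked documents that carry a DUI (which equals the DUI's frequency times its mean document score). The function name `aggregate_duis`, the positive example scores, and the tie-breaking rule are assumptions.

```python
from collections import defaultdict


def aggregate_duis(retrieved, top_k=None):
    """Rank DUIs found in the top-ranked documents.

    retrieved: list of (retrieval_score, [dui, ...]) pairs, one per document.
    Assumes positive scores; log-probability scores would need rescaling first.
    """
    weight = defaultdict(float)
    for score, duis in retrieved:
        for dui in set(duis):  # count each DUI once per document
            weight[dui] += score
    ranked = sorted(weight, key=lambda d: (-weight[d], d))
    return ranked if top_k is None else ranked[:top_k]


# Document DUI lists follow the slide; the scores are made up for illustration.
example_docs = [
    (4.0, ["D01", "D02", "D03"]),
    (2.0, ["D08", "D03", "D01"]),
    (1.0, ["D02", "D03", "D01"]),
]
example_dui_ranking = aggregate_duis(example_docs)
```

DUIs that appear in many highly scored documents rise to the top, which matches the intuition that neighbors of a new article share its indexing terms.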

Page 9: An Incremental Approach to MEDLINE  MeSH  Indexing

Search-based System

• Term Query (TQ)
  • is a single-word expression
  • concept-related words in title and abstract
• Phrase Query (PQ)
  • is a multi-word expression
  • concept-related phrases in title and abstract
• Long Query
  • mix of TQ and PQ

Example queries:

#weight(2.0 examination 2.0 cow 2.0 ultrasonographic 3.0 navel 3.0 urachal 3.0 extra-abdominal 2.0 pathologic 2.0 abscess)

#weight(3.5 #uw2(hiv-1 infection) 4.5 #uw2(differential susceptibility) 2.0 #uw2(actin dynamics) 2.0 actin 4.5 #uw2(cortical actin) 4.5 #uw3(naive t cells) 2.5 dichotomy 3.5 #uw2(human memory) 3.5 #uw3(chemotactic actin activity) 2.0 cd45ro)
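Queries of this shape can be assembled mechanically from weighted expressions. The sketch below assumes, based only on the two examples, that single words are emitted as plain terms and multi-word phrases are wrapped in an unordered-window operator sized to the phrase length. The function name `build_weight_query` is hypothetical, and how the weights themselves are chosen is not covered here.

```python
def build_weight_query(weighted_expressions):
    """Build a #weight(...) query string.

    weighted_expressions: list of (weight, expression) pairs, where an
    expression is a single word (term query) or a phrase (phrase query).
    """
    parts = []
    for weight, expression in weighted_expressions:
        words = expression.split()
        if len(words) > 1:
            # Phrase: unordered window as wide as the phrase, as in the examples.
            text = "#uw{}({})".format(len(words), " ".join(words))
        else:
            text = expression
        parts.append("{:.1f} {}".format(weight, text))
    return "#weight(" + " ".join(parts) + ")"


example_query = build_weight_query([
    (2.0, "actin"),
    (4.5, "cortical actin"),
    (4.5, "naive t cells"),
])
```

Mixing single words and phrases in one call yields a long query in the sense used on the slide.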

Page 10: An Incremental Approach to MEDLINE  MeSH  Indexing

Search-based System

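• Parameter Tuning
  • Dirichlet smoothing parameter: less smoothing => better performance
  • Top-ranked documents: a small set of highly relevant documents
  • Top-ranked DUI: tradeoff between P/R

[Figure: performance plots for the Dirichlet smoothing parameter, top-ranked documents, and top-ranked DUI]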
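To make the role of the Dirichlet parameter concrete, here is a minimal sketch of standard query-likelihood scoring with Dirichlet smoothing and per-term query weights. The exact formula on the slide is not preserved in this transcript, so this is the textbook form rather than the presenters' scoring function; `dirichlet_score`, `collection_prob`, and the example values are assumptions.

```python
import math
from collections import Counter


def dirichlet_score(query, doc_tokens, collection_prob, mu=2500.0):
    """Weighted query log-likelihood of a document under Dirichlet smoothing.

    query: list of (weight, term) pairs.
    collection_prob: dict giving each query term's probability in the whole
    collection (must be > 0 for every query term).
    mu: Dirichlet parameter; smaller values mean less smoothing.
    """
    tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for weight, term in query:
        smoothed = (tf[term] + mu * collection_prob[term]) / (doc_len + mu)
        score += weight * math.log(smoothed)
    return score


example_query_terms = [(1.0, "actin")]
example_background = {"actin": 0.01}
with_term = dirichlet_score(example_query_terms, ["actin", "cell"], example_background, mu=10.0)
without_term = dirichlet_score(example_query_terms, ["cell", "cell"], example_background, mu=10.0)
```

As `mu` shrinks, the document's own term counts dominate the estimate, so documents that actually contain the query terms separate more sharply from those that do not.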

Page 11: An Incremental Approach to MEDLINE  MeSH  Indexing

Systems

• LLDA-based
  • LDA Process
    • Each document is a mixture of topics
    • Each topic is a multinomial word distribution
  • Labeled LDA
    • Incorporate label information

Page 12: An Incremental Approach to MEDLINE  MeSH  Indexing

Systems

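• LLDA-based
  • Top categories in MeSH
    • Top-level categories as topics (e.g., Anatomy Category, Chemicals and Drugs Category, etc.)

[Figure: MeSH hierarchy, from the root down to the top-level categories and the labels beneath them]

Each label below is converted to the corresponding top-level labels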
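The conversion to top-level labels can be sketched with MeSH tree numbers, whose first letter identifies the top-level category. The sketch below is an assumption about how such a conversion might be done, not the presenters' code: `top_level_labels` is a hypothetical helper, only four categories are listed, and the tree numbers in the example are illustrative strings rather than verified entries.

```python
# Partial table: first letter of a MeSH tree number -> top-level category.
TOP_LEVEL_CATEGORIES = {
    "A": "Anatomy Category",
    "B": "Organisms Category",
    "C": "Diseases Category",
    "D": "Chemicals and Drugs Category",
}


def top_level_labels(tree_numbers):
    """Map the tree numbers of an article's labels to top-level categories."""
    categories = set()
    for tree_number in tree_numbers:
        if tree_number and tree_number[0] in TOP_LEVEL_CATEGORIES:
            categories.add(TOP_LEVEL_CATEGORIES[tree_number[0]])
    return sorted(categories)


# Illustrative tree numbers: two under Diseases, one under Anatomy.
example_categories = top_level_labels(["A01.236", "C04.588", "C13.351"])
```

Collapsing labels this way leaves a small, fixed topic set, which keeps the Labeled LDA model far smaller than one topic per descriptor.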

Page 13: An Incremental Approach to MEDLINE  MeSH  Indexing

Systems

• LLDA-based
  • DUI candidate list pruning

[Figure: for each doc, the search-based system proposes DUIs and the LLDA-based system proposes categories; together they yield a pruned ranked list of DUIs]
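A minimal sketch of the pruning idea follows, assuming that a candidate DUI is kept only when at least one of its top-level categories is among those predicted by the LLDA-based system. The slide does not spell out the rule, so the filter, the function name `prune_candidates`, and the example data are assumptions.

```python
def prune_candidates(ranked_duis, dui_categories, predicted_categories):
    """Filter a ranked DUI list by predicted top-level categories.

    ranked_duis: DUIs in rank order (e.g., from the search-based system).
    dui_categories: dict mapping each DUI to a set of top-level categories.
    predicted_categories: categories predicted for the document.
    """
    allowed = set(predicted_categories)
    # Order is preserved, so the result is still a ranked list.
    return [dui for dui in ranked_duis
            if dui_categories.get(dui, set()) & allowed]


# Placeholder identifiers and categories, for illustration only.
example_pruned = prune_candidates(
    ["D01", "D03", "D02", "D08"],
    {"D01": {"Diseases"}, "D03": {"Anatomy"}, "D02": {"Organisms"}, "D08": {"Diseases", "Anatomy"}},
    ["Diseases"],
)
```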

Page 14: An Incremental Approach to MEDLINE  MeSH  Indexing

Data

Training -- <PMID, title, abstract, labels>

Testing -- input: <PMID, title, abstract>
           output: <PMID, labels>

Page 15: An Incremental Approach to MEDLINE  MeSH  Indexing

Evaluation

MM: MetaMap-based system
Mi: micro
LCA: lowest common ancestor
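For reference, micro-averaged measures pool the counts over all test articles before computing the ratios. The sketch below shows the usual definition of micro precision, recall, and F-measure over <PMID, labels> pairs; `micro_prf` and the example data are assumptions, and the LCA-based hierarchical measures are not covered.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall, and F1 over all documents.

    gold, predicted: dicts mapping a PMID to a collection of labels.
    """
    tp = fp = fn = 0
    for pmid, gold_labels in gold.items():
        gold_set = set(gold_labels)
        pred_set = set(predicted.get(pmid, ()))
        tp += len(gold_set & pred_set)
        fp += len(pred_set - gold_set)
        fn += len(gold_set - pred_set)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Placeholder PMIDs and labels.
example_gold = {"1": {"D01", "D02"}, "2": {"D03"}}
example_pred = {"1": {"D01"}, "2": {"D03", "D04"}}
example_p, example_r, example_f = micro_prf(example_gold, example_pred)
```

In this example there are two true positives, one false positive, and one false negative, so all three measures equal 2/3.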

Page 16: An Incremental Approach to MEDLINE  MeSH  Indexing

Conclusion and Future Work

• Three Systems
  • MetaMap-based, search-based, LLDA-based
• Research findings
  • Explored the impact of various parameters on performance
  • Promising results from search-based labeling
• Future Direction
  • Better concept weighting strategies
    • E.g., corpus-level statistics, external resources
  • Comprehensive comparisons with existing methods
  • A better strategy for incorporating hierarchical information into LLDA

Page 17: An Incremental Approach to MEDLINE  MeSH  Indexing

Questions & Discussion