Conference Tracker (project presentation) Andy Carlson Vitor Carvalho Kevin Killourhy Mohit Kumar

Dec 13, 2015

Elvin Campbell

Conference Tracker

(project presentation)

Andy Carlson, Vitor Carvalho, Kevin Killourhy, Mohit Kumar

Overview

• Goal: to find and gather salient details about conferences and workshops:
  – Submission deadline
  – Location
  – Home page
  – … and others

• Preliminary results:
  – Succeeded in autonomously finding conferences, submission deadlines, locations, and homepages
  – … although not without error
  – … approaches ranged from bootstrapping to focused crawling

Conference Tracker Modules

The system consists of four modules (acronym finder, name/page finder, location finder, and submission date finder); each group member worked primarily on the design and implementation of one of them.
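The slides do not show how the four modules are wired together, but the later slides suggest that the name/page finder consumes the acronym finder's output and that the location and date finders use the conference name and year. A minimal sketch of that flow, assuming a hypothetical `ConferenceRecord` type and treating each module as a plain function that fills in more fields:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class ConferenceRecord:
    """Hypothetical record passed between modules; fields start empty."""
    acronym: str                    # e.g. "SAC 2004", from the acronym finder
    name: Optional[str] = None      # filled in by the name/page finder
    homepage: Optional[str] = None  # filled in by the name/page finder
    location: Optional[str] = None  # filled in by the location finder
    deadline: Optional[str] = None  # filled in by the submission date finder


# Each module takes a record and returns it with more fields filled in.
Stage = Callable[[ConferenceRecord], ConferenceRecord]


def run_pipeline(acronyms: Iterable[str], stages: List[Stage]) -> List[ConferenceRecord]:
    """Push every discovered acronym through the downstream modules, in order."""
    records = []
    for acronym in acronyms:
        record = ConferenceRecord(acronym=acronym)
        for stage in stages:
            record = stage(record)
        records.append(record)
    return records
```

A call such as `run_pipeline(["SAC 2004"], [find_name_and_page, find_location, find_deadline])` would chain the modules, where the three stage functions are hypothetical stand-ins for the components described on the following slides.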

Bootstrapped Conference Acronym Discovery

• Goal: find conference acronyms

• Examples: ICML2006, IJCAI01, SIGMOD’98

• Discovers patterns of the form “token token _____ token token” that frequently have acronyms in the blank

• Redundant features: web page text, morphology
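The exact morphology rule is not given on the slides. One plausible sketch, assuming an acronym is an all-caps name of two or more letters, an optional separator (hyphen, space, or apostrophe), and a two- or four-digit year, covers the examples above:

```python
import re

# Assumed morphology rule: 2+ capital letters, an optional separator
# (hyphen, space, straight or curly apostrophe), then a 2- or 4-digit year.
ACRONYM_RE = re.compile(r"\b[A-Z]{2,}[-' \u2019]?(?:\d{4}|\d{2})\b")


def find_acronyms(text: str) -> list[str]:
    """Return every substring of `text` that looks like a conference acronym."""
    return [m.group(0) for m in ACRONYM_RE.finditer(text)]


print(find_acronyms("Papers at ICML2006, IJCAI01 and SIGMOD\u201998"))
# ['ICML2006', 'IJCAI01', 'SIGMOD’98']
```

This rule would also accept all-caps words followed by numbers (e.g. "PAGE 12") and rejects single-digit forms such as EMBODY2, so it only approximates the filter described above.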

Seed Conferences

• We start by searching for:
  – “academic conferences including”
  – “academic conferences such as”
  – “and other academic conferences”
  – “or other academic conferences”

• This yields the seeds SC2001 and WWW2003

Finding patterns

• Searching for “SC2001” and “WWW2003” yields these ten most frequent patterns:
  – QUESTIONS ABOUT ___ MAY BE
  – PAPERS AT ___ IN DENVER
  – GATHER AT ___ TO DEFINE,
  – TRIP TO ___ PC MEETING
  – PREVIOUS MESSAGE: ___ BEOWULF PARTY
  – FWD FW ___ CALL FOR
  – TO OFFER ___ CLUSTER TUTORIAL
  – FWD AGENTS ___ WORKSHOP ON
  – EXHIBIT AT ___ TO FEATURE
  – 1 0 ___ 1 1
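Collecting “token token ___ token token” contexts around known acronyms can be sketched as follows, assuming `snippets` already holds the text of the search results (how pages were fetched is not shown on the slides). Upper-casing the tokens mirrors how the patterns are displayed above; `extract_patterns` is a hypothetical name:

```python
import re
from collections import Counter
from typing import Iterable

BLANK = "___"


def extract_patterns(snippets: Iterable[str], acronyms: Iterable[str], width: int = 2) -> Counter:
    """Count 'w1 w2 ___ w3 w4' contexts around each occurrence of a known acronym."""
    acronym_list = list(acronyms)
    counts: Counter = Counter()
    for text in snippets:
        for acronym in acronym_list:
            # Replace the acronym (which may contain a space, e.g. "CHI 2006") by one blank token.
            marked = re.sub(re.escape(acronym), f" {BLANK} ", text)
            tokens = marked.upper().split()
            for i, token in enumerate(tokens):
                if token != BLANK:
                    continue
                left = tokens[max(0, i - width):i]
                right = tokens[i + 1:i + 1 + width]
                # Keep only contexts with a full window on both sides.
                if len(left) == width and len(right) == width:
                    counts[" ".join(left + [BLANK] + right)] += 1
    return counts


snippets = ["Questions about SC2001 may be sent to the chairs",
            "Papers at SC2001 in Denver"]
print(extract_patterns(snippets, ["SC2001", "WWW2003"]).most_common(10))
```

`most_common(10)` then gives a ranked list like the one on this slide.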

Finding more acronyms

• Searching for these new patterns yields more acronyms:
  – HFES2003
  – ICKM2005
  – SC2000
  – SCCG2000
  – SPLIT 2001
  – SVC05
  – WWW2002
  – WWW2004
  – WWW2005

Repeat…

• Repeating the process for 5 cycles yields 95 conference acronyms:

• AAAI-05, AAAI'05, AAAI-2000, AAAI-98, AAMAS 2002, AAMAS 2005, ACL 2005, ACSAC 2002, ADMKD'2005, AGENTS 1999, AIAS 2001, AMPT95, AMST 2002, APOCALYPSE 2000, AVI2004, AWESOS 2004, BABEL01, CASCON 1999, CASCON 2000, CHI2006, CHI 2006, CHI97, CHI99, CITSA 2004, COMPCON 93, CSCW2000, EACL06, ECOOP 2002, ECOOP 2003, ECSCW 2001, EDMEDIA 2001, EDMEDIA 2002, EDMEDIA 2004, EMBODY2, ES2002, ESANN 2002, ESANN 2004, GECCO 2000, GWIC'94, HFES2003, HT05, HT'05, IAT99, ICKM2005, ICSM 2003, IFCS 2004, IJCAI-03, IJCAI05, IJCAI 2001, IJCAI 2005, IJCAI91, IJCAI95, ISCSB 2001, LICS 2001, MEMOCODE 2004, METRICS02, MIDDLEWARE 2003, NORDICHI 2002, NUFACT05, NWPER'04, NWPER'2000, OOPSLA'98, PARCO2003, PARLE'93, PKI04, PODC 2005, POPL'03, PROGRESS 2003, PRORISC 2002, PRORISC 2003, PRORISC 2004, PRORISC 2005, PROSODY 2002, RIAO 94, ROMANSY 2002, SAC 2004, SAC2005, SC2000, SC2001, SCCG2000, SIGDOC'93, SIGGRAPH'83, SIGIR 2001, SPIN97, SPLIT 2001, SPS 2004, SVC05, UML'2000, WOTUG 16, WOTUG 19, WWW2002, WWW2003, WWW2004, WWW2005, WWW2006
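The bootstrapping cycle (acronyms, then patterns, then new acronyms, repeated) can be sketched as below. `search` is a hypothetical function standing in for the web search step: given a query string, it returns page texts. Patterns are stored as (left, right) context pairs and queried with Google's `*` wildcard; the morphology filter repeats the assumed rule from earlier so the block is self-contained, and is not the project's exact rule:

```python
import re
from typing import Callable, Iterable, List, Set, Tuple

# Assumed morphology filter: 2+ capitals, optional separator, 2- or 4-digit year.
ACRONYM_RE = re.compile(r"[A-Z]{2,}[-' \u2019]?(?:\d{4}|\d{2})")

Pattern = Tuple[str, str]            # (words before the blank, words after it)
Search = Callable[[str], List[str]]  # hypothetical: query -> page texts


def patterns_for(acronym: str, pages: Iterable[str], width: int = 2) -> Set[Pattern]:
    """Collect the `width` words on each side of every occurrence of a known acronym."""
    found = set()
    for page in pages:
        for m in re.finditer(re.escape(acronym), page):
            left = page[:m.start()].split()[-width:]
            right = page[m.end():].split()[:width]
            if len(left) == width and len(right) == width:
                found.add((" ".join(left), " ".join(right)))
    return found


def acronyms_for(pattern: Pattern, pages: Iterable[str]) -> Set[str]:
    """Fill the blank of a pattern, keeping only fillers that pass the morphology filter."""
    left, right = pattern
    regex = re.compile(re.escape(left) + r"\s+(.+?)\s+" + re.escape(right))
    return {m.group(1) for page in pages for m in regex.finditer(page)
            if ACRONYM_RE.fullmatch(m.group(1))}


def bootstrap(seeds: Iterable[str], search: Search, cycles: int = 5) -> Set[str]:
    acronyms = set(seeds)
    for _ in range(cycles):
        patterns: Set[Pattern] = set()
        for acronym in acronyms:
            patterns |= patterns_for(acronym, search(f'"{acronym}"'))
        for left, right in patterns:
            # Phrase query with the * wildcard standing in for the blank.
            acronyms |= acronyms_for((left, right), search(f'"{left} * {right}"'))
    return acronyms
```

With a real search backend, `bootstrap({"SC2001", "WWW2003"}, search)` would mirror the five-cycle run described above, under these assumptions.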

Best patterns

• Most productive patterns:
  – “cfp ___ workshop”
  – “proceedings of ___ pages”
  – “for the ___ workshop”
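The slides do not say how productivity was measured. One simple assumption is to count the distinct acronyms each pattern yields and keep the top few; `rank_patterns` below is a hypothetical helper implementing that:

```python
from typing import Dict, Iterable, List, Tuple


def rank_patterns(yields: Dict[str, Iterable[str]], top_k: int = 3) -> List[Tuple[str, int]]:
    """Rank patterns by the number of distinct acronyms each produced (assumed score)."""
    scored = [(pattern, len(set(acronyms))) for pattern, acronyms in yields.items()]
    # Most distinct acronyms first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

Here `yields` maps each pattern string to the acronyms it extracted. Counting distinct acronyms rather than raw hits keeps a pattern that repeats one acronym many times from dominating.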

Bootstrapped Acronym Discovery-- Conclusions

• Using morphology to keep only conference acronyms gave 100% precision (every acronym discovered was a conference or workshop) but low recall

• Bootstrapping from a generic set of queries took us from 2 seed acronyms to 95 acronyms

• To boost recall, we need some method of focusing on the best patterns

Name/Page Finder (Algorithm)

• Given an acronym/year pair (e.g., SAC’04), finds the corresponding conference and its homepage (Selected Areas in Cryptography / http://vlsi.uwaterloo.ca/~sac04):
  – Search Google for “SAC 04” and “SAC 2004” (10 results each)
  – Extract potential conference names (using capitalization heuristics)
  – Score each web page and potential conference name
  – Select the highest-scoring page/name pair

• Score each name and page based on:
  – Heuristics (e.g., acronym embedded in name, title contains acronym)
  – Inclusion of words distinctive to conference names and pages

• Distinctive words are determined using TF-IDF* scoring, and word counts are updated after each acronym.
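A rough sketch of the candidate extraction and scoring steps follows. The capitalization rule, the list of allowed lowercase connector words, the heuristic weights, and the smoothed TF-IDF formula are all assumptions for illustration; the slides name the heuristics but not their exact form:

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Assumed: lowercase words allowed inside a capitalized conference name.
CONNECTORS = {"on", "of", "in", "and", "for", "the", "to"}


def _flush(run: List[str], names: List[str], min_words: int) -> None:
    while run and run[-1] in CONNECTORS:  # a name should not end in "of", "on", ...
        run.pop()
    if len(run) >= min_words:
        names.append(" ".join(run))


def candidate_names(text: str, min_words: int = 3) -> List[str]:
    """Capitalization heuristic: maximal runs of Capitalized words plus connectors."""
    names: List[str] = []
    run: List[str] = []
    for word in re.findall(r"\w[\w'-]*", text):
        if word[0].isupper() or (run and word in CONNECTORS):
            run.append(word)
        else:
            _flush(run, names, min_words)
            run = []
    _flush(run, names, min_words)
    return names


def initials(name: str) -> str:
    return "".join(w[0] for w in name.split() if w[0].isupper())


def distinctive_weights(name_tf: Counter, doc_freq: Counter, n_docs: int) -> Dict[str, float]:
    """TF-IDF-style weight for words seen in accepted conference names (smoothed IDF)."""
    return {w: tf * math.log(n_docs / (1 + doc_freq[w])) for w, tf in name_tf.items()}


def score_name(name: str, acronym: str, page_title: str, weights: Dict[str, float]) -> float:
    score = 0.0
    if initials(name) == acronym.upper():      # acronym embedded in name
        score += 2.0
    if acronym.lower() in page_title.lower():  # page title contains acronym
        score += 1.0
    score += sum(weights.get(w.lower(), 0.0) for w in name.split())  # distinctive words
    return score
```

After each acronym is resolved, the words of the chosen name could be added with `name_tf.update(name.lower().split())`, so the weights reflect every name accepted so far.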

Name/Page Finder (Results)

1. Evaluation within Conference Tracker

– Given the output of the Acronym Finder, find the name/homepage for all the acronym/year pairs.

– When both the homepage and the name are completely right, the result is labeled all-correct. If the name is correct but the homepage is wrong, it is labeled name-correct.

all-correct 17/73 23%

name-correct 19/73 26%

total 36/73 49%

2. Evaluation as a stand-alone component

– Given a set of 27 manually collected acronyms for conferences with homepages in 2006, repeat the above procedure.

all-correct 19/27 70%

name-correct 4/27 15%

total 23/27 85%
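The two labels and the summary percentages can be reproduced with a small tally. Exact string comparison against a gold name and URL is an assumption here (the labels may well have been assigned by hand); the percentages above match rounding to the nearest whole percent:

```python
from typing import Dict, List, Optional, Tuple


def label(pred_name: str, pred_url: str, gold_name: str, gold_url: str) -> Optional[str]:
    """Assign the labels used above; None means the name was wrong."""
    if pred_name == gold_name and pred_url == gold_url:
        return "all-correct"
    if pred_name == gold_name:
        return "name-correct"
    return None


def summarize(labels: List[Optional[str]]) -> Dict[str, Tuple[int, int, int]]:
    """Map each label (plus 'total') to (count, n, rounded percentage)."""
    n = len(labels)
    all_correct = labels.count("all-correct")
    name_correct = labels.count("name-correct")
    rows = {"all-correct": all_correct, "name-correct": name_correct,
            "total": all_correct + name_correct}
    return {key: (k, n, round(100 * k / n)) for key, k in rows.items()}


# Counts from the in-tracker evaluation: 17 all-correct, 19 name-correct, 37 other.
print(summarize(["all-correct"] * 17 + ["name-correct"] * 19 + [None] * 37))
# {'all-correct': (17, 73, 23), 'name-correct': (19, 73, 26), 'total': (36, 73, 49)}
```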

Location Finder – Approach (Focused Crawling)

• Motivation: Sergey Brin’s approach for finding author/book-title pairs

• Observation: searching for <Conference Name> <Location> returns the conference main page or similar pages

• Pattern observation: these pages state the full name of the conference in close proximity to the conference location

• Generalized pattern: proximity, currently defined by a window of 200 characters

• Algorithm:
  – Query Google with the conference long name and year
  – Use the top URLs to look for “locations” in ‘proximity’ of the conference long name (currently using only the top result)
  – Use heuristics to assess whether the page contains the conference location or is a list of such conference-location pairs
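The proximity idea can be sketched as follows, assuming the page text has already been extracted and that `locations` is a plain list of place names (in the spirit of the dictionary look-up annotator mentioned later). `locations_near_name`, `looks_like_list`, and the list threshold of 5 are hypothetical:

```python
import re
from typing import Iterable, List, Tuple


def _word(phrase: str):
    """Whole-word regex for a lowercased phrase (helper for this sketch)."""
    return re.compile(r"\b" + re.escape(phrase.lower()) + r"\b")


def locations_near_name(page_text: str, conf_name: str, locations: Iterable[str],
                        window: int = 200) -> List[Tuple[str, int]]:
    """(location, gap in characters) for dictionary locations near the conference name."""
    text = page_text.lower()
    location_list = list(locations)
    hits = []
    for mention in re.finditer(re.escape(conf_name.lower()), text):
        lo = max(0, mention.start() - window)
        hi = min(len(text), mention.end() + window)
        for loc in location_list:
            for m in _word(loc).finditer(text[lo:hi]):
                start, end = lo + m.start(), lo + m.end()
                if end <= mention.start():
                    gap = mention.start() - end
                elif start >= mention.end():
                    gap = start - mention.end()
                else:
                    gap = 0  # location overlaps the name itself
                hits.append((loc, gap))
    return sorted(hits, key=lambda hit: hit[1])  # nearest first


def looks_like_list(page_text: str, locations: Iterable[str], threshold: int = 5) -> bool:
    """Heuristic: many distinct locations on one page suggests a list of conferences."""
    text = page_text.lower()
    distinct = {loc for loc in locations if _word(loc).search(text)}
    return len(distinct) >= threshold
```

The first hit is the nearest candidate; a page for which `looks_like_list` returns True would need separate handling as a conference-location list.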

Location Finder – Pros & Cons

• Pros
  – Fairly general approach, thanks to the proximity operator
  – Scalable approach

• Cons
  – Depends on the Google query results
    • Query ‘crafting’ is important
  – Dependent on finding the ‘home page’ or a similar page for the conference
  – Needs location annotators

Location Finder – Test Results

• 13 conferences & workshops (IEEE & ACM), using the full name to query Google and the top link for extraction:
  – Correct: 7
  – Partially correct: 1
  – No result: 5

• Reasons:
  – Annotator coverage: 1 (partially correct)
  – Name in image: 4
  – Text extraction from web page: 1

Location Finder – Improvements

• Use co-training:
  – Redundancy on the web is not being exploited
  – The model is not probabilistic (currently using just the top link for extraction)

• Location annotator
  – Currently a simple dictionary look-up (could use Minorthird/BBN)

• Intelligent adaptable window
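Exploiting redundancy could start with something much simpler than co-training: pool the location candidates found on several result pages and let them vote, weighting higher-ranked pages more. The sketch below is that simpler voting scheme, not co-training, and both the 1/(rank + 1) weighting and `vote_location` are assumptions:

```python
from collections import defaultdict
from typing import Dict, List, Optional


def vote_location(candidates_per_page: List[List[str]]) -> Optional[str]:
    """Pick a location from candidates found on ranked result pages (index 0 = top link)."""
    scores: Dict[str, float] = defaultdict(float)
    for rank, candidates in enumerate(candidates_per_page):
        weight = 1.0 / (rank + 1)    # assumed: trust higher-ranked pages more
        for loc in set(candidates):  # each page votes at most once per location
            scores[loc] += weight
    if not scores:
        return None
    # Highest score wins; ties go to the alphabetically first location.
    return max(sorted(scores), key=scores.__getitem__)
```

With a single page every candidate gets the same score, so a tie-breaker such as proximity would still be needed; with more pages, agreement among lower-ranked results can override a wrong top page.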

Submission Date

• Task: find the paper submission deadline

• Query Google with “call for papers conferenceName conferenceAcronym year submission deadline” and similar queries

• Two types of pages to process: pages with lists of CFPs and ordinary conference pages

• Most of the time, there is no sentence structure

• Idea: use the proximity of keywords (submission, deadline, conference name, year, etc.)

Lists of CFPs

Conference Dates Page

Submission Date

• Hand-tuned entity recognizer for dates
• Several heuristics and regular expressions
• No learning
• Rank by the “closest” date to keywords

• Some keywords: submission, deadline, conference acronym, year

• Precision:
  – All conferences: top 1 = 2%, top 3 = 5.8%, event dates = 13.4%
  – More recent conferences (SIGIR, ICML, KDD, 2003-2006): top 1 = 50%, top 3 = 75%
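A sketch of the “closest date to keywords” ranking, assuming character distance as the notion of closeness and only three common date formats; the regular expressions and function names are illustrative, not the project's hand-tuned recognizer:

```python
import re
from typing import Iterable, List, Tuple

MONTH = r"\b(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?"
DATE_RES = [
    re.compile(MONTH + r"\s+\d{1,2}(?:st|nd|rd|th)?,?\s+\d{4}", re.IGNORECASE),   # March 15, 2006
    re.compile(r"\d{1,2}(?:st|nd|rd|th)?\s+" + MONTH + r",?\s+\d{4}", re.IGNORECASE),  # 15 March 2006
    re.compile(r"\d{4}-\d{2}-\d{2}"),                                             # 2006-03-15
]


def find_dates(text: str) -> List[Tuple[int, str]]:
    """Return (start offset, date string) for non-overlapping date mentions, in text order."""
    spans: List[Tuple[int, int, str]] = []
    for regex in DATE_RES:
        for m in regex.finditer(text):
            if all(m.end() <= s or m.start() >= e for s, e, _ in spans):
                spans.append((m.start(), m.end(), m.group(0)))
    return [(s, d) for s, _, d in sorted(spans)]


def rank_dates(text: str, keywords: Iterable[str]) -> List[str]:
    """Order dates by character distance to the nearest keyword occurrence."""
    lowered = text.lower()
    positions = [m.start() for kw in keywords
                 for m in re.finditer(re.escape(kw.lower()), lowered)]
    dates = find_dates(text)
    if not positions:
        return [d for _, d in dates]
    ranked = sorted(dates, key=lambda sd: min(abs(sd[0] - p) for p in positions))
    return [d for _, d in ranked]
```

Distance here is measured from the start of each date to the start of each keyword, which is crude; the problems listed on the following slide (workshop dates, co-located events, previous years) are exactly the cases a nearest-keyword rule can get wrong.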

Submission Date

• Problems:
  – Main conference vs. workshop/tutorial dates
  – Co-located conferences
  – Same conference, but previous year
  – Actual conference event dates
  – Change of deadlines
  – Hard to evaluate: we simply could not find the deadline for some old conferences

Overall Results

• Acronym finder
  – 100% precision

• Name/page finder
  – 49% names correct
  – 23% names & URLs
  – (85% on vetted data)

• Location finder
  – 21% locations correct
  – 38% lists, 30% none
  – 11% wrong

• Date finder
  – 2% completely right
  – 5.8% in top 3
  – 13.4% event dates

Lessons Learned

• If we really are learning, then we should reconsider earlier decisions in light of new knowledge:
  – Pass 1: AAAI = Holger Hoos and Thomas Stuetzle, IJCAI Workshop
  – Pass 2: AAAI = National Conference on Artificial Intelligence

• Supplement creative learning algorithms with simple, focused crawling

• Don’t underestimate the time it takes to build foundational tools before “learning”

Useful Resources

• Perl
  – Rapid prototyping
  – Packages/extensions
  – Quick/dirty text manipulation

• Shell scripts and Unix tools
  – grep, sed, bash, lynx ...

• Google
  – Wildcards (*) and date ranges (2003..2006)
  – Cached web pages

What’s Next?

• Failure notifications from later components could propagate backward.

• All components could be smarter about how far down Google’s results to go (i.e., continue only as long as the results provide valuable information)

• Given good name/acronym/location/date sets, we could look for lists.
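One way to decide how far down the results to go, offered only as a sketch: keep reading ranked results while they contribute something new, and stop after a fixed number of consecutive unproductive ones. `harvest_until_stale`, `extract`, and the `patience` parameter are hypothetical:

```python
from typing import Callable, Iterable, Set, TypeVar

T = TypeVar("T")


def harvest_until_stale(results: Iterable[str], extract: Callable[[str], Set[T]],
                        patience: int = 3) -> Set[T]:
    """Walk down ranked results; stop once `patience` consecutive results add nothing new."""
    found: Set[T] = set()
    stale = 0
    for result in results:
        new = extract(result) - found
        if new:
            found |= new
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # results have stopped providing new information
    return found
```

Because `results` can be a lazy iterator, results past the stopping point would never be fetched.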