HUMAN EXPERTISE AND ARTIFICIAL INTELLIGENCE IN VERTICAL SEARCH Peter Jackson & Khalid Al-Kofahi Corporate Research & Development

Mar 26, 2015


Nicole Medina
Transcript
Page 1

HUMAN EXPERTISE AND ARTIFICIAL INTELLIGENCE IN VERTICAL SEARCH

Peter Jackson & Khalid Al-Kofahi

Corporate Research & Development

Page 2

HORIZONTAL VERSUS VERTICAL SEARCH

HORIZONTAL                    VERTICAL
Consumer focus                Professional focus
General interest              Specialist interests
Average user                  Expert user
Shallow information need      Deep information need

Page 3

THE PARADOX OF SEARCH

• The further you get from keyword indexing and retrieval, the harder it is to explain a search result
  – Professional searchers demand transparency

• Tool versus appliance

• You need an ‘explanatory model’ that people can relate to and understand, even if it is actually just a cartoon of the real process
  – Examples: Basic PageRank, Collaborative Filtering

• Such models don’t work so well in vertical domains
  – Links aren’t always endorsements
  – Sparsity of data in smaller communities
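
As a rough illustration of the kind of explanatory model named above, Basic PageRank can be sketched as a short power iteration. This is a minimal sketch and not the method of any product discussed in these slides; the function name `pagerank`, the damping value 0.85, and the toy citation graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank by power iteration over a dict {node: [outlinks]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical citation graph: B, C and D all cite A; D also cites B.
citations = {"B": ["A"], "C": ["A"], "D": ["A", "B"]}
ranks = pagerank(citations)
for case, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(case, round(value, 3))
```

Every link adds rank here whether or not it is an endorsement, so a case cited mainly to be criticized would still score highly. That is the weakness the slide points to in vertical domains.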

Page 4

RECENT TRENDS IN SEARCH

• Fragmentation of ‘horizontal’ search
  – Media, location, demographics (Weber & Castillo, 2010)

• More sophisticated models of user behavior
  – Post-click behaviors (Zhong, Wang, et al., 2010)

• ‘Practical semantics’ versus Semantic Web
  – Maps as search results for local, micro-results

• Incorporation of domain knowledge into search
  – Taxonomies, vocabularies, use cases, work flows

Page 5

THE EXAMPLE OF LEGAL SEARCH

• The completeness requirement
  – Recall as important as precision

• Less redundancy than on the Web

• The authority requirement
  – Court superiority, jurisdiction
  – Highly cited cases and statutes
    • Supersession by statute or regulation

• The multi-topical nature of documents
  – Case may cover many points of law but only be cited for one
  – Citations can be negative as well as positive per topic
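
The completeness requirement can be made concrete with the standard definitions of precision and recall. The sketch below is illustrative only; the function name `precision_recall` and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and hypothetical relevance judgments.
retrieved_docs = ["c1", "c2", "c3", "c4"]
relevant_docs = ["c1", "c2", "c5", "c6", "c7"]
precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(precision, recall)  # 0.5 0.4
```

In this toy case half of the results are relevant, yet three of the five relevant documents are missed. For a legal researcher, any one of those could be the controlling authority.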

>These factors also apply to scientific documents

Page 6

POWER LAW AND LEGAL TOPICS

Page 7

POWER LAW AND WESTLAW USERS

Page 8

EXPERT SEARCH

• In many verticals, there are at least two sources of expertise available for enhancing search
  – Editors and authors, who generate useful metadata
  – Users, who generate clickstreams and other data

• Editorial value addition improves recall especially
  – Helps find both fat neck and long tail documents on a topic

• Aggregate user behavior mostly improves precision
  – Power users find most relevant and important documents

• The model of expert search enables and explains the portfolio of results, rather than individual results

Page 9

SOURCES OF EVIDENCE: AUTHORS & EDITORS

[Figure: Burger King Corp. v. Rudzewicz, shown with its headnotes and KN assignments (Headnote, KN) and its text and citations, linked to a set of other cases (each drawn as a CASE box). Callout: "Issue: Long arm jurisdiction", 12 A (Key cases), 54 B (Highly Relevant). Other count labels as extracted: 172013 (A)28 (B); 205,3105 (A)19 (B); 354 (A)5 (B).]

Page 10

SOURCES OF EVIDENCE: AUTHORS & EDITORS

[Figure: Burger King Corp. v. Rudzewicz with its headnotes and their KN assignments (HN1 KN1, HN2 KN2, HN3 KN2, ..., HN35 KN14), linked to ALR, CJS and AMJUR and to several sets of CASES. One set is labeled "Another set of related cases".]

Page 11

SOURCES OF EVIDENCE: USERS (I)

[Figure: queries (Query 1, Query 2, Query 3, ..., Query N) and user sessions (SESSION 1 ... SESSION N) connected through ACTIONS (CLICK, PRINT, KEYCITE) to Burger King Corp. v. Rudzewicz and to sets of other CASES.]

Link query language to document language via click, print, and cite-checking behaviors

Identify documents that are co-clicked, co-printed, etc., with the Burger King case across user sessions
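
The co-click and co-print idea above can be sketched as a simple per-session co-occurrence count. This is an assumption-laden illustration, not the authors' implementation: the log layout, the function name `co_occurring_documents`, and the document identifiers are all hypothetical.

```python
from collections import Counter


def co_occurring_documents(sessions, target, actions=("click", "print", "keycite")):
    """Count documents acted on in the same session as `target`.

    sessions: dict mapping a session id to a list of (action, document id).
    Each document counts at most once per session.
    """
    counts = Counter()
    for events in sessions.values():
        docs = {doc for action, doc in events if action in actions}
        if target in docs:
            counts.update(docs - {target})
    return counts


# Hypothetical session log.
sessions = {
    "s1": [("click", "burger_king"), ("print", "case_a")],
    "s2": [("click", "burger_king"), ("click", "case_a"), ("keycite", "case_b")],
    "s3": [("click", "case_c")],
}
related = co_occurring_documents(sessions, "burger_king")
print(related.most_common())  # [('case_a', 2), ('case_b', 1)]
```

Counting each document once per session keeps one heavy user from dominating the totals. Whether the real system does this is not stated in the slides.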

Page 12

SOURCES OF EVIDENCE: USERS (II)

Burger King Corp. v. Rudzewicz: original breach of contract and trademark infringement case turned into a civil procedure case about jurisdiction on appeal

QUERY 1 ... QUERY N, IN THE LAST 3 MONTHS

QUERY                         COUNT
"personal jurisdiction"       176
"minimum contacts"            50
"forum selection clause"      39
"personal jurisdiction"       39
"forum non conveniens"        32
"choice of law"               29

USER ACTIONS: 10417    TOTAL SESSIONS: 9758

[Figure: user sessions (SESSION 1 ... SESSION N) connected through ACTIONS (CLICK, PRINT) to the case and to sets of other CASES.]
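
A query table like the one on this slide could be produced by counting the queries whose sessions ended in an action on a given document. The sketch below assumes a flat log of (query, action, document id) tuples; that layout, the function name `queries_for_document`, and the sample rows are hypothetical and are not the slide's data.

```python
from collections import Counter


def queries_for_document(log, target, actions=("click", "print", "keycite")):
    """Count normalized queries that led to an action on `target`."""
    counts = Counter()
    for query, action, doc in log:
        if doc == target and action in actions:
            # Normalize case and whitespace so trivial variants merge.
            counts[" ".join(query.lower().split())] += 1
    return counts.most_common()


# Hypothetical log rows.
log = [
    ("personal jurisdiction", "click", "doc_x"),
    ("Personal  Jurisdiction", "print", "doc_x"),
    ("minimum contacts", "click", "doc_x"),
    ("minimum contacts", "click", "doc_y"),
]
top_queries = queries_for_document(log, "doc_x")
print(top_queries)  # [('personal jurisdiction', 2), ('minimum contacts', 1)]
```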

Page 13

AI & THE RANKING PROBLEM

• Supervised Machine Learning (Ranker SVM)
  – Iteratively retrieve and rank documents
  – Incorporate all available cues: text similarity, classifications, citations, user behavior and query logs
  – All of this requires lots of data!

• Training & Validation
  – Gold data: hand-crafted research reports covering a variety of legal issues
  – Report contains an issue statement, multiple queries, all seminal, highly relevant documents, some relevant docs
    • > 100K documents judged against ~400 legal issues
  – System was also tested by an independent 3rd party
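
The slides do not describe the ranker's internals. As a stand-in, the sketch below trains a linear pairwise ranker with a hinge loss, the same family of objective that ranking SVMs use, by plain subgradient descent. The three features (text similarity, citation score, click rate), the grades, and every number in the training data are hypothetical.

```python
def train_pairwise_ranker(queries, epochs=200, lr=0.1, reg=0.01):
    """Fit weights w so that w.x_i > w.x_j whenever doc i outranks doc j.

    queries: list of per-query lists of (feature vector, grade).
    Preference pairs are formed only within the same query.
    """
    dim = len(queries[0][0][0])
    w = [0.0] * dim
    # Pairwise transform: one difference vector per preference pair.
    pairs = []
    for docs in queries:
        for xi, gi in docs:
            for xj, gj in docs:
                if gi > gj:
                    pairs.append([a - b for a, b in zip(xi, xj)])
    for _ in range(epochs):
        for d in pairs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            # Subgradient step on (reg / 2) * |w|^2 + max(0, 1 - w.d).
            for k in range(dim):
                grad = reg * w[k] - (d[k] if margin < 1.0 else 0.0)
                w[k] -= lr * grad
    return w


def score(w, x):
    """Linear relevance score for one feature vector."""
    return sum(wk * xk for wk, xk in zip(w, x))


# Hypothetical gold data: features are
# [text similarity, citation score, click rate]; higher grade = more relevant.
training = [
    [([0.9, 0.8, 0.7], 2), ([0.8, 0.3, 0.2], 1), ([0.4, 0.1, 0.0], 0)],
    [([0.6, 0.9, 0.8], 2), ([0.7, 0.2, 0.1], 0)],
]
weights = train_pairwise_ranker(training)
for docs in training:
    ranked = sorted(docs, key=lambda doc: score(weights, doc[0]), reverse=True)
    print([grade for _, grade in ranked])
```

In the second toy query the less relevant document has the higher text similarity, so the model can only rank it correctly by weighting the citation and click features. That is the point of combining cues.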

Page 14

HADOOP FOR BIG DATA PROCESSING

• At launch, query logs contained ~2 billion records
  – Queries & user actions

• Relied on a Hadoop cluster to
  – Run Extract, Transform, and Load (ETL) processes
  – Cluster similar queries together
  – Extract, normalize, collate citation contexts

• Dramatic improvement in processing times
  – From tens of hours to tens of minutes
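
The map/shuffle/reduce shape of a job such as "cluster similar queries" can be sketched in plain Python. This does not use the Hadoop API, and grouping by a normalized token set is only an assumed stand-in for whatever similarity measure the authors used; all function names and records are hypothetical.

```python
from collections import defaultdict


def map_query(record):
    """Map step: emit (normalized key, raw query) for one log record."""
    query = record["query"]
    tokens = sorted(set(query.lower().replace('"', " ").split()))
    yield " ".join(tokens), query


def reduce_cluster(key, queries):
    """Reduce step: collapse all queries sharing a key into one cluster."""
    return {"key": key, "size": len(queries), "variants": sorted(set(queries))}


def run_job(records):
    """Run map, group by key (stands in for the shuffle), then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_query(record):
            groups[key].append(value)
    return [reduce_cluster(key, values) for key, values in sorted(groups.items())]


# Hypothetical query-log records.
records = [
    {"query": "personal jurisdiction"},
    {"query": '"Personal Jurisdiction"'},
    {"query": "jurisdiction personal"},
    {"query": "minimum contacts"},
]
clusters = run_job(records)
for cluster in clusters:
    print(cluster["key"], cluster["size"])
```

Because each record is mapped independently and each key is reduced independently, the same logic can be spread across many cores.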

Page 15

HADOOP: TYPICAL SPEED UPS

COMPUTATION                                     NORMAL TIME    HADOOP TIME
Building complete Westlaw dictionary            2.5 days       1 hour
Clustering similar Westlaw queries              1.5 days       3 minutes
Citation extraction from over 10 M documents    1.25 days      3 hours
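
The speed-up factors implied by these timings can be worked out directly. The sketch assumes that "days" means 24-hour wall-clock days, which the slide does not state; the variable names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60  # assumption: "days" are 24-hour wall-clock days

# (normal time, Hadoop time), both in minutes, taken from the slide.
timings_minutes = {
    "Building complete Westlaw dictionary": (2.5 * MINUTES_PER_DAY, 60),
    "Clustering similar Westlaw queries": (1.5 * MINUTES_PER_DAY, 3),
    "Citation extraction from over 10 M documents": (1.25 * MINUTES_PER_DAY, 3 * 60),
}
speedups = {task: before / after for task, (before, after) in timings_minutes.items()}
for task, factor in speedups.items():
    print(f"{task}: about {factor:.0f}x faster")
```

Under that assumption the three rows work out to roughly 60x, 720x and 10x respectively.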

Page 16

CLUSTER CONFIGURATION: QUERIES

• 8 machines, each with 16 cores

• Only 14 cores/machine were available for processing
  – Giving a total of 112 cores

• Block size of 64 MB
  – Each core processes one block at a time

• Cluster can process 7 GB at each step

• Latest cluster is twice the size: 224 cores
  – Almost 1 TB of memory and over 1 PB of storage
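
The 112-core and 7 GB figures follow from the other numbers on this slide, as this short check shows. It assumes 1 GB = 1024 MB; the function name `data_per_step_gb` is illustrative.

```python
def data_per_step_gb(machines, usable_cores_per_machine, block_size_mb):
    """Return (total cores, GB processed per step) with one block per core."""
    total_cores = machines * usable_cores_per_machine
    return total_cores, total_cores * block_size_mb / 1024


cores, gigabytes = data_per_step_gb(8, 14, 64)
print(cores, gigabytes)  # 112 7.0
```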

Page 17

THE POWER OF EXPERT SEARCH

• Leverages expertise of community: authors, editors, & users
  – We know why documents are linked
  – We know exactly who our users are

• Metadata, authority & aggregated user data all contribute to relevance, importance & popularity

• Can still benefit from Power Law phenomena so common on the Web

• Can exploit data parallelism to achieve the same kind of scale as horizontal search

Page 18

LESSONS LEARNED

• Vertical search is not just about search
  – It’s about findability
    • Includes navigation, recommendations, clustering, faceted classification, etc.
  – It’s about satisfying a set of well-understood tasks
    • Usually on enhanced content
    • Usually for expert customers

• Leveraging human value addition is key
  – None of the human actors set out to improve search

• Difficult to design complete solution upfront
  – Need platform for experimentation and validation at scale

Page 19

QUESTIONS?

• A relevant paper is downloadable from http://labs.thomsonreuters.com