© 2007 IBM Corporation
“Find What I Mean, Not What I Say”
Mike Moran, IBM Distinguished Engineer
November 2007
Information Management software | Enterprise Content Management
Why do companies use search?
Business Benefit: Value of Search
Increase productivity: Enable employees to more quickly find information needed to complete their business activities
Achieve greater insight: Analyze free-form, text-based information for insight into customer behavior and business performance
Decrease costs: Empower customers and partners to support themselves and perform their own research
Increase revenue: Ensure customers can easily find products and services, driving higher sales and increasing customer retention
How does IBM OmniFind meet those needs?
OmniFind Discovery Edition: Search for Self-Service and eCommerce
OmniFind Yahoo! Edition: Basic, No-Charge Search
OmniFind Enterprise Edition: Scalable and Secure Enterprise Search for Corporate Intranets
Insight Solutions with OmniFind: Content Analytics
Why is search so difficult?
It is harder to think of words than to make choices
Choosing the same words as the author is not easy
Words are ambiguous
1 to 10 of 10 zillion
The classic search model
Task: I need to tell Pat.
Information Need: How do I contact Pat?
Verbal form: What’s Pat’s phone number?
Query: Pat phone
Search Engine
What can go wrong along the way: Misconception, Mistranslation, Misformulation, Ambiguity
Sometimes your word is used too often
Searching for “neon” finds signs and cars
Sometimes your word isn’t used at all
Searching for “Pat phone” finds nothing
Analytics bridge unstructured and structured data
Unstructured Information (Text, Chat, Email, Audio, Video)
• High-value
• Most current
• Fastest growing
...BUT...
• Buried in huge volumes (noise)
• Implicit semantics
• Inefficient search
Text Analysis
Structured Information (Indices, DBs, KBs)
• Explicit semantics
• Efficient search
• Focused content
...BUT...
• Slow growing
• Narrow coverage
• Less current/relevant
Find what I mean, not what I say
SEARCH: Going rate for leasing a billboard near Triborough Bridge
Document: “…We were offered $250,000/year in 2001 for an outdoor sign in Hunts Point overlooking the Bruckner expressway. …”
Concepts (query and document): Rate, Billboard, Bronx
Relationships (query and document): Rate for, Located in
No keywords in common, but a good answer
Without semantic search, it’s not a pretty picture
SEARCH: Going rate for leasing a billboard near Triborough Bridge
Document: “…Simon and Garfunkel's "The 59th Street Bridge Song" was rated highly by the Billboard magazine in the 60's…”
Query concepts: Rate, Billboard, Bronx (relationships: Rate for, Located in)
Document concepts: Song Title, Magazine, Queens
Common keywords, bad semantic match
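The contrast between the two billboard slides can be sketched in a few lines of Python. This is only an illustration, not how OmniFind works internally: the phrase-to-concept table, the stopword list, and the function names are all assumptions made up for the example, with concept labels taken from the slides.

```python
import re

# Hypothetical phrase -> concept table (labels from the slides, phrases assumed).
CONCEPTS = {
    "going rate": "Rate",
    "$250,000/year": "Rate",
    "billboard magazine": "Magazine",
    "billboard": "Billboard",
    "outdoor sign": "Billboard",
    "triborough bridge": "Bronx",
    "hunts point": "Bronx",
    "bruckner expressway": "Bronx",
    "59th street bridge": "Queens",
}
# Assumed stopword list, just large enough for this example.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "near", "the", "was", "we", "were"}

QUERY = "Going rate for leasing a billboard near Triborough Bridge"
GOOD_DOC = ("We were offered $250,000/year in 2001 for an outdoor sign "
            "in Hunts Point overlooking the Bruckner expressway.")
BAD_DOC = ('Simon and Garfunkel\'s "The 59th Street Bridge Song" was rated '
           'highly by the Billboard magazine in the 60\'s')

def keywords(text):
    """Lowercased word tokens with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def concepts(text):
    """Concept labels found in the text, longest phrases matched first."""
    remaining = text.lower()
    found = set()
    for phrase in sorted(CONCEPTS, key=len, reverse=True):
        if phrase in remaining:
            found.add(CONCEPTS[phrase])
            # Blank out the match so "billboard magazine" is not also "billboard".
            remaining = remaining.replace(phrase, " ")
    return found

def keyword_overlap(query, doc):
    return keywords(query) & keywords(doc)

def concept_overlap(query, doc):
    return concepts(query) & concepts(doc)
```

Under these assumptions the good answer shares no keywords with the query but shares all three concepts, while the Simon and Garfunkel passage shares two keywords and no concepts.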
News example
Search: “Bush trip to Middle East”
Text: President Bush visits shrine in Israel
Syntactic Annotator: NP, VP, PP
Named Entity Annotator: Title, Person, Gov Official, Country
Relationship Annotator: Located At (Arg1: Entity, Arg2: Location)
Financial services example
Search: “Fred Center’s title”
Search: “head of Center Micros”
Text: Fred Center is the CEO of Center Micros
Parser: NP, VP, PP
Named Entity: Person, Organization
Relationship: CeoOf (Arg1: Person, Arg2: Org)
Law enforcement example
Search: Neon car
Search: “Higgins’ car”
Text: A Neon was driven by Timothy Higgins
Syntactic Annotator: NP, VP, PP
Named Entity Annotator: Car, Person
Relationship Annotator: Driven By (Arg1: Car, Arg2: Person)
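One way to picture what the relationship annotations buy you: once relations are stored with typed arguments, a search such as “Higgins’ car” becomes a lookup on the Driven By relation rather than a keyword match. The sketch below assumes a simple in-memory list of relations; the `Relation` type and `find` helper are hypothetical names for illustration, and the values come from the example slides.

```python
from typing import List, NamedTuple, Optional

class Relation(NamedTuple):
    name: str  # relationship type, e.g. "DrivenBy"
    arg1: str  # first argument, e.g. the car
    arg2: str  # second argument, e.g. the person

# Relations an annotator might have produced for the example sentences.
INDEX = [
    Relation("DrivenBy", "Neon", "Timothy Higgins"),
    Relation("CeoOf", "Fred Center", "Center Micros"),
]

def find(name: str, arg1: Optional[str] = None,
         arg2: Optional[str] = None) -> List[Relation]:
    """Return relations of the given type whose arguments contain the given text."""
    return [
        r for r in INDEX
        if r.name == name
        and (arg1 is None or arg1.lower() in r.arg1.lower())
        and (arg2 is None or arg2.lower() in r.arg2.lower())
    ]

# "Higgins' car": which car was driven by Higgins?
higgins_car = find("DrivenBy", arg2="Higgins")[0].arg1
# "head of Center Micros": who is CEO of Center Micros?
head = find("CeoOf", arg2="Center Micros")[0].arg1
```

Here `higgins_car` is the Neon and `head` is Fred Center, even though neither query shares its key word ("car", "head") with the source sentence.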
How does semantic search find a phone number?
When you search for “IBM phone number”
Synonyms
Expanded Query:
@xmlf2::'ibm <.or>phone <#phonenumber/> "phone nbr" "telephone nbr" "telephone number" </.or>
<.or>number <#phonenumber/> "phone nbr" "telephone nbr" "telephone number" </.or>'
Results
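The synonym expansion shown on this slide can be approximated with a small Python sketch. It does not reproduce the query syntax above; it emits a generic `(a OR b)` form, and the synonym table and `expand` function are assumptions for illustration only.

```python
# Hypothetical synonym table, modeled on the alternatives in the slide.
PHONE_SYNONYMS = ['"phone nbr"', '"telephone nbr"', '"telephone number"']
SYNONYMS = {
    "phone": PHONE_SYNONYMS,
    "number": PHONE_SYNONYMS,
}

def expand(query: str) -> str:
    """Replace each term that has synonyms with an OR-group of alternatives."""
    parts = []
    for term in query.lower().split():
        alternatives = [term] + SYNONYMS.get(term, [])
        if len(alternatives) == 1:
            parts.append(term)  # no synonyms known: keep the term as typed
        else:
            parts.append("(" + " OR ".join(alternatives) + ")")
    return " ".join(parts)

expanded = expand("IBM phone number")
```

The term "ibm" passes through unchanged, while "phone" and "number" each become an OR-group, so a document that says "telephone nbr" can still match.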
Customers need a platform, not just samples
To create domain-specific knowledge, create a new annotator or modify one already shipped
Or configure any regular expression with no coding
And it needs to work in many natural languages
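As a rough illustration of a regular-expression annotator, the sketch below tags phone numbers in text so that a document can match a phone-number search without containing the word "phone". The rule table, the `phonenumber` label, the pattern (North American style only), and the sample number are all assumptions for the example; this is not the product's configuration format.

```python
import re

# Hypothetical rule table: annotation label -> regular expression.
RULES = {
    "phonenumber": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def annotate(text):
    """Return (label, matched text, start, end) for every rule match."""
    return [
        (label, match.group(), match.start(), match.end())
        for label, pattern in RULES.items()
        for match in pattern.finditer(text)
    ]

# Made-up sample text with a fictional number.
annotations = annotate("Call Pat at 914-555-0100.")
```

Adding a new kind of annotation in this sketch means adding one entry to `RULES`, which is the spirit of configuring a pattern rather than writing code.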
Customers need an open, extensible framework
Text analysis is a complex, multi-step process
No one vendor can satisfy every need you’ll have in text analysis
That’s why you need an open framework
UIMA in OmniFind Enterprise Edition: Text, Identify Language, Parse Words, Annotate, Categorize, Index, Search
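The multi-step process named on this slide can be pictured as a chain of small stages that each add something to a shared document record. The Python sketch below is a toy under stated assumptions: it does not use the UIMA API, every stage is a placeholder, and the function names, the dict-based record, and the sample texts are made up for illustration.

```python
import re

def identify_language(doc):
    doc["language"] = "en"  # placeholder: a real stage would inspect the text
    return doc

def parse_words(doc):
    doc["tokens"] = re.findall(r"\w+", doc["text"].lower())
    return doc

def annotate(doc):
    has_phone = re.search(r"\d{3}-\d{3}-\d{4}", doc["text"])
    doc["annotations"] = ["phonenumber"] if has_phone else []
    return doc

def categorize(doc):
    doc["category"] = "contact" if "phonenumber" in doc["annotations"] else "other"
    return doc

# Stages run in order; swapping or adding a stage only changes this list.
PIPELINE = [identify_language, parse_words, annotate, categorize]

def analyze(text):
    doc = {"text": text}
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

def build_index(texts):
    """Index each document under both its words and its annotation labels."""
    index = {}
    for doc_id, text in enumerate(texts):
        doc = analyze(text)
        for key in set(doc["tokens"]) | set(doc["annotations"]):
            index.setdefault(key, set()).add(doc_id)
    return index

def search(index, term):
    return index.get(term.lower(), set())

index = build_index(["Pat: 914-555-0100", "Neon signs for sale"])
```

Because each stage reads and writes the same record, a stage from another source can be dropped into the list without touching the others, which is the argument the slide makes for an open framework.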
UIMA is an open standard framework
IBM has submitted the Unstructured Information Management Architecture (UIMA) specification to the Organization for the Advancement of Structured Information Standards (OASIS)
The UIMA source code has been contributed to the Apache Software Foundation, and an Apache Incubator project has been established to foster collaborative, consensus-based development of new software based on UIMA
Support for UIMA and OmniFind
Deliver content to platform for analysis
Provide components that perform text analysis
Provide applications that leverage text analysis and enhanced search
Read all about it
The search marketing best seller
Updated every printing
“Buy this book, read it, and then read it again.” --Chris Sherman, Search Engine Watch
“Indispensable guide” --Kirkus Reports
“Act now and read it” --Bryan Eisenberg
“Great book” --Robert Scoble
“Bravo” --Search Engine Watch
For more information about the books, and for the free Biznology newsletter and blog: www.mikemoran.com
Internet Marketing