IBM LanguageWare
© 2005 IBM Corporation – All Rights Reserved
LanguageWare Introduction
Marie Wallace, IBM LanguageWare
Strategy
Maximizing the total ROI of natural language processing poses many challenges, which LanguageWare addresses through its comprehensive “big picture” design & implementation
Explicit knowledge in documents, emails and other written forms.
Tacit knowledge located in the experience of individuals, networks and communities.
Embedded knowledge in work routines, practices and norms.
Morphological Analysis
Lexical Analysis
Parsing & Grammars
Semantic Analysis & Disambiguation
Statistical Processing
Knowledge Integration (mental semantic models)
Social Computing (tagging, relationship extraction, ontology derivation, …)
Social Semantic Search & Discovery (w/ disambiguation)
Knowledge Integration (converting documents into business objects)
On-the-fly analysis (form filling, semi/automatic tagging, disambiguation, …)
IBM launched the LanguageWare project in 2001 with the vision of creating common NLP componentry that could be flexibly applied to a wide range of challenges across IBM’s entire product portfolio (the entire information lifecycle)
Create, Store, Find, Understand
Text analytics can improve the consistency of content, and of its associated meta-data, through semi-automatic analysis at content creation
Text analytics can generate valuable meta-data that can be leveraged for subsequent analysis, integrating multiple sources of structured and unstructured information
It can enhance the search experience by leveraging semantics, social networks, taxonomies & folksonomies, … to uncover knowledge hidden in unstructured content
It can support BI techniques and algorithms to extract actionable knowledge and insight from vast quantities of available information (structured & unstructured)
To achieve a solution that addressed the diverse and conflicting requirements across divisions, brands, products, industries, and platforms, we needed to create something that was…
Ubiquitous & Flexible
– Leveraged comprehensively across all IBM divisions through simple-to-use integration packages satisfying any type of application
Enterprise-transforming & Enterprise-ready
– Combining strong engineering principles with the latest research techniques
Highly Extensible & Customizable
– Applying an open extensible data-driven model which delivers a highly optimized industrial-strength runtime, with simple yet powerful customization tools for developing domain resources
Standards-based & Easily Accessible
– Leveraging open source technologies & standards, such as UIMA, and made freely available for evaluation & prototyping through Alphaworks
Our philosophy was to create light-weight technology that could be easily embedded into any existing solution to provide natural language understanding transparently and unobtrusively for the end user
“Language Understanding” is personal & specific to industry sector, company, organization, function, person, and time …
The data that drives the discovery process is YOUR competitive advantage
– You need technology that can be enhanced with your data models to allow you to capture insights that your competitors can’t
Discovery is an integral part of all our lives and you want it integrated seamlessly into your entire information life-cycle – from creation to obsolescence (and beyond)
– You need technology that can seamlessly integrate the knowledge of your people – mapping personal models to analytics models
– The closer you move the analytics to the knowledge worker (allowing a feedback loop to harness knowledge), the higher the quality of the analytics
As a result of the successful execution of this strategy, LanguageWare is the most broadly used NLP technology across IBM…
Embedded into Lotus, WebSphere, DB2, and Rational products
Integrated into IBM’s hardware
Used internally by IBM’s CIO Office
Deployed in GBS and SWG services
Used within IBM Research
Used as part of European Research projects
And now licensed to end-customers
Technology
In the context of the present invention, a compound as described herein or pharmaceutical composition thereof can be
utilized for modulating the activity of RUP3 receptor mediated diseases, conditions and/or disorders as described herein.
Examples of modulating the activity of RUP3 receptor mediated diseases include the prophylaxis or treatment of
metabolic related disorders such as, but not limited to, type I diabetes, type II diabetes, inadequate glucose tolerance,
insulin resistance, hyperglycemia, hyperlipidemia, hypertriglyceridemia, hypercholesterolemia, dyslipidemia and
syndrome X. Other examples of modulating the activity of RUP3 receptor mediated diseases include the prophylaxis or
treatment of obesity and/or overweight by decreasing food intake, inducing satiation (i.e., the feeling of fullness),
controlling weight gain, decreasing body weight and/or affecting metabolism such that the recipient loses weight and/or
maintains weight.
[Figure: analysis steps applied to the sample text above. Recoverable labels:]
Language Identification: English
Segmentation
Normalization
Classification
Fuzzy matching: spelling correction, approx. lookup, hyphenation
Rules: regular expressions, parsing, grammars
Disambiguation
Relationship Extraction
Annotation labels: Pharma Patent, Disease, Combination Therapy, PharmaAction, PharmaEffect, Obesity-related, Orlistat
Codes: C18.452.339.500.396, C18.654.726.500
Terms: utilize, describe, modulate, …; lipase inhibitors, such as tetrahydrolipstatin
Relations: Compound X <can be combined with> Compound Y; Compound X <addresses> Disease Y
Disambiguation examples: running = noun (not verb); tank = vehicle (not container)
Automatic tagging based on concept mentions
[Figure: mentions in the text linked to nodes in a network of concepts]
Mapping mentions to concepts
Finding “focus” concept
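The two steps on this slide can be sketched in a few lines of Python. This is a minimal illustration and not LanguageWare's algorithm: the lexicon, the concept network, the substring matching, and the scoring rule (a concept's own mentions plus the mentions of its neighbours) are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical lexicon: surface form -> concept id (not LanguageWare data).
LEXICON = {
    "diabetes": "Diabetes",
    "insulin resistance": "InsulinResistance",
    "hyperglycemia": "Hyperglycemia",
    "obesity": "Obesity",
}

# Hypothetical concept network, stored as adjacency sets.
NETWORK = {
    "Diabetes": {"InsulinResistance", "Hyperglycemia", "Obesity"},
    "InsulinResistance": {"Diabetes"},
    "Hyperglycemia": {"Diabetes"},
    "Obesity": {"Diabetes"},
}


def find_mentions(text):
    """Map mentions to concepts: count lexicon hits (naive substring match)."""
    lowered = text.lower()
    counts = Counter()
    for surface, concept in LEXICON.items():
        hits = lowered.count(surface)
        if hits:
            counts[concept] += hits
    return counts


def focus_concept(mentions):
    """Pick the mentioned concept best supported by its mentioned neighbours."""
    if not mentions:
        return None

    def score(concept):
        # Own mentions plus mentions of directly connected concepts.
        return mentions[concept] + sum(mentions[n] for n in NETWORK.get(concept, ()))

    # sorted() makes tie-breaking deterministic (alphabetical).
    return max(sorted(mentions), key=score)
```

For a text that mentions diabetes, insulin resistance, hyperglycemia and obesity once each, the neighbour-based score favours `Diabetes`, because every other mentioned concept is connected to it.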
Multidimensional disambiguation
[Figure: mentions in the text]
LanguageWare comprises a number of building blocks which combine together to deliver the capabilities
Runtime
The heart of the solution; it provides the foundation on which most other capabilities are built
• It provides a language-agnostic text analyzer
• Analysis driven by data and logic encoded in our resources
• It allows us to rapidly analyze text, identify lexical units, normalize and classify those units
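As a rough illustration of a data-driven analyzer of this kind, the sketch below segments text into lexical units, normalizes each unit, and classifies it by lookup. The tables `LEMMAS` and `CLASSES` and the function `analyze` are hypothetical stand-ins for the resources and the runtime; none of them is part of LanguageWare.

```python
import re
import unicodedata

# Hypothetical resources: the analysis logic below is generic, and all
# language-specific behaviour comes from these tables.
LEMMAS = {
    "utilized": "utilize",
    "described": "describe",
    "modulating": "modulate",
    "disorders": "disorder",
}
CLASSES = {
    "utilize": "verb",
    "describe": "verb",
    "modulate": "verb",
    "disorder": "noun",
}

TOKEN_RE = re.compile(r"\w+")


def analyze(text):
    """Segment text into lexical units, then normalize and classify each one."""
    units = []
    for match in TOKEN_RE.finditer(text):
        surface = match.group()
        # Normalization: Unicode compatibility form plus case folding,
        # followed by a lemma lookup in the resource table.
        folded = unicodedata.normalize("NFKC", surface).casefold()
        lemma = LEMMAS.get(folded, folded)
        units.append({
            "surface": surface,
            "begin": match.start(),
            "end": match.end(),
            "lemma": lemma,
            "class": CLASSES.get(lemma, "unknown"),
        })
    return units
```

Swapping in different tables changes the behaviour without touching `analyze`, which is the sense in which the analysis is driven by resources rather than by code.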
Resources
Lexico-semantic resources that drive the behavior of the system
• Optimized for size and performance
• Customizable for different domains
• A semantic layer modeled by a directed graph to represent the knowledge (network of concepts)
• Graph mining techniques for analysis of the semantic network
Resource layers: morphological description of words; lexical entries linked to concepts; concepts (with navigation)
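The layering described above can be pictured with a small Python sketch: a morphology table, lexical entries that point to concepts, and a directed graph of concepts that can be navigated. All data, relation names, and function names here are assumptions chosen for illustration; the actual resource format is not described in these slides.

```python
from collections import deque

# Layer 1 (hypothetical): morphological description, inflected form -> lemma.
MORPHOLOGY = {"lipases": "lipase"}

# Layer 2 (hypothetical): lexical entries -> concepts.
LEXICAL_ENTRIES = {
    "orlistat": "Orlistat",
    "tetrahydrolipstatin": "Orlistat",
    "lipase": "Lipase",
}

# Layer 3 (hypothetical): directed, labelled concept graph.
CONCEPT_GRAPH = {
    "Orlistat": [("is_a", "LipaseInhibitor"), ("addresses", "Obesity")],
    "LipaseInhibitor": [("is_a", "Compound"), ("inhibits", "Lipase")],
    "Obesity": [("is_a", "Disease")],
}


def concept_for(word):
    """Resolve a word to a concept through the morphology and lexical layers."""
    lemma = MORPHOLOGY.get(word.lower(), word.lower())
    return LEXICAL_ENTRIES.get(lemma)


def reachable(start, relation=None):
    """Breadth-first navigation from a concept, optionally along one relation."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for rel, target in CONCEPT_GRAPH.get(node, []):
            if relation is not None and rel != relation:
                continue
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order
```

Here `concept_for("Tetrahydrolipstatin")` and `concept_for("orlistat")` resolve to the same concept, and `reachable("Orlistat", "is_a")` walks up the hypothetical taxonomy.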
UIMA Annotators
UIMA (Unstructured Information Management Architecture) provides a framework for building text analytics applications, with annotators acting as the plug-ins that encode the processing logic.
Workbench
• Eclipse-based tooling for manipulation of language resources.
• Allows customers to easily develop new, or customize existing, language resources thereby modifying the behaviour of the annotators.
• Targets the non-developer – terminologist, taxonomist, domain specialist, …
• Supports several representations of domain knowledge – thesauri, taxonomies, ontologies, …
LanguageWare Architecture / Processing Model
[Figure: architecture diagram. Recoverable labels:]
Input and output: Text; UIMA (CAS) Annotations
Annotators: Language Classifier, Lexical Analyzer, POS Tagger, Parser, Semantic Analyzer
Customizable Domain Resources: resources, rules & seed list
LanguageWare Resource Workbench
Software Libraries: char handling, regex, … (ICU4J); Core NLP (DJTJ); aFST, uTagger, DLTLS
UIMA Pipeline
[Figure: Crawled Documents → Collection Reader → CAS → Annotator → CAS → Annotator → CAS → Annotator → CAS → CAS Consumer → DB / Index]
Annotators in the pipeline:
• Language Identification
• Document Classification
• Lexical Analyzer
• POS Tagger
• Parser
• Semantic Analyzer
• Named-Entity Extraction
• Relationship Extraction
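The flow in this pipeline can be mimicked with a short Python sketch. It deliberately does not use the UIMA API: `Cas`, `collection_reader`, the two toy annotators, and `run_pipeline` are hypothetical stand-ins that only show how a shared analysis structure is handed from a reader through a chain of annotators to a consumer.

```python
from dataclasses import dataclass, field


@dataclass
class Cas:
    """Stand-in for a CAS: the document text plus accumulated annotations."""
    text: str
    annotations: list = field(default_factory=list)


def collection_reader(documents):
    """Wrap each incoming document in a fresh Cas."""
    for text in documents:
        yield Cas(text)


def language_identifier(cas):
    """Toy annotator: guess English if the word 'the' occurs."""
    lang = "en" if " the " in f" {cas.text.lower()} " else "unknown"
    cas.annotations.append(("Language", 0, len(cas.text), lang))


def token_annotator(cas):
    """Toy annotator: add one Token annotation per whitespace-separated word."""
    pos = 0
    for word in cas.text.split():
        begin = cas.text.index(word, pos)
        end = begin + len(word)
        cas.annotations.append(("Token", begin, end, word))
        pos = end


def run_pipeline(documents, annotators, consumer):
    """Reader -> annotators (in order) -> consumer, one Cas per document."""
    for cas in collection_reader(documents):
        for annotate in annotators:
            annotate(cas)
        consumer(cas)


# Usage: the consumer here just collects results in a list standing in for an index.
index = []
run_pipeline(["the tank stopped"], [language_identifier, token_annotator], index.append)
```

Each annotator only reads and extends the shared structure, so annotators can be reordered or replaced without changing the reader or the consumer.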
LanguageWare Resource Workbench
[Figure: workbench workflow. Recoverable labels:]
Workflow stages: Create/Modify, Build, Analyse, Validate, Deploy
XML exchanged with any application: taxonomy definition, dictionary data, pointers to training data, rules, …
XML exchanged with any application: updated taxonomy definition, dictionary data, analysis results, change reports, recommendations, …
Annotator (descriptor, class(es), language resource(s)) for a UIMA-compliant application
Workbench functions:
• Develop / import domain resources, rules, models, …
• Develop rules, regular expressions, templates, …
• Search engine interface
• Data collection, and analysis verification & test
• Machine learning and results visualization, manipulation, and verification
• Review and modify annotation results
• Measurements & statistics
• Generate reports
Common building blocks, i.e. template annotators: Lexical Analysis, Named Entity Recognition, POS Tagging, Document Classification, Parsing, Semantic Analysis & Disambiguation, Relationship Extraction
Sample Processing Model
Language Resources: layered lexico-semantic resources linking morphology, lexical entries, concepts & relationships
Lexical Analyzer: spots concept mentions in the text
Concept Navigation: API for navigation through the network of concepts
Application: spot term mentions, analyze distribution of concepts, disambiguate, find focus, …
Contacts
LanguageWare External Download on Alphaworks http://www.alphaworks.ibm.com/tech/lrw
LanguageWare Wikipedia http://en.wikipedia.org/wiki/Languageware
LanguageWare Senior Research & Development [email protected]