IBM LanguageWare © 2005 IBM Corporation – All Rights Reserved LanguageWare Introduction Marie Wallace, IBM LanguageWare
Dec 16, 2015

Milo Collins
Transcript
Page 1:

IBM LanguageWare

© 2005 IBM Corporation – All Rights Reserved

LanguageWare Introduction

Marie Wallace, IBM LanguageWare

Page 2:

Strategy

Page 3:

There are many challenges in maximizing the total ROI of natural language processing, which LanguageWare addresses through its comprehensive “big picture” design & implementation

Explicit knowledge in documents, emails and other written forms.

Tacit knowledge located in the experience of individuals, networks and communities.

Embedded knowledge in work routines, practices and norms.

Morphological Analysis

Lexical Analysis

Parsing & Grammars

Semantic Analysis & Disambiguation

Statistical Processing

Knowledge Integration (mental semantic models)

Social Computing (tagging, relationship extraction, ontology derivation, …)

Social Semantic Search & Discovery (w/ disambiguation)

Knowledge Integration (converting documents into business objects)

On-the-fly analysis (form filling, semi/automatic tagging, disambiguation, …)

Page 4:

IBM launched the LanguageWare project in 2001 with the vision of creating a common set of NLP components that could be flexibly applied to a wide range of challenges across IBM’s entire product portfolio (the entire information lifecycle)

Create, Store, Find, Understand

Text Analytics can improve the consistency of content, and of its associated meta-data, through semi-automatic analysis at content creation

Page 5:

Create, Store, Find, Understand

IBM launched the LanguageWare project in 2001 with the vision of creating a common set of NLP components that could be flexibly applied to a wide range of challenges across IBM’s entire product portfolio (the entire information lifecycle)

It can generate valuable meta-data which can be leveraged for subsequent analysis – integrate multiple sources of (un)structured information

Page 6:

Create, Store, Find, Understand

IBM launched the LanguageWare project in 2001 with the vision of creating a common set of NLP components that could be flexibly applied to a wide range of challenges across IBM’s entire product portfolio (the entire information lifecycle)

It can help enhance search experience through leveraging semantics, social networks, taxonomies & folksonomies, … to uncover knowledge hidden in the unstructured content

Page 7:

It can support BI techniques and algorithms to extract actionable knowledge and insight from vast quantities of available information (structured & unstructured)

IBM launched the LanguageWare project in 2001 with the vision of creating a common set of NLP components that could be flexibly applied to a wide range of challenges across IBM’s entire product portfolio (the entire information lifecycle)

Create, Store, Find, Understand

Page 8:

To achieve a solution that addressed the diverse and conflicting requirements across divisions, brands, products, industries, and platforms, we needed to create something that was…

Ubiquitous & Flexible

– Leveraged comprehensively across all IBM divisions through simple-to-use integration packages satisfying any type of application

Enterprise-transforming & Enterprise-ready

– Combining strong engineering principles with latest research techniques

Highly Extensible & Customizable

– Applying an open extensible data-driven model which delivers a highly optimized industrial-strength runtime, with simple yet powerful customization tools for developing domain resources

Standards-based & Easily Accessible

– Leveraging open source technologies & standards, such as UIMA, and made freely available for evaluation & prototyping through Alphaworks

Page 9:

Our philosophy was to create light-weight technology that could be easily embedded into any existing solution to provide natural language understanding transparently and unobtrusively for the end user

“Language Understanding” is personal & specific to industry sector, company, organization, function, person, and time …

The data that drives the discovery process is YOUR competitive advantage

– You need technology that can be enhanced with your data models to allow you to capture insights that your competitors can’t

Discovery is an integral part of all our lives and you want it integrated seamlessly into your entire information life-cycle – from creation to obsolescence (and beyond)

– You need technology that can seamlessly integrate the knowledge of your people – mapping personal models to analytics models

– The closer you move the analytics to the knowledge worker (allowing a feedback loop to harness knowledge), the higher the quality of the analytics

Page 10:

As a result of the successful execution of this strategy, LanguageWare is the most broadly used NLP technology across IBM…

Embedded into Lotus, WebSphere, DB2, and Rational products

Integrated into IBM’s hardware

Used internally by IBM’s CIO Office

Deployed in GBS and SWG services

Used within IBM Research

Used as part of European Research projects

And now licensed to end-customers

Page 11:

Technology

Page 12:

In the context of the present invention, a compound as described herein or pharmaceutical composition thereof can be utilized for modulating the activity of RUP3 receptor mediated diseases, conditions and/or disorders as described herein. Examples of modulating the activity of RUP3 receptor mediated diseases include the prophylaxis or treatment of metabolic related disorders such as, but not limited to, type I diabetes, type II diabetes, inadequate glucose tolerance, insulin resistance, hyperglycemia, hyperlipidemia, hypertriglyceridemia, hypercholesterolemia, dyslipidemia and syndrome X. Other examples of modulating the activity of RUP3 receptor mediated diseases include the prophylaxis or treatment of obesity and/or overweight by decreasing food intake, inducing satiation (i.e., the feeling of fullness), controlling weight gain, decreasing body weight and/or affecting metabolism such that the recipient loses weight and/or maintains weight.

[Diagram: analysis stages and example annotations for the sample patent text]

Stages: Language Identification; Segmentation; Normalization; Classification; Fuzzy matching (spelling correction, approx. lookup, hyphenation); Disambiguation; Rules (regular expressions, parsing, grammars); Relationship Extraction

Example annotations: English; Pharma Patent; Combination Therapy; Utilize, describe, modulate, …; Disease; PharmaAction; PharmaEffect; C18.452.339.500.396; C18.654.726.500; Obesity-related; Orlistat; lipase inhibitors, such as tetrahydrolipstatin

Example relationships: Compound X <can be combined with> Compound Y; Compound X <addresses> Disease Y

Disambiguation examples: running = noun (not verb); tank = vehicle (not container)
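The relationship patterns on this slide (Compound X <addresses> Disease Y, Compound X <can be combined with> Compound Y) can be illustrated with a small rule-based sketch. It is not LanguageWare's rule format: the seed lists, the trigger phrases, and the function name `extract_relations` are all assumptions made for illustration.

```python
import re

# Hypothetical seed lists; a real deployment would load domain dictionaries.
COMPOUNDS = {"compound x", "compound y"}
DISEASES = {"disease y"}

# Hypothetical trigger phrases, each mapped to a relation label.
TRIGGERS = {
    "addresses": "addresses",
    "can be combined with": "can be combined with",
}


def extract_relations(sentence):
    """Return (subject, relation, object) triples found by simple rules."""
    text = sentence.lower()
    triples = []
    for phrase, relation in TRIGGERS.items():
        # Capture the text on each side of the trigger phrase.
        match = re.search(r"(.+?)\s+" + re.escape(phrase) + r"\s+(.+)", text)
        if not match:
            continue
        left, right = match.group(1), match.group(2)
        subjects = sorted(c for c in COMPOUNDS if c in left)
        objects = sorted(t for t in COMPOUNDS | DISEASES if t in right)
        for subject in subjects:
            for obj in objects:
                triples.append((subject, relation, obj))
    return triples
```

Because the seed lists and triggers are plain data, new relation types can be added without touching the matching logic, which is in the spirit of the data-driven approach the deck describes.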

Page 13:

Automatic tagging based on concept mentions

NETWORK OF CONCEPTS

TEXT

Mention Mention Mention Mention

Mapping mentions to concepts

Finding “focus” concept
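A minimal sketch of the two steps named on this slide: mapping mentions to concepts, then picking a "focus" concept. The slide does not say how the focus is computed. The scoring used here (a concept's own mentions plus mentions of its direct neighbours in the network) is an assumption, as are the lexicon, the network, and the function names.

```python
from collections import Counter

# Hypothetical lexicon: surface form -> concept identifier.
LEXICON = {
    "diabetes": "Diabetes",
    "insulin": "Insulin",
    "hyperglycemia": "Hyperglycemia",
    "obesity": "Obesity",
}

# Hypothetical concept network: concept -> directly related concepts.
NETWORK = {
    "Diabetes": {"Insulin", "Hyperglycemia"},
    "Insulin": {"Diabetes"},
    "Hyperglycemia": {"Diabetes"},
    "Obesity": set(),
}


def map_mentions(text):
    """Count how often each known concept is mentioned in the text."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    return Counter(LEXICON[t] for t in tokens if t in LEXICON)


def focus_concept(counts):
    """Pick the concept with the most mentions, counting its neighbours too."""
    def score(concept):
        return counts[concept] + sum(counts[n] for n in NETWORK[concept])

    # sorted() makes tie-breaking deterministic (alphabetical).
    return max(sorted(counts), key=score) if counts else None
```

With this scoring, a concept that is mentioned once but surrounded by mentions of related concepts can outrank an isolated concept mentioned just as often.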

Page 14:

Multidimensional disambiguation

TEXT

Mention Mention
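The slide does not spell out which dimensions the disambiguation uses. As a rough illustration of the general idea only, the sketch below selects a sense by context overlap along a single dimension, reusing the "tank = vehicle (not container)" example from the earlier slide. The sense inventory and the function name `disambiguate` are hypothetical.

```python
# Hypothetical sense inventory: term -> {sense: related context words}.
SENSES = {
    "tank": {
        "vehicle": {"army", "armored", "battle", "crew"},
        "container": {"water", "fuel", "liters", "storage"},
    },
}


def disambiguate(term, context_words):
    """Pick the sense whose related words overlap the context most."""
    context = {w.lower() for w in context_words}
    candidates = SENSES.get(term.lower(), {})
    best, best_overlap = None, 0
    for sense in sorted(candidates):
        overlap = len(candidates[sense] & context)
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best  # None when no sense has supporting context
```

For example, `disambiguate("tank", "The army moved the tank into battle".split())` selects the vehicle sense, because two of its related words occur in the context and none of the container sense's do.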

Page 15:

LanguageWare comprises a number of building blocks which combine together to deliver the capabilities

Runtime

The heart of the solution; it provides the foundation on which most other capabilities are built

• It provides a language-agnostic text analyzer

• Analysis driven by data and logic encoded in our resources

• It allows us to rapidly analyze text, identify lexical units, normalize and classify those units
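The three runtime steps listed above (identify lexical units, normalize them, classify them) can be sketched as follows. This is an illustration under stated assumptions, not the LanguageWare runtime: the dictionary, the lemma table, and the function name `analyze` are hypothetical, and a regular expression stands in for real segmentation.

```python
import re
import unicodedata

# Hypothetical dictionary resource: normalized form -> class label.
DICTIONARY = {
    "utilize": "verb",
    "compound": "noun",
    "diabetes": "disease",
}

# Hypothetical inflection table standing in for morphological data.
LEMMAS = {"utilized": "utilize", "compounds": "compound"}


def analyze(text):
    """Segment text, normalize each unit, and classify it via the dictionary."""
    results = []
    for match in re.finditer(r"\w+", text):
        surface = match.group(0)
        # Normalize: Unicode NFKC, case-fold, then look up the lemma.
        folded = unicodedata.normalize("NFKC", surface).casefold()
        lemma = LEMMAS.get(folded, folded)
        results.append({
            "begin": match.start(),
            "end": match.end(),
            "surface": surface,
            "lemma": lemma,
            "class": DICTIONARY.get(lemma, "unknown"),
        })
    return results
```

The analyzer itself contains no language-specific logic. Changing `DICTIONARY` or `LEMMAS` changes its behaviour, which mirrors the slide's point that analysis is driven by data encoded in resources.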

Page 16:

LanguageWare comprises a number of building blocks which combine together to deliver the capabilities

Runtime

Resources

Lexico-semantic resources that drive the behavior of the system

• Optimized for size and performance

• Customizable for different domains

• A semantic layer modeled by a directed graph to represent the knowledge (network of concepts)

• Graph mining techniques for analysis of the semantic network

Morphological description of words

Lexical entries to concepts

Concepts (with navigation)
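The three layers listed above (morphological description of words, lexical entries to concepts, concepts with navigation) can be pictured as three small lookup structures, with the concept layer held as a directed graph. All data, relation names, and function names below are hypothetical; breadth-first search is used here only as a simple stand-in for navigation and is not taken from the slides.

```python
from collections import deque

# Layer 1 (hypothetical): inflected form -> lexical entry.
MORPHOLOGY = {"tanks": "tank", "vehicles": "vehicle"}

# Layer 2 (hypothetical): lexical entry -> candidate concepts.
ENTRY_TO_CONCEPTS = {"tank": ["Tank(vehicle)", "Tank(container)"]}

# Layer 3 (hypothetical): directed graph, concept -> (relation, target) edges.
CONCEPT_GRAPH = {
    "Tank(vehicle)": [("is_a", "ArmoredVehicle")],
    "ArmoredVehicle": [("is_a", "Vehicle")],
    "Tank(container)": [("is_a", "Container")],
}


def concepts_for(word):
    """Resolve a surface word to its candidate concepts via the first two layers."""
    lemma = MORPHOLOGY.get(word.lower(), word.lower())
    return ENTRY_TO_CONCEPTS.get(lemma, [])


def reachable(start, max_depth=2):
    """Breadth-first navigation: concepts reachable within max_depth edges."""
    seen, queue, found = {start}, deque([(start, 0)]), []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for _relation, target in CONCEPT_GRAPH.get(node, []):
            if target not in seen:
                seen.add(target)
                found.append(target)
                queue.append((target, depth + 1))
    return found
```

Keeping the layers separate means a domain specialist could extend the concept graph without touching the morphology, and vice versa.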

Page 17:

LanguageWare comprises a number of building blocks which combine together to deliver the capabilities

Runtime

Resources

UIMA Annotators

UIMA

UIMA provides a framework for building text analytics applications, with annotators acting as the plug-ins that encode the processing logic.

Page 18:

LanguageWare comprises a number of building blocks which combine together to deliver the capabilities

Runtime

Resources

UIMA Annotators

UIMA

Workbench

• Eclipse-based tooling for manipulation of language resources.

• Allows customers to easily develop new, or customize existing, language resources thereby modifying the behaviour of the annotators.

• Targets the non-developer – terminologist, taxonomist, domain specialist, …

• Supports several representations of domain knowledge – thesauri, taxonomies, ontologies, …

Page 19:

LanguageWare comprises a number of building blocks which combine together to deliver the capabilities

Runtime

Resources

UIMA Annotators

UIMA

Workbench

Page 20:

LanguageWare Architecture / Processing Model

[Diagram labels]

Customizable Domain Resources: Resources, Rules, Rules & Seed list

LanguageWare Resource Workbench

Annotators: Lexical Analyzer, Language Classifier, POS Tagger, Parser, Semantic Analyzer

UIMA (CAS): Text, Annotations

Software Libraries: Char handling, regex, … (ICU4J); Core NLP (DJTJ); aFSTuTaggerDLTLS

Page 21:

UIMA Pipeline

[Diagram flow] Crawled Documents → Collection Reader → CAS → Annotator → CAS → Annotator → CAS → Annotator → CAS → CAS Consumer → DB, Index

Annotators:

• Language Identification

• Document Classification

• Lexical Analyzer

• POS Tagger

• Parser

• Semantic Analyzer

• Named-Entity Extraction

• Relationship Extraction
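The flow on this slide (a collection reader produces a CAS per document, annotators add annotations in sequence, and a CAS consumer writes to an index) can be sketched in a few lines. This is a simplified Python analogy, not the UIMA API, which is Java. The `Cas` class, the two toy annotators, and all function names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Cas:
    """Simplified stand-in for a UIMA CAS: the text plus its annotations."""
    text: str
    annotations: list = field(default_factory=list)


def collection_reader(documents):
    """Wrap each raw document in a fresh CAS."""
    for doc in documents:
        yield Cas(text=doc)


def language_annotator(cas):
    # Toy heuristic, for illustration only.
    lang = "en" if " the " in f" {cas.text.lower()} " else "unknown"
    cas.annotations.append(("Language", 0, len(cas.text), lang))


def token_annotator(cas):
    """Add one Token annotation (type, begin, end, value) per whitespace token."""
    pos = 0
    for token in cas.text.split():
        begin = cas.text.index(token, pos)
        pos = begin + len(token)
        cas.annotations.append(("Token", begin, pos, token))


def index_consumer(cases):
    """Build an inverted index: token -> set of document numbers."""
    index = {}
    for doc_id, cas in enumerate(cases):
        for kind, _begin, _end, value in cas.annotations:
            if kind == "Token":
                index.setdefault(value.lower(), set()).add(doc_id)
    return index


def run_pipeline(documents, annotators):
    """Reader -> annotators in order -> consumer."""
    processed = []
    for cas in collection_reader(documents):
        for annotate in annotators:
            annotate(cas)
        processed.append(cas)
    return index_consumer(processed)
```

Each annotator only reads and writes the shared `Cas`, so annotators can be added, removed, or reordered by changing the list passed to `run_pipeline`, which is the plug-in property the slides attribute to UIMA annotators.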

Page 22:

LanguageWare Resource Workbench

[Diagram labels]

Annotator: Descriptor, Class(es), language resource(s)

UIMA-compliant application

Any Application

XML: Taxonomy definition, dictionary data, pointers to training data, rules, …

XML: Updated taxonomy definition, dictionary data, analysis results, change reports, recommendations, …

Common Building Blocks, i.e. Template Annotators

Machine Learning and Results Visualization, Manipulation, and Verification

Review and modify annotation results

Measurements & Statistics

Develop rules, regular expressions, templates, …

Develop / import domain resources, rules, models, …

Search Engine Interface

Data collection, and analysis verification & test

Generate Reports

Analysis types: Lexical Analysis, Named Entity Recognition, POS Tagging, Document Classification, Parsing, Semantic Analysis & Disambiguation, Relationship Extraction

Lifecycle steps: Deploy, Validate, Analyse, Create/Modify, Build

Page 23:

Sample Processing Model

Lexical Analyzer: To spot concept mentions

Language Resources: Layered lexico-semantic resources – linking morphology, lexical entries, concepts & relationships

Concept Navigation: API for navigation through the network of concepts

[Diagram labels] Text, Concepts, Navigation

Application: Spot term mentions, analyze distribution of concepts, disambiguate, find focus, …

Page 24:

Contacts

LanguageWare External Download on Alphaworks http://www.alphaworks.ibm.com/tech/lrw

LanguageWare Wikipedia http://en.wikipedia.org/wiki/Languageware

LanguageWare Senior Research & Development [email protected]