
Copyright © 2008 by Iknow LLC. This document may not be copied, modified, reproduced, republished, transmitted, posted, or distributed in any form without prior written permission from Iknow. All rights not expressly granted herein are reserved. Unauthorized use of this material may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.

Iknow is a registered trademark of Iknow LLC. This trademark may not be used in any manner without prior written consent from Iknow LLC. All other company and product names may be trademarks of their respective companies.

Iknow LLC
100 Overlook Center, 2nd Floor
Princeton, New Jersey 08540-7814

T: (609) 419-0500
F: (609) 419-0715

www.iknow.us.com

CI Information Hub

Incorporating Text Analysis into Business and Competitive Intelligence

March 12, 2008

Pharmaceutical Industry – Today’s Context

The pharmaceutical business is unlike any other. Our goal is not entertainment, enjoyment or prosperity. It is the health of patients. PhRMA member companies are devoted to applying biomedical innovation to create new medicines that will enhance or save the lives of patients around the world.

Being in the healthcare business brings awesome responsibilities. Every day, our member companies face difficult, fundamental questions. The answers to those questions profoundly affect patients’ lives. Which diseases should we study? How can we best advance research? Where is the balance between risk and benefit?

William C. Weldon, Immediate Past Chairman
Pharmaceutical Research and Manufacturers of America (PhRMA)


Four Categories of Information

[Figure: 2×2 matrix with columns Structured and Unstructured, and rows Lists (Raw) and Analysis]

Our focus today is on unstructured information


Some Examples of Corporate Knowledge Assets


Today’s Agenda

CI Information Hub

I. Competitive Intelligence (CI) Business Process
• Information Discovery
• Analysis and Synthesis
• Delivery

II. Text Analysis
• Entity Extraction
• Taxonomy
• Categorization / Clustering
• Summarization

III. CI Information Hub Demonstration
• Text Analysis
• Search

Summary

Q&A


About Iknow

Legal Name: Iknow LLC

Corporate Headquarters: Princeton, New Jersey, USA

Year Founded: 2001

Description: Business management and technology consulting firm focused in the knowledge management (KM) domain.

Mission: Iknow helps companies leverage and transform their intellectual assets into sustainable “knowledge-based” competitive advantage.

Expertise: Document management, content management, business process management (BPM) and workflow automation, corporate and enterprise portals, business and competitive intelligence, distance learning/e-Learning, collaboration and groupware, digital asset management, text analysis, taxonomy and metadata management, search, and other new and emerging information technologies.

Iknow’s Technology Expertise


Knowledge Value Chain
Source: The Knowledge Agency © 2007


Competitive Intelligence Business Process

Stage 1: Discovery

Relevant content is identified, collected, and converted into “documents”.

The document’s metadata is generated and the document is categorized and indexed.

Stage 2: Analysis and Synthesis

Input documents are analyzed and synthesized and a CI report is created.

This stage typically involves:
• Analysis tools
• Collaboration
• Workflow routing for review and approval
• Publishing the final CI report

Stage 3: Delivery

CI reports and other filtered content are made available to end users through both “push” and “pull” delivery approaches.
• “Push” approaches deliver information through content-based notification.
• “Pull” approaches involve user-directed search.

Section I


Source Identification and CI Report Template Creation

Information Sources
• Internet websites, including blogs and wikis
• Corporate intranet, including shared drives, eRooms, and other collaboration
• News feeds
• Subscription services
• E-mails
• Internal reports
• Purchased (external) reports
• Internal databases
• Purchased (external) databases
• Other field intelligence (e.g., a valuable comment from a supplier)

[Diagram elements: Key Intelligence Topics, Source Analysis, CI Report Analysis, CI Report Templates]

Stage 1: Information Discovery

Flow: Information Sources, then Document Collection, Extraction and Conversion, then Document Preprocessing, producing Validated Input Documents.

Document Preprocessing

Automatic Tasks:
• Generate and/or extract document metadata
• Categorize the input document using one or more approaches

Manual Tasks:
• Remove duplicate or non-relevant documents
• Fill in missing metadata and taxonomic entries
• Edit incorrect entries

Example of Document Metadata
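The preprocessing tasks in Stage 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `InputDocument` record, its field names, and the sample values are hypothetical and are not taken from the CI Information Hub. The sketch removes duplicates by hashing normalized body text and lists the metadata fields an analyst would still need to fill in manually.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class InputDocument:
    """Hypothetical metadata record; the field names are illustrative only."""
    title: str
    source: str
    published: str  # ISO date string; empty when unknown
    body: str
    categories: list = field(default_factory=list)

    def fingerprint(self) -> str:
        # Normalize case and whitespace so trivially different copies match.
        normalized = " ".join(self.body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def remove_duplicates(docs):
    """Keep the first document seen for each content fingerprint."""
    seen, unique = set(), []
    for doc in docs:
        key = doc.fingerprint()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def missing_fields(doc):
    """List metadata fields that still need manual entry."""
    names = ("title", "source", "published", "categories")
    return [name for name in names if not getattr(doc, name)]


# Made-up sample data for illustration.
docs = [
    InputDocument("Selenium study", "News feed", "2008-02-25",
                  "Higher selenium  levels were linked to lower risk."),
    InputDocument("Selenium study (copy)", "Website", "",
                  "higher selenium levels were linked to lower risk."),
]
unique_docs = remove_duplicates(docs)
```

Exact hashing only catches copies that differ in case or spacing; near-duplicate detection (for example, rewritten wire stories) would need a fuzzier comparison.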

Stage 2: Information Analysis and Synthesis

Inputs:
• Validated Input Documents
• CI Report Templates

Output:
• Approved and Published CI Reports (Output Document)

Manual Tasks
• Access content from a variety of input documents
• Analyze content
• Synthesize content
• Produce draft reports
• Route reports for review and approval
• Perform review
• Load approved reports into document repository

Support Tools
• Search
• Analysis
• Charting
• Graphing
• Mapping
• Visualization
• Structuring
• Workflow

Stage 3: Information Delivery

[Diagram elements: Content Management System, Approved and Published CI Reports, Validated Input Documents]

Notification and Routing: “Push” delivery based on an individualized personal profile.
• Customer preferences
• Personalization
• E-mail campaign management

Search: “Pull” delivery based on queries.
• Knowledge repositories
• Databases
• Indexes
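The difference between the two delivery approaches can be shown with a minimal Python sketch. The `Report` and `Profile` classes, the topic labels, and the sample data are assumptions made for illustration; they do not describe the products used in the demonstration. "Push" matches a newly published report against stored user profiles, while "pull" answers a user's query against the report collection.

```python
from dataclasses import dataclass


@dataclass
class Report:
    title: str
    topics: set
    text: str


@dataclass
class Profile:
    user: str
    topics: set  # topics the user asked to be notified about


def push(report, profiles):
    """'Push': users whose profile shares at least one topic with the report."""
    return [p.user for p in profiles if p.topics & report.topics]


def pull(query, reports):
    """'Pull': titles of reports whose text contains every query term."""
    terms = query.lower().split()
    return [r.title for r in reports
            if all(term in r.text.lower() for term in terms)]


# Made-up sample data for illustration.
reports = [
    Report("Oncology pipeline update", {"oncology", "pipeline"},
           "Competitor X advanced a prostate cancer compound."),
    Report("Pricing watch", {"pricing"},
           "Generic pricing pressure increased in Europe."),
]
profiles = [
    Profile("analyst_a", {"oncology"}),
    Profile("manager_b", {"pricing", "m&a"}),
]
```

Here `push(reports[0], profiles)` selects the users to notify when the first report is published, and `pull("prostate cancer", reports)` returns the reports matching a user-directed search.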

Enabling Technologies

[Table: enabling technologies (Content Management, Workflow Automation, Portal / Dashboard, Collaboration, Search, Text Analysis, Analytical Tools) mapped against Stage 1 (Discovery), Stage 2 (Analysis and Synthesis), and Stage 3 (Delivery). Users: Intelligence Analysts, End Users.]

Text Analysis


Section II

Keyword-Based Technologies

Keyword-based technologies have no understanding of the real content of the documents.

For example, this paragraph:

“The study, of nearly 14,000 U.S. adults, found that higher blood levels of selenium were linked to a lower risk of death over 12 years, at which point the risk appears to increase. The findings, published in the Archives of Internal Medicine, support earlier studies linking selenium to lower risks of prostate, lung and colon cancers.”

And this paragraph:

“12 14,000 a adults and appears Archives at blood cancers colon death earlier findings, found higher in increase Internal levels linked linking lower lower lung Medicine nearly of of of of of over point prostate published risk risk risks selenium selenium studies study support that The the The the to to to U.S. were which years”

Are considered to be the SAME!!
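The point can be reproduced with a minimal bag-of-words sketch in Python. The tokenizer below is a deliberately simple assumption, not the behavior of any particular search engine: it lowercases the text and counts words, which discards word order entirely. A sentence and its alphabetically sorted scramble then produce identical representations.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count word tokens; word order is discarded."""
    return Counter(re.findall(r"\w+", text.lower()))


original = "Higher blood levels of selenium were linked to a lower risk of death"

# Scramble the sentence by sorting its words alphabetically.
scrambled = " ".join(sorted(original.split(), key=str.lower))

same_text = original == scrambled
same_bag = bag_of_words(original) == bag_of_words(scrambled)
```

`same_text` is false while `same_bag` is true: a purely keyword-based index cannot tell the readable sentence from the scrambled one.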


Text Analysis – Four Concepts

1. Entity Extraction. Entity extraction (also known as Named Entity Recognition (NER) and Entity Identification (EI)) seeks to identify and classify atomic elements in the text, called “entities”, into predefined categories.

Examples of Common Entity Types

• Who: People, Positions, Social Security Numbers

• What: Companies, Organizations, Financial Indexes, Products (software, weapons, vehicles…)

• When: Dates, Days, Holidays, Months, Years, Times, Time Periods

• Where: Addresses, Cities, States, Countries, Facilities (stadiums, plants), Internet Addresses, Phone Numbers

• How Much: Currencies, Measures, Percentages

• Concepts (Global piracy, unstructured data…)

• Relations & Events (Person-Organization, Travel, M&A…)

• Categories (business, terrorism…)
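As a rough illustration of classifying text spans into predefined categories, the Python sketch below uses three regular expressions for a few of the "When" and "How Much" entity types. The patterns, labels, and sample sentence are assumptions chosen for brevity; commercial extractors combine lexicons with far richer linguistic analysis.

```python
import re

# Illustrative patterns only; real entity types need much broader coverage.
PATTERNS = {
    "YEAR": re.compile(r"\b(?:19|20)\d{2}\b"),
    "PERCENTAGE": re.compile(r"\b\d+(?:\.\d+)?%"),
    "CURRENCY": re.compile(
        r"\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s(?:million|billion))?"
    ),
}


def extract_entities(text):
    """Return (entity_type, matched_text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    return [(label, value) for _, label, value in sorted(found)]


# Made-up sample sentence for illustration.
sample = "In 2007 the company reported revenue of $4.5 billion, up 12% from 2006."
entities = extract_entities(sample)
```

For the sample sentence, `entities` holds the two years, the currency amount, and the percentage, each tagged with its type.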


Text Analysis – Four Concepts (continued)

2. Taxonomy. A taxonomy is a subject-based classification that arranges the terms in a controlled vocabulary into a hierarchy. The value of a taxonomy is that it allows related terms to be grouped together and categorized in ways that make it easier to find the correct term to use whether for searching or to describe an object.

Example from MeSH

All MeSH Categories
  Diseases Category
    Digestive System Diseases
      Liver Diseases
        Hepatitis
          Hepatitis, Alcoholic
          Hepatitis, Animal
            Hepatitis, Viral, Animal +
          Hepatitis, Chronic
            Hepatitis, Autoimmune
            ……
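A hierarchy like this one maps naturally onto a nested dictionary. The Python sketch below encodes an excerpt of the branch shown above and provides two hypothetical helper functions, `path_to` and `narrower_terms`, that are not part of MeSH or of any product API. They show why a hierarchy helps search: a term can be located in context, and a query for a broad term can be expanded to its narrower terms.

```python
# Excerpt of the hierarchy shown above, encoded as nested dictionaries.
TAXONOMY = {
    "Diseases Category": {
        "Digestive System Diseases": {
            "Liver Diseases": {
                "Hepatitis": {
                    "Hepatitis, Alcoholic": {},
                    "Hepatitis, Animal": {},
                    "Hepatitis, Chronic": {},
                }
            }
        }
    }
}


def path_to(term, tree, trail=()):
    """Return the list of terms from the root down to `term`, or None."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == term:
            return list(here)
        found = path_to(term, children, here)
        if found:
            return found
    return None


def subtree(term, tree):
    """Return the children of `term`, or None if the term is absent."""
    for name, children in tree.items():
        if name == term:
            return children
        found = subtree(term, children)
        if found is not None:
            return found
    return None


def narrower_terms(term, tree):
    """Return every descendant of `term` in depth-first order."""
    node = subtree(term, tree)
    if node is None:
        return []
    result = []
    for name in node:
        result.append(name)
        result.extend(narrower_terms(name, node))
    return result


hepatitis_path = path_to("Hepatitis, Chronic", TAXONOMY)
liver_terms = narrower_terms("Liver Diseases", TAXONOMY)
```

A search for "Liver Diseases" could use `liver_terms` to also retrieve documents tagged only with a specific form of hepatitis.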


Text Analysis – Four Concepts (continued)

3. Categorization/Clustering. Automatic categorization is the process in which ideas and objects are recognized, differentiated and grouped into categories by a computer program. Ideally, a category illuminates a relationship between the subjects and objects.

4. Summarization. Automatic summarization is the creation of a shortened version of a text by a computer program. The output of this process contains the most important points of the original text. Summarization systems are able to create both query relevant text summaries and generic machine-generated summaries.
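A generic machine-generated summary can be approximated with a simple extractive approach. The Python sketch below is one common baseline, assumed here for illustration and not the method used by any product named in this presentation: it scores each sentence by the average frequency of its non-stopword terms across the whole text and keeps the top-scoring sentences in their original order. The stopword list and sample text are made up.

```python
import re
from collections import Counter

# Small illustrative stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "of", "to", "and", "in", "is", "was", "were",
             "that", "it", "for"}


def content_words(text):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in keep)


# Made-up sample text for illustration.
text = ("Selenium levels were linked to lower risk. "
        "The weather was pleasant. "
        "Selenium studies support lower cancer risk.")
summary = summarize(text, max_sentences=2)
```

A query-relevant variant could weight terms that appear in the user's query more heavily when scoring each sentence.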


Search Results Page with Text Analysis

Dynamic Summarization and Top Mentions

Entity Extraction and Clustering

CI Information Hub Demonstration

Today, we will demonstrate how a few steps in the Discovery Stage can be automated.

Software Products

• Business Objects Text Analysis

• Raritan Technologies (search integration framework)

• Autonomy K2 (enterprise search)

Content Sources

• Factiva

• Websites

• RSS Feeds


Section III

Business Objects Text Analysis

• Uses sophisticated natural language processing. Combines lexicons with pattern recognition; based on deep understanding of language.

• Reads text documents in more than 220 file formats and in more than 30 major languages.

• Entity extraction analyzes the full text of documents, clustering results by people, places, organizations, concepts, and more. More than 35 pre-defined (out-of-the-box) entity types are available.

• Customize the system to cluster by industry-standard or company-specific taxonomies.

• Workbench to create and test custom entities, relations, events, and taxonomies (using a hybrid learn-by-example and rules-based approach).

• Useful for answering complex questions, e.g.,

What companies are mentioned in conjunction with mine?

What relevant M&A activities have occurred in the last week?

What concepts are most commonly associated with my company in these news articles?

What issues are my customers complaining about?
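Once company entities have been extracted, the first question above reduces to counting co-occurrences. The Python sketch below assumes the list of known company names is already available and uses plain substring matching in place of real entity extraction. The function name, company names, and articles are invented for illustration and do not reflect the Business Objects API.

```python
from collections import Counter


def co_mentions(articles, my_company, known_companies):
    """Count articles that mention each other company alongside `my_company`."""
    counts = Counter()
    for text in articles:
        lowered = text.lower()
        if my_company.lower() not in lowered:
            continue  # only articles mentioning my company are relevant
        for name in known_companies:
            if name != my_company and name.lower() in lowered:
                counts[name] += 1
    return counts.most_common()


# Made-up company names and articles for illustration.
companies = ["Acme Pharma", "Globex", "Initech"]
articles = [
    "Acme Pharma and Globex announced a joint trial.",
    "Globex shares rose after Acme Pharma results.",
    "Initech opened a new plant.",
    "Acme Pharma is in talks with Initech.",
]
ranking = co_mentions(articles, "Acme Pharma", companies)
```

The result ranks the other companies by how many articles mention them together with "Acme Pharma".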

Raritan Technologies Search Integration Framework

Raritan’s Search Integration Framework is a software toolkit that is used to quickly develop feature-rich search applications.

• Reusable software components

• Pre-built connectors and adapters

Benefits from using the Framework are:

• Easily integrate software products from many vendors

• Easily build customer-specific search applications

• Easily connect to hundreds of data sources, databases, and applications


Taxonomies

In today’s demonstration, we are using selected sections from the following four taxonomies.

Medical Taxonomies

ClinicalTrials.gov. Diseases taxonomy - 23 major classifications of diseases and conditions. Diseases hierarchy contains over 4,000 nodes and 100,000 rules.

National Library of Medicine’s Medical Subject Headings (MeSH). Comprehensive medical taxonomy contains over 300,000 nodes and rules.

Business Taxonomies

Library of Congress Subject Headings (LCSH). Class H (Social Sciences section) provides Business, Finance, Law and Sales and Marketing taxonomies.

Factiva Company Taxonomy (for selected pharmaceutical companies).

Note: Custom taxonomies can also be developed to match your company’s unique businesses (e.g., molecules, drug names, product/brand names).
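The pairing of taxonomy nodes with rules can be sketched as a small rule-based categorizer in Python. The node names, trigger terms, and scoring below are hypothetical and far simpler than the rule sets mentioned above: each node carries a list of trigger terms, and a document is assigned to every node whose terms occur at least `min_hits` times.

```python
import re

# Hypothetical node-to-rule mapping; real taxonomies carry far more rules.
RULES = {
    "Liver Diseases": ["hepatitis", "cirrhosis", "liver"],
    "Neoplasms": ["cancer", "tumor", "carcinoma"],
    "Mergers and Acquisitions": ["acquire", "acquisition", "merger"],
}


def categorize(text, rules=RULES, min_hits=1):
    """Return matching taxonomy nodes, strongest match first."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = {}
    for node, terms in rules.items():
        hits = sum(tokens.count(term) for term in terms)
        if hits >= min_hits:
            scores[node] = hits
    # Sort by descending hit count, then alphabetically for stable output.
    return sorted(scores, key=lambda node: (-scores[node], node))


# Made-up sample sentence for illustration.
sample = ("The company will acquire a biotech firm developing a "
          "hepatitis vaccine for liver patients.")
categories = categorize(sample)
```

A custom taxonomy for drug or brand names could be added by extending `RULES` with company-specific nodes and terms.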


Solution Highlights

• Advanced search and browse capabilities.

• Continuous update of information from selected web sites and online news sources.

• Data automatically classified by subject area.

• Document summaries that show you the "top things" contained within a full text search result.

• Ability to save queries, set alerts, and to share queries within a work group.

• Advanced highlighting provides rapid insights into document relevance.

• Graphical reporting tools to provide instant visualization of key information trends.

• Role-based views focus on information relevant to a particular job function.


Other Possible Applications – Selected Examples

R&D: Competitive Analysis

Marketing: Sentiment Analysis, Buzz Tracking, Customer Intelligence, Competitive Analysis

Call Center: Common Cause Diagnosis, Regional Diagnosis

Manufacturing: Contract Analysis, Six Sigma Compliance, Warranty Claims Analysis

Purchasing: Supplier Analysis, Bid Tracking and Analysis

Finance: Regulatory Compliance, Fraud Detection, Insurance Claims Analysis

Legal: e-Discovery

Enterprise: Data Fusion

Summary

Today, we described an information hub for competitive intelligence.

• Reviewed the competitive intelligence business process

• Discussed how a variety of commercial-grade software products can be integrated together for intelligence applications

• Showed how a variety of source content can be processed using text analysis software.

Benefits of this solution include:

Quality

• Incorporates “all” corporate information – multiple sources; multiple formats

• Text analytics helps make sense of what you're looking at

• Higher quality outputs and better business decision making

Time

• Shorten the cycle time of finding and analyzing information

Cost

• Significantly lower cost than manual processing


Contact Information

Raritan Technologies
Mr. Barry Freindlich, President
(908) 668-8181 x10
barryf@raritantechnologies.com
www.raritantechnologies.com

Iknow
Dr. Bernard L. Palowitch, Jr., President
(609) 419-0500
bpalowitch@iknow.us.com
www.iknow.us.com
