CI Information Hub - SAP
Post on 17-Jun-2018
Copyright © 2008 by Iknow LLC. This document may not be copied, modified, reproduced, republished, transmitted, posted, or distributed in any form without prior written permission from Iknow. All rights not expressly granted herein are reserved. Unauthorized use of this material may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.
Iknow is a registered trademark of Iknow LLC. This trademark may not be used in any manner without prior written consent from Iknow LLC. All other company and product names may be trademarks of their respective companies.
Iknow LLC
100 Overlook Center, 2nd Floor
Princeton, New Jersey 08540-7814
T: (609) 419-0500
F: (609) 419-0715
www.iknow.us.com
CI Information Hub
Incorporating Text Analysis into Business and Competitive Intelligence
March 12, 2008
Pharmaceutical Industry – Today’s Context
The pharmaceutical business is unlike any other. Our goal is not entertainment, enjoyment or prosperity. It is the health of patients. PhRMA member companies are devoted to applying biomedical innovation to create new medicines that will enhance or save the lives of patients around the world.
Being in the healthcare business brings awesome responsibilities. Every day, our member companies face difficult, fundamental questions. The answers to those questions profoundly affect patients’ lives. Which diseases should we study? How can we best advance research? Where is the balance between risk and benefit?
William C. Weldon, Immediate Past Chairman
Pharmaceutical Research and Manufacturers of America (PhRMA)
Four Categories of Information
[Figure: two-by-two matrix. Columns: Structured, Unstructured. Rows: Lists (Raw), Analysis.]
Our focus today is on unstructured information
Some Examples of Corporate Knowledge Assets
Today’s Agenda
CI Information Hub
I. Competitive Intelligence (CI) Business Process
• Information Discovery
• Analysis and Synthesis
• Delivery
II. Text Analysis
• Entity Extraction
• Taxonomy
• Categorization / Clustering
• Summarization
III. CI Information Hub Demonstration
• Text Analysis
• Search
Summary
Q&A
Company confidential materials. Do not copy or distribute.
About Iknow
Legal Name: Iknow LLC
Corporate Headquarters: Princeton, New Jersey, USA
Year Founded: 2001
Description: Business management and technology consulting firm focused in the knowledge management (KM) domain.
Mission: Iknow helps companies leverage and transform their intellectual assets into sustainable “knowledge-based” competitive advantage.
Expertise: Document management, content management, business process management (BPM) and workflow automation, corporate and enterprise portals, business and competitive intelligence, distance learning/e-Learning, collaboration and groupware, digital asset management, text analysis, taxonomy and metadata management, search, and other new and emerging information technologies.
Iknow’s Technology Expertise
Knowledge Value Chain
Source: The Knowledge Agency © 2007
Competitive Intelligence Business Process
Section I
Stage 1: Discovery
Relevant content is identified, collected, and converted into “documents”. The document’s metadata is generated and the document is categorized and indexed.
Stage 2: Analysis and Synthesis
Input documents are analyzed and synthesized and a CI report is created. This stage typically involves:
• Analysis tools
• Collaboration
• Workflow routing for review and approval
• Publishing the final CI report
Stage 3: Delivery
CI reports and other filtered content are made available to end-users through both “push” and “pull” delivery approaches.
• “Push” approaches deliver information through content-based notification.
• “Pull” approaches involve user-directed search.
Source Identification and CI Report Template Creation
Key Intelligence Topics
Source Analysis
Information Sources
• Internet websites, including blogs and wikis
• Corporate intranet, including shared drives, eRooms, and other collaboration
• News feeds
• Subscription services
• E-mails
• Internal reports
• Purchased (external) reports
• Internal databases
• Purchased (external) databases
• Other field intelligence (e.g., a valuable comment from a supplier)
CI Report Analysis
CI Report Templates
Stage 1: Information Discovery
Information Sources
Document Collection, Extraction and Conversion
Document Preprocessing
Automatic Tasks:
• Generate and/or extract document metadata
• Categorize the input document using one or more approaches
Manual Tasks:
• Remove duplicate or non-relevant documents
• Fill in missing metadata and taxonomic entries
• Edit incorrect entries
Validated Input Documents
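The preprocessing tasks in Stage 1 can be illustrated with a minimal Python sketch. It is not the demonstrated product's behavior: the InputDocument class, the metadata fields, and the sample documents are all hypothetical. The sketch derives simple metadata automatically and flags exact duplicates by fingerprint, which could support (not replace) the manual review step.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class InputDocument:
    source: str  # hypothetical source label, e.g. "news feed"
    text: str
    metadata: dict = field(default_factory=dict)


def generate_metadata(doc: InputDocument) -> InputDocument:
    """Automatic task: derive simple metadata from the document text."""
    lines = [line.strip() for line in doc.text.splitlines() if line.strip()]
    normalized = " ".join(doc.text.lower().split())
    doc.metadata = {
        "title": lines[0] if lines else "",
        "word_count": len(doc.text.split()),
        "source": doc.source,
        # Fingerprint of the case- and whitespace-normalized text.
        "fingerprint": hashlib.sha256(normalized.encode("utf-8")).hexdigest(),
    }
    return doc


def remove_duplicates(docs: list[InputDocument]) -> list[InputDocument]:
    """Keep the first document seen for each fingerprint."""
    seen: set[str] = set()
    unique = []
    for doc in docs:
        fingerprint = doc.metadata["fingerprint"]
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(doc)
    return unique


# Made-up sample documents; the second repeats the first with different
# capitalization and spacing.
docs = [
    InputDocument("news feed", "Selenium study published\nHigher levels were linked to lower risk."),
    InputDocument("website", "selenium study published\nHigher levels  were linked to lower risk."),
    InputDocument("internal report", "Quarterly pipeline review\nThree compounds advanced."),
]
validated = remove_duplicates([generate_metadata(d) for d in docs])
print([d.metadata["title"] for d in validated])
```

Near-duplicates (for example, the same story reworded by two news services) would not share a fingerprint, so they would still need human review or a fuzzier comparison.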
Example of Document Metadata
Stage 2: Information Analysis and Synthesis
Validated Input Documents
CI Report Templates
Manual Tasks
• Access content from a variety of input documents
• Analyze content
• Synthesize content
• Produce draft reports
• Route reports for review and approval
• Perform review
• Load approved reports into document repository
Support Tools
• Search
• Analysis
• Charting
• Graphing
• Mapping
• Visualization
• Structuring
• Workflow
Approved and Published CI Reports (Output Document)
Stage 3: Information Delivery
Validated Input Documents
Approved and Published CI Reports
Content Management System
Notification and Routing: “Push” delivery based on an individualized personal profile.
• Customer preferences
• Personalization
• E-mail campaign management
Search: “Pull” delivery based on queries.
• Knowledge repositories
• Databases
• Indexes
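The difference between the two delivery approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Report and Profile classes, the topic labels, and the sample data are hypothetical, and a real deployment would rely on the content management and search products rather than in-memory lists.

```python
from dataclasses import dataclass


@dataclass
class Report:
    title: str
    topics: set[str]
    body: str


@dataclass
class Profile:
    user: str
    interests: set[str]


def push_notifications(report: Report, profiles: list[Profile]) -> list[str]:
    """Push delivery: users whose profile shares a topic with the report."""
    return [p.user for p in profiles if p.interests & report.topics]


def pull_search(query: str, reports: list[Report]) -> list[str]:
    """Pull delivery: titles of reports that contain every query term."""
    terms = query.lower().split()
    hits = []
    for report in reports:
        haystack = f"{report.title} {report.body}".lower()
        if all(term in haystack for term in terms):
            hits.append(report.title)
    return hits


# Made-up reports and user profiles.
reports = [
    Report("Selenium trial update", {"oncology", "clinical trials"},
           "New selenium study results were published."),
    Report("Supplier pricing review", {"purchasing"},
           "Raw material prices rose."),
]
profiles = [
    Profile("analyst_a", {"oncology"}),
    Profile("buyer_b", {"purchasing"}),
    Profile("exec_c", {"finance"}),
]

print(push_notifications(reports[0], profiles))
print(pull_search("selenium study", reports))
```

In the push case the system decides who receives a report when it is published; in the pull case the user decides what to look for and when.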
Enabling Technologies
Content Management
Workflow Automation
Portal / Dashboard
Collaboration
Search
Text Analysis
Analytical Tools
Stage 1: Discovery   Stage 2: Analysis and Synthesis   Stage 3: Delivery
X X X
X X X
X X X
X X X
X
X X
X X
Intelligence Analysts End Users
Text Analysis
Section II
Keyword-Based Technologies
Keyword-based technologies have no understanding of the real content of the documents.
For example, this paragraph:
“The study, of nearly 14,000 U.S. adults, found that higher blood levels of selenium were linked to a lower risk of death over 12 years, at which point the risk appears to increase. The findings, published in the Archives of Internal Medicine, support earlier studies linking selenium to lower risks of prostate, lung and colon cancers.”
And this paragraph:
“12 14,000 a adults and appears Archives at blood cancers colon death earlier findings, found higher in increase Internal levels linked linking lower lower lung Medicine nearly of of of of of over point prostate published risk risk risks selenium selenium studies study support that The the The the to to to U.S. were which years”
Are considered to be the SAME!!
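The point can be reproduced with a minimal Python sketch of a bag-of-words comparison. It is an illustration only: the tokenizer is a simplifying assumption, and commercial keyword engines are more elaborate, but the loss of word order is the same.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphanumeric tokens, ignoring order and punctuation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


original = (
    "The study, of nearly 14,000 U.S. adults, found that higher blood levels "
    "of selenium were linked to a lower risk of death over 12 years."
)
# Scramble the sentence by sorting its tokens alphabetically.
scrambled = " ".join(sorted(re.findall(r"[a-z0-9]+", original.lower())))

same_bag = bag_of_words(original) == bag_of_words(scrambled)
print(scrambled)
print(same_bag)  # True: a pure keyword view cannot tell the two apart
```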
Text Analysis – Four Concepts
1. Entity Extraction. Entity extraction (also known as Named Entity Recognition (NER) and Entity Identification (EI)) seeks to identify and classify atomic elements in the text, called “entities”, into predefined categories.
Examples of Common Entity Types
• Who: People, Positions, Social Security Numbers
• What: Companies, Organizations, Financial Indexes, Products (software, weapons, vehicles…)
• When: Dates, Days, Holidays, Months, Years, Times, Time Periods
• Where: Addresses, Cities, States, Countries, Facilities (stadiums, plants),Internet Addresses, Phone Numbers
• How Much: Currencies, Measures, Percentages
• Concepts (Global piracy, unstructured data…)
• Relations & Events (Person-Organization, Travel, M&A…)
• Categories (business, terrorism…)
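A toy pattern-based extractor shows the idea for three of the simpler entity types listed above (years, currencies, percentages). It is a hedged sketch, not how any commercial product works: the regular expressions and the sample sentence are assumptions chosen for illustration, and entity types such as people or companies generally need lexicons or statistical models rather than patterns alone.

```python
import re

# Illustrative patterns only; real extractors combine lexicons and
# language analysis with patterns like these.
PATTERNS = {
    "Year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "Currency": re.compile(r"\$\d[\d,]*(?:\.\d+)?"),
    "Percentage": re.compile(r"\d+(?:\.\d+)?%"),
}


def extract_entities(text: str) -> list[tuple[str, str]]:
    """Return (entity type, matched text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    return [(label, value) for _, label, value in sorted(found)]


# Made-up sample sentence.
sample = "In 2008 the company spent $1,200,000 on research, up 15% from 2007."
for label, value in extract_entities(sample):
    print(f"{label}: {value}")
```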
Text Analysis – Four Concepts (continued)
2. Taxonomy. A taxonomy is a subject-based classification that arranges the terms in a controlled vocabulary into a hierarchy. The value of a taxonomy is that it allows related terms to be grouped together and categorized in ways that make it easier to find the correct term to use whether for searching or to describe an object.
Example from MeSH
All MeSH Categories
  Diseases Category
    Digestive System Diseases
      Liver Diseases
        Hepatitis
          Hepatitis, Alcoholic
          Hepatitis, Animal
            Hepatitis, Viral, Animal +
          Hepatitis, Chronic
            Hepatitis, Autoimmune
          ……
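One simple way to hold a hierarchy like this MeSH fragment in code is a nested dictionary. The sketch below is an assumption-laden illustration: the nesting follows the MeSH tree as best it can be read from the slide, and the function names are hypothetical. It shows the two lookups a taxonomy typically supports: the broader-term path to a node, and the narrower terms beneath it (useful for expanding a search).

```python
# Nested-dict representation of the MeSH fragment; nesting is assumed.
TAXONOMY = {
    "All MeSH Categories": {
        "Diseases Category": {
            "Digestive System Diseases": {
                "Liver Diseases": {
                    "Hepatitis": {
                        "Hepatitis, Alcoholic": {},
                        "Hepatitis, Animal": {"Hepatitis, Viral, Animal": {}},
                        "Hepatitis, Chronic": {"Hepatitis, Autoimmune": {}},
                    }
                }
            }
        }
    }
}


def find_path(tree: dict, term: str, trail: tuple = ()):
    """Return the list of terms from the root down to `term`, or None."""
    for name, children in tree.items():
        path = trail + (name,)
        if name == term:
            return list(path)
        found = find_path(children, term, path)
        if found:
            return found
    return None


def subtree(tree: dict, term: str):
    """Return the children dict of `term`, or None if the term is absent."""
    for name, children in tree.items():
        if name == term:
            return children
        found = subtree(children, term)
        if found is not None:
            return found
    return None


def narrower_terms(tree: dict, term: str) -> list[str]:
    """Return every term below `term` in depth-first order."""
    def flatten(node: dict) -> list[str]:
        names = []
        for name, children in node.items():
            names.append(name)
            names.extend(flatten(children))
        return names

    node = subtree(tree, term)
    return flatten(node) if node is not None else []


print(" > ".join(find_path(TAXONOMY, "Hepatitis, Autoimmune")))
print(narrower_terms(TAXONOMY, "Hepatitis"))
```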
Text Analysis – Four Concepts (continued)
3. Categorization/Clustering. Automatic categorization is the process in which ideas and objects are recognized, differentiated and grouped into categories by a computer program. Ideally, a category illuminates a relationship between the subjects and objects.
4. Summarization. Automatic summarization is the creation of a shortened version of a text by a computer program. The output of this process contains the most important points of the original text. Summarization systems are able to create both query relevant text summaries and generic machine-generated summaries.
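A generic (not query-relevant) summary can be approximated with a very small extractive summarizer: score each sentence by how frequent its words are across the text and keep the top-scoring sentences. The sketch below is one classic baseline, not the method used by any product named in this presentation; the stopword list and sample text are assumptions, and summing scores favors longer sentences.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "and", "in", "of", "the", "to", "was", "were"}


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence: str) -> list[str]:
    """Lowercase alphabetic words with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize(text: str, max_sentences: int = 1) -> list[str]:
    """Return the highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    frequency = Counter(w for s in sentences for w in content_words(s))
    scores = [sum(frequency[w] for w in content_words(s)) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]


# Made-up sample text.
text = (
    "Selenium was linked to lower risk. The weather was nice. "
    "Selenium studies support lower cancer risk."
)
print(summarize(text, max_sentences=1))
```

A query-relevant variant could weight words that appear in the user's query more heavily than words that are merely frequent.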
Search Results Page with Text Analysis
Dynamic Summarization and Top Mentions
Entity Extraction and Clustering
CI Information Hub Demonstration
Today, we will demonstrate how a few steps in the Discovery Stage can be automated.
Software Products
• Business Objects Text Analysis
• Raritan Technologies (search integration framework)
• Autonomy K2 (enterprise search)
Content Sources
• Factiva
• Websites
• RSS Feeds
Section III
Business Objects Text Analysis
• Uses sophisticated natural language processing. Combines lexicons with pattern recognition; based on deep understanding of language.
• Reads text documents in more than 220 file formats and in more than 30 major languages.
• Entity extraction analyzes the full text of documents, clustering results by people, places, organizations, concepts, and more. More than 35 pre-defined (out-of-the-box) entity types are available.
• Customize the system to cluster by industry-standard or company-specific taxonomies.
• Workbench to create and test custom entities, relations, events, and taxonomies (using a hybrid learn-by-example and rules-based approach).
• Useful for answering complex questions, e.g.,
What companies are mentioned in conjunction with mine?
What relevant M&A activities have occurred in the last week?
What concepts are most commonly associated with my company in these news articles?
What issues are my customers complaining about?
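Once entities have been extracted, a question such as "What companies are mentioned in conjunction with mine?" reduces to counting co-occurrences. The Python sketch below assumes each document has already been turned into a set of company names by some extraction step; the company names and the co_mentions function are hypothetical and do not represent the Business Objects API.

```python
from collections import Counter


def co_mentions(documents: list[set[str]], company: str) -> list[tuple[str, int]]:
    """Count how often other companies appear in documents that mention `company`."""
    counts: Counter = Counter()
    for entities in documents:
        if company in entities:
            counts.update(entities - {company})
    return counts.most_common()


# Each set holds the company entities extracted from one document (made-up names).
documents = [
    {"Alpha Pharma", "Beta Biotech"},
    {"Alpha Pharma", "Beta Biotech", "Gamma Labs"},
    {"Gamma Labs"},
    {"Alpha Pharma"},
]
print(co_mentions(documents, "Alpha Pharma"))
```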
Raritan Technologies Search Integration Framework
Raritan’s Search Integration Framework is a software toolkit that is used to quickly develop feature-rich search applications.
• Reusable software components
• Pre-built connectors and adapters
Benefits from using the Framework are:
• Easily integrate software products from many vendors
• Easily build customer-specific search applications
• Easily connect to hundreds of data sources, data bases, and applications
Taxonomies
In today’s demonstration, we are using selected sections from the following four taxonomies.
Medical Taxonomies
ClinicalTrials.gov. Diseases taxonomy - 23 major classifications of diseases and conditions. Diseases hierarchy contains over 4,000 nodes and 100,000 rules.
National Library of Medicine’s Medical Subject Headings (MeSH). Comprehensive medical taxonomy contains over 300,000 nodes and rules.
Business Taxonomies
Library of Congress Subject Headings (LCSH). Class H (Social Sciences section) provides Business, Finance, Law and Sales and Marketing taxonomies.
Factiva Company Taxonomy (for selected pharmaceutical companies).
Note: Custom taxonomies can also be developed to match your company’s unique businesses (e.g., molecules, drug names, product/brand names).
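The pairing of taxonomy nodes with rules can be pictured with a minimal rule-based categorizer. This is a sketch under stated assumptions: the keyword rules below are invented for illustration and are far simpler than the rule sets described above, and substring matching is used only to keep the example short.

```python
# Hypothetical keyword rules attached to taxonomy nodes.
RULES = {
    "Liver Diseases": ["hepatitis", "cirrhosis", "liver"],
    "Finance": ["revenue", "earnings", "merger"],
}


def categorize(text: str, rules: dict[str, list[str]] = RULES) -> dict[str, list[str]]:
    """Return each matching category with the keywords that triggered it."""
    lowered = text.lower()
    matched = {}
    for category, keywords in rules.items():
        hits = [keyword for keyword in keywords if keyword in lowered]
        if hits:
            matched[category] = hits
    return matched


# Made-up sample sentence.
sample = "The company reported higher revenue after its hepatitis drug launch."
print(categorize(sample))
```

Returning the triggering keywords alongside each category makes it easier for an analyst to see why a document was filed where it was, and to correct the rules when it was filed wrongly.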
Solution Highlights
• Advanced search and browse capabilities.
• Continuous update of information from selected web sites and online news sources.
• Data automatically classified by subject area.
• Document summaries that show you the "top things" contained within a full text search result.
• Ability to save queries, set alerts, and to share queries within a work group.
• Advanced highlighting provides rapid insights into document relevance.
• Graphical reporting tools to provide instant visualization of key information trends.
• Role-based views focus on information relevant to a particular job function.
Other Possible Applications – Selected Examples
R&D: Competitive Analysis
Marketing: Sentiment Analysis, Buzz Tracking, Customer Intelligence, Competitive Analysis
Call Center: Common Cause Diagnosis, Regional Diagnosis
Manufacturing: Contract Analysis, Six Sigma Compliance, Warranty Claims Analysis
Purchasing: Supplier Analysis, Bid Tracking and Analysis
Finance: Regulatory Compliance, Fraud Detection, Insurance Claims Analysis
Legal: e-Discovery
Enterprise: Data Fusion
Summary
Today, we described an information hub for competitive intelligence.
• Reviewed the competitive intelligence business process
• Discussed how a variety of commercial-grade software products can be integrated together for intelligence applications.
• Showed how a variety of source content can be processed using text analysis software.
Benefits of this solution include:
Quality
• Incorporates “all” corporate information – multiple sources; multiple formats
• Text analytics helps make sense of what you're looking at
• Higher quality outputs and better business decision making
Time
• Shorten the cycle time of finding and analyzing information
Cost
• Significantly lower cost than manual processing
Contact Information
Raritan Technologies
Mr. Barry Freindlich
President
(908) 668-8181 x10
barryf@raritantechnologies.com
www.raritantechnologies.com

Iknow
Dr. Bernard L. Palowitch, Jr.
President
(609) 419-0500
bpalowitch@iknow.us.com
www.iknow.us.com