Iknow is a registered trademark of Iknow LLC. This trademark may not be used in any manner without prior written consent from Iknow LLC. All other company and product names may be trademarks of their respective companies.
Iknow LLC
100 Overlook Center, 2nd Floor
Princeton, New Jersey 08540-7814
T: (609) 419-0500
F: (609) 419-0715
www.iknow.us.com
CI Information Hub
Incorporating Text Analysis into Business and Competitive Intelligence
The pharmaceutical business is unlike any other. Our goal is not entertainment, enjoyment or prosperity. It is the health of patients. PhRMA member companies are devoted to applying biomedical innovation to create new medicines that will enhance or save the lives of patients around the world.
Being in the healthcare business brings awesome responsibilities. Every day, our member companies face difficult, fundamental questions. The answers to those questions profoundly affect patients’ lives. Which diseases should we study? How can we best advance research? Where is the balance between risk and benefit?
William C. Weldon, Immediate Past Chairman
Pharmaceutical Research and Manufacturers of America (PhRMA)
Document management, content management, business process management (BPM) and workflow automation, corporate and enterprise portals, business and competitive intelligence, distance learning/e-Learning, collaboration and groupware, digital asset management, text analysis, taxonomy and metadata management, search, and other new and emerging information technologies.
Iknow helps companies leverage and transform their intellectual assets into sustainable “knowledge-based” competitive advantage.
Business management and technology consulting firm focused in the knowledge management (KM) domain.
Keyword-based technologies have no understanding of the real content of the documents.
For example, this paragraph:
“The study, of nearly 14,000 U.S. adults, found that higher blood levels of selenium were linked to a lower risk of death over 12 years, at which point the risk appears to increase. The findings, published in the Archives of Internal Medicine, support earlier studies linking selenium to lower risks of prostate, lung and colon cancers.”
And this paragraph:
“12 14,000 a adults and appears Archives at blood cancers colon death earlier findings, found higher in increase Internal levels linked linking lower lower lung Medicine nearly of of of of of over point prostate published risk risk risks selenium selenium studies study support that The the The the to to to U.S. were which years”
contain exactly the same words, so a purely keyword-based technology cannot tell them apart.
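A minimal Python sketch of this point, assuming a simple bag-of-words index. The tokenizer and the `bag_of_words` name are illustrative and not taken from any particular search product. Sorting the words of a sentence destroys its meaning but leaves its word counts unchanged:

```python
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count each alphanumeric token, ignoring order."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

original = (
    "The study, of nearly 14,000 U.S. adults, found that higher blood levels "
    "of selenium were linked to a lower risk of death over 12 years."
)

# Sort the words alphabetically: the vocabulary is kept, the meaning is lost.
scrambled = " ".join(sorted(original.split(), key=str.lower))

# A keyword index only sees the word counts, which are identical.
same_bag = bag_of_words(original) == bag_of_words(scrambled)

print(scrambled)
print("Identical to a keyword index:", same_bag)
```

Any query made only of keywords would match both versions equally, which is the gap that text analysis is meant to close.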
1. Entity Extraction. Entity extraction (also known as Named Entity Recognition (NER) and Entity Identification (EI)) seeks to identify and classify atomic elements in the text, called “entities”, into predefined categories.
2. Taxonomy. A taxonomy is a subject-based classification that arranges the terms in a controlled vocabulary into a hierarchy. The value of a taxonomy is that it allows related terms to be grouped together and categorized in ways that make it easier to find the correct term to use whether for searching or to describe an object.
3. Categorization/Clustering. Automatic categorization is the process in which ideas and objects are recognized, differentiated and grouped into categories by a computer program. Ideally, a category illuminates a relationship between the subjects and objects.
4. Summarization. Automatic summarization is the creation of a shortened version of a text by a computer program. The output of this process contains the most important points of the original text. Summarization systems are able to create both query relevant text summaries and generic machine-generated summaries.
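To make the last technique concrete, here is a rough sketch of a generic (not query-relevant) extractive summarizer. It is only one simple approach, scoring sentences by word frequency, and is not a description of how any commercial product works. The stopword list, the `summarize` function, and the sample text are all assumptions made for illustration:

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "of", "a", "to", "in", "and", "that", "were", "was", "at", "is", "it"}

def content_words(text: str) -> list:
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def summarize(text: str, max_sentences: int = 1) -> str:
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)

sample = (
    "Selenium levels were measured in adults. "
    "Higher selenium levels were linked to lower risk. "
    "The weather was mild."
)
print(summarize(sample, max_sentences=2))
```

The naive sentence splitter breaks on abbreviations such as "U.S.", so production systems rely on proper sentence segmentation.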
• Uses sophisticated natural language processing. Combines lexicons with pattern recognition; based on deep understanding of language.
• Reads text documents in more than 220 file formats and in more than 30 major languages.
• Entity extraction analyzes the full text of documents, clustering results by people, places, organizations, concepts, and more. More than 35 pre-defined (out-of-the-box) entity types are available.
• Customize the system to cluster by industry-standard or company-specific taxonomies.
• Workbench to create and test custom entities, relations, events, and taxonomies (using a hybrid learn-by-example and rules-based approach).
• Useful for answering complex questions, e.g.,
What companies are mentioned in conjunction with mine?
What relevant M&A activities have occurred in the last week?
What concepts are most commonly associated with my company in these news articles?
What issues are my customers complaining about?
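As a hedged illustration of the first question above, the sketch below counts which companies appear in the same article as a given company. It assumes a tiny lexicon lookup in place of real entity extraction. The company names, the `COMPANY_LEXICON` list, and both function names are hypothetical:

```python
from collections import Counter

# Hypothetical lexicon of company names. A real entity extractor would combine
# a much larger dictionary with pattern rules and linguistic context.
COMPANY_LEXICON = ["Acme Pharma", "Globex", "Initech"]

def companies_in(text: str) -> set:
    """Return the lexicon entries that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return {name for name in COMPANY_LEXICON if name.lower() in lowered}

def co_mentions(documents: list, my_company: str) -> Counter:
    """Count, per company, the documents that mention it alongside my_company."""
    counts = Counter()
    for doc in documents:
        found = companies_in(doc)
        if my_company in found:
            counts.update(found - {my_company})
    return counts

articles = [
    "Acme Pharma and Globex announced a joint research agreement.",
    "Globex reported quarterly earnings.",
    "Analysts compared Acme Pharma with Initech and Globex.",
]

result = co_mentions(articles, "Acme Pharma")
print(result.most_common())
```

The second article is ignored because it does not mention the company of interest, so only genuine co-mentions are counted.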
Raritan Technologies Search Integration Framework
Raritan’s Search Integration Framework is a software toolkit that is used to quickly develop feature-rich search applications.
• Reusable software components
• Pre-built connectors and adapters
Benefits from using the Framework are:
• Easily integrate software products from many vendors
In today’s demonstration, we are using selected sections from the following four taxonomies.
Medical Taxonomies
ClinicalTrials.gov. Diseases taxonomy - 23 major classifications of diseases and conditions. Diseases hierarchy contains over 4,000 nodes and 100,000 rules.
National Library of Medicine’s Medical Subject Headings (MeSH). Comprehensive medical taxonomy contains over 300,000 nodes and rules.
Business Taxonomies
Library of Congress Subject Headings (LCSH). Class H (Social Sciences section) provides Business, Finance, Law and Sales and Marketing taxonomies.
Factiva Company Taxonomy (for selected pharmaceutical companies).
Note: Custom taxonomies can also be developed to match your company’s unique businesses (e.g., molecules, drug names, product/brand names).
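The relationship between taxonomy nodes and rules can be sketched as follows. This is a toy illustration only: the three-node `TAXONOMY`, its trigger terms, and the `categorize` function are assumptions, and the real taxonomies listed above are far larger and use richer rules than substring matching.

```python
# Hypothetical taxonomy: each node is a path in the hierarchy, and its "rules"
# are simply the terms that assign a document to that node.
TAXONOMY = {
    ("Diseases", "Neoplasms"): ["cancer", "tumor", "carcinoma"],
    ("Diseases", "Cardiovascular Diseases"): ["heart attack", "hypertension"],
    ("Business", "Mergers and Acquisitions"): ["merger", "acquisition"],
}

def categorize(text: str) -> list:
    """Return the taxonomy paths whose terms appear in the text."""
    lowered = text.lower()
    matches = []
    for path, terms in TAXONOMY.items():
        # Plain substring matching; real rules would respect word boundaries.
        if any(term in lowered for term in terms):
            matches.append(" > ".join(path))
    return matches

headline = "The acquisition adds a late-stage colon cancer drug to the pipeline."
print(categorize(headline))
```

A single document can land in both a medical and a business category, which is what lets one set of articles be browsed through several taxonomies at once.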