• HIGH PERFORMANCE PROCESSING OF DOCUMENT COLLECTIONS
• SERVER BACK-END FOR INTELLIGENT KNOWLEDGE MINING
OCMiner® is OntoChem’s high-performance text analysis and data mining toolbox. It is designed to meet the specific needs of our clients
instead of providing a one-size-fits-all solution. High quality and performance are achieved by straightforward implementation of tailor-made,
modular products for information retrieval and display of medium to very large scale data sources and document collections.
OCMiner® is used by small and large life science companies to automatically index, analyze and search internal or external data collections,
extracting product-related knowledge and supporting the development of novel products by transitive knowledge discovery.
TECHNOLOGY
OCMiner® is a modular processing pipeline for unstructured
information based on the Apache UIMA framework. Custom
data mining is implemented by integrating any number
of toolbox modules into a pipeline that produces
the desired output. The tunable modules include
a broad range of readers, analysis engines and
consumers that can run in parallel on multiprocessor
machines or distributed across several computers.
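The reader / analysis engine / consumer arrangement described above can be illustrated with a minimal Python sketch. It is not the Apache UIMA API (UIMA is a Java framework) and not OCMiner® code. The names Document, string_reader, sentence_engine, make_term_engine and run_pipeline are hypothetical, and the sentence splitting is deliberately naive.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass
class Document:
    """Hypothetical container: text plus (type, start, end) annotations."""
    text: str
    annotations: List[Tuple[str, int, int]] = field(default_factory=list)


def string_reader(texts: Iterable[str]) -> Iterator[Document]:
    """Reader: turns raw input strings into standardized Document objects."""
    for text in texts:
        yield Document(text=text.strip())


def sentence_engine(doc: Document) -> Document:
    """Analysis engine: naive sentence spans ending at '.', '!' or '?'."""
    for match in re.finditer(r"[^.!?]+[.!?]?", doc.text):
        span = match.group()
        if span.strip():
            # Skip leading whitespace so the span starts at the first word.
            lead = len(span) - len(span.lstrip())
            doc.annotations.append(("Sentence", match.start() + lead, match.end()))
    return doc


def make_term_engine(terms: Iterable[str]) -> Callable[[Document], Document]:
    """Analysis engine factory: marks case-insensitive occurrences of terms."""
    term_list = list(terms)

    def engine(doc: Document) -> Document:
        for term in term_list:
            for match in re.finditer(re.escape(term), doc.text, re.IGNORECASE):
                doc.annotations.append(("Term", match.start(), match.end()))
        return doc

    return engine


def run_pipeline(reader: Iterable[Document],
                 engines: Iterable[Callable[[Document], Document]],
                 consumer: Callable[[Document], None]) -> None:
    """Pass every document from the reader through all engines, then consume it."""
    engine_list = list(engines)
    for doc in reader:
        for engine in engine_list:
            doc = engine(doc)
        consumer(doc)


if __name__ == "__main__":
    collected: List[Document] = []
    run_pipeline(
        string_reader(["Phlorizin is a compound. It occurs in apple trees."]),
        [sentence_engine, make_term_engine(["phlorizin"])],
        collected.append,  # consumer: here simply collects the results
    )
    for kind, start, end in collected[0].annotations:
        print(kind, repr(collected[0].text[start:end]))
```

Because each document passes through the engines independently of the others, a design like this could in principle hand documents to several worker processes or machines. The sketch itself runs sequentially.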
Readers read data from a variety of sources and
standardize the input for further analysis:
● Document readers for office documents and many
other file formats
● Extended support for XML and PDF documents
● Database readers allow direct access to relational
databases, ontologies or document management
systems (DMS)
PRODUCT FEATURES
INPUT
● Fast and scalable processing of large content
sources such as file collections or databases
● Office documents and many other file formats with
extended support for XML and PDF documents
MODULES
● Document structure, sentence and language
recognition
● Annotation of named entities
◦ Small or very large controlled vocabularies,
taxonomies, multi-faceted ontologies,
meta-ontologies in any format – e.g. OBO,
OWL, SKOS, CSV, …
◦ Specialized unique ontologies such as chemistry
and proteins or genes
◦ Resolution of abbreviations, acronyms,
homonyms and anaphora
◦ Intelligent treatment of word forms and
special characters
● Relationship extraction using syntax rule based
shallow or deep parsing
OUTPUT
● Annotated content, search results or extracted
knowledge as files or databases
● Browser based search and display interfaces
● Data analysis and graphical representation of
complex relationships
● API - local or web-based
OCMiner® can extract complex relationships, for example between compounds (here phlorizin) and species or diseases. The strength of a relationship is shown by the size of the found concept.
Analysis engines work on the standardized information and
add further data:
● Recognition of document structure such as headlines,
paragraphs and sentences, as well as specific document
section types, for example title, abstract, authors,
keywords, abbreviation lists and references section.
● Dictionary-based named entity (NE) recognition is a
high-performance dictionary look-up technology with
support for very large dictionaries (>100 million entries).
It implements language- and dictionary-dependent
treatment options such as:
◦ Adaptable recognition of spelling variations
- Spaces/hyphens (e.g. “HIV-1” or “HIV1” or “HIV-I”)
- Letters with umlauts or other diacritics
(e.g. “Sjögren’s disease” → “Sjoegren’s disease”)
- British/American English (e.g. “behaviour” → “behavior”)
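One common way to make a dictionary look-up tolerant of such spelling variations is to reduce both dictionary entries and text mentions to the same normalized key. The Python sketch below illustrates that idea only. It is an assumption, not a description of how OCMiner® works internally. The mapping tables, the concept identifiers (C001 to C003) and the function names are hypothetical, and the sketch makes no attempt to scale to very large dictionaries.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny mapping tables for illustration.
UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})
ROMAN_TO_ARABIC = {"i": "1", "ii": "2", "iii": "3", "iv": "4", "v": "5"}
BRITISH_TO_AMERICAN = {"behaviour": "behavior", "tumour": "tumor"}


def normalize(term: str) -> str:
    """Reduce a term to a key that is the same for its spelling variants."""
    # Compose characters first so that 'o' + combining diaeresis becomes 'ö'.
    text = unicodedata.normalize("NFC", term).casefold().translate(UMLAUTS)
    # Drop any remaining diacritics (e.g. 'é' becomes 'e').
    text = "".join(
        ch for ch in unicodedata.normalize("NFKD", text)
        if not unicodedata.combining(ch)
    )
    # Ignore apostrophes, then split on spaces and hyphens.
    text = re.sub(r"['’]", "", text)
    tokens = [tok for tok in re.split(r"[\s\-]+", text) if tok]
    normalized = []
    for position, token in enumerate(tokens):
        token = BRITISH_TO_AMERICAN.get(token, token)
        if position > 0:
            # Treat a non-initial Roman numeral like its Arabic form ("HIV-I").
            token = ROMAN_TO_ARABIC.get(token, token)
        normalized.append(token)
    # Joining without separators makes "HIV-1" and "HIV1" identical.
    return "".join(normalized)


def build_index(entries):
    """Map normalized dictionary names to their concept identifiers."""
    return {normalize(name): concept_id for name, concept_id in entries.items()}


def lookup(index, mention):
    """Return the concept identifier for a mention, or None if unknown."""
    return index.get(normalize(mention))


# Hypothetical dictionary entries with made-up concept identifiers.
ENTRIES = {"HIV-1": "C001", "Sjögren's disease": "C002", "behavior": "C003"}
INDEX = build_index(ENTRIES)

if __name__ == "__main__":
    for mention in ["HIV1", "HIV-I", "Sjoegren's disease", "behaviour", "HIV-2"]:
        print(mention, "->", lookup(INDEX, mention))
```

Normalizing at both indexing and look-up time keeps the dictionary itself small, at the cost of occasionally merging terms that should stay distinct. That is presumably one reason such treatment options are made tunable per language and per dictionary.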