© 2015 black swan
Analytic SEARCH
…to find better answers faster.
Klaus Kater, klaus.kater@blaswa.com
II-SDV 2015, Nice
Surface Web
Deep Web
Corporate Resources
Analytic SEARCH
• Targeted Crawling: Surface Web, Deep Web and Corporate Data
• Analyzing/Aggregating: Link content from different data sources
• Scraping: Extract structured data from unstructured information
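The scraping step above can be illustrated with a minimal sketch using only the Python standard library. It is not the Analytic SEARCH implementation: the class `LinkAndTextParser`, the function `scrape`, the price pattern, and the sample page are all hypothetical names and data chosen for illustration.

```python
import re
from html.parser import HTMLParser


class LinkAndTextParser(HTMLParser):
    """Collects hyperlinks and visible text fragments from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Record the target of every anchor tag that has an href attribute.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Keep non-empty text nodes only.
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


# Hypothetical pattern (an assumption): a price such as "EUR 12.50".
PRICE_PATTERN = re.compile(r"\b(EUR|USD)\s+(\d+(?:\.\d{2})?)\b")


def scrape(html):
    """Turn unstructured HTML into a small structured record."""
    parser = LinkAndTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    prices = [(cur, float(amount)) for cur, amount in PRICE_PATTERN.findall(text)]
    return {"links": parser.links, "text": text, "prices": prices}


# Made-up sample page, used only to exercise the sketch.
SAMPLE_HTML = (
    "<html><body><h1>Product sheet</h1>"
    "<p>Price: EUR 12.50 per unit.</p>"
    '<a href="https://example.com/spec.pdf">Spec</a>'
    "</body></html>"
)

record = scrape(SAMPLE_HTML)
```

Here `record` holds the extracted links, the flattened page text, and the matched price as separate fields, which is the kind of structured output a later aggregation step could link with other data sources.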
SEARCHCORPUS®
Surface Web
Deep Web
Corporate Resources
Match: some scientific term in some Scientific Publication
Search Hit: Referenced document or database entry.
• Indexing crawled documents and extracted data
• Content AND context uniquely identify a document
• Documents annotated with context build a SEARCHCORPUS®
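One way to read "content AND context uniquely identify a document" is that the same text found in two different contexts yields two corpus entries. The sketch below illustrates that reading under stated assumptions: the key derivation (a SHA-256 hash over content plus context), the class `SearchCorpusSketch`, and the context fields `source` and `matched_term` are hypothetical and not taken from the product.

```python
import hashlib
import json


def document_key(content, context):
    """Derive a stable key from a document's content AND its context."""
    # sort_keys makes the key independent of the order of context fields.
    payload = json.dumps({"content": content, "context": context}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SearchCorpusSketch:
    """Toy in-memory corpus: documents annotated with context, keyed by both."""

    def __init__(self):
        self.entries = {}

    def add(self, content, context):
        key = document_key(content, context)
        self.entries[key] = {"content": content, "context": context}
        return key

    def find(self, term):
        # Naive case-insensitive substring search over the stored content.
        term = term.lower()
        return [e for e in self.entries.values() if term in e["content"].lower()]


corpus = SearchCorpusSketch()
text = "Example text mentioning some scientific term."
key_web = corpus.add(text, {"source": "surface web", "matched_term": "scientific term"})
key_corp = corpus.add(text, {"source": "corporate resources", "matched_term": "scientific term"})
```

Adding the same text with the same context again would overwrite the existing entry instead of creating a duplicate, while a different context produces a separate entry.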
P3 – Use Cases
• Pull: Viewers used as SEARCH interface in any application
• Pimp: Extend existing data with crawled information
• Push: Profile driven notification of events (Email, …)
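The push use case above can be sketched as a match between newly crawled text and stored interest profiles. This is an assumption about how such matching might look, not the product's mechanism: `Profile`, `notifications_for`, the keyword-matching rule, and the sample addresses are hypothetical, and no email is actually sent.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical interest profile: who to notify and for which keywords."""
    recipient: str
    keywords: tuple


def notifications_for(document_text, profiles):
    """Return (recipient, matched keywords) pairs for one new document."""
    text = document_text.lower()
    messages = []
    for profile in profiles:
        # Simple case-insensitive substring match per keyword.
        hits = [kw for kw in profile.keywords if kw.lower() in text]
        if hits:
            messages.append((profile.recipient, hits))
    return messages


# Made-up profiles and document, used only to exercise the sketch.
profiles = [
    Profile("analyst@example.com", ("drug database", "clinical trial")),
    Profile("finance@example.com", ("financial filing",)),
]
pending = notifications_for("New clinical trial results were published.", profiles)
```

A real deployment would hand each entry in `pending` to a delivery channel such as email; that part is deliberately left out here.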
Analytic SEARCH Features:
Modeling / Documentation
• Graphical filter chain designer
• Filter chain template collection
• Rich text documentation of project
• Explanation component for SEARCHCORPUS®
• Multidimensional SEARCHCORPUS®
• Logging module for job execution
Analysis / Data Quality
• Full text search (fuzzy, regular expressions, proximity, Levenshtein, Boolean operators)
• Content detection (remove advertising, etc.)
• Text extractors for common formats
• Archive extractors
• White list / black list include / exclude rules
• Web crawler (per seed URL parameter set and behavior rules)
• Data extraction, HTML/XML scraping
• Graphical regular expression designer
• Text operations (cleansing)
• Term frequency models
• Pattern matching
• Job history / log files (3 log levels per filter)
• Executive jobs for sequential job execution
Administration / Portal
• User rights administration
• Job scheduler / dashboard
• User monitoring
• External viewer / Webservice permission management
Implementation / Deployment
• Plugin mechanism for filter types
• Filter type developers API
• Plugin mechanism for container types
• Container type developers API
• Plugin mechanism for viewers
• Web based multi user environment
• Parallel job execution
• JAVA application server
• Parallel and distributed deployment
• Hadoop and map/reduce components
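The Levenshtein option listed under full text search refers to edit distance: the number of single-character insertions, deletions, and substitutions needed to turn one string into another. The following is a textbook dynamic-programming sketch of that measure, not the product's search engine; `levenshtein`, `fuzzy_find`, and the `max_distance` threshold are illustrative names and parameters.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    # Keep the shorter string in b so each row stays as small as possible.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def fuzzy_find(term, words, max_distance=1):
    """Return the words within max_distance edits of term, ignoring case."""
    term = term.lower()
    return [w for w in words if levenshtein(term, w.lower()) <= max_distance]


candidates = ["serach", "search", "searches", "corpus"]
matches = fuzzy_find("search", candidates, max_distance=2)
```

With a threshold of 2, a transposed spelling such as "serach" still matches "search", because plain Levenshtein distance counts a swap of two adjacent characters as two substitutions.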
Proposal
Analytic SEARCH on multidimensional SEARCHCORPORA that combine structured and unstructured data from multiple sources like
• medical records
• drug databases
• scientific publications
• corporate financial filings
• news tickers
• company websites
• publications
• social media
• blogs
• data feeds
• uncounted internal data sources that currently cannot be searched at all...
…to find better quality answers faster.
Contact:
Klaus Kater
Email: klaus.kater@blaswa.com
Website: http://www.blaswa.com
Phone: +49 7441 5203720