Page 1: II-SDV 2015, 20 - 21 April, in Nice

© 2015 black swan


Analytic SEARCH

…to find better answers faster.

Klaus Kater, [email protected]

II-SDV 2015, Nice

Page 2

[Diagram: Surface Web, Deep Web, Corporate Resources]

Analytic SEARCH

• Targeted Crawling: Surface Web, Deep Web and Corporate Data

• Analyzing/Aggregating: Link content from different data sources

• Scraping: Extract structured data from unstructured information
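The scraping step turns free text into structured records. A minimal sketch of that idea, assuming a plain regular-expression extractor; the `DoseRecord` fields and the "<Drug> <number> mg" pattern are hypothetical illustrations, not black swan's scraping filters:

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: a capitalized word followed by a dose such as "100 mg" or "400mg".
DOSE_PATTERN = re.compile(r"\b([A-Z][a-z]+)\s+(\d+(?:\.\d+)?)\s*mg\b")


@dataclass
class DoseRecord:
    """Hypothetical structured row extracted from unstructured text."""
    drug: str
    dose_mg: float


def scrape_doses(text: str) -> list[DoseRecord]:
    """Return one DoseRecord per pattern match, in order of appearance."""
    return [
        DoseRecord(drug=m.group(1), dose_mg=float(m.group(2)))
        for m in DOSE_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Patients received Aspirin 100 mg daily; a second arm took Ibuprofen 400mg."
    for record in scrape_doses(sample):
        print(record)
    # DoseRecord(drug='Aspirin', dose_mg=100.0)
    # DoseRecord(drug='Ibuprofen', dose_mg=400.0)
```

A real scraper would combine many such patterns with HTML/XML-aware extraction; a single regular expression is only enough to show the unstructured-to-structured step.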

Page 3

SEARCHCORPUS®

[Diagram: Surface Web, Deep Web and Corporate Resources. Match: some scientific term in some scientific publication. Search hit: referenced document or database entry.]

• Indexing crawled documents and extracted data

• Content AND context uniquely identify a document

• Documents annotated with context build a SEARCHCORPUS®
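The idea that content and context together identify a document can be illustrated with a toy inverted index. This is a sketch under stated assumptions, not the SEARCHCORPUS® implementation: the class name `SearchCorpus`, the context keys (`source`, `type`) and the hashing scheme are all hypothetical.

```python
import hashlib
from collections import defaultdict


class SearchCorpus:
    """Toy inverted index whose documents carry context annotations.

    Hypothetical sketch: a document key is derived from its content AND its
    context, so identical text found in two contexts gives two entries.
    """

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)  # term -> doc keys
        self.contexts: dict[str, dict[str, str]] = {}  # doc key -> context annotations

    @staticmethod
    def doc_key(content: str, context: dict[str, str]) -> str:
        """Hash content plus sorted context pairs into a short identifier."""
        pairs = "|".join(f"{k}={v}" for k, v in sorted(context.items()))
        return hashlib.sha256(f"{content}|{pairs}".encode("utf-8")).hexdigest()[:12]

    def add(self, content: str, context: dict[str, str]) -> str:
        """Index the document's terms and remember its context."""
        key = self.doc_key(content, context)
        self.contexts[key] = context
        for term in content.lower().split():  # naive whitespace tokenizer
            self.postings[term].add(key)
        return key

    def search(self, term: str, **context_filter: str) -> set[str]:
        """Keys whose content contains `term` and whose context matches every filter."""
        return {
            key
            for key in self.postings.get(term.lower(), set())
            if all(self.contexts[key].get(k) == v for k, v in context_filter.items())
        }


if __name__ == "__main__":
    corpus = SearchCorpus()
    paper = corpus.add("statin therapy trial", {"source": "deep web", "type": "publication"})
    memo = corpus.add("statin therapy trial", {"source": "corporate", "type": "report"})
    print(paper != memo)  # True: same text, different context
    print(corpus.search("statin", type="report") == {memo})  # True
```

The context filter narrows a plain term match to hits from a particular kind of source, which is the behavior the bullet points describe.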

Page 4

P3 – Use Cases

• Pull: Viewers serve as the SEARCH interface in any application

• Pimp: Extend existing data with crawled information

• Push: Profile-driven notification of events (Email, …)

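The Push use case matches incoming documents against stored user profiles. The sketch below assumes a profile is simply a set of terms that must all occur in a new document. `Profile` and `match_profiles` are hypothetical names, and actual delivery (email or another channel) is left out:

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical stored profile: notify `user` when all `terms` appear."""
    user: str
    terms: set[str] = field(default_factory=set)


def match_profiles(document: str, profiles: list[Profile]) -> list[str]:
    """Build one notification line per profile whose terms all occur in the document.

    Delivery (email or otherwise) is deliberately left out of this sketch.
    """
    words = set(document.lower().split())
    notifications = []
    for profile in profiles:
        wanted = {term.lower() for term in profile.terms}
        if wanted and wanted <= words:  # every profile term is present
            notifications.append(f"to={profile.user}: new document matches {sorted(wanted)}")
    return notifications


if __name__ == "__main__":
    profiles = [Profile("alice", {"biosimilar", "approval"}), Profile("bob", {"merger"})]
    for line in match_profiles("EMA approval granted for new biosimilar", profiles):
        print(line)  # to=alice: new document matches ['approval', 'biosimilar']
```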

Page 5

Analytic SEARCH Features:

Modeling / Documentation
• Graphical filter chain designer
• Filter chain template collection
• Rich text documentation of project
• Explanation component for SEARCHCORPUS®
• Multidimensional SEARCHCORPUS®
• Logging module for job execution

Analysis / Data Quality
• Full text search (fuzzy, regular expressions, proximity, Levenshtein, Boolean operators)
• Content detection (remove advertising, etc.)
• Text extractors for common formats
• Archive extractors
• White list / black list include / exclude rules
• Web crawler (per seed URL parameter set and behavior rules)
• Data extraction, HTML/XML scraping
• Graphical regular expression designer
• Text operations (cleansing)
• Term frequency models
• Pattern matching
• Job history / log files (3 log levels per filter)
• Executive jobs for sequential job execution

Administration / Portal
• User rights administration
• Job scheduler / dashboard
• User monitoring
• External viewer / Webservice permission management

Implementation / Deployment
• Plugin mechanism for filter types
• Filter type developers API
• Plugin mechanism for container types
• Container type developers API
• Plugin mechanism for viewers
• Web-based multi-user environment
• Parallel job execution
• Java application server
• Parallel and distributed deployment
• Hadoop and map/reduce components
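The full-text search options above include fuzzy matching based on Levenshtein distance, the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A minimal sketch of the standard dynamic-programming computation and a fuzzy term lookup; `fuzzy_lookup` and its `max_distance` threshold are illustrative assumptions, not the product's search API:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(query: str, vocabulary: list[str], max_distance: int = 2) -> list[str]:
    """Return vocabulary terms within `max_distance` edits of the query, closest first."""
    scored = sorted((levenshtein(query.lower(), term.lower()), term) for term in vocabulary)
    return [term for dist, term in scored if dist <= max_distance]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(fuzzy_lookup("Levensthein", ["Levenshtein", "Levene", "Hamming"]))  # ['Levenshtein']
```

Keeping only two rows of the table reduces memory to O(min(len(a), len(b))). Note that a swapped pair of letters ("th" versus "ht") costs 2 under plain Levenshtein distance.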

Page 6

Impressions

Page 7

Impressions

Page 8

Impressions

Page 9

Impressions

Page 10

Proposal: Analytic SEARCH on multidimensional SEARCHCORPORA¹ that combine structured and unstructured data from multiple sources, such as

• medical records
• drug databases
• scientific publications
• corporate financial filings
• news tickers
• company websites
• publications
• social media
• blogs
• data feeds
• countless internal data sources that currently cannot be searched at all...

…to find better-quality answers faster.

Page 11

Contact:

Klaus Kater
Email: [email protected]
Website: http://www.blaswa.com
Phone: +49 7441 5203720

Page 12

Backup

Page 13

Technology: Analytic SEARCH Product Architecture

• Building Filter Chains
• Scheduling Jobs and Processing
• Crawling for Information
• Multidimensional SEARCHCORPUS®
• Visualization / Searching
• Social Media Interfaces
• Data Warehouses
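The architecture starts with building filter chains, and the feature list names pluggable filter types, white list / black list rules, text cleansing and term frequency models. The sketch below shows how such a chain could compose. It assumes each filter is a plain function that returns a document, or `None` to drop it. `Document`, `url_rules`, `cleanse`, `term_frequencies` and `run_chain` are hypothetical names and do not represent black swan's filter type developers API:

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Document:
    url: str
    text: str
    meta: dict = field(default_factory=dict)


# A filter takes a document and returns it (possibly modified) or None to drop it.
Filter = Callable[[Document], Optional[Document]]


def url_rules(whitelist: list[str], blacklist: list[str]) -> Filter:
    """Include/exclude rule: keep URLs with a whitelisted prefix and no blacklisted prefix."""
    def apply(doc: Document) -> Optional[Document]:
        allowed = any(doc.url.startswith(prefix) for prefix in whitelist)
        blocked = any(doc.url.startswith(prefix) for prefix in blacklist)
        return doc if allowed and not blocked else None
    return apply


def cleanse(doc: Document) -> Document:
    """Text operation: collapse runs of whitespace and trim the ends."""
    doc.text = re.sub(r"\s+", " ", doc.text).strip()
    return doc


def term_frequencies(doc: Document) -> Document:
    """Attach a simple term frequency model to the document metadata."""
    doc.meta["tf"] = Counter(doc.text.lower().split())
    return doc


def run_chain(chain: list[Filter], docs: list[Document]) -> list[Document]:
    """Pass each document through the filters in order; a None result drops it."""
    results = []
    for doc in docs:
        current: Optional[Document] = doc
        for step in chain:
            current = step(current)
            if current is None:
                break
        if current is not None:
            results.append(current)
    return results


if __name__ == "__main__":
    chain = [url_rules(["https://example.org/"], ["https://example.org/ads/"]), cleanse, term_frequencies]
    docs = [
        Document("https://example.org/paper", "  drug   trial\n drug "),
        Document("https://example.org/ads/banner", "buy now"),
        Document("https://other.net/x", "drug"),
    ]
    for kept in run_chain(chain, docs):
        print(kept.url, repr(kept.text), kept.meta["tf"]["drug"])
```

Treating every filter as the same function shape is what makes a plugin mechanism straightforward: a new filter type only has to honor that contract to be dropped into any chain.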