> design > publish > search!
http://www.spinque.com/
How to Search Annotated Text by Strategy?
Roberto Cornacchia
Wouter Alink
Arjen P. De Vries
Spinque B.V.
CLIN 2013, 18 January 2013
Search by Strategy
Design the way you would like to search
● A search engine design framework
● Custom search engines built from “Strategies”, which:
    ● are designed as graphs
    ● abstract data processing
    ● combine different data sources
    ● incorporate probabilistic reasoning
    ● translate to database queries
Search by Strategy
[Figure: strategy graph for a house search, connecting the blocks All houses, Query terms, Rank full-text, Rank on location (twice), Select on attribute, Difference, Union, and Crime map]
Don't try and program the ultimate search engine
Design a number of domain-specific search strategies
Click. Generate Web search engines on probabilistic DB
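As a rough illustration of a strategy as a graph of blocks over probabilistic data, the Python sketch below models each block as a function over a mapping from object id to probability. The block names echo the labels in the figure; the wiring, the scoring rules (product for ranking, noisy-or for union), and the sample houses are assumptions made for illustration, not Spinque's actual implementation.

```python
from typing import Callable, Dict

Scores = Dict[str, float]  # object id -> probability-like score

def rank(objs: Scores, score: Callable[[str], float]) -> Scores:
    """Re-weight each object by a score in [0, 1] (assumed rule: product)."""
    ranked = {o: p * score(o) for o, p in objs.items()}
    return {o: p for o, p in ranked.items() if p > 0}

def select(objs: Scores, keep: Callable[[str], bool]) -> Scores:
    """Keep only the objects that satisfy a boolean attribute test."""
    return {o: p for o, p in objs.items() if keep(o)}

def union(a: Scores, b: Scores) -> Scores:
    """Combine two result sets (assumed rule: noisy-or of the two scores)."""
    return {o: 1 - (1 - a.get(o, 0.0)) * (1 - b.get(o, 0.0))
            for o in a.keys() | b.keys()}

def difference(a: Scores, b: Scores) -> Scores:
    """Drop from a every object that also occurs in b."""
    return {o: p for o, p in a.items() if o not in b}

# Hypothetical sample data, invented for this sketch.
all_houses = {"h1": 1.0, "h2": 1.0, "h3": 1.0}
text_match = {"h1": 0.8, "h2": 0.5, "h3": 0.0}   # match with the query terms
near_centre = {"h1": 0.9, "h2": 0.4, "h3": 0.7}  # location preference
has_garden = {"h1", "h3"}                        # attribute to select on
high_crime = {"h2"}                              # derived from a crime map

# One possible wiring of the blocks into a strategy.
by_text = rank(all_houses, lambda o: text_match[o])
by_location = rank(all_houses, lambda o: near_centre[o])
candidates = union(by_text, select(by_location, lambda o: o in has_garden))
safe = difference(candidates, select(all_houses, lambda o: o in high_crime))
result = sorted(safe.items(), key=lambda kv: kv[1], reverse=True)
```

Because every block consumes and produces the same kind of scored set, blocks can be rewired freely, which is the property the graph view relies on.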
Multiple domains, custom UIs
Strategy Editor
Not only "documents"
What's in the DB?
Full-text search:
term  obj  freq
t0    o3   0.03
t0    o5   0.21
t1    o2   0.08

Annotation search:
subj     pred / attr  obj / val  p
Roberto  speaks_to    You        0.95
You      listen_to    Roberto    0.6
speech   minutes      15         0.8

Feature vectors (CBIR, SVM):
obj  f1    ...  fN
o0   0.12  ...  0.84
o1   0.54  ...  0
o2   0.23  ...  0.31

Hierarchical search:
obj  pre  size  level
o0   100  50    0
o1   110  20    1
o2   144  16    2
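To make the link between such tables and database queries concrete, here is a minimal sketch using Python's built-in sqlite3 module and the term/obj/freq rows from the slide. The table name `term_doc`, the function `rank_full_text`, and the sum-of-frequencies scoring are assumptions for illustration; the slides do not say which database or ranking formula is actually used.

```python
import sqlite3

# In-memory database holding the full-text table from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term_doc (term TEXT, obj TEXT, freq REAL)")
conn.executemany(
    "INSERT INTO term_doc VALUES (?, ?, ?)",
    [("t0", "o3", 0.03), ("t0", "o5", 0.21), ("t1", "o2", 0.08)],
)

def rank_full_text(query_terms):
    """Score each object by the summed frequency of the query terms it contains.

    Summing frequencies is an assumed, deliberately simple ranking rule.
    """
    placeholders = ",".join("?" for _ in query_terms)
    sql = (
        "SELECT obj, SUM(freq) AS score FROM term_doc "
        f"WHERE term IN ({placeholders}) "
        "GROUP BY obj ORDER BY score DESC"
    )
    return conn.execute(sql, list(query_terms)).fetchall()

ranking = rank_full_text(["t0", "t1"])
```

The other three tables could be queried in the same style, for example by filtering the annotation table on `pred / attr` and using `p` as the score.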
Choose hot topics from (kid-)news
http://www.opstel.eu
[Figure: strategy graph with the blocks Kid news, Rank on date, Extract terms, and Expand]
Use POS annotations
Text:
<abstract date="2013-01-15"> Lilly de pitbull is een held. De hond uit de Amerikaanse staat Massachusetts heeft …</abstract>
(English: "Lilly the pit bull is a hero. The dog from the American state of Massachusetts has …")
Annotated text (we are interested in NPs):
<abstract date="2013-01-15"> <NP>Lilly de pitbull</NP> is <NP>een held</NP>. <NP>De hond uit de Amerikaanse staat Massachusetts</NP> heeft …</abstract>
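Pulling the NPs out of the annotated abstract can be sketched with Python's standard xml.etree.ElementTree module. The input string is the annotated example from the slide, with its elision left as-is; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The annotated abstract from the slide (the trailing "…" is part of the source).
annotated = (
    '<abstract date="2013-01-15"> <NP>Lilly de pitbull</NP> is '
    '<NP>een held</NP>. <NP>De hond uit de Amerikaanse staat '
    'Massachusetts</NP> heeft …</abstract>'
)

root = ET.fromstring(annotated)

# Collect the text of every NP element, in document order.
noun_phrases = [node.text for node in root.iter("NP")]

# The date attribute stays available for the "Rank on date" step.
published = root.get("date")
```

Each extracted NP can then be indexed as a term in its own right, next to the individual words.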
"Lilly de held" on Alpino
Choose hot topics from (kid-)news
http://www.opstel.eu
[Figure: the same strategy graph extended with NP annotations; blocks Kid news, Rank on date, Top terms, Top NPs, and Expand]
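The "rank on date, take the top terms, expand" flow can be approximated in a few lines of Python. This is only a sketch: the news items, the stopword list, the cut-offs `n_recent` and `k`, and the plain term counting are all assumptions, and the real strategy may weight terms quite differently.

```python
from collections import Counter
from datetime import date

# Hypothetical news items (publication date, text), invented for this sketch.
news = [
    (date(2013, 1, 15), "pitbull hero saves owner"),
    (date(2013, 1, 14), "pitbull rescued from river"),
    (date(2012, 6, 1), "summer holiday tips"),
]
STOPWORDS = {"from", "the", "a"}

def hot_terms(items, n_recent=2, k=3):
    """Return the k most frequent terms among the n_recent newest items."""
    recent = sorted(items, key=lambda item: item[0], reverse=True)[:n_recent]
    counts = Counter(
        word
        for _, text in recent
        for word in text.lower().split()
        if word not in STOPWORDS
    )
    return [term for term, _ in counts.most_common(k)]

def expand(query, items):
    """Append hot terms that the query does not already contain."""
    return query + [t for t in hot_terms(items) if t not in query]

expanded = expand(["dog"], news)
```

Swapping the word splitter for an NP extractor would give the "Top NPs" variant of the same step.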
Topic suggestion for kids
http://www.opstel.eu
Data: Wikipedia, magazines for children, ..
Left branch: rank data sources on annotations, e.g.:
● Most seen content: hot topics
● Seen during night-time? Probably not for kids
Right branch: query expansion using recent (hot) content
Can we improve this by adding ...?
● Text reading level (machine learning)
● Handle spelling mistakes in query expansion
● Syntactic dependencies
Example: syntactic dependencies
AEGIR dependency parser for English (Koster et al.)
Parses text, outputs dependency triples:
"PGs prevent the mucosal damage .. "
[PG,SUBJ,prevent][prevent,OBJ,damage][damage,ATTR,mucosal]
...
CLEFIP 2011: Combining document representations for prior-art retrieval, Eva D'hondt, Suzan Verberne, Wouter Alink, Roberto Cornacchia
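To work with such triples programmatically, the bracketed notation shown above can be read into tuples. The sketch below parses only that notation as it appears on the slide; it makes no claim about AEGIR's actual output format or API, and `parse_triples` is a hypothetical helper.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, dependent)

# Matches one "[a,REL,b]" group; commas and brackets delimit the fields.
TRIPLE_RE = re.compile(r"\[([^,\]]+),([^,\]]+),([^,\]]+)\]")

def parse_triples(text: str) -> List[Triple]:
    """Extract all bracketed triples from a string, in order of appearance."""
    return [
        (head.strip(), rel.strip(), dep.strip())
        for head, rel, dep in TRIPLE_RE.findall(text)
    ]

triples = parse_triples(
    "[PG,SUBJ,prevent][prevent,OBJ,damage][damage,ATTR,mucosal]"
)
```

Stored as rows, these tuples fit the subj / pred / obj layout of the annotation table.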
Prior art search. Designed by Eva D'hondt, Nijmegen
Find patents containing similar triples
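One simple way to picture "similar triples" is set overlap between the triples of a query document and those of each patent. The sketch below uses Jaccard similarity, which is an assumption chosen for brevity rather than the scoring used in the cited work; the patent ids and their triples are invented.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Triples of the query document (taken from the example sentence).
query = {
    ("PG", "SUBJ", "prevent"),
    ("prevent", "OBJ", "damage"),
    ("damage", "ATTR", "mucosal"),
}

# Hypothetical patents with hypothetical triples.
patents = {
    "patent_A": {("PG", "SUBJ", "prevent"), ("prevent", "OBJ", "damage")},
    "patent_B": {("drug", "SUBJ", "reduce"), ("damage", "ATTR", "mucosal")},
    "patent_C": {("engine", "SUBJ", "drive")},
}

# Score every patent, drop the non-matching ones, and sort best first.
scored = [(pid, jaccard(query, triples)) for pid, triples in patents.items()]
ranking = sorted(
    [(pid, s) for pid, s in scored if s > 0],
    key=lambda pair: pair[1],
    reverse=True,
)
```

In a strategy, this score would typically be combined with a full-text ranking instead of replacing it.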
Recap
[Figure: the house-search strategy graph from the earlier slide, with the blocks All houses, Query terms, Rank full-text, Rank on location (twice), Select on attribute, Difference, Union, and Crime map]
Strategies encapsulate domain expert knowledge (how to find)
Strategies abstract away search expert knowledge (how to search)
Strategies facilitate knowledge management
Store / share / publish / refine
Minimise the effort needed to design/update complex domain-specific search engines
YOU can easily experiment with (new) data representations, ranking formulas, annotations, etc.
Thank you
www.spinque.com