Search-As-You-Type in Forms: Leveraging the Usability and the Functionality of Search Paradigm in Relational Databases
Hao Wu, supervised by Prof. Lizhu Zhou
Database Research Group, Tsinghua University
VLDB PhD Workshop – Sept. 13, Singapore

Page 1: Seaform Slides in VLDB 2010 PhD Workshop

Search-As-You-Type in Forms:
Leveraging the Usability and the Functionality of Search Paradigm in Relational Databases

Hao Wu
Supervised by Prof. Lizhu Zhou

Database Research Group, Tsinghua University

VLDB PhD Workshop – Sept. 13, Singapore

Page 2: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

Problem Statement

Challenges

Initial Achievements

Conclusions

Page 4: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

04/13/2023 Hao Wu, DB Group, Tsinghua University 4

• Relational databases are widely used.

• There are many search paradigms:
  ▪ Structured Query Language (SQL)
  ▪ Keyword Search (KS)
  ▪ Query-By-Example (QBE)

• Different search paradigms are needed by different users.

Page 5: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

#1: SQL is complex.

SELECT *
FROM Author A, Author_Paper AP, Paper P
WHERE title LIKE '%keyword%' AND title LIKE '%search%' AND authors LIKE 'g%'
  AND A.id = AP.aid AND P.id = AP.pid

Page 6: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

#2: Traditional keyword search is imprecise.

Query: keyword search g

Title? Conf. name? Author name?

Page 7: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

#3: Forms are awkward.

UCI Directory: http://directory.uci.edu/index.php?form_type=advanced_search

Page 8: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

#4: The "Search" button is not convenient.

Page 9: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

Seaform = Keyword Search + Form-Style Interface + Search-as-you-type

Page 10: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

Problem Statement

Challenges

Initial Achievements

Conclusions

Page 12: Seaform Slides in VLDB 2010 PhD Workshop

Problem Statement

• Data:
  ▪ Single relational table.
  ▪ Several searchable attributes.

ID  Title         Conf.   Author
1   xml database  VLDB    albert
2   xml database  SIGMOD  bob
3   xml search    VLDB    albert
4   xml security  VLDB    alice
5   rdbms         SIGMOD  charlie

Page 13: Seaform Slides in VLDB 2010 PhD Workshop

Problem Statement

• Query:
  ▪ A set of keywords (prefixes) split by fields.
  ▪ A focus indicator.

Title:  xml
Author: al

Focus = Author

Page 14: Seaform Slides in VLDB 2010 PhD Workshop

Problem Statement

• Results:
  ▪ Global results: corresponding tuples.
  ▪ Local results: corresponding attribute values.
  ▪ Aggregations.

Title:  xml
Author: al

Local results (Author values with counts):
  albert 2
  alice 1

Global results:
  xml database (albert)
  xml search (albert)
  xml security (alice)
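
The query and result model on these slides can be sketched in a few lines of Python. This is a minimal, brute-force illustration using the sample table from the Problem Statement slide, not the Seaform implementation; the names `PAPERS`, `FIELDS`, `matches`, and `search` are hypothetical, and it assumes that each keyword must be a prefix of some word in the corresponding field.

```python
from collections import Counter

# Sample table from the slides: (ID, Title, Conf., Author).
PAPERS = [
    (1, "xml database", "VLDB", "albert"),
    (2, "xml database", "SIGMOD", "bob"),
    (3, "xml search", "VLDB", "albert"),
    (4, "xml security", "VLDB", "alice"),
    (5, "rdbms", "SIGMOD", "charlie"),
]
FIELDS = {"title": 1, "conf": 2, "author": 3}


def matches(value, prefixes):
    """True if every prefix in `prefixes` starts some word of `value`."""
    words = value.lower().split()
    return all(any(w.startswith(p.lower()) for w in words) for p in prefixes)


def search(query, focus):
    """Return (global_results, local_results) for a per-field prefix query.

    `query` maps a field name to a string of space-separated prefixes;
    `focus` names the field the user is currently typing in.
    """
    # Global results: tuples that satisfy the prefixes in every field.
    global_results = [
        row for row in PAPERS
        if all(matches(row[FIELDS[f]], text.split()) for f, text in query.items())
    ]
    # Local results: distinct values of the focused field, with counts.
    counts = Counter(row[FIELDS[focus]] for row in global_results)
    local_results = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return global_results, local_results


global_results, local_results = search({"title": "xml", "author": "al"}, focus="author")
print(local_results)  # [('albert', 2), ('alice', 1)]
for row in global_results:
    print(row[1], f"({row[3]})")
```

Scanning every tuple on each keystroke is only workable for a toy table; the Challenges slides discuss index structures that avoid the scan.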

Page 15: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

Problem Statement

Challenges

Initial Achievements

Conclusions

Page 17: Seaform Slides in VLDB 2010 PhD Workshop

Challenges: Search-As-You-Type

• Prefix matching:
  ▪ E.g. al → albert, alice, …
  Trie structure with cache.

• Fast response:
  ▪ Synchronization of local results and global results yields heavy computational cost.
  On-demand synchronization and dual-list trie.

[Figure: trie of author names rooted at Φ]
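
Prefix matching on a trie can be illustrated with a short Python sketch. It shows only a plain trie with prefix completion; the cache and the dual-list trie mentioned on the slide are not modelled, and the class and method names (`TrieNode`, `Trie`, `complete`) are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix):
        """Return all stored words that start with `prefix`, sorted."""
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every word in the subtree below that node.
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


authors = Trie(["albert", "bob", "alice", "charlie"])
print(authors.complete("al"))  # ['albert', 'alice']
```

In a search-as-you-type setting, one plausible use of a cache is to remember the node reached for the previous prefix so that each new keystroke descends a single edge instead of restarting from the root.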

Page 18: Seaform Slides in VLDB 2010 PhD Workshop

Challenges: Error Tolerance

• Misplaced keywords:
  ▪ E.g. input "albert" into the Title input box.
  Automatic query refinement (given a query, how can we modify it to obtain more results?)
  Large search space; relies on precise estimation and a probabilistic model.

• Fuzzy matching:
  ▪ E.g. input "albrt" instead of "albert".
  Edit-distance computation on the trie structure.
  Ranking issue for local results: should local results be sorted by edit distance or by aggregation values?
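
One well-known way to compute edit distance over a trie is to carry one row of the Levenshtein table down each trie edge and prune subtrees whose row already exceeds the threshold. The sketch below illustrates that general technique for whole-word matching; it is not necessarily the method used in Seaform. The function names and the `"$"` end-of-word marker are assumptions, and stored words are assumed not to contain `"$"`.

```python
def build_trie(words):
    """Nested-dict trie; the key "$" marks the end of a stored word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = word
    return root


def fuzzy_search(root, query, max_dist):
    """Return sorted (distance, word) pairs with edit distance <= max_dist."""
    results = []
    first_row = list(range(len(query) + 1))

    def walk(node, ch, prev_row):
        # Extend the Levenshtein table by one row for the trie character `ch`.
        row = [prev_row[0] + 1]
        for i in range(1, len(query) + 1):
            insert_cost = row[i - 1] + 1
            delete_cost = prev_row[i] + 1
            replace_cost = prev_row[i - 1] + (query[i - 1] != ch)
            row.append(min(insert_cost, delete_cost, replace_cost))
        if "$" in node and row[-1] <= max_dist:
            results.append((row[-1], node["$"]))
        # Prune: if every entry exceeds the bound, no descendant can match.
        if min(row) <= max_dist:
            for next_ch, child in node.items():
                if next_ch != "$":
                    walk(child, next_ch, row)

    for ch, child in root.items():
        if ch != "$":
            walk(child, ch, first_row)
    return sorted(results)


authors = build_trie(["albert", "bob", "alice", "charlie"])
print(fuzzy_search(authors, "albrt", 1))  # [(1, 'albert')]
```

Sorting by distance first is only one possible ordering; the slide's open question is whether local results should instead be ranked by their aggregation values.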

Page 19: Seaform Slides in VLDB 2010 PhD Workshop

Challenges: Scalability

• Handle large-scale databases:
  ▪ There are a large number of tuples.
  1) Top-k algorithm
     Precise aggregation is impossible in this case.
  2) Using the RDBMS itself
     The index structure should be redesigned for the DBMS; performance issues.

• Handle multiple tables:
  ▪ Data are normalized into several tables.
  Generalize the single-table local-global computation and reduce on-the-fly joins using pre-joined tables.
  It is hard to determine which tables are the most necessary to pre-join; extra storage cost.
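
As a rough illustration of the "using the RDBMS itself" option, the sketch below computes local results with plain SQL on an in-memory SQLite database through Python's standard `sqlite3` module. The table name `paper`, its columns, and the helper functions are hypothetical. It relies on `LIKE` patterns rather than a redesigned index, so it shows the query shape only and says nothing about performance at scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE paper (id INTEGER PRIMARY KEY, title TEXT, conf TEXT, author TEXT)"
)
conn.executemany(
    "INSERT INTO paper VALUES (?, ?, ?, ?)",
    [
        (1, "xml database", "VLDB", "albert"),
        (2, "xml database", "SIGMOD", "bob"),
        (3, "xml search", "VLDB", "albert"),
        (4, "xml security", "VLDB", "alice"),
        (5, "rdbms", "SIGMOD", "charlie"),
    ],
)


def word_prefix(column):
    # Matches a prefix at the start of the value or right after a space.
    # `column` is a fixed identifier chosen in code, never user input.
    return f"({column} LIKE ? OR {column} LIKE ?)"


def local_results(title_prefix, author_prefix, k):
    """Top-k author values (with counts) for a Title/Author prefix query."""
    sql = (
        "SELECT author, COUNT(*) AS n FROM paper "
        f"WHERE {word_prefix('title')} AND {word_prefix('author')} "
        "GROUP BY author ORDER BY n DESC, author LIMIT ?"
    )
    # Note: '%' or '_' typed by the user is not escaped in this sketch.
    params = (
        title_prefix + "%", "% " + title_prefix + "%",
        author_prefix + "%", "% " + author_prefix + "%",
        k,
    )
    return conn.execute(sql, params).fetchall()


print(local_results("xml", "al", 10))  # [('albert', 2), ('alice', 1)]
```

Here `LIMIT` only truncates the output: the database still aggregates every matching tuple, which is different from a top-k algorithm that stops early and therefore cannot report precise counts.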

Page 20: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

Problem Statement

Challenges

Initial Achievements

Conclusions

Page 22: Seaform Slides in VLDB 2010 PhD Workshop

Initial Achievements

Seaform-DBLP

Features:
• Single table.
• Prefix matching.
• Average response time is less than 30 ms.

Limitations:
• Does not tolerate errors.
• Non-top-k, i.e. it returns all matching results.
• Memory-resident.

Page 23: Seaform Slides in VLDB 2010 PhD Workshop

Demonstrations:

14:00 to 15:30   2   Sept. 14, Tuesday

14:00 to 15:30   5   Sept. 15, Wednesday

Page 24: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

Problem Statement

Challenges

Initial Achievements

Conclusions

Page 26: Seaform Slides in VLDB 2010 PhD Workshop

Conclusions

• Search-as-you-type with forms is a good way to balance usability and functionality.

• There are still many problems to solve:
  ▪ More effective indexes than trie + inverted lists.
  ▪ Support for error tolerance.
  ▪ Native DBMS support.
  ▪ Top-k algorithms.
  ▪ Pre-joined (materialized) tables.
  ▪ ...

Page 27: Seaform Slides in VLDB 2010 PhD Workshop

Thanks

http://tastier.cs.thu.edu.cn/seaform/

My homepage: http://dbgroup.cs.thu.edu.cn/wuhao/