Fulltext engine for Non-Fulltext Queries Adrian Nuta // Sphinxsearch // 2013

Fulltext engine for non fulltext searches

Jul 13, 2015


Adrian Nuta
Transcript
Page 1: Fulltext engine for non fulltext searches

Fulltext engine for

Non-Fulltext Queries

Adrian Nuta // Sphinxsearch // 2013

Page 2: Fulltext engine for non fulltext searches

• Introduction

• Non-fulltext queries

• Special data columns

• Fulltext to speed up non-fulltext queries

Page 3: Fulltext engine for non fulltext searches

Introduction

Page 4: Fulltext engine for non fulltext searches

What is Sphinx

• free, open-source, search server

• fast: 700 qps / core / 1M docs

• flexible: 100+ features

• scalable

o 300 mil. queries / day

o 50 TB data, 100+ boxes

Page 5: Fulltext engine for non fulltext searches

Sphinx document

• Doc ID

• Fulltext fields

o inverted index

o indexed, not stored

• Attributes: Integer, Float, Bool, Timestamp, MVA, String, JSON, ...

o stored, not indexed

o held in memory or on disk

Page 6: Fulltext engine for non fulltext searches

Meet SphinxQL

Application → MySQL: MySQL connector, MySQL protocol, MySQL language

Application → Sphinx: MySQL connector, MySQL protocol, SphinxQL language

SELECT * FROM mytable WHERE ...

Page 7: Fulltext engine for non fulltext searches

Non-fulltext queries

Page 8: Fulltext engine for non fulltext searches

What can Sphinx do besides fulltext?

• usual WHERE, ORDER, GROUP BY

• GROUP BY custom extensions:

o WITHIN GROUP ORDER BY

o GROUP <N> BY

• Aggregation, timestamp, and math functions

• Comparison functions: IF(), INTERVAL(), IN()

• Geo spatial: GEODIST(), GEOPOLY2D()

Page 9: Fulltext engine for non fulltext searches

WITHIN GROUP ORDER BY

mysql> SELECT *, DAY(added) AS today FROM facetdemo WHERE property2 = 160 AND today = 26 GROUP BY brand_id WITHIN GROUP ORDER BY price ASC ORDER BY brand_id ASC;

+---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+

| id | price | brand_id | property2 | added | title | brand_name | property | today |

+---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+

| 520157 | 10 | 1 | 160 | 1382745486 | Product Nine Seven | brand1 | Three | 26 |

| 1726473 | 10 | 2 | 160 | 1382796463 | Product Two Three | brand2 | Eight | 26 |

| 1588875 | 11 | 3 | 160 | 1382762264 | Product Three Six | brand3 | Five | 26 |

| 1556197 | 10 | 4 | 160 | 1382754018 | Product Eight Six | brand4 | Seven | 26 |

| 751443 | 11 | 5 | 160 | 1382803444 | Product Six Three | brand5 | One | 26 |

| 512776 | 11 | 6 | 160 | 1382743642 | Product Ten Five | brand6 | Six | 26 |

mysql> SELECT *, DAY(added) AS today FROM facetdemo WHERE property2 = 160 AND today = 26 GROUP BY brand_id WITHIN GROUP ORDER BY price DESC ORDER BY brand_id ASC;

+---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+

| id | price | brand_id | property2 | added | title | brand_name | property | today |

+---------+-------+----------+-----------+------------+---------------------+------------+----------+-------+

| 815154 | 998 | 1 | 160 | 1382819286 | Product Two Nine | brand1 | Eight | 26 |

| 2793903 | 999 | 2 | 160 | 1382813601 | Product Eight Five | brand2 | Two | 26 |

| 699831 | 1000 | 3 | 160 | 1382790589 | Product One Six | brand3 | Eight | 26 |

| 714052 | 1000 | 4 | 160 | 1382794137 | Product One Ten | brand4 | Three | 26 |

| 2791902 | 999 | 5 | 160 | 1382813140 | Product Five Three | brand5 | Four | 26 |

| 2753725 | 1000 | 6 | 160 | 1382803662 | Product Seven Three | brand6 | Two | 26 |
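
The two result sets above differ only in which row represents each brand. As a rough illustration of that semantics (plain Python, not how Sphinx implements it), the sketch below keeps one row per group, chosen by a within-group ordering. The function name `group_best_row` and the sample rows are made up for this example.

```python
def group_best_row(rows, group_key, within_key, descending=False):
    """Keep one row per group: the row that sorts first within its group.

    Emulates GROUP BY <group_key> WITHIN GROUP ORDER BY <within_key>,
    followed by ORDER BY <group_key> ASC. Illustrative only.
    """
    best = {}
    for row in rows:
        group = row[group_key]
        current = best.get(group)
        if current is None:
            best[group] = row
        elif descending and row[within_key] > current[within_key]:
            best[group] = row
        elif not descending and row[within_key] < current[within_key]:
            best[group] = row
    # Final ordering of the groups themselves (ORDER BY group_key ASC).
    return [best[group] for group in sorted(best)]


# Hypothetical sample rows, not taken from the facetdemo index.
SAMPLE_ROWS = [
    {"id": 1, "brand_id": 1, "price": 10},
    {"id": 2, "brand_id": 1, "price": 998},
    {"id": 3, "brand_id": 2, "price": 999},
    {"id": 4, "brand_id": 2, "price": 10},
]
```

With `descending=False` each brand is represented by its cheapest row, and with `descending=True` by its most expensive one, which mirrors the difference between the two queries on this slide.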

Page 10: Fulltext engine for non fulltext searches

Using GROUP <N> BY

mysql> SELECT * FROM facetdemo GROUP 3 BY brand_id WITHIN GROUP ORDER BY added DESC ORDER BY brand_id ASC;

+---------+-------+----------+------------+---------------------+------------+----------+

| id | price | brand_id | added | title | brand_name | property |

+---------+-------+----------+------------+---------------------+------------+----------+

| 1479848 | 938 | 1 | 1382735889 | Product Ten Seven | brand1 | Four |

| 2479064 | 398 | 1 | 1382734998 | Product Ten Five | brand1 | Eight |

| 1480553 | 687 | 1 | 1382734048 | Product Four Two | brand1 | One |

| 1479580 | 62 | 2 | 1382734834 | Product Nine Seven | brand2 | Ten |

| 1479585 | 357 | 2 | 1382734834 | Product Six Two | brand2 | Five |

| 477383 | 908 | 2 | 1382733871 | Product Ten Three | brand2 | Eight |

| 2478429 | 425 | 3 | 1382734839 | Product Three Ten | brand3 | Five |

| 477456 | 519 | 3 | 1382734818 | Product Ten One | brand3 | Six |

| 477521 | 190 | 3 | 1382734403 | Product Three Two | brand3 | Five |

| 2478459 | 931 | 4 | 1382734850 | Product One Two | brand4 | Five |

| 1479718 | 891 | 4 | 1382734065 | Product Two One | brand4 | Three |

| 2478514 | 106 | 4 | 1382733868 | Product Six Seven | brand4 | One |

| 477297 | 991 | 5 | 1382734844 | Product Five Eight | brand5 | Four |

| 2479053 | 648 | 5 | 1382733994 | Product Six One | brand5 | Nine |

| 1480798 | 250 | 5 | 1382732121 | Product One Seven | brand5 | Eight |
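
GROUP <N> BY returns up to N rows per group instead of one. A minimal Python sketch of that "top N per group" idea follows; it is an emulation under the assumption that rows inside a group are ranked by a single key in descending order, as in the query above. The name `group_n_by` and the sample data are hypothetical.

```python
from collections import defaultdict


def group_n_by(rows, n, group_key, within_key):
    """Return at most n rows per group, newest/highest within_key first.

    Emulates GROUP <n> BY <group_key> WITHIN GROUP ORDER BY <within_key> DESC
    ORDER BY <group_key> ASC. Illustrative only.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    result = []
    for group in sorted(groups):
        ranked = sorted(groups[group], key=lambda r: r[within_key], reverse=True)
        result.extend(ranked[:n])
    return result


# Hypothetical sample rows: (id, brand_id, added).
SAMPLE_ROWS = [
    {"id": 1, "brand_id": 1, "added": 100},
    {"id": 2, "brand_id": 1, "added": 300},
    {"id": 3, "brand_id": 1, "added": 200},
    {"id": 4, "brand_id": 2, "added": 50},
]
```

Calling `group_n_by(SAMPLE_ROWS, 2, "brand_id", "added")` keeps the two most recent rows of brand 1 and the single row of brand 2.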

Page 11: Fulltext engine for non fulltext searches

Using HAVING

mysql> SELECT *,COUNT(*) FROM facetdemo where property2 = 190 and price>900 GROUP BY brand_id HAVING COUNT(*)>1000;

+-------+-------+----------+-----------+------------+-------------------+------------+----------+----------+

| id | price | brand_id | property2 | added | title | brand_name | property | count(*) |

+-------+-------+----------+-----------+------------+-------------------+------------+----------+----------+

| 2566 | 934 | 24 | 190 | 1382615816 | Product One Three | brand24 | Six | 1023 |

| 4807 | 905 | 11 | 190 | 1382616392 | Product Five Six | brand11 | Eight | 1023 |

| 5539 | 985 | 44 | 190 | 1382616552 | Product Ten Four | brand44 | Three | 1009 |

| 7655 | 912 | 10 | 190 | 1382617104 | Product Four Five | brand10 | Ten | 1028 |

| 16837 | 968 | 20 | 190 | 1382619365 | Product One Nine | brand20 | Five | 1015 |

+-------+-------+----------+-----------+------------+-------------------+------------+----------+----------+

5 rows in set (0.17 sec)
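
HAVING filters groups after aggregation, whereas WHERE filters rows before it. The short Python sketch below shows that ordering of steps; the helper name and the tiny data set are invented for illustration.

```python
from collections import Counter


def groups_having_count_above(rows, group_key, threshold):
    """Count rows per group, then keep only groups with COUNT(*) > threshold."""
    counts = Counter(row[group_key] for row in rows)
    return {group: count for group, count in counts.items() if count > threshold}


# Hypothetical rows that already passed the WHERE clause.
SAMPLE_ROWS = [{"brand_id": 24}] * 3 + [{"brand_id": 11}] * 2 + [{"brand_id": 44}]
```

Here `groups_having_count_above(SAMPLE_ROWS, "brand_id", 1)` keeps brands 24 and 11 and drops brand 44, which has only one row.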

Page 12: Fulltext engine for non fulltext searches

Comparing simple queries

+-----------+---------+-------+--------+------------+
| Operation | Example | MySQL | Sphinx | Difference |
+-----------+---------+-------+--------+------------+
| Filter by integer, group by integer | WHERE property_int = 190 GROUP BY brand_id | 0.32 | 0.14 | 2.2x |
| Group by integer, order by count(*) | GROUP BY brand_id ORDER BY COUNT(*) DESC | 1.76 | 0.53 | 3.3x |
| Filter by integer, order by timestamp | WHERE brand_id=20 ORDER BY added ASC | 0.00 | 0.14 | 0 |
| Filter by integer, order by timestamp and integer column | WHERE brand_id=20 ORDER BY added DESC, property_int ASC | 0.31 | 0.19 | 1.5x |
+-----------+---------+-------+--------+------------+

Page 13: Fulltext engine for non fulltext searches

Using IF comparison

mysql> SELECT COUNT(*), IF( property2=270 OR price<80, 1,

IF(property2=280 OR price> 900,2,3)

) AS expr FROM facetdemo GROUP BY expr;

+----------+------+

| count(*) | expr |

+----------+------+

| 7494455 | 3 |

| 1357178 | 2 |

| 1148366 | 1 |

+----------+------+

3 rows in set (1.04 sec)
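
The nested IF() above assigns every row to one of three buckets, and the first matching condition wins. A small Python sketch of the same decision logic is shown below; `sphinx_if` and `bucket` are names invented for this illustration.

```python
def sphinx_if(condition, when_true, when_false):
    """Mimic IF(expr, a, b): return a when expr is true, otherwise b."""
    return when_true if condition else when_false


def bucket(property2, price):
    """Same nesting as the query: bucket 1, else bucket 2, else bucket 3."""
    return sphinx_if(
        property2 == 270 or price < 80,
        1,
        sphinx_if(property2 == 280 or price > 900, 2, 3),
    )
```

A row with `property2 = 280` and `price = 50` lands in bucket 1, not bucket 2, because the outer condition is checked first.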

Page 14: Fulltext engine for non fulltext searches

Using INTERVAL for segmentation

mysql> SELECT id, price, INTERVAL(price,0,300,600,900) AS pricerange, COUNT(*) FROM facetdemo WHERE brand_id=27 GROUP BY pricerange ORDER BY pricerange ASC;

+------+-------+------------+----------+

| id | price | pricerange | count(*) |

+------+-------+------------+----------+

| 219 | 196 | 1 | 58283 |

| 46 | 467 | 2 | 60535 |

| 109 | 667 | 3 | 60789 |

| 5 | 962 | 4 | 20285 |

+------+-------+------------+----------+

4 rows in set (0.19 sec)
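
INTERVAL(expr, p1, p2, ...) maps a value to the index of the segment it falls into: 0 below the first point, 1 between the first and second, and so on. Assuming that lower bounds are inclusive, the same mapping can be sketched in Python with the standard `bisect` module. The function name `interval` is chosen here only to mirror the SphinxQL call.

```python
from bisect import bisect_right


def interval(value, *points):
    """Return how many of the ascending points are <= value.

    With points (0, 300, 600, 900): values below 0 give 0,
    0..299 give 1, 300..599 give 2, 600..899 give 3, 900+ give 4.
    """
    return bisect_right(points, value)
```

Applied to the prices in the result above, `interval(196, 0, 300, 600, 900)` gives 1 and `interval(962, 0, 300, 600, 900)` gives 4, matching the `pricerange` column.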

Page 15: Fulltext engine for non fulltext searches

Geo spatial in Sphinx

GEODIST(lat1, lon1, lat2, lon2, { option=value, ... })

o in { deg | degrees | rad | radians}

o out {m | meters | km | ft | mi | miles }

o method {haversine | adaptive}

haversine: high precision, expensive

adaptive: good precision, cheaper (polar flat-Earth algorithm)
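
For reference, the haversine formula named above can be written in a few lines of Python. This is the textbook formula, not Sphinx's source code; the Earth radius constant is an assumption of this sketch and may differ from the value Sphinx uses. Inputs are in radians and the output is in kilometers, which corresponds to `{in=rad, out=km}`.

```python
import math

# Mean Earth radius in km; an assumption of this sketch.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

The trigonometric calls per row are what make this method comparatively expensive; a flat-Earth approximation trades some precision for fewer of them.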

Page 16: Fulltext engine for non fulltext searches

• POLY2D(x1,y1,x2,y2,x3,y3, …)

• GEOPOLY2D (lat1,lng1,lat2,lng2,lat3,lng3,...)

• lat/lng in degrees

• CONTAINS(polygon, x, y)

mysql> SELECT *, CONTAINS(GEOPOLY2D(40.95164274496,-76.88583678218,41.188446201688,-73.203723511772,39.900666261352,-74.171833538046,40.059260979044,-76.301076056469),latitude_deg,longitude_deg) AS inside FROM geodemo WHERE inside=1 LIMIT 0,100;
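
A point-in-polygon test of the kind CONTAINS() performs can be sketched with the classic ray-casting algorithm. The Python below treats coordinates as a flat plane, so it is closer in spirit to POLY2D than to GEOPOLY2D, and it is not a description of Sphinx internals. The vertices are the ones from the query above, taken as (lat, lon) pairs; the function name is made up.

```python
def point_in_polygon(x, y, vertices):
    """Ray casting: count how many polygon edges a ray from (x, y) crosses."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Polygon from the slide, as (latitude, longitude) pairs in degrees.
SLIDE_POLYGON = [
    (40.95164274496, -76.88583678218),
    (41.188446201688, -73.203723511772),
    (39.900666261352, -74.171833538046),
    (40.059260979044, -76.301076056469),
]
```

For instance, `point_in_polygon(40.5, -75.0, SLIDE_POLYGON)` is true, while a point at latitude 45.0 on the same longitude falls outside.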

Page 17: Fulltext engine for non fulltext searches

Special data columns

Page 18: Fulltext engine for non fulltext searches

Multi value attribute (MVA)

• a column that holds a set of integers

| Price (float) | Categories (MVA) |
| 199.99 | 24, 128, 300 |
| ... | ... |

Page 19: Fulltext engine for non fulltext searches

MVA with multiple selection

mysql> SELECT id,price,brand_id,categories FROM facetdemo WHERE categories IN (13,14);

+------+-------+----------+------------+

| id | price | brand_id | categories |

+------+-------+----------+------------+

| 1 | 874 | 47 | 13 |

| 2 | 712 | 38 | 11,14 |

| 9 | 113 | 25 | 12,14 |

| 17 | 440 | 46 | 13,15 |

| 19 | 206 | 50 | 13,17 |

| 21 | 76 | 28 | 7,10,13 |

| 22 | 363 | 21 | 13,17,20 |

...
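
As the result shows, `categories IN (13,14)` matches a document when at least one of its MVA values is in the list. A minimal Python sketch of that "any value matches" rule, with an invented helper name:

```python
def mva_in(values, wanted):
    """True when the MVA shares at least one value with the wanted list."""
    return not set(values).isdisjoint(wanted)
```

So a document with categories `11,14` matches `IN (13,14)`, while one with categories `7,10` does not.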

Page 20: Fulltext engine for non fulltext searches

Grouping on MVA

mysql> SELECT id,price,brand_id,categories,GROUPBY(),COUNT(*) FROM facetdemo GROUP BY categories;

+------+-------+----------+------------+-----------+----------+

| id | price | brand_id | categories | groupby() | count(*) |

+------+-------+----------+------------+-----------+----------+

| 1 | 874 | 47 | 13 | 13 | 362931 |

| 2 | 712 | 38 | 11,14 | 14 | 185023 |

| 2 | 712 | 38 | 11,14 | 11 | 329874 |

| 3 | 773 | 7 | 12,16 | 16 | 143837 |

| 3 | 773 | 7 | 12,16 | 12 | 349446 |

| 4 | 803 | 31 | 6,9 | 9 | 267583 |

| 4 | 803 | 31 | 6,9 | 6 | 184772 |

...
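
Grouping on an MVA puts a document into one group per value it holds, which is why ids 2, 3 and 4 each appear twice above. The Python sketch below reproduces that counting rule on a few hypothetical rows; `group_by_mva` is a name made up for this example.

```python
from collections import Counter


def group_by_mva(rows, mva_key):
    """Count documents per MVA value; a document counts once for each value."""
    counts = Counter()
    for row in rows:
        for value in set(row[mva_key]):
            counts[value] += 1
    return counts


# Hypothetical rows shaped like the first rows of the result above.
SAMPLE_ROWS = [
    {"id": 1, "categories": [13]},
    {"id": 2, "categories": [11, 14]},
    {"id": 9, "categories": [12, 14]},
]
```

With these rows, category 14 gets a count of 2 because both document 2 and document 9 contain it.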

Page 21: Fulltext engine for non fulltext searches

Going further: JSON

• starting with 2.1, Sphinx supports JSON documents

• useful for

o unstructured data

o complex one-to-many relations

{

"id": 1,

"gid": 2,

"title": "some title",

"tags":

[ "tag1", "tag2", "tag3" ],

"property": [

{

"name": "color",

"value": "blue"

},

{

"name": "weight",

"value": 2.56

}

]

}

Page 22: Fulltext engine for non fulltext searches

JSON attributes

• filter, sort and group

• JSON/MVA array functions:

LENGTH(), LEAST(), GREATEST()

• Advanced JSON search in array of objects:

ANY(), ALL(), INDEXOF()

Page 23: Fulltext engine for non fulltext searches

Advanced searching in JSON

document:

id: 1011

title: Hotel Sky

myjson: {
  "offers": [
    { "type": 3, "start": start_timestamp, "end": end_timestamp },
    { "type": 1, "start": start_timestamp, "end": end_timestamp }
  ]
}

SELECT *,ANY (

( item.type = 1 AND

item.start > my_start_timestamp AND

item.end < my_end_timestamp )

FOR item IN myjson.offers

) AS condition

FROM index

WHERE condition =1

Page 24: Fulltext engine for non fulltext searches

• ANY(cond FOR var IN json.array)

o true if at least one element matches the condition

• ALL(cond FOR var IN json.array)

o true if all elements match the condition

• INDEXOF(cond FOR var IN json.array)

o returns the index of the first element that matches the condition
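
The three functions differ only in how they combine the per-element results. The Python emulation below makes that explicit. It is a sketch: the function names are invented, returning -1 when nothing matches and treating an empty array as "all match" are conventions of this sketch that should be checked against the Sphinx documentation, and the offer timestamps are made-up numbers.

```python
def any_item(cond, array):
    """ANY: true if at least one element satisfies cond."""
    return any(cond(item) for item in array)


def all_items(cond, array):
    """ALL: true if every element satisfies cond."""
    return all(cond(item) for item in array)


def index_of(cond, array):
    """INDEXOF: position of the first matching element, -1 if none (sketch convention)."""
    for position, item in enumerate(array):
        if cond(item):
            return position
    return -1


# Hypothetical offers for one hotel document.
OFFERS = [
    {"type": 3, "start": 100, "end": 200},
    {"type": 1, "start": 150, "end": 180},
]


def offer_matches(item):
    """Type-1 offer that lies strictly inside the requested window (120, 190)."""
    return item["type"] == 1 and item["start"] > 120 and item["end"] < 190
```

For these offers `any_item(offer_matches, OFFERS)` is true, `all_items(offer_matches, OFFERS)` is false, and `index_of(offer_matches, OFFERS)` points at the second element (index 1).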

Page 25: Fulltext engine for non fulltext searches

Fulltext to speed up non-fulltext

Page 26: Fulltext engine for non fulltext searches

Without fulltext match:

SELECT *,(...) AS heavy_expr

WHERE attr=x AND heavy_expr=1

The query does a full scan and computes the heavy expression for the whole collection.

With fulltext match:

SELECT *,(...) AS heavy_expr

WHERE MATCH('attrx') AND heavy_expr=1

The heavy expression is computed only on the result set returned by the fulltext match.
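
The effect described on this slide can be illustrated by counting how often an expensive expression is evaluated in each case. The Python sketch below builds a toy inverted index over a made-up collection; it models the idea only and says nothing about how Sphinx schedules filters internally. All names and data are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for token in doc["text"].split():
            index[token].add(doc_id)
    return index


def counted(fn):
    """Wrap fn so the number of calls can be inspected afterwards."""
    def wrapper(doc):
        wrapper.calls += 1
        return fn(doc)
    wrapper.calls = 0
    return wrapper


def query_with_attribute_filter(docs, brand_id, heavy_expr):
    # Full scan: the expression is evaluated for every document.
    return sorted(
        doc_id for doc_id, doc in docs.items()
        if heavy_expr(doc) and doc["brand_id"] == brand_id
    )


def query_with_match(docs, index, keyword, heavy_expr):
    # Fulltext match first: only documents in the keyword's posting list
    # ever reach the expression.
    return sorted(
        doc_id for doc_id in index.get(keyword, ())
        if heavy_expr(docs[doc_id])
    )


# Made-up collection: 1000 documents, 10 of which belong to brand 20.
DOCS = {
    i: {
        "brand_id": 20 if i % 100 == 0 else 1,
        "text": "brand20" if i % 100 == 0 else "brand1",
        "price": i,
    }
    for i in range(1, 1001)
}
```

Both queries return the same documents, but wrapping the expression with `counted` shows it running once per document in the full scan and only once per posting-list entry after the match. This is also why the attribute value is indexed as a keyword such as `brand20`.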

Page 27: Fulltext engine for non fulltext searches

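Sphinx with FT filter

+-----------+---------+-------+---------------+----------------+
| Operation | Example | MySQL | Sphinx w/o FT | Sphinx with FT |
+-----------+---------+-------+---------------+----------------+
| Filter by integer, order by timestamp and integer column | WHERE brand_id=20 ORDER BY added DESC, property_int ASC | 0.31 | 0.19 | |
| Fulltext filter, order by timestamp and integer column | WHERE MATCH('brand20') ORDER BY added DESC, property_int ASC | | | 0.13 |
+-----------+---------+-------+---------------+----------------+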

Page 28: Fulltext engine for non fulltext searches

Speed up geo spatial with fulltext

• example: find items within a 10 km radius of a point in New York City. Speed-up: search only items belonging to New York state.

mysql> SELECT *, GEODIST(0.710011075352, -1.2918035709982, latitude, longitude, {in=rad,out=km,method=adaptive}) AS distance FROM geodemo WHERE distance < 10 ORDER BY distance ASC LIMIT 0,10;

10 rows in set (0.17 sec)

mysql> SELECT *, GEODIST(0.710011075352, -1.2918035709982, latitude, longitude, {in=rad,out=km,method=adaptive}) AS distance FROM geodemo WHERE MATCH('@state_code NY') AND distance < 10 ORDER BY distance ASC LIMIT 0,10;

10 rows in set (0.03 sec)

Page 29: Fulltext engine for non fulltext searches

Questions?


http://www.sphinxsearch.com