FEASIBLE: A Feature-Based SPARQL Benchmark Generation Framework Muhammad Saleem 1 , Qaiser Mehmood 2 , Axel-Cyrille Ngonga Ngomo 1 http://feasible.aksw.org/ 1 Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany 2 Insight Center for Data Analytics, National University of Ireland, Galway International Semantic Web Conference, Bethlehem, USA, 2015 07/05/2022 1

Apr 14, 2017


Muhammad Saleem
Page 1: FEASIBLE-Benchmark-Framework-ISWC2015

05/03/2023 1

FEASIBLE: A Feature-Based SPARQL Benchmark Generation Framework

Muhammad Saleem¹, Qaiser Mehmood², Axel-Cyrille Ngonga Ngomo¹

http://feasible.aksw.org/

¹ Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany
² Insight Center for Data Analytics, National University of Ireland, Galway

International Semantic Web Conference, Bethlehem, USA, 2015

Page 2: FEASIBLE-Benchmark-Framework-ISWC2015

Triple Store Benchmarks
• Synthetic benchmarks
  • Make use of synthetic queries and/or data
  • Benchmarks of different data sizes possible
  • Suitable for testing scalability
  • Often fail to reflect reality
  • For example, LUBM, SP2Bench, BSBM, WatDiv, etc.
• Query log benchmarks
  • Make use of real queries from query logs
  • Can be closer to reality
  • Can be used with different data sizes
  • Scalability can be tested
  • For example, DBPSB, FEASIBLE

Page 3: FEASIBLE-Benchmark-Framework-ISWC2015

DBpedia SPARQL Benchmark
• Based on real DBpedia query logs
• Benchmarks of different data sizes possible
• Suitable for testing scalability
• Only considers SPARQL SELECT queries
• Does not consider important query features
  • For example, number of join vertices, triple pattern selectivities
• Not customizable for given use cases or needs of an application

Page 4: FEASIBLE-Benchmark-Framework-ISWC2015

FEASIBLE SPARQL Benchmark
• Can be applied to any SPARQL query log
• Considers SPARQL SELECT, ASK, DESCRIBE, and CONSTRUCT queries
• Considers important query features
  • For example, number of join vertices, triple pattern selectivities, query runtime, result set size, number of BGPs, mean join vertex degree, number of triple patterns, etc.
• Customizable for given use cases or needs of an application

Page 5: FEASIBLE-Benchmark-Framework-ISWC2015

FEASIBLE SPARQL Benchmark

• Dataset cleaning
• Feature vectors and normalization
• Selection of exemplars
• Selection of benchmark queries

Page 6: FEASIBLE-Benchmark-Framework-ISWC2015

Dataset Cleaning
• Remove syntactically incorrect queries
• Remove queries with zero result size
• This is an optional step
  • Not a theoretical necessity
  • Leads to practically reliable benchmarks
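The cleaning step amounts to a filter over the log. The sketch below is only an illustration, not FEASIBLE's own code: `clean_log`, `is_valid` and `result_size` are hypothetical names, and the two callbacks stand in for a real SPARQL parser and for executing the query against the dataset.

```python
from typing import Callable, Iterable, List


def clean_log(
    queries: Iterable[str],
    is_valid: Callable[[str], bool],
    result_size: Callable[[str], int],
) -> List[str]:
    """Keep only queries that parse and return at least one result."""
    cleaned = []
    for query in queries:
        if not is_valid(query):
            continue  # syntactically incorrect query
        if result_size(query) == 0:
            continue  # zero-result query
        cleaned.append(query)
    return cleaned


# Toy stand-ins (assumptions for illustration only).
toy_sizes = {"SELECT ?s WHERE { ?s ?p ?o }": 13, "ASK { ?s a ?missing }": 0}
kept = clean_log(
    ["SELECT ?s WHERE { ?s ?p ?o }", "ASK { ?s a ?missing }", "SELEKT broken"],
    is_valid=lambda q: q.split()[0] in {"SELECT", "ASK", "DESCRIBE", "CONSTRUCT"},
    result_size=lambda q: toy_sizes.get(q, 0),
)
print(kept)
```

In practice the validity check would be a full parse and the result size would come from running the query once against the store that holds the logged dataset.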

Page 7: FEASIBLE-Benchmark-Framework-ISWC2015

Feature Vectors and Normalization

SELECT DISTINCT ?entita ?nome
WHERE {
  ?entita rdf:type dbo:VideoGame .
  ?entita rdfs:label ?nome
  FILTER regex(?nome, "konami", "i")
}
LIMIT 100

Query Type: SELECT
Result Size: 13
Basic Graph Patterns (BGPs): 1
Triple Patterns: 2
Join Vertices: 1
Mean Join Vertices Degree: 2.0
Mean Triple Pattern Selectivity: 0.01709761619798973
UNION: No
DISTINCT: Yes
ORDER BY: No
REGEX: Yes
LIMIT: Yes
OFFSET: No
OPTIONAL: No
FILTER: Yes
GROUP BY: No
Runtime (ms): 65

Feature Vector:
13 1 2 1 2 0.017 0 1 0 1 1 0 0 1 0 65

Normalized Feature Vector:
0.11 0.53 0.67 0.14 0.08 0.017 0 1 0 1 1 0 0 1 0 0.14
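The slide does not say how the normalized vector is obtained. A minimal sketch, assuming each feature is divided by its maximum value over the whole log so that every vector falls inside the unit hypercube; the function name `normalize` and the second vector are made up for illustration.

```python
from typing import List, Sequence


def normalize(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Divide each feature by its maximum over the log (assumed scheme)."""
    maxima = [max(column) for column in zip(*vectors)]
    return [
        # A feature whose maximum is 0 stays 0 to avoid dividing by zero.
        [value / top if top else 0.0 for value, top in zip(vector, maxima)]
        for vector in vectors
    ]


# (result size, triple patterns, runtime in ms); the second row is invented.
log_vectors = [[13, 2, 65], [130, 3, 650]]
normalized = normalize(log_vectors)
print(normalized)
```

Binary features such as DISTINCT or FILTER are unaffected by this scheme, which matches the 0/1 entries that appear unchanged in the normalized vector above.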

Page 8: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of exemplars

Plot feature vectors in a multidimensional space

[Figure: scatter plot of the query feature vectors, Feature 1 against Feature 2]

Query  Feature 1  Feature 2
Q1     0.2        0.2
Q2     0.5        0.3
Q3     0.8        0.3
Q4     0.9        0.1
Q5     0.5        0.5
Q6     0.2        0.7
Q7     0.1        0.8
Q8     0.13       0.65
Q9     0.9        0.5
Q10    0.1        0.5

Suppose we need a benchmark of 3 queries

Page 9: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of exemplars

Calculate average point

[Figure: scatter plot of the query feature vectors]

Page 10: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of exemplars

Select point of minimum Euclidean distance to avg. point

[Figure: scatter plot of the query feature vectors]

*Red is our first exemplar
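The first two steps (average point, then nearest query) can be sketched with the ten example vectors from the table. The names `points` and `average_point` are illustrative, and `math.dist` needs Python 3.8 or later. On this data the computation picks Q5; the slide itself only marks the point in red.

```python
from math import dist

# Normalized (Feature 1, Feature 2) values from the example table.
points = {
    "Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3), "Q4": (0.9, 0.1),
    "Q5": (0.5, 0.5), "Q6": (0.2, 0.7), "Q7": (0.1, 0.8), "Q8": (0.13, 0.65),
    "Q9": (0.9, 0.5), "Q10": (0.1, 0.5),
}


def average_point(vectors):
    """Component-wise mean of a collection of equally long vectors."""
    vectors = list(vectors)
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


avg = average_point(points.values())
# The first exemplar is the query closest (Euclidean) to the average point.
first_exemplar = min(points, key=lambda name: dist(points[name], avg))
print(avg, first_exemplar)
```

The average point is roughly (0.433, 0.455), and Q5 at (0.5, 0.5) lies about 0.08 away from it, clearly closer than any other query.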

Page 11: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of exemplars

Select the point that is farthest from the exemplars

[Figure: scatter plot of the query feature vectors]

Page 12: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of exemplars

[Figure: scatter plot of the query feature vectors]

Page 13: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of exemplars

Select the point that is farthest from the exemplars

[Figure: scatter plot of the query feature vectors]

Page 14: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of exemplars

[Figure: scatter plot of the query feature vectors]
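The repeated "farthest point" step can be sketched as a loop. The slides do not say how distance to several exemplars is aggregated; this sketch assumes the sum of Euclidean distances to all exemplars chosen so far. `select_exemplars` is a hypothetical name, and the first exemplar (Q5, the query nearest the average point on this data) is passed in rather than recomputed.

```python
from math import dist

# Normalized (Feature 1, Feature 2) values from the example table.
points = {
    "Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3), "Q4": (0.9, 0.1),
    "Q5": (0.5, 0.5), "Q6": (0.2, 0.7), "Q7": (0.1, 0.8), "Q8": (0.13, 0.65),
    "Q9": (0.9, 0.5), "Q10": (0.1, 0.5),
}


def select_exemplars(points, first, count):
    """Greedily add the query farthest from the exemplars picked so far."""
    exemplars = [first]
    while len(exemplars) < count:
        candidates = [name for name in points if name not in exemplars]
        # Assumption: "farthest" means largest summed distance to all exemplars.
        farthest = max(
            candidates,
            key=lambda name: sum(dist(points[name], points[e]) for e in exemplars),
        )
        exemplars.append(farthest)
    return exemplars


exemplars = select_exemplars(points, first="Q5", count=3)
print(exemplars)
```

On the example data this yields Q5, Q4 and Q7. Using the minimum distance to any exemplar instead of the sum gives the same three queries here, so the example does not distinguish the two readings.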

Page 15: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of Benchmark Queries

Calculate the distance from Q1 to each exemplar

[Figure: scatter plot of the query feature vectors]

Page 16: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of Benchmark Queries

Assign Q1 to the exemplar at minimum distance

[Figure: scatter plot of the query feature vectors]

Page 17: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of Benchmark Queries

Repeat the process for Q2

[Figure: scatter plot of the query feature vectors]

Page 18: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of Benchmark Queries

Repeat the process for Q3

[Figure: scatter plot of the query feature vectors]

Page 19: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of Benchmark Queries

Repeat the process for Q6

[Figure: scatter plot of the query feature vectors]

Page 20: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of Benchmark Queries

Repeat the process for Q8

[Figure: scatter plot of the query feature vectors]

Page 21: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of Benchmark Queries

Repeat the process for Q9

[Figure: scatter plot of the query feature vectors]

Page 22: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of Benchmark Queries

Repeat the process for Q10

[Figure: scatter plot of the query feature vectors]
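The assignment walkthrough for Q1 through Q10 is a nearest-exemplar lookup. In the sketch below the exemplars Q4, Q5 and Q7 are an assumption: they are what the selection procedure yields on the example table, and they are also the three queries the walkthrough skips. `nearest_exemplar` is an illustrative name.

```python
from math import dist

# Assumed exemplars (the queries the walkthrough does not assign).
exemplars = {"Q4": (0.9, 0.1), "Q5": (0.5, 0.5), "Q7": (0.1, 0.8)}

# Remaining queries from the example table.
others = {
    "Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3), "Q6": (0.2, 0.7),
    "Q8": (0.13, 0.65), "Q9": (0.9, 0.5), "Q10": (0.1, 0.5),
}


def nearest_exemplar(point, exemplars):
    """Name of the exemplar with the smallest Euclidean distance to point."""
    return min(exemplars, key=lambda name: dist(point, exemplars[name]))


assignment = {name: nearest_exemplar(p, exemplars) for name, p in others.items()}
for name, exemplar in assignment.items():
    print(name, "->", exemplar)
```

One caveat on this toy data: Q9 at (0.9, 0.5) is mathematically equidistant (0.4) from Q4 and Q5, so its cluster depends on how ties are broken. All other queries have a unique nearest exemplar.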

Page 23: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of Benchmark Queries

Calculate the average across each cluster

[Figure: scatter plot of the query feature vectors]

Page 24: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of Benchmark Queries

Calculate the distance of each point in the cluster to the average

[Figure: scatter plot of the query feature vectors]

Page 25: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of Benchmark Queries

Select the minimum distance query as the final benchmark query from that cluster

[Figure: scatter plot of the query feature vectors]

Black, i.e., Q2, is the final selected query from the yellow cluster

Page 26: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of Benchmark Queries

Select the minimum distance query as the final benchmark query from that cluster

[Figure: scatter plot of the query feature vectors]

Black, i.e., Q8, is the final selected query from the brown cluster

Page 27: FEASIBLE-Benchmark-Framework-ISWC2015

Selection of Benchmark Queries

Select the minimum distance query as the final benchmark query from that cluster

[Figure: scatter plot of the query feature vectors]

Black, i.e., Q3, is the final selected query from the green cluster

Our benchmark queries are Q2, Q3, and Q8
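The final step (cluster average, then nearest member) can be checked against the stated result. The cluster membership below is an assumption reconstructed from the example table and the colors named in the slides, since the figures did not survive; `representative` is an illustrative name.

```python
from math import dist

# Normalized (Feature 1, Feature 2) values from the example table.
points = {
    "Q1": (0.2, 0.2), "Q2": (0.5, 0.3), "Q3": (0.8, 0.3), "Q4": (0.9, 0.1),
    "Q5": (0.5, 0.5), "Q6": (0.2, 0.7), "Q7": (0.1, 0.8), "Q8": (0.13, 0.65),
    "Q9": (0.9, 0.5), "Q10": (0.1, 0.5),
}

# Assumed clusters (reconstructed, not stated explicitly in the slides).
clusters = {
    "yellow": ["Q1", "Q2", "Q5"],
    "green": ["Q3", "Q4", "Q9"],
    "brown": ["Q6", "Q7", "Q8", "Q10"],
}


def representative(members, points):
    """Member closest to the component-wise average of its cluster."""
    vectors = [points[name] for name in members]
    centroid = tuple(sum(column) / len(vectors) for column in zip(*vectors))
    return min(members, key=lambda name: dist(points[name], centroid))


benchmark = [representative(members, points) for members in clusters.values()]
print(benchmark)  # ['Q2', 'Q3', 'Q8']
```

Note that the selected query need not be the exemplar that seeded the cluster: in all three assumed clusters a different member lies closer to the cluster average.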

Page 28: FEASIBLE-Benchmark-Framework-ISWC2015

Experimental Setup
• Composite Error Estimation
  • L is the query log, B is the benchmark and K is the set of all features
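The error formula itself did not survive in this transcript. The sketch below is an assumption, not a quotation of the slide: it takes the composite error to be the harmonic mean of two terms, the mean squared difference of per-feature means and of per-feature standard deviations between L and B, averaged over the features in K. The use of the population standard deviation is also an assumption.

```python
from statistics import mean, pstdev


def composite_error(log, benchmark):
    """Harmonic mean of the mean-error and std-error terms (assumed form)."""
    log_columns = list(zip(*log))
    bench_columns = list(zip(*benchmark))
    k = len(log_columns)  # |K|, the number of features
    e_mu = sum(
        (mean(l) - mean(b)) ** 2 for l, b in zip(log_columns, bench_columns)
    ) / k
    e_sigma = sum(
        (pstdev(l) - pstdev(b)) ** 2 for l, b in zip(log_columns, bench_columns)
    ) / k
    if e_mu + e_sigma == 0:
        return 0.0  # benchmark matches the log exactly on both statistics
    return 2 * e_mu * e_sigma / (e_mu + e_sigma)


# Toy normalized vectors (invented for illustration).
toy_log = [[0.0, 0.0], [1.0, 1.0]]
toy_benchmark = [[1.0, 1.0]]
error = composite_error(toy_log, toy_benchmark)
print(error)
```

Under this reading a lower value means the benchmark preserves the per-feature distribution of the log more closely, which is the sense in which the comparison with DBPSB is reported.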

Page 29: FEASIBLE-Benchmark-Framework-ISWC2015

Experimental Setup
• Virtuoso Open-Source Edition version 7.2
  • NumberOfBuffers = 680000, MaxDirtyBuffers = 500000
• Sesame version 2.7.8
  • Tomcat 7 as HTTP interface and native storage layout
  • Set the spoc, posc, opsc indices to those specified in the native storage configuration
  • Java heap size set to 6GB
• Jena-TDB (Fuseki) version 2.0
  • Java heap size set to 6GB
• OWLIM-SE version 6.1
  • Tomcat 7.0 as HTTP interface
  • Set the entity index size to 45,000,000 and enabled the predicate list
  • Rule set was empty and the Java heap size was set to 6GB
• We configured all triple stores to use 6GB of memory and used default values otherwise.

Page 30: FEASIBLE-Benchmark-Framework-ISWC2015

Comparison of Composite Error

FEASIBLE’s composite error is 54.9% less than that of DBPSB

Page 31: FEASIBLE-Benchmark-Framework-ISWC2015

Comparison of Triple Stores: QpS

[Figure: four bar charts of QpS for Sesame, Virtuoso, OWLIM-SE and Fuseki on SWDF and DBpedia, one panel each for SPARQL ASK, SPARQL CONSTRUCT, SPARQL DESCRIBE and SPARQL SELECT]

Page 32: FEASIBLE-Benchmark-Framework-ISWC2015

Comparison of Triple Stores: Mix Queries

[Figure: four bar charts for Sesame, Virtuoso, OWLIM-SE and Fuseki: QMpH on SWDF, QMpH on DBpedia, QpS on SWDF and QpS on DBpedia]

Page 33: FEASIBLE-Benchmark-Framework-ISWC2015

Rank-wise Ranking of Triple Stores
All values are in percentages

• No system is the sole winner or loser for a particular rank
• Virtuoso mostly lies in the higher ranks, i.e., ranks 1 and 2 (68.29%)
• Fuseki is mostly in the middle ranks, i.e., ranks 2 and 3 (65.14%)
• OWLIM-SE is usually on the slower side, i.e., ranks 3 and 4 (60.86%)
• Sesame is either fast or slow: rank 1 (31.71% of the queries) and rank 4 (23.14%)

Page 34: FEASIBLE-Benchmark-Framework-ISWC2015


Try Yourself

http://feasible.aksw.org/