Page 1: Stage3Raj.ppt

An open source DBMS for handheld devices

by Rajkumar Sen

IIT Bombay

Under the guidance of

Prof. Krithi Ramamritham

Page 2: Stage3Raj.ppt

04/13/23 An open source DBMS for handheld devices

Outline

• Introduction
• Storage Management
• Query Processing
• Other issues
• Performance Evaluation
• Conclusions

Page 3: Stage3Raj.ppt

Introduction

A resource constrained device
– A small computer with limited resources
– e.g. cellphones, Simputer, Palm devices

Data management is important
– Increasing number of applications
– They deal with a fair amount of data
– Complex queries involving joins and aggregates
– Atomicity and durability for data consistency
– Ease of application development

A device resident DBMS is needed

Page 4: Stage3Raj.ppt

Introduction

Need for Synchronization
– Data from remote server downloaded on the device
– Updates at both places
– Common data needs to be synchronized

Challenges
– Limited computing power and main memory
– Limited stable storage
– Resources are not uniform across devices
– Need a system that can do the best for every device

Page 5: Stage3Raj.ppt

Introduction

Storage Management
– Reduce storage cost to a minimum
– Limited storage could preclude any additional index
– Data model should try to incorporate some index information

Query Processing
– Memory limits the query processing capabilities
– Minimum-memory algorithms in existing systems do not work well for complex joins and aggregates
– Need algorithms that create in-memory indices and save aggregate values
– Optimal memory allocation among operators

Page 6: Stage3Raj.ppt

Storage Management

Aim at compactness in representation of data

Existing storage models
– Flat Storage
  • Tuples are stored sequentially. Ensures access locality but consumes space.
– Pointer-based Domain Storage
  • Values partitioned into domains, which are sets of unique values
  • Tuples reference the attribute value by means of pointers
  • One domain shared among multiple attributes

In Domain Storage, a pointer of size p (typically 4 bytes) points to the domain value. Can we further reduce the storage cost?

Page 7: Stage3Raj.ppt

Storage Management

ID Storage:

– An identifier for each of the domain values

– Identifier is the ordinal value in the domain table

– Store the identifier instead of the pointer

– Use the identifier as an offset into the domain table

– Extendable IDs, length of the identifier grows and shrinks depending on the number of domain values
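A minimal Python sketch of the ID Storage idea listed above, assuming a single attribute column. The class name `IdColumn`, its methods and the sample values are hypothetical and are not taken from the system in the slides; the sketch only shows an ordinal identifier being stored per tuple and then used as an offset into the domain table.

```python
class IdColumn:
    """One attribute stored as IDs into a table of unique domain values."""

    def __init__(self):
        self.domain = []   # domain table: position i holds the value with ID i
        self._id_of = {}   # in-memory lookup from a value to its ID
        self.ids = []      # one ID per tuple, in tuple order

    def append(self, value):
        """Store a tuple's attribute value and return the ID that was stored."""
        ident = self._id_of.get(value)
        if ident is None:
            ident = len(self.domain)   # the ordinal position becomes the ID
            self.domain.append(value)
            self._id_of[value] = ident
        self.ids.append(ident)
        return ident

    def value_at(self, row):
        """Resolve a tuple's value by using its ID as an offset."""
        return self.domain[self.ids[row]]


# Made-up sample data, for illustration only.
city = IdColumn()
for v in ["Mumbai", "Pune", "Mumbai", "Delhi", "Pune"]:
    city.append(v)
```

Each distinct value is stored once in `domain`, and every tuple carries only a small integer in `ids`.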

Page 8: Stage3Raj.ppt

Storage Management

D domain values can be distinguished by identifiers of length $\lceil \log_2 D / 8 \rceil$ bytes.

– Starting with 1-byte identifiers, the length grows and shrinks.
– ID values are projected out from the rest of the relation and stored separately, maintaining Positional Indexing.

Why not bit identifiers?
– Storage is byte addressable.
– Packing bit identifiers in bytes increases the storage management complexity.
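The identifier length rule can be sketched as a small helper. The function name `id_width_bytes` is hypothetical, and rounding up to whole bytes with a minimum of 1 byte is an assumption based on the "starting with 1 byte identifiers" remark.

```python
def id_width_bytes(num_values):
    """Smallest whole number of bytes that can distinguish num_values IDs."""
    if num_values < 1:
        raise ValueError("a domain needs at least one value")
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1.
    bits = (num_values - 1).bit_length()
    # Round bits up to whole bytes; assume identifiers never shrink below 1 byte.
    return max(1, (bits + 7) // 8)
```

Under this rule a domain can grow to 256 values before the identifiers need a second byte, and to 65536 values before they need a third.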

Page 9: Stage3Raj.ppt

Storage Management

Figure: ID Storage (Relation R; ID Values 0, 1, 2, ..., n; Domain Values v0, v1, ..., vn; Positional Indexing)

Page 10: Stage3Raj.ppt

Storage Management

Ping Pong Effect
– At the boundaries, ID values are reorganized when the identifier length changes
– Frequent insertions and deletions at the boundaries might result in a lot of reorganization
– This phenomenon should be avoided

No deletion of domain values
– Domain structure means a future insertion might reference the deleted value
– Do not delete a domain value even if it is not referenced

Setting a threshold for deletion of domain values
– Delete only if the number of deletions exceeds a threshold
– Increase the threshold when boundaries are being crossed

Page 11: Stage3Raj.ppt

Storage Management

Primary Key-Foreign Key relationship
– Primary key: a domain in itself
– IDs for primary key values
– Values present in the child table are the corresponding primary key IDs
– Projected foreign key column forms a Join Index

Figure: Primary Key-Foreign Key Join Index (Child Table: Relation S; S.BID Values 0, 1, 2, ..., n; Parent Table: Relation R with values v0, v1, ..., vn)
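A minimal Python sketch of how a projected foreign key column can act as a join index, assuming that the position of a parent row is the ID of its primary key value. The table contents, column names and the function `join_with_index` are made up for illustration and do not come from the system in the slides.

```python
# Parent table R: the position of a row is the ID of its primary key value.
parent_rows = [
    {"key": "d17", "name": "Rao"},    # ID 0
    {"key": "d42", "name": "Mehta"},  # ID 1
    {"key": "d08", "name": "Iyer"},   # ID 2
]

# Child table S: the foreign key column is projected out and holds parent IDs.
child_rows = [{"visit": 1}, {"visit": 2}, {"visit": 3}, {"visit": 4}]
child_fk_ids = [1, 0, 1, 2]           # positional: entry i belongs to child row i


def join_with_index(parent, child, fk_ids):
    """Join child rows to parent rows by following each stored ID as an offset."""
    for child_row, parent_id in zip(child, fk_ids):
        yield {**child_row, **parent[parent_id]}


joined = list(join_with_index(parent_rows, child_rows, child_fk_ids))
```

No comparison of key values is needed here: each child row reaches its parent row with one positional lookup.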

Page 12: Stage3Raj.ppt

Storage Management

ID based Storage wins over Domain Storage when $p > \lceil \log_2 D / 8 \rceil$

– Relations in a small device do not have a very high cardinality
– Above condition is true for most of the data

Advantages
(i) Considerable saving in storage cost
(ii) Efficient join between parent table and child table
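As a worked example of the condition above, take the typical pointer size p = 4 bytes mentioned for Domain Storage and, as an assumed domain size, D = 2155 (the figure listed for Prescription in the experimental setup). Rounding the identifier length up to whole bytes is an assumption made here.

```latex
\[
\left\lceil \frac{\log_2 2155}{8} \right\rceil
  = \left\lceil \frac{11.07\ldots}{8} \right\rceil
  = 2 \text{ bytes} \;<\; p = 4 \text{ bytes}
\]
\[
p > \left\lceil \frac{\log_2 D}{8} \right\rceil \text{ with } p = 4
  \iff \log_2 D \le 24
  \iff D \le 2^{24} = 16{,}777{,}216
\]
```

Under these assumptions each reference costs 2 bytes instead of 4, and the condition only fails for domains with more than about 16.7 million distinct values.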

Page 13: Stage3Raj.ppt

Query Processing

Considerations
– Minimize writes to secondary storage
– Efficient usage of limited main memory
– Read buffer not required
– Main memory as write buffer
– If the read:write ratio is very high, flash memory as write buffer

Need for Left-deep Query Plan
– Reduce materialization; if absolutely necessary, use main memory
– Bushy trees and right-deep trees are ruled out
– Left-deep tree is most suited for pipelined evaluation
– Right operand in a left-deep tree is always a stored relation

Page 14: Stage3Raj.ppt

Query Processing

Need for optimal memory allocation

If nested loop algorithms are used for every operator, the minimum amount of memory is needed to execute the plan
– Nested loop algorithms are inefficient
– Should memory usage be reduced to a minimum at the cost of performance?
– Different devices come with different memory sizes
– Query plans should make efficient use of memory
– Memory must be optimally allocated among all operators

Need to generate the best query execution plan depending on the available memory

Page 15: Stage3Raj.ppt

Query Processing

Operator evaluation schemes
– Different schemes for an operator
– All have different memory usage and cost
– Schemes conform to left-deep tree query plan
– Cost of a scheme is the computation time

Page 16: Stage3Raj.ppt

Query Processing

Schemes for Join
– Nested Loop Join
– Indexed Nested Loop Join
– Hash Join
– Using Join Index

Schemes for aggregation
– Nested Loop aggregation
– Buffered aggregation

Operator schemes implemented using the Iterator Model
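A minimal Python sketch of the iterator model (open / next / close) for a left-deep plan, using nested loop join as the scheme. The class names `Scan` and `NestedLoopJoin`, the helper `run` and the sample rows are hypothetical; the system in the slides is written in C and its actual interfaces are not shown in this transcript.

```python
class Scan:
    """Leaf iterator over a stored relation (a list of dict rows)."""

    def __init__(self, rows):
        self.rows = rows

    def open(self):
        self.pos = 0

    def next(self):
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        pass


class NestedLoopJoin:
    """Left input is pipelined; right input is a stored relation that is rescanned."""

    def __init__(self, left, right, left_col, right_col):
        self.left, self.right = left, right
        self.left_col, self.right_col = left_col, right_col

    def open(self):
        self.left.open()
        self.outer = self.left.next()
        self.right.open()

    def next(self):
        while self.outer is not None:
            inner = self.right.next()
            if inner is None:
                # Right side exhausted: advance the outer row and rescan.
                self.outer = self.left.next()
                self.right.close()
                self.right.open()
                continue
            if self.outer[self.left_col] == inner[self.right_col]:
                return {**self.outer, **inner}
        return None

    def close(self):
        self.left.close()
        self.right.close()


def run(plan):
    """Drain a plan into a list of result rows."""
    plan.open()
    out = []
    row = plan.next()
    while row is not None:
        out.append(row)
        row = plan.next()
    plan.close()
    return out


# Made-up relations; the plan is left-deep: (r JOIN s) JOIN t.
r = [{"a": 1}, {"a": 2}]
s = [{"b": 2, "c": 10}, {"b": 1, "c": 20}, {"b": 2, "c": 30}]
t = [{"d": 10}, {"d": 30}]
plan = NestedLoopJoin(NestedLoopJoin(Scan(r), Scan(s), "a", "b"), Scan(t), "c", "d")
result = run(plan)
```

The join holds only one outer row at a time, which is why this scheme needs the least memory, and the right operand of every join is a plain scan of a stored relation.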

Page 17: Stage3Raj.ppt

Query Processing

Benefit/Size of a scheme

Every scheme is characterized by a benefit/size ratio, which represents its benefit per unit memory allocation. The minimum scheme for an operator is the scheme that has the maximum cost and the minimum memory.

Assume n schemes $s_1, s_2, \ldots, s_n$ to implement an operator $o$.

$\min(o) = s_{min}$ such that

$\forall i,\ 1 \le i \le n:\ \mathrm{Cost}(s_i) \le \mathrm{Cost}(s_{min}),\ \mathrm{Memory}(s_i) \ge \mathrm{Memory}(s_{min})$

$s_{min}$ is the minimum scheme for operator $o$. Then,

$\mathrm{Benefit}(s_i) = \mathrm{Cost}(s_{min}) - \mathrm{Cost}(s_i)$

$\mathrm{Size}(s_i) = \mathrm{Memory}(s_i) - \mathrm{Memory}(s_{min})$
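The definitions above can be sketched in a few lines of Python. The function name `benefit_size`, the scheme names and the cost and memory figures are made up for illustration; only the Benefit and Size formulas come from the slide.

```python
def benefit_size(schemes):
    """schemes: dict name -> (cost, memory).

    Returns the name of the minimum scheme and a dict name -> (size, benefit).
    """
    # Minimum scheme: least memory; among equals, the costliest one.
    min_name = min(schemes, key=lambda n: (schemes[n][1], -schemes[n][0]))
    min_cost, min_mem = schemes[min_name]
    for name, (cost, _mem) in schemes.items():
        # The slide's definition expects the minimum scheme to have maximum cost.
        if cost > min_cost:
            raise ValueError(name + " costs more than the minimum scheme")
    points = {
        name: (mem - min_mem, min_cost - cost)
        for name, (cost, mem) in schemes.items()
    }
    return min_name, points


# Hypothetical join schemes as (cost, memory) pairs.
join_schemes = {
    "nested_loop": (100.0, 0),
    "indexed_nested_loop": (40.0, 2),
    "hash": (20.0, 4),
}
min_name, points = benefit_size(join_schemes)
```

With these numbers the indexed scheme has a benefit/size ratio of 60/2 = 30 and the hash scheme 80/4 = 20.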

Page 18: Stage3Raj.ppt

Query Processing

• Every operator is a collection of (size, benefit) points, n points for n schemes
• Operator cost function is the collection of (cost, memory) points of its schemes

Figure: (Size, Benefit) points for an operator (x-axis: Size; y-axis: Benefit; points (0,0), (s1,b1), (s2,b2))

Figure: Operator cost function (x-axis: Memory; y-axis: Cost; points (0,c1), (m2,c2), (m3,c3))

Page 19: Stage3Raj.ppt

Query Processing

Optimal Memory Allocation

2-Phase Approach
– Phase 1: Query is first optimized to get a query plan
– Phase 2: Division of memory among the operators

The scheme for every operator is determined in phase 1 and remains unchanged after phase 2; memory is allocated in phase 2 on the basis of the cost functions of the schemes.

Memory is assumed to be available for all the schemes; this may not be true for a resource constrained device.

Traditional 2-phase optimization cannot be used

Page 20: Stage3Raj.ppt

Query Processing

Optimal Memory Allocation

1-Phase Approach
– Query optimizer is made memory cognizant
– Modified optimizer takes into account division of memory among operators while choosing between plans

Ideally, 1-phase optimization should be done, but the optimizer becomes complex.

Page 21: Stage3Raj.ppt

Query Processing

Modified 2-phase optimizer

Optimal division of memory involves the decision of selecting the best scheme for every operator

Phase 1: Determine the optimal left-deep join order using a dynamic programming approach

Phase 2:
a) Divide memory among the operators
b) Choose the scheme for every operator depending on the memory allocated
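Phase 1 can be illustrated with a small dynamic program over subsets of relations. This is only a sketch: the slides do not give the optimizer's cost model, so the one used here (sum of estimated intermediate result sizes), the function name `left_deep_order` and the cardinalities and selectivities in the example are all assumptions.

```python
from itertools import combinations


def left_deep_order(card, selectivity):
    """Return (cost, order) for the cheapest left-deep join order.

    card: dict relation -> row count
    selectivity: dict frozenset({a, b}) -> join selectivity (1.0 if absent)
    Assumed cost model: sum of the estimated sizes of all intermediate results.
    """
    rels = sorted(card)

    def size(subset):
        n = 1.0
        for r in sorted(subset):
            n *= card[r]
        for a, b in combinations(sorted(subset), 2):
            n *= selectivity.get(frozenset((a, b)), 1.0)
        return n

    # best[S] = (cost, order) of the cheapest left-deep plan joining exactly S.
    best = {frozenset([r]): (0.0, [r]) for r in rels}
    for k in range(2, len(rels) + 1):
        for subset in map(frozenset, combinations(rels, k)):
            options = []
            for right in sorted(subset):  # right operand: a single stored relation
                cost, order = best[subset - {right}]
                options.append((cost + size(subset), order + [right]))
            best[subset] = min(options, key=lambda opt: opt[0])
    return best[frozenset(rels)]


# Hypothetical statistics for three relations.
cost, order = left_deep_order(
    {"A": 10, "B": 100, "C": 1000},
    {frozenset(("A", "B")): 0.1, frozenset(("B", "C")): 0.01},
)
```

In this example the plan joins A and B first (about 100 rows) and adds C last, because the other orders create larger intermediate results.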

Page 22: Stage3Raj.ppt

Query Processing

Exact memory allocation

Hulgeri et al. proposed an exact solution to the memory allocation problem
• Traditional 2-phase optimization
• Divides memory among operator schemes; schemes are selected in phase 1
• Algorithm to divide memory among piecewise linear cost functions
• Optimal division of memory takes place only at change-over points

Page 23: Stage3Raj.ppt

Query Processing

Exact memory allocation
• Our operator cost functions are also piecewise linear functions
• Exact algorithm can be used by replacing the scheme cost function with the operator cost function
• Division of memory among operator cost functions
• Amount of memory allocated to each operator will exactly match one of its schemes
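Because the memory given to each operator exactly matches one of its schemes, a tiny plan can be solved exactly by trying every combination of one scheme per operator. The sketch below is a brute-force baseline for illustration only, not the algorithm of Hulgeri et al.; the function name `best_assignment` and the scheme figures are hypothetical.

```python
from itertools import product


def best_assignment(operators, m_total):
    """operators: dict op -> list of (scheme_name, memory, cost).

    Returns (total_cost, {op: scheme_name}) for the cheapest combination that
    fits in m_total, or None if no combination fits.
    """
    ops = list(operators)
    best = None
    for combo in product(*(operators[op] for op in ops)):
        memory = sum(s[1] for s in combo)
        cost = sum(s[2] for s in combo)
        if memory <= m_total and (best is None or cost < best[0]):
            best = (cost, dict(zip(ops, (s[0] for s in combo))))
    return best


# Hypothetical schemes for two join operators.
plan_ops = {
    "join1": [("nested_loop", 0, 100), ("hash", 4, 20)],
    "join2": [("nested_loop", 0, 50), ("hash", 2, 30)],
}
```

The number of combinations grows exponentially with the number of operators, so this only serves as a reference point for checking other allocation strategies on small plans.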

Page 24: Stage3Raj.ppt

Query Processing

Heuristic memory allocation
– A heuristic to determine which operator gains the most per unit memory allocation and allocate memory to that operator
– Gain of every operator is determined by its best feasible scheme
– Repeat the process till memory allocation is done

Heuristic: Select the scheme that has the maximum benefit/size ratio

Page 25: Stage3Raj.ppt

Query Processing

MemAllocate(MTotal) {
 1. Mmin = Σ (i = 1 to m) Memory(min(i))
 2. for i = 1 to m do
 3.     Scheme(i) = min(i)
 4. Mavail = MTotal - Mmin
 5. RemoveSchemes(Mavail)
 6. sbest, obest = GetBestScheme(Mavail)
 7. if no best scheme then return
 8. else {
 9.     Mavail = Mavail - Memory(sbest) + Memory(Scheme(obest))
10.     Scheme(obest) = sbest
11.     RemoveSchemes(sbest, obest, Mavail)
12.     RecomputeBenefits(sbest, obest)
13. }
14. goto step 6
}
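A runnable Python sketch of the heuristic, under some assumptions: the `Scheme` record, the function names and the example figures are hypothetical, and the RemoveSchemes and RecomputeBenefits steps are folded into the loop by measuring every candidate's benefit and size relative to the operator's currently chosen scheme and skipping candidates that do not fit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    name: str
    memory: int    # memory needed by the scheme
    cost: float    # estimated computation time


def min_scheme(schemes):
    """Minimum scheme: least memory; among equals, the costliest one."""
    return min(schemes, key=lambda s: (s.memory, -s.cost))


def mem_allocate(operators, m_total):
    """operators: dict op -> list of Scheme. Returns dict op -> chosen Scheme,
    or None when even the minimum schemes do not fit in m_total."""
    chosen = {op: min_scheme(schemes) for op, schemes in operators.items()}
    m_avail = m_total - sum(s.memory for s in chosen.values())
    if m_avail < 0:
        return None
    while True:
        best = None  # (ratio, op, scheme)
        for op, schemes in operators.items():
            current = chosen[op]
            for s in schemes:
                size = s.memory - current.memory     # extra memory needed
                benefit = current.cost - s.cost      # cost saved by upgrading
                if size <= 0 or benefit <= 0 or size > m_avail:
                    continue                         # infeasible or not useful
                ratio = benefit / size
                if best is None or ratio > best[0]:
                    best = (ratio, op, s)
        if best is None:
            return chosen                            # no best scheme: done
        _, op, s = best
        m_avail -= s.memory - chosen[op].memory
        chosen[op] = s


# Hypothetical schemes for two join operators.
ops = {
    "join1": [Scheme("nested_loop", 0, 100.0), Scheme("hash", 4, 20.0)],
    "join2": [Scheme("nested_loop", 0, 50.0), Scheme("hash", 2, 30.0)],
}
```

With 5 units of memory the sketch upgrades join1 to the hash scheme (ratio 80/4 = 20) and then stops, since the remaining unit cannot pay for join2's upgrade.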

Page 26: Stage3Raj.ppt

Query Processing

Recomputation of Benefits

Once the operator obest gets memory Memory(sbest), the benefit and size of all the schemes of obest that have higher memory than sbest change.

New benefit and size values will be the difference between their old values and those of sbest.

Figure: Benefit and Size Recomputation (x-axis: Size; y-axis: Benefit; points (0,0), (s1,b1), (s2,b2)). Scheme 1 has the highest benefit/size ratio, so after it is selected: Benefit(Scheme 2) = (b2-b1), Size(Scheme 2) = (s2-s1).
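The recomputation rule can be sketched as a single function over an operator's (size, benefit) points. The function name `recompute_benefits` and the numbers in the example are hypothetical.

```python
def recompute_benefits(points, chosen):
    """points: list of (size, benefit) pairs for one operator.
    chosen: the (size, benefit) pair of the scheme that was just selected.

    Returns the schemes that need more memory than the chosen one, with size
    and benefit measured relative to the chosen scheme.
    """
    s_best, b_best = chosen
    return [(s - s_best, b - b_best) for (s, b) in points if s > s_best]


# Scheme 1 = (2, 10) has the higher ratio (5.0 vs 3.2) and is selected first.
remaining = recompute_benefits([(2, 10), (5, 16)], (2, 10))
```

After the selection, scheme 2 is judged only on what it adds beyond scheme 1: 3 more units of memory for 6 more units of benefit.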

Page 27: Stage3Raj.ppt

Some other issues

Data Synchronization
• Record the changes in a log
• Merge the changes with the main server
• Conflict detection and conflict resolution

Concurrency control
• Local transaction on the device, transaction doing data synchronization
• Minimum concurrency control needed

Access Rights Management
• Community handhelds like Simputer
• More than a single user

Page 28: Stage3Raj.ppt

Implementation Status

– Developed in C programming language
– Code base distributed over several subdirectories
– Recursive makefiles to build the system
– Lex and Bison used to write the SQL parser
– Storage Manager, Query Optimizer and Query Executor implemented
– Supports CHAR, INTEGER and FLOAT
– Select, Project, Join, and COUNT
– ID based Join Index and other aggregate operators not completed

Page 29: Stage3Raj.ppt

Performance Evaluation

Experimental setup
– Database system ported to the Simputer, a handheld device
– Sample healthcare schema and datasets: Doctor (91), Drug (77), Visit (830), Prescription (2155)
– Q1: 3 joins and 2 selections
– Q4: 3 joins and aggregation over two attributes
– Data stored in Flat Storage and ID Storage without Join Index
– Exact and heuristic memory allocation
– Response time measured by varying the amount of memory

Page 32: Stage3Raj.ppt

Performance Evaluation

Conclusions
– Response times are highest with minimum memory and lowest with maximum memory
– Computing power of the handheld affects the response time in a big way
– Heuristic memory allocation differed from the exact algorithm at a few points only
– Response times are higher for ID Storage due to the extra cost in projection
– Nested loop aggregation is very costly
– Join Index should reduce the query execution time

Page 33: Stage3Raj.ppt

Summary

• Storage Manager, Optimizer and Executor implemented
• Supports SPJ and COUNT operators

Contributions
– A new storage model, ID based Storage
– Highlighted the need for optimal memory allocation
– Existing exact allocation algorithm used with some modifications
– Heuristic memory allocation algorithm
– Selection of best query execution plan depending on memory available in a device

Page 34: Stage3Raj.ppt

Ongoing and Future work

Ongoing Work
– Data synchronization utility
– Remaining aggregate operators
– ID based Join Index
– Integration with AQUA, an online database backed discussion forum

Future Work
– Feasibility of a 1-phase optimizer
– DBMS module toolkit
– An operator that returns first-k results of a query
– Application specific DBMS

Page 35: Stage3Raj.ppt

Thank You
