An open source DBMS for handheld devices
by Rajkumar Sen
IIT Bombay
Under the guidance of
Prof. Krithi Ramamritham
04/13/23 An open source DBMS for handheld devices

Outline
• Introduction
• Storage Management
• Query Processing
• Other issues
• Performance Evaluation
• Conclusions

Introduction
A resource-constrained device
– A small computer with limited resources
– e.g. cellphones, Simputer, Palm devices
Data management is important
– Increasing number of applications
– They deal with a fair amount of data
– Complex queries involving joins and aggregates
– Atomicity and durability for data consistency
– Ease of application development
A device-resident DBMS is needed

Introduction
Need for synchronization
– Data from a remote server is downloaded onto the device
– Updates happen at both places
– Common data needs to be synchronized
Challenges
– Limited computing power and main memory
– Limited stable storage
– Resources are not uniform across devices
– Need a system that can do the best for every device

Introduction
Storage Management
– Reduce storage cost to a minimum
– Limited storage could preclude any additional index
– Data model should try to incorporate some index information
Query Processing
– Memory limits the query processing capabilities
– Minimum-memory algorithms in existing systems do not work well for complex joins and aggregates
– Need algorithms that create in-memory indices and save aggregate values
– Optimal memory allocation among operators

Storage Management
Aim at compactness in representation of data
Existing storage models
– Flat Storage
  Tuples are stored sequentially. Ensures access locality but consumes space.
– Pointer-based Domain Storage
  • Values are partitioned into domains, which are sets of unique values
  • Tuples reference the attribute value by means of pointers
  • One domain can be shared among multiple attributes
Domain Storage uses a pointer of size p (typically 4 bytes) to point to the domain value. Can we further reduce the storage cost?

Storage Management
ID Storage:
– An identifier for each of the domain values
– The identifier is the ordinal value in the domain table
– Store the identifier instead of the pointer
– Use the identifier as an offset into the domain table
– Extendable IDs: the length of the identifier grows and shrinks depending on the number of domain values

Storage Management
D domain values can be distinguished by identifiers of length $\lceil \log_2 D / 8 \rceil$ bytes.
Starting with 1-byte identifiers, the length grows and shrinks.
ID values are projected out from the rest of the relation and stored separately, maintaining Positional Indexing.
Why not bit identifiers?
– Storage is byte addressable.
– Packing bit identifiers into bytes increases the storage management complexity.
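The ID Storage idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not the system's C implementation: the names `id_width_bytes` and `IdColumn`, the little-endian byte layout, and the in-memory `lookup` dictionary are all hypothetical. The sketch grows the identifier length when the domain crosses a byte boundary; shrinking and deletion are left out.

```python
def id_width_bytes(num_domain_values):
    """Bytes needed to distinguish D domain values: ceil(log2(D) / 8), at least 1."""
    bits = (num_domain_values - 1).bit_length() if num_domain_values > 1 else 0
    return max(1, (bits + 7) // 8)


class IdColumn:
    """One attribute stored as fixed-width identifiers into a domain table (hypothetical)."""

    def __init__(self):
        self.domain = []          # domain table: the ordinal position is the identifier
        self.lookup = {}          # value -> identifier, a convenience for this sketch
        self.width = 1            # start with 1-byte identifiers
        self.ids = bytearray()    # projected ID values, addressed by row position

    def _reencode(self, new_width):
        # Reorganize every stored identifier when the identifier length changes.
        old = self.width
        values = [int.from_bytes(self.ids[i:i + old], "little")
                  for i in range(0, len(self.ids), old)]
        self.ids = bytearray(b"".join(v.to_bytes(new_width, "little") for v in values))
        self.width = new_width

    def append(self, value):
        ident = self.lookup.get(value)
        if ident is None:
            ident = len(self.domain)
            self.domain.append(value)
            self.lookup[value] = ident
            needed = id_width_bytes(len(self.domain))
            if needed != self.width:
                self._reencode(needed)
        self.ids += ident.to_bytes(self.width, "little")

    def get(self, row):
        # Positional indexing: the row number gives the byte offset of its identifier.
        start = row * self.width
        ident = int.from_bytes(self.ids[start:start + self.width], "little")
        return self.domain[ident]
```

Appending the 257th distinct value triggers one reorganization from 1-byte to 2-byte identifiers, which is the boundary where the cost of changing the identifier length shows up.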

Storage Management
Figure: ID Storage (Relation R holds a column of ID Values 0, 1, 2, 1, n, 0, n that reference the Domain Values v0, v1, ..., vn through Positional Indexing)

Storage Management
Ping Pong Effect
– At the boundaries, ID values are reorganized when the identifier length changes
– Frequent insertions and deletions at the boundaries might result in a lot of reorganization
– This phenomenon should be avoided
No deletion of domain values
– Domain structure means a future insertion might reference the deleted value
– Do not delete a domain value even if it is not referenced
Setting a threshold for deletion of domain values
– Delete only if the number of deletions exceeds a threshold
– Increase the threshold when boundaries are being crossed

Storage Management
Primary Key-Foreign Key relationship
– Primary key: a domain in itself
– IDs for primary key values
– Values present in the child table are the corresponding primary key IDs
– The projected foreign key column forms a Join Index
Figure: Primary Key-Foreign Key Join Index (Child Table, Relation S: the projected S.BID Values 0, 1, 2, 1, n, 0, n reference the rows v0, v1, ..., vn of the Parent Table, Relation R)
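A minimal sketch of why the projected foreign key column acts as a join index: each stored ID is already the position of the matching parent row, so the join needs no search. The function name `join_with_index` and the sample tables are hypothetical and chosen only for illustration.

```python
def join_with_index(child_rows, fk_ids, parent_rows):
    """Join child and parent tables by following the stored parent IDs.

    fk_ids[k] is the ordinal position in parent_rows of the parent of child_rows[k].
    """
    for child, parent_id in zip(child_rows, fk_ids):
        yield child + parent_rows[parent_id]


# Hypothetical parent table (Relation R) and child table (Relation S).
doctors = [("Rao",), ("Iyer",), ("Shah",)]
visits = [("visit-1",), ("visit-2",), ("visit-3",)]
visit_doctor_ids = [0, 2, 0]   # projected foreign key column, stored as IDs

joined = list(join_with_index(visits, visit_doctor_ids, doctors))
```

Each child row costs one positional lookup in the parent table, compared with a scan or hash probe in a value-based join.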

Storage Management
ID based Storage wins over Domain Storage when $p > \lceil \log_2 D / 8 \rceil$
Relations in a small device do not have a very high cardinality, so the above condition is true for most of the data.
Advantages
(i) Considerable saving in storage cost.
(ii) Efficient join between parent table and child table
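To make the comparison concrete, the sketch below computes the bytes used by one attribute under the three storage models. The cost model is an assumption made for illustration (fixed-size values, one pointer or identifier per tuple, and the domain table counted once); the function names and the numbers in the usage line are hypothetical, not measurements from the system.

```python
def id_width_bytes(num_domain_values):
    """Bytes needed to distinguish D domain values: ceil(log2(D) / 8), at least 1."""
    bits = (num_domain_values - 1).bit_length() if num_domain_values > 1 else 0
    return max(1, (bits + 7) // 8)


def storage_cost(num_tuples, num_domain_values, value_size, pointer_size=4):
    """Bytes for one attribute under each model (assumed, simplified cost model)."""
    domain_table = num_domain_values * value_size
    return {
        "flat": num_tuples * value_size,
        "domain": num_tuples * pointer_size + domain_table,
        "id": num_tuples * id_width_bytes(num_domain_values) + domain_table,
    }


# 1000 tuples, 50 distinct 20-byte values, 4-byte pointers (illustrative numbers).
costs = storage_cost(1000, 50, 20)
```

With these numbers the per-tuple overhead drops from 4 bytes (pointer) to 1 byte (identifier), which is the case p > ⌈log2 D / 8⌉ described above.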

Query Processing
Considerations
– Minimize writes to secondary storage
– Efficient usage of limited main memory
– Read buffer not required
– Main memory as write buffer
– If the read:write ratio is very high, flash memory as write buffer
Need for Left-deep Query Plan
– Reduce materialization; if absolutely necessary, use main memory
– Bushy trees and right-deep trees are ruled out
– A left-deep tree is most suited for pipelined evaluation
– The right operand in a left-deep tree is always a stored relation

Query Processing
Need for optimal memory allocation
If nested loop algorithms are used for every operator, the minimum amount of memory is needed to execute the plan
– Nested loop algorithms are inefficient
– Should memory usage be reduced to a minimum at the cost of performance?
– Different devices come with different memory sizes
– Query plans should make efficient use of memory
– Memory must be optimally allocated among all operators
Need to generate the best query execution plan depending on the available memory

Query Processing
Operator evaluation schemes
– Different schemes for an operator
– All have different memory usage and cost
– Schemes conform to the left-deep tree query plan
– Cost of a scheme is the computation time

Query Processing
Schemes for Join
– Nested Loop Join
– Indexed Nested Loop Join
– Hash Join
– Using Join Index
Schemes for aggregation
– Nested Loop aggregation
– Buffered aggregation
Operator schemes are implemented using the Iterator Model
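The Iterator Model can be sketched with an open/next/close interface. The example below is a hypothetical Python illustration (the system itself is written in C): class names, the tuple layout and the sample data are assumptions. It shows a left-deep pipeline of two nested loop joins in which the right operand of every join is a stored relation and no intermediate result is materialized.

```python
class Scan:
    """Iterator over a stored relation (a list of tuples in this sketch)."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def open(self):
        self.pos = 0

    def next(self):
        if self.pos >= len(self.rows):
            return None               # end of input
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        self.pos = 0


class NestedLoopJoin:
    """Left input is any iterator; right input is always a stored relation."""

    def __init__(self, left, right_rows, predicate):
        self.left = left
        self.right_rows = right_rows
        self.predicate = predicate
        self.current = None
        self.rpos = 0

    def open(self):
        self.left.open()
        self.current = self.left.next()
        self.rpos = 0

    def next(self):
        while self.current is not None:
            while self.rpos < len(self.right_rows):
                right = self.right_rows[self.rpos]
                self.rpos += 1
                if self.predicate(self.current, right):
                    return self.current + right
            self.current = self.left.next()   # advance the outer input
            self.rpos = 0
        return None

    def close(self):
        self.left.close()


def run(plan):
    """Pull tuples from the root of the plan until it is exhausted."""
    plan.open()
    out = []
    row = plan.next()
    while row is not None:
        out.append(row)
        row = plan.next()
    plan.close()
    return out


# Hypothetical data: (visit_id, doctor_id), (doctor_id, name), (visit_id, drug).
visits = [(10, 1), (11, 2)]
doctors = [(1, "Rao"), (2, "Iyer")]
prescriptions = [(10, "aspirin"), (10, "zinc"), (11, "iron")]

plan = NestedLoopJoin(
    NestedLoopJoin(Scan(visits), doctors, lambda v, d: v[1] == d[0]),
    prescriptions,
    lambda vd, p: vd[0] == p[0],
)
result = run(plan)
```

The only state each join keeps is the current outer tuple and a position in the stored relation, which is why the nested loop scheme needs the least memory.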

Query Processing
Benefit/Size of a scheme
Every scheme is characterized by a benefit/size ratio, which represents its benefit per unit of memory allocated.
The minimum scheme for an operator is the scheme that has maximum cost and minimum memory.
Assume n schemes $s_1, s_2, \ldots, s_n$ to implement an operator $o$, with
$\min(o) = s_{min}$
$\forall i,\ 1 \le i \le n:\ \mathrm{Cost}(s_i) \le \mathrm{Cost}(s_{min}),\ \mathrm{Memory}(s_i) \ge \mathrm{Memory}(s_{min})$
$s_{min}$ is the minimum scheme for operator $o$. Then,
$\mathrm{Benefit}(s_i) = \mathrm{Cost}(s_{min}) - \mathrm{Cost}(s_i)$
$\mathrm{Size}(s_i) = \mathrm{Memory}(s_i) - \mathrm{Memory}(s_{min})$
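The definitions above translate directly into code. In this sketch the function name `benefit_and_size`, the dictionary layout, and the cost and memory figures are made up for illustration; only the formulas come from the slide.

```python
def benefit_and_size(schemes):
    """schemes maps a scheme name to (cost, memory).

    Returns the name of the minimum scheme and, for every scheme,
    its (benefit, size) relative to the minimum scheme.
    """
    # Minimum scheme: least memory; on a tie, the most expensive one.
    s_min = min(schemes, key=lambda s: (schemes[s][1], -schemes[s][0]))
    cost_min, mem_min = schemes[s_min]
    points = {
        name: (cost_min - cost, mem - mem_min)
        for name, (cost, mem) in schemes.items()
    }
    return s_min, points


# Hypothetical (cost, memory) values for the join schemes.
join_schemes = {
    "nested_loop": (1000, 2),
    "indexed_nested_loop": (400, 10),
    "hash": (250, 30),
}
s_min, points = benefit_and_size(join_schemes)
```

Here the indexed nested loop scheme yields 600 / 8 = 75 units of benefit per unit of memory, while the hash scheme yields 750 / 28, roughly 26.8, so the former has the better benefit/size ratio.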

Query Processing
• Every operator is a collection of (size, benefit) points, n points for n schemes
• Operator cost function is the collection of (cost, memory) points of its schemes
Figure: (Size, Benefit) points for an operator (axes: Size, Benefit; points (0,0), (s1,b1), (s2,b2))
Figure: Operator cost function (axes: Memory, Cost; points (0,c1), (m2,c2), (m3,c3))

Query Processing
Optimal Memory Allocation
2-Phase Approach
Phase 1: Query is first optimized to get a query plan
Phase 2: Division of memory among the operators
The scheme for every operator is determined in phase 1 and remains unchanged after phase 2; memory allocation in phase 2 is done on the basis of the cost functions of the schemes
Memory is assumed to be available for all the schemes; this may not be true for a resource-constrained device
Traditional 2-phase optimization cannot be used

Query Processing
Optimal Memory Allocation
1-Phase Approach
– Query optimizer is made memory cognizant
– Modified optimizer takes into account the division of memory among operators while choosing between plans
Ideally, 1-phase optimization should be done, but the optimizer becomes complex.

Query Processing
Modified 2-phase optimizer
Optimal division of memory involves selecting the best scheme for every operator
Phase 1: Determine the optimal left-deep join order using a dynamic programming approach
Phase 2:
a) Divide memory among the operators
b) Choose the scheme for every operator depending on the memory allocated

Query Processing
Exact memory allocation
Hulgeri et al. proposed an exact solution to the memory allocation problem
• Traditional 2-phase optimization
• Divides memory among operator schemes; schemes are selected in phase 1
• Algorithm to divide memory among piecewise linear cost functions
• Optimal division of memory takes place only at change-over points

Query Processing
Exact memory allocation
• Our operator cost functions are also piecewise linear functions
• Exact algorithm can be used by replacing the scheme cost function with the operator cost function
• Division of memory among operator cost functions
• Amount of memory allocated to each operator will exactly match one of its schemes
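Because the memory given to each operator exactly matches one of its schemes, the optimum can be illustrated by simply trying every combination of schemes. The sketch below is a brute-force stand-in, not the algorithm of Hulgeri et al.; it assumes the plan cost is the sum of the operator costs, and the function name and all figures are hypothetical.

```python
from itertools import product


def exact_allocate(operators, total_memory):
    """operators maps an operator name to {scheme name: (cost, memory)}.

    Returns (choice, total cost) for the cheapest combination of schemes
    that fits in total_memory, or None if no combination fits.
    """
    names = list(operators)
    best = None
    for combo in product(*(operators[n].items() for n in names)):
        memory = sum(mem for _, (_, mem) in combo)
        if memory > total_memory:
            continue                      # this combination does not fit
        cost = sum(c for _, (c, _) in combo)
        if best is None or cost < best[1]:
            choice = {n: scheme for n, (scheme, _) in zip(names, combo)}
            best = (choice, cost)
    return best


# Hypothetical (cost, memory) values for a two-operator plan.
plan_operators = {
    "join": {
        "nested_loop": (1000, 2),
        "indexed_nested_loop": (400, 10),
        "hash": (250, 30),
    },
    "aggregate": {
        "nested_loop": (800, 1),
        "buffered": (200, 12),
    },
}
allocation = exact_allocate(plan_operators, 25)
```

The search space is the product of the scheme counts, so this enumeration is only practical for the handful of operators in a small plan.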

Query Processing
Heuristic memory allocation
– A heuristic determines which operator gains the most per unit of memory allocated, and memory is allocated to that operator
– Gain of every operator is determined by its best feasible scheme
– Repeat the process until memory allocation is done
Heuristic:
Select the scheme that has the maximum benefit/size ratio

Query Processing
MemAllocate(MTotal) {
 1. Mmin = Σ (i = 1 to m) Memory(min(i))
 2. for i = 1 to m do
 3.     Scheme(i) = min(i)
 4. Mavail = MTotal - Mmin
 5. RemoveSchemes(Mavail)
 6. sbest, obest = GetBestScheme(Mavail)
 7. if no best scheme then return
 8. else {
 9.     Mavail = Mavail - Memory(sbest) + Memory(Scheme(obest))
10.     Scheme(obest) = sbest
11.     RemoveSchemes(sbest, obest, Mavail)
12.     RecomputeBenefits(sbest, obest)
13. }
14. goto step 6
}

Query Processing
Recomputation of Benefits
Once the operator obest gets memory Memory(sbest), the benefit and size of all the schemes of obest that have higher memory than sbest change.
The new benefit and size values will be the difference between their old values and those of sbest.
Figure: Benefit and Size Recomputation (axes: Size, Benefit; points (0,0), (s1,b1), (s2,b2); after Scheme 1 is chosen, Scheme 2 is measured from (s1,b1) as (s2-s1, b2-b1))
Scheme 1 has the highest benefit/size ratio
Benefit(Scheme 2) = (b2-b1)
Size(Scheme 2) = (s2-s1)
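The heuristic and the recomputation of benefits can be put together in one short sketch. It is a hypothetical Python rendering of the MemAllocate steps, not the system's C code: benefits and sizes are recomputed on the fly relative to each operator's currently chosen scheme, which has the same effect as subtracting the values of sbest, and infeasible schemes are skipped rather than removed. All names and numbers are assumptions.

```python
def mem_allocate(operators, total_memory):
    """operators maps an operator name to {scheme name: (cost, memory)}.

    Returns (chosen scheme per operator, memory left over).
    """
    # Steps 1-4: start every operator on its minimum scheme.
    chosen = {
        op: min(schemes, key=lambda s, sc=schemes: (sc[s][1], -sc[s][0]))
        for op, schemes in operators.items()
    }
    avail = total_memory - sum(operators[op][chosen[op]][1] for op in operators)
    if avail < 0:
        raise ValueError("not enough memory even for the minimum schemes")

    while True:
        best = None   # (ratio, operator, scheme, extra memory)
        for op, schemes in operators.items():
            cur_cost, cur_mem = schemes[chosen[op]]
            for name, (cost, mem) in schemes.items():
                # Benefit and size relative to the scheme currently chosen.
                benefit = cur_cost - cost
                size = mem - cur_mem
                if size <= 0 or benefit <= 0 or size > avail:
                    continue              # not an upgrade, or not feasible
                ratio = benefit / size
                if best is None or ratio > best[0]:
                    best = (ratio, op, name, size)
        if best is None:
            return chosen, avail          # step 7: no best scheme
        _, op, name, size = best
        chosen[op] = name                 # step 10
        avail -= size                     # step 9


# Hypothetical (cost, memory) values for a two-operator plan.
plan_operators = {
    "join": {
        "nested_loop": (1000, 2),
        "indexed_nested_loop": (400, 10),
        "hash": (250, 30),
    },
    "aggregate": {
        "nested_loop": (800, 1),
        "buffered": (200, 12),
    },
}
chosen, left_over = mem_allocate(plan_operators, 25)
```

With 25 units of memory the join is upgraded first (ratio 75), then the aggregate (ratio of about 54.5); the hash join would need 20 more units than remain, so the loop stops with 3 units unused.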

Some other issues
Data Synchronization
• Record the changes in a log
• Merge the changes with the main server
• Conflict detection and conflict resolution
Concurrency control
• Local transaction on the device, transaction doing data synchronization
• Minimum concurrency control needed
Access Rights Management
• Community handhelds like Simputer
• More than a single user

Implementation Status
– Developed in the C programming language
– Code base distributed over several subdirectories
– Recursive makefiles to build the system
– Lex and Bison used to write the SQL parser
– Storage Manager, Query Optimizer and Query Executor implemented
– Supports CHAR, INTEGER and FLOAT
– Select, Project, Join, and COUNT
– ID based Join Index and other aggregate operators not completed

Performance Evaluation
Experimental setup
– Database system ported to the Simputer, a handheld device
– Sample healthcare schema and datasets
  Doctor (91), Drug (77), Visit (830), Prescription (2155)
– Q1: 3 joins and 2 selections
– Q4: 3 joins and aggregation over two attributes
– Data stored in Flat Storage and ID Storage without Join Index
– Exact and heuristic memory allocation
– Response time measured by varying the amount of memory

Performance Evaluation
Conclusions
– Response times are highest with minimum memory and lowest with maximum memory
– Computing power of the handheld affects the response time in a big way
– Heuristic memory allocation differed from the exact algorithm at a few points only
– Response times are higher for ID Storage due to the extra cost in projection
– Nested loop aggregation is very costly
– Join Index should reduce the query execution time

Summary
• Storage Manager, Optimizer and Executor implemented
• Supports SPJ and COUNT operators
Contributions
– A new storage model, ID based Storage
– Highlighted the need for optimal memory allocation
– Existing exact allocation algorithm used with some modifications
– Heuristic memory allocation algorithm
– Selection of the best query execution plan depending on the memory available in a device

Ongoing and Future work
Ongoing Work
– Data synchronization utility
– Remaining aggregate operators
– ID based Join Index
– Integration with AQUA, an online database-backed discussion forum
Future Work
– Feasibility of a 1-phase optimizer
– DBMS module toolkit
– An operator that returns the first-k results of a query
– Application-specific DBMS

Thank You