Self-Tuning Database Systems: The AutoAdmin Experience
infolab.stanford.edu/infoseminar/archive/SpringY2002/speakers/suraj…

5/10/2002 (c) Microsoft Corporation 1

Self-Tuning Database Systems: The AutoAdmin Experience

Surajit Chaudhuri
Data Management and Exploration Group

Microsoft Research
http://research.microsoft.com/users/surajitc

surajitc@microsoft.com

Research Group Overview

Data Management, Exploration and Mining Group

Formed in 1999 by fusing two projects: AutoAdmin and DB support for DM
Research with technology transfer

Project-oriented
Close partnership with SQL Server

6 researchers, 5 developers
A junior-heavy team
Strong internship program

Current Projects

AutoAdmin: Self Tuning Database Systems
Data Cleaning
Exploratory Projects

Approximate Query Processing
Documents + Structured Data
XML2SQL

Past project: SQL-aware Data Mining

Self-Tuning Database Systems: The AutoAdmin Experience

The Black Art of Database Tuning...

[Figure: diagram with the labels Applications, DBS, Workload, Performance, Tuning Guru, and System Parameters]

AutoAdmin: Motivation

Started in summer 1996 at Microsoft Research – team of 2
Our Goal:

Make database systems self-tuning and self administering

Analogy: Cars

Reduce TCO

Vision of a Self Tuning System

Manager sets goals, policy, and the budget
System does the rest

Everyone is a CIO
Build a system

Used by millions of people each day
Administered and managed by a ½ time person

On hardware fault, order replacement part
On overload, order additional equipment
Upgrade hardware and software automatically

“What Next? A dozen remaining IT problems”
Turing Award Lecture, FCRC, May 1999
Jim Gray, Microsoft

Physical Design Impacts Query Execution

SELECT Name
FROM Employees
WHERE Age < 40 AND Salary > 200K

Execution Plan A:
Filter (Age < 40 AND Salary > 200K)
Table Scan (Employees)

Execution Plan B:
Filter (Age < 40)
Table Lookup (Employees) by Salary
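
The effect of an index on the chosen plan can be observed directly. The sketch below is illustrative only: it uses SQLite through Python's standard sqlite3 module as a stand-in for the database server, writes the slide's 200K as 200000, and the index name idx_salary is hypothetical. The exact wording of the plan text differs between SQLite versions, so no output is shown.

```python
import sqlite3

QUERY = "SELECT Name FROM Employees WHERE Age < 40 AND Salary > 200000"


def plan_details(conn: sqlite3.Connection, sql: str) -> list:
    """Return the human-readable steps SQLite reports for a query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Name TEXT, Age INTEGER, Salary INTEGER)")

# No index exists yet, so the only option is to scan the table (Plan A).
plan_without_index = plan_details(conn, QUERY)

# Hypothetical index on Salary: the optimizer can now look rows up by Salary
# and filter the survivors on Age (Plan B).
conn.execute("CREATE INDEX idx_salary ON Employees (Salary)")
plan_with_index = plan_details(conn, QUERY)

print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both runs; only the physical design changed.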

Effect of Workload on Physical Design

Which column(s) should we index?
Right answer may be:

Salary
Age
Both
Neither!

Depends on the workload, and requires knowledge of statistics

SELECT Name
FROM Employees
WHERE Age < 40 AND Salary > 200K

SELECT Name
FROM Employees
WHERE Age < 20 AND Salary > 50K
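
To see why all four answers are possible, here is a minimal sketch with a toy cost model. Everything numeric in it is an assumption made for illustration: the table size, the lookup penalty, and the per-predicate selectivities assigned to the two queries (Q1, Q2) and to a third, unselective query (Q3) that is not on the slide. It is not the optimizer's real cost model.

```python
from itertools import combinations

N_ROWS = 1_000_000     # assumed table size
LOOKUP_PENALTY = 4.0   # assumed cost of one index lookup relative to one scanned row


def query_cost(selectivities: dict, indexed: frozenset) -> float:
    """Cheapest of a full scan or a lookup through any one available index."""
    options = [float(N_ROWS)]
    for column in indexed:
        if column in selectivities:
            options.append(N_ROWS * selectivities[column] * LOOKUP_PENALTY)
    return min(options)


def best_configuration(workload: list) -> frozenset:
    """Try every subset of {Age, Salary}; break cost ties by fewer indexes."""
    columns = ("Age", "Salary")
    configs = [frozenset(c)
               for r in range(len(columns) + 1)
               for c in combinations(columns, r)]

    def key(config: frozenset):
        return (sum(query_cost(q, config) for q in workload), len(config))

    return min(configs, key=key)


# Assumed selectivities of each predicate (fraction of rows that qualify).
Q1 = {"Age": 0.50, "Salary": 0.01}   # Age < 40 AND Salary > 200K
Q2 = {"Age": 0.02, "Salary": 0.60}   # Age < 20 AND Salary > 50K
Q3 = {"Age": 0.90, "Salary": 0.95}   # hypothetical query with unselective predicates

for name, workload in [("Q1", [Q1]), ("Q2", [Q2]), ("Q1+Q2", [Q1, Q2]), ("Q3", [Q3])]:
    print(name, sorted(best_configuration(workload)))
```

Under these assumed numbers the recommended set of indexed columns changes with the workload, and the selectivities that drive the choice are exactly what statistics provide.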

AutoAdmin: Key Contributions

A What-if architecture for exploring the space of hypothetical designs (SIGMOD 98)

Workload driven
Integrated physical database design tool (VLDB 97, VLDB 00)

Recommends indexes and Materialized Views
Part of Microsoft SQL Server product since 1998

Statistics selection (ICDE 00, SIGMOD 02)
Execution feedback driven statistics building (SIGMOD 99, SIGMOD 01)

“What-If” Architectures

“What-If” Architecture Overview

[Figure: architecture diagram with the labels Query Optimizer (Extended), Database Server, Workload, AutoAdmin, Recommendation, “What-if”, and Application]

“What-If” Analysis of Physical Design

Estimate quantitatively the impact of physical design on workload

e.g., if we add an index on T.c, which queries benefit and by how much?

Without making actual changes to physical design

Actual changes are time consuming and resource intensive

Search efficiently the space of hypothetical designs

Workload-driven Physical Design for Databases

Physical Database Design: Problem Statement

Workload
Queries and updates

Configuration
A set of indexes and materialized views from a search space
Cost obtained by “what-if” realization of the configuration

Constraints
Upper bound on storage space for indexes

Search: Pick a configuration that is of “lowest” cost for the given database and workload (VLDB 1997)
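
The problem statement can be sketched as a search loop around a what-if cost call. The code below is a plain greedy heuristic written for illustration, not the algorithm of the VLDB 1997 paper or of the shipped tool. The candidate names, sizes, savings, and the toy_cost function are all hypothetical; in a real system the cost callback would be an optimizer estimate for a hypothetical configuration.

```python
from typing import Callable, Dict, FrozenSet


def greedy_search(candidates: Dict[str, int],
                  what_if_cost: Callable[[FrozenSet[str]], float],
                  storage_budget: int) -> FrozenSet[str]:
    """Repeatedly add the structure that lowers workload cost the most,
    as long as the total size stays within the storage budget."""
    config: FrozenSet[str] = frozenset()
    used = 0
    while True:
        best_cost = what_if_cost(config)
        best_choice = None
        for name, size in candidates.items():
            if name in config or used + size > storage_budget:
                continue
            cost = what_if_cost(config | {name})  # no structure is actually built
            if cost < best_cost:
                best_cost, best_choice = cost, name
        if best_choice is None:
            return config
        config = config | {best_choice}
        used += candidates[best_choice]


# Hypothetical candidates: size in MB and (additive) workload cost savings.
SIZES = {"idx_salary": 40, "idx_age": 40, "mv_dept_totals": 70}
SAVINGS = {"idx_salary": 60.0, "idx_age": 30.0, "mv_dept_totals": 50.0}


def toy_cost(config: FrozenSet[str]) -> float:
    """Stand-in for the what-if optimizer call; ignores interactions."""
    return 200.0 - sum(SAVINGS[name] for name in config)


recommended = greedy_search(SIZES, toy_cost, storage_budget=100)
print(sorted(recommended), toy_cost(recommended))
```

The additive savings are the weakest assumption here: real design choices interact, which is one reason the search space is hard to explore.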

Architecture of Tuning Wizard in Microsoft SQL Server

[Figure: architecture diagram with the labels Workload, Candidate Selection, Configuration Enumeration, Recommendation, Microsoft SQL Server, and Server Extensions]

Search Space

Large search space for indexes
Many columns to choose from
Kinds of indexes

Explosive search space for materialized views
Query optimizers use physical design in novel ways
Physical design choices interact

AutoAdmin Milestones

Started in late summer 1996
SQL Server 7.0: Ships index tuning wizard (1998)
SQL Server 2000: Integrated recommendations for indexes and materialized views
Shared research results widely

Workload Driven Statistics Management

Example

SELECT *
FROM lineitem, orders
WHERE l_orderkey = o_orderkey
AND l_shipdate = '01-02-99' AND o_orderdate = '01-01-99'

[Figure: two plans joining orders and lineitem, one using an Index Nested Loop Join and one using a Merge Join. With stats: Cost = 25. Without stats: Cost = 112.]

Essential Set of Statistics

“Chicken-and-egg” problem
Cannot tell if additional statistics are necessary until we actually build them!
Need a test for equivalence without having to build any statistics in (C – S)

[Figure: diagram of the statistics sets S and C]

Example

SELECT E.EmployeeName, D.DeptName
FROM Employees E, Department D
WHERE E.DeptId = D.DeptID AND E.Age < 40 AND E.Salary > 200K

Statistics on E.Age are missing
May not need statistics on E.Age if predicate E.Salary > 200K is very selective

Formalizing Essential Statistics

Our Goal: Find a subset that is as good as having all statistics but avoid price of maintaining all
For given workload

What is as good?
t-Optimizer-Cost equivalence
Cost(Q, C) and Cost(Q, S) are within t% of each other

Publication: IEEE ICDE 2000
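
One possible way to write the t-Optimizer-Cost equivalence test as a formula is shown below. Measuring the difference relative to Cost(Q, C), and reading S as a subset of the candidate statistics C, are assumptions made here for concreteness; the slide only says the two costs are within t% of each other.

```latex
\[
\frac{\lvert \mathrm{Cost}(Q, S) - \mathrm{Cost}(Q, C) \rvert}{\mathrm{Cost}(Q, C)} \;\le\; \frac{t}{100}
\qquad \text{for every query } Q \text{ in the workload, with } S \subseteq C .
\]
```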

Essential Statistics (IEEE ICDE 2000)

In the absence of statistics:
Query optimizers use “magic numbers” for selectivity of predicates

For Age < 40, assume selectivity = 0.30
Data distribution independent

MNSA (Magic Number Sensitivity Analysis)
Set magic numbers to a few different values
If varying selectivity does not affect plan

⇒ additional statistics will not help
Else ⇒ select a “promising” statistic to build
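
The MNSA test described above can be sketched in a few lines. This is a simplified illustration, not the published algorithm: plans are modelled as cost functions of the unknown selectivity, the "optimizer" just picks the cheapest one, and the probe values, table size, lookup penalty, and plan names are all assumptions.

```python
from typing import Callable, Dict, Iterable

N_ROWS = 1_000_000     # assumed table size
LOOKUP_PENALTY = 4.0   # assumed cost of one index lookup relative to one scanned row

# A plan is modelled as a cost function of the selectivity of the predicate
# whose statistics are missing.
Plan = Callable[[float], float]


def choose_plan(plans: Dict[str, Plan], selectivity: float) -> str:
    """Toy optimizer: return the name of the cheapest plan at this selectivity."""
    return min(plans, key=lambda name: plans[name](selectivity))


def statistics_may_help(plans: Dict[str, Plan],
                        magic_numbers: Iterable[float] = (0.0005, 0.30, 0.9995)) -> bool:
    """True if the chosen plan changes when the magic number is varied."""
    chosen = {choose_plan(plans, s) for s in magic_numbers}
    return len(chosen) > 1


# The selectivity of Age < 40 decides between a scan and an index lookup.
sensitive_query = {
    "table_scan": lambda s: float(N_ROWS),
    "index_on_age": lambda s: N_ROWS * s * LOOKUP_PENALTY,
}

# A very selective Salary predicate (assumed 0.1%) already fixes the plan,
# whatever the selectivity of the Age predicate turns out to be.
insensitive_query = {
    "table_scan": lambda s: float(N_ROWS),
    "index_on_salary": lambda s: N_ROWS * 0.001 * LOOKUP_PENALTY,
}

print(statistics_may_help(sensitive_query))
print(statistics_may_help(insensitive_query))
```

The point of the design is that the test only needs optimizer calls with different assumed selectivities, so no statistics have to be built to decide whether building them is worthwhile.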

Statistics on Queries

Reduce optimizer error by building statistics on query expressions (SIT)
A very promising idea
Like materialized views – a manageability challenge
Recent work from AutoAdmin (SIGMOD 2002)

Execution Feedback Driven Statistics Building

Self-Tuning Statistics

Think Maps
Why care about maps for Greenland?
Need detailed maps for areas you visit
Make maps more detailed each time you visit

Idea: Start with “uniformity” assumption
Progressively refine with execution feedback
Single and multidimensional histograms
SIGMOD 99, SIGMOD 2001
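
A minimal sketch of the idea for a single column follows: start from a uniform histogram and nudge the bucket frequencies toward the row counts observed when range queries actually run. This is a simplified illustration, not the algorithm from the SIGMOD papers; the damping factor, the rule for sharing the error among buckets, and all names are assumptions, and bucket boundaries are never restructured.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    low: float    # bucket covers [low, high)
    high: float
    freq: float   # estimated number of rows in the bucket


def uniform_histogram(low: float, high: float, n_buckets: int, total_rows: float) -> List[Bucket]:
    """Initial histogram under the uniformity assumption."""
    width = (high - low) / n_buckets
    return [Bucket(low + i * width, low + (i + 1) * width, total_rows / n_buckets)
            for i in range(n_buckets)]


def overlap_fraction(bucket: Bucket, low: float, high: float) -> float:
    """Fraction of the bucket's range covered by the query range [low, high)."""
    covered = max(0.0, min(bucket.high, high) - max(bucket.low, low))
    return covered / (bucket.high - bucket.low)


def estimate(hist: List[Bucket], low: float, high: float) -> float:
    """Estimated number of rows with a value in [low, high)."""
    return sum(b.freq * overlap_fraction(b, low, high) for b in hist)


def refine(hist: List[Bucket], low: float, high: float,
           actual_rows: float, damping: float = 0.5) -> None:
    """Spread a damped share of the estimation error over the overlapping buckets."""
    fractions = [overlap_fraction(b, low, high) for b in hist]
    weights = [b.freq * f for b, f in zip(hist, fractions)]
    if sum(weights) == 0:
        weights = fractions          # all overlapping buckets are empty: share evenly
    total = sum(weights)
    if total == 0:
        return                       # query range does not overlap the histogram
    error = damping * (actual_rows - estimate(hist, low, high))
    for bucket, weight in zip(hist, weights):
        bucket.freq = max(0.0, bucket.freq + error * weight / total)


hist = uniform_histogram(0, 100, n_buckets=4, total_rows=1000)
print(estimate(hist, 0, 25))                 # estimate before any feedback
refine(hist, 0, 25, actual_rows=50)          # the query really returned 50 rows
print(estimate(hist, 0, 25))                 # estimate after one round of feedback
```

Only the buckets touched by executed queries get refined, which matches the map analogy: regions nobody visits keep the coarse uniform estimate. This sketch does not renormalize the total row count after an update.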

More on Self-Tuning Database Systems

More at Microsoft
SQL Server 7.0 introduced several auto-tuning features

IBM Almaden
Work by Mario and Shel
LEO at IBM ARC has similar goals as AutoAdmin

Rethinking Database Systems

Featurism hurts Self-Tuning

Featurism has turned into a curse

Yet another indexing smart / join method / optimizer transformation added

Abusing Extensibility
Eliminate all second-order optimizations

Turning into black magic
Hard to abstract principles
Cannot educate next generation of engineers
Performance is unpredictable

Self-Tuning is difficult

Role Models

Ex. 1: Aircraft with many subsystems (engine, fuselage, electrical control, etc.)
Ex. 2: RISC hardware
No single engineer understands entire system

Local theories for individual subsystems and reasonable understanding of interactions

Few points of interaction with stable and narrow interfaces
Built-in system support for debugging subcomponents (incl. performance tuning)

RISC Philosophy for DBMS

Details in VLDB 2000 vision paper
Package as components with simplified functionality
Enforce

Layered approach
Strong limits on interaction (narrow APIs)
Multiple consumers for a component

Components must have manageable complexity
Encapsulation must include predictable performance and self-tuning
Not a new idea – but an idea worth revisiting

Final Words

DBMS has to be self-tuning to be a good software component
AutoAdmin

Exploit workload and execution feedback richly for enabling self-tuning
Demonstrated through technology incorporated in Microsoft SQL Server

Despite advances, self-tuning remains a very formidable challenge

Need to think “self-tuning” globally by paying attention “locally”
RISC DBMS architectures – worth revisiting?

More Information

Data Management, Exploration and Mining Group Homepage

http://research.microsoft.com/dmx

Microsoft SQL Server White papers on Self-Tuning technology
My contacts

http://research.microsoft.com/users/surajitc
surajitc@microsoft.com
