DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes

Gary Valentin, Michael Zuliani, Daniel C. Zilio
IBM Toronto Lab
valentin,zuliani,[email protected]

Guy Lohman
IBM Almaden Research Center
[email protected]

Alan Skelley
University of Toronto
[email protected]

Abstract

This paper introduces the concept of letting an RDBMS Optimizer optimize its own environment. In our project, we have used the DB2 Optimizer to tackle the index selection problem, a variation of the knapsack problem. This paper discusses our implementation of index recommendation and its user interface, and provides measurements of the quality of the recommended indexes.

1. Introduction

The performance of queries in a relational database management system (RDBMS) has always been very sensitive to the indexes that exist on the tables in a database. Traditional B+-tree indexes can speed the execution of a query in one or more of the following ways:

• Applying predicates, i.e. by limiting the data that must be accessed to only those rows that satisfy those predicates;

• Ordering rows, i.e. to apply ORDER BY, GROUP BY, or DISTINCT clauses, or to merge-join a table with another table;

• Providing index-only access, i.e. to save having to access data pages by providing all the columns needed by a query;

• Enforcing uniqueness, i.e. by restricting the index to one row identifier per key value.

Specialized indexes may provide other advantageous aspects to query execution, such as statistics on the number of keys.

Since the advent of relational DBMSs, researchers have attempted to automate the design of databases, including the selection of indexes that would best serve a particular workload of queries. An index may have multiple columns as key columns, and the ordering of those columns is significant. Given that real applications such as SAP can have tens of thousands of tables, each table can have hundreds of columns, and a typical workload can have thousands of queries, the number of possible indexes to consider is staggering. Finding the set of indexes that optimize a workload of complex, multi-table queries having varying importance and subject to resource constraints is a daunting combinatorics challenge.

Initially these design tools were completely separate from the DBMS engine itself. They independently proposed candidate indexes and attempted to evaluate the cost and benefit of each set of candidate indexes. A major advance in the design tools was the use of the engine's optimizer to evaluate the cost of queries, given a set of candidate indexes [FST 88]. This advance prevented duplication of the optimizer's cost model in the design tool, and ensured consistency with the optimizer's choice of index when the recommended indexes were subsequently created.

This paper presents what was done as the next logical step: Have the engine's optimizer recommend candidate indexes, as well as evaluate their benefit and cost. The DB2 Advisor, new in IBM's DB2 Universal Database (UDB) V6.1, utilizes a component in the optimizer that recommends candidate indexes based upon an analysis of each query, and then evaluates those indexes, all in one call to the engine! This approach significantly improves the quality of the indexes that are considered, and speeds the evaluation of alternatives by reducing the number of calls to the engine. By modeling the index selection problem as a variant of the well-known Knapsack Problem [GN 72], the DB2 Advisor is also able to optimize large workloads of queries in a reasonable amount of time.

The remainder of this paper is structured as follows. Section 2 describes the overall architecture of the DB2 Advisor and its user interface. Section 3 details how the optimizer, through the recommendation algorithm, is able to recommend the best indexes for a given query. Section 4 presents the algorithm to extend this concept beyond just a single statement at a time to a workload of SQL statements, subject to resource constraints (such as disk space). Section 5 contrasts DB2 Advisor with previous work on index recommendation. In Section 6, we give some preliminary performance measurements, both of the time for the algorithm to run and of the resulting execution time for workloads benefitting from the Advisor's advice. Section 7 discusses future work.

2. Architecture

At the highest level, the DB2 Advisor works as a black-box index-recommendation engine. The black box has two inputs: a set of SQL statements known as the workload, and statistics describing the target database. There is only one output: the recommended indexes.

Architecturally, the DB2 Advisor consists of:

Index SmartGuide: A graphical user interface

db2advis: A command-line driven utility for recommending indexes

Optimizer: Extensions have been written into the DB2 Optimizer for the recommendation of indexes as well as their evaluation

Advise tables: These new tables are created for the purpose of advising, and they are used as a communication vehicle between db2advis and the Optimizer

The preferred method of invoking the Index Advisor is through the Graphical User Interface called the Index SmartGuide. We have included here two screen snapshots of the Index SmartGuide. The screen snapshot in Figure 2 shows how the user can specify a workload of statements. The SmartGuide automatically searches for SQL statements and their frequency of execution in the SQL cache and imports them. Effectively, the DB2 dynamic SQL cache stores recently-executed SQL statements. The Index SmartGuide also imports SQL statements from statically-compiled SQL statements, which are known as packages in DB2 terminology. Other sources of SQL statements include the Query Patroller load scheduling product, and recently explained SQL statements. Lastly, statements can be entered manually or using cut-and-paste. The workload is stored in a user-owned table, called ADVISE_WORKLOAD.

In other windows of the SmartGuide (see the tabs at the top), the user may optionally specify constraints on the estimated disk to be consumed by all indexes recommended, or on the maximum time for DB2 Advisor to spend improving its recommendations. For example, the user can require DB2 Advisor to work for no more than 5 minutes, and to recommend that all indexes consume no more than 5 Gigabytes.

Figure 1. Architecture of DB2 Advisor
(Diagram components: the user; the user interface, consisting of the graphical Index SmartGuide and the db2advis command-line tool; DB2 Universal Database with the database "Sample", the DB2 Optimizer, the SQL cache, database statistics, advise tables, packages, and data; system memory and system disk.)

The SmartGuide then calls the db2advis utility, an application program that contains the major optimization logic of DB2 Advisor. For each statement in the ADVISE_WORKLOAD table, it invokes the DB2 UDB Optimizer in one of two new EXPLAIN modes that either RECOMMEND INDEXES or EVALUATE INDEXES. The Optimizer stores the indexes it recommends in another user-owned table, called ADVISE_INDEX. The screen snapshot in Figure 3 shows the index recommended by db2advis for the workload of Figure 2. By clicking on the "Show workload details" button, the user can see how much the recommended indexes will benefit each statement in the workload.

Alternatively, the user may invoke the db2advis utility directly from the command line, providing options for specifying the database, the workload of SQL statements, the constraints, and various other options. Example 1 shows the invocation of db2advis for a single SQL statement in the "sample" database. In less than two seconds, DB2 Advisor determines the best indexes to create and the estimated improvement in the execution time if they were created, as well as the DDL to create them.

Figure 2. Specifying a workload of statements

EXAMPLE 1:
$ db2advis -d sample -s "select * from t1,t2 where t1.c1 = t2.c2"

execution started at timestamp 1999-07-06-19.02.32.617867
Calculating initial cost (without recommmended indexes) [82.237053] timerons
Initial set of proposed indexes is ready.
Found maximum set of [2] recommended indexes
Cost of workload with all indexes included [25.879040] timerons
total disk space needed for initial set [2] MB
total disk space constrained to [-1] MB
2 indexes in current solution
[ 82.2371] timerons (without indexes)
[ 25.8790] timerons (with current solution)
[%68.53] improvement
Trying variations of the solution set.
--
-- execution finished at timestamp 1999-07-06-19.02.34.154307
--
--
-- LIST OF RECOMMENDED INDEXES
-- ===========================
-- index[1], 1MB
CREATE INDEX WIZ0 ON "VALENTIN"."T2" ("C2" ASC) ;
-- index[2], 1MB
CREATE INDEX WIZ2 ON "VALENTIN"."T1" ("C1" ASC) ;
-- ===========================
--
Index Advisor tool is finished.
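
The improvement percentage in the output above can be checked from the two cost estimates it reports. The following is a minimal sketch in Python; the function name is ours, and it assumes the improvement is simply the relative reduction in estimated cost, which is consistent with the numbers shown.

```python
# Cost estimates (in timerons) copied from the db2advis output above.
cost_without_indexes = 82.237053
cost_with_indexes = 25.879040

def improvement_percent(before: float, after: float) -> float:
    """Relative reduction in estimated cost, expressed as a percentage."""
    return (before - after) / before * 100.0

print(f"{improvement_percent(cost_without_indexes, cost_with_indexes):.2f}%")
```

Rounded to two decimals, this reproduces the 68.53 figure in the listing.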

The algorithm for the index-recommendation engine in db2advis is covered in the next two sections. The first section discusses the simple case of recommending indexes for a single SQL statement. The subsequent section extends the algorithm to accommodate a workload of queries.

3. Single query optimization

The algorithm for recommending indexes is an extension of the existing process for optimizing an SQL query in the DB2 Compiler. The old process is augmented with the injection of a multitude of "virtual indexes" - hundreds of indexes whose metadata has been temporarily introduced into the schema only for the duration of the optimization process.

To illustrate the approach, suppose that all possible indexes were temporarily injected into the schema model. The DB2 Compiler would then be faced with its usual optimization process, except that there would be a lot more indexes in the schema to consider. When the optimization process has completed, the DB2 Compiler produces the optimal Query Access Plan. If this plan contains one or more virtual indexes, then these indexes are the recommended indexes. Effectively, we let the optimizer choose which indexes it likes.

Figure 3. View the recommended indexes

In practice, there are problems with this approach. The most immediate issue is that the enumeration of all possible indexes produces a working set which is too big. In DB2, a table with n columns can support a very large number of indexes, as shown by Formula 1.

Formula 1 (Number of Possible Indexes) Given a table with $n$ columns, how many different indexes can exist containing $k$ columns, where $1 \le k \le n$? There are $n$ choices for the first column in the index. For the second column, there are $n-1$ remaining choices. As more columns are added, the total number becomes $n \cdot (n-1) \cdot (n-2) \cdots (n-k+1)$, or $\frac{n!}{(n-k)!}$. Therefore the total number of indexes that can be created on a table with $n$ columns is

$$\sum_{k=1}^{n} \frac{n!}{(n-k)!}$$

However, in DB2 UDB, each column of an index may individually be defined as either ascending or descending. Therefore, for a given $k$, the space of possible indexes is multiplied by $2^k$. As a result, we adjust our first formula to become:

$$\sum_{k=1}^{n} 2^k \cdot \frac{n!}{(n-k)!}$$
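
To get a feel for how quickly this count grows, the sums in Formula 1 can be evaluated directly. The sketch below is illustrative only: it assumes the formula reads as the sum over key lengths k of n!/(n-k)!, multiplied by 2^k when each column may be ascending or descending, and the function name is ours.

```python
from math import factorial

def possible_indexes(n: int, with_direction: bool = True) -> int:
    """Count the indexes that can be defined on a table with n columns.

    For each key length k (1..n) there are n!/(n-k)! ordered column choices;
    if every column may be ASC or DESC, each choice is multiplied by 2**k.
    """
    total = 0
    for k in range(1, n + 1):
        orderings = factorial(n) // factorial(n - k)
        total += orderings * (2 ** k if with_direction else 1)
    return total

# Columns: n, count ignoring direction, count with ASC/DESC per column.
for n in (3, 5, 10):
    print(n, possible_indexes(n, with_direction=False), possible_indexes(n))
```

Even a three-column table admits 15 indexes when direction is ignored and 78 when it is not, which is why exhaustive enumeration is impractical for wide tables.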

Therefore, in practice, there has to be a limit on the number of virtual indexes enumerated. The DB2 Advisor limits the number of virtual indexes by using the DB2 Optimizer itself to suggest indexes intelligently, based upon its knowledge of how it wants to evaluate a given query. We call this approach the "Smart column Enumeration for Index Scans" (SAEFIS) enumeration algorithm. This algorithm analyzes the statement predicates and clauses to produce sets of columns that might be exploited in a virtual index. There are 5 such sets:

EQ: columns that appear in EQUAL predicates

O: columns that appear in the INTERESTING ORDERS list. This includes columns from ORDER BY and GROUP BY clauses, or join predicates.

RANGE: columns that appear in range predicates

SARG: columns that appear in any predicates but nested subqueries or those involving a large object (LOB).

REF: remaining columns referenced in the SQL statement.

Then various combinations of (subsets of) these sets are formed, in order, eliminating any duplicate columns:

1. EQ + O

2. EQ + O + RANGE

3. EQ + O + RANGE + SARG

4. EQ + O + RANGE + REF

5. O + EQ

6. O + EQ + RANGE

7. O + EQ + RANGE + SARG

8. O + EQ + RANGE + REF
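
The eight combinations above can be sketched as a small routine that concatenates the column sets and drops repeated columns. This is only an illustration, not the DB2 implementation: the function name and the example query are hypothetical, and two details are assumptions on our part, namely that columns keep the order in which they are listed within each set and that identical candidates produced by different combinations are collapsed.

```python
def saefis_candidates(eq, o, rng, sarg, ref):
    """Build candidate key-column lists from the five column sets.

    Each argument is an ordered list of column names. Sets are concatenated
    in the eight orders listed above; a column is kept only the first time
    it appears, and duplicate or empty candidates are dropped (assumption).
    """
    recipes = [
        (eq, o), (eq, o, rng), (eq, o, rng, sarg), (eq, o, rng, ref),
        (o, eq), (o, eq, rng), (o, eq, rng, sarg), (o, eq, rng, ref),
    ]
    candidates = []
    for recipe in recipes:
        key = []
        for column_set in recipe:
            for col in column_set:
                if col not in key:          # eliminate duplicate columns
                    key.append(col)
        key = tuple(key)
        if key and key not in candidates:
            candidates.append(key)
    return candidates

# Hypothetical query: SELECT c, d FROM t WHERE a = 1 AND b > 5 ORDER BY c
for key in saefis_candidates(eq=["a"], o=["c"], rng=["b"], sarg=["a", "b"], ref=["d"]):
    print(key)
```

For this query the SARG columns add nothing beyond EQ and RANGE, so combinations 3 and 7 collapse into 2 and 6 and six distinct candidates remain.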

As a safety net to make sure the simplest indexes are not missed, and to evaluate how well the SAEFIS approach works, we have also implemented an algorithm to enumerate all possible indexes, stopping after a certain maximum number of indexes is reached. We call this the "Brute Force and Ignorance" (BFI) enumeration algorithm. There are several ways to implement this enumeration. We have taken a simple recursive algorithm and extended it to accommodate ascending/descending columns.
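
A capped brute-force enumeration of this kind might look as follows. Note that this sketch is not the recursive algorithm the text refers to, whose details are not given: it uses Python's itertools instead, and it assumes candidates are produced shortest key first so that the cap retains the simplest indexes. All names are ours.

```python
from itertools import islice, permutations, product

def bfi_indexes(columns):
    """Yield every index definition as a tuple of (column, direction) pairs,
    shortest keys first (ordering is an assumption of this sketch)."""
    for k in range(1, len(columns) + 1):
        for cols in permutations(columns, k):               # ordered column choices
            for dirs in product(("ASC", "DESC"), repeat=k):  # direction per column
                yield tuple(zip(cols, dirs))

def bfi_enumerate(columns, max_indexes):
    """Stop after max_indexes candidates have been produced."""
    return list(islice(bfi_indexes(columns), max_indexes))

print(len(bfi_enumerate(["a", "b", "c"], max_indexes=1000)))
print(bfi_enumerate(["a", "b"], max_indexes=3))
```

Because the generator is lazy, the cap bounds the work done as well as the number of candidates returned.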

Here is a pseudo-code version of the final recommendation algorithm:

ALGORITHM 1:
RECOMMEND INDEXES(Statement S)

1. Enable "RECOMMEND INDEXES" mode

2. Enter the DB2 Optimizer

3. Inject the schema with virtual indexes using SAEFIS and generate their statistics

4. Inject the schema with virtual indexes using BFI and generate their statistics

5. Construct the best plan for S by calling the DB2 Optimizer

6. Scan the optimal plan, searching for virtual indexes

7. Submit these indexes back to the user as "recommended".
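
The control flow of Algorithm 1 can be illustrated with a deliberately tiny stand-in for the optimizer. Everything in this sketch is hypothetical: the `Index` class, the made-up `toy_cost` model, and the restriction to a single-table query with equality predicates are assumptions chosen only to show virtual indexes being injected, costed alongside real ones in one pass, and reported if the chosen plan uses them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Index:
    columns: tuple
    virtual: bool = False

def toy_cost(index, eq_columns, rows=100_000):
    """Made-up cost: a table scan reads every row; each leading index column
    that matches an equality predicate cuts the rows touched tenfold."""
    if index is None:                       # None stands for a table scan
        return float(rows)
    matched = 0
    for col in index.columns:
        if col not in eq_columns:
            break
        matched += 1
    return rows / 10 ** matched + len(index.columns)

def recommend_indexes(eq_columns, existing, candidate_keys):
    # Steps 3-4: inject virtual indexes next to the real ones.
    catalog = list(existing) + [Index(tuple(k), virtual=True) for k in candidate_keys]
    # Step 5: one optimization pass picks the cheapest access path.
    best = min([None] + catalog, key=lambda ix: toy_cost(ix, eq_columns))
    # Steps 6-7: report the plan's index only if it is a virtual one.
    return [best] if best is not None and best.virtual else []

existing = [Index(("a",))]
print(recommend_indexes({"a", "b"}, existing, [("a", "b"), ("b", "c")]))
```

In this toy, an existing index that is as cheap as a virtual one wins the tie, so nothing is recommended when the schema already has what the query needs.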

The essence of this algorithm is that the DB2 Optimizer both suggests candidate indexes and makes the decision on which indexes perform best. Importantly, both steps happen in a single call to the DB2 UDB engine. This approach has many advantages. The first advantage is that the efficiency of the recommendation process is maximized by entering the DB2 Optimizer (and hence the DB2 Engine) just once per single query. The second advantage is that no secondary or external optimizer is needed, either to suggest candidate indexes or to evaluate their cost. This reduces the maintenance of code that duplicates the Optimizer's cost equations. Instead, by having the Optimizer itself simply inject likely-looking virtual indexes and estimate their statistics, we have easily extended the DB2 Optimizer from an SQL Optimizer into an index selection optimizer. The last advantage is that the DB2 Optimizer does not need to be significantly modified. Once the virtual indexes are injected, the Optimizer continues working as it always has by enumerating plans, join orderings, and access methods. Only a small amount of code was written in order to enumerate virtual indexes and inject them into the schema.

Note that Algorithm 1 could be used as a subroutine within any existing Index Recommendation algorithm, not just our algorithm, which is detailed in Section 4 below. For example, it could be plugged in as a method for enumerating indexes in Daniel Zilio's Branch-and-Bound based method [Zilio 98] or in Whang's Drop-based method [Whang 85].

3.1. Index Statistics

Once the index columns are defined, the optimizer still requires statistical information about each virtual index. Without proper statistics, the optimizer will be unable to evaluate the cost of scanning an index, fetching selected rows from an index, or updating an index.

The statistics for virtual indexes are generated based on the corresponding table and column statistics, deducing information on index cardinalities, B+-Tree levels, and the number of leaf pages and non-leaf pages. Some properties cannot be deduced easily, such as clustering and uniqueness. For these properties, we assign pessimistic values. For example, we assume that there will be no clustering on the table per the index order. This behaviour allows the optimizer to be cautious as it uses virtual indexes, and avoid costing these indexes at performance levels which cannot be guaranteed.

The statistics for each virtual index are derived as follows:

Index Key Width, KW: the sum of the average width of each column in the index definition.

Index Clustering: none (worst-case value).

Index Density: none (worst-case value).

Percent Free: DB2 default, 15%.

Cardinality of an index with $k$ columns, FKCARD:

$$\mathrm{FKCARD} = \min\left(\mathrm{CARD},\ \prod_{i=1}^{k} \mathrm{COLCARD}_i\right)$$

where

CARD: cardinality of the table

COLCARD$_i$: cardinality (i.e. number of distinct values) of the $i$-th column of the index

Number of Leaf Pages, NL: calculated from the index cardinality, page size, and overheads for each key and page, assuming each page is fully packed with keys, using the following formulas:

$$\mathrm{KPP} = \left\lfloor \frac{\mathrm{PSIZE} - \mathrm{POH}}{\mathrm{KW} + \mathrm{KOH}} \right\rfloor$$

$$\mathrm{NL} = \left\lceil \frac{\mathrm{FKCARD}}{\mathrm{KPP}} \right\rceil$$

where:

KPP: keys per page

PSIZE: page size (can be 4096 or 8192 bytes in DB2 UDB)

POH: page header overhead in a leaf page

KOH: key overhead.

Total Number of Non-Leaf Pages, TNL: calculated from the number of leaf pages, key size, and page overhead as a recursive function. The recursion starts at the leaf level, and computes the number of pages at each level, continuing until the number of pages has reached one (representing the root node):

$$\mathrm{EPNL} = \left\lfloor \frac{\mathrm{PSIZE} - \mathrm{NLPOH}}{\mathrm{KW} + \mathrm{NLEOH}} \right\rfloor$$

$$\mathrm{NL}[0] = \mathrm{FKCARD}$$

$$\mathrm{NL}[n] = \left\lceil \frac{\mathrm{NL}[n-1]}{\mathrm{EPNL}} \right\rceil$$

$$\mathrm{NLEVELS} = n$$

$$\mathrm{TNL} = \sum_{i=1}^{n} \mathrm{NL}[i]$$

where:

EPNL: number of entries per non-leaf page

NLPOH: page header overhead in a non-leaf page

NLEOH: overhead of an entry in a non-leaf page

NL[i]: number of non-leaf pages in level $i$

NLEVELS: number of levels in the index
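
As a rough illustration of the leaf-level statistics, the sketch below derives the index cardinality and the number of leaf pages from table and column statistics. It assumes that FKCARD is the product of the column cardinalities capped by the table cardinality, that KPP = floor((PSIZE - POH) / (KW + KOH)), and that NL = ceil(FKCARD / KPP), which is how the definitions above read. The overhead values used as defaults are made up for the example and are not DB2's actual constants; the non-leaf recursion is left out.

```python
from math import ceil, floor, prod

def index_cardinality(table_card, column_cards):
    """FKCARD: distinct key values, capped by the table cardinality."""
    return min(table_card, prod(column_cards))

def leaf_pages(fkcard, key_width, page_size=4096, page_overhead=100, key_overhead=10):
    """NL: leaf pages needed when every page is fully packed with keys.

    page_overhead (POH) and key_overhead (KOH) are hypothetical values.
    """
    keys_per_page = floor((page_size - page_overhead) / (key_width + key_overhead))  # KPP
    return ceil(fkcard / keys_per_page)

# Hypothetical two-column index on a table of one million rows.
fkcard = index_cardinality(1_000_000, [50, 1000])
print(fkcard, leaf_pages(fkcard, key_width=12))
```

With these inputs the key cardinality is 50,000, each leaf page holds 181 keys, and 277 leaf pages are needed.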

4. Workload Optimization

In this section, we will present the extensions to the algorithm that permit the DB2 Advisor to recommend indexes for a workload of statements.

Ideally, we would optimize the recommendation of indexes for a workload of statements in a single invocation of the DB2 Optimizer. There is a method of using an optimizer to work on several statements in one invocation, which is called Mass Query Optimization (MQO). Today, however, no commercial Relational Database product supports Mass Query Optimization, and therefore this was not an option for the DB2 Advisor.

As seen before in Figure 1, the DB2 Advisor has as a component a utility called db2advis. In this utility, we have added an index-selection algorithm which uses the results of the single-query recommendations as a starting point and searches for the optimal combination of indexes for a full workload.

The workload optimization algorithm contained in db2advis models the index selection problem as an application of the classic Knapsack Problem, a special type of 0-1 integer programming [GN 72]. Each index is an item that may or may not be put into the knapsack, as indicated by a variable for that index that can be 0 or 1 (a part of an index is useless). Each index also has an associated benefit and size. The benefit for an index is defined as the improvement in estimated execution time that an index contributes to all queries that exploit it, times the frequency that each query occurs in the workload. The size is just the estimated size of the entire index, in bytes. The knapsack has a fixed maximum size for all items in the solution. The objective is to maximize the benefit of all items in the knapsack. If the integrality constraint is relaxed, it is well known that the optimal solution accepts the entities into the knapsack in order of decreasing ratio of benefit to size, until the knapsack is full.
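
The benefit-to-size ordering described above can be sketched in a few lines. This is an illustration rather than the db2advis code: the function names and the sample numbers are hypothetical, and the sketch assumes that selection stops at the first index that no longer fits, which is one reading of "until the knapsack is full".

```python
def index_benefit(savings):
    """savings: (estimated time saved per execution, query frequency) pairs."""
    return sum(saved * freq for saved, freq in savings)

def greedy_knapsack(indexes, disk_budget):
    """indexes: (name, benefit, size_in_bytes) triples.

    Accept indexes in decreasing benefit-to-size order and stop at the first
    one that no longer fits in the budget (assumption of this sketch).
    """
    ranked = sorted(indexes, key=lambda ix: ix[1] / ix[2], reverse=True)
    chosen, used = [], 0
    for name, benefit, size in ranked:
        if used + size > disk_budget:
            break
        chosen.append(name)
        used += size
    return chosen, used

candidates = [
    ("ix_a", index_benefit([(5.0, 10), (2.0, 3)]), 40),  # used by two queries
    ("ix_b", 30.0, 10),
    ("ix_c", 20.0, 50),
]
print(greedy_knapsack(candidates, disk_budget=60))
```

Here ix_b has the best ratio and is accepted first, ix_a follows, and ix_c is rejected because it would exceed the budget.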

There are, however, a few complications in our straightforward application of the Knapsack Problem. First of all, we have relaxed integrality, but in reality it makes no sense to have a fraction of an index. Secondly, negative benefit accrues for updating each index in UPDATE, INSERT, and DELETE statements to that index's table. But at the time we compute the benefit for such statements, we don't yet know all the indexes that might be created by RECOMMEND INDEX. Thirdly, we have attributed all the benefit resulting from a set of indexes to every index in a query. In reality, the benefit of each index is a function of what other indexes exist (i.e. the benefit of index A can differ when index B is present or absent), and attributing all the benefit to every index of the query is double-counting. This relates to the concept of "separability", discussed in the next section. To adjust for all of these complications, we refine the initial solution found by the Knapsack order in a routine called TRY VARIATION, which creates a variant of the solution by randomly swapping a small set of indexes in the solution for a small set of indexes not in the solution. The workload is then re-EXPLAINed with this variant set of virtual indexes in the EVALUATE INDEXES EXPLAIN mode. If the variant solution is cheaper overall, it becomes the current solution. TRY VARIATION continues until the user's time budget has been exhausted.

Algorithm 2 describes the algorithm of db2advis for a workload W of SQL statements:

ALGORITHM 2:

1. Get Workload W, including the frequency of execution of each statement.

2. R = ∅

3. For each Statement S in W,

(a) EXPLAIN S with existing indexes, returning S.cost with existing indexes.

4. For each Statement S in W,

(a) EXPLAIN S in RECOMMEND INDEX mode, i.e. with virtual indexes

(b) R = R ∪ RECOMMEND INDEXES(S)

5. For each index I in R

(a) I.benefit = S.cost with existing indexes - S.cost with virtual indexes

(b) I.size = bytes in index

6. Sort indexes in R by decreasing benefit-to-cost ratio.

7. Combine any index subsumed by an index with a higher ratio with that index.

8. Accept indexes from set R until disk constraint is exhausted.

9. while (time did not expire) repeat

(a) TRY VARIATION

As stated before, the final step can be allowed to process for any length of time. This allows for flexibility in various cases: Where a feasible solution is needed quickly, the algorithm can be given less processing time; when obtaining an optimal solution is paramount, the algorithm can be given more processing time.
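
The time-bounded refinement of step 9 can be sketched as a random-swap loop. This is a hedged illustration and not the db2advis routine: `workload_cost` is a hypothetical callback standing in for re-EXPLAINing the workload in EVALUATE INDEXES mode, the disk-budget check inside the loop is our assumption, and the toy data at the end is invented.

```python
import random
import time

def try_variation(current, candidates, swap_size, rng):
    """Swap a few indexes in the solution for a few that are not in it."""
    outside = sorted(set(candidates) - current)
    drop = rng.sample(sorted(current), min(swap_size, len(current)))
    add = rng.sample(outside, min(swap_size, len(outside)))
    return (current - set(drop)) | set(add)

def refine(initial, candidates, sizes, disk_budget, workload_cost,
           time_budget_s, swap_size=1, seed=None):
    """Keep any random variant that fits on disk and lowers the workload cost."""
    rng = random.Random(seed)
    best, best_cost = set(initial), workload_cost(set(initial))
    deadline = time.monotonic() + time_budget_s
    while time.monotonic() < deadline:          # step 9: until time expires
        variant = try_variation(best, candidates, swap_size, rng)
        if sum(sizes[ix] for ix in variant) > disk_budget:
            continue                            # violates the disk constraint
        cost = workload_cost(variant)           # stands in for re-EXPLAIN
        if cost < best_cost:
            best, best_cost = variant, cost
    return best, best_cost

# Toy data: three equally sized indexes, room for only one of them.
sizes = {"ix_a": 1, "ix_b": 1, "ix_c": 1}
toy_cost = lambda chosen: {"ix_a": 10, "ix_b": 5, "ix_c": 20}[next(iter(chosen))]
print(refine({"ix_c"}, sizes, sizes, 1, toy_cost, time_budget_s=0.05, seed=0))
```

A larger time budget simply allows more variants to be tried, which mirrors the trade-off between a quick feasible answer and a better one.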

5. Comparison with Previous Work

Many papers have been written on this subject. The DB2 Advisor is unique because it can recommend indexes for an SQL statement within a single call to the RDBMS engine, using the DB2 Optimizer for the optimization.

Early designs for index recommendations started in the eighties [ISR 83], [BPS 90], [FON 92], [CFM 95], [GHRU 97], [CBC 93], [Whang 85]. These early papers had several shortcomings. First, they were restricted by existing technology. For example, none of these papers used an optimizer for cost estimates. One possible reason is that the existing optimizers would not externalize their cost estimates. These papers did, however, identify the nature of the problem as a variation on the classic Knapsack Problem.

Secondly, with the exception of [GHRU 97], all of these papers concerned themselves only with single-column indexes. [Whang 85] had an interesting addition, proposing a DROP optimization algorithm for the index selection problem, as opposed to a rule-based optimization.

Another weakness in these early algorithms was the assumption of separability. [Whang 85] made the case that index selection for each relation can be made independently of other relations. This assumption greatly simplifies the selection problem, but is this assumption correct? In fact, it is incorrect in many common cases. For example, in the case of a nested-loop join between two relations R and S, the presence of an index on relation R reduces the potential need for an index on relation S, and vice-versa, so long as one of the two relations has an index so that it can apply the join predicate on the inner relation. Obviously, this assumption is flawed.
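
A small numeric illustration makes the failure of separability concrete. The cost figures below are invented for the example and do not come from any optimizer; they only encode the situation described above, where an index on either relation makes the nested-loop join cheap and a second index adds nothing.

```python
# Hypothetical cost estimates for a nested-loop join of R and S, keyed by
# the set of relations that have an index on the join column.
join_cost = {
    frozenset(): 1000.0,
    frozenset({"R"}): 100.0,
    frozenset({"S"}): 120.0,
    frozenset({"R", "S"}): 100.0,
}

def benefit(index, others=frozenset()):
    """Cost saved by adding `index` when the indexes in `others` already exist."""
    present = frozenset(others)
    return join_cost[present] - join_cost[present | {index}]

print(benefit("R"), benefit("S"))     # each index judged in isolation
print(benefit("S", others={"R"}))     # S judged when R is already indexed
```

Judged separately, the two indexes appear to save 900 and 880 units, yet creating both saves only 900 in total, so summing per-index benefits double-counts.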

Later solutions have used the RDBMS engine for evaluating solution sets, but never for recommending candidate indexes. The recommendation process always occurs in a module external to the RDBMS engine. These latter designs include [FST 88], [CN 98b], and [Zilio 98].

[Zilio 98] recognized the strong interdependence between indexes and partitioning keys. Zilio's implementation recommended partitioning keys as well as indexes. Zilio used a branch-and-bound optimization algorithm, which typically takes longer to find the optimal solution than the benefit-to-size ratio ordering of db2advis.

[CN 98b] was implemented in a commercial RDBMS, Microsoft SQL Server. Chaudhuri & Narasayya have made an essential contribution by combining the advantages of single-column recommendation with multi-column optimization algorithms. By considering index candidates with a small number of columns, they are more likely to optimize for several queries using the same candidate indexes, and still squeeze into small disk constraints or small knapsacks. Taking this into account, their design starts by considering single-column indexes first, and working on wider indexes as time permits. Their goal was to reduce the number of optimizer invocations.

However, the same advantage of reducing the number of optimizer calls can be achieved by placing the enumeration algorithm inside the optimizer. That is the key to our implementation, and we believe it to be the better technique. The difference is most dramatic on a single-query basis, where our algorithm recommends indexes in a single optimizer invocation. Another advantage of this algorithm over [CN 98b] is the recommendation of wider indexes, intrinsic in the SAEFIS algorithm. The SAEFIS enumeration considers the three most likely uses of the index scan and combinations thereof. Yet another advantage is that the enumeration originates inside the DB2 engine, leveraging the existing optimizer, and thus reducing maintenance costs of two distinct optimizers.

6. Performance Measurements

There are several performance aspects that need to be addressed by DB2 Advisor. The first concern is the quality of the recommended indexes. How good are they? This is a difficult question to answer, but we have observed two cases where the DB2 Advisor has been used.

The first such case was with the TPCD workload, an industry benchmark for decision support. Running DB2 Advisor on the TPCD V1 workload showed that, in 14 out of the 17 queries, DB2 Advisor recommended indexes which performed optimally, or as near to optimal as is known to the DB2 TPCD team. In the remaining 3 queries, the DB2 Advisor missed some key indexes. The reason for this is that these indexes had to be defined as UNIQUE in order to take advantage of the improvement. But unfortunately we placed the restriction on the DB2 Advisor not to recommend UNIQUE indexes, because uniqueness is application-dependent and cannot necessarily be deduced from the existing data.

In another case, the DB2 Advisor was faced with a very complex machine-generated query that ran in over 48 hours. After the creation of three indexes recommended by the DB2 Advisor, the elapsed time reduced to 11 minutes. This shows the dramatic effect that automatic recommendation can have in those cases where a human eye is not available to analyze the incoming SQL, or the query is too complex.

Another aspect of operating performance is the execution time of DB2 Advisor. Because the DB2 Advisor can be interrupted at any time, how much time should it be allowed to execute before the recommendations are "good enough"?

In order to answer this question, we exercised the DB2 Advisor against a 1GB TPCD database, with 6 levels of disk constraints. The results appear in Figure 4 and Figure 5. As the results in Figure 4 indicate, within 90 seconds, all levels of constraints had made a contribution to performance of 50% to 88%. This is reflected in the abrupt drop that occurs between 60 and 90 seconds. This improvement shows that much of the benefit of new indexes is achieved soon after the initial optimizer pass, as seen in steps 1 through 8 of Algorithm 2.

The benefit of allowing small permutations of that initial solution in step 9 of Algorithm 2 is seen in Figure 5, the detailed improvement chart. In this example, optimal indexes were found after 6 minutes, but the time to achieve optimality is very dependent upon the size and complexity of the workload.

7. Future Work

One of the strongest features of our algorithm relates to future work. One of the future directions for this project is to extend this algorithm to the recommendation of materialized views and indexes on materialized views. Currently, the selection of materialized views in DB2 is performed mostly at the Query ReWrite level - not in the Optimizer. This means that the re-routing decisions are made using rules, rather than allowing the Optimizer to evaluate several alternatives according to estimated cost. However, there are efforts underway to allow more re-routing decisions to be made in the Optimizer, as well as efforts to add Mass Query Optimization to the DB2 Optimizer. This will provide the opportunity for more advanced advising that goes well beyond indexes.

Figure 4. Quality of recommended indexes over time

Figure 5. Quality of recommended indexes over time (detail)

Another future direction is to use this method for the recommendation of partitioning keys in a parallel database environment. It is possible to plug alternative partitioning keys into the DB2 Optimizer and then evaluate them using an outside engine such as Daniel Zilio's Physical Design recommendation tool [Zilio 98]. At the heart of this technology is the efficiency of using the internal optimizer as much as possible. This avoids excessive calls in and out of the RDBMS, and avoids having to duplicate the optimizer's intelligence outside the engine.

We are looking at expanding the concept to include the suggestion of all database-related configuration, including data layout, data properties such as referential integrity and constraints, partitioning keys, clustering, reorganization, and statistics collection.

Currently, this technology greatly simplifies the process of selecting a set of indexes. But the long-term goal of this technology is that a DBA will not even know what an index is, or what it is used for, and can concentrate on their primary concern: the creation and use of data.

8. Conclusion

The DB2 Advisor is unique in its use of a query optimizer for both suggesting and evaluating potential indexes. Using information that it must derive for optimizing a query anyway, the Optimizer can readily suggest much better candidates for new indexes than can an external routine that must repeatedly invoke the optimizer as it blindly iterates through the numerous possible combinations of columns for potential indexes. The DB2 Advisor suggests multi-column virtual indexes by combining columns from predicates, orders, and index-only access; estimates their attributes; and then evaluates them against other, existing indexes using its usual query optimization logic. Virtual indexes that are chosen by the optimizer are recommended to the user.

For workloads of multiple queries, this RECOMMEND INDEX mode is also used to determine the benefit of each such recommended index, by comparing the estimated cost for each query with and without these virtual indexes. The cost of an index is simply its size in bytes. Treating the index selection problem as an application of the well-known Knapsack Problem, the db2advis utility selects those indexes with the largest benefit-to-cost ratio, which is the optimal solution when the integrality constraint is relaxed. Selection continues until the cumulative size of all indexes chosen exceeds the disk constraint. The solution is refined by iteratively swapping a few indexes that are in the solution with those that are not, to account for the relaxation of integrality.
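The ratio-based selection summarized above can be illustrated with a small greedy sketch. It is a hedged reading of the description rather than the db2advis code: the `Candidate` type, the index names, and the benefit and size figures are invented for illustration, and the stopping rule (stop at the first candidate that would push the total past the disk limit) is one interpretation of "until the cumulative size exceeds the disk constraint".

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str        # hypothetical index name
    benefit: float   # estimated workload cost saved with this index
    size_bytes: int  # estimated index size; plays the role of knapsack cost

def greedy_select(candidates, disk_limit_bytes):
    """Take candidates in descending benefit-to-size order while they fit."""
    ranked = sorted(candidates,
                    key=lambda c: c.benefit / c.size_bytes,
                    reverse=True)
    chosen, used = [], 0
    for cand in ranked:
        if used + cand.size_bytes > disk_limit_bytes:
            break  # ASSUMPTION: stop at the first candidate that overflows
        chosen.append(cand)
        used += cand.size_bytes
    return chosen, used

# Made-up candidates; sizes are assumed to be positive.
candidates = [
    Candidate("ix_orders_date", 900.0, 300),      # ratio 3.0
    Candidate("ix_lineitem_part", 500.0, 100),    # ratio 5.0
    Candidate("ix_customer_nation", 400.0, 400),  # ratio 1.0
    Candidate("ix_supplier_key", 60.0, 50),       # ratio 1.2
]
chosen, used = greedy_select(candidates, disk_limit_bytes=500)
print([c.name for c in chosen], used)
```

Because whole indexes must be chosen, the greedy result can leave disk space unused (50 bytes here), which is the gap the swap-based refinement is meant to address.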

Both single-query and workload index selection by DB2 Advisor have been implemented in IBM's DB2 Universal Database Version 6.1. Performance evaluation has verified both that it finds indexes which significantly improve the execution of complex queries, and that the utility finds these indexes in a time proportional to the number of queries, but can continue to iteratively improve its recommendations.

We believe that exploiting a query optimizer in this way has tremendous potential for efficiently automating other aspects of database design. After all, the cost model of a query optimizer is a sophisticated mathematical model of how a query would perform, given the schema and physical attributes of the database. It therefore provides an ideal way to evaluate the impact of variations in the schema and/or its attributes. We are investigating additional ways for the DB2 Advisor to exploit the DB2 query optimizer to recommend and evaluate alternative database designs.

References

[BPS 90] Elena Barucci, Renzo Pinzani, and Renzo Sprugnoli, "Optimal selection of secondary indexes", IEEE Trans. on Software Engineering, 16(1):32-38, January 1990.

[CBC 93] Sunil Choenni, Henk M. Blanken, and Thiel Chang, "On the Selection of Secondary Indices in Relational Databases", Data & Knowledge Engineering, 11(3):207-233, 1993.

[CFM 95] Alberto Caprara, Matteo Fischetti, Dario Maio, "Exact and Approximate Algorithms for the Index Selection Problem in Physical Database Design", IEEE Transactions on Knowledge and Data Engineering, 7(6):955-967, December 1995.

[FON 92] Martin R. Frank, Edward R. Omiecinski, and Shamkant B. Navathe, "Adaptive and Automated Index Selection in RDBMS", International Conference on Extending Database Technology (EDBT), pages 277-292, Vienna, Austria, March 1992.

[GHRU 97] Himanshu Gupta, Venky Harinarayan, Anand Rajaraman, and Jeffrey D. Ullman, "Index Selection for OLAP", In Proceedings of the International Conference on Data Engineering, pages 208-219, Birmingham, U.K., April 1997.

[ISR 83] Maggie Y. L. Ip, L. V. Saxton, and Vijay V. Raghavan, "On the Selection of an Optimal Set of Indexes", IEEE Transactions on Software Engineering, 9(2):135-143, March 1983.

[CN 98a] Surajit Chaudhuri and Vivek Narasayya, "AutoAdmin 'What-if' Index Analysis Utility", Procs. of the 1998 ACM SIGMOD Conf. (Seattle, 1998), pp. 367-378.

[CN 98b] Surajit Chaudhuri and Vivek Narasayya, "Microsoft Index Tuning Wizard for SQL Server 7.0", Procs. of the 1998 ACM SIGMOD Conf. (Seattle, 1998), pp. 553-554.

[Falkowski 92] Bernd-Juergen Falkowski, "Comments on an Optimal Set of Indices for a Relational Database", IEEE Trans. on Software Engineering 18, 2 (Feb. 1992), pp. 168-171.

[FST 88] S. Finkelstein, M. Schkolnick, and P. Tiberio, "Physical Database Design for Relational Databases", ACM Trans. on Database Systems 13, 1 (March 1988), pp. 91-128.

[GN 72] Robert S. Garfinkel and George L. Nemhauser, Integer Programming, John Wiley & Sons, New York (1972), pp. 214-241.

[Whang 85] Kyu-Young Whang, "Index Selection in Relational Databases", Proc. Intl. Conf. on Foundations on Data Organization (FODO) (Kyoto, Japan), May 1985, pp. 369-378. Also reprinted in Foundations of Data Organization, Sakti P. Ghosh, Yahiko Kambayashi, and Katsumi Tanaka (eds.), Plenum Press (1987), pp. 487-500.

[Zilio 98] Zilio, Daniel C., "Physical Database Design Decision Algorithms and Concurrent Reorganization for Parallel Database Systems", PhD Thesis, University of Toronto, 1998.