A Practitioner's Introduction to Database Performance Benchmarks and Measurements

S. W. DIETRICH*,1 M. BROWN,2 E. CORTES-RELLO2 AND S. WUNDERLIN2

1 Department of Computer Science and Engineering, Arizona State University, Tempe, AZ 85287-5406, U.S.A.
2 Bull Worldwide Information Systems, Inc., 13430 N. Black Canyon Highway, Phoenix, AZ 85029, U.S.A.

Database performance benchmarks provide an important measure for the comparison of database management systems. This paper provides an introduction to performance benchmarks for centralised databases. The introduction includes benchmarks for transaction processing, such as DebitCredit, TPC-A and TPC-B, and benchmarks for decision support, such as the Wisconsin benchmark and its extension by Bull to a vendor benchmark, known as the Single-User Decision Support benchmark. An important contribution of this paper is a practitioner's perspective of the issues involved in performance measurement.

Received January 1992, revised March 1992

1. INTRODUCTION

The performance of a database management system (DBMS) plays an integral role in the decision of a company to utilise that database. The price of the database is also an important factor. Such decisions must be based on information that compares the performance and price of database management systems of different vendors. Benchmarks provide a yardstick with which to measure these important factors.

Database performance benchmarks have evolved over time. The performance folklore, as recorded by a group of computer professionals [1], includes several benchmarks. One benchmark for measuring the performance of transaction processing in a database system is the DebitCredit benchmark. The flexibility in this transaction processing benchmark, however, produced incomparable results. The need for industry standard benchmarks led to the formation of the Transaction Processing Performance Council (TPC), whose varying membership includes many computer organisations. The council modified the DebitCredit benchmark into an industry standard benchmark, TPC-A [14], which has specific guidelines for the measurement of performance and price.

Although TPC-A provides very specific guidelines, the measurement of the benchmark is not a trivial task. Depending on the tools available on the system to be tested, the measurement may take a matter of weeks or months. Once a measurement is taken, the performance group may tune the system configuration to increase performance and to reduce the cost. This tuning process is iterative and may take much longer than the initial measurement.

This paper provides a practitioner's introduction to performance benchmarks for centralised databases, and reports on the experience of practitioners in the measurement of database benchmarks. This paper is not meant to be a comprehensive survey but an introduction to the benchmarks that defined the database performance area, such as DebitCredit, TPC-A, TPC-B and the Wisconsin benchmark. The paper also describes an extension of the Wisconsin benchmark to a vendor benchmark by Bull, known as the Single-User Decision Support benchmark or SUDS. Other benchmarks exist that target various domain-specific applications and assumptions, such as the AS3AP benchmark [16], the Set Query benchmark [12], the Engineering database benchmark [5] and the Neal Nelson database benchmark® [11]. A description of these benchmarks can be found in a recent book by Gray [9], which is an invaluable reference in the field of performance measurement.

* Work supported by Bull Worldwide Information Systems under Grant CRP 91228. Correspondence should be addressed to this author.

2. TERMINOLOGY

Many speciality areas are laden with built-in terminology that prevents persons outside that speciality from communicating effectively. This section defines the terminology within the speciality of database performance benchmarks.

2.1 Types of benchmarks

There are three types of benchmarks: industry-standard, vendor, and customer-application. Although benchmarks ultimately measure the performance of the system, each type of benchmark has its own goal.

An industry-standard benchmark provides an external view of the product and, therefore, samples the performance of the database system on a specific, usually simple, application. The measurements of an industry-standard benchmark are meant to be published to provide information for comparison across various vendors.

Before the specification of industry-standard benchmarks, however, vendors were running their own benchmarks to identify performance improvements for their product. Today, vendors continue to run their own benchmarks, since industry-standard and vendor benchmarks address different goals. A vendor benchmark must be comprehensive, providing an introspective view of the evolving product. As the product evolves, so must the benchmark that tests its performance. The measurements of a vendor benchmark are meant to stay within the company to guide development efforts and to provide sales support.

Customer-application benchmarks are designed by the customer for an important application where performance is critical. After designing and documenting the benchmark, customers provide selected vendors with the benchmark. The vendors then compete for the customer's business by measuring the benchmark to provide the customer with cost and performance figures. The cost of designing and documenting application benchmarks, however, is a considerable factor for the customer. This cost may lead to increased usage of industry-standard measurements in the decision making process for the customer. The Transaction Processing Performance Council, however, warns that while industry-standard benchmarks play an important role in comparing products of different vendors, there are times when specific customer-application benchmarking is critical.

2.2 Transactions and the ACID test

A transaction is informally defined as an atomic (all or nothing) program unit that performs database accesses or updates, taking a consistent (correct) database state into another consistent database state. The atomicity and consistency requirements of a transaction have necessary implications on concurrency control and recovery control. Concurrency control must guarantee that a transaction remains consistent during concurrent execution; this property is known as isolation. Recovery control must guarantee that a transaction is preserved across failures; this property is known as durability. The Atomicity, Consistency, Isolation and Durability properties of a transaction are known as the ACID properties of a transaction.

The Transaction Processing Performance Council [14, 15] defines the ACID properties of a transaction as follows.

Atomicity. 'The system under test must guarantee that transactions are atomic; the system will either perform all individual operations on the data, or will assure that no partially-completed operations leave any effects on the data.'

Consistency. 'Consistency is the property that requires any execution of the transaction to take the database from one consistent state to another.'

Isolation. 'Operations of concurrent transactions must yield results which are indistinguishable from the results which would be obtained by forcing each transaction to be serially executed to completion in some order.'

Durability. 'The testbed system must guarantee the ability to preserve the effects of committed transactions and insure database consistency after recovery' from single failures, which are clearly specified in the TPC documents [14, 15].

Typically, database texts (e.g. Ref. 8) characterise a transaction by the properties of Atomicity, Consistency, Isolation, Durability and Serialisability. The additional property of serialisability states that the result of the concurrent execution of transactions is equivalent to some serial execution of the transactions. This definition of serialisability is equivalent to the definition of isolation above. The database texts, however, usually refer to isolation as the property of a transaction such that the transaction does not reveal its uncommitted results to other transactions. Forcing the results to be equivalent to some serial execution order of the transactions would also enforce the textbook definition of isolation, since a transaction would only reveal committed results in a serial execution. Thus, the above ACID properties capture the desired properties of a transaction.
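
To make the transaction boundary concrete, the sketch below shows an all-or-nothing funds transfer in a generic SQL dialect. The acct table and its columns are hypothetical, and transaction-control syntax varies between products; the point is only that both updates take effect together or not at all.

begin transaction;

update acct set balance = balance - 100 where id = 1;   -- debit one account
update acct set balance = balance + 100 where id = 2;   -- credit another account

commit;   -- atomicity: if a failure or rollback occurs before this point, neither update persists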

2.3 Transaction processing versus decision support

Transaction processing environments typically consist of update-intensive database services characterised by 'significant disk input/output, moderate system and application execution time, and transaction integrity' [15]. The term 'transaction integrity' refers to the ACID properties of a transaction.

Performance metrics for transaction processing are typically a measure of throughput, which is the rate at which transactions are completed. Thus, the metric unit is transactions per second, which is abbreviated tps. Another important measure is the response time of the transaction, which is the elapsed time for the execution of the transaction. In practice, a combination of these measures is used. The transaction processing benchmarks, discussed in the next section, optimise throughput with constraints on response time. These benchmarks also include a cost metric. Since there is a relationship between the performance of a system and its cost, the price of a system is normalised with respect to tps. Thus, a system is benchmarked with respect to its performance in tps and its price in $/tps.
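
As a worked illustration with hypothetical figures (not taken from any published result), a system priced at $450,000 that sustains 30 tps under the benchmark's response time constraint would be reported as:

price/performance = total system price / throughput
                  = $450,000 / 30 tps
                  = $15,000/tps, i.e. 15 $K/tps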

Decision support differs from transaction processing. A decision support environment is typically not update-intensive and is characterised by 'a wide range of functions, provided over small to large databases' [4]. The use of ad hoc queries for decision support is facilitated by the flexibility of query specification offered by relational database query languages, such as SQL.

Performance metrics for decision support are typically a measure of response time, which is also called query elapsed time. Other performance data, however, are typically collected on the utilisation of the CPU and I/O so that the performance of the system can be analysed. Although decision support benchmarks do not typically include a throughput or cost component, the Set Query benchmark [12] includes both the computation of the average query throughput in units of queries per minute (QPM) and a calculation for pricing.

3. BENCHMARKS

This section describes various performance benchmarks with respect to the type of processing that the benchmark is designed to test. The two categories of benchmarks described include update-intensive transaction processing and the ad hoc query processing of decision support. Most benchmarks, including those discussed in this section, measure the use of the system in transaction processing and querying rather than the performance of the utilities of the DBMS, such as bulk loading and organisation of the database. An example of a benchmark that includes tests to measure database loading and structuring, in addition to transaction processing and querying, is the AS3AP benchmark [16].

The transaction processing benchmarks, DebitCredit, TPC-A and TPC-B, are designed to test update-intensive operations in a particular banking enterprise. The decision support benchmarks, Wisconsin and SUDS, are designed to test ad hoc querying on a carefully constructed database. The semantics of the enterprises are implicit in the database and the benchmark programs. Database technology is evolving toward making the semantics of an enterprise explicit through the specification of semantic constraints. The next generation of benchmarks, with the standardisation of the extension to SQL known as SQL2 [6], will probably address the performance impact of the added responsibility of the DBMS to manage these constraints.

Figure 1. Entity relationship diagram for bank enterprise.

3.1 Transaction processing

The transaction processing benchmarks are designed to test update-intensive operations. We describe the original DebitCredit benchmark, followed by its evolution to the industry standard benchmarks: TPC-A and TPC-B. The TPC is currently working on another benchmark for order entry transaction processing, called TPC-C.

3.1.1 DebitCredit

As its name implies, the DebitCredit benchmark [1] is in the domain of banking, where the operation of interest is a debit or credit to an account performed by a teller at a particular branch. Historically, the origin of the benchmark is based upon the on-line requirements of a retail bank with 1000 branches, 10000 tellers and 10000000 accounts. The additional requirement for the system was a peak load of 100 transactions per second (tps).

The DebitCredit database maintains information on accounts, tellers, and branches and a history file of bank transactions. Figure 1 is an Entity-Relationship (ER) diagram for the bank enterprise. ER diagrams are a common tool used for the conceptual modelling phase of a database design [8]. The entities, denoted by rectangles, are accounts, tellers and branches. The relationships are denoted by diamonds. The relationships account_of and teller_of indicate the branch associated with the account and teller, respectively. The relationship history is a ternary relationship, relating the account, teller and branch involved in the transaction. The attributes, denoted by ovals, indicate properties of the entities or relationships to which they are connected. The underlined attributes are key attributes, which are uniquely identifying attributes for the entities.

The schema for the relational data model representation of the bank enterprise is shown in Figure 2. The relations for branches, tellers and accounts include the attributes associated with the entities from the ER diagram. In addition, the accounts and tellers relations include the attribute BranchId, which indicates the associated branch as given by the relationships account_of and teller_of. The primary keys of the accounts, tellers and branches relations are given by the corresponding key attributes from the ER diagram. The records for the relations accounts, tellers and branches must be 100 bytes in length, which can be achieved through the use of an additional filler field to construct a record of the required size. The history relation contains the attributes corresponding to the key attributes of the branches, tellers and accounts entities and the relationship's descriptive attributes DeltaAmount and Time. The history record consists of 50 bytes, using filler if needed, and the benchmark assumes that there is one history file that must be able to store 90 days worth of history data.

branches(BranchId, BranchBalance)
tellers(TellerId, BranchId, TellerBalance)
accounts(AccountId, AccountBalance, BranchId)
history(AccountId, TellerId, BranchId, DeltaAmount, Time)

Figure 2. Relational schema for bank enterprise.
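
A sketch of this schema in generic SQL DDL follows; the data types, filler widths and key declarations are illustrative assumptions, since the benchmark prescribes record sizes rather than column layouts.

create table branches (
  BranchId       integer primary key,
  BranchBalance  numeric(12,2),
  filler         char(88)          -- pad the record to roughly 100 bytes
);

create table tellers (
  TellerId       integer primary key,
  BranchId       integer references branches,
  TellerBalance  numeric(12,2),
  filler         char(84)
);

create table accounts (
  AccountId      integer primary key,
  AccountBalance numeric(12,2),
  BranchId       integer references branches,
  filler         char(84)
);

create table history (
  AccountId      integer,
  TellerId       integer,
  BranchId       integer,
  DeltaAmount    numeric(12,2),
  Time           timestamp,        -- "Time" may need quoting in some dialects
  filler         char(22)          -- pad the record to roughly 50 bytes
);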

The DebitCredit transaction represents a change to an account, either a debit or a credit, that is performed by a teller at a particular branch. The transaction includes input from the terminal, the updates to the account, teller, and branch information, the writing of the history record for the transaction, and an output message to the terminal. The message processing assumes the use of an X.25 wide area network protocol on block mode terminals, such as the IBM 3270.

The input and output processing to the terminal in the DebitCredit transaction tests the tasks associated with on-line transaction processing (OLTP), which adds an additional requirement of multiple on-line terminal sessions to the requirements of transaction processing previously introduced. An important component of OLTP is the transaction arrival distribution. After receiving a response, each emulated terminal must wait before sending its next request to update the database. This waiting is known as 'Think Time'. The think time in the DebitCredit benchmark is 100 seconds, indicating that each terminal waits, on the average, 100 seconds before submitting the next transaction. Therefore, to meet the requirement of a peak load of 100 tps, the benchmark specifies that there are 100 terminals per tps.
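
The 100 terminals per tps figure follows from the think time (the response time is small relative to the think time and is ignored in this rough calculation):

throughput per terminal = 1 transaction / 100 s think time = 0.01 tps
terminals per tps       = 1 / 0.01 = 100
terminals for the 100 tps peak load = 100 x 100 = 10000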

The statements used to process the updates of the transaction in the industry-standard query language SQL are shown in Figure 3. (Note that the terminal I/O statements that are part of the transaction are not shown.) The transaction updates the balance of accounts, tellers, and branches and inserts a history record, including a timestamp, of the transaction. The identifiers prefixed by ':' represent the values for the particular transaction under consideration. Identifiers that are not prefixed represent attribute names, which are the same as in Figure 2.

The DebitCredit transaction has to satisfy additional requirements, including response time, concurrency control and recovery control. The benchmark requires that 95% of the transactions provide less than one second response time, as measured at the system under test (SUT). Concurrency control requirements necessitate the protection of all data files by locking and logging. For recovery control, the log file is assumed to be replicated to handle single failures.

The DebitCredit benchmark provided guidelines, but the implementation of the benchmark was vulnerable to interpretation by the implementer. This degree of flexibility, and the lack of full disclosure of how the performance measurements were obtained, resulted in measurements that were not necessarily comparable. These incomparable results led to the development of the first industry-standard benchmark, TPC-A, by the Transaction Processing Performance Council.

3.1.2 TPC-A

The TPC-A benchmark specification [14], consisting of 42 pages, provides substantial guidelines. TPC-A requires strong ACID properties of the system, with specific tests for checking these properties. TPC-A also requires scaling rules for the database to maintain a fixed relationship between the size of the relations and the transaction load on the system. In addition, a full disclosure of the reporting procedure is required for TPC-A, with auditing strongly recommended but not required. However, before any TPC-A results can be announced, the disclosure report must be given to the TPC.

There is also a difference between DebitCredit and TPC-A with respect to the requirements on the history file. In TPC-A, the history file may be horizontally partitioned, whereas DebitCredit required a unified history file. The DebitCredit benchmark also required storage for the history file for 90 days. TPC-A uses a more realistic constraint that requires storage for the history file for eight hours of operation of the SUT, with pricing requirements for 90 days of history data.

The TPC-A transaction strongly resembles the DebitCredit transaction [13]. The main difference is that the terminal I/O is outside the boundaries of the transaction. There is a driver system that emulates the terminal requirements. Recall that after receiving a response, each emulated terminal must wait before sending its next request to update the database. In TPC-A, the average of the think time added to the response time must be ten seconds. The think time is approximated by a delay, which provides an essentially random arrival distribution. In DebitCredit, the average think time is 100 seconds. Due to the differences in the average think time between the two benchmarks, the 100 terminals per tps in DebitCredit was reduced to 10 terminals per tps in TPC-A.

update accounts
set AccountBalance = AccountBalance + :DeltaAmount
where AccountId = :AccountId

update tellers
set TellerBalance = TellerBalance + :DeltaAmount
where TellerId = :TellerId

update branches
set BranchBalance = BranchBalance + :DeltaAmount
where BranchId = :BranchId

insert into history(AccountId, TellerId, BranchId, DeltaAmount, Time)
values (:AccountId, :TellerId, :BranchId, :DeltaAmount, :CurrentTime)

Figure 3. SQL statements for updates in transaction.

The response time requirements of TPC-A specify that 90% of the transactions must respond within two seconds, with the response time being measured at the remote terminal emulator (RTE) rather than at the SUT. Note that the response time measured at the RTE includes the delay across the communication network, which in TPC-A may include local area networks. Due to the differences in communication delay between local area networks and wide area networks, the reporting metrics for tps in TPC-A include the specification of tpsA-Wide or tpsA-Local.

3.1.3 TPC-B

The Transaction Processing Performance Council designed another benchmark, TPC-B [15], that tests the database aspects of the transaction. TPC-B does not generate transactions through terminal emulation but through driver programs, which generate transactions as quickly as possible without allowing for any think time. This relaxation is expected to lead to increased throughput, which is measured in transactions per second (tps). The reporting metric used for TPC-B is tpsB, which denotes the tps metric following the specification of TPC-B, and is not comparable to results from TPC-A.

TPC-B requires the throughput to be subject to a residence time constraint specifying that 90% of all transactions must have a residence time of less than 2 seconds. Since TPC-B does not include terminal emulation, residence time is measured by the elapsed time at the driver between supplying the inputs to the transaction and receiving a corresponding response. This residence time constraint is similar to the response time constraint of TPC-A, where response time is measured by the elapsed time between sending the first byte of the input message and receiving the last byte of the output message.

The TPC-B benchmark is viewed by performance practitioners as a precursor to the TPC-A benchmark with respect to performance measurement. Typically, TPC-B is installed and measured to test the database installation and to tune the system. TPC-A, which has added requirements for on-line processing, is then installed and measured.

3.2 Decision support

Relational databases provide the capability of ad hoc query specification, typically through the use of the industry-standard query language SQL. A decision support environment takes advantage of this flexibility to specify ad hoc queries to the database. This flexible environment requires a more comprehensive approach to benchmarking the performance of the database across a wide range of functions. The Wisconsin benchmark [2], which consists of a carefully constructed database and a comprehensive set of queries, provides a systematic approach to benchmarking relational database systems. This paper reports on the database and queries for the original Wisconsin benchmark as they appeared in a later paper [7] and discusses the extension of the benchmark for parallel database systems [7]. The Wisconsin benchmark is also extensible for use as a vendor benchmark, as illustrated by the development of the Bull Single-User Decision Support (SUDS) benchmark [4]. The TPC is currently working on a decision support benchmark, called TPC-DSS.

3.2.1 Wisconsin

The database for the Wisconsin benchmark does not correspond to a particular application, as the TPC benchmark databases do, but is carefully designed to produce predictable results for decision support benchmarking. The names of the relations and the attributes provide a self-description of their contents. This systematic naming convention allows for the design of an extensive database for systematic benchmarking.

A relation is named by the number of tuples that the relation contains, which is called its cardinality. For example, the relation ONEKTUP denotes a relation that contains one thousand tuples. More than one relation of the same cardinality can exist in the database. For example, TENKTUP1 and TENKTUP2 denote two relations both having ten thousand tuples. The Wisconsin database consists of ONEKTUP, TENKTUP1 and TENKTUP2 relations. The specification of the attributes in a relation, e.g. TENKTUP, is given in Figure 4 [3].

Attribute name    Attribute domain       Attribute value
unique1           0..9999                unique, random
unique2           0..9999                unique, random
two               0..1                   cyclic: 0, 1
four              0..3                   cyclic: 0, 1, 2, 3
ten               0..9                   cyclic: 0, 1, ..., 9
twenty            0..19                  cyclic: 0, 1, ..., 19
hundred           0..99                  cyclic: 0, 1, ..., 99
thousand          0..999                 cyclic: 0, 1, ..., 999
twothous          0..1999                cyclic: 0, 1, ..., 1999
fivethous         0..4999                cyclic: 0, 1, ..., 4999
tenthous          0..9999                cyclic: 0, 1, ..., 9999
odd100            1, 3, 5, ..., 99       cyclic: 1, 3, ..., 99
even100           2, 4, 6, ..., 100      cyclic: 2, 4, ..., 100
stringu1          per template           derived from unique1
stringu2          per template           derived from unique2
string4           per template           cyclic: A, H, O, V

Figure 4. Original Wisconsin benchmark: attribute specification.

A relation contains two unique (integer-valued) attributes: unique1 and unique2. The values of the unique1 and unique2 attributes in the relation instance are determined randomly in the range between 0 and one less than the cardinality of the relation. Thus, both attributes are candidate keys, although the unique2 attribute serves as a designated sort key, when required.

An integer-valued attribute, in general, is named by the number of distinct values that the attribute contains in the relation. For example, the attribute ten has ten distinct values (0..9). The range of values that the attribute assumes appears with a uniform distribution in the relation by cycling through its possible values. Obviously, an attribute named in this way is typically a non-unique attribute, since the value of the attribute appears in the relation more than once (assuming that the cardinality of the relation exceeds the number of distinct values of the attribute). For example, the TENKTUP1 relation of the Wisconsin database contains the following non-unique integer-valued attributes with cyclic order: two, four, ten, twenty, hundred, thousand, twothous, fivethous, tenthous, odd100 and even100. The attributes odd100 and even100 represent the odd and even numbers, respectively, in the range of 1 to 100. These non-unique integer-valued attributes are used to model various selectivity factors.
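
For instance, because hundred cycles uniformly through 100 distinct values, a selection on it retrieves a predictable 1% of TENKTUP1, i.e. 100 of its 10000 tuples. The query below is an illustrative example in that spirit rather than a statement taken from the benchmark itself:

select *
from TENKTUP1
where hundred = 42   -- matches 100 of the 10000 tuples (1% selectivity)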

The database also has string attributes to test string operations. Each string consists of 52 letters, and must obey the following template:

$xxxxxxxxxxxxxxxxxxxxxxxxx$xxxxxxxxxxxxxxxxxxxxxxxx$

where $ designates a letter in the set {A..V}, which consists of 22 letters. There is a substring consisting of 25 x's between the first and second $, and a substring consisting of 24 x's between the second and third $. This basic string pattern allows 10648 (22*22*22) unique string values, and may be easily modified to allow for additional values.

A relation contains three string attributes: stringu1, stringu2 and string4. The attributes stringu1 and stringu2 are string versions of unique1 and unique2, respectively. Either attribute may be used as a key attribute. The stringu2 attribute is typically used for sorting and indexing. The attribute string4 has four distinct values. The unique values are constructed by forcing the $ positions of the string to have the same value and to be chosen from a set of four letters: {A, H, O, V}. The string4 attribute plays a similar role as the non-unique integer-valued attributes.

The Wisconsin benchmark contains a set of queries that test various operations: selection, projection, join, aggregation, append, delete, and modify. There are variations on each type of query for selectivity factors and the availability of primary/secondary indexes. The benchmark is considered an industry-standard benchmark since it has a fixed set of queries and was used to compare the performance of products across several vendors [2].
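
As an illustration of how the naming convention supports such variations, the following join restricts one operand to a known fraction of its tuples before joining on a candidate key; it is written in the spirit of the benchmark's join queries rather than quoted from them:

select TENKTUP1.*, TENKTUP2.*
from TENKTUP1, TENKTUP2
where TENKTUP1.unique2 = TENKTUP2.unique2   -- join on a candidate key
  and TENKTUP2.hundred = 7                  -- restrict one operand to 1% of its tuples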

The Wisconsin benchmark provides a very useful tool for performance comparison. A retrospective view of the benchmark by (some of) its authors [3] indicated suggestions for improvement. A revised Wisconsin benchmark [7] addresses scalability issues for benchmarking of parallel database systems.

The schema of the relations in the scalable Wisconsin benchmark has been updated, resulting in the attribute specification shown in Figure 5 [7]. The unique1 attribute remains unchanged, representing a candidate key of the relation with its 0 to (cardinality - 1) values being randomly distributed. All other attributes have been changed either with respect to their domain of values or their computation of values. The updated unique2 attribute is a declared key and is ordered sequentially. The attributes two, four, ten and twenty now have a random rather than a cyclic ordering of values, which is derived by an appropriate mod of the unique1 values. For the TENKTUP1 relation in the original benchmark, the attributes hundred, thousand, twothous and fivethous were used to provide access to a known percentage of values in the relation, respectively 1%, 10%, 20% and 50%. In the revised benchmark, these attributes have been replaced by new attributes onePercent, tenPercent, twentyPercent and fiftyPercent. The order of the values of these 'percentage' attributes is random and is based on a mod of the unique1 value. The tenthous attribute of the TENKTUP1 relation referred to a single tuple. This attribute has been replaced by the attribute unique3, which takes on the value of unique1. The even100 and odd100 attributes have been updated to evenOnePercent and oddOnePercent, deriving attribute values from an appropriate function defined over the value of the new attribute onePercent.

Attribute name    Attribute domain       Attribute value
unique1           0..(MAX-1)             unique, random
unique2           0..(MAX-1)             unique, sequential
two               0..1                   unique1 mod 2
four              0..3                   unique1 mod 4
ten               0..9                   unique1 mod 10
twenty            0..19                  unique1 mod 20
onePercent        0..99                  unique1 mod 100
tenPercent        0..9                   unique1 mod 10
twentyPercent     0..4                   unique1 mod 5
fiftyPercent      0..1                   unique1 mod 2
unique3           0..(MAX-1)             unique1
evenOnePercent    0, 2, 4, ..., 198      onePercent * 2
oddOnePercent     1, 3, 5, ..., 199      (onePercent * 2) + 1
stringu1          per template           derived from unique1
stringu2          per template           derived from unique2
string4           per template           cyclic: A, H, O, V

Figure 5. Scalable Wisconsin benchmark: attribute specification.
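
Because every derived attribute is a simple function of unique1, rows can be generated mechanically. The sketch below, in a generic SQL dialect, assumes a pre-built seed table holding the unique1 permutation and the sequential unique2 values (string attributes are omitted); mod() and the seed table are assumptions about the environment, not part of the benchmark definition.

-- seed(unique1, unique2): unique1 is a random permutation of 0..MAX-1,
--                         unique2 is the sequence 0..MAX-1
insert into TENKTUP1 (unique1, unique2, two, four, ten, twenty,
                      onePercent, tenPercent, twentyPercent, fiftyPercent,
                      unique3, evenOnePercent, oddOnePercent)
select unique1,
       unique2,
       mod(unique1, 2),
       mod(unique1, 4),
       mod(unique1, 10),
       mod(unique1, 20),
       mod(unique1, 100),
       mod(unique1, 10),
       mod(unique1, 5),
       mod(unique1, 2),
       unique1,
       mod(unique1, 100) * 2,
       mod(unique1, 100) * 2 + 1
from seed;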

The string attributes in the revised benchmark have also changed. Although the length of the strings is the same, the template for the strings has been modified. The stringu1 and stringu2 attributes derive their values from the unique1 and unique2 values, and must obey the following template:

$$$$$$$xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

where the first seven characters in the string, indicated by $, are a letter in the set {A..Z}, and there is a concluding substring consisting of 45 x's. This template allows 26^7 possible values of the string, and is easily modifiable. This change to the string template addresses the concern that most strings are differentiated in the early portion of the string. The original benchmark had differentiating positions in the middle and at the end of the string. The string4 attribute still takes on four unique values in a cyclic fashion. The unique values are constructed by forcing the first four positions of the following string template

$$$$xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

to have the same value and to be chosen from a set of four letters: {A, H, O, V}.

The Wisconsin benchmark provides for the systematic benchmarking of database systems. Its extensibility is evident both from its revision into a parallel database system benchmark and from its extension into a comprehensive vendor benchmark.

3.2.2 Single-user decision support

The Bull Single-User Decision Support (SUDS) benchmark [4] was developed as an extension of the Wisconsin benchmark to measure the performance of INTEREL, the Bull decision support product for Bull's proprietary operating system GCOS 8. INTEREL is a relational system that uses the industry-standard query language SQL. INTEREL also has the capability to access data contained in several file types. For example, INTEREL provides decision support on data in an IDS/II (network) database through the use of a utility that provides a relational view of the network schema. INTEREL can then access the data in the network database using SQL.

The SUDS benchmark analyses the performance of INTEREL, with the goal of identifying areas within the system that require performance enhancement. The benchmark assumes a single-user execution environment. For each statement executed, an internal performance monitor measures and reports on response time, processor time, input/output and memory usage.

The benchmark contains approximately 230 SQL statements, whereas the Wisconsin benchmark contains 32 statements. SUDS measures the performance of various retrievals, such as unique versus non-unique and indexed versus non-indexed retrievals, variations of the where clause, scalar aggregation, the order by clause, the group by clause, the unique qualification, updates, insertions, deletions, set operations, simple joins, views and complex joins.
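
The following statements suggest the flavour of such variations. They are hypothetical illustrations over Wisconsin-style relations, not statements taken from the SUDS benchmark itself:

select * from TENKTUP1 where unique2 = 2001;        -- unique, indexed retrieval
select * from TENKTUP1 where four = 3 and ten = 7;  -- non-unique, multi-predicate where clause
select four, min(thousand), max(thousand)
from TENKTUP1
group by four
order by four;                                      -- aggregation with group by and order by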

The SUDS benchmark results provide a detailed and comprehensive view of the INTEREL product, leading to a substantial performance improvement [4]. Also, the knowledge learned from the benchmark provides useful information for technical customer support personnel, who can share this knowledge with the developers of customer applications.

4. PERFORMANCE MEASUREMENT

The task of measuring performance is a complex process, and many iterations and adjustments are needed to obtain the best performance possible in a given computer system. The following sections describe the tasks involved in a 'serious' measurement of database performance, using the TPC-B benchmark as an example. Although we describe the measurement process for industry-standard benchmarks, the measurement process for vendor-specific and customer-application benchmarks is similar but differs slightly due to the dissimilarities in constraints and objectives of the various types of benchmarks.

4.1 Hardware configuration

As a first step, the hardware configuration to be measured must be specified. The specification includes the model and some other characteristics, for example, a Bull DPX-2/340 with four processors, 32 MBytes of memory in each processor, three disk controllers and 16 disk drives. Also needed is a specification of the basic software that will run on top of the hardware platform, for example, BOS 2.0, the Bull version of the UNIX operating system.

Even at this early stage, an estimation of the expected performance is required. Based upon previous measurements, the expected number of transactions per second can be inferred. The database instance needed for the performance measurement is then scaled to the expected number of transactions per second. In TPC-B, for each tps configured, the database instance must have a minimum of 100000 accounts, 10 tellers, and 1 branch. The ratios between these numbers must also be maintained. A change in any value must be reflected by a proportional change in the other values. (For TPC-A, additional scaling rules dictate that a minimum of 10 terminals is included for each tps configured.) Given the capacity and I/O rate of available disks and controllers and the average number of I/Os per transaction, it is possible to anticipate the number of disks and controllers needed. It is always better to have some excess storage and I/O rate capacity. Once the machine is available, it has to be physically installed and possibly hooked up into a network.
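
For example, a configuration expected to sustain 50 tps (an illustrative estimate, not a published result) must be sized under these scaling rules to at least:

accounts: 50 x 100000 = 5000000
tellers:  50 x 10     = 500
branches: 50 x 1      = 50

and, under TPC-A, at least 50 x 10 = 500 emulated terminals.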

4.2 System software configuration and tuning

Once the machine and the basic software are available, they have to be configured and tuned. The system has several layers that need configuration: the operating system, communications software and the database management system. Initially, most of the configuration parameters will be set to default values. Some operating system parameters may have to be adjusted in order to be able to start running the database management system.

In the process of tuning, tools must be used to monitor the overall behaviour of the system. Examples of such tools are UNIX's sar or easytune, which is a special performance monitoring tool of Bull BOS. The tools provide real time information on important parameters such as memory utilisation, swapping activity, processor utilisation, buffer and caching activity, and physical and logical I/O activity. Other tools provide real time information on DBMS critical parameters such as buffers, and the status of locking and logging. The information is used to detect bottlenecks and tune the operating system (OS) and the database management system (DBMS). An example of such a tool is INGRES' Interactive Performance Monitor [10].

4.3 Application tuning

The next layer to tune is the benchmark software. In a normal case, such as TPC-B, the benchmark is a series of programs and scripts. Some programs perform setup and statistical functions, while others execute transactions. Some examples of the statistics that must be collected are: tps, response time per transaction, and logging and locking statistics. Some examples of the setup functions are: setting up database tables, setting up temporary tables to collect statistics and setting up timing plans for the execution of the benchmark. The database is scaled with respect to the expected throughput to avoid a completely in-core database. This scaling process involves estimating the expected throughput. If the measured throughput is larger than the estimated throughput, the database has to be re-sized, which is a time-consuming process.

In a more complex case, such as TPC-A, a terminal driver is needed to emulate terminals that issue transactions through a network. The benchmark programs must be optimised to minimise the consumption of processor time and memory. An objective of the optimisation is to minimise the overhead of execution of the statistical functions compared to the load to be measured.

Particular attention must be given to problems with locking and concurrency control. While the ultimate goal of a performance measurement is to obtain a metric such as tps or $K/tps, the measurement process has some important constraints. The measurement must preserve the ACID properties of the database. The TPC-B document [15] clearly describes tests to run to verify the ACID properties.

4.3.1 Performance

The objective of this phase is to maximise the transactions per second (tps) metric. At this time, the real work begins. Each one of the relevant configuration parameters from the OS and the DBMS should be analysed, and a 'reasonable' value must be assigned.

On the operating system side, the parameters related to the file system, system buffers and allocation of memory are key to performance. On the database side, the data buffers, the concurrency control parameters, and the logging/archiving and recovery parameters are the most influential.

Some database management systems have special features that may increase performance within the constraints of the benchmark. For example, some databases, such as Ingres and Oracle, provide 'fast-commit' facilities, or faster ways to execute the transaction, such as stored (precompiled) database procedures in Ingres and macroexecution in Teradata RDBC.

Many operating system parameters are directly related to the database management system parameters; in some other cases the relationship is indirect. Some apparently independent parameters may affect each other in an indirect manner. For example, the block size of the database management system is related to the block size of the operating system.

Once the parameters have been set up, the physical structure of the data must be considered. Some databases provide multiple storage structures. For example, in Ingres the database designer has the choice of btree, heap, isam and hash storage structures. In many cases, the tables and indexes can be spread over multiple devices. The choice of storage structure depends on what is best for the benchmark, the database management system and the operating system.

Once an initial setup has been configured, the tuning is done via experimentation. The process is painful, and a lot of patience and discipline are needed. Good records must be kept of the step-by-step process.

The objective of the measurement process is not only to generate a number but to analyse whether the number is reasonable and repeatable. For example, a measurement that indicates 100% processor utilisation may not indicate that the maximum transaction rate has been achieved. An analysis of the processor utilisation by the operating system, database management system and application is needed to understand how the processor is being used by the different components. Once the tps metric has been obtained, other runs of the benchmark under the same conditions (hardware, software and configuration) should yield the same results; that is, the measurements must be repeatable.

This process yields diminishing returns per unit of effort. At first the most obvious flaws are discovered; later, the same unit of effort produces less improvement in the transactions per second (tps) metric.

It is difficult to know when to stop. There are no good guidelines. The measurement process stops when you run out of time or resources, or simply when the probability of further improvement to the tps metric is negligible.

4.3.2 Cost

Up to this point the effort has been targeted toward maximising the transactions per second (tps) metric. Once the maximum tps has been obtained from the basic hardware/software platform, the next consideration is cost; that is, the metric of thousands of dollars per tps ($K/tps). To minimise $K/tps, the benchmark must be run on the least expensive configuration of hardware and software in terms of dollars.

Technical factors. At this stage, a lot will be known about the utilisation of resources such as memory, processor and I/O. For example, the number of physical I/O operations per transaction will be known; this information, along with the knowledge of the I/O rates of disks and controllers, will be enough to decide the optimal combination of controllers and disks, and the optimal allocation of the tables.
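
As a worked illustration with hypothetical figures: if each transaction requires 6 physical I/Os and the target is 60 tps, the configuration must sustain 6 x 60 = 360 I/Os per second; with disks rated at roughly 50 I/Os per second each, at least 360 / 50 = 7.2, rounded up to 8, disks are needed for the I/O rate alone, before storage capacity is considered.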

Also, the database may be re-scaled to reflect, in a more precise manner, the expected performance of the DBMS. The implication is that fewer disks may be needed to store the database, and the total cost of the system may decrease.

Marketing factors. The problem of minimising the $K/tps is not only a matter of increasing the tps and decreasing the hardware and software resources; it is also a marketing problem, and specifically it is a pricing problem.

Technical people tend to work in an isolated environment where the decisions are based on technical arguments; but in the case of minimising $K/tps, marketing people can play a major role. Certain combinations of software and hardware can be priced in special ways. Of course, the configuration being priced must be available to the public, at the price quoted.

4.4 Integration

After the tps metric has been maximised and the $K/tps has been minimised, a disclosure report must be written. Auditing the measurement is recommended, in order to check compliance with the benchmark definition.

From the description above, the reader may get the misleading impression that the process of measuring performance, although complicated and lengthy, is sequential. Figure 6 shows a more realistic picture of the performance measurement process.

Figure 6. Integration of performance and cost tuning.

Problems may occur at any of the phases. For example, sometimes the hardware configuration process can be slowed because some trivial connectors or cables were not ordered or were lost. Also, as measurements are done as early as possible in the development cycle, some of the layers involved may not be stable enough, and the measurement process will be an opportunity to discover certain hard-to-detect problems.

The system software configuration and tuning is an iterative process. Initially, the default values of the system parameters are used, and sometimes they are inadequate. Tuning is an experimental process; but the experiments are not random: they are guided by knowledge of the operating system and the database management system. During the process, the hardware resources may prove to be inadequate and the entire process may start all over again.

In the application tuning phase, the knowledge about the database management system, the operating system and their relationships is again crucial. During this phase, the system software tuning continues as a parallel activity.

Tuning for performance (tps maximisation) can uncover problems in any of the three previous phases. For example, we may discover that the benchmark application is using too much memory, that a crucial parameter of the operating system must be changed, or that there are problems with the DBMS configuration.

Finally, tuning the cost ($K/tps minimisation) involves a very difficult dialogue between people with technical and marketing backgrounds. The cost minimisation process may affect the choice of software components or may involve changes in the hardware configuration.

5. SUMMARY

This paper provided an introduction to performance benchmarks that defined the database performance area, including benchmarks for transaction processing, such as DebitCredit, TPC-A and TPC-B, and benchmarks for decision support, such as the Wisconsin benchmark and its extension by Bull to a vendor benchmark, known as the Single-User Decision Support benchmark. We also gave a practitioner's perspective of the issues involved in performance measurement. The task of measuring performance is a time-consuming and complex process. The measurement of benchmarks typically requires an optimisation phase. For industry-standard benchmarks, the goal is to maximise the performance metrics and to minimise the cost metrics. This optimisation phase is an iterative process. Each layer of the system must be tuned according to its own parameters, but the tuning of one layer probably has an effect on another layer. The optimisation is also not limited to pure technical factors; marketing factors play an important role in minimising the cost metric.

Acknowledgements

The authors would like to thank the anonymous referee for thoughtful comments that improved the paper. Thanks are also due to Forouzan Golshani, Aime Bayle and the Performance Group at Bull Worldwide Information Systems in Phoenix for feedback on earlier versions of this paper.

REFERENCES

1. Anon, A measure of transaction processing power. Datamation (1 April 1985), pp. 113-116. Also appears in Readings in Database Systems, edited M. Stonebraker, pp. 300-312. Morgan Kaufmann, San Mateo, CA (1988).
2. D. Bitton, D. J. DeWitt and C. Turbyfill, Benchmarking database systems - a systematic approach. In Proceedings of the Ninth International Conference on Very Large Data Bases, pp. 8-19 (1983).
3. D. Bitton and C. Turbyfill, A retrospective on the Wisconsin benchmark. In Readings in Database Systems, edited M. Stonebraker, pp. 280-299. Morgan Kaufmann, San Mateo, CA (1988).
4. M. Brown, Vendor benchmarking for relational decision support. BullHN Technical Update, pp. 8-13 (April 1990).
5. R. G. G. Cattell, An engineering database benchmark. In Ref. 9, pp. 247-281.
6. C. J. Date, An overview of SQL2. In Relational Database Writings 1989-1991, edited C. J. Date, pp. 413-423. Addison-Wesley, Reading, MA (1992).
7. D. DeWitt, The Wisconsin benchmark: past, present, and future. In Ref. 9, pp. 119-165.
8. R. Elmasri and S. Navathe, Fundamentals of Database Systems. Benjamin-Cummings, Menlo Park, CA (1989).
9. J. Gray (ed.), The Benchmark Handbook for Database and Transaction Processing Systems. Morgan Kaufmann, San Mateo, CA (1991).
10. INGRES Interactive Performance Monitor User's Guide, 64-9(9)47913.
11. N. Nelson, The Neal Nelson benchmark®: a benchmark based on the realities of business. In Ref. 9, pp. 283-300.
12. P. E. O'Neil, The set query benchmark. In Ref. 9, pp. 209-245.
13. O. Serlin, Measuring OLTP with a better yardstick. Datamation, pp. 62-64 (15 July 1990).
14. Transaction Processing Performance Council, TPC Benchmark A, Standard Specification (10 November 1989). Also appears in Ref. 9, pp. 39-78.
15. Transaction Processing Performance Council, TPC Benchmark B, Standard Specification (23 August 1990). Also appears in Ref. 9, pp. 79-117.
16. C. Turbyfill, C. Orji and D. Bitton, AS3AP: an ANSI SQL standard scaleable and portable benchmark for relational database systems. In Ref. 9, pp. 167-207.

Book Review

F. MADDIX, Human-Computer Interaction: Theory and Practice. Ellis Horwood, 1990. £18.95. ISBN 0-13-446220-3.

For a reluctant book reviewer, the page of errata that fell out of my copy was less than encouraging. Secondly, glancing at the references, a mere 71 of them, it was obvious to me that this book is seriously under-referenced, particularly as my name did not appear in appendix A, where the references are buried. In fact, one of my papers is referenced, but is so at a chapter end. Indeed, there is a problem with the whole style of referencing in the book, which is inconsistent (sometimes names and dates, sometimes name and numbers, and some references in appendix A and some not).

If the references are one problem with this book, the content index is a more important second problem. At five pages, in double-column format, it at first looks adequate. However, the naturally suspicious will note that each index item has few page numbers associated with it, and at most only six, and this is a very rare exception. Taking what, for me, is becoming a standard test of HCI books, I looked up a couple of vital and difficult index entries: (i) task analysis; (ii) user modelling. Apart from my expertise in these two areas of HCI, they are informative because task analysis and the whole concept of tasks is central to HCI, as is user modelling, and both are technically difficult. In a word, the index is inadequate. 'Task analysis', 'Task model' and 'Tasks' yield but four index entries and 'User model' a mere two. Within the book there are numerous references to these two topics, some of them in bold (e.g. p. 226, 'The task') and some even as section headings (e.g. p. 44, 'Other User models'; p. 88, 'Task Analysis'), which just do not occur in the content index at all. I estimate that these two topics between them require two to three times the little indexing they enjoy. Indeed, one of my major criticisms of the book is that user modelling is not dealt with adequately as a topic. In fact, there is a great deal on users in the book, as is essential for any HCI book, but it is dispersed and it appears that the author is often not aware of the need to deal with the whole issue in a coherent manner. To take one example for illustration, there is a whole section on 'Operator psychology' (pp. 247-8) which describes users in the context of their personality attributes, which is clearly an aspect of user modelling, yet, of course, it is not indexed either under 'User model' or even under 'Operator psychology'.

I wish I had had a chance to be the editor of this book, as overall it makes a very valuable contribution to the HCI textbook literature. Unfortunately I think it is structured in a manner that is odd and less than helpful. Like many HCI textbooks it has a few chapters of irrelevant material, most particularly chapter 3 on 'Information processing'. The focus of the chapter is wrong simply because information theory's definition of the bit is inappropriate for describing the information content of stimuli perceived by people, because the actual content is determined by the state (knowledge, goals, task constraints) of the perceiving information processor (the mind). Furthermore, while I would have replaced this chapter with one on user modelling, I would still question its utility for practical HCI. To me it seems that there is a tremendous gap between a lot of psychological waffle (and I do have a doctoral degree in experimental psychology) about people and the impact (little) all this psychology has on the design of computer systems. Furthermore, I question how even some relevant user psychology can be transmitted to system designers, who generally do not have a background in 'difficult' subjects such as psychology, sociology, philosophy, economics, etc.

The division of chapters 7 and 8, 'Interfaces' and 'Visual interfaces' respectively, seems to me arbitrary. I also question the utility of chapter 12's 'Experimental studies', which describes four very brief student experiments, the results of which I would not trust for the design of real computer systems, which have to support particular types of user performing specific types of task. Certainly there is a need to balance these small examples with real software engineering examples, and it is here that chapter 13 ('HCI and design') fails. The author's heart is in the right place with respect to the need to go beyond the direct end-user or operator, and that a social perspective is also essential, but this view, expressed in the early chapters, is not carried through the rest of the book. Similarly, I would have appreciated a section, if not a chapter, on ethical issues associated with computers since HCI, it has been argued, by being inherently interdisciplinary and user-centred, is particularly well positioned to address this topic.

Whatever it says in the preface, this is a student textbook and really one for computer science or information technology students. It has sufficient computer technology to maintain some interest for such an audience, but too much for those learning about HCI from other disciplines such as psychology. In this context, its overall weakness in psychology may be seen as an advantage. Since I do teach computer science undergraduates and master-level students about HCI, I intend to recommend this book next year as a primary reference source for my students. The book is not suitable for those in industry because of its weakness at providing advice and practical examples of HCI in real system design and software engineering.

Finally, this is a book with a layout of which the publishers should be thoroughly ashamed. I would guess that this book has been prepared on the author's computer system and that the publishers have simply taken the author's version and printed it. The pictures, tables, etc. are not floated, so there are frequent, large blank sections at the bottom of pages. These blanks are not just unaesthetic but actually interfere with the book's semantics, since they do not denote the meaningful end of a section or topic. I know that Ellis Horwood have been publicly criticised in reviews before over such matters, and this is yet another example of what is wrong with far too much technology publishing today. I think the author got a raw deal, and what is really quite a good textbook has been ruined by an absence of the services that I demand from the publishers I deal with.

DAN DIAPER
Liverpool
