
Lecture 1 Database Tuning

Apr 05, 2018


dushyantymail
  • 8/2/2019 Lecture 1 Database Tuning

    1/26

    Database Tuning


    Performance

    The measure of efficiency for an application or multiple applications running in the same environment.

    It is usually measured in:

    - Response time: the time that a single task takes to complete. It can be shortened by:
      - Reducing contention and wait times, particularly disk I/O wait times
      - Using faster components
      - Reducing the amount of time the resources are needed
    - Throughput: the volume of work completed in a fixed time period
      - It is commonly measured in transactions per second (tps), but it can also be measured per minute, per hour, per day, and so on


    Database Performance

    Can be thought of in terms of supply and demand:
    - Users demand information from the database
    - The DBMS supplies information to those requesting it
    - The rate at which the DBMS supplies the demand for information can be termed database performance

    Five factors influence database performance:
    - Workload
    - Throughput
    - Resources
    - Optimization
    - Contention


    Factors Influencing Database Performance

    Workload that is requested of the DBMS defines the demand
    - A combination of online transactions, batch jobs, ad hoc queries, data warehousing analysis, and system commands directed through the system
    - Can fluctuate drastically; sometimes it can be predicted

    Throughput defines the overall capability of the computer to process data
    - A composite of I/O speed, CPU speed, parallel capabilities of the machine, and the efficiency of the operating system and system software


    Factors Influencing Database Performance (cont.)

    Resources of the system include the database kernel, disk space, memory, cache controllers, and microcode

    Optimization of queries is primarily accomplished internal to the DBMS
    - Many factors need to be optimized, including:
      - SQL formulation
      - Database parameters

    Contention is the condition in which two or more components of the workload are attempting to use a single resource in a conflicting way
    - As contention increases, throughput decreases


    Database Performance Definition

    Database performance can be defined as the optimization of resource use to increase throughput and minimize contention, enabling the largest possible workload to be processed

    Performance management tasks not covered by the definition should be handled by someone other than the DBA, or at a minimum shared between the DBA and other technicians


    Performance Tuning

    Adjusting various parameters and design choices to improve system performance for a specific application.

    Tuning is best done by
    1. identifying bottlenecks, and
    2. eliminating them.

    Can tune a database system at 3 levels:
    - Hardware -- e.g., add disks to speed up I/O, add memory to increase buffer hits, move to a faster processor.
    - Database system parameters -- e.g., set buffer size to avoid paging of buffer, set checkpointing intervals to limit log size. System may have automatic tuning.
    - Higher level database design, such as the schema, indices and transactions (more later)


    Bottlenecks

    Performance of most systems (at least before they are tuned) is usually limited by the performance of one or a few components: these are called bottlenecks
    - E.g. 80% of the code may take up 20% of the time, and 20% of the code takes up 80% of the time
      - Worth spending most time on the 20% of the code that takes 80% of the time

    Bottlenecks may be in hardware (e.g. disks are very busy, CPU is idle), or in software

    Removing one bottleneck often exposes another

    De-bottlenecking consists of repeatedly finding bottlenecks, and removing them
    - This is a heuristic


    Identifying Bottlenecks

    Transactions request a sequence of services
    - e.g. CPU, disk I/O, locks

    With concurrent transactions, transactions may have to wait for a requested service while other transactions are being served

    Can model database as a queueing system with a queue for each service
    - transactions repeatedly do the following: request a service, wait in queue for the service, and get serviced

    Bottlenecks in a database system typically show up as very high utilizations (and correspondingly, very long queues) of a particular service
    - E.g. disk vs CPU utilization
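
    The link between high utilization and long queues can be illustrated with a textbook queueing formula. The sketch below assumes each service behaves like an M/M/1 queue (random arrivals, a single server); that model is an assumption made here for illustration and is not stated in the slides. Under it, the mean number of requests waiting or in service is rho / (1 - rho), where rho is the utilization. The function names and the request rates are hypothetical.

```python
def utilization(arrival_rate: float, service_rate: float) -> float:
    """Fraction of time the service is busy (rho)."""
    return arrival_rate / service_rate


def mean_requests_in_system(arrival_rate: float, service_rate: float) -> float:
    """Mean number of requests waiting or in service, assuming an M/M/1 queue."""
    rho = utilization(arrival_rate, service_rate)
    if rho >= 1.0:
        raise ValueError("queue is unstable: utilization must be below 1")
    return rho / (1.0 - rho)


# Hypothetical disk that can serve 100 requests per second.
for arrivals in (50, 90, 99):
    rho = utilization(arrivals, 100)
    queued = mean_requests_in_system(arrivals, 100)
    print(f"utilization={rho:.2f}  mean requests in system={queued:.1f}")
```

    Under this model the queue grows slowly up to moderate utilization and then very quickly as utilization approaches 100%, which is why a service running near saturation is the natural place to look for a bottleneck.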


    Tunable Parameters

    - Tuning of hardware
    - Tuning of schema
    - Tuning of indices
    - Tuning of materialized views
    - Tuning of transactions


    Tuning of Hardware

    Even well-tuned transactions typically require a few I/O operations
    - Typical disk supports about 100 random I/O operations per second
    - Suppose each transaction requires just 2 random I/O operations. Then to support n transactions per second, we need to stripe data across n/50 disks (ignoring skew)

    Number of I/O operations per transaction can be reduced by keeping more data in memory
    - If all data is in memory, I/O is needed only for writes
    - Keeping frequently used data in memory reduces disk accesses, reducing the number of disks required, but has a memory cost
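
    The n/50 figure follows directly from the two numbers on the slide. A minimal sketch of that arithmetic, with the slide's values (2 random I/Os per transaction, 100 random I/Os per second per disk) as defaults; the function name is hypothetical and, like the slide, the calculation ignores skew:

```python
import math


def disks_needed(tps: float, ios_per_txn: float = 2, ios_per_disk_per_sec: float = 100) -> int:
    """Disks required to sustain `tps` transactions per second, ignoring skew."""
    total_ios_per_sec = tps * ios_per_txn
    # Round up: a fraction of a disk still means buying a whole disk.
    return math.ceil(total_ios_per_sec / ios_per_disk_per_sec)


print(disks_needed(1000))                  # 1000 tps * 2 I/Os / 100 per disk = 20 disks
print(disks_needed(1000, ios_per_txn=1))   # halving I/Os per transaction halves the disks: 10
```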


    Hardware Tuning: Five-Minute Rule

    Question: which data to keep in memory?

    - If a page is accessed n times per second, keeping it in memory saves
        n * (price-per-disk-drive / accesses-per-second-per-disk)
    - Cost of keeping a page in memory:
        price-per-MB-of-memory / pages-per-MB-of-memory
    - Break-even point: the value of n for which the above costs are equal
      - If the page is accessed more often than this, the saving is greater than the cost, so buying memory pays off
    - Solving the above equation with current disk and memory prices leads to:

    5-minute rule: if a page that is randomly accessed is used more frequently than once in 5 minutes, it should be kept in memory (by buying sufficient memory!)
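
    The break-even calculation can be worked through numerically. The sketch below equates the two costs from the slide and solves for the access interval. The prices and page size are hypothetical round numbers chosen only to make the arithmetic easy to follow; they are not actual market prices, and the function name is made up for this example.

```python
def break_even_interval_seconds(
    price_per_disk: float,
    accesses_per_sec_per_disk: float,
    price_per_mb_memory: float,
    pages_per_mb: float,
) -> float:
    """Access interval at which caching a page costs the same as leaving it on disk."""
    # Disk cost of sustaining one access per second.
    disk_cost_per_access_rate = price_per_disk / accesses_per_sec_per_disk
    # Memory cost of holding one page.
    memory_cost_per_page = price_per_mb_memory / pages_per_mb
    # n (accesses per second) at which  n * disk cost == memory cost.
    break_even_rate = memory_cost_per_page / disk_cost_per_access_rate
    return 1.0 / break_even_rate


# Hypothetical inputs: $250 disk, 100 random accesses/s, $2 per MB of memory, 4 KB pages.
interval = break_even_interval_seconds(250.0, 100.0, 2.0, 256.0)
print(f"break-even: one access every {interval:.0f} seconds")
```

    With these made-up inputs the break-even interval comes out at roughly 320 seconds, the same order of magnitude as the 5-minute rule. Pages accessed more often than that are cheaper to keep in memory.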


    Hardware Tuning: One-Minute Rule

    For sequentially accessed data, more pages can be read per second. Assuming sequential reads of 1 MB of data at a time:

    1-minute rule: sequentially accessed data that is accessed once or more in a minute should be kept in memory

    Prices of disk and memory have changed greatly over the years, but the ratios have not changed much
    - so the rules remain the 5-minute and 1-minute rules, not 1-hour or 1-second rules!


    Hardware Tuning: Choice of RAID Level

    To use RAID 1 or RAID 5?
    - Depends on ratio of reads and writes
      - RAID 5 requires 2 block reads and 2 block writes to write out one data block
    - If an application requires r reads and w writes per second
      - RAID 1 requires r + 2w I/O operations per second
      - RAID 5 requires r + 4w I/O operations per second
    - For reasonably large r and w, this requires lots of disks to handle the workload
      - RAID 5 may require more disks than RAID 1 to handle the load!
      - Apparent saving of number of disks by RAID 5 (by using parity, as opposed to the mirroring done by RAID 1) may be illusory!
    - Thumb rule: RAID 5 is fine when writes are rare and data is very large, but RAID 1 is preferable otherwise
      - If you need more disks to handle I/O load, just mirror them since disk capacities these days are enormous!
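
    The two I/O formulas can be compared directly for a given workload. The sketch below is illustrative only: the function names are hypothetical, the per-disk rate of 100 I/O operations per second is borrowed from the earlier hardware slide, and storage capacity is ignored, so it counts only the disks needed to carry the I/O load.

```python
import math


def raid_io_per_second(reads: float, writes: float) -> dict:
    """Physical I/O operations per second implied by r reads and w writes."""
    return {
        "RAID 1": reads + 2 * writes,  # every write goes to both mirrors
        "RAID 5": reads + 4 * writes,  # 2 block reads + 2 block writes per data block written
    }


def disks_for_load(io_per_second: float, ios_per_disk_per_sec: float = 100) -> int:
    """Disks needed to carry the I/O load alone (capacity not considered)."""
    return math.ceil(io_per_second / ios_per_disk_per_sec)


# Hypothetical workload: 1000 reads and 500 writes per second.
for level, ios in raid_io_per_second(reads=1000, writes=500).items():
    print(f"{level}: {ios} I/O per second, {disks_for_load(ios)} disks")
```

    For this write-heavy example RAID 5 needs 30 disks against 20 for RAID 1, which is the sense in which its apparent saving in disks can be illusory.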


    Tuning the Database Design

    Schema tuning

    Can be done in several ways:
    - Splitting tables
      - Sometimes splitting normalized tables can improve performance
      - Can split tables in two ways:
        - Horizontally
        - Vertically
      - Adds complexity to the applications
    - Denormalization:
      - Adding redundant columns
      - Adding derived attributes
      - Collapsing tables
      - Duplicating tables
    - Cluster together on the same disk page records that would match in a frequently required join, so that the join can be computed very efficiently when required.


    Tuning the Database Design

    Schema tuning (cont.)

    Vertical Splitting Example

    The account relation with the following schema:

        account(account-number, branch-name, balance)

    can be split into two relations:

        account-branch(account-number, branch-name)
        account-balance(account-number, balance)
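
    The split above can be tried out in a small script. This sketch uses Python's built-in sqlite3 module with an in-memory database; the column names use underscores because hyphens are not valid in unquoted SQL identifiers, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        branch_name    TEXT,
        balance        REAL
    );
    INSERT INTO account VALUES ('A-101', 'Downtown', 500.0),
                               ('A-102', 'Perryridge', 400.0);

    -- Vertical split: each new relation keeps the key plus a subset of the columns.
    CREATE TABLE account_branch  AS SELECT account_number, branch_name FROM account;
    CREATE TABLE account_balance AS SELECT account_number, balance     FROM account;
""")

# A query that needs only the balance reads the narrower relation.
balance = conn.execute(
    "SELECT balance FROM account_balance WHERE account_number = ?", ("A-101",)
).fetchone()[0]

# A query that needs both attributes must now join the two relations.
rows = conn.execute("""
    SELECT b.account_number, b.branch_name, a.balance
    FROM account_branch AS b
    JOIN account_balance AS a ON a.account_number = b.account_number
    ORDER BY b.account_number
""").fetchall()

print(balance)
print(rows)
```

    The second query shows the trade-off: accesses that touch only one group of columns get a smaller relation to scan, while accesses that need everything pay for a join.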


    Tuning the Database Design

    Index tuning (cont.)

    When should indexes be considered?
    - Unique indexes are implicitly used in conjunction with a primary key for the primary key to work
    - Foreign keys are also excellent candidates for an index because they are often used to join the parent table
    - Most, if not all, columns used for table joins should be indexed
    - Columns that are frequently referenced in the ORDER BY and GROUP BY clauses should be considered for indexes
    - Indexes should be created on columns with a high number of unique values, or on columns that, when used as filter conditions in the WHERE clause, return a low percentage of rows of data from a table

    The effective use of indexes requires a thorough knowledge of table relationships, query and transaction requirements, and the data itself
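
    One way to check whether an index on a selective filter column is actually used is to ask the DBMS for its query plan. The sketch below does this with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. The table contents and the index name are hypothetical, and the exact wording of the plan text varies between SQLite versions, so the script only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number INTEGER PRIMARY KEY, branch_name TEXT, balance REAL)"
)
# 5000 rows spread over 500 branches, so a branch filter returns about 0.2% of the rows.
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(i, f"branch-{i % 500}", 100.0 * i) for i in range(5000)],
)

query = "SELECT * FROM account WHERE branch_name = ?"


def plan(sql: str, params: tuple) -> str:
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query, ("branch-7",))   # no usable index yet: full table scan
conn.execute("CREATE INDEX idx_account_branch ON account (branch_name)")
plan_after = plan(query, ("branch-7",))    # planner can now search via the index

print("before:", plan_before)
print("after: ", plan_after)
```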


    Tuning the Database Design

    Index tuning (cont.)

    When should indexes be avoided?
    - Indexes should not be used on small tables
    - Indexes should not be used on columns that return a high percentage of data rows when used as a filter condition in a query's WHERE clause
    - Tables that have frequent, large batch update jobs run can be indexed
      - However, the batch job's performance is slowed considerably by the index
      - Might consider dropping the index before the batch job, and then re-creating the index after the job has completed
    - Indexes should not be used on columns that contain a high number of NULL values
    - Columns that are frequently manipulated should not be indexed
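
    The drop-and-re-create pattern for batch jobs can be sketched as follows, again with Python's sqlite3 module. The table, index name, and batch data are hypothetical, and the script does not measure timings: whether the pattern pays off depends on the DBMS, the batch size, and how long queries can tolerate the index being absent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)")
conn.execute("CREATE INDEX idx_readings_sensor ON readings (sensor)")

# Hypothetical large batch load.
batch = [(i, f"sensor-{i % 100}", float(i)) for i in range(50_000)]

# 1. Drop the index so the batch does not pay for index maintenance on every row.
conn.execute("DROP INDEX idx_readings_sensor")

# 2. Run the batch job in one transaction.
with conn:
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", batch)

# 3. Re-create the index once, after all rows are in place.
conn.execute("CREATE INDEX idx_readings_sensor ON readings (sensor)")

index_names = [row[1] for row in conn.execute("PRAGMA index_list('readings')")]
row_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(index_names, row_count)
```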


    Tuning the Database Design

    Materialized Views

    Materialized views can help speed up certain queries
    - Particularly aggregate queries

    Overheads
    - Space
    - Time for view maintenance
      - Immediate view maintenance: done as part of the update
        - time overhead is paid by the update transaction
      - Deferred view maintenance: done only when required
        - update transaction is not affected, but system time is spent on view maintenance
        - until updated, the view may be out-of-date
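
    The two maintenance strategies can be contrasted in a small sketch. SQLite has no native materialized views, so the example emulates one with an ordinary summary table: a trigger stands in for immediate maintenance and a refresh function stands in for deferred maintenance. All table, trigger, and function names are hypothetical, and only inserts are handled; updates and deletes on account would need further triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (account_number TEXT PRIMARY KEY, branch_name TEXT, balance REAL);

    -- Stands in for a materialized view of:
    --   SELECT branch_name, SUM(balance) FROM account GROUP BY branch_name
    CREATE TABLE branch_total (branch_name TEXT PRIMARY KEY, total REAL NOT NULL);

    -- Immediate maintenance: the inserting transaction pays for the view update.
    CREATE TRIGGER account_insert AFTER INSERT ON account
    BEGIN
        INSERT OR IGNORE INTO branch_total VALUES (NEW.branch_name, 0);
        UPDATE branch_total SET total = total + NEW.balance
        WHERE branch_name = NEW.branch_name;
    END;
""")


def refresh_branch_total(connection: sqlite3.Connection) -> None:
    """Deferred maintenance: recompute the whole summary only when asked."""
    with connection:
        connection.execute("DELETE FROM branch_total")
        connection.execute(
            "INSERT INTO branch_total "
            "SELECT branch_name, SUM(balance) FROM account GROUP BY branch_name"
        )


with conn:
    conn.executemany(
        "INSERT INTO account VALUES (?, ?, ?)",
        [("A-101", "Downtown", 500.0), ("A-102", "Downtown", 250.0), ("A-201", "Perryridge", 400.0)],
    )

# The aggregate query now reads the small summary table instead of scanning account.
totals = dict(conn.execute("SELECT branch_name, total FROM branch_total"))
print(totals)
```

    In a purely deferred design the trigger would be omitted and refresh_branch_total would be called on a schedule or before reporting queries; between refreshes the summary may be out of date, as the slide notes.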


    Tuning of Transactions

    Basic approaches to tuning of transactions
    - Improve set orientation
    - Reduce lock contention

    Rewriting of queries to improve performance was important in the past, but smart optimizers have made this less important

    Communication overhead and query handling overheads are a significant part of the cost of each call
    - Combine multiple embedded SQL/ODBC/JDBC queries into a single set-oriented query
      - Set orientation -> fewer calls to database
      - E.g. tune a program that computes total salary for each department using a separate SQL query by instead using a single query that computes total salaries for all departments at once (using group by)
    - Use stored procedures: avoids re-parsing and re-optimization of query
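
    The department-salary example can be written both ways to show the difference in the number of database calls. This is a minimal sketch using Python's sqlite3 module with a made-up employee table; with an in-memory database the per-call overhead is tiny, but against a remote server each call in the loop would be a network round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ann", "Sales", 100.0), ("Bob", "Sales", 200.0), ("Cy", "IT", 300.0)],
)

# Row-at-a-time style: one call to list departments, then one call per department.
depts = [row[0] for row in conn.execute("SELECT DISTINCT dept FROM employee")]
per_dept = {}
for dept in depts:
    per_dept[dept] = conn.execute(
        "SELECT SUM(salary) FROM employee WHERE dept = ?", (dept,)
    ).fetchone()[0]

# Set-oriented style: a single call computes every department's total.
set_oriented = dict(conn.execute("SELECT dept, SUM(salary) FROM employee GROUP BY dept"))

print(per_dept)
print(set_oriented)
```

    Both approaches produce the same totals; the first issues 1 + (number of departments) queries, the second issues one.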


    Tuning of Transactions (Cont.)

    Reducing lock contention

    Long transactions (typically read-only) that examine large parts of a relation result in lock contention with update transactions
    - E.g. large query to compute bank statistics and regular bank transactions

    To reduce contention
    - Use multi-version concurrency control
      - E.g. Oracle snapshots which support multi-version 2PL
    - Use degree-two consistency (cursor-stability) for long transactions


    Tuning of Transactions (Cont.)

    Long update transactions cause several problems
    - Exhaust lock space
    - Exhaust log space
      - and also greatly increase recovery time after a crash, and may even exhaust log space during recovery if the recovery algorithm is badly designed!

    Use mini-batch transactions to limit the number of updates that a single transaction can carry out. E.g., if a single large transaction updates every record of a very large relation, the log may grow too big.
    - Split the large transaction into a batch of "mini-transactions," each performing part of the updates
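
    A mini-batch update can be sketched as a loop that commits after each key range. The example below uses Python's sqlite3 module; the account table, the batch size, and the function name are hypothetical, and it assumes an integer key so that the relation can be cut into ranges.

```python
import sqlite3


def update_in_mini_batches(conn: sqlite3.Connection, batch_size: int) -> int:
    """Credit 1.0 to every account, committing after each key range. Returns the batch count."""
    low, high = conn.execute("SELECT MIN(id), MAX(id) FROM account").fetchone()
    batches = 0
    if low is None:  # empty table: nothing to do
        return batches
    start = low
    while start <= high:
        with conn:  # one short transaction per batch, committed on exit
            conn.execute(
                "UPDATE account SET balance = balance + 1.0 WHERE id >= ? AND id < ?",
                (start, start + batch_size),
            )
        start += batch_size
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(i, 100.0) for i in range(1, 1001)])

batches = update_in_mini_batches(conn, batch_size=100)
updated = conn.execute("SELECT COUNT(*) FROM account WHERE balance = 101.0").fetchone()[0]
print(batches, updated)
```

    The price of the split is that the update as a whole is no longer atomic: after a crash some batches are committed and others are not, so a real job would also need to record how far it got in order to resume safely.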
