Database Recovery Techniques By Marcus Hall, Michael Dodd, James (Tripp) Massey, and Julian Gracia.

The following items are discussed in this presentation on database recovery techniques.

Recovery Techniques Presentation Outline:

• Two main techniques for recovery from noncatastrophic transaction failures: Deferred update and Immediate update.

• Salvation program: Run after a crash to attempt to restore the system to a valid state. No recovery data used. Used when all other techniques fail or were not used. Good for cases where buffers were lost in a crash and one wants to reconstruct what was lost.

• Incremental dumping: Modified files copied to archive after job completed or at intervals.

• Audit trail: Sequences of actions on files are recorded. Optimal for "backing out" of transactions. (Ideal if trail is written out before changes).

• Differential files: Separate file is maintained to keep track of changes, periodically merged with the main file.

• Backup/current version: Present files form the current version of the database. Files containing previous values form a consistent backup version.

• Multiple copies: Multiple active copies of each file are maintained during normal operation of the database. In cases of failure, comparison between the versions can be used to find a consistent version.

• Careful replacement: Nothing is updated in place, with the original only being deleted after operation is complete.


• What is data?

• The quantities, characters, or symbols on which operations are performed by a computer, being stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.

• What is a Database?

• A structured set of data held in a computer, on a server, or in the cloud, especially one that is accessible in various ways.


** Purpose of Database Recovery

• To bring the database into the last consistent state, which existed prior to the failure.

• To preserve transaction properties (Atomicity, Consistency, Isolation and Durability).

**Example:

• If the system crashes before a fund transfer transaction completes its execution, then one or both accounts may hold an incorrect value. Thus, the database must be restored to the state it was in before the transaction modified any of the accounts.

**Source: Ramez Elmasri and Shamkant B. Navathe (Purdue University)
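The fund transfer example can be illustrated with a small sketch using Python's built-in sqlite3 module. The table layout, the account names, and the fail_midway switch are assumptions made for illustration; the point is only that a failure between the two updates leaves both balances as they were before the transaction started.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as a single transaction."""
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src))
        if fail_midway:
            # Hypothetical failure between the debit and the credit.
            raise RuntimeError("simulated crash between the two updates")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst))
        conn.commit()      # both updates become permanent together
    except Exception:
        conn.rollback()    # neither update survives
        raise

def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Illustrative data: two accounts in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()
```

Calling transfer(conn, "A", "B", 30, fail_midway=True) raises the simulated error and leaves the balances unchanged; the same call without fail_midway applies both updates.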



• Why do we need Database Recovery Techniques?

• Purpose of Backup and Recovery

• As a backup administrator, your principal duty is to devise, implement, and manage a backup and recovery strategy. In general, the purpose of a backup and recovery strategy is to protect the database against data loss and reconstruct the database after data loss. Typically, backup administration tasks include the following:

• Planning and testing responses to different kinds of failures

• Configuring the database environment for backup and recovery

• Setting up a backup schedule

• Monitoring the backup and recovery environment

• Troubleshooting backup problems

• Recovering from data loss if the need arises

• Safeguard against unexpected data loss and application errors. For example, a disk may fail, causing the loss of datafiles. You can restore a backup of the data and reconstruct the lost data through media recovery. Media recovery refers to the various operations involved in restoring, rolling forward, and rolling back a backup of database files.

• As a backup administrator, you may also be asked to perform other duties that are related to backup and recovery:

• Data preservation, which involves creating a database copy for long-term storage

• Data transfer, which involves moving data from one database or one host to another



What is Data Recovery?

Data recovery is the process of salvaging data from damaged, failed, corrupted, or inaccessible secondary storage media when it cannot be accessed normally. The data is often salvaged from storage media such as internal or external hard disk drives, solid-state drives (SSDs), USB flash drives, storage tapes, CDs, DVDs, RAID arrays, and other electronics. Recovery may be required because of physical damage to the storage device or logical damage to the file system that prevents it from being mounted by the host operating system (OS).


Why do we need Database Recovery?

• A major responsibility of the database administrator is to prepare for the possibility of hardware, software, network, process, or system failure. If such a failure affects the operation of a database system, you must usually recover the database and return to normal operation as quickly as possible. Recovery should protect the database and associated users from unnecessary problems and avoid or reduce the possibility of having to duplicate work manually.

• Recovery processes vary depending on the type of failure that occurred, the structures affected, and the type of recovery that you perform. If no files are lost or damaged, recovery may amount to no more than restarting an instance. If data has been lost, recovery requires additional steps.

Source: Oracle.com

Errors and Failures

Several problems can halt the normal operation of an Oracle database or affect database I/O to disk. The following sections describe the most common types. For some of these problems, recovery is automatic and requires little or no action on the part of the database user or database administrator.

User Error

• A database administrator can do little to prevent user errors (for example, accidentally dropping a table). Usually, user error can be reduced by increased training on database and application principles. Furthermore, by planning an effective recovery scheme ahead of time, the administrator can ease the work necessary to recover from many types of user errors.


Statement Failure

• Statement failure occurs when there is a logical failure in the handling of a statement in an Oracle program. For example, assume all extents of a table (in other words, the number of extents specified in the MAXEXTENTS parameter of the CREATE TABLE statement) are allocated, and are completely filled with data; the table is absolutely full. A valid INSERT statement cannot insert a row because there is no space available. Therefore, if issued, the statement fails.

• If a statement failure occurs, the Oracle software or operating system returns an error code or message. A statement failure usually requires no action or recovery steps; Oracle automatically corrects for statement failure by rolling back the effects (if any) of the statement and returning control to the application. The user can simply re-execute the statement after correcting the problem indicated by the error message.

Process Failure

• A process failure is a failure in a user, server, or background process of a database instance (for example, an abnormal disconnect or process termination). When a process failure occurs, the failed subordinate process cannot continue work, although the other processes of the database instance can continue.

• The Oracle background process PMON detects aborted Oracle processes. If the aborted process is a user or server process, PMON resolves the failure by rolling back the current transaction of the aborted process and releasing any resources that this process was using. Recovery of the failed user or server process is automatic. If the aborted process is a background process, the instance usually cannot continue to function correctly. Therefore, you must shut down and restart the instance.


Network Failure

• When your system uses networks (for example, local area networks, phone lines, and so on) to connect client workstations to database servers, or to connect several database servers to form a distributed database system, network failures (such as aborted phone connections or network communication software failures) can interrupt the normal operation of a database system. For example:

• A network failure might interrupt normal execution of a client application and cause a process failure to occur. In this case, the Oracle background process PMON detects and resolves the aborted server process for the disconnected user process, as described in the previous section.

• A network failure might interrupt the two-phase commit of a distributed transaction. Once the network problem is corrected, the Oracle background process RECO of each involved database server automatically resolves any distributed transactions not yet resolved at all nodes of the distributed database system. (RECO is a background process for distributed transactions; it manages two-phase commits to track and resolve in-doubt transactions.)

Database Instance Failure

• Database instance failure occurs when a problem arises that prevents an Oracle database instance (SGA and background processes) from continuing to work. An instance failure can result from a hardware problem, such as a power outage, or a software problem, such as an operating system crash. Instance failure also results when you issue a SHUTDOWN ABORT or STARTUP FORCE command.


What Is a Backup?

A database backup is a representative copy of data. When the original data is lost, you can use the backup to reconstruct lost information (the physical files that constitute your database). In the event of a media failure, your database backup is the key to successfully recovering your data.

Perform Backups Frequently and Regularly

• Data that changes at a high rate should be backed up more frequently.

• Data that is mainly read-only can be backed up less frequently.

Immediately Back Up Appropriate Portions of the Database When Making Structural Changes

• Create or drop a tablespace

• Add or rename a datafile in an existing tablespace

When to Take Backups

Keep Older Backups

• Consider keeping two or more backups previous to the current backup. If your most recent backups are not usable (for example, the tape drive used for backups writes bad backups), you will not lose all of your data.

Export Database Data for Added Protection and Flexibility

• Export the whole database

• Export specific tables

• Store locally or in the cloud

Storing Your Backup

• On a separate hard drive

• In a safety deposit box or office safe

• Online cloud-based storage

Test Backup and Recovery Strategies

• Practice your backup and recovery strategies to reduce problems in a real situation.

• Performing test recoveries regularly ensures that your procedures work.

• You stay familiar with recovery procedures.

• You are less likely to make a mistake in a crisis.

Data Centers and Your Data

Data centers store all the data that is not stored locally on your devices. If any physical damage were to happen to these centers, all your data could be lost.

There is a standard, ANSI/BICSI 002-2011, to follow when constructing a data center. It sets forth requirements, recommendations, and additional information that should be considered when working with critical systems, like the electrical, mechanical, and telecommunication networks, as well as other significant needs, such as site selection, security, and building needs.

Data Center Location

Avoid

• Areas too low that may flood

• Areas too high because of high wind and lightning

• Tornado Alley

• Coastal hurricane areas

• Fault lines

Ideal

• Underground

• Multiple facilities

Deferred update

• These techniques do not physically update the database on disk until after a transaction reaches its commit point; then the updates are recorded in the database.

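• If a transaction fails before reaching its commit point, it will not have changed the database in any way, so UNDO is not needed. REDO may still be needed if the system fails after a transaction commits but before all its changes are recorded in the database on disk.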

• During transaction execution, the updates are recorded only in the log and in the cache buffers.

• We can state a typical deferred update protocol as follows:

• 1. A transaction cannot change the database on disk until it reaches its commit point.

• 2. A transaction does not reach its commit point until all its REDO-type log entries are recorded in the log and the log buffer is force-written to disk.

When the checkpoint was taken at time t1, transaction T1 had committed.

Before the system crash at time t2, T2 and T3 had committed, but T4 and T5 had not.

There is no need to redo the write_item operations of any transaction committed before the last checkpoint time t1.

T2 and T3 must be redone, because both transactions reached their commit points after the last checkpoint.

T4 and T5 are ignored: they are effectively canceled or rolled back, because none of their write_item operations were recorded in the database on disk.
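The recovery pass for this scenario can be sketched in Python. This is a minimal illustration under stated assumptions, not a full recovery manager: the log is a list of tuples tagged "start", "write", "commit", or "checkpoint", the database is a dictionary standing in for the disk, and the name recover_deferred and all item values are hypothetical.

```python
def recover_deferred(log, database):
    """REDO-only recovery for deferred update (NO-UNDO/REDO).

    Writes of transactions that committed after the last checkpoint are
    reapplied in log order. Transactions committed before the checkpoint
    are skipped, and uncommitted ones are ignored, since they never
    touched the database on disk.
    """
    # Position of the last checkpoint record (-1 if there is none).
    last_cp = max(
        (i for i, rec in enumerate(log) if rec[0] == "checkpoint"),
        default=-1)
    # Transactions whose commit record follows the last checkpoint.
    redo_set = {rec[1] for rec in log[last_cp + 1:] if rec[0] == "commit"}
    # Redo every write of those transactions, including writes that were
    # logged before the checkpoint, because deferred update kept them
    # out of the database until commit.
    for rec in log:
        if rec[0] == "write" and rec[1] in redo_set:
            _, _txn, item, new_value = rec
            database[item] = new_value
    return redo_set

# Assumed log mirroring the scenario: T1 commits before the checkpoint,
# T2 and T3 commit after it, and T4 and T5 never commit.
log = [
    ("start", "T1"), ("write", "T1", "A", 10), ("commit", "T1"),
    ("start", "T2"), ("write", "T2", "B", 20),
    ("start", "T4"), ("write", "T4", "D", 40),
    ("checkpoint",),
    ("start", "T3"), ("write", "T3", "C", 30), ("commit", "T3"),
    ("commit", "T2"),
    ("start", "T5"), ("write", "T5", "E", 50),
]
# Disk state at the crash: only T1's write is guaranteed to be on disk.
database = {"A": 10, "B": 0, "C": 0, "D": 0, "E": 0}
redone = recover_deferred(log, database)
```

After the call, the writes of T2 and T3 are present in database, while the items written by T4 and T5 keep their old values.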

• The method’s main benefit is that transaction operations never need to be undone, for two reasons:

• 1. A transaction does not record any changes in the database on disk until after it reaches its commit point. Hence, a transaction is never rolled back because of failure during transaction execution.

• 2. A transaction will never read the value of an item that is written by an uncommitted transaction, because items remain locked until a transaction reaches its commit point. Hence, no cascading rollback will occur.

Immediate update

• When a transaction issues an update command, the database on disk can be updated immediately, without any need to wait for the transaction to reach its commit point.

• Provisions must be made for undoing the effect of update operations that have been applied to the database by a failed transaction.

• If the recovery technique ensures that all updates of a transaction are recorded in the database on disk before the transaction commits, there is never a need to REDO any operations of committed transactions.
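The undo provision for immediate update can be sketched as follows. The sketch assumes the variant described in the last bullet, where all updates reach disk before commit, so recovery only has to undo. The log layout (each write record carrying both the old and the new value), the name recover_immediate, and the sample values are assumptions for illustration.

```python
def recover_immediate(log, database):
    """UNDO-only recovery for immediate update (UNDO/NO-REDO).

    Writes of transactions with no commit record are reversed by
    restoring the old value stored in each log record, scanning the
    log backwards so the earliest old value is restored last.
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    undone = []
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, txn, item, old_value, _new_value = rec
            database[item] = old_value   # put the before-image back
            undone.append((txn, item))
    return undone

# Assumed log: T1 committed, T2 wrote Y twice and then the system failed.
log = [
    ("start", "T1"), ("write", "T1", "X", 5, 6), ("commit", "T1"),
    ("start", "T2"), ("write", "T2", "Y", 1, 2), ("write", "T2", "Y", 2, 3),
]
# Disk state at the crash: every update was applied immediately.
database = {"X": 6, "Y": 3}
undone = recover_immediate(log, database)
```

The committed value of X is left alone, and Y returns to the value it had before T2 started.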

Shadow Paging

• Current Directory

- Its entries point to the most recent, or current, database pages on disk.

• Shadow Directory

- A copy of the current directory, made when the transaction starts. During transaction execution, the shadow directory is never modified.

• For pages updated by transaction, two versions are kept.

• The Old Version, which is referenced by the shadow directory.

• The New Version, which is referenced by the current directory.
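The two directories can be sketched with a small in-memory model. The class name ShadowPagedDB, the list used as a stand-in for disk pages, and the method names are assumptions for illustration; a real system would keep both directories and the pages on disk.

```python
class ShadowPagedDB:
    """Toy shadow paging model: pages are never overwritten in place."""

    def __init__(self, pages):
        self.disk = list(pages)                  # stand-in for disk pages
        self.current = list(range(len(pages)))   # current directory
        self.shadow = list(self.current)         # shadow directory

    def begin(self):
        # The shadow directory is a copy of the current directory taken
        # at transaction start and never modified during execution.
        self.shadow = list(self.current)

    def write(self, page_no, value):
        # The new version goes to a fresh slot; only the current
        # directory is redirected, so the old version stays reachable
        # through the shadow directory.
        self.disk.append(value)
        self.current[page_no] = len(self.disk) - 1

    def read(self, page_no):
        return self.disk[self.current[page_no]]

    def commit(self):
        # After commit, the current directory is the one to keep.
        self.shadow = list(self.current)

    def abort(self):
        # Recovery: discard the current directory, reinstate the shadow.
        self.current = list(self.shadow)
```

A transaction that calls write and then abort sees the old page contents again, because the shadow directory still points at the old versions.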

SQL Server Database Recovery Techniques

• Full Backups

• Differential Backups

• Transaction Log

Types of failures that occur in SQL Server

• Pre-condition

• Post-condition

1. Input errors

2. Table creation

3. Table drop errors

4. Configuration

5. Deletion

• RAID

• Catastrophic System Failure

• Rollback

• Rollforward

• ADR

• Disk failure

• Read-only files

• ARIES recovery procedure

• Dirty page

• Shadow paging

• Checkpointing

Recovery Techniques For Database Systems

Salvation Program

• A Salvation Program is the program that is run after a crash to restore the system to a valid state.

• No recovery data is used.

• It is used if other recovery techniques fail or are not used, or if no crash resistance is provided.

• Plays an important role in the HIVE system.

• It rescues the information that is still recognizable.

Incremental Dumping

• Copies updated files to archival storage.

• Performed either after a transaction completes or at regular intervals.

• Creates checkpoints of updated files.

• Used to restore database files after a crash.

• Backup time is faster than for full backups.

• Incremental backups require less disk, tape, or network drive space.

• You can keep several versions of the same files on different backup sets.
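An incremental dump can be sketched as a copy of only those files modified since the previous dump. The function name incremental_dump, the flat directory layout, and the use of file modification times to detect updates are assumptions for illustration; real backup tools track changes in more robust ways.

```python
import os
import shutil

def incremental_dump(src_dir, archive_dir, last_dump_time):
    """Copy files in src_dir modified after last_dump_time to archive_dir.

    last_dump_time is a timestamp in seconds since the epoch. Returns the
    names of the files that were copied, in sorted order.
    """
    os.makedirs(archive_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        # Only regular files changed since the last dump are archived.
        if os.path.isfile(src) and os.path.getmtime(src) > last_dump_time:
            shutil.copy2(src, os.path.join(archive_dir, name))
            copied.append(name)
    return copied
```

Running the function after each job, or on a timer, with the time of the previous run as last_dump_time gives the "after completion or at intervals" behavior described above.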

Audit Trail

• Keeps track of a sequence of actions.

• Useful for restoring the database to its pre-crash state.

• Supports backing out of a transaction, that is, a kind of undo operation.

• Useful for rule validation.

Differential files

• It’s a backup that only saves the difference in the data since the last full backup

• Can consist of two parts:

• the main file which is unchanged

• the differential file which records all the alterations requested for the main file.

• The differential file can also be used to implement crash resistance.
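The main file and differential file arrangement can be sketched with two dictionaries. The class name DifferentialStore, the deletion marker, and the method names are assumptions for illustration. Reads consult the differential file first, and the main file changes only when the two are merged.

```python
_DELETED = object()   # marker recorded in the differential file for deletions

class DifferentialStore:
    """Toy model of a main file plus a differential file."""

    def __init__(self, main):
        self.main = dict(main)   # main file: unchanged between merges
        self.diff = {}           # differential file: requested alterations

    def put(self, key, value):
        self.diff[key] = value

    def delete(self, key):
        self.diff[key] = _DELETED

    def get(self, key, default=None):
        # Look in the differential file first, then fall back to main.
        if key in self.diff:
            value = self.diff[key]
            return default if value is _DELETED else value
        return self.main.get(key, default)

    def merge(self):
        # Periodic merge: fold all recorded alterations into the main file.
        for key, value in self.diff.items():
            if value is _DELETED:
                self.main.pop(key, None)
            else:
                self.main[key] = value
        self.diff = {}

    def discard_changes(self):
        # Crash resistance: dropping the differential file restores the
        # state as of the last merge, since main was never touched.
        self.diff = {}
```

Because every read checks the differential file before the main file, lookups get slower as the differential file grows, which is one reason the periodic merge is needed.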

The Advantages

• Differential backups require even less disk, tape, or network drive space than incremental backups.

• Backup time is faster than full or incremental backups.

The Disadvantages

• Access to data is slow, because it must be fetched from the database.

• Can be overcome with Hashing.

• The differential file must be merged with the database, causing downtime.

Backup and Current Versions

• The files containing the present values of existing files form the current version of the database.

• Backup versions of files or databases can be kept in order to make possible the restoration of the files to a previous state.

• Similarly, complete copies of the database can be made regularly in order to make possible the restoration of the database to an earlier state.

Disadvantages

• The operations of unfinished transactions, performed before the failure will be lost.

Multiple Copies

• More than one copy of each file is held.

• Except during update, the multiple copies must always have the same value.

• Widely used for testing updates.

Two Major Methods:

• Majority Voting:

• Holding two copies with flags to indicate "update-in-progress."

• During a cycle one of the two versions is updated. At the end of the cycle this version is copied onto the other version.

• In general two copies and two flags (bits) are sufficient to provide crash resistance.
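The two-copies-with-flags method can be sketched as follows. The class name TwoCopyStore, the in-memory dictionaries, and the crash_after switch are assumptions for illustration; in a real system the copies and the flags would live on stable storage so that they survive the crash.

```python
class TwoCopyStore:
    """Toy model: two copies, each with an update-in-progress flag."""

    def __init__(self, value):
        self.copies = [dict(value), dict(value)]
        self.in_progress = [False, False]

    def update(self, changes, crash_after=None):
        # Step 1: update the first copy under its flag.
        self.in_progress[0] = True
        for n, (key, val) in enumerate(changes.items()):
            if crash_after == n:
                # Hypothetical crash after n changes have been applied.
                raise RuntimeError("simulated crash during update")
            self.copies[0][key] = val
        self.in_progress[0] = False
        # Step 2: copy the updated version onto the second copy.
        self.in_progress[1] = True
        self.copies[1] = dict(self.copies[0])
        self.in_progress[1] = False

    def recover(self):
        # A set flag marks the copy that may be inconsistent; the other
        # copy is intact and is used to repair it.
        if self.in_progress[0]:
            self.copies[0] = dict(self.copies[1])   # back to the old state
            self.in_progress[0] = False
        elif self.in_progress[1]:
            self.copies[1] = dict(self.copies[0])   # finish the new state
            self.in_progress[1] = False
        return self.copies[0]
```

At any moment at most one flag is set, so at least one copy is always known to be consistent, which is why two copies and two flags are enough in this sketch.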

Careful Replacement

• It seeks to avoid updating structures “in place.”

• The update is performed on a copy of a component (record, page, disk-block), which replaces the original only if the update is successful; and the copy is kept until after the replacement is made successfully.

• There are two instances of the data structure only during update; otherwise there is just one copy, which contains the current value.

• This approach avoids the disadvantages of differential files

Disadvantages

• The file or data structure must be tractable.

• Overhead costs are incurred in disk accesses; these can outweigh the technique's advantages.
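Careful replacement can be sketched at the file level: the update is written to a copy, and the copy replaces the original only once it is complete. The function name careful_replace is hypothetical. The sketch relies on os.replace, which Python documents as an atomic rename on POSIX systems when source and destination are on the same file system.

```python
import os
import tempfile

def careful_replace(path, new_bytes):
    """Replace the contents of `path` without updating it in place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new version to a temporary copy in the same directory,
    # so the final rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_bytes)
            tmp.flush()
            os.fsync(tmp.fileno())   # force the copy to disk first
        # Only now does the copy take the place of the original.
        os.replace(tmp_path, path)
    except BaseException:
        # The update failed: drop the copy, the original is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The extra write and rename are an instance of the disk-access overhead listed among the disadvantages.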

Conclusion:

• All in all, databases and information must be safeguarded through recovery techniques. Databases are susceptible to failures that might include mechanical error, user error, or program error.

• Databases are a necessary tool in the information technology era, and to keep them running smoothly, techniques such as those outlined above must be learned and practiced.

• Some techniques are specific to Oracle software, while others are generic and more widely applicable.
