Page 1: Exadata

http://www.oracle.com/technetwork/articles/oem/exadata-commands-part1-402441.html
http://www.oracle.com/technetwork/articles/oem/exadata-commands-part2-402442.html
http://www.oracle.com/technetwork/articles/oem/exadata-commands-part3-402445.html
http://www.oracle.com/technetwork/articles/oem/exadata-commands-part4-402446.html

Oracle Exadata Commands Reference

Part 1: Jumpstarting on Exadata

by Arup Nanda

Know your Oracle Exadata Database Machine and understand the building blocks where commands will be applied.

(Note: The purpose of this guide is educational; it is not intended to replace official Oracle-provided manuals or other documentation. The information in this guide is not validated by Oracle, is not supported by Oracle, and should only be used at your own risk.)

Let's begin with a whirlwind tour of the Oracle Exadata Database Machine. It comes in a rack with the components that make up a database infrastructure: disks, servers, networking gear, and so on. Three configuration types are available: full rack (see below), half rack, or quarter rack. The architecture is identical across all three types but the number of components differs.

Figure 1 Exadata Components, high-level view, at time of writing


Now let's dive into each of these components and the role they play. The following list applies to a full rack; you can also view them contextually via a really neat 3D demo.

Database Nodes – The Exadata Database Machine runs Oracle Database 11g Real Application Clusters. The cluster and the database run on the servers known as database nodes or compute nodes (or simply “nodes”). A full rack has 8 nodes running Oracle Linux or Oracle Solaris.

Storage cells - The disks are not attached to the database compute nodes, as is normally the case with the direct attached storage, but rather to a different server known as the storage cell (or just “cell”; there are 14 of them in a full rack). The Oracle Exadata Server Software runs in these cells on top of the OS.

Disks – each cell has 12 disks. Depending on the configuration, these disks are either 600GB high performance or 2TB high capacity (GB here means 1 billion bytes, not 1024MB). You have a choice in the disk type while making the purchase.

Flash disks – each cell also has 384GB of flash disks. These disks can be presented to the compute nodes as storage (to be used by the database) or used as a secondary cache for the database cluster (called smart cache).

InfiniBand circuitry – the cells and nodes are connected through InfiniBand for speed and low latency. There are 3 InfiniBand switches for redundancy and throughput. Note: there are no fiber switches since there is no fiber component.

Ethernet switch – the outside world can communicate via InfiniBand or Ethernet. There is a set of Ethernet switches with ports open to the outside. Clients may connect to the nodes using Ethernet. DMAs (database machine administrators) and others connect to the nodes and cells using Ethernet as well. Backups are preferably done via InfiniBand, but they can be done through the Ethernet network as well.

KVM switch – there is a keyboard, video, and mouse switch to get direct access to the nodes and cells physically. This is used initially while setting up and when the network to the system is not available. In a normal environment you will not need to go near the rack and access this KVM, not even for powering on and off the cells and nodes. Why not? You’ll learn why in the next installment. (Not all models have a KVM switch.)
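As a rough check on the figures in the list above, the following Python sketch multiplies them out for a full rack. It is only arithmetic on the numbers quoted here (14 cells, 12 disks per cell, 384GB of flash per cell); the constant and function names are made up for illustration, and the results are raw totals, before any mirroring or space set aside for the operating system.

```python
# Figures quoted in the list above for a full rack.
CELLS_PER_FULL_RACK = 14
DISKS_PER_CELL = 12
FLASH_GB_PER_CELL = 384


def raw_disk_capacity_gb(disk_size_gb, cells=CELLS_PER_FULL_RACK, disks=DISKS_PER_CELL):
    """Raw (unmirrored) disk capacity in GB, where 1GB means 10**9 bytes."""
    return cells * disks * disk_size_gb


high_performance_gb = raw_disk_capacity_gb(600)   # 600GB high performance disks
high_capacity_gb = raw_disk_capacity_gb(2000)     # 2TB high capacity disks
flash_gb = CELLS_PER_FULL_RACK * FLASH_GB_PER_CELL

print(high_performance_gb, high_capacity_gb, flash_gb)
```

That works out to 100,800GB (about 100TB) of raw high performance disk, 336,000GB (336TB) of raw high capacity disk, and 5,376GB of flash across the rack.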

The nodes run the Oracle Clusterware, the ASM instances, and the database instances. You may decide to create just one cluster or multiple ones. Similarly you may decide to create a single database on the cluster or multiple ones. If you were to create three databases – dev, int and QA - you would have two choices:

One cluster – create one cluster and create the three databases

Three clusters – create three different clusters and one database in each of them

The first option allows you to add and remove instances of a database easily. For instance, with 8 nodes in a full rack, you may assign 2 nodes to dev, 2 to int, and 4 to QA. Suppose a full-fledged production stress test is planned and that temporarily needs all 8 nodes in QA to match 8 nodes in production. In this configuration, all you have to do is shut down the dev and int instances and start the other four instances of QA on those nodes. Once the stress test is complete, you can shut down those 4 QA instances and restart the dev and int instances on them.

If you run multiple production databases on a single rack of Exadata, you can still take advantage of this technique. If a specific database needs additional computing power temporarily to ride out a seasonal high demand, just shut down one instance of a different database and restart the instance of the more demanding one in that node. After the demand has waned, you can reverse the situation. You can also run two instances in the same node but they will compete for the resources – something you may not want. At the I/O level, you can control the resource usage by the instances using the IO Resource Manager (IORM).

On the other hand, with this option, you are still on just one cluster. When you upgrade the cluster, all the databases will need to be upgraded. The second option obviates that; there are individual clusters for each database – a complete separation. You can upgrade them or manipulate them any way you want without affecting the others. However, when you need additional computational power for another database, you can’t just start up an instance. You need to remove a node from one cluster and add it to the cluster where it is needed – an activity more complex than the simple shutdown and startup of instances.

Since the cells have the disks, how do the database compute nodes access them - or more specifically, how do the ASM instances running on the compute nodes access the disks? Well, the disks are presented to cells only, not to the compute nodes. The compute nodes see the disks through the cells. For the lack of a better analogy, this is akin to network-attached storage. (Please note, the cell disks are not presented as NAS; this is just an analogy.)


The flash disks are presented to the cell as storage devices as well, just like the normal disks. As a result they can be added to the pool of ASM disks to be used in the database for ultra fast access, or they can be used to create the smart flash cache layer, which is a secondary cache between database buffer cache and the storage. This layer caches the most used objects but does not follow the same algorithm as the database buffer cache, where everything is cached first before sending to the end user. Smart flash cache caches only those data items which are accessed frequently – hence the term “smart” in the name. The request for data not found in the smart flash cache is routed to disks automatically.

The Secret Sauce: Exadata Storage Server

So, you may be wondering, what’s the “secret sauce” for the Exadata Database Machine’s amazing performance? A suite of software known as Exadata Storage Server, which runs on the storage cells, is the primary reason behind that performance. In this section we will go over the components of the storage server very briefly (not a substitute for documentation!).

Cell Offloading

The storage in the Exadata Database Machine is not just dumb storage. The storage cells are intelligent enough to process some workload inside them, saving the database nodes from that work. This process is referred to as cell offloading. The exact nature of the offloaded activity is discussed in the following section.

Smart Scan

In a traditional Oracle database, when a user selects a row or even a single column in a row, the entire block containing that row is fetched from the disk to the buffer cache, and the selected row (or column, as the case may be) is then extracted from the block and presented to the user’s session. In the Exadata Database Machine, this process holds true for most types of access, except a very important few. Direct path accesses – for instance, full table scans and full index scans – are done differently. The Exadata Database Machine can pull the specific rows (or columns) from the disks directly and send them to the database nodes. This functionality is known as Smart Scan. It results in huge savings in I/O.

For instance, your query might be satisfied by only 1,000 rows out of 1 billion, but a full table scan in a traditional database retrieves all the blocks and filters the rows from them. Smart Scan, on the other hand, will return only those 1,000 rows (or even specific columns from those rows, if those are requested) – potentially cutting the data shipped to the database node by a factor of about 1 million (1 billion divided by 1,000). Cell offloading enables the cells to accomplish this.

Not all queries can take advantage of Smart Scan. Only those using direct path reads can; a full table scan is one example. An index scan looks into index blocks first and then the table blocks, so Smart Scan is not used.
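To make the difference concrete, here is a toy Python model of the two paths described above. It is a sketch only, not how the Exadata software is implemented: blocks are plain lists of rows, ROWS_PER_BLOCK and all function names are hypothetical, and the "rows shipped" count stands in for the data sent from the cell to the database node.

```python
ROWS_PER_BLOCK = 4  # hypothetical; real blocks hold however many rows fit


def make_blocks(rows, rows_per_block=ROWS_PER_BLOCK):
    """Pack rows into fixed-size lists standing in for database blocks."""
    return [rows[i:i + rows_per_block] for i in range(0, len(rows), rows_per_block)]


def traditional_scan(blocks, predicate, columns):
    """Storage ships every block; the database node filters and projects."""
    rows_shipped = sum(len(block) for block in blocks)
    result = [{c: row[c] for c in columns}
              for block in blocks for row in block if predicate(row)]
    return result, rows_shipped


def smart_scan(blocks, predicate, columns):
    """The cell filters and projects; only matching rows and columns are shipped."""
    result = [{c: row[c] for c in columns}
              for block in blocks for row in block if predicate(row)]
    return result, len(result)


rows = [{"id": i, "rating": i % 10 + 1, "comment": "x" * 20} for i in range(1000)]
blocks = make_blocks(rows)


def wanted(row):
    return row["rating"] == 3


full_result, full_shipped = traditional_scan(blocks, wanted, ["id"])
smart_result, smart_shipped = smart_scan(blocks, wanted, ["id"])
print(full_shipped, smart_shipped)
```

Both calls return the same 100 rows, but the traditional path ships all 1,000 rows to the node while the Smart Scan path ships only the 100 that match, and only the requested column.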

iDB

How can storage cells know what columns and rows to filter from the data? This is done by another component inherently built into the storage software. The communication between nodes and cells employs a specially developed protocol called iDB (short for Intelligent Database). This protocol not only requests the blocks (as happens in an I/O call in a traditional database) but can optionally send other relevant information. In those cases where Smart Scan is possible, iDB sends the names of the table and columns, the predicates, and other relevant information on the query. This information allows the cell to learn a lot more about the query instead of just the address of the blocks to retrieve. Similarly, the cells can send the row and column data instead of the traditional Oracle blocks using iDB.

Storage Indexes

How does Smart Scan achieve sending only those relevant rows and columns instead of blocks? A special data structure built on the pattern of the data within the storage cells enables this. For a specific segment, it stores the minimum, maximum, and whether nulls are present for all the columns of that segment in a specified region of the disk, usually 1MB in size. This data structure is called a storage index. When a cell gets a Smart Scan-enabled query from the database node via iDB, it checks which regions of the storage will not contain the data. For instance if the query predicate states where rating = 3, a region on the disk where the minimum and maximum values of the column RATING are 4 and 10 respectively will definitely not have any row that will match the predicate. Therefore the cell skips reading that portion of the disk. Checking the storage index, the cell excludes a lot of regions that will not contain that value and therefore saves a lot of I/O.


Although it has the word “index” in its name, a storage index is nothing like a normal index. Normal indexes are used to zero in on the locations where the rows are most likely to be found; storage indexes are used just for the opposite reason – where the rows are most likely not to be found. Also, unlike other segments, these are not stored on the disks; they reside in memory.
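The pruning idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the actual storage index format: each region is a small list of row dictionaries, and RegionSummary, build_storage_index, and regions_to_read are hypothetical names. The second region reproduces the example above, where RATING runs from 4 to 10 and the predicate is rating = 3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegionSummary:
    """What the toy index remembers about one column in one region."""
    minimum: Optional[int]
    maximum: Optional[int]
    has_nulls: bool


def build_storage_index(regions, column):
    """Summarize each region (a list of row dicts) for the given column."""
    index = []
    for rows in regions:
        values = [row[column] for row in rows if row[column] is not None]
        index.append(RegionSummary(
            minimum=min(values) if values else None,
            maximum=max(values) if values else None,
            has_nulls=len(values) < len(rows),
        ))
    return index


def regions_to_read(index, value):
    """Region numbers that could contain rows where column = value."""
    return [number for number, summary in enumerate(index)
            if summary.minimum is not None
            and summary.minimum <= value <= summary.maximum]


regions = [
    [{"rating": 1}, {"rating": 3}, {"rating": 5}],
    [{"rating": 4}, {"rating": 10}, {"rating": 7}],    # min 4, max 10: cannot hold 3
    [{"rating": 2}, {"rating": None}, {"rating": 3}],
]
index = build_storage_index(regions, "rating")
print(regions_to_read(index, 3))
```

Only regions 0 and 2 are read. Note that the summary can only rule regions out: region 0 qualifies because 3 lies between its minimum and maximum, whether or not a matching row is actually there.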

Smart Cache

The database buffer cache is where the data blocks come in before being shipped to the end user. If the data is found there, a trip to the storage is saved. However, if it is not found, which is often the case with large databases, I/O is inevitable. In the Exadata Database Machine, a secondary cache called Smart Cache can sit between the database buffer cache and the storage. The smart cache holds frequently accessed data and may satisfy the request from the database node from this cache instead of going to the disks – improving performance.

InfiniBand Network

This is the network inside the Exadata Database Machine – the nervous system of the machine through which the different components, such as database nodes and storage cells, communicate. InfiniBand is a hardware medium running a protocol called RDS (Reliable Datagram Sockets), which has high bandwidth and low latency – making the transfer of data extremely fast.

Disk Layout

The disk layout needs some additional explanation because that’s where most of the activities occur. As I mentioned previously, the disks are attached to the storage cells and presented as logical units (LUNs), on which physical volumes are built.

Each cell has 12 physical disks. In a high capacity configuration they are about 2TB and in a high performance configuration, they are about 600GB each. The disks are used for the database storage. Two of the 12 disks are also used for the home directory and other Linux operating system files. These two disks are divided into different partitions as shown in Figure 2 below.


Figure 2 Disk Layout

The physical disks are divided into multiple partitions. Each partition is then presented as a LUN to the cell. Some LUNs are used to create a filesystem for the OS. The others are presented as storage to the cell. These are called cell disks. The cell disks are further divided as grid disks, ostensibly referencing the grid infrastructure the disks are used inside. These grid disks are used to build ASM diskgroups, so they are used as ASM disks. An ASM diskgroup is made up of several ASM disks from multiple storage cells. If the diskgroup is built with normal or high redundancy (which is the usual case), the failure groups are placed in different cells. As a result, if one cell fails, the data is still available on other cells. Finally the database is built on these diskgroups.

These diskgroups are created with the following attributes by default:

Parameter | Description | Value
_._DIRVERSION | The minimum allowed version for directories. | 11.2.0.2.0
COMPATIBLE.ASM | The maximum ASM version whose features can use this diskgroup. For instance ASM Volume Management is available in 11.2 only. If this parameter is set to 11.1, then this diskgroup can’t be used for an ASM volume. | 11.2.0.2.0
IDP.TYPE | Intelligent Data Placement, a feature of ASM that allows placing data in such a way that more frequently accessed data is located close to the periphery of the disk where the access is faster. | dynamic
CELL.SMART_SCAN_CAPABLE | Can this diskgroup be enabled for Exadata Storage Server’s Smart Scan capability? | TRUE
COMPATIBLE.RDBMS | The minimum version of the database that can be created on this diskgroup. The further back you go in version number, the more message passing occurs between the RDBMS and ASM instances, causing performance issues. So, unless you plan to create a pre-11.2 database here (which you most likely do not), leave it as it is. | 11.2.0.2
AU Size | The size of the Allocation Unit on this disk. The AU is the least addressable unit on the diskgroup. |

On two of the 12 disks, the operating system, Oracle Exadata Storage Server software, and other OS related filesystems such as /home are located. They occupy about 29GB on a disk. For protection, this area is mirrored as RAID1 with another disk. The filesystems are mounted on that RAID1 volume.

However, this leaves two cell disks with less space than the other ten. If we create an ASM diskgroup on these 12 disks, it will have an imbalance on those two disks. Therefore, you (or whoever is doing the installation) should create another diskgroup with 29GB from each of the other 10 cell disks. This leaves same-sized ASM disks for the other diskgroups. This “compensatory” diskgroup is usually named DBFS_DG. Since this diskgroup is built on the inner tracks of the disk, its performance is low compared to the outer tracks. Therefore, instead of creating database files here, you may want to use it for some other purpose such as ETL files. ETL files need a filesystem. You can create a database filesystem on this diskgroup – hence the name DBFS_DG. Of course, you can use it for anything you want, even for database files, especially for less accessed objects.
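The balancing arithmetic can be checked with a short Python sketch. It uses only the figures given above (12 disks per cell, a system area of about 29GB on two of them); placing the system area on slots 0 and 1 is an assumption made for illustration, as are the function names.

```python
DISKS_PER_CELL = 12
SYSTEM_DISKS = 2       # assumed here to be slots 0 and 1
SYSTEM_AREA_GB = 29


def remaining_after_system_area(disk_size_gb):
    """Space left on each disk once the OS area is taken from the system disks."""
    return [disk_size_gb - SYSTEM_AREA_GB if slot < SYSTEM_DISKS else disk_size_gb
            for slot in range(DISKS_PER_CELL)]


def remaining_after_dbfs_dg(disk_size_gb):
    """Carve a 29GB DBFS_DG piece from each of the ten disks without an OS area."""
    before = remaining_after_system_area(disk_size_gb)
    return [size if slot < SYSTEM_DISKS else size - SYSTEM_AREA_GB
            for slot, size in enumerate(before)]


before = remaining_after_system_area(600)
after = remaining_after_dbfs_dg(600)
dbfs_dg_per_cell_gb = (DISKS_PER_CELL - SYSTEM_DISKS) * SYSTEM_AREA_GB

print(before)
print(after)
print(dbfs_dg_per_cell_gb)
```

With 600GB disks, the first list shows two disks at 571GB and ten at 600GB; after the carve-out all twelve have 571GB left for the other diskgroups, and DBFS_DG gets 290GB of raw space per cell.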

Now that you know the components, look at the next section to get a detailed description of these components.

Detailed Specifications

As of this writing, the current (third) generation of Exadata Database Machine comes in two models (X2-2 and X2-8); various sizes (full rack, half rack, and quarter rack); and three classes of storage (high performance, high capacity SAS, and high capacity SATA). For detailed specifications, please see the configuration specs on the Oracle website: X2-2, X2-8, X2-2 Storage Server.

Summary

In this installment you learned what Exadata is, what different hardware and software components it is made of, what enables its fast performance, and what you should be managing. A summary is provided below. In the next installment, you will learn about command categories and initial commands.

Term – Description

Cell Offloading – The ability of the storage cells to execute some part of the processing of a query, and in the process filter the unnecessary data at the storage level.

Smart Scan – The feature that allows the cells to search for data only in relevant cells; not all.

iDB – Intelligent Database protocol that allows database nodes to pass along information on the query, e.g. the predicate. It enables Smart Scan.

Node – Also known as Database Node or Compute Node. This is where the database, ASM and clusterware are run. The clients connect to this. Runs Oracle Enterprise Linux.

Cell – Also known as Storage Cells, which run the Exadata Storage Server software. The disks for the database are attached to this. Runs Oracle Enterprise Linux.

Smart Flash – Flash memory based storage to be used as a disk, or as a secondary cache for frequently accessed segments to reduce disk access.


Arup Nanda has been an Oracle DBA for more than 14 years, handling all aspects of database administration, from performance tuning to security and disaster recovery. He is an Oracle ACE Director and was Oracle Magazine's DBA of the Year in 2003.

Oracle Exadata Commands Reference

Part 2: Command Categories, Configuration, and Basic Commands

by Arup Nanda

Learn different categories of commands and what to do after your new Exadata Database Machine is powered on.

(The purpose of this guide is educational; it is not intended to replace official Oracle-provided manuals or other documentation. The information in this guide is not validated by Oracle, is not supported by Oracle, and should only be used at your own risk.)

In Part 1, you learned about the composition of the Oracle Exadata Database Machine and its various components. Figure 1 shows the different components again and what types of commands are used in each.


Figure 1 Command categories

Linux commands - Let’s start with the lowest-level component – the physical disk. The physical disk, as you learned from the previous installment, is the actual disk drive. It has to be partitioned to be used for ASM and regular filesystems. Normal disk management commands come here, e.g. fdisk. The storage cells are Linux servers, so all the regular Unix server administration commands – shutdown, ps, etc. – are relevant here. (For a refresher on Linux commands, you can check out my five-part series on advanced Linux commands.)

CellCLI - Let’s move on to the next layer in the software stack: the Exadata Storage Server. To manage this, Oracle provides a command line tool: CellCLI (Cell Command Line Interpreter). All the cell-related commands are entered through CellCLI.

DCLI - The scope of the CellCLI command is the cell where it is run, not in other cells. Sometimes you may want to execute a command across multiple cells from one command prompt, e.g. shutting down multiple nodes. There is another command line tool for that: DCLI.

SQL – Once the cell disks are made available to the database nodes, the rest of the work is similar to what happens in a typical Oracle RAC database, in the language you use every day: SQL. SQL*Plus is an interface many DBAs use. You can also use other interfaces such as Oracle SQL Developer. If you have Grid Control, there are lots of commands you don’t even need to remember; they will be GUI based.


ASMCMD – ASMCMD is the command line interface for managing ASM resources like diskgroups, backups, etc.

SRVCTL – SRVCTL is a command-line interface to manage Oracle Database 11.2 RAC clusters. At the database level, most of the cluster-related commands, e.g. starting/stopping cluster resources, checking for status, etc., can be run through this interface.

CRSCTL – CRSCTL is another tool to manage clusters. As of 11.2, the need to use this tool has dwindled to near zero. But there is at least one command in this category.

These are the basic categories of the commands. Of these only CellCLI and DCLI are Exadata specific. The rest, especially SQL, should be very familiar to DBAs. 

Now that you know how narrow the scope of the commands is, do you feel a bit more relaxed? In the next sections we will see how these commands are used. (Note: Since CellCLI and DCLI are Exadata-specific commands, most DBAs making the transition to DMA are not expected to know about them. The next installment of the series – Part 3 – focuses on these two command categories exclusively.)

Configuration

Let’s start with the most exciting part: Your shiny new Exadata Database Machine is here, uncrated, mounted on the floorboards and connected to power. Now what?

Fortunately, the machine comes pre-imaged with all the necessary OS, software and drivers. There is no reason to tinker with the software installation. In fact, it’s not only unnecessary but dangerous as well, since it may void the warranty. You should not install any software on storage cells at all, and only the following on the database servers themselves:

Grid Control Agent (required for management through Grid Control, explained in Part 4)

RMAN Media Management Library (to back up to tape)

Security Agent (if needed)

You are itching to push that button, aren’t you? But wait; before you start the configuration you have to have the following information handy:

Network – you should decide what names you will use for the servers, decide on IP addresses,  have them in DNS, etc.

SMTP and SNMP information - for sending mails, alerts, etc.

Storage layout to address your specific requirements – for instance do you want Normal or High Redundancy, how many diskgroups do you want, what do you want to name them, etc.?

Once all these are done, here are the rough steps:

1. Storage configuration

2. OS configuration

3. Creation of userids in Linux or Oracle Solaris

4. ASM configuration

5. Clusterware installation

6. Database creation

Let’s examine the steps. Please note with several models, capacity classes, and types of hardware, it is not possible to provide details about all the possible combinations. Your specific environment may be unique as well. 

The following section shows a sample configuration and should be followed as an illustration only. For simplicity, the OS covered here is Oracle Linux.

Configuration Worksheet


Oracle provides a detailed configuration worksheet that allows you to enter the specific details of your implementation and decide on the exact configuration. This worksheet is found on the Exadata storage server in the following directory:

opt/oracle/cell/doc/doc

The exact file you want to open is e16099.pdf, which has all the worksheets to guide you through the configuration. Here is an excerpt from the worksheet:

Figure 2 Worksheet excerpt

The configuration worksheet creates the following files in the directory /opt/oracle.SupportTools/onecommand. Here is a listing of that directory:

# ls
all_group           cell_group     config.dat    patches
all_ib_group        cell_ib_group  dbs_group     priv_ib_group
all_nodelist_group  checkip.sh     dbs_ib_group  tmp

These files are very important. Here is a brief description of each file:

File Name – Description

all_group – List of database nodes and storage cells in this Exadata Database Machine. Here is an excerpt:

proldb01
proldb02
proldb03
proldb04

These are the database server nodes.

all_ib_group – All host names of the private interconnects, both of cell servers and database nodes. Here is an excerpt from this file:

proldb01-priv
proldb02-priv
proldb03-priv
proldb04-priv
proldb05-priv

all_nodelist_group – All host names – public, hosts, private interconnects – of both storage and database nodes. Here is an excerpt from this file:

proldb07
proldb08
prolcel01
prolcel02
prolcel03

cell_group – Host names of all cell servers. Here is an excerpt from this file:

prolcel01
prolcel02
prolcel03
prolcel04
prolcel05

cell_ib_group – Host names of private interconnects of all cell servers. Here is an excerpt from this file:

prolcel01-priv
prolcel02-priv
prolcel03-priv
prolcel04-priv
prolcel05-priv

config.dat – The data file that is created from the configuration worksheet and is used to create the various scripts. Here is an excerpt from this file:

customername=AcmeBank
dbmprefix=prol
cnbase=db
cellbase=cel
machinemodel=X2-2 Full rack
dbnodecount=8
cellnodecount=14

dbs_group – Host names of the database nodes, similar to the cell servers. Here is an excerpt from the file:

proldb01
proldb02
proldb03
proldb04

dbs_ib_group – Host names of private interconnects of the database nodes, similar to the cell servers. Here is an excerpt from the file:

proldb01-priv
proldb02-priv
proldb03-priv
proldb04-priv

priv_ib_group – All private interconnect host names and their corresponding IP addresses are listed in this file. This is used to populate the /etc/hosts file. Here is an excerpt from the file:

### Compute Node Private Interface details
172.32.128.1    proldb01-priv.test.prol proldb01-priv
172.32.128.2    proldb02-priv.test.prol proldb02-priv
172.32.128.3    proldb03-priv.test.prol proldb03-priv
172.32.128.4    proldb04-priv.test.prol proldb04-priv

checkip.sh – This is a shell script to validate the accuracy of the network configuration. This is one of the most important files. As you will see, the checkip script is called at multiple places with different parameters to perform validation at each stage.
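Because the group files are plain lists of host names, they are easy to read from your own scripts. The Python sketch below parses two of the excerpts shown above; it assumes one host name per line in the group files and the "IP, fully qualified name, alias" layout shown for priv_ib_group. The sample text is copied from the excerpts, and read_group and read_hosts_entries are hypothetical helper names, not part of any Oracle tool.

```python
def read_group(text):
    """Return the host names listed one per line, skipping blanks and # comments."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def read_hosts_entries(text):
    """Parse 'IP name [alias ...]' lines into (ip, [names]) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        entries.append((ip, names))
    return entries


dbs_group_sample = """\
proldb01
proldb02
proldb03
proldb04
"""

priv_ib_group_sample = """\
### Compute Node Private Interface details
172.32.128.1    proldb01-priv.test.prol proldb01-priv
172.32.128.2    proldb02-priv.test.prol proldb02-priv
"""

db_nodes = read_group(dbs_group_sample)
private_entries = read_hosts_entries(priv_ib_group_sample)
print(db_nodes)
print(private_entries)
```

In practice you would pass the contents of a file such as /opt/oracle.SupportTools/onecommand/dbs_group instead of the inline samples.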

Hardware Profile

The next thing to do is to check the hardware profile. Oracle provides a tool for that as well. This is the command you should use:

# /opt/oracle.SupportTools/CheckHWnFWProfile

The output should be:

[SUCCESS] The hardware and firmware profile matches one of the supported profiles

If you see something different here, the message should be self-explanatory. The right thing to do at this point is to call up Exadata installation support since some hardware/software combination is not as expected.

Physical Disks

Next, you should check the disks to make sure they are up and online. Online does not mean they are available to ASM; it simply means the disks are visible to the server. To check that the disks are visible and online, use this command:

# /opt/MegaRAID/MegaCli/MegaCli64 Pdlist -aAll |grep "Slot \|Firmware"

Here is truncated output:

Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
… Output truncated …
Slot Number: 11
Firmware state: Online, Spun Up


If a disk is not online, you may want to replace it or at least understand the reason.
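With 12 disks per cell and 14 cells in a full rack, scanning this output by eye gets tedious. The following Python sketch flags any slot whose firmware state does not start with "Online". It assumes the "Slot Number:" / "Firmware state:" line pairs shown above; the function name is hypothetical, and the "Failed" state in the sample is made up purely to exercise the check.

```python
def disks_not_online(megacli_output):
    """Return {slot: state} for every slot whose firmware state is not Online."""
    problems = {}
    slot = None
    for line in megacli_output.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slot Number":
            slot = int(value)
        elif key == "Firmware state" and slot is not None:
            if not value.startswith("Online"):
                problems[slot] = value
    return problems


sample_output = """\
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Failed
Slot Number: 2
Firmware state: Online, Spun Up
"""

print(disks_not_online(sample_output))
```

An empty dictionary means every listed disk reported an Online state.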

Flash Disks

After checking physical disks you should check flash disks. The Linux command for that is lsscsi, shown below.

# lsscsi |grep -i marvel
[1:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdm
[1:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdn
[1:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdo
[1:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdp
[2:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdq
[2:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdr
[2:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sds
[2:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdt
[3:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdu
[3:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdv
[3:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdw
[3:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdx
[4:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdy
[4:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdz
[4:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdaa
[4:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdab
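If you want to confirm that every flash controller reports the same number of devices, a small script can group the lsscsi lines by the first number of the SCSI address. The Python sketch below assumes the line layout shown in the listing above (address in brackets first, device path last); the function name is hypothetical and the sample is a shortened copy of that listing.

```python
from collections import defaultdict


def flash_devices_by_host(lsscsi_output, vendor="MARVELL"):
    """Group device paths by SCSI host number for lines mentioning the vendor."""
    groups = defaultdict(list)
    for line in lsscsi_output.splitlines():
        if vendor.lower() not in line.lower():
            continue
        fields = line.split()
        address = fields[0]                               # e.g. [1:0:0:0]
        host = int(address.strip("[]").split(":")[0])
        groups[host].append(fields[-1])                   # e.g. /dev/sdm
    return dict(groups)


sample_lsscsi = """\
[1:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdm
[1:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdn
[1:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdo
[1:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdp
[2:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdq
"""

groups = flash_devices_by_host(sample_lsscsi)
for host, devices in sorted(groups.items()):
    print(host, len(devices), devices)
```

Run against the full listing, each of the four hosts should show four devices, for 16 in total.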

By the way, you can also check the flash disks from the CellCLI tool. The CellCLI tool is explained in detail in the next installment in this series.

# cellcli
CellCLI: Release 11.2.2.2.0 - Production on Sun Mar 13 12:57:24 EDT 2011
Copyright (c) 2007, 2009, Oracle.  All rights reserved.
Cell Efficiency Ratio: 627M
CellCLI> list lun where disktype=flashdisk
         1_0     1_0     normal
         1_1     1_1     normal
         1_2     1_2     normal
         1_3     1_3     normal
         2_0     2_0     normal
         2_1     2_1     normal
         2_2     2_2     normal
         2_3     2_3     normal
         4_0     4_0     normal
         4_1     4_1     normal
         4_2     4_2     normal
         4_3     4_3     normal
         5_0     5_0     normal
         5_1     5_1     normal
         5_2     5_2     normal
         5_3     5_3     normal

To make sure the numbering of the flashdisks is correct, use the following command in CellCLI. Note that there is a hyphen (“-“) after the first line, since the command is too long to fit in one line and the “-“ is the continuation character.

CellCLI> list physicaldisk attributes name, id, slotnumber -
> where disktype="flashdisk" and status != "not present"
         [1:0:0:0]       5080020000f21a2FMOD0    "PCI Slot: 4; FDOM: 0"
         [1:0:1:0]       5080020000f21a2FMOD1    "PCI Slot: 4; FDOM: 1"
         [1:0:2:0]       5080020000f21a2FMOD2    "PCI Slot: 4; FDOM: 2"
         [1:0:3:0]       5080020000f21a2FMOD3    "PCI Slot: 4; FDOM: 3"
         [2:0:0:0]       5080020000f131aFMOD0    "PCI Slot: 1; FDOM: 0"
         [2:0:1:0]       5080020000f131aFMOD1    "PCI Slot: 1; FDOM: 1"
         [2:0:2:0]       5080020000f131aFMOD2    "PCI Slot: 1; FDOM: 2"
         [2:0:3:0]       5080020000f131aFMOD3    "PCI Slot: 1; FDOM: 3"
         [3:0:0:0]       5080020000f3ec2FMOD0    "PCI Slot: 5; FDOM: 0"
         [3:0:1:0]       5080020000f3ec2FMOD1    "PCI Slot: 5; FDOM: 1"
         [3:0:2:0]       5080020000f3ec2FMOD2    "PCI Slot: 5; FDOM: 2"
         [3:0:3:0]       5080020000f3ec2FMOD3    "PCI Slot: 5; FDOM: 3"
         [4:0:0:0]       5080020000f3e16FMOD0    "PCI Slot: 2; FDOM: 0"
         [4:0:1:0]       5080020000f3e16FMOD1    "PCI Slot: 2; FDOM: 1"
         [4:0:2:0]       5080020000f3e16FMOD2    "PCI Slot: 2; FDOM: 2"
         [4:0:3:0]       5080020000f3e16FMOD3    "PCI Slot: 2; FDOM: 3"

Auto-configuration

While it is possible to configure Exadata Database Machine manually, you don’t need to. In fact, you may not want to. Oracle provides three shell scripts for automatic configuration in the directory /opt/oracle.SupportTools/onecommand (these steps may change in later versions):

check_ip.sh – for checking the configuration at various stages

applyconfig.sh – to change the configuration

deploy112.sh – for final deployment

First, you should check the configuration for validity.  To do that execute:

# check_ip.sh -m pre_applyconfig
Exadata Database Machine Network Verification version 1.9
Network verification mode pre_applyconfig starting ...
Saving output file from previous run as dbm.out_17739
Using name server xx.xxx.59.21 found in dbm.dat for all DNS lookups
Processing section DOMAIN  : SUCCESS
Processing section NAME    : SUCCESS
Processing section NTP     : SUCCESS
Processing section GATEWAY : SUCCESS
Processing section SCAN    : ERROR - see dbm.out for details
Processing section COMPUTE : ERROR - see dbm.out for details
Processing section CELL    : ERROR - see dbm.out for details
Processing section ILOM    : ERROR - see dbm.out for details
Processing section SWITCH  : ERROR - see dbm.out for details
Processing section VIP     : ERROR - see dbm.out for details
Processing section SMTP    : SMTP "Email Server Settings" standardrelay.acmehotels.com 25:0
SUCCESS
One or more checks report ERROR. Review dbm.out for details
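When the summary is long, it can help to pull out only the failing sections. A minimal Python sketch, assuming the summary output has been captured as text; the function name and the regular expression are illustrative, based solely on the line format shown above.

```python
import re

# Matches summary lines such as: Processing section SCAN    : ERROR - see dbm.out for details
SECTION_RE = re.compile(r"^Processing section (\w+)\s*:\s*(.*)$")

def failed_sections(report):
    """Return the section names whose summary line starts with ERROR."""
    failed = []
    for line in report.splitlines():
        match = SECTION_RE.match(line.strip())
        if match and match.group(2).startswith("ERROR"):
            failed.append(match.group(1))
    return failed

# Lines copied from the summary above.
sample = """\
Processing section DOMAIN  : SUCCESS
Processing section GATEWAY : SUCCESS
Processing section SCAN    : ERROR - see dbm.out for details
Processing section COMPUTE : ERROR - see dbm.out for details
"""

print(failed_sections(sample))
```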

If you check the file dbm.out, you can see the exact error messages.

Running in mode pre_applyconfig
Using name server xx.xxx.59.21 found in dbm.dat for all DNS lookups
Processing section DOMAIN
test.prol
Processing section NAME
GOOD : xx.xxx.59.21 responds to resolve request for proldb01.test.prol
GOOD : xx.xxx.59.22 responds to resolve request for proldb01.test.prol
Processing section NTP
GOOD : xx.xxx.192.1 responds to time server query (/usr/sbin/ntpdate -q)
Processing section GATEWAY
GOOD : xx.xxx.192.1 pings successfully
GOOD : xx.xxx.18.1 pings successfully
Processing section SCAN
GOOD : prol-scan.test.prol resolves to 3 IP addresses
GOOD : prol-scan.test.prol forward resolves to xx.xxx.18.32
GOOD : xx.xxx.18.32 reverse resolves to prol-scan.test.prol.
ERROR : xx.xxx.18.32 pings
GOOD : prol-scan.test.prol forward resolves to xx.xxx.18.33
GOOD : xx.xxx.18.33 reverse resolves to prol-scan.test.prol.
ERROR : xx.xxx.18.33 pings
GOOD : prol-scan.test.prol forward resolves to xx.xxx.18.34
GOOD : xx.xxx.18.34 reverse resolves to prol-scan.test.prol.
ERROR : xx.xxx.18.34 pings
Processing section COMPUTE
GOOD : proldb01.test.prol forward resolves to xx.xxx.192.16
GOOD : xx.xxx.192.16 reverse resolves to proldb01.test.prol.
ERROR : xx.xxx.192.16 pings
GOOD : proldb02.test.prol forward resolves to xx.xxx.192.17
GOOD : xx.xxx.192.17 reverse resolves to proldb02.test.prol.
ERROR : xx.xxx.192.17 pings
GOOD : proldb03.test.prol forward resolves to xx.xxx.192.18
GOOD : xx.xxx.192.18 reverse resolves to proldb03.test.prol.
ERROR : xx.xxx.192.18 pings
… output truncated …
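The detail file interleaves GOOD and ERROR lines under each section heading. A minimal Python sketch, assuming the layout of dbm.out is as shown in the excerpt above, that collects only the ERROR messages per section. The function name is hypothetical.

```python
def errors_by_section(dbm_out):
    """Map each 'Processing section X' heading to the ERROR messages under it."""
    errors = {}
    section = None
    for raw in dbm_out.splitlines():
        line = raw.strip()
        if line.startswith("Processing section "):
            section = line[len("Processing section "):].strip()
        elif line.startswith("ERROR") and section is not None:
            # Drop the 'ERROR :' prefix and keep the message text.
            message = line.partition(":")[2].strip()
            errors.setdefault(section, []).append(message)
    return errors

# Lines copied from the excerpt above.
sample = """\
Processing section GATEWAY
GOOD : xx.xxx.192.1 pings successfully
Processing section SCAN
GOOD : prol-scan.test.prol resolves to 3 IP addresses
ERROR : xx.xxx.18.32 pings
ERROR : xx.xxx.18.33 pings
"""

for name, messages in errors_by_section(sample).items():
    print(name, messages)
```

Sections with no errors do not appear in the result, so an empty dictionary means the file contained no ERROR lines.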

It will report all issues that must be addressed. After addressing all issues, execute the actual configuration:

# applyconfig.sh

After it completes, connect the Exadata Database Machine to your network and check its validity:

# check_ip.sh -m post_applyconfig

It reports its output in the same manner as with the pre_applyconfig parameter, including any issues found. After fixing the issues, run the deployment script. That script executes several steps internally, numbered 0 through 29 in the list below. The most prudent thing to do is to list all the steps first so that you are familiar with them. The -l option (that’s the letter “l”, not the numeral “1”) displays the list of steps.

# deploy112.sh -l

To run all the steps, issue:

# deploy112.sh -i

If you prefer, you can run the steps one at a time or in groups. To run steps 1 through 3, issue:

# deploy112.sh -i -r 1-3

Or, to run only step 1:

# deploy112.sh -i -s 1
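If you drive the deployment from a wrapper, the three invocation forms above can be generated from step numbers. A minimal Python sketch that only builds the argument list and does not run anything; the function name and the default script path are assumptions, while the -i, -s, and -r options are the ones shown above.

```python
def deploy_args(first=None, last=None, script="./deploy112.sh"):
    """Build the argument list to run all steps, one step, or a range of steps."""
    if first is None:
        return [script, "-i"]                       # run every step
    if last is None or last == first:
        return [script, "-i", "-s", str(first)]     # run a single step
    if last < first:
        raise ValueError("last step must not be lower than first step")
    return [script, "-i", "-r", f"{first}-{last}"]  # run a range of steps

print(" ".join(deploy_args()))
print(" ".join(deploy_args(1)))
print(" ".join(deploy_args(1, 3)))
```

Returning a list rather than a single string keeps each argument separate, which is the form Python's subprocess functions expect if you later choose to execute the command.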

The steps are listed here. (Please note: the steps can change without notice. The most up-to-date list will always be found in the release notes that come with an Exadata box.)

Step  Description
0     Validate this server setup.
1     Set up SSH for the root user.
2     Validate all nodes.
3     Unzip files.
4     Update the /etc/hosts file.
5     Create the cellip.ora and cellinit.ora files.
6     Validate the hardware.
7     Validate the InfiniBand network.
8     Validate the cells.
9     Check RDS using the ping command.
10    Run the CALIBRATE command.
11    Validate the time and date.
12    Update the configuration.
13    Create the user accounts for celladmin and cellmonitor.
14    Set up SSH for the user accounts.
15    Create the Oracle home directories.
16    Create the grid disks.
17    Install the grid software.
18    Run the grid root scripts.
19    Install the Oracle Database software.
20    Create the listener.
21    Run Oracle ASM configuration assistant to configure Oracle ASM.
22    Unlock the Oracle Grid Infrastructure home directory.
23    Relink Reliable Data Socket (RDS) protocol.
24    Lock Oracle Grid Infrastructure.
25    Set up e-mail alerts for Exadata Cells.
26    Run Oracle Database Configuration Assistant.
27    Set up Oracle Enterprise Manager Grid Control.
28    Apply any security fixes.
29    Secure Oracle Exadata Database Machine.

Here is the output of the script (heavily truncated in places to conserve space):

# ./deploy112.sh -i

Script started, file is /opt/oracle.SupportTools/onecommand/tmp/STEP-0-proldb01-20110331154414.log

=========== 0 ValidateThisNodeSetup Begin ===============

Validating first boot...

This step will validate DNS, NTS, params.sh, dbmachine.params, and all the

files generated by the DB Machine Configurator

In Check and Fix Hosts...

INFO: This nslookup could take upto ten seconds to resolve if the host isn't in DNS, please wait..

INFO: Running /usr/bin/nslookup prol-scan...

INFO: Running /usr/bin/nslookup proldb02...

SUCCESS: SCAN and VIP found in DNS...

Looking up nodes in dbmachine.params and dbs_group...

SUCCESS: proldb01 has ip address of xx.xxx.192.16..A_OK

SUCCESS: proldb02 has ip address of xx.xxx.192.17..A_OK

… output truncated …

SUCCESS: proldb08 has ip address of xx.xxx.192.23..A_OK

SUCCESS: prol01-vip has ip address of xx.xxx.18.24..A_OK

SUCCESS: Found IP Address xx.xxx.18.24 for prol01-vip using ping...

SUCCESS: Based on bondeth0:xx.xxx.18.16 and NetMask:255.255.255.0 we picked bondeth0 as the appropriate VIP interface

SUCCESS: prol02-vip has ip address of xx.xxx.18.25..A_OK

SUCCESS: Found IP Address xx.xxx.18.24 for prol01-vip using ping...

SUCCESS: Based on bondeth0:xx.xxx.18.16 and NetMask:255.255.255.0 we picked bondeth0 as the appropriate VIP interface

… output truncated …

SUCCESS: prol08-vip has ip address of xx.xxx.18.31..A_OK

SUCCESS: Found IP Address xx.xxx.18.24 for prol01-vip using ping...

SUCCESS: Based on bondeth0:xx.xxx.18.16 and NetMask:255.255.255.0 we picked bondeth0 as the appropriate VIP interface

Checking blocksizes...

SUCCESS: DB blocksize is 16384 checks out

checking patches

checking patches and version = 11202

SUCCESS: Located patch# 10252487 in /opt/oracle.SupportTools/onecommand/patches...

INFO: Checking zip files

INFO: Validating zip file /opt/oracle.SupportTools/onecommand/p10098816_112020_Linux-x86-64_1of7.zip...

Archive:  /opt/oracle.SupportTools/onecommand/p10098816_112020_Linux-x86-64_1of7.zip

  Length     Date   Time    Name

 --------    ----   ----    ----

        0  11-16-10 03:10   database/

        0  11-16-10 03:03   database/install/

      182  11-16-10 03:03   database/install/detachHome.sh

… output truncated …

    41092  11-16-10 03:03   database/doc/install.112/e17212/concepts.htm

     1892  11-16-10 03:03   database/doc/install.112/e17212/contents.js

    44576  11-16-10 03:03   database/doc/install.112/e17212/crsunix.htm

ERROR: /usr/bin/unzip -l /opt/oracle.SupportTools/onecommand/p10098816_112020_Linux-x86-64_1of7.zip did not complete successfully: Return Status: 80 Step# 1

Exiting...

Time spent in step 1  = 1 seconds

INFO: Going to run /opt/oracle.cellos/ipconf /opt/oracle.SupportTools/onecommand/preconf-11-2-1-2-2.csv -verify -ignoremismatch -verbose to validate first boot...

INFO: Running /opt/oracle.cellos/ipconf -verify -ignoremismatch -verbose on this node...

Verifying of configuration for /opt/oracle.cellos/cell.conf

Config file exists                                                : PASSED

Load configuration                                                : PASSED

Config version defined                                            : PASSED

Config version 11.2.2.1.1 has valid value                         : PASSED

Nameserver xx.xxx.59.21 has valid IP address syntax               : PASSED

Nameserver xx.xxx.59.22 has valid IP address syntax               : PASSED

Canonical hostname defined                                        : PASSED

Canonical hostname has valid syntax                               : PASSED

Node type defined                                                 : PASSED

Node type db is valid                                             : PASSED

This node type is db                                              : PASSED

Timezone defined                                                  : PASSED

Timezone found in /usr/share/zoneinfo                             : PASSED

NTP server xx.xxx.192.1 has valid syntax                          : PASSED

NTP drift file defined                                            : PASSED

Network eth0 interface defined                                    : PASSED

IP address defined for eth0                                       : PASSED

IP address has valid syntax for eth0                              : PASSED

Netmask defined for eth0                                          : PASSED

Netmask has valid syntax for eth0                                 : PASSED

Gateway has valid syntax for eth0                                 : PASSED

Gateway is inside network for eth0                                : PASSED

Network type defined for eth0                                     : PASSED

Network type has proper value for eth0                            : PASSED

Hostname defined for eth0                                         : PASSED

Hostname for eth0 has valid syntax                                : PASSED

Network bondeth0 interface defined                                : PASSED

IP address defined for bondeth0                                   : PASSED

IP address has valid syntax for bondeth0                          : PASSED

Netmask defined for bondeth0                                      : PASSED

Netmask has valid syntax for bondeth0                             : PASSED

Gateway has valid syntax for bondeth0                             : PASSED

Gateway is inside network for bondeth0                            : PASSED

Network type defined for bondeth0                                 : PASSED

Network type has proper value for bondeth0                        : PASSED

Hostname defined for bondeth0                                     : PASSED

Hostname for bondeth0 has valid syntax                            : PASSED

Slave interfaces for bondeth0 defined                             : PASSED

Two slave interfaces for bondeth0 defined                         : PASSED

Master interface ib0 defined                                      : PASSED

Master interface ib1 defined                                      : PASSED

Network bondib0 interface defined                                 : PASSED

IP address defined for bondib0                                    : PASSED

IP address has valid syntax for bondib0                           : PASSED

Netmask defined for bondib0                                       : PASSED

Netmask has valid syntax for bondib0                              : PASSED

Network type defined for bondib0                                  : PASSED

Network type has proper value for bondib0                         : PASSED

Hostname defined for bondib0                                      : PASSED

Hostname for bondib0 has valid syntax                             : PASSED

Slave interfaces for bondib0 defined                              : PASSED

Two slave interfaces for bondib0 defined                          : PASSED

At least 1 configured Eth or bond over Eth interface(s) defined   : PASSED

2 configured Infiniband interfaces defined                        : PASSED

1 configured bond over ib interface(s) defined                    : PASSED

ILOM hostname defined                                             : PASSED

ILOM hostname has valid syntax                                    : PASSED

ILOM short hostname defined                                       : PASSED

ILOM DNS search defined                                           : PASSED

ILOM full hostname matches short hostname and DNS search          : PASSED

ILOM IP address defined                                           : PASSED

ILOM IP address has valid syntax                                  : PASSED

ILOM Netmask defined                                              : PASSED

ILOM Netmask has valid syntax                                     : PASSED

ILOM Gateway has valid syntax                                     : PASSED

ILOM Gateway is inside network                                    : PASSED

ILOM nameserver has valid IP address syntax                       : PASSED

ILOM use NTP servers defined                                      : PASSED

ILOM use NTP has valid syntax                                     : PASSED

ILOM first NTP server has non-empty value                         : PASSED

ILOM first NTP server has valid syntax                            : PASSED

ILOM timezone defined                                             : PASSED

Done. Config OK

INFO: Printing group files....

######################################################

This is the list of Database nodes...

proldb01

… output truncated …

proldb08

This is the list of Cell nodes...

prolcel01

… output truncated …

prolcel14

This is the list of Database Private node names...

proldb01-priv

… output truncated …

proldb08-priv

This is the list of Cell Private node names...

prolcel01-priv

… output truncated …

prolcel14-priv

This is the list all node names...

proldb01

… output truncated …

prolcel14

This is the list all private node names...

proldb01-priv

… output truncated …

prolcel14-priv

This is the template /etc/hosts file for private nodes...

### Compute Node Private Interface details

172.32.128.1    proldb01-priv.test.prol proldb01-priv

… output truncated …

172.32.128.8    proldb08-priv.test.prol proldb08-priv

### CELL Node Private Interface details

172.32.128.9    prolcel01-priv.test.prol        prolcel01-priv

… output truncated …

172.32.128.22   prolcel14-priv.test.prol        prolcel14-priv

### Switch details

# The following 5 IP addresses are for reference only. You may

# not be able to reach these IP addresses from this machine

# xx.xxx.192.60 prolsw-kvm.test.prol    prolsw-kvm

# xx.xxx.192.61 prolsw-ip.test.prol     prolsw-ip

# xx.xxx.192.62 prolsw-ib1.test.prol    prolsw-ib1

# xx.xxx.192.63 prolsw-ib2.test.prol    prolsw-ib2

# xx.xxx.192.64 prolsw-ib3.test.prol    prolsw-ib3

Creating work directories and validating  required files

ERROR: Please review and fix all ERROR's, we appear to have 1 errors...

Exiting...

Time spent in step 0 ValidateThisNodeSetup = 1 seconds

Script done, file is /opt/oracle.SupportTools/onecommand/tmp/STEP-0-proldb01-20110331154414.log
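A log of this length is easier to review if the ERROR lines and step timings are extracted first. A minimal Python sketch, assuming the log text uses the "ERROR:" and "Time spent in step" line formats seen in the output above; the function name and regular expression are illustrative.

```python
import re

# Matches lines such as: Time spent in step 0 ValidateThisNodeSetup = 1 seconds
TIME_RE = re.compile(r"^Time spent in step (\d+)\b.*=\s*(\d+) seconds")

def summarize_log(log_text):
    """Collect ERROR messages and per-step timings from a deployment log."""
    errors, timings = [], {}
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("ERROR:"):
            errors.append(line[len("ERROR:"):].strip())
            continue
        match = TIME_RE.match(line)
        if match:
            timings[int(match.group(1))] = int(match.group(2))
    return errors, timings

# Lines copied from the output above.
sample = """\
SUCCESS: DB blocksize is 16384 checks out
ERROR: Please review and fix all ERROR's, we appear to have 1 errors...
Exiting...
Time spent in step 0 ValidateThisNodeSetup = 1 seconds
"""

errors, timings = summarize_log(sample)
print(errors)
print(timings)
```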

Check post-deployment configuration for IP addresses.

# ./checkip.sh -m post_deploy112

Exadata Database Machine Network Verification version 1.9

 

Network verification mode post_deploy112 starting ...

Saving output file from previous run as dbm.out_772

Using name server xx.xxx.59.21 found in dbm.dat for all DNS lookups

Processing section DOMAIN  : SUCCESS

Processing section NAME    : SUCCESS

Processing section NTP     : SUCCESS

Processing section GATEWAY : SUCCESS

Processing section SCAN    : SUCCESS

Processing section COMPUTE : SUCCESS

Processing section CELL    : SUCCESS

Processing section ILOM    : SUCCESS

Processing section SWITCH  : SUCCESS

Processing section VIP     : SUCCESS

Processing section SMTP    : SMTP "Email Server Settings" standardrelay.acmehotels.com 25:0

SUCCESS

If every section comes back as SUCCESS, your installation and configuration were successful.

Basic Commands

Power

Let’s start with the very first commands you will need: powering servers on and off. The command for that is IPMITOOL. To power on a cell or database server, issue this from another server:

#  ipmitool -H prolcel01-ilom -U root chassis power on

IPMI, short for Intelligent Platform Management Interface, is an interface standard that allows one server to be managed remotely from another through a standardized interface. The servers in the Exadata Database Machine follow that standard. IPMITOOL is not an Exadata command but a general Linux one. To see all the available options, execute:

# ipmitool -h
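If you issue power commands from a script, building the argument list in one place avoids typos in the host name or action. A minimal Python sketch that constructs the command shown earlier without running it. Only "on" appears in this article; "off" and "status" are other standard ipmitool chassis power actions included here as an assumption about what you may need. The function name is hypothetical, and authentication prompts are not handled.

```python
ALLOWED_ACTIONS = {"on", "off", "status"}

def ipmi_power_args(ilom_host, action, user="root"):
    """Build the ipmitool argument list for a chassis power action on one ILOM."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    return ["ipmitool", "-H", ilom_host, "-U", user, "chassis", "power", action]

print(" ".join(ipmi_power_args("prolcel01-ilom", "on")))
```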

To stop a server, use the shutdown command. To stop immediately and keep it down, i.e. not reboot, execute:

# shutdown -h -y now

To shut down after 10 minutes (the users will get a warning message):

# shutdown -h -y 10

To reboot the server (the “-r” option is for reboot):

# shutdown -r -y now

Or, a simple:

# reboot

Sometimes you may want to shut down multiple servers. The DCLI command comes in handy then. To shut down all the cells, execute the command:

# dcli -l root -g all_cells shutdown -h -y now

The -g option lets you supply the name of a file that lists all the cell servers. For instance, all_cells is a file as shown below:

# cat all_cells
prolcel01
prolcel02
prolcel03
prolcel04
prolcel05
prolcel06
prolcel07
prolcel08

You could use a similar file for all database servers and name it all_nodes. To shut down all database servers:

# dcli -l root -g all_nodes shutdown -h -y now

You will learn the DCLI command in detail in the next installment.
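Because a group file decides which servers a command reaches, it is worth sanity-checking it before a mass shutdown. A minimal Python sketch, assuming the group file holds one host name per line as in the all_cells example; the function names are hypothetical, and the command is only assembled, not executed.

```python
def read_group(text):
    """Return host names from group-file text, one per line, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def duplicates(hosts):
    """Return host names that appear more than once, in first-repeat order."""
    seen, repeated = set(), []
    for host in hosts:
        if host in seen and host not in repeated:
            repeated.append(host)
        seen.add(host)
    return repeated

def dcli_args(group_file, command, user="root"):
    """Build the dcli argument list: dcli -l <user> -g <group_file> <command...>."""
    return ["dcli", "-l", user, "-g", group_file] + command.split()

# Contents of the all_cells file shown above.
all_cells = "prolcel01\nprolcel02\nprolcel03\nprolcel04\nprolcel05\nprolcel06\nprolcel07\nprolcel08\n"

hosts = read_group(all_cells)
print(len(hosts), "hosts, duplicates:", duplicates(hosts))
print(" ".join(dcli_args("all_cells", "shutdown -h -y now")))
```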

Maintenance

From time to time you will need to maintain the servers. (Remember, you are the DMA now, not the DBA.) One of the most common tasks is to install new software images. Let’s see some of the related commands.

To learn what software image is installed, use the following:

# imageinfo
Kernel version: 2.6.18-194.3.1.0.3.el5 #1 SMP Tue Aug 31 22:41:13 EDT 2010 x86_64
Cell version: OSS_11.2.0.3.0_LINUX.X64_101206.2
Cell rpm version: cell-11.2.2.2.0_LINUX.X64_101206.2-1
Active image version: 11.2.2.2.0.101206.2
Active image activated: 2011-01-21 14:09:21 -0800
Active image status: success
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7
In partition rollback: Impossible
Cell boot usb partition: /dev/sdac1
Cell boot usb version: 11.2.2.2.0.101206.2
Inactive image version: undefined
Rollback to the inactive partitions: Impossible

You can glean some important information from the output above. Note the line Active image version: 11.2.2.2.0.101206.2, which indicates the specific Exadata Storage Server version. The output also shows the date and time the software image was activated, which can help in troubleshooting: if problems began occurring at a specific date and time, you may be able to correlate them with the activation.
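To correlate problems with the activation time programmatically, the imageinfo output can be turned into key/value pairs. A minimal Python sketch, assuming each line has the "label: value" shape shown above and that the timestamp format matches this sample; the function name is hypothetical.

```python
from datetime import datetime

def parse_imageinfo(output):
    """Split each 'label: value' line at the first colon into a dictionary."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

# Lines copied from the imageinfo output above.
sample = """\
Active image version: 11.2.2.2.0.101206.2
Active image activated: 2011-01-21 14:09:21 -0800
Active image status: success
"""

info = parse_imageinfo(sample)
# The timestamp carries a UTC offset, so parse it as a timezone-aware value.
activated = datetime.strptime(info["Active image activated"], "%Y-%m-%d %H:%M:%S %z")
print(info["Active image version"], activated.isoformat())
```

Splitting at the first colon only matters here because values such as the activation time contain colons themselves.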

The next logical question is: if a new image was installed (activated), what was the version before it? To see the history of all image changes, use the imagehistory command.

# imagehistory
Version                              : 11.2.2.2.0.101206.2
Image activation date                : 2011-01-21 14:09:21 -0800
Imaging mode                         : fresh
Imaging status                       : success

This is a fresh install, so you don’t see much history.

Managing InfiniBand

For the newly minted DMA, nothing is as rattling as the networking commands. It’s like being given a stick-shift car when all you have ever driven is an automatic.

As a DBA, you probably didn’t have to execute anything other than ifconfig and netstat. They still apply, so don’t forget them. But let’s see how to extend that knowledge to InfiniBand.

Status

To get the status of the InfiniBand services, first check the status of the InfiniBand devices with the ibstatus command.

# ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0021:2800:01a0:fd45
        base lid:        0x1a
        sm lid:          0xc
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X QDR)
Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:0021:2800:01a0:fd46
        base lid:        0x1c
        sm lid:          0xc
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X QDR)
… output truncated …
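The two values to look at for each port are state (ACTIVE) and phys state (LinkUp). A minimal Python sketch, assuming the ibstatus output has the layout shown above, that parses each port block and lists any port not reporting both values. The function names are hypothetical.

```python
import re

HEADER_RE = re.compile(r"^Infiniband device '([^']+)' port (\d+) status:")

def parse_ibstatus(output):
    """Return {(device, port): {attribute: value}} from ibstatus text."""
    ports = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = ports.setdefault((header.group(1), int(header.group(2))), {})
        elif current is not None and ":" in line:
            # Split at the first colon only; values such as '4: ACTIVE' contain colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return ports

def unhealthy_ports(ports):
    """List ports whose state is not ACTIVE or whose phys state is not LinkUp."""
    return [name for name, attrs in ports.items()
            if not attrs.get("state", "").endswith("ACTIVE")
            or not attrs.get("phys state", "").endswith("LinkUp")]

# One port block copied from the output above.
sample = """\
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0021:2800:01a0:fd45
        base lid:        0x1a
        sm lid:          0xc
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X QDR)
"""

ports = parse_ibstatus(sample)
print(unhealthy_ports(ports))
```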

If ibstatus reports no problems, the next step is to check the status of the InfiniBand links, using the iblinkinfo command. The output here is truncated to save space.

# iblinkinfo

Switch 0x0021286cd6ffa0a0 Sun DCS 36 QDR switch prolsw-ib1.test.prol:
           1    1[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
           1    2[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
… output truncated …
           1   17[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
           1   18[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
           1   19[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      12   32[  ] "Sun DCS 36 QDR switch localhost" ( )
           1   20[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
           1   21[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      11   32[  ] "Sun DCS 36 QDR switch prolsw-ib2.test.prol" ( )
… output truncated …
           1   36[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
Switch 0x0021286cd6eba0a0 Sun DCS 36 QDR switch localhost:
          12    1[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      43    2[  ] "prolcel02 C 172.32.128.10 HCA-1" ( )
… output truncated …
          12   11[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
          12   12[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      17    2[  ] "proldb04 S 172.32.128.4 HCA-1" ( )
… output truncated …
          12   18[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      11   17[  ] "Sun DCS 36 QDR switch prolsw-ib2.test.prol" ( )
          12   19[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      20    1[  ] "prolcel13 C 172.32.128.21 HCA-1" ( )
… output truncated …
          12   29[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
          12   30[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>       6    1[  ] "proldb05 S 172.32.128.5 HCA-1" ( )
          12   31[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      11   31[  ] "Sun DCS 36 QDR switch prolsw-ib2.test.prol" ( )
          12   32[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>       1   19[  ] "Sun DCS 36 QDR switch prolsw-ib1.test.prol" ( )
          12   33[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
… output truncated …
          12   36[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
Switch 0x0021286ccc72a0a0 Sun DCS 36 QDR switch prolsw-ib2.test.prol:
          11    1[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      42    1[  ] "prolcel02 C 172.32.128.10 HCA-1" ( )
… output truncated …
          11   10[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      14    1[  ] "proldb02 S 172.32.128.2 HCA-1" ( )
          11   11[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
… output truncated …
          11   28[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>       3    2[  ] "proldb07 S 172.32.128.7 HCA-1" ( )
          11   29[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
          11   30[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>       7    2[  ] "proldb05 S 172.32.128.5 HCA-1" ( )
          11   31[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      12   31[  ] "Sun DCS 36 QDR switch localhost" ( )
          11   32[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>       1   21[  ] "Sun DCS 36 QDR switch prolsw-ib1.test.prol" ( )
          11   33[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
          11   34[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
          11   35[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
          11   36[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
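With 36 ports per switch, a per-switch tally of active and down links is quicker to read than the raw listing. A minimal Python sketch, assuming the iblinkinfo line layout shown above (a "Switch" header line followed by one line per port, with the link state between "==(" and ")==>"). The function name and regular expression are illustrative.

```python
import re
from collections import Counter

# Captures the text between '==(' and ')==>', e.g. ' 4X 2.5 Gbps   Down/Disabled'
LINK_RE = re.compile(r"==\((.*?)\)==>")

def link_counts(output):
    """Count active and down links under each 'Switch ...' header."""
    counts = {}
    switch = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Switch "):
            # Header form: Switch <guid> <description>:
            parts = line.split(None, 2)
            switch = parts[2].rstrip(":") if len(parts) == 3 else line
            counts[switch] = Counter()
            continue
        match = LINK_RE.search(line)
        if match and switch is not None:
            state = "active" if "Active/" in match.group(1) else "down"
            counts[switch][state] += 1
    return counts

# A few lines copied from the output above.
sample = """\
Switch 0x0021286cd6ffa0a0 Sun DCS 36 QDR switch prolsw-ib1.test.prol:
           1   18[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
           1   19[  ] ==( 4X xx.0 Gbps Active/  LinkUp)==>      12   32[  ] "Sun DCS 36 QDR switch localhost" ( )
           1   20[  ] ==( 4X 2.5 Gbps   Down/Disabled)==>             [  ] "" ( )
"""

for name, tally in link_counts(sample).items():
    print(name, dict(tally))
```

A down link is not necessarily a fault: unused switch ports also show as Down/Disabled, so the tally is a starting point for comparison against the expected cabling rather than a verdict.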

Topology

To get the topology of the InfiniBand network inside Exadata, use an Oracle-supplied tool, verify-topology, available in the directory /opt/oracle.SupportTools/ibdiagtools.

# ./verify-topology
        [ DB Machine Infiniband Cabling Topology Verification Tool ]
                [Version 11.2.1.3.b]
Looking at 1 rack(s).....
Spine switch check: Are any Exadata nodes connected ..............[SUCCESS]
Spine switch check: Any inter spine switch connections............[SUCCESS]
Spine switch check: Correct number of spine-leaf links............[SUCCESS]
Leaf switch check: Inter-leaf link check..........................[SUCCESS]
Leaf switch check: Correct number of leaf-spine connections.......[SUCCESS]
Check if all hosts have 2 CAs to different switches...............[SUCCESS]
Leaf switch check: cardinality and even distribution..............[SUCCESS]

Cluster Operations

To manage the Oracle Clusterware, you use the same commands as you would in a traditional Oracle 11g Release 2 RAC database cluster. The commands are:

CRSCTL – for a few cluster related commands

SRVCTL – for most cluster related commands

CRSCTL is not used much, but you need it on some occasions, mostly to shut down the cluster and to start it up (if it is not started automatically during machine startup). Remember, you have to be root to issue this command. However, the root user may not have the location of this tool in its path, so you should use its fully qualified path when issuing the command. Here is the command to stop the cluster on all nodes:

# <OracleGridInfrastructureHome>/bin/crsctl stop cluster -all

You don’t always need to shut down the cluster on all nodes; sometimes all you need is to shut it down on only one node. To shut down the cluster on one node alone, use:

# <OracleGridInfrastructureHome>/bin/crsctl stop cluster -n <HostName>

Similarly, to start the cluster on a node where it was previously stopped, use:

# <OracleGridInfrastructureHome>/bin/crsctl start cluster -n <HostName>

Finally, you may want to make sure all the cluster resources are running. Here is the command for that. The status command does not need to be issued by root.

# <OracleGridInfrastructureHome>/bin/crsctl status resource -t

--------------------------------------------------------------------------------

NAME           TARGET  STATE        SERVER                   STATE_DETAILS       

--------------------------------------------------------------------------------

Local Resources

--------------------------------------------------------------------------------

ora.DBFS_DG.dg

               ONLINE  ONLINE       proldb01                                     

               ONLINE  ONLINE       proldb02                                     

               ONLINE  ONLINE       proldb03                                     

               ONLINE  ONLINE       proldb04                                     

               ONLINE  ONLINE       proldb05                                     

               ONLINE  ONLINE       proldb06                                     

               ONLINE  ONLINE       proldb07                                     

               ONLINE  ONLINE       proldb08                                     

ora.PRODATA.dg

               ONLINE  ONLINE       proldb01                                     

               ONLINE  ONLINE       proldb02                                     

               ONLINE  ONLINE       proldb03                                     

               ONLINE  ONLINE       proldb04                                     

               ONLINE  ONLINE       proldb05                                     

               ONLINE  ONLINE       proldb06                                     

               ONLINE  ONLINE       proldb07                                     

               ONLINE  ONLINE       proldb08                                     

ora.PRORECO.dg

               ONLINE  ONLINE       proldb01                                     

               ONLINE  ONLINE       proldb02                                     

               ONLINE  ONLINE       proldb03                                     

               ONLINE  ONLINE       proldb04                                     

               ONLINE  ONLINE       proldb05                                     

               ONLINE  ONLINE       proldb06                                     

               ONLINE  ONLINE       proldb07                                     

               ONLINE  ONLINE       proldb08                                     

ora.LISTENER.lsnr

               ONLINE  ONLINE       proldb01                                     

               ONLINE  ONLINE       proldb02                                     

               ONLINE  ONLINE       proldb03                                     

               ONLINE  ONLINE       proldb04                                     

               ONLINE  ONLINE       proldb05                                     

               ONLINE  ONLINE       proldb06                                     

               ONLINE  ONLINE       proldb07                                     

               ONLINE  ONLINE       proldb08                                     

ora.asm

               ONLINE  ONLINE       proldb01                 Started             

               ONLINE  ONLINE       proldb02                 Started             

               ONLINE  ONLINE       proldb03                 Started             

               ONLINE  ONLINE       proldb04                 Started             

               ONLINE  ONLINE       proldb05                 Started             

               ONLINE  ONLINE       proldb06                 Started             

               ONLINE  ONLINE       proldb07                 Started             

               ONLINE  ONLINE       proldb08                                     

ora.gsd

               OFFLINE OFFLINE      proldb01                                     

               OFFLINE OFFLINE      proldb02                                     

               OFFLINE OFFLINE      proldb03                                     

               OFFLINE OFFLINE      proldb04                                     

               OFFLINE OFFLINE      proldb05                                     

               OFFLINE OFFLINE      proldb06                                     

               OFFLINE OFFLINE      proldb07                                     

               OFFLINE OFFLINE      proldb08                                     

ora.net1.network

               ONLINE  ONLINE       proldb01                                     

               ONLINE  ONLINE       proldb02                                     

               ONLINE  ONLINE       proldb03                                     

               ONLINE  ONLINE       proldb04                                     

               ONLINE  ONLINE       proldb05                                     

               ONLINE  ONLINE       proldb06                                     

               ONLINE  ONLINE       proldb07                                     

               ONLINE  ONLINE       proldb08                                     

ora.ons

               ONLINE  ONLINE       proldb01                                     

               ONLINE  ONLINE       proldb02                                     

               ONLINE  ONLINE       proldb03                                     

               ONLINE  ONLINE       proldb04                                     

               ONLINE  ONLINE       proldb05                                     

               ONLINE  ONLINE       proldb06                                     

               ONLINE  ONLINE       proldb07                                     

               ONLINE  ONLINE       proldb08                                     

ora.registry.acfs

               ONLINE  ONLINE       proldb01                                     

               ONLINE  ONLINE       proldb02                                     

               ONLINE  ONLINE       proldb03                                     

               ONLINE  ONLINE       proldb04                                     

               ONLINE  ONLINE       proldb05                                     

               ONLINE  ONLINE       proldb06                                     

               ONLINE  ONLINE       proldb07                                     

               ONLINE  ONLINE       proldb08                                     

--------------------------------------------------------------------------------

Cluster Resources

--------------------------------------------------------------------------------

ora.LISTENER_SCAN1.lsnr

      1        ONLINE  ONLINE       proldb07                                     

ora.LISTENER_SCAN2.lsnr

      1        ONLINE  ONLINE       proldb02                                     

ora.LISTENER_SCAN3.lsnr

      1        ONLINE  ONLINE       proldb05                                     

ora.cvu

      1        ONLINE  ONLINE       proldb02                                     

ora.proldb01.vip

      1        ONLINE  ONLINE       proldb01                                     

ora.proldb02.vip

      1        ONLINE  ONLINE       proldb02                                     

ora.proldb03.vip

      1        ONLINE  ONLINE       proldb03                                     

ora.proldb04.vip

      1        ONLINE  ONLINE       proldb04                                     

ora.proldb05.vip

      1        ONLINE  ONLINE       proldb05                                     

ora.proldb06.vip

      1        ONLINE  ONLINE       proldb06                                     

ora.proldb07.vip

      1        ONLINE  ONLINE       proldb07                                     

ora.proldb08.vip

      1        ONLINE  ONLINE       proldb08                                     

ora.prolrd.db

      1        ONLINE  ONLINE       proldb01                 Open                

      2        ONLINE  ONLINE       proldb02                 Open                

      3        ONLINE  ONLINE       proldb03                 Open                

      4        ONLINE  ONLINE       proldb04                 Open                

      5        ONLINE  ONLINE       proldb05                 Open                

      6        ONLINE  ONLINE       proldb06                 Open                

      7        ONLINE  ONLINE       proldb07                 Open                

      8        ONLINE  ONLINE       proldb08                 Open                

ora.oc4j

      1        ONLINE  ONLINE       proldb01                                     

ora.scan1.vip

      1        ONLINE  ONLINE       proldb07                                     

ora.scan2.vip

      1        ONLINE  ONLINE       proldb02                                     

ora.scan3.vip

      1        ONLINE  ONLINE       proldb05  

This output clearly shows the status of the various resources. A complete explanation of all the CRSCTL options is beyond the scope of this article; an abbreviated list follows. To find the exact parameters an option requires, call it with the -h option. For instance, to learn about the backup option, execute:

# crsctl backup -h
Usage:
  crsctl backup css votedisk
     Backup the voting disk.

Here is the list of the options for CRSCTL:

       crsctl add       - add a resource, type or other entity
       crsctl backup    - back up voting disk for CSS
       crsctl check     - check a service, resource or other entity
       crsctl config    - output autostart configuration
       crsctl debug     - obtain or modify debug state
       crsctl delete    - delete a resource, type or other entity
       crsctl disable   - disable autostart
       crsctl discover  - discover DHCP server
       crsctl enable    - enable autostart
       crsctl get       - get an entity value
       crsctl getperm   - get entity permissions
       crsctl lsmodules - list debug modules
       crsctl modify    - modify a resource, type or other entity
       crsctl query     - query service state
       crsctl pin       - Pin the nodes in the nodelist
       crsctl relocate  - relocate a resource, server or other entity
       crsctl replace   - replaces the location of voting files
       crsctl release   - release a DHCP lease
       crsctl request   - request a DHCP lease
       crsctl setperm   - set entity permissions
       crsctl set       - set an entity value
       crsctl start     - start a resource, server or other entity
       crsctl status    - get status of a resource or other entity
       crsctl stop      - stop a resource, server or other entity
       crsctl unpin     - unpin the nodes in the nodelist
       crsctl unset     - unset a entity value, restoring its default

Another command, SRVCTL, performs most of the server-based operations, including relocation of resources such as services. It is no different from the tool on a traditional Oracle RAC 11g Release 2 cluster. To learn more about its options, execute this command:

# srvctl -h

Usage: srvctl [-V]

Usage: srvctl add database -d <db_unique_name> -o <oracle_home> [-c {RACONENODE | RAC | SINGLE} [-e <server_list>] [-i <instname>] [-w <timeout>]] [-m <domain_name>] [-p <spfile>] [-r {PRIMARY | PHYSICAL_STANDBY | LOGICAL_STANDBY | SNAPSHOT_STANDBY}] [-s <start_options>] [-t <stop_options>] [-n <db_name>] [-y {AUTOMATIC | MANUAL}] [-g "<serverpool_list>"] [-x <node_name>] [-a "<diskgroup_list>"] [-j "<acfs_path_list>"]

Usage: srvctl config database [-d <db_unique_name> [-a] ] [-v]

Usage: srvctl start database -d <db_unique_name> [-o <start_options>] [-n <node>]

Usage: srvctl stop database -d <db_unique_name> [-o <stop_options>] [-f]

Usage: srvctl status database -d <db_unique_name> [-f] [-v]

… output truncated …

IPMI Tool

Earlier in this article you saw a reference to the IPMI tool, which we used to power the servers on. That is not the only thing you can do with this tool; there are plenty more options. To find out what options are available, issue the command without any arguments.

# ipmitool
No command provided!
Commands:
        raw           Send a RAW IPMI request and print response
        i2c           Send an I2C Master Write-Read command and print response
        spd           Print SPD info from remote I2C device
        lan           Configure LAN Channels
        chassis       Get chassis status and set power state
        power         Shortcut to chassis power commands
        event         Send pre-defined events to MC
        mc            Management Controller status and global enables
        sdr           Print Sensor Data Repository entries and readings
        sensor        Print detailed sensor information
        fru           Print built-in FRU and scan SDR for FRU locators
        sel           Print System Event Log (SEL)
        pef           Configure Platform Event Filtering (PEF)
        sol           Configure and connect IPMIv2.0 Serial-over-LAN
        tsol          Configure and connect with Tyan IPMIv1.5 Serial-over-LAN
        isol          Configure IPMIv1.5 Serial-over-LAN
        user          Configure Management Controller users
        channel       Configure Management Controller channels
        session       Print session information
        sunoem        OEM Commands for Sun servers
        kontronoem    OEM Commands for Kontron devices
        picmg         Run a PICMG/ATCA extended cmd
        fwum          Update IPMC using Kontron OEM Firmware Update Manager
        firewall      Configure Firmware Firewall
        shell         Launch interactive IPMI shell
        exec          Run list of commands from file
        set           Set runtime variable for shell and exec
        hpm           Update HPM components using PICMG HPM.1 file

It’s not possible to explain each option here. Let’s examine one of the most used ones. The option sel shows the System Event Log, one of the key commands you will need to use.

# ipmitool sel
SEL Information
Version          : 2.0 (v1.5, v2 compliant)
Entries          : 96
Free Space       : 14634 bytes
Percent Used     : 9%
Last Add Time    : 02/27/2011 20:23:44
Last Del Time    : Not Available
Overflow         : false
Supported Cmds   : 'Reserve' 'Get Alloc Info'
# of Alloc Units : 909
Alloc Unit Size  : 18
# Free Units     : 813
Largest Free Blk : 813
Max Record Size  : 18

The output is a summary only. To see the details of the event log, you can use an additional parameter: list.

# ipmitool sel list
   1 | 01/21/2011 | 07:05:39 | System ACPI Power State #0x26 | S5/G2: soft-off | Asserted
   2 | 01/21/2011 | 08:59:43 | System Boot Initiated | System Restart | Asserted
   3 | 01/21/2011 | 08:59:44 | Entity Presence #0x54 | Device Present
   4 | 01/21/2011 | 08:59:44 | System Boot Initiated | Initiated by hard reset | Asserted
   5 | 01/21/2011 | 08:59:44 | System Firmware Progress | Memory initialization | Asserted
   6 | 01/21/2011 | 08:59:44 | System Firmware Progress | Primary CPU initialization | Asserted
   7 | 01/21/2011 | 08:59:49 | Entity Presence #0x58 | Device Present
   8 | 01/21/2011 | 08:59:52 | Entity Presence #0x57 | Device Present
   9 | 01/21/2011 | 08:59:53 | System Boot Initiated | Initiated by warm reset | Asserted
   a | 01/21/2011 | 08:59:53 | System Firmware Progress | Memory initialization | Asserted
   b | 01/21/2011 | 08:59:53 | System Firmware Progress | Primary CPU initialization | Asserted
   c | 01/21/2011 | 08:59:54 | System Boot Initiated | Initiated by warm reset | Asserted
   d | 01/21/2011 | 08:59:55 | System Firmware Progress | Memory initialization | Asserted
   e | 01/21/2011 | 08:59:55 | System Firmware Progress | Primary CPU initialization | Asserted
   f | 01/21/2011 | 09:00:01 | Entity Presence #0x55 | Device Present
... truncated ...

The output is shown only partially to conserve space. This is one of the key commands you should be aware of. When troubleshooting, check the system event log to make sure no components have failed; if any have, you will of course have to replace them before going further. Once you get a clean bill of health from IPMITOOL, go on to the next steps: making sure you have no issues with the cluster, then none with the RAC database, and so on.
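When the event log runs to many entries, scanning it by eye gets tedious. The following is a minimal Python sketch, not part of the original article, that splits each line of `ipmitool sel list` output on the `|` separator seen above. The function name `parse_sel_line` and the dictionary keys are hypothetical, chosen for illustration, and the sketch assumes the record ID in the first column is hexadecimal, as the entries a through f above suggest.

```python
from typing import Optional


def parse_sel_line(line: str) -> Optional[dict]:
    """Split one line of `ipmitool sel list` output into named fields.

    Returns None for lines that do not look like an event record.
    """
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 5:
        return None
    try:
        record_id = int(fields[0], 16)  # assumption: IDs are hexadecimal
    except ValueError:
        return None
    return {
        "id": record_id,
        "date": fields[1],
        "time": fields[2],
        "sensor": fields[3],
        "event": fields[4],
        # Some records (e.g. "Device Present") carry no trailing state.
        "state": fields[5] if len(fields) > 5 else None,
    }


# Three lines copied from the listing above.
SAMPLE = """\
   2 | 01/21/2011 | 08:59:43 | System Boot Initiated | System Restart | Asserted
   3 | 01/21/2011 | 08:59:44 | Entity Presence #0x54 | Device Present
   a | 01/21/2011 | 08:59:53 | System Firmware Progress | Memory initialization | Asserted
"""

records = [r for r in map(parse_sel_line, SAMPLE.splitlines()) if r]
boots = [r for r in records if r["sensor"] == "System Boot Initiated"]
print(len(records), "records,", len(boots), "boot event(s)")
```

In a real check you would feed the function the lines captured from the command rather than a pasted sample, and filter on whichever sensor names matter in your environment.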

ASMCMD Tool

You can manage the ASM instance in three different ways:

SQL – traditional SQL commands are enough for ASM but may not be the best for scripting and quick checks such as checking for free space

ASMCMD – a command line tool for the ASM operations. It’s very user-friendly, especially for the SysAdmin-turned-DMA since it does not need knowledge of SQL

ASMCA – ASM Configuration Assistant; has limited functionality

Of these, ASMCMD is the most widely used. Let’s see how it works. You invoke the tool by executing asmcmd at the Linux command prompt.

$ asmcmd -p

The -p parameter merely shows the current directory in the prompt. At the ASMCMD prompt, you can enter the commands. To see the free space in the disk groups, issue the lsdg command.

ASMCMD [+PRORECO] > lsdg

State    Type    Rebal  Sector  Block       AU  Total_MB   Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304   4175360   4172528           379578         1896475              0             N  DBFS_DG/
MOUNTED  NORMAL  N         512   4096  4194304  67436544  64932284          6130594        29400845              0             N  PRODATA/
MOUNTED  HIGH    N         512   4096  4194304  23374656  21800824          4249936         5850296              0             Y  PRORECO/
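For quick free-space checks in a script, the lsdg listing can be split into columns by its header row. The Python sketch below is illustrative only: `parse_lsdg` and `usable_file_mb` are hypothetical names, and it assumes one unwrapped line per disk group. It also assumes that usable space equals (Free_MB minus Req_mir_free_MB) divided by the number of mirror copies, 2 for NORMAL and 3 for HIGH redundancy; that relationship reproduces the three Usable_file_MB values shown above, but treat it as an assumption rather than something this article states.

```python
# Sample copied from the lsdg listing above, one line per disk group.
SAMPLE = """\
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED NORMAL N 512 4096 4194304 4175360 4172528 379578 1896475 0 N DBFS_DG/
MOUNTED NORMAL N 512 4096 4194304 67436544 64932284 6130594 29400845 0 N PRODATA/
MOUNTED HIGH N 512 4096 4194304 23374656 21800824 4249936 5850296 0 Y PRORECO/
"""

# Assumption: number of copies ASM keeps for each redundancy type.
MIRROR_COPIES = {"EXTERN": 1, "NORMAL": 2, "HIGH": 3}


def parse_lsdg(text):
    """Return one dict per disk group, keyed by the header names."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def usable_file_mb(row):
    """Recompute usable space from free space and the mirror reserve."""
    free = int(row["Free_MB"])
    reserve = int(row["Req_mir_free_MB"])
    return (free - reserve) // MIRROR_COPIES[row["Type"]]


for group in parse_lsdg(SAMPLE):
    print(group["Name"], usable_file_mb(group), group["Usable_file_MB"])
```

The point of the arithmetic is that Free_MB overstates what you can actually store: part of it is reserved for re-mirroring after a failure, and every file is written two or three times.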

Commands such as ls and cd work just like their namesakes in the Linux world.

ASMCMD [+] > ls
DBFS_DG/
PRODATA/
PRORECO/
ASMCMD [+] > cd PRORECO

To see the space consumed by each file, issue the ls -ls command.

ASMCMD [+PRORECO/PROLRD/ONLINELOG] > ls -ls
Type       Redund  Striped  Time             Sys  Block_Size   Blocks       Bytes        Space  Name
ONLINELOG  HIGH    COARSE   MAR 31 19:00:00  Y           512  8388609  4294967808  12910067712  group_1.257.744724579
ONLINELOG  HIGH    COARSE   MAR 31 19:00:00  Y           512  8388609  4294967808  12910067712  group_xx.277.744725199
ONLINELOG  HIGH    COARSE   MAR 31 19:00:00  Y           512  8388609  4294967808  12910067712  group_11.278.744725207
ONLINELOG  HIGH    COARSE   MAR 31 19:00:00  Y           512  8388609  4294967808  12910067712  group_12.279.744725215
ONLINELOG  HIGH    COARSE   MAR 31 19:00:00  Y           512  8388609  4294967808  12910067712  group_13.270.744725161
ONLINELOG  HIGH    COARSE   MAR 31 19:00:00  Y           512  8388609  4294967808  12910067712  group_14.272.744725169
… output truncated …

To get a complete listing of all the ASMCMD commands, use help.

ASMCMD [+] > help
        commands:
        --------
        md_backup, md_restore
        lsattr, setattr
        cd, cp, du, find, help, ls, lsct, lsdg, lsof, mkalias
        mkdir, pwd, rm, rmalias
        chdg, chkdg, dropdg, iostat, lsdsk, lsod, mkdg, mount
        offline, online, rebal, remap, umount
        dsget, dsset, lsop, shutdown, spbackup, spcopy, spget
        spmove, spset, startup
        chtmpl, lstmpl, mktmpl, rmtmpl
        chgrp, chmod, chown, groups, grpmod, lsgrp, lspwusr, lsusr
        mkgrp, mkusr, orapwusr, passwd, rmgrp, rmusr
        volcreate, voldelete, voldisable, volenable, volinfo
        volresize, volset, volstat

To get help about a specific command, use help <Command>:

ASMCMD [+] > help chkdg
        chkdg
        Checks or repairs the metadata of a disk group.

        chkdg [--repair] diskgroup

        The options for the chkdg command are described below.
        --repair        - Repairs the disk group.
        diskgroup       - Name of disk group to check or repair.

        chkdg checks the metadata of a disk group for errors and optionally
        repairs the errors.

        The following is an example of the chkdg command used to check and
        repair the DATA disk group.
        ASMCMD [+] > chkdg --repair data

Task(s)                                                             Command Category
Manage the operating system and servers (nodes as well as cells)    Linux commands such as shutdown, fdisk, etc.
Power off and check status of components                            IPMITOOL (Linux tool)
Manage the Exadata Storage Server and cell related command          CellCLI tool
Manage multiple cells at one time                                   DCLI
Manage ASM resources like diskgroup                                 SQL commands (can be SQL*Plus) or ASMCMD
Manage Clusterware                                                  CRSCTL
Manage cluster components                                           SRVCTL
Manage the database                                                 SQL commands (can be SQL*Plus)

Next Steps

Now that you know the different categories of commands, you should learn the specific ones. In the next installment, we will exclusively explore the commands in these two categories to complete your transition to a DMA.

Arup Nanda has been an Oracle DBA for more than 14 years, handling all aspects of database administration, from performance tuning to security and disaster recovery. He is an Oracle ACE Director and was Oracle Magazine's DBA of the Year in 2003.

Oracle Exadata Commands Reference

Part 3: Storage Management
by Arup Nanda

A detailed explanation of the CellCLI and DCLI commands, all their parameters and options, and how they are used for managing storage cells.

(The purpose of this guide is educational: to jump-start the transition from DBA to DMA and serve as a quick reference for practicing DMAs. It is not intended to replace official Oracle-provided manuals or other documentation. The information in this guide is not validated by Oracle, is not supported by Oracle, and should only be used at your own risk.)

In the previous two installments, you learned how the Oracle Exadata Database Machine works, about its various components, and the categories of commands applicable to each of those components. One such component is the storage cells, which manage the disks. For many DBAs transitioning into the role of DMA, perhaps the biggest challenge is becoming comfortable with storage operations, something many DBAs never had to address.

The storage cells in Exadata Database Machine are managed via two tools called CellCLI and DCLI. You had a glimpse of the CellCLI commands in action in Part 2, when configuring Exadata for the first time. In this installment, you will learn the detailed commands of these tools. In doing so, hopefully you will appreciate that it is not as difficult as you thought it would be.

Basics

The cellcli command is invoked from the Linux command line on the storage cells, not on the compute nodes. Just as SQL*Plus shows the SQL> prompt, CellCLI shows the CellCLI> prompt, where you enter the commands.

# cellcli
CellCLI: Release 11.2.2.2.0 - Production on Fri Apr 01 10:39:58 EDT 2011
Copyright (c) 2007, 2009, Oracle.  All rights reserved.
Cell Efficiency Ratio: 372M

This will display the CellCLI prompt where you are expected to enter the commands.

CellCLI> … enter your commands here …

HELP

The first command you may want to enter is one that tells you about all the other commands available. Well, get some help:

CellCLI> help
 HELP [topic]
   Available Topics:
        ALTER
        ALTER ALERTHISTORY
        ALTER CELL
… output truncated …

If you want more context-sensitive help on a specific command, name it after help:

CellCLI> help list

By the way, you don’t need to be at the CellCLI prompt to give a command. If you know the command you want, you can call it directly from the Linux command prompt with the -e option.

# cellcli -e help

Or, you can feed the commands in through a shell here-document, as shown below:

[celladmin@prolcel14 ~]# cellcli << EOF
> list cell
> EOF
CellCLI: Release 11.2.2.2.2 - Production on Sat May 14 16:28:43 EDT 2011
Copyright (c) 2007, 2009, Oracle.  All rights reserved.
Cell Efficiency Ratio: 160M

CellCLI> list cell
         prolcel14       online

[celladmin@prolcel14 ~]#

To exit from CellCLI, use the commands exit or quit.

Recording the Output

Like SQL*Plus, CellCLI also has a spool command that records the output to a file. Here is an example where you want to record the output to a file named mycellcli.txt:

CellCLI> spool mycellcli.txt

To append to a file you already have, use append

CellCLI> spool mycellcli.txt append

Or, if you want to overwrite the file, use replace

CellCLI> spool mycellcli.txt replace

To stop spooling, just issue SPOOL OFF.

What if you want to know what file the output is being written to? Use SPOOL without any argument. It will show the name of the file.

CellCLI> spool
currently spooling to mycellcli.txt

Continuation Character

You will learn later in the article that you can issue complex queries in CellCLI. Some of these queries might span multiple lines. Since there is no command terminator like ";" in SQL, ending a line makes CellCLI start interpreting it immediately. To tell CellCLI that the command continues on the next line, use the continuation character "-", just as you would in SQL*Plus.

CellCLI> spool -
> mycellcli.txt

Note the continuation prompt “>” above. However, unlike SQL*Plus, CellCLI does not show line numbers for the continued lines.

Scripting

When you want to execute a standard set of commands in CellCLI, you can create a script file with the commands and execute it using the START or @ command. For instance, to execute a script called listdisks.cli, you would issue one of the following:

CellCLI> @listdisks.cli
CellCLI> start listdisks.cli

Note that there is no restriction on the extension. I used cli merely as a convenience; it could have been anything else.

You can also use routine Linux redirection. Here is an example where you put all the commands in the file mycellci.commands and you want to spool the results to mycellci.out. This is good for automatic monitoring systems.

# cellcli <mycellci.commands >mycellci.out
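A monitoring system written in Python could do the same thing without intermediate files by feeding the commands to CellCLI on standard input, which is what the redirection above does. The sketch below is a hedged illustration: `build_script` and `run_script` are hypothetical helper names, the only CellCLI behavior relied on is that it reads commands from standard input as shown above, and the `argv` parameter exists solely so the function can be tried on a machine that has no cellcli executable.

```python
import subprocess


def build_script(commands):
    """Join CellCLI commands into script text, one command per line."""
    return "".join(command.strip() + "\n" for command in commands)


def run_script(commands, argv=("cellcli",)):
    """Feed the commands to CellCLI on standard input and return its output.

    Similar in spirit to: cellcli <mycellci.commands >mycellci.out
    """
    completed = subprocess.run(
        list(argv),
        input=build_script(commands),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout
```

On a storage cell, `run_script(["list cell", "list physicaldisk"])` would return whatever CellCLI prints for those two commands, banner included, ready for further parsing.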

Comments

You can put comments in scripts as well. Comments start with REM, REMARK or "--", as shown below:

REM This is a comment
REMARK This is another comment
-- This is yet another comment
list physicaldisk

Environment

You can define two environmental settings:

dateformat – to set the format in which dates are displayed.

CellCLI> set dateformat local
Date format was set: Apr 1, 2011 3:35:17 PM.

CellCLI> set dateformat standard
Date format was set: 2011-04-01T15:35:30-04:00.

echo – to toggle the display of commands within scripts.

CellCLI> set echo on
CellCLI> set echo off
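The dateformat setting matters when a script has to read the timestamps back. As a rough Python sketch, the two formats shown above can be parsed as follows; `parse_cell_timestamp` is a hypothetical name, and the pattern used for the local format is inferred from the single example above and assumes English month abbreviations.

```python
from datetime import datetime


def parse_cell_timestamp(text: str) -> datetime:
    """Parse a timestamp printed under either CellCLI dateformat setting."""
    text = text.strip()
    try:
        # "standard" format, e.g. 2011-04-01T15:35:30-04:00 (ISO 8601)
        return datetime.fromisoformat(text)
    except ValueError:
        # "local" format, e.g. Apr 1, 2011 3:35:17 PM (no time zone shown)
        return datetime.strptime(text, "%b %d, %Y %I:%M:%S %p")


standard = parse_cell_timestamp("2011-04-01T15:35:30-04:00")
local = parse_cell_timestamp("Apr 1, 2011 3:35:17 PM")
print(standard.isoformat(), local.isoformat())
```

The standard format carries the UTC offset and parses unambiguously, so it is probably the safer choice for scripted output; the local format is friendlier to the eye but drops the time zone.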

I am sure you can see the resemblance between these control commands and those of SQL*Plus, a tool you are undoubtedly familiar with, so the learning curve should not be that steep. Now that you know about the basic commands, let’s dive into more advanced ones.

General Structure

The CellCLI commands have the following general structure:

<Verb> <Object> <Modifier> <Filter>

A verb is the action you want, e.g. display something.

An object is what you want the action performed on, e.g. a disk.

A modifier (optional) shows how you want the operation to be modified, e.g. all disks, or a specific disk, etc.

A filter (optional) is similar to the predicate of a SQL statement and is introduced by a WHERE clause.
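To make the four parts concrete, here is a small Python sketch that assembles a command string from them. `build_cellcli_command` is a hypothetical helper, not a CellCLI feature; the verbs, objects, modifiers and filters passed to it are taken from commands that appear later in this installment.

```python
def build_cellcli_command(verb, obj, modifier="", where=""):
    """Assemble <Verb> <Object> <Modifier> <Filter> into one command line."""
    parts = [verb, obj]
    if modifier:
        parts.append(modifier)
    if where:
        # The filter is introduced by the WHERE keyword.
        parts.append("where " + where)
    return " ".join(parts)


print(build_cellcli_command("list", "physicaldisk"))
print(build_cellcli_command("list", "physicaldisk", "34:0 detail"))
print(build_cellcli_command(
    "list", "physicaldisk",
    "attributes name, physicalInterface, physicalInsertTime",
    "disktype = 'HardDisk'",
))
```

Only the verb and the object are mandatory; leaving out the modifier and the filter gives the plain listing form.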

There are only a few primary verbs you will use mostly and need to remember. They are:

LIST – to show something, e.g. disks, statistics, Resource Manager Plans, etc.

CREATE – to create something, e.g. a cell disk, a threshold

ALTER – to change something that has been established, e.g. change the size of a disk

DROP – to delete something, e.g. dropping a disk

DESCRIBE – to display the various attributes of an object

There are more verbs, but these five are the most common ones among the CellCLI commands.

Let’s see how they are used in common operations. In the following sections you will learn how to perform the management operations (viewing, creation, deletion and modification) on various components of the storage, e.g. physical disks, grid disks, LUNs, etc.

Physical Disk Operations

A storage cell is all about disks. As you learned in the first two installments, each storage cell has 12 physical disks. To display the physical disks in this particular cell, execute the following command:

CellCLI> list physicaldisk
         34:0            E1K5JW                  normal
         34:1            E1L9NC                  normal
… output truncated …
         [4:0:2:0]       5080020000f3e16FMOD2    normal
         [4:0:3:0]       5080020000f3e16FMOD3    normal

Notice that there is no heading above the output, making it difficult to understand what these values mean. Fortunately we can fix that, as we will see later. For now let’s focus on something else: to see more detail about each disk, you can use the detail modifier. Here is a partial output:

CellCLI> list physicaldisk detail
         name:                   34:0
         deviceId:               33
         diskType:               HardDisk
         enclosureDeviceId:      34
         errMediaCount:          0
         errOtherCount:          0
         foreignState:           false
         luns:                   0_0
         makeModel:              "SEAGATE ST360057SSUN600G"
         physicalFirmware:       0805
         physicalInsertTime:     2011-01-21T14:32:35-05:00
         physicalInterface:      sas
         physicalSerial:         XXXXXX
         physicalSize:           558.9109999993816G
         slotNumber:             0
         status:                 normal

         name:                   34:1
         deviceId:               32
         diskType:               HardDisk
         enclosureDeviceId:      34
         errMediaCount:          0
         errOtherCount:          0
         foreignState:           false
         luns:                   0_1
         makeModel:              "SEAGATE ST360057SSUN600G"
         physicalFirmware:       0805
         physicalInsertTime:     2011-01-21T14:32:40-05:00
         physicalInterface:      sas
         physicalSerial:         XXXXXX
         physicalSize:           558.9109999993816G
         slotNumber:             1
         status:                 normal
… output truncated …

The output shows the details of every disk, making it somewhat unreadable. Many times when you encounter issues with specific disks you may want to find the details of a particular disk. For that, use the name of the disk as another modifier. The name is shown in both types of commands, normal and detail.

CellCLI> list physicaldisk 34:0
CellCLI> list physicaldisk 34:0 detail
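Because the detail form labels every value, it is the easier of the two forms to consume from a script. The Python sketch below turns such a listing into one dictionary per disk. It is a minimal illustration under stated assumptions: `parse_detail` is a hypothetical name, each record is assumed to begin with a `name:` line as in the output above, and the input is assumed to contain only the listing itself, with no CellCLI banner lines.

```python
def parse_detail(text):
    """Parse `list ... detail` output into a list of attribute dicts."""
    records = []
    for line in text.splitlines():
        # Split on the first colon only; values such as 34:0 or
        # timestamps contain colons of their own.
        key, separator, value = line.partition(":")
        key = key.strip()
        if not separator or not key or " " in key:
            continue  # blank line, truncation marker, or other noise
        if key == "name" or not records:
            records.append({})  # a name line starts a new record
        records[-1][key] = value.strip().strip('"')
    return records


# Abbreviated from the detail listing shown earlier.
SAMPLE = """\
         name:                   34:0
         diskType:               HardDisk
         makeModel:              "SEAGATE ST360057SSUN600G"
         physicalInsertTime:     2011-01-21T14:32:35-05:00
         status:                 normal

         name:                   34:1
         diskType:               HardDisk
         physicalInsertTime:     2011-01-21T14:32:40-05:00
         status:                 normal
… output truncated …
"""

disks = parse_detail(SAMPLE)
print([disk["name"] for disk in disks])
```

The same function should work for other objects that print a `name:` attribute first, such as the LUN listing, although that is an extrapolation from the outputs shown in this article.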

Attributes

While the above command is useful for reading, it lacks a few things.

You still don’t get a lot of other details. For instance, the RPM of the disk is not shown.

The results are not in a tabular format, which matters if you want to see more than one disk.

To get these details you can specify named “attributes” in the listing, after the attributes keyword. In this example, we select the type of the disk (hard disk or flash disk), model, RPM, port, and status:

CellCLI> list physicaldisk attributes name, disktype, makemodel, physicalrpm, physicalport, status
         34:0            HardDisk        "SEAGATE ST360057SSUN600G"      normal
         34:1            HardDisk        "SEAGATE ST360057SSUN600G"      normal
… output truncated …
         [4:0:2:0]       FlashDisk       "MARVELL SD88SA02"              normal
         [4:0:3:0]       FlashDisk       "MARVELL SD88SA02"              normal

This is a much better output that shows the details against each disk. This type of output is quite useful in scripts where you may want to pull the details and format them in certain pre-specified manner or parse them for further processing.

Describe

While you may appreciate how naming the attributes improved the readability of the output, you may also wonder what the valid names of these attributes are. The attribute names vary with the object as well. (For example, diskType is relevant for disks only, not for cells.)

Do you need to remember all of them and the context in which they are relevant? Not at all. That’s where a different verb, DESCRIBE, comes in handy; it shows the attributes of an object, much as the describe command in SQL*Plus displays the columns of a table. Here is how you display the attributes of the physicaldisk object. Remember, unlike SQL*Plus, the command is describe; it cannot be shortened to desc.

CellCLI> describe physicaldisk
        name
        ctrlFirmware
        ctrlHwVersion
        deviceId
        diskType
        enclosureDeviceId
        errCmdTimeoutCount
        errHardReadCount
        errHardWriteCount
        errMediaCount
        errOtherCount
        errSeekCount
        errorCount
        foreignState
        hotPlugCount
        lastFailureReason
        luns
        makeModel
        notPresentSince
        physicalFirmware
        physicalInsertTime
        physicalInterface
        physicalPort
        physicalRPM
        physicalSerial
        physicalSize
        physicalUseType
        sectorRemapCount
        slotNumber
        status

What if you want to display all the attributes of the physical disks; not just a few? You can list all the attribute names explicitly; or – as an easier way – you can just use the option all, as shown below.

CellCLI> list physicaldisk attributes all

         34:0            33      HardDisk        34      0       0       false   0_0     "SEAGATE ST360057SSUN600G"      0805    2011-01-21T14:32:35-05:00       sas     E1K5JW  558.9109999993816G      0       normal
         34:1            32      HardDisk        34      0       0       false   0_1     "SEAGATE ST360057SSUN600G"      0805    2011-01-21T14:32:40-05:00       sas     E1L9NC  558.9109999993816G      1       normal
… output truncated …
         [1:0:0:0]       FlashDisk       4_0     "MARVELL SD88SA02"      D20Y    2011-01-21T14:33:32-05:00       sas     5080020000f21a2FMOD0    22.8880615234375G       "PCI Slot: 4; FDOM: 0"  normal
         [1:0:1:0]       FlashDisk       4_1     "MARVELL SD88SA02"      D20Y    2011-01-21T14:33:32-05:00       sas     5080020000f21a2FMOD1    22.8880615234375G
… output truncated …

This output may not be very easy to read, but if you want to write a script to parse these details and take corrective action, it becomes extremely useful. However, in many cases you may want to see only certain attributes, not all. In the previous subsection you saw how to select only a few of these attributes.

Checking for Errors

Let’s see a practical use of the named attributes clause. From time to time you will see errors popping up on disk drives, which may have to be replaced. To show the error-related attributes, you can select only those attributes, as shown below:

CellCLI> list physicaldisk attributes name,disktype,errCmdTimeoutCount,errHardReadCount,errHardWriteCount
         34:0            HardDisk
… output truncated …
         34:11           HardDisk
         [1:0:0:0]       FlashDisk
… output truncated …
         [4:0:3:0]       FlashDisk

There is no error on any of these disks; so you see those fields remaining unpopulated.
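Since unpopulated fields simply vanish from the tabular output, a script cannot reliably tell which error column a stray value belongs to. A sturdier approach is to check labeled attributes, for example records parsed from the detail form. The Python sketch below assumes each disk is already a dictionary of attribute names to values; `error_counts` and `disks_with_errors` are hypothetical names, and the second sample record, with its non-zero counter, is invented purely to show a hit.

```python
def error_counts(record):
    """Return the non-zero error counters of one physical disk record."""
    counts = {}
    for key, value in record.items():
        # Error attributes start with "err" (errMediaCount, errorCount, ...).
        if key.startswith("err") and str(value).strip().isdigit():
            if int(value) > 0:
                counts[key] = int(value)
    return counts


def disks_with_errors(records):
    """Map disk name to its non-zero error counters, skipping clean disks."""
    flagged = {}
    for record in records:
        counts = error_counts(record)
        if counts:
            flagged[record["name"]] = counts
    return flagged


RECORDS = [
    # Values as shown in the detail listing earlier in this installment.
    {"name": "34:0", "errMediaCount": "0", "errOtherCount": "0"},
    # Hypothetical record: the counter value 3 is made up for illustration.
    {"name": "34:5", "errMediaCount": "3", "errOtherCount": "0"},
]

print(disks_with_errors(RECORDS))
```

An empty result means no disk reported a non-zero counter, which matches the situation in the listing above.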

Filtering

What if you are interested in only a certain type of disk, or want to filter on some attribute? You can use the SQL-esque WHERE clause. Here you want to see some attributes for all hard disks.

CellCLI> list physicaldisk attributes name, physicalInterface, physicalInsertTime -
> where disktype = 'HardDisk'
         34:0    sas     2011-01-21T14:32:35-05:00
         34:1    sas     2011-01-21T14:32:40-05:00
         34:2    sas     2011-01-21T14:32:45-05:00
         34:3    sas     2011-01-21T14:32:50-05:00
         34:4    sas     2011-01-21T14:32:55-05:00
         34:5    sas     2011-01-21T14:33:01-05:00
         34:6    sas     2011-01-21T14:33:06-05:00
         34:7    sas     2011-01-21T14:33:11-05:00
         34:8    sas     2011-01-21T14:33:16-05:00
         34:9    sas     2011-01-21T14:33:21-05:00
         34:10   sas     2011-01-21T14:33:26-05:00
         34:11   sas     2011-01-21T14:33:32-05:00

If you want to change the way the date and time are displayed, you can set the dateformat environmental variable.

CellCLI> set dateformat local
Date format was set: Apr 1, 2011 4:05:54 PM.

CellCLI> list physicaldisk attributes name, physicalInterface, physicalInsertTime -
> where disktype = 'HardDisk'
         34:0    sas     Jan 21, 2011 2:32:35 PM
         34:1    sas     Jan 21, 2011 2:32:40 PM
… output truncated …

Filtering can also be specified with negation, i.e. !=.

CellCLI> list physicaldisk where diskType='Flashdisk'
         [1:0:0:0]       5080020000f21a2FMOD0    normal
         [1:0:1:0]       5080020000f21a2FMOD1    normal
… output truncated …

CellCLI> list physicaldisk where diskType!='Flashdisk'
         34:0    E1K5JW  normal
         34:1    E1L9NC  normal
… output truncated …

ModificationsNow that you know how to display the objects, let’s see how you can modify the properties. The properties of the physical disks are not modifiable, except one: the display of service LED. You can turn on or off the service LED of disks 34:0 and 34:1 by issuing the following commands.

CellCLI> alter physicaldisk  34:0,34:1 serviceled on
CellCLI> alter physicaldisk  34:0,34:1 serviceled off

To turn on the service LED of all the hard disks of that cell, use the following command:

CellCLI> alter physicaldisk harddisk serviceled on

On all disks (hard disks and flash disks):

CellCLI> alter physicaldisk all serviceled on

Creation

There is no creation operation for physical disks because they come with the Exadata machine.

Deletion

There is no deletion operation for physical disks either because they have to be removed by an engineer.

Managing LUN

From the earlier installments you learned that the physical disks are carved up into LUNs. Let’s see the common LUN management commands. To show the LUNs in this cell, you use the following command:

CellCLI> list lun
         0_0     0_0     normal
         0_1     0_1     normal
         0_2     0_2     normal
… output truncated …

As in the case of physical disks, you can also display the details of these LUNs by the following command:

CellCLI> list lun detail
         name:                   0_0
         cellDisk:               CD_00_cell01
         deviceName:             /dev/sda
         diskType:               HardDisk
         id:                     0_0
         isSystemLun:            TRUE
         lunAutoCreate:          FALSE
         lunSize:                557.861328125G
         lunUID:                 0_0
         physicalDrives:         34:0
         raidLevel:              0
         lunWriteCacheMode:      "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
         status:                 normal
         name:                   0_1
         cellDisk:               CD_01_cell01
… output truncated …
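If you capture the detail output in a script, each object's block can be turned into a dictionary. The following Python sketch is one possible way to do that, not an Oracle-provided tool. The helper name parse_detail is hypothetical, and the code assumes that CellCLI prints one "attribute: value" pair per line and that every object's block begins with its name attribute, as in the listing above.

```python
def parse_detail(text):
    """Split CellCLI 'detail' output into one dict per object."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and markers such as "output truncated"
        # Split at the first colon only, so values like 34:0 stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "name":
            # Assumption: a "name" line starts a new object's block.
            current = {}
            records.append(current)
        if current is not None:
            current[key] = value.strip('"')
    return records


sample = """
         name:                   0_0
         cellDisk:               CD_00_cell01
         deviceName:             /dev/sda
         lunSize:                557.861328125G
         physicalDrives:         34:0
         status:                 normal
         name:                   0_1
         cellDisk:               CD_01_cell01
"""

luns = parse_detail(sample)
print(luns[0]["physicalDrives"])  # 34:0
```

The same function should work for any object whose detail output starts each block with name, such as the cell disks and grid disks shown later.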

If you want the details for one specific LUN, use its name as the modifier.

CellCLI> list lun 0_0 detail

Like the previous cases you can show the LUNs in a tabular format by selecting all the attributes as well:

CellCLI> list lun attributes all

         0_0     CD_00_cell01    /dev/sda        HardDisk        0_0     TRUE    FALSE   557.861328125G          0_0     34:0            0       "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"   normal
… output truncated …
         0_11    CD_11_cell01    /dev/sdl        HardDisk        0_11    FALSE   FALSE   557.861328125G          0_11    34:11           0       "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"   normal
         1_0     FD_00_cell01    /dev/sdq        FlashDisk       1_0     FALSE   FALSE   22.8880615234375G       100.0   [2:0:0:0]       normal
… output truncated …
         5_3     FD_15_cell01    /dev/sdx        FlashDisk       5_3     FALSE   FALSE   22.8880615234375G       100.0   [3:0:3:0]       normal

If you want to see the available attributes of a LUN, use the describe command.

CellCLI> describe LUN
        name
        cellDisk
        deviceName
        diskType
        errorCount
        id
        isSystemLun
        lunAutoCreate
        lunSize
        lunUID
        overProvisioning
        physicalDrives
        raidLevel
        lunWriteCacheMode
        status

You can use any of these attributes instead of “all”.

CellCLI> list lun attributes name, cellDisk, raidLevel, status
         0_0     CD_00_prolcel14         0       normal
         0_1     CD_01_prolcel14         0       normal
         0_2     CD_02_prolcel14         0       normal
… output truncated …

Modification

The properties of a LUN are not modifiable, with one exception: a LUN can be re-enabled. LUNs are normally re-enabled automatically after a failure. If one is not automatically re-enabled after the failure is corrected, you can manually re-enable it with the following command:

CellCLI> alter lun 0_0 reenable

Sometimes the LUN may already be enabled but not be shown as such. To bring the system back in sync with reality, you may need to re-enable it forcibly. The force modifier allows that.

CellCLI> alter lun 0_0 reenable force

Deletion

There is no delete operation for LUNs in the CellCLI.

Creation

Similarly, you don’t create LUNs in CellCLI, so there is no create lun command.

Managing Cells

From the previous installments you learned that storage cells, also called simply cells, are where the disks are located and they are presented to the database nodes. A full rack of Exadata contains fourteen such cells. To display the information on cells, you give the list cell command.

CellCLI> list cell
         Cell01       online

The output does not say much except the name of the cell and if it is online. To display more information, use the detail modifier.

CellCLI> list cell detail
         name:                   cell01
         bmcType:                IPMI
         cellVersion:            OSS_11.2.0.3.0_LINUX.X64_101206.2
         cpuCount:               24
         fanCount:               12/12
         fanStatus:              normal
         id:                     XXXXXXXXXX
         interconnectCount:      3
         interconnect1:          bondib0
         iormBoost:              0.0
         ipaddress1:             172.32.128.9/24
         kernelVersion:          2.6.18-194.3.1.0.3.el5
         locatorLEDStatus:       off
         makeModel:              SUN MICROSYSTEMS SUN FIRE X4270 M2 SERVER SAS
         metricHistoryDays:      7
         notificationMethod:     mail
         notificationPolicy:     critical,warning,clear
         offloadEfficiency:      372.1M
         powerCount:             2/2
         powerStatus:            normal
         smtpFrom:               "Proligence Test Database Machine"
         smtpFromAddr:           [email protected]
         smtpPort:               25
         smtpPwd:                ******
         smtpServer:             standardrelay.proligence.com
         smtpToAddr:             [email protected],[email protected],[email protected]
         smtpUser:
         smtpUseSSL:             FALSE
         status:                 online
         temperatureReading:     24.0
         temperatureStatus:      normal
         upTime:                 32 days, 4:44
         cellsrvStatus:          running
         msStatus:               running
         rsStatus:               running

You will understand the meaning of all these attributes as you explore the cell modifications later in this section.

To display the output in a tabular format in one line, as you saw in the case of the physical disks, use the attributes modifier:

CellCLI> list cell attributes all

         cell01  IPMI    OSS_11.2.0.3.0_LINUX.X64_101206.2       24      12/12   normal  XXXXXXXX      3       bondib0         0.0     172.32.128.9/24         2.6.18-194.3.1.0.3.el5  off     SUN MICROSYSTEMS SUN FIRE X4270 M2 SERVER SAS   7       mail    critical,warning,clear  372.1M  2/2     normal  "Proligence Test Database Machine"   [email protected]  25      ******  standardrelay.proligence.com        [email protected],[email protected],[email protected]                FALSE   online  25.0    normal  32 days, 21:52  running         running         running

The output comes as one line containing all the attributes. Of course, you can use specific attributes as you did in other cases. To see all the available attributes, use the previously explained describe command.

CellCLI> describe cell
        name                    modifiable
        bmcType
        cellNumber              modifiable
        cellVersion
        comment                 modifiable
        cpuCount
        emailFormat             modifiable
        events                  modifiable
        fanCount
        fanStatus
        id
        interconnectCount
        interconnect1           modifiable
        interconnect2           modifiable
        interconnect3           modifiable
        interconnect4           modifiable
        iormBoost
        ipBlock                 modifiable
        ipaddress1              modifiable
        ipaddress2              modifiable
        ipaddress3              modifiable
        ipaddress4              modifiable
        kernelVersion
        locatorLEDStatus
        location                modifiable
        makeModel
        metricCollection        modifiable
        metricHistoryDays       modifiable
        notificationMethod      modifiable
        notificationPolicy      modifiable
        offloadEfficiency
        powerCount
        powerStatus
        realmName               modifiable
        smtpFrom                modifiable
        smtpFromAddr            modifiable
        smtpPort                modifiable
        smtpPwd                 modifiable
        smtpServer              modifiable
        smtpToAddr              modifiable
        smtpUser                modifiable
        smtpUseSSL              modifiable
        snmpSubscriber          modifiable
        status
        temperatureReading
        temperatureStatus
        traceLevel              modifiable
        upTime
        cellsrvStatus
        msStatus
        rsStatus

Modification

The word modifiable above means that the specific attribute can be modified by an ALTER verb. It’s not possible to give all the possible combinations in this short article, and not all are used often anyway; so, let’s see the most common ones.

A cell runs multiple services, e.g. the Restart Server, the Management Server and Cell Services. To shut down a cell, you can shut down specific services by name or all of them at once. To shut down only one service, e.g. the Restart Server service, execute:

CellCLI> alter cell shutdown services rs
Stopping RS services...
The SHUTDOWN of RS services was successful.

To restart that particular service, use the restart modifier, shown below:

CellCLI> alter cell restart services rs
Stopping RS services...
CELL-01509: Restart Server (RS) not responding.
Starting the RS services...
Getting the state of RS services...  Running

At any point you can confirm it by checking the status:

CellCLI> list cell attributes rsStatus
         running

To shut down the Management Server, execute:

CellCLI> alter cell shutdown services MS

To shut down the Cell Services, execute:

CellCLI> alter cell shutdown services CELLSRV

What if you want to shut down the entire cell? The “all” modifier is a shortcut to shut down all the services:

CellCLI> alter cell shutdown services all
Stopping the RS, CELLSRV, and MS services...
The SHUTDOWN of services was successful.

To restart all the services, execute the following command:

CellCLI> alter cell restart services all
Starting the RS, CELLSRV, and MS services...
Getting the state of RS services...  running
Starting CELLSRV services...
The STARTUP of CELLSRV services was successful.
Starting MS services...
The STARTUP of MS services was successful.

To turn on or off the LED on the chassis of the cell, execute:

CellCLI> alter cell led on
CellCLI> alter cell led off

Mail Setup

The cell can communicate its status by sending emails, so configuring email is vital for the proper monitoring of cells. In this section you will learn how to do that. First, verify the status of the email configuration in the cell.

CellCLI> alter cell validate mail

CELL-02578: An error was detected in the SMTP configuration: CELL-05503: An error was detected during notification. The text of the associated internal error is: Could not connect to SMTP host: standardrelay.proligence.com, port: 25, response: 421.
The notification recipient is [email protected].
CELL-05503: An error was detected during notification. The text of the associated internal error is: Could not connect to SMTP host: standardrelay.proligence.com, port: 25, response: 421.
The notification recipient is [email protected].
CELL-05503: An error was detected during notification. The text of the associated internal error is: Could not connect to SMTP host: standardrelay.proligence.com, port: 25, response: 421.
The notification recipient is [email protected].
Please verify your SMTP configuration.

The above output shows that the mail setup has not been done properly, meaning the cell will not be able to send the emails required for monitoring. Let’s configure the mail correctly.

First, let’s configure the address and name of the sender, i.e. Cell 07 (this cell). We will give it an address of [email protected]. That address may or may not actually exist in the mail server; it’s not important. When the emails come, this address will be shown as the sender. We will also give a name to the sender, i.e. “Exadata Cell 07”, which helps us identify the sender in the email.

CellCLI> alter cell smtpfromaddr='[email protected]'
Cell cell07 successfully altered
CellCLI> alter cell smtpfrom='Exadata Cell 07'
Cell cell07 successfully altered

Then, we configure the address the cell should send the emails to. This is typically the on-call DBA or DMA. You can configure more than one address as well.

CellCLI> alter cell smtptoaddr='[email protected]'
Cell cell07 successfully altered

The cell can send the mails as text or html, which can be configured as shown below.

CellCLI> alter cell emailFormat='text'
Cell cell07 successfully altered
CellCLI> alter cell emailFormat='html'
Cell cell07 successfully altered

With all these in place, let’s make sure the email works. You can validate the email setup by executing the following:

CellCLI> alter cell validate mail
Cell cell07 successfully altered

If the setup is correct, you will receive a mail from [email protected] (the address you configured as the sender). Here is the body of that email.

This test e-mail message from Oracle Cell cell07 indicates successful configuration of your e-mail address and mail server.

You may wonder what type of emails the cell sends out. Here is an email sent by the cell when there is a hardware failure:


 

And here is the email on another failure of the cell disk as a result of flashdisk failure.


   

BMC Management

BMC, or Baseboard Management Controller, controls the components of the cell. The BMC should be running all the time. You can restart it, if needed, by executing the command below. The command also reboots the BMC if it is already running.

CellCLI> alter cell restart bmc

To make sure the BMC sends the alerts to the cell so that they show up as alerts, execute:

CellCLI> alter cell configurebmc
Cell Cell07 BMC configured successfully

The cells can also send SNMP traps. Any monitoring system based on SNMP traps can receive and process these SNMP traps to show the statistics on cells. To validate the SNMP configuration to be used for Automatic Service Requests (ASRs), use the following command:

CellCLI> alter cell validate snmp type=ASR
Cell cell01 successfully altered

To enable automatic service request generation by SNMP, you need to define a subscriber. Here is an example:

CellCLI> alter cell snmpsubscriber=((host='snmp01.proligence.com',type=ASR))
Cell cell01 successfully altered

You may want to validate the firmware of the cell at any time by executing

CellCLI> alter cell validate configuration
Cell cell07 successfully altered

Finally, if you want to reset the cell to its factory settings, use:

CellCLI> drop cell

If you have defined grid disks in this cell, you have to drop them first. Alternatively, the force option drops them along with the cell:

CellCLI> drop cell force

Remember, the drop cell command removes the cell-related properties of the server; it does not actually remove the physical server.

Creation

The cells are created at the beginning of the project, usually by the automatic installation script. You are unlikely to use this command later, but it’s shown here briefly for the sake of completeness. The command is create cell. Here is the help on the command:

CellCLI> help create cell

  Usage: CREATE CELL [<cellname>] [realmname=<realmvalue>,]
                 [interconnect1=<ethvalue>,] [interconnect2=<ethvalue>,]
                 [interconnect3=<ethvalue>,] [interconnect4=<ethvalue>,]
               ( ( [ipaddress1=<ipvalue>,] [ipaddress2=<ipvalue>,]
                   [ipaddress3=<ipvalue>,] [ipaddress4=<ipvalue>,] )
               | ( [ipblock=<ipblkvalue>, cellnumber=<numvalue>] ) )

  Purpose: Configures the cell network and starts services.

  Arguments:
    <cellname>: Name to be assigned to the cell.
                Uses hostname if nothing is specified
    <realmvalue>: Name of the realm this cell belongs to.
    <ethvalue>: Value of the eth interconnect assigned to this network interconnect.
    <ipvalue>: Value of the IP address to be assigned to this network interconnect.
    <ipblkvalue>: Value of the IP block to determine IP address for the network.
    <numvalue>: Number of the cell in this ip block.

Managing Cell Disks

Each physical disk in the cell is further exposed as a cell disk. To show all the cell disks in this cell, use the list celldisk command as shown below.

CellCLI> list celldisk
         CD_00_prolcel14         normal
         CD_01_prolcel14         normal
         CD_02_prolcel14         normal
… output truncated …

This is a rather succinct output without much useful detail. The detail modifier allows display of more detailed information on the cell disks.

CellCLI> list celldisk detail
         name:                   CD_00_prolcel14
         comment:
         creationTime:           2011-04-27T15:11:27-04:00
         deviceName:             /dev/sda
         devicePartition:        /dev/sda3
         diskType:               HardDisk
         errorCount:             0
         freeSpace:              0
         id:                     xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
         interleaving:           none
         lun:                    0_0
         raidLevel:              0
         size:                   1832.59375G
         status:                 normal
         name:                   CD_01_prolcel14
… output truncated …

This output shows all the cell disks one after the other, which makes for a rather long and unreadable list. If you would rather show the details in a tabular format with one line per cell disk, use the attributes all modifier shown below:


CellCLI> list celldisk attributes all

         CD_00_cell01            Mar 1, 2011 6:20:45 PM          /dev/sda        /dev/sda3       HardDisk        0       0       xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx    none   0_0     0               528.734375G     normal
… output truncated …
         FD_00_cell01            Jan 21, 2011 5:07:32 PM         /dev/sdq        /dev/sdq        FlashDisk       0       0       xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx    none   1_0     22.875G         normal

To know the columns, or rather the attributes shown here, use the describe command to display the attributes of the cell disk.

CellCLI> describe celldisk
        name                    modifiable
        comment                 modifiable
        creationTime
        deviceName
        devicePartition
        diskType
        errorCount
        freeSpace
        freeSpaceMap
        id
        interleaving
        lun
        physicalDisk
        raidLevel
        size
        status

The output columns are displayed in the same order. You can select only a few attributes from this list instead of all. For instance, let’s see only the name of the cell disk and the corresponding Linux partition name.

CellCLI> list celldisk attributes name, devicePartition
         CD_00_prolcel14         /dev/sda3
         CD_01_prolcel14         /dev/sdb3
… output truncated …
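Because the columns come back in the order the attributes are requested, a script can map each row back to attribute names. The Python sketch below illustrates the idea; rows_to_dicts is a hypothetical helper name, and splitting on whitespace is an assumption that only holds when no requested attribute is empty or contains spaces (an empty comment, for example, would shift the columns).

```python
def rows_to_dicts(attributes, listing):
    """Pair each whitespace-separated column with its attribute name."""
    rows = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != len(attributes):
            continue  # skip blank lines and anything that is not a data row
        rows.append(dict(zip(attributes, fields)))
    return rows


attrs = ["name", "devicePartition"]
sample = """
         CD_00_prolcel14         /dev/sda3
         CD_01_prolcel14         /dev/sdb3
"""

disks = rows_to_dicts(attrs, sample)
print(disks[1]["devicePartition"])  # /dev/sdb3
```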

Similarly, you can use the attributes for filtering the output as well. The following example shows all cell disks larger than 23GB.

CellCLI> list celldisk attributes name, devicePartition where size>23G
         CD_00_prolcel14         /dev/sda3
         CD_01_prolcel14         /dev/sdb3
         CD_02_prolcel14         /dev/sdc
         CD_03_prolcel14         /dev/sdd
         CD_04_prolcel14         /dev/sde
         CD_05_prolcel14         /dev/sdf
         CD_06_prolcel14         /dev/sdg
         CD_07_prolcel14         /dev/sdh
         CD_08_prolcel14         /dev/sdi
         CD_09_prolcel14         /dev/sdj
         CD_10_prolcel14         /dev/sdk
         CD_11_prolcel14         /dev/sdl
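The same size comparison can be reproduced outside CellCLI if you have already collected the sizes. The Python sketch below is only an illustration. The helper name size_in_g is hypothetical, and the conversion assumes that 1G equals 1024M, which this article does not state. The two sizes are taken from the list celldisk attributes all output shown earlier.

```python
# Assumption: sizes end in a single unit letter and 1G = 1024M.
UNITS_IN_G = {"M": 1 / 1024, "G": 1.0}


def size_in_g(text):
    """Convert a size string such as '22.875G' or '100M' to gigabytes."""
    number, unit = text[:-1], text[-1].upper()
    return float(number) * UNITS_IN_G[unit]


cell_disks = {
    "CD_00_cell01": "528.734375G",
    "FD_00_cell01": "22.875G",
}

threshold = size_in_g("23G")
larger = [name for name, size in cell_disks.items() if size_in_g(size) > threshold]
print(larger)  # ['CD_00_cell01']
```

The 22.875G flash-based cell disk falls just under the 23G threshold, which is why the filter in the command above returns only the hard-disk-based cell disks.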

Modification

Only two attributes of the cell disks are changeable: the comment and the name.

Suppose you want to add a comment “Flash Disk” to the cell disk FD_00_cell01. You need to execute:

CellCLI> alter celldisk FD_00_cell01 comment='Flash Disk'    

If you want to make the change to comments on all the hard disks, you would issue:

CellCLI> alter celldisk all harddisk comment='Hard Disk'
CellDisk CD_00_cell01 successfully altered
CellDisk CD_01_cell01 successfully altered
… output truncated …

Similarly if you want to change the comment on all the flash disks:

CellCLI> alter celldisk all flashdisk comment='Flash Disk'

Deletion

This command is rarely used but it may be needed when cell disks fail and you want to drop the cell disks and create them fresh. Here is how to drop the cell disk named CD_00_cell01:

CellCLI> drop celldisk CD_00_cell01

If the cell disk contains grid disks, the drop command will fail. Either drop the grid disks (described in the next section) or use the force option.

CellCLI> drop celldisk CD_00_cell01 force

You can also drop all cell disks of certain types, e.g. hard disks or flash disks.

CellCLI> drop celldisk harddisk
CellCLI> drop celldisk flashdisk

Or, drop them all:

CellCLI> drop celldisk all

Managing Grid Disks

In the previous installments you learned that grid disks are carved out of the cell disks and the grid disks are presented to the ASM instance as disks, which are eventually used to build ASM diskgroups.

CellCLI> list griddisk
         DBFS_DG_CD_02_prolcel14         active
         DBFS_DG_CD_03_prolcel14         active
         DBFS_DG_CD_04_prolcel14         active
… output truncated …

Or you can get the details of a specific grid disk:

CellCLI> list griddisk DBFS_DG_CD_02_cell01 detail
         name:                   DBFS_DG_CD_02_cell01
         availableTo:
         cellDisk:               CD_02_cell01
         comment:
         creationTime:           2011-03-01T18:21:41-05:00
         diskType:               HardDisk
         errorCount:             0
         id:                     xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
         offset:                 528.734375G
         size:                   29.125G
         status:                 active

As with the previous cases, you can also see the available attributes:

CellCLI> describe griddisk
        name                    modifiable
        availableTo             modifiable
        cellDisk
        comment                 modifiable
        creationTime
        diskType
        errorCount
        id
        offset
        size                    modifiable
        status

You can use these keywords to display specific – not all – attributes of the grid disks. Here is a command to display the name, the cell disks, and the type of the disk:

CellCLI> list griddisk attributes name,cellDisk,diskType
         DBFS_DG_CD_02_prolcel14         CD_02_prolcel14         HardDisk
         DBFS_DG_CD_03_prolcel14         CD_03_prolcel14         HardDisk
         DBFS_DG_CD_04_prolcel14         CD_04_prolcel14         HardDisk
… output truncated …

You could have also used the following to show all the attributes:

CellCLI> list griddisk attributes all

If you want to show only the grid disks that match a certain attribute value, add a where filter after the attribute list. The following command shows all the grid disks of size 476.546875GB:

CellCLI> list griddisk attributes name,cellDisk,status where size=476.546875G
         PRORECO_CD_00_prolcel14         CD_00_prolcel14         active
         PRORECO_CD_01_prolcel14         CD_01_prolcel14         active
         PRORECO_CD_02_prolcel14         CD_02_prolcel14         active
         PRORECO_CD_03_prolcel14         CD_03_prolcel14         active
… output truncated …

ASM Status

The describe command does not show two important attributes:

ASMModeStatus – whether a current ASM diskgroup is using this griddisk. A value of ONLINE indicates this grid disk is being used.

ASMDeactivationOutcome – recall that grid disks can be deactivated, which is effectively taking them offline. Since ASM mirroring ensures that the data is located on another disk, making this disk offline does not lose data. However, if the mirror is offline, or is not present, then making this grid disk offline will result in loss of data. This attribute shows whether the grid disk can be deactivated without loss of data. A value of “Yes” indicates you can deactivate this grid disk without data loss.

CellCLI> list griddisk attributes name, ASMDeactivationOutcome, ASMModeStatus
         DBFS_DG_CD_02_cell01    Yes     ONLINE
         DBFS_DG_CD_03_cell01    Yes     ONLINE
         DBFS_DG_CD_04_cell01    Yes     ONLINE
… output truncated …
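Before deactivating grid disks, you may want a script to confirm that every grid disk reports “Yes”. The Python sketch below shows one way to scan the three-column listing above. The function name unsafe_to_deactivate is hypothetical, and the code assumes the first column is the grid disk name, the last column is ASMModeStatus, and everything in between is the ASMDeactivationOutcome value (which may contain spaces when it is not “Yes”).

```python
def unsafe_to_deactivate(listing):
    """Return the grid disks whose ASMDeactivationOutcome is not 'Yes'."""
    unsafe = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and non-data lines such as "… output truncated …".
        if len(fields) < 3 or not fields[0][0].isalnum():
            continue
        name = fields[0]
        outcome = " ".join(fields[1:-1])  # the last field is ASMModeStatus
        if outcome != "Yes":
            unsafe.append(name)
    return unsafe


sample = """
         DBFS_DG_CD_02_cell01    Yes     ONLINE
         DBFS_DG_CD_03_cell01    Yes     ONLINE
         DBFS_DG_CD_04_cell01    Yes     ONLINE
… output truncated …
"""

print(unsafe_to_deactivate(sample))  # []
```

An empty list means every grid disk in the listing can be deactivated without data loss, according to the attribute.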

Modification

Only name, comment, availableTo, status and size are modifiable. Let’s see how to change the comment for a specific grid disk:

CellCLI> alter griddisk PRORECO_CD_11_cell01 comment='Used for Reco';
GridDisk PRORECO_CD_11_cell01 successfully altered

You can change the comment for all the grid disks of a certain type, e.g. all hard disks:

CellCLI> alter griddisk all harddisk comment='Hard Disk';
GridDisk DBFS_DG_CD_02_cell01 successfully altered
GridDisk DBFS_DG_CD_03_cell01 successfully altered
… output truncated …

Making a grid disk inactive effectively offlines its associated ASM disk.

CellCLI> alter griddisk PRORECO_CD_11_cell01 inactive
GridDisk PRORECO_CD_11_cell01 successfully altered

In the process of doing so, you can use force, which forces the ASM disk to go offline even if there is data on that disk.

CellCLI> alter griddisk PRORECO_CD_11_cell01 inactive force

The command will wait until the ASM disk becomes offline. If you want the prompt to come back immediately without waiting, you can use the nowait clause.

CellCLI> alter griddisk PRORECO_CD_11_cell01 inactive nowait

You can make it active again:

CellCLI> alter griddisk PRORECO_CD_11_cell01 active
GridDisk PRORECO_CD_11_cell01 successfully altered

Deletion

You rarely have to drop grid disks, but some situations may warrant it (if you want to swap disks, for example). You can do that by dropping the old one and creating a new one. Here is how you drop a specific named grid disk.

CellCLI> drop griddisk DBFS_DG_CD_02_prolcel14

Sometimes you may want to drop a number of grid disks. You can drop all grid disks with a specific prefix in their name, e.g. DBFS.

CellCLI> drop griddisk prefix=DBFS

Or, you can drop all the grid disks from this cell:

CellCLI> drop griddisk all

Sometimes you may want to drop all the grid disks of only a specific type – hard disks or flash disks. You will then execute one of the following:

CellCLI> drop griddisk flashdisk
CellCLI> drop griddisk harddisk

If the disk is active, it won’t be dropped. Here is an example:

CellCLI> drop griddisk PRORECO_CD_11_cell01;
CELL-02549: Grid disk is in use and FORCE is not specified for the operation.

In that case, you can force drop the disk using the force modifier.

CellCLI> drop griddisk PRORECO_CD_11_cell01 force
GridDisk PRORECO_CD_11_cell01 successfully dropped

Creation

You will rarely need to create grid disks, but you may need to for the same reason you drop them. The create griddisk command does it. Here is an example showing how to create a grid disk from a specific cell disk.

CellCLI> create griddisk PRORECO_CD_11_cell01 celldisk=CD_11_cell01
GridDisk PRORECO_CD_11_cell01 successfully created

If you want to create a grid disk in each cell disk, you can create them individually, or you can use a shortcut:

CellCLI> create griddisk all prefix PRORECO

This creates grid disks named according to the convention <Prefix>_<Cell Disk Name>. The prefix is usually the name of the database, but you can also use the type of the disk (hard or flash) as the prefix instead.
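The naming convention is simple enough to express in a few lines. The Python sketch below builds a grid disk name from a prefix and a cell disk name, and splits one back apart. Both function names are hypothetical, and the split assumes that cell disk names follow the CD_nn_cell or FD_nn_cell pattern seen in the listings in this article.

```python
import re


def grid_disk_name(prefix, cell_disk):
    """Build a grid disk name using the <Prefix>_<Cell Disk Name> convention."""
    return f"{prefix}_{cell_disk}"


# Assumption: cell disk names look like CD_<nn>_<cell> or FD_<nn>_<cell>.
_NAME = re.compile(r"^(?P<prefix>.+)_(?P<cell_disk>(?:CD|FD)_\d+_.+)$")


def split_grid_disk_name(name):
    """Return (prefix, cell disk name) for a grid disk name."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized grid disk name: {name}")
    return match.group("prefix"), match.group("cell_disk")


print(grid_disk_name("PRORECO", "CD_11_cell01"))     # PRORECO_CD_11_cell01
print(split_grid_disk_name("DBFS_DG_CD_02_cell01"))  # ('DBFS_DG', 'CD_02_cell01')
```

Note that a prefix may itself contain underscores, as DBFS_DG does, which is why the split anchors on the CD_ or FD_ part rather than on the first underscore.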

CellCLI> create griddisk all flashdisk prefix FLASH

You can optionally specify a size, which can be less than or equal to that of the celldisk.

CellCLI> create griddisk PRORECO_CD_11_cell01 celldisk=CD_11_cell01 size=100M

You may want to specify a smaller size to keep less data on the disks.

Managing Flash Disks


As you recall, each storage cell comes with several flash disk cards and they are presented to the cells as cell disks. Each cell has only one flash cache, which is made up of the various flash disks as the cell disks. To see the flash cache, use:

CellCLI> list flashcache
         cell01_FLASHCACHE       normal

Of course the pithy output does not tell us much. The detail modifier takes care of that by displaying a lot more:

CellCLI> list flashcache detail

         name:                   cell01_FLASHCACHE
         cellDisk:               FD_13_cell01,FD_00_cell01,FD_10_cell01,FD_02_cell01,FD_06_cell01,FD_12_cell01,FD_05_cell01,FD_08_cell01,FD_15_cell01,FD_14_cell01,FD_07_cell01,FD_04_cell01,FD_03_cell01,FD_11_cell01,FD_09_cell01,FD_01_cell01
         creationTime:           2011-01-21T17:07:43-05:00
         degradedCelldisks:
         effectiveCacheSize:     365.25G
         id:                     xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
         size:                   365.25G
         status:                 normal

The output shows various important data on the flash cache configured for the cell:

1. The flash disks configured as cell disks, e.g. FD_00_cell01, FD_01_cell01, etc. You can get information on these cell disks with the list celldisk command, as shown previously. Here is another example:

CellCLI> list celldisk FD_00_cell01 detail
         name:                   FD_00_cell01
         comment:                "Flash Disk"
         creationTime:           2011-01-21T17:07:32-05:00
         deviceName:             /dev/sdq
         devicePartition:        /dev/sdq
         diskType:               FlashDisk
         errorCount:             0
         freeSpace:              0
         id:                     xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
         interleaving:           none
         lun:                    1_0
         size:                   22.875G
         status:                 normal

2. The total size is 365.25GB.

3. Its status is normal, i.e. it is available.

You can also get the output in a single line:

CellCLI> list flashcache attributes all 

         cell01_FLASHCACHE       FD_13_cell01,FD_00_cell01,FD_10_cell01,FD_02_cell01,FD_06_cell01,FD_12_cell01,FD_05_cell01,FD_08_cell01,FD_15_cell01,FD_14_cell01,FD_07_cell01,FD_04_cell01,FD_03_cell01,FD_11_cell01,FD_09_cell01,FD_01_cell01         2011-01-21T17:07:43-05:00               365.25G         686510aa-1f06-4ee7-ab84-743db1ae01d8    365.25G normal

Like the previous cases, you can see what attributes are available for the flashcache.

CellCLI> describe flashcache
        name
        cellDisk
        creationTime
        degradedCelldisks
        effectiveCacheSize
        id
        size
        status

From these you can list only a subset of the attributes:

CellCLI> list flashcache attributes degradedCelldisks

Modification

There is no modification operation for flash cache.

Creation

You create the flash cache using the create flashcache command. At a minimum, you need to specify the cell disks created on the flash disks.

CellCLI> create flashcache celldisk='FD_13_cell01,FD_00_cell01,FD_10_cell01,FD_02_cell01,FD_06_cell01,FD_12_cell01,FD_05_cell01,FD_08_cell01,FD_15_cell01,FD_14_cell01,FD_07_cell01,FD_04_cell01,FD_03_cell01,FD_11_cell01,FD_09_cell01,FD_01_cell01'

If you want to use all the flash-based cell disks, you can use:

CellCLI> create flashcache all
Flash cache cell01_FLASHCACHE successfully created

To specify a certain size:

CellCLI> create flashcache all size=365.25G

If you want to use only a few cell disks as flash cache and the rest as a grid disks to be ultimately used by ASM diskgroups, you can use only a handful of the cell disks.

CellCLI> create flashcache celldisk='FD_00_cell01'
Flash cache cell01_FLASHCACHE successfully created

The rest of the flash disks can be used as grid disks and then used for ASM disk groups. Note how the flash cache size is reduced now.

CellCLI> list flashcache detail
         name:                   cell01_FLASHCACHE
         cellDisk:               FD_00_cell01
         creationTime:           2011-05-15T15:03:44-04:00
         degradedCelldisks:
         effectiveCacheSize:     22.828125G
         id:                     xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
         size:                   22.828125G
         status:                 normal
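The reduction follows directly from the number of cell disks behind the cache. As a rough check, the short Python sketch below reproduces the two sizes using only the figures shown in the listings in this article (16 flash cell disks, 22.875G per cell disk, 22.828125G of cache from a single cell disk). The constant and function names are made up for illustration, and the per-disk figure is simply what this one system reported.

```python
# Figures copied from the CellCLI listings in this article; names are illustrative.
CELLDISK_SIZE_G = 22.875        # size of one flash cell disk
CACHE_PER_DISK_G = 22.828125    # flash cache size when built on a single cell disk
FLASH_DISKS_PER_CELL = 16       # FD_00 through FD_15


def flashcache_size_g(num_celldisks):
    """Estimate the flash cache size (in G) for a number of flash cell disks."""
    return num_celldisks * CACHE_PER_DISK_G


if __name__ == "__main__":
    print(flashcache_size_g(1))                     # cache on FD_00_cell01 only
    print(flashcache_size_g(FLASH_DISKS_PER_CELL))  # cache on all sixteen disks
    # Space per cell disk that does not show up as cache, in MB
    print((CELLDISK_SIZE_G - CACHE_PER_DISK_G) * 1024)
```

With all sixteen disks the estimate matches the 365.25G reported earlier by list flashcache.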

Deletion
You generally don’t need to drop the flash cache, except under one circumstance: when you want to use part (or none) of the flash disks as storage and not as flash cache. To drop the flash cache from the cell, use the following command.

CellCLI> drop flashcache
Flash cache cell01_FLASHCACHE successfully dropped

After that you can either create the flash cache with fewer cell disks or not create it at all.

Flash Cache Content
Remember the purpose of the flash disks? To recap, their purpose is to provide a secondary buffer, known as flash cache. You've already learned how to manage the flash cache, but that brings up a very important point: How do you know if the flash cache has been effective?

A simple check is to look for the contents of the flash cache. To display the contents, use the following command. Beware, in a production system where a lot of data may be cached, this output may be very long. You can always press Control-C to break the output.

CellCLI> list flashcachecontent


         cachedKeepSize:         0
         cachedSize:             81920
         dbID:                   374480170
         dbUniqueName:           PROPRD
         hitCount:               5
         missCount:              0
         objectNumber:           131483
         tableSpaceNumber:       729

         cachedKeepSize:         0
         cachedSize:             65536
         dbID:                   374480170
         dbUniqueName:           PROPRD
         hitCount:               5
         missCount:              0
         objectNumber:           131484
         tableSpaceNumber:       729
… output truncated …

The output shows all the contents of the flash cache. In the previous example, it shows some important data:

dbID and dbUniqueName – the DBID and name of the database whose contents are in the flash cache. Remember, the DBM may contain more than one database, so you should know how much of the cache comes from which database.

objectNumber – the DATA_OBJECT_ID (not OBJECT_ID) value from DBA_OBJECTS view for the database object which is in the cache.

tableSpaceNumber – the tablespace where the object is stored in the database.
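One way to turn this listing into a rough effectiveness figure is to compute a hit ratio per cached object from hitCount and missCount. The Python sketch below parses the detail-style output shown above. The hit ratio formula, hits / (hits + misses), is my own assumption for illustration rather than an Oracle-defined metric, and the function names are hypothetical.

```python
SAMPLE = """\
         cachedKeepSize:         0
         cachedSize:             81920
         dbID:                   374480170
         dbUniqueName:           PROPRD
         hitCount:               5
         missCount:              0
         objectNumber:           131483
         tableSpaceNumber:       729
         cachedKeepSize:         0
         cachedSize:             65536
         dbID:                   374480170
         dbUniqueName:           PROPRD
         hitCount:               5
         missCount:              0
         objectNumber:           131484
         tableSpaceNumber:       729
"""


def parse_detail(text):
    """Split 'attribute: value' lines into one dict per record.

    A new record starts whenever an attribute name repeats.
    """
    records = []
    current = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines and "output truncated" markers
        key, value = key.strip(), value.strip()
        if key in current:
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


def hit_ratio(record):
    """Assumed definition: hits / (hits + misses); None if never accessed."""
    hits = int(record["hitCount"])
    misses = int(record["missCount"])
    total = hits + misses
    return hits / total if total else None


if __name__ == "__main__":
    for rec in parse_detail(SAMPLE):
        print(rec["dbUniqueName"], rec["objectNumber"], hit_ratio(rec))
```

Grouping the parsed records by dbUniqueName would show how much of the cache each database occupies.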

The other data points are self-explanatory. To see all the attributes available for the flash cache contents, use the describe command shown below.

CellCLI> describe flashcachecontent
        cachedKeepSize
        cachedSize
        dbID
        dbUniqueName
        hitCount
        hoursToExpiration
        missCount
        objectNumber
        tableSpaceNumber

Let’s see with an example table called MONTEST1. First we need to know its data_object_id, which is the same as object_id in most cases, except where the segments may be different, e.g. in case of partitioned tables, materialized views and indexes. 

SQL> select object_id, data_object_id from user_objects
  2  where object_name = 'MONTEST1';

 OBJECT_ID DATA_OBJECT_ID
---------- --------------
    211409         211409

In this case the data_object_id and object_id are the same, because this is a non-partitioned table. Now, let’s see how many fragments of this table are in the flash cache:

CellCLI> list flashcachecontent where objectnumber=211409 detail
         cachedKeepSize:         0
         cachedSize:             1048576
         dbID:                   374480170
         dbUniqueName:           PROPRD
         hitCount:               48
         missCount:              2237
         objectNumber:           211409
         tableSpaceNumber:       846
… output truncated …

The output shows the chunks of the data from this segment in the flash cache. You could also use the same technique to find the objects of a single tablespace in the cache. You need to know the tablespace number, which is something you can get from the table TS$ in the SYS schema. In the output above, we see that the cache chunk is from the tablespace 846. If you want to check the tablespace name, use the following query:

SQL> select name
  2  from ts$
  3  where ts# = 846;

NAME
------------------------------
USERS_DATA

Conversely, you can get the tablespace number from the name and use it to check the cache entries from CellCLI.

Creation
There is no creation operation for flash cache content. Data goes in there by way of normal operations.

Modification
There is no modification operation; the flash cache is managed by the Exadata Storage Server software.

Deletion
There is no deletion operation since the entries are deleted from the cache by the Exadata Storage Server software.

Group Commands – DCLI
So far you have seen all the commands possible through CellCLI, which apply to the cell you are logged into at that time. What if you want to manage something on all the cells?

For instance, suppose you want to list the status of all 14 cells in the Exadata full-rack database machine. CellCLI’s list cell command gives that information right away, but for that cell only. You can log on to all the other 13 cells and issue the command to get the status, a process that is not only time consuming but also awkward to script. To address that, another interface is available, named DCLI, which allows you to run commands on all the other cells while logged in to a single cell.

Let’s see the original problem – you are logged in to cell01 and you want to check the status of cells 1 through 14. You then execute the dcli operation as shown below:

[celladmin@prolcel01 ~]# dcli -c prolcel01,prolcel02,prolcel03,prolcel04,prolcel05,prolcel06,prolcel07,prolcel08,prolcel09,prolcel10,prolcel11,prolcel12,prolcel13,prolcel14 -l root "cellcli -e list cell"
prolcel01: cell01        online
prolcel02: cell02        online
prolcel03: cell03        online
prolcel04: cell04        online
… output truncated …

Let’s dissect the command.

The last part of the command, cellcli -e list cell, is what we want to run on all the other cells.

The -c option specifies the cells where the command cellcli -e list cell should be run.

The -l option specifies the user the command should be run as. In this case it’s root. The default is celladmin.

The DCLI interface is not a command but rather a Python script that executes the command on the other cells.


This remote execution is done over ssh, so the cells must already have ssh equivalency. If they don’t, you can use dcli -k to establish it.
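To make the idea concrete, here is a minimal Python sketch of the fan-out pattern: run one command on each cell over ssh and prefix every output line with the cell name, as DCLI's output does. This is not the DCLI source; it assumes passwordless ssh is already in place, and the function names and the injectable runner parameter are my own choices for illustration.

```python
import subprocess


def ssh_runner(user, host, command):
    """Run one command on one host over ssh and return its standard output."""
    result = subprocess.run(
        ["ssh", f"{user}@{host}", command],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def run_on_cells(cells, command, user="celladmin", runner=ssh_runner):
    """Run the command on every cell; prefix each output line with the cell name."""
    lines = []
    for cell in cells:
        for line in runner(user, cell, command).splitlines():
            lines.append(f"{cell}: {line}")
    return lines


if __name__ == "__main__":
    # Requires working ssh equivalency to the named cells.
    for line in run_on_cells(["prolcel01", "prolcel02"], "cellcli -e list cell", user="root"):
        print(line)
```

Passing the runner as a parameter lets you try the function with a stand-in that returns canned text before pointing it at real cells.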

Note the use of all the cell names with the -c option. What if you always select the same cell names? Providing the cell names every time is inconvenient and prone to mistakes. Another parameter, -g (group), lets you define a group of cells that can be addressed as a whole. First create a file called all_cells with the hostnames of the cells as shown below:

[celladmin@prolcel01 ~]# cat all_cells
prolcel01
prolcel02
prolcel03
prolcel04
prolcel05
prolcel06
prolcel07
prolcel08
prolcel09
prolcel10
prolcel11
prolcel12
prolcel13
prolcel14

With this file in place, you can pass this file to the group (-g) parameter of DCLI instead of the cell names:

[celladmin@prolcel01 ~]# dcli -g all_cells -l root "cellcli -e list cell"

If you want to see which cells are used as targets of the execution, you can use the -t option, for “target”. It displays the hostnames of the cells where the commands from DCLI will be executed. Suppose you made a mistake and placed only 8 of the cells in the all_cells file instead of all of them; this command will show it clearly.

[celladmin@prolcel01 ~]# dcli -g all_cells -t
Target cells: ['prolcel01', 'prolcel02', 'prolcel03', 'prolcel04', 'prolcel05', 'prolcel06', 'prolcel07', 'prolcel08']

Only cells up to #8 were displayed.
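The same check can be scripted. The Python sketch below reads a group file with one hostname per line, as in all_cells above, and reports which of the expected cells are missing from it. The expected names prolcel01 through prolcel14 follow this article's example, and the function names are hypothetical.

```python
def read_group_file(path):
    """Return the cell host names listed one per line in a DCLI group file."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def missing_cells(listed, expected):
    """Return the expected cells that do not appear in the group file."""
    listed_set = set(listed)
    return [cell for cell in expected if cell not in listed_set]


# The 14 cells of the full rack used as the example in this article.
EXPECTED = [f"prolcel{n:02d}" for n in range(1, 15)]

if __name__ == "__main__":
    cells = read_group_file("all_cells")
    print("Missing from all_cells:", missing_cells(cells, EXPECTED))
```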

But DCLI is not just for executing CellCLI commands. It’s a remote execution tool. Any command that can be executed from the command line can be given to the DCLI interface to be executed remotely.

One such command is vmstat. Suppose you want to get a quick pulse on all the cells by executing vmstat 2 2. You can pass it to the DCLI interface, as shown below:

[celladmin@prolcel01 ~]# dcli -l root -g all_cells vmstat 2 2
prolcel01: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
prolcel01: r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
prolcel01: 1  0      0 9077396 381232 1189992    0    0    13    12    0    0  1  0 99  0  0
prolcel01: 0  0      0 9121764 381232 1190032    0    0   260   564 1143 25691  3  1 96  0  0
prolcel02: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
prolcel02: r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
prolcel02: 1  0      0 9418412 350612 970424    0    0     7    10    0    0  0  0 99  0  0
prolcel02: 1  0      0 9417852 350612 970424    0    0     0    28 1047 23568  3  1 96  0  0
prolcel03: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
prolcel03: r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
prolcel03: 1  0      0 9312432 354948 1049780    0    0     7    12    0    0  0  0 99  0  0
prolcel03: 0  0      0 9313108 354948 1049892    0    0     0    60 1040 19046  0  0 100  0  0
prolcel04: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
prolcel04: r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
prolcel04: 0  0      0 9275392 360644 1080236    0    0    21    13    0    0  0  0 99  0  0
prolcel04: 0  0      0 9275788 360644 1080240    0    0   280    68 1093 17136  0  0 100  0  0
prolcel05: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
prolcel05: r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
prolcel05: 1  0      0 9200652 361064 1058544    0    0     6    12    0    0  0  0 99  0  0
prolcel05: 1  0      0 9200840 361064 1058544    0    0     0    52 1036 18000  0  4 96  0  0
prolcel06: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
prolcel06: r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
prolcel06: 0  0      0 9280912 354832 1057432    0    0     8    11    0    0  0  0 99  0  0
prolcel06: 0  0      0 9281388 354832 1057440    0    0    32    24 1040 18619  0  0 100  0  0
prolcel07: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
prolcel07: r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
prolcel07: 0  0      0 9795664 287140 877948    0    0     1     8    0    0  0  0 99  0  0
prolcel07: 0  0      0 9795136 287140 878004    0    0     0    36 1032 21026  0  1 99  0  0
prolcel08: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
prolcel08: r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
prolcel08: 0  0      0 9295716 320564 1043408    0    0     6    10    0    0  0  0 99  0  0
prolcel08: 0  0      0 9295740 320564 1043412    0    0     0     4 1014 21830  0  0 100  0  0

The output is useful in many ways. Not only did it save you from typing the command on every cell, it also displayed the output on one page, allowing you to compare the results across the cells. For instance, you may notice that the values for blocks out (bo) and blocks in (bi) are not similar across all the cells. They are higher for cell01 and cell04, indicating higher I/O activity on those cells.
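Comparing columns by eye gets tedious with more cells, so the comparison can be scripted. The Python sketch below takes DCLI-prefixed vmstat output like the listing above and extracts bi and bo from the last sample of each cell, locating the columns through the header row. The function name is hypothetical and the sample is a two-cell excerpt of the output shown in this article.

```python
SAMPLE = """\
prolcel01: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
prolcel01: r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
prolcel01: 1  0      0 9077396 381232 1189992    0    0    13    12    0    0  1  0 99  0  0
prolcel01: 0  0      0 9121764 381232 1190032    0    0   260   564 1143 25691  3  1 96  0  0
prolcel02: procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
prolcel02: r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
prolcel02: 1  0      0 9418412 350612 970424    0    0     7    10    0    0  0  0 99  0  0
prolcel02: 1  0      0 9417852 350612 970424    0    0     0    28 1047 23568  3  1 96  0  0
"""


def last_vmstat_sample(dcli_output):
    """Map each cell to the bi and bo values of its last vmstat sample."""
    columns = {}   # cell -> list of column names from the header row
    samples = {}   # cell -> {"bi": int, "bo": int}
    for line in dcli_output.splitlines():
        host, sep, rest = line.partition(": ")
        fields = rest.split()
        if not sep or not fields or fields[0] == "procs":
            continue  # skip malformed lines and the banner row
        if fields[0] == "r":
            columns[host] = fields  # header row naming each column
        elif host in columns and len(fields) == len(columns[host]):
            row = dict(zip(columns[host], fields))
            samples[host] = {"bi": int(row["bi"]), "bo": int(row["bo"])}
    return samples


if __name__ == "__main__":
    for cell, io in sorted(last_vmstat_sample(SAMPLE).items()):
        print(cell, io)
```

The first vmstat sample reports averages since boot, which is why the sketch keeps only the last sample per cell.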

Here is another example. Suppose you want to make sure that OSWatcher is running on all the cells. You can use the following command to check that.

[celladmin@prolcel01 ~]# dcli -l root -g all_cells ps -aef|grep OSWatcher
prolcel01: root       557 19331  0 Feb28 ?        00:15:08 /bin/ksh ./OSWatcherFM.sh 168
prolcel01: root      5886 24280  0 21:14 pts/0    00:00:00 grep OSWatcher
prolcel01: root     19331     1  0 Feb28 ?        01:08:33 /bin/ksh ./OSWatcher.sh 15 168 bzip2
prolcel02: root       554 17501  0 Feb28 ?        00:14:47 /bin/ksh ./OSWatcherFM.sh 168

To complete the discussion on remote execution, note that the command can be executed on any of the cells, and the current cell need not be among them. For instance, if you are on cell01 and want to check the status of a specific physical disk on cell08, you don’t need to log on to that cell. Right from cell01, you can use the dcli command with the -c option to execute on cell08, as shown below:

[celladmin@prolcel01 ~]# dcli -l root -c prolcel08 cellcli -e "list physicaldisk  20:8 detail"
prolcel08: name:                 20:8
prolcel08: deviceId:             11
prolcel08: diskType:             HardDisk
prolcel08: enclosureDeviceId:    20
prolcel08: errMediaCount:        0
prolcel08: errOtherCount:        1
prolcel08: foreignState:         false
prolcel08: luns:                 0_8
prolcel08: makeModel:            "SEAGATE ST360057SSUN600G"
prolcel08: physicalFirmware:     0805
prolcel08: physicalInsertTime:   2011-01-21T14:23:57-05:00
prolcel08: physicalInterface:    sas
prolcel08: physicalSerial:       XXXXXX
prolcel08: physicalSize:         558.9109999993816G
prolcel08: slotNumber:           8
prolcel08: status:               normal

Modification
Apart from checking values, DCLI is also useful for setting values across many cells at a time. For instance, suppose your SMTP mail server has changed and you need to set a different value of smtpServer for the cells. The alter cell smtpServer='…' command in CellCLI does that, but it needs to be executed on all the cells.

Instead, we can use DCLI to execute them across all the cells specified as a group in the file all_cells, as shown below.

[celladmin@prolcel01 ~]# dcli -l celladmin -g all_cells cellcli -e "alter cell smtpServer=\'smtp.proligence.com\'"
prolcel01: Cell cell01 successfully altered
prolcel02: Cell cell02 successfully altered
prolcel03: Cell cell03 successfully altered
prolcel04: Cell cell04 successfully altered
prolcel05: Cell cell05 successfully altered
prolcel06: Cell cell06 successfully altered
prolcel07: Cell cell07 successfully altered
prolcel08: Cell cell08 successfully altered

Use the same technique to update the other SMTP parameters of the cell as well:

[celladmin@prolcel01 ~]# dcli -l root -g all_cells cellcli -e "alter cell smtpToAddr=\'[email protected]\'"

Check all the settings in one place:

[celladmin@prolcel01 ~]# dcli -l root -g all_cells cellcli -e "list cell attributes name,smtpServer,smtpToAddr,smtpFrom,smtpFromAddr,smtpPort"
prolcel01: cell01        STCEXCPMB02.xxxx.xxxx   [email protected]   "Proligence Test Database Machine"    [email protected]         25
prolcel02: cell02        STCEXCPMB02.xxxx.xxxx   [email protected]   "Proligence Test Database Machine"    [email protected]         25
prolcel03: cell03        STCEXCPMB02.xxxx.xxxx   [email protected]   "Proligence Test Database Machine"    [email protected]         25
prolcel04: cell04        STCEXCPMB02.xxxx.xxxx   [email protected]   "Proligence Test Database Machine"    [email protected]         25
prolcel05: cell05        STCEXCPMB02.xxxx.xxxx   [email protected]   "Proligence Test Database Machine"    [email protected]         25
prolcel06: cell06        STCEXCPMB02.xxxx.xxxx   [email protected]   "Proligence Test Database Machine"    [email protected]         25
prolcel07: cell07        STCEXCPMB02.xxxx.xxxx   [email protected]   "Exadata Cell 07"       [email protected]       25

After everything is set up properly, you may want to make sure the email setup is correct by running the alter cell validate mail command across all the cells, as shown below.

[celladmin@prolcel01 ~]# dcli -l root -g all_cells cellcli -e "alter cell validate mail"

If you get the email from these cells, then the emails have been configured properly.

Abbreviated Output
You might see that the output from DCLI is rather large, since it shows the output from all the cells. It might be difficult to read, especially for scripting purposes. To address that problem, DCLI has another parameter, -n, which simply shows OK after a successful execution along with the cells where it was successfully executed. For instance, you can execute the email validation shown earlier with the -n option:

[celladmin@prolcel01 ~]# dcli -l root -g all_cells -n cellcli -e "alter cell validate mail"
OK: ['prolcel01', 'prolcel02', 'prolcel03', 'prolcel04', 'prolcel05', 'prolcel06', 'prolcel07', 'prolcel08']

Regular Expressions
Sometimes you want to check something quickly but want to see only the exceptions, not all the output. For instance, suppose you want to find out which grid disks are inactive. The following command will show the status of the grid disks.

[celladmin@prolcel01 ~]# dcli -l root -g all_cells cellcli -e "list griddisk"

But it will list all the disks, active or inactive. With hundreds of grid disks, the output may be overwhelming and less useful. Instead, you may want to see only the ones that are inactive. The -r option allows you to enter a regular expression; output lines that match it are suppressed. In this case, we want to see only the lines that do not report “active”. We can write a command like the following.

[celladmin@prolcel01 ~]# dcli -r '.* active' -l root -g all_cells cellcli -e "list griddisk"
.* active: ['prolcel01', 'prolcel02', 'prolcel03', 'prolcel04', 'prolcel05', 'prolcel06', 'prolcel07', 'prolcel08']
prolcel01: PRORECO_CD_11_cell01  inactive

It clearly showed that the grid disk PRORECO_CD_11_cell01 in cell prolcel01 is inactive. It’s a quick check on abnormal conditions.
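The effect of -r can be mimicked in a few lines of Python, which also shows why '.* active' hides the active disks but keeps the inactive one: the pattern requires a space directly before the word active. This is a sketch of the behavior as seen in the output above, not DCLI's actual implementation. In particular, applying the pattern from the start of the text after the cell-name prefix is my assumption, and the function name and the PRODATA sample lines are made up for illustration.

```python
import re

LINES = [
    "prolcel01: PRODATA_CD_00_cell01  active",
    "prolcel01: PRORECO_CD_11_cell01  inactive",
    "prolcel02: PRODATA_CD_00_cell02  active",
]


def suppress_matching(lines, pattern):
    """Split 'cell: text' lines into (cells with suppressed lines, remaining lines)."""
    regex = re.compile(pattern)
    matched_cells = []
    remaining = []
    for line in lines:
        cell, _, text = line.partition(": ")
        if regex.match(text):
            # Matching lines are hidden; remember which cell produced them.
            if cell not in matched_cells:
                matched_cells.append(cell)
        else:
            remaining.append(line)
    return matched_cells, remaining


if __name__ == "__main__":
    cells, rest = suppress_matching(LINES, r".* active")
    print(".* active:", cells)
    for line in rest:
        print(line)
```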

Command Scripts
You can place cellcli commands inside a script and call that script. Suppose you want to develop an error check script for CellCLI called err.dcl (the extension .dcl is not mandatory; it’s just convenient). Here is what the script looks like:

[celladmin@prolcel01 ~]# cat err.dcl
cellcli -e list physicaldisk attributes errCmdTimeoutCount,errHardReadCount,errHardWriteCount,errMediaCount,errOtherCount,errSeekCount,errorCount where disktype='HardDisk'


You saw earlier how to execute this script from CellCLI. You can execute this script from DCLI as well, using the -x option, as shown below:

[celladmin@prolcel01 ~]# dcli -l root -g all_cells -x err.dcl
prolcel01: 34:0          0       0
prolcel01: 34:1          0       0
prolcel01: 34:2          0       0
prolcel01: 34:3          0       0
… output truncated …
prolcel08: 20:6          0       0
prolcel08: 20:7          0       0
prolcel08: 20:8          0       1
prolcel08: 20:9          0       0
prolcel08: 20:10         0       0
prolcel08: 20:11         0       0

Note how the error is shown in prolcel08 but is buried under tons of other output, making it hard to read and easy to overlook. Here is where the -r option (described in the last subsection) comes in handy, to remove the lines that report zero errors, as shown below:

[celladmin@prolcel01 ~]# dcli -l root -g all_cells -r '.* 0' -x err.dcl
.* 0: ['prolcel01', 'prolcel02', 'prolcel03', 'prolcel04', 'prolcel05', 'prolcel06', 'prolcel07', 'prolcel08']
prolcel08: 0     1

This resultant output is short and simple.
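If you capture the unfiltered output instead, a short script can name the exact disk that has errors, which the abbreviated output no longer shows. The Python sketch below assumes lines shaped like the err.dcl listing above (cell name, disk name, then numeric counters) and flags any disk with a non-zero counter. The function name is hypothetical.

```python
SAMPLE = [
    "prolcel01: 34:0          0       0",
    "prolcel01: 34:1          0       0",
    "prolcel08: 20:7          0       0",
    "prolcel08: 20:8          0       1",
    "prolcel08: 20:9          0       0",
]


def disks_with_errors(dcli_lines):
    """Return (cell, disk, counters) for every disk with a non-zero error counter."""
    flagged = []
    for line in dcli_lines:
        cell, sep, rest = line.partition(": ")
        fields = rest.split()
        if not sep or len(fields) < 2:
            continue  # not a 'cell: disk counters...' line
        disk, counters = fields[0], fields[1:]
        if not all(value.isdigit() for value in counters):
            continue  # skip markers such as "output truncated"
        numbers = [int(value) for value in counters]
        if any(numbers):
            flagged.append((cell, disk, numbers))
    return flagged


if __name__ == "__main__":
    for cell, disk, numbers in disks_with_errors(SAMPLE):
        print(f"{cell}: physical disk {disk} reports {numbers}")
```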

Summary
In summary, the dcli command allows you to execute commands (including cellcli) on other cell servers. The general structure of the command is

# dcli <Option> <Command>

Here the command can be a CellCLI command, e.g. cellcli -e list cell, or a Linux command such as vmstat 2 2.

Here are the most commonly used options and what they do.

Option | Value Expected after the Option | Example Usage | What It Does
-c | Names of the cell(s) | -c dwhcel01,dwhcel02 | Executes the commands only on those cells.
-g | The file name where the cells are listed | -g all_cells | Executes the command on the cells mentioned in the file all_cells.
-l | The OS user that will execute the command | -l root | The default user is celladmin, but you can use any other user for remote ssh execution. Make sure the user has ssh equivalency across all the cells where you run this command.
-n | Not applicable | -n | Shows an abbreviated output instead of a long output from each command execution.
-r | A regular expression | -r '.* active' | Suppresses the output that matches the regular expression.
-t | Not applicable | -t | Displays the cells where the command will run.
-x | A script with executable commands | -x list.dcl | The script will be executed on the target cells.
-k | Names of the cell(s) | -k dwhcel01,dwhcel02 | Establishes the ssh user equivalency.
-f | A filename | -f run.dcl | Copies the files to the other cells but does not execute them. It's useful for copying files and executing them later.
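When you call dcli from your own scripts, it helps to assemble the arguments in one place. The Python sketch below builds an argument list using only options from the table above (-c, -g, -l, -n, -r, -x). The function and parameter names are my own, and it does not run anything; it only shows how the pieces fit together under the assumption that the remote command is passed as a single quoted argument, as in the first DCLI example in this article.

```python
import shlex


def build_dcli_argv(command=None, *, cells=None, group_file=None, user=None,
                    abbreviate=False, suppress_regex=None, script=None):
    """Assemble an argument list for dcli from the options covered in this article."""
    if not cells and not group_file:
        raise ValueError("name the target cells with cells (-c) or group_file (-g)")
    argv = ["dcli"]
    if cells:
        argv += ["-c", ",".join(cells)]   # explicit cell names
    if group_file:
        argv += ["-g", group_file]        # file listing the cells
    if user:
        argv += ["-l", user]              # remote OS user
    if abbreviate:
        argv.append("-n")                 # abbreviated output
    if suppress_regex:
        argv += ["-r", suppress_regex]    # hide matching output lines
    if script:
        argv += ["-x", script]            # script to execute on the cells
    if command:
        argv.append(command)              # remote command as one argument
    return argv


if __name__ == "__main__":
    argv = build_dcli_argv("cellcli -e list griddisk", group_file="all_cells",
                           user="root", suppress_regex=".* active")
    print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run on a cell where dcli is installed.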

Conclusion
The commands CellCLI and DCLI are used to manage the Exadata Storage Server, and they are most likely the only commands that you are not familiar with as a DBA. In this installment you saw the most useful commands and their example usage. In the next and final installment, you will learn how to use some of these and Oracle Enterprise Manager Grid Control to check on the various components and get detailed reports.

Arup Nanda ([email protected]) has been an Oracle DBA for more than 14 years, handling all aspects of database administration, from performance tuning to security and disaster recovery. He is an Oracle ACE Director and was Oracle Magazine's DBA of the Year in 2003.


Oracle Exadata Commands Reference

Part 4: Metrics and Reporting
by Arup Nanda

Learn how to use various metrics and reports available to measure the efficiency and effectiveness of your Exadata Database Machine environment.

(The purpose of this guide is educational; it is not intended to replace official Oracle-provided manuals or other documentation. The information in this guide is not validated by Oracle, is not supported by Oracle, and should only be used at your own risk.)

To keep a handle on the Oracle Exadata Database Machine’s reliability and efficiency you will need to measure several health metrics. Monitoring these metrics is likely to be your most common activity, but even so you would probably want a heads-up when something is amiss. In this final installment, you will learn how to check these metrics, either via the CellCLI interface or Oracle Enterprise Manager, and receive alerts based on certain conditions.

Metrics Using CellCLI
First, let’s see how to get the metrics through CellCLI. To check the current metrics, simply use the following command:

CellCLI> list metriccurrent

It will show all the metrics, which quickly becomes impossible to read. Instead you may want to focus on specific ones. For example, to display a specific metric called GD_IO_RQ_W_SM, use it as a modifier. (If you are wondering what this metric is about, you will learn that shortly.)

CellCLI> list metriccurrent gd_io_rq_w_sm
         GD_IO_RQ_W_SM   PRODATA_CD_00_cell01    2,665 IO requests
         GD_IO_RQ_W_SM   PRODATA_CD_01_cell01    5,710 IO requests
         GD_IO_RQ_W_SM   PRODATA_CD_02_cell01    2,454 IO requests
… output truncated …

The metric shows you the I/O requests to the different grid disks. While this may be good for a quick glance at a single metric, it does not tell you much else about each reading. To get a longer listing for each reading, use the detail option.

CellCLI> list metriccurrent gd_io_rq_w_sm detail
         name:                   GD_IO_RQ_W_SM
         alertState:             normal
         collectionTime:         2011-05-26T11:25:33-04:00
         metricObjectName:       PRODATA_CD_02_cell01
         metricType:             Cumulative
         metricValue:            2,456 IO requests
         objectType:             GRIDDISK

         name:                   GD_IO_RQ_W_SM
         alertState:             normal
         collectionTime:         2011-05-26T11:25:33-04:00
         metricObjectName:       PRODATA_CD_03_cell01
         metricType:             Cumulative
         metricValue:            2,982 IO requests
         objectType:             GRIDDISK

         name:                   GD_IO_RQ_W_SM
         alertState:             normal
         collectionTime:         2011-05-26T11:25:33-04:00
         metricObjectName:       PRODATA_CD_04_cell01
         metricType:             Cumulative
         metricValue:            1,631 IO requests
         objectType:             GRIDDISK
… output truncated …

The output shows the name of the metric, the object on which this metric is defined (e.g. PRODATA_CD_02_cell01), the type of that object (e.g. grid disk), when the metric was collected, and the current value, among others.
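Because metricValue mixes a number, thousands separators and a unit, scripts that process this output need to split it apart first. The Python sketch below parses detail output like the listing above into one record per reading and totals the values across grid disks. The function names are hypothetical, and treating each name: line as the start of a new reading is an assumption based on the listing shown here.

```python
SAMPLE = """\
         name:                   GD_IO_RQ_W_SM
         alertState:             normal
         collectionTime:         2011-05-26T11:25:33-04:00
         metricObjectName:       PRODATA_CD_02_cell01
         metricType:             Cumulative
         metricValue:            2,456 IO requests
         objectType:             GRIDDISK
         name:                   GD_IO_RQ_W_SM
         alertState:             normal
         collectionTime:         2011-05-26T11:25:33-04:00
         metricObjectName:       PRODATA_CD_03_cell01
         metricType:             Cumulative
         metricValue:            2,982 IO requests
         objectType:             GRIDDISK
         name:                   GD_IO_RQ_W_SM
         alertState:             normal
         collectionTime:         2011-05-26T11:25:33-04:00
         metricObjectName:       PRODATA_CD_04_cell01
         metricType:             Cumulative
         metricValue:            1,631 IO requests
         objectType:             GRIDDISK
"""


def parse_metric_value(text):
    """Turn '2,456 IO requests' into (2456.0, 'IO requests')."""
    number, _, unit = text.strip().partition(" ")
    return float(number.replace(",", "")), unit.strip()


def metric_records(detail_output):
    """Split detail output into one dict per reading; each starts at a 'name:' line."""
    records = []
    for line in detail_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and "output truncated" markers
        key, value = key.strip(), value.strip()
        if key == "name":
            records.append({})
        if records:
            records[-1][key] = value
    return records


if __name__ == "__main__":
    readings = metric_records(SAMPLE)
    total = sum(parse_metric_value(r["metricValue"])[0] for r in readings)
    print(len(readings), "grid disks,", total, "IO requests in total")
```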

The objects on which the metrics have been defined are wide ranging in types. For instance, here is the metric for rate of reception for I/O packets, which is defined on the object type Cell.

CellCLI> list metriccurrent n_nic_rcv_sec detail
         name:                   N_NIC_RCV_SEC
         alertState:             normal
         collectionTime:         2011-05-26T11:23:28-04:00
         metricObjectName:       cell01
         metricType:             Rate
         metricValue:            3.7 packets/sec
         objectType:             CELL

Attributes
What other information does the current metric show? To find out, use the following command.

CellCLI> describe metriccurrent
        name
        alertState
        collectionTime
        metricObjectName
        metricType
        metricValue
        objectType

Note the attribute called alertState, which is probably the most important for you. If the metric is within the limit defined as safe, the alertState will be set to normal. You can use this command to display the interesting attributes where the alert has been triggered, i.e. alertState is not normal:

CellCLI> list metriccurrent attributes name,metricObjectName,metricType,metricValue,objectType -
> where alertState != 'normal'

Alternatively you can select the current metrics on other types of objects as well. For instance, here is an example of checking the metrics on the interconnect.

CellCLI> list metriccurrent attributes name,metricObjectName,metricType,metricValue,alertState -
> where objectType = 'HOST_INTERCONNECT'
         N_MB_DROP               proddb01.xxxx.xxxx      Cumulative      0.0 MB  normal
         N_MB_DROP               proddb02.xxxx.xxxx      Cumulative      0.0 MB  normal
         N_MB_DROP               proddb03.xxxx.xxxx      Cumulative      0.0 MB  normal
         N_MB_DROP               proddb04.xxxx.xxxx      Cumulative      0.0 MB  normal
… output truncated …

As always, refer to the describe command to learn the order of the attributes, since the output does not have a heading. You can also skip naming the attributes and just use ALL, which shows all of them. Let’s see the metrics for a cell. Since we are running it on cell01, the metrics of only that cell are shown:

CellCLI> list metriccurrent attributes all where objectType = 'CELL'
         CL_BBU_CHARGE           normal  2011-05-26T11:37:10-04:00       cell01  Instantaneous   93.0 %                  CELL
         CL_BBU_TEMP             normal  2011-05-26T11:37:10-04:00       cell01  Instantaneous   41.0 C                  CELL
         CL_CPUT                 normal  2011-05-26T11:37:10-04:00       cell01  Instantaneous   2.4 %                   CELL
         CL_FANS                 normal  2011-05-26T11:37:10-04:00       cell01  Instantaneous   12                      CELL
         CL_MEMUT                normal  2011-05-26T11:37:10-04:00       cell01  Instantaneous   41 %                    CELL
         CL_RUNQ                 normal  2011-05-26T11:37:10-04:00       cell01  Instantaneous   0.7                     CELL
         CL_TEMP                 normal  2011-05-26T11:37:10-04:00       cell01  Instantaneous   25.0 C                  CELL
         IORM_MODE               normal  2011-05-26T11:36:34-04:00       cell01  Instantaneous   0                       CELL
         N_NIC_NW                normal  2011-05-26T11:37:28-04:00       cell01  Instantaneous   3                       CELL
         N_NIC_RCV_SEC           normal  2011-05-26T11:37:28-04:00       cell01  Rate            3.2 packets/sec         CELL
         N_NIC_TRANS_SEC         normal  2011-05-26T11:37:28-04:00       cell01  Rate            3.6 packets/sec         CELL
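Since the single-line output has no heading, a script has to attach the attribute names itself, in the order that describe metriccurrent reports them. The Python sketch below does that for one row of the listing above. It assumes the columns are separated by a tab or by at least two spaces, which holds for the output as reproduced in this article but is not something CellCLI guarantees; the function name is hypothetical.

```python
import re

# Attribute order as printed by "describe metriccurrent" in this article.
METRICCURRENT_ATTRIBUTES = [
    "name", "alertState", "collectionTime", "metricObjectName",
    "metricType", "metricValue", "objectType",
]

ROW = ("         CL_BBU_CHARGE           normal  2011-05-26T11:37:10-04:00"
       "       cell01  Instantaneous   93.0 %                  CELL")


def label_columns(line, attributes=METRICCURRENT_ATTRIBUTES):
    """Pair each column of an 'attributes all' row with its attribute name."""
    # Assumed separator: a tab, or a run of two or more spaces/tabs.
    fields = [f.strip() for f in re.split(r"[\t ]{2,}|\t", line.strip()) if f.strip()]
    if len(fields) != len(attributes):
        raise ValueError(f"expected {len(attributes)} columns, found {len(fields)}")
    return dict(zip(attributes, fields))


if __name__ == "__main__":
    row = label_columns(ROW)
    print(row["name"], row["metricValue"], row["alertState"])
```

A value with a single embedded space, such as 93.0 % or 3.2 packets/sec, stays in one column because only wider gaps are treated as separators.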

Let’s examine the meaning of these metrics in some detail:

CL_BBU_CHARGE Battery charge on the disk controller

CL_BBU_TEMP Temperature of the disk controller battery

CL_CPUT CPU utilization

CL_FANS Number of working fans

CL_FSUT Percentage utilization of file systems

CL_MEMUT Percentage of memory in use

CL_RUNQ The run queue

CL_TEMP Temperature of the cell

IORM_MODE I/O Resource Management mode in effect

N_NIC_NW Number of interconnects *not* working

N_NIC_RCV_SEC Rate of reception of I/O packets

N_NIC_TRANS_SEC Rate of transmission of I/O packets

Speaking of current metrics, what are the names? You saw a few of them previously: N_MB_DROP, etc. To get a complete list, use the following command:

CellCLI> list metricdefinition

         CD_IO_BY_R_LG

         CD_IO_BY_R_LG_SEC

         CD_IO_BY_R_SM

         CD_IO_BY_R_SM_SEC

         CD_IO_BY_W_LG

         CD_IO_BY_W_LG_SEC

         CD_IO_BY_W_SM


         CD_IO_BY_W_SM_SEC

         CD_IO_ERRS

         CD_IO_ERRS_MIN

         CD_IO_LOAD

         CD_IO_RQ_R_LG

         CD_IO_RQ_R_LG_SEC

         CD_IO_RQ_R_SM

         CD_IO_RQ_R_SM_SEC

         CD_IO_RQ_W_LG

         CD_IO_RQ_W_LG_SEC

         CD_IO_RQ_W_SM

         CD_IO_RQ_W_SM_SEC

         CD_IO_ST_RQ

         CD_IO_TM_R_LG

         CD_IO_TM_R_LG_RQ

         CD_IO_TM_R_SM

         CD_IO_TM_R_SM_RQ

         CD_IO_TM_W_LG

         CD_IO_TM_W_LG_RQ

         CD_IO_TM_W_SM

         CD_IO_TM_W_SM_RQ

         CG_FC_IO_BY_SEC

         CG_FC_IO_RQ

         CG_FC_IO_RQ_SEC

         CG_FD_IO_BY_SEC

         CG_FD_IO_LOAD

         CG_FD_IO_RQ_LG

         CG_FD_IO_RQ_LG_SEC

         CG_FD_IO_RQ_SM

         CG_FD_IO_RQ_SM_SEC

         CG_IO_BY_SEC

         CG_IO_LOAD

         CG_IO_RQ_LG

         CG_IO_RQ_LG_SEC

         CG_IO_RQ_SM

         CG_IO_RQ_SM_SEC

         CG_IO_UTIL_LG

         CG_IO_UTIL_SM

         CG_IO_WT_LG

         CG_IO_WT_LG_RQ

         CG_IO_WT_SM

         CG_IO_WT_SM_RQ

         CL_BBU_CHARGE

         CL_BBU_TEMP

         CL_CPUT

         CL_FANS

         CL_FSUT

         CL_MEMUT

         CL_RUNQ

         CL_TEMP

         CT_FC_IO_BY_SEC

         CT_FC_IO_RQ

         CT_FC_IO_RQ_SEC

         CT_FD_IO_BY_SEC

         CT_FD_IO_LOAD

         CT_FD_IO_RQ_LG

         CT_FD_IO_RQ_LG_SEC

         CT_FD_IO_RQ_SM


         CT_FD_IO_RQ_SM_SEC

         CT_IO_BY_SEC

         CT_IO_LOAD

         CT_IO_RQ_LG

         CT_IO_RQ_LG_SEC

         CT_IO_RQ_SM

         CT_IO_RQ_SM_SEC

         CT_IO_UTIL_LG

         CT_IO_UTIL_SM

         CT_IO_WT_LG

         CT_IO_WT_LG_RQ

         CT_IO_WT_SM

         CT_IO_WT_SM_RQ

         DB_FC_IO_BY_SEC

         DB_FC_IO_RQ

         DB_FC_IO_RQ_SEC

         DB_FD_IO_BY_SEC

         DB_FD_IO_LOAD

         DB_FD_IO_RQ_LG

         DB_FD_IO_RQ_LG_SEC

         DB_FD_IO_RQ_SM

         DB_FD_IO_RQ_SM_SEC

         DB_IO_BY_SEC

         DB_IO_LOAD

         DB_IO_RQ_LG

         DB_IO_RQ_LG_SEC

         DB_IO_RQ_SM

         DB_IO_RQ_SM_SEC

         DB_IO_UTIL_LG

         DB_IO_UTIL_SM

         DB_IO_WT_LG

         DB_IO_WT_LG_RQ

         DB_IO_WT_SM

         DB_IO_WT_SM_RQ

         FC_BYKEEP_OVERWR

         FC_BYKEEP_OVERWR_SEC

         FC_BYKEEP_USED

         FC_BY_USED

         FC_IO_BYKEEP_R

         FC_IO_BYKEEP_R_SEC

         FC_IO_BYKEEP_W

         FC_IO_BYKEEP_W_SEC

         FC_IO_BY_R

         FC_IO_BY_R_MISS

         FC_IO_BY_R_MISS_SEC

         FC_IO_BY_R_SEC

         FC_IO_BY_R_SKIP

         FC_IO_BY_R_SKIP_SEC

         FC_IO_BY_W

         FC_IO_BY_W_SEC

         FC_IO_ERRS

         FC_IO_RQKEEP_R

         FC_IO_RQKEEP_R_MISS

         FC_IO_RQKEEP_R_MISS_SEC

         FC_IO_RQKEEP_R_SEC

         FC_IO_RQKEEP_R_SKIP

         FC_IO_RQKEEP_R_SKIP_SEC

         FC_IO_RQKEEP_W

         FC_IO_RQKEEP_W_SEC

         FC_IO_RQ_R

         FC_IO_RQ_R_MISS

         FC_IO_RQ_R_MISS_SEC

         FC_IO_RQ_R_SEC

         FC_IO_RQ_R_SKIP

         FC_IO_RQ_R_SKIP_SEC

         FC_IO_RQ_W

         FC_IO_RQ_W_SEC

         GD_IO_BY_R_LG

         GD_IO_BY_R_LG_SEC

         GD_IO_BY_R_SM

         GD_IO_BY_R_SM_SEC

         GD_IO_BY_W_LG

         GD_IO_BY_W_LG_SEC

         GD_IO_BY_W_SM

         GD_IO_BY_W_SM_SEC

         GD_IO_ERRS

         GD_IO_ERRS_MIN

         GD_IO_RQ_R_LG

         GD_IO_RQ_R_LG_SEC

         GD_IO_RQ_R_SM

         GD_IO_RQ_R_SM_SEC

         GD_IO_RQ_W_LG

         GD_IO_RQ_W_LG_SEC

         GD_IO_RQ_W_SM

         GD_IO_RQ_W_SM_SEC

         IORM_MODE

         N_MB_DROP

         N_MB_DROP_SEC

         N_MB_RDMA_DROP

         N_MB_RDMA_DROP_SEC

         N_MB_RECEIVED

         N_MB_RECEIVED_SEC

         N_MB_RESENT

         N_MB_RESENT_SEC

         N_MB_SENT

         N_MB_SENT_SEC

         N_NIC_NW

         N_NIC_RCV_SEC

         N_NIC_TRANS_SEC

         N_RDMA_RETRY_TM

You can use any of these metrics in the metricType.

I am sure your next question is about the description of these metrics. To learn more about a metric, issue the command with the metric name and the detail modifier:

CellCLI> list metricdefinition cl_cput detail
         name:                   CL_CPUT
         description:            "Cell CPU Utilization is the percentage of time over the previous minute that the system CPUs were not idle (from /proc/stat)."
         metricType:             Instantaneous
         objectType:             CELL
         unit:                   %
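If you capture a detail listing like this in a script, each line is an attribute name, a colon, and a value. Here is a minimal Python sketch of reading it into a dictionary, assuming the output has been saved as text with one attribute per line. The function name parse_detail is made up for this illustration and is not part of CellCLI.

```python
def parse_detail(text):
    """Turn 'name: value' lines from a CellCLI detail listing into a dict."""
    record = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip the command echo and blank lines
        # Split on the first colon only, so values that contain colons survive.
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip().strip('"')
    return record

sample = """\
         name:                   CL_CPUT
         description:            "Cell CPU Utilization is the percentage of time over the previous minute that the system CPUs were not idle (from /proc/stat)."
         metricType:             Instantaneous
         objectType:             CELL
         unit:                   %
"""
info = parse_detail(sample)
print(info["name"], info["objectType"], info["unit"])
```

A listing with several records would need one dictionary per record; this sketch handles a single record only.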

Object Type

Not all metrics are available on all objects. For instance the temperature is a metric for cells but not the disks.

Note the objectType attribute in the output above; it shows the type of the object. You can learn about metrics on a specific type of object. First, see all the attributes you can choose:

CellCLI> describe metricdefinition
        name
        description
        metricType
        objectType
        persistencePolicy
        unit

Knowing these attributes lets you use the attributes all keywords and interpret the columns of the output.

CellCLI> list metricdefinition attributes all where objecttype='CELL'
         CL_BBU_CHARGE    "Disk Controller Battery Charge"    Instantaneous   CELL    %
         CL_BBU_TEMP      "Disk Controller Battery Temperature"    Instantaneous   CELL
         CL_CPUT          "Cell CPU Utilization is the percentage of time over the previous minute that the system CPUs were not idle (from /proc/stat)."    Instantaneous   CELL    %
         CL_FANS          "Number of working fans on the cell"    Instantaneous   CELL    Number
         CL_MEMUT         "Percentage of total physical memory on the cell that is currently used"    Instantaneous   CELL    %
         CL_RUNQ          "Average number (over the preceding minute) of processes in the Linux run queue marked running or uninterruptible (from /proc/loadavg)."    Instantaneous   CELL    Number
         CL_TEMP          "Temperature (Celsius) of the server, provided by the BMC"    Instantaneous   CELL    C
         IORM_MODE        "I/O Resource Manager objective for the cell"    Instantaneous   CELL    Number
         N_NIC_NW         "Number of non-working interconnects"    Instantaneous   CELL    Number
         N_NIC_RCV_SEC    "Total number of IO packets received by interconnects per second"    Rate            CELL    packets/sec
         N_NIC_TRANS_SEC  "Total number of IO packets transmitted by interconnects per second"    Rate            CELL    packets/sec
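Because the descriptions are quoted and contain spaces, a plain whitespace split breaks these rows apart. Python's shlex module keeps a quoted string together, so it can recover the columns. This is a sketch only: it assumes each row sits on one line, that the columns are name, description, metricType, objectType and unit as in the listing above, and that descriptions contain no stray quote characters. The function name parse_definition_row is hypothetical.

```python
import shlex

def parse_definition_row(line):
    """Split one metricdefinition row, keeping the quoted description intact."""
    fields = shlex.split(line)
    name, description, metric_type, object_type = fields[:4]
    # Some rows (CL_BBU_TEMP in the listing above) show no unit column.
    unit = fields[4] if len(fields) > 4 else ""
    return {
        "name": name,
        "description": description,
        "metricType": metric_type,
        "objectType": object_type,
        "unit": unit,
    }

rows = [
    'CL_FANS          "Number of working fans on the cell"    Instantaneous   CELL    Number',
    'CL_BBU_TEMP      "Disk Controller Battery Temperature"    Instantaneous   CELL',
]
for row in rows:
    parsed = parse_definition_row(row)
    print(parsed["name"], "|", parsed["description"], "|", parsed["unit"])
```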

Using this command you can get the description of any metric. Some of the metric values are 0. Let’s see all the metrics of one grid disk whose values are greater than 0:

CellCLI> list metriccurrent attributes all where objectType = 'GRIDDISK' -
> and metricObjectName = 'PRODATA_CD_09_cell01' -
> and metricValue > 0
       GD_IO_BY_R_LG   normal  2011-05-26T11:38:34-04:00  PRODATA_CD_09_cell01    Cumulative      3.9MB               GRIDDISK
       GD_IO_BY_R_SM   normal  2011-05-26T11:38:34-04:00  PRODATA_CD_09_cell01    Cumulative      29.3MB              GRIDDISK
       GD_IO_BY_W_LG   normal  2011-05-26T11:38:34-04:00  PRODATA_CD_09_cell01    Cumulative      797MB               GRIDDISK
       GD_IO_BY_W_SM   normal  2011-05-26T11:38:34-04:00  PRODATA_CD_09_cell01    Cumulative      73.3MB              GRIDDISK
       GD_IO_RQ_R_LG   normal  2011-05-26T11:38:34-04:00  PRODATA_CD_09_cell01    Cumulative      4 IO requests       GRIDDISK
       GD_IO_RQ_R_SM   normal  2011-05-26T11:38:34-04:00  PRODATA_CD_09_cell01    Cumulative      1,504 IO requests   GRIDDISK
       GD_IO_RQ_W_LG   normal  2011-05-26T11:38:34-04:00  PRODATA_CD_09_cell01    Cumulative      3,230 IO requests   GRIDDISK
       GD_IO_RQ_W_SM   normal  2011-05-26T11:38:34-04:00  PRODATA_CD_09_cell01    Cumulative      3,209 IO requests   GRIDDISK

Current metrics are like the pulse of the cell – they give you the health and effectiveness of the cell at that time.
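The metric values in this listing mix a number with a unit ("3.9MB", "1,504 IO requests"), so a script that wants to compare or chart them has to separate the two first. Here is a small Python sketch of that step. The function name parse_metric_value is made up for this example, and the pattern only covers the value shapes seen in the output above.

```python
import re

def parse_metric_value(raw):
    """Split a metricValue such as '1,504 IO requests' into (number, unit)."""
    match = re.match(r"\s*([\d,]+(?:\.\d+)?)\s*(.*)", raw)
    if match is None:
        raise ValueError(f"unrecognized metric value: {raw!r}")
    # Thousands separators are removed before converting to a number.
    number = float(match.group(1).replace(",", ""))
    return number, match.group(2).strip()

for raw in ["3.9MB", "797MB", "4 IO requests", "1,504 IO requests"]:
    print(parse_metric_value(raw))
```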

Metrics History

Sometimes you may be interested in the history of the metrics. For instance you may want to get an idea about the trending of the values. Use the following command:

CellCLI> list metrichistory

This returns everything, and the output is usually too large to be practical. Instead, you may want to examine the metrics of a specific object type only, since metrics are highly context dependent. Here is an example that shows the history of all the metric values for the cells.

CellCLI> list metrichistory where objectType = 'CELL'
         IORM_MODE               cell01  0                       2011-05-26T12:59:39-04:00
         CL_BBU_TEMP             cell01  41.0 C                  2011-05-26T13:00:10-04:00
         CL_CPUT                 cell01  0.6 %                   2011-05-26T13:00:10-04:00
         CL_FANS                 cell01  12                      2011-05-26T13:00:10-04:00
         CL_MEMUT                cell01  41 %                    2011-05-26T13:00:10-04:00

Don’t know what the columns are for? Just give the command describe metrichistory to see the attributes.

While the output shows the history of metrics, it’s a little difficult to parse. You may want to examine a single statistic, not a bunch of them. The following example shows the history of the temperature of the cell.

CellCLI> list metrichistory where objectType = 'CELL' -
> and name = 'CL_TEMP'
         CL_TEMP         cell01  25.0 C  2011-05-26T13:00:10-04:00
         CL_TEMP         cell01  25.0 C  2011-05-26T13:01:10-04:00
         CL_TEMP         cell01  25.0 C  2011-05-26T13:02:10-04:00
         CL_TEMP         cell01  25.0 C  2011-05-26T13:03:10-04:00
         CL_TEMP         cell01  25.0 C  2011-05-26T13:04:10-04:00
         CL_TEMP         cell01  25.0 C  2011-05-26T13:05:10-04:00
… output truncated …
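To look at a trend rather than a wall of rows, the history can be read into a script and summarized. The Python sketch below assumes each row is laid out as in the output above (metric name, object name, value with an optional unit, then the timestamp) and that Python 3.7 or later is available for datetime.fromisoformat. The function name parse_history_line is hypothetical.

```python
from datetime import datetime

def parse_history_line(line):
    """Parse one metrichistory row: name, object, value [unit], timestamp."""
    fields = line.split()
    name, obj, timestamp = fields[0], fields[1], fields[-1]
    value = float(fields[2].replace(",", ""))
    unit = " ".join(fields[3:-1])  # empty when the row carries no unit
    return name, obj, value, unit, datetime.fromisoformat(timestamp)

rows = """\
         CL_TEMP         cell01  25.0 C  2011-05-26T13:00:10-04:00
         CL_TEMP         cell01  25.0 C  2011-05-26T13:01:10-04:00
         CL_TEMP         cell01  25.0 C  2011-05-26T13:02:10-04:00
"""
history = [parse_history_line(line) for line in rows.splitlines() if line.strip()]
values = [value for _, _, value, _, _ in history]
span = history[-1][4] - history[0][4]
print("min", min(values), "max", max(values), "over", span)
```

With the minimum, maximum and time span in hand, it is easy to see whether a reading such as the cell temperature has moved at all over the period.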

Two often-forgotten features of CellCLI are the -x option, which suppresses the banner, and the -n option, which suppresses the command line. The -e option supplies the command to execute.

# cellcli -x -n -e "list metrichistory where objectType = 'CELL' and name = 'CL_TEMP'"
         CL_TEMP         prodcel01       24.0 C  2011-05-27T02:00:47-04:00
         CL_TEMP         prodcel01       24.0 C  2011-05-27T02:01:47-04:00
         CL_TEMP         prodcel01       24.0 C  2011-05-27T02:02:47-04:00
… output truncated …

This output can be taken straight to a spreadsheet program for charting or other types of analysis.
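Spreadsheet programs import comma-separated files most reliably, so one way to hand over this output is to convert it to CSV first. The following is a minimal Python sketch, assuming the banner-free output has been captured as text. The function name history_to_csv and the column headings are choices made for this example, not something CellCLI produces.

```python
import csv
import io

def history_to_csv(text):
    """Convert whitespace-separated metrichistory rows to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "object", "value", "timestamp"])
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or short lines
        # Everything between the object name and the timestamp is the value and its unit.
        writer.writerow([fields[0], fields[1], " ".join(fields[2:-1]), fields[-1]])
    return out.getvalue()

sample = """\
         CL_TEMP         prodcel01       24.0 C  2011-05-27T02:00:47-04:00
         CL_TEMP         prodcel01       24.0 C  2011-05-27T02:01:47-04:00
"""
print(history_to_csv(sample))
```

In practice the text would come from a file to which the cellcli command's output was redirected, and the result would be written to a .csv file instead of printed.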

Alerts

Now that you have learned about metrics, let’s see how you can use them to alert you when something happens. The Exadata Database Machine comes with a lot of predefined alerts out of the box. To see them, use the following command. To get more information on them, use the detail modifier.

CellCLI> list alertdefinition detail
         name:                   ADRAlert
         alertShortName:         ADR
         alertSource:            "Automatic Diagnostic Repository"
         alertType:              Stateless
         description:            "Incident Alert"
         metricName:

         name:                   HardwareAlert
         alertShortName:         Hardware
         alertSource:            Hardware
         alertType:              Stateless
         description:            "Hardware Alert"
         metricName:

         name:                   StatefulAlert_CD_IO_ERRS_MIN
         alertShortName:         CD_IO_ERRS_MIN
         alertSource:            Metric
         alertType:              Stateful
         description:            "Threshold Alert"
         metricName:             CD_IO_ERRS_MIN

         name:                   StatefulAlert_CG_FC_IO_BY_SEC
         alertShortName:         CG_FC_IO_BY_SEC
         alertSource:            Metric
         alertType:              Stateful
         description:            "Threshold Alert"
         metricName:             CG_FC_IO_BY_SEC
… output truncated …

The alerts are based on metrics explained earlier. For instance, the alert StatefulAlert_CG_FC_IO_BY_SEC is based on the metric CG_FC_IO_BY_SEC. There are a few alerts which are not based on metrics. Here is how you can find them:

CellCLI> list alertdefinition attributes all where alertSource!='Metric'
         ADRAlert                ADR             "Automatic Diagnostic Repository"       Stateless       "Incident Alert"
         HardwareAlert           Hardware        Hardware                                Stateless       "Hardware Alert"
         Stateful_HardwareAlert  Hardware        Hardware                                Stateful        "Hardware Stateful Alert"
         Stateful_SoftwareAlert  Software        Software                                Stateful        "Software Stateful Alert"

The next step is to define an alert based on a threshold. Here is how you can define a threshold on the metric DB_IO_RQ_SM_SEC for a database called PRODB. You want the alert to fire when the value is more than 10.

CellCLI> create threshold db_io_rq_sm_sec.prodb comparison=">", critical=10
Threshold DB_IO_RQ_SM_SEC.PRODB successfully created
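The meaning of the comparison and critical attributes can be pictured with a few lines of Python. This is only a model of the rule "fire when the value is more than 10"; it is not how the cell evaluates thresholds internally. The function name crosses_critical and the set of comparison symbols in the table are assumptions made for this sketch.

```python
import operator

# Comparison symbols assumed for this sketch; check the CellCLI documentation for the exact set.
COMPARISONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}

def crosses_critical(value, comparison, critical):
    """Return True when `value <comparison> critical` holds."""
    return COMPARISONS[comparison](value, critical)

# With comparison=">" and critical=10, only values above 10 qualify.
for observed in (5, 10, 11):
    print(observed, crosses_critical(observed, ">", 10))
```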

When an alert triggers, the cell acts according to its notification settings – it sends an email, an SNMP trap, or both. Here is an example of an alert received via email. It shows that the cell failed to reach either the NTP or DNS server.

You can modify the threshold:

CellCLI> alter threshold DB_IO_RQ_SM_SEC.PRODB comparison=">", critical=100
Threshold DB_IO_RQ_SM_SEC.PRODB successfully altered

And, when not needed, you can drop the threshold.

CellCLI> drop threshold DB_IO_RQ_SM_SEC.PRODB
Threshold DB_IO_RQ_SM_SEC.PRODB successfully dropped

When you drop the threshold, the alert (if generated earlier) is cleared and a new alert is generated to inform you that the threshold was cleared. Here is an example email with that information.

Using Grid Control

As you learned previously, you can administer the Exadata Database Machine in several ways: using the command-line tool CellCLI (or DCLI to reach a number of cells at the same time), using SRVCTL and CRSCTL, and via plain-old SQL commands. But the easiest approach, hands down, is to use Oracle Enterprise Manager Grid Control. The graphical interface simply makes the command interface intuitive and easy to read. If you already have a Grid Control infrastructure, it makes the decision very easy – you just add the Database Machine to it. If you don’t have Grid Control, perhaps you should seriously think about it now.

Setup

To manage the Database Machine via Grid Control, you need:

A plugin for storage server management for GC (download from http://www.oracle.com/technetwork/oem/grid-control/downloads/system-monitoring-connectors-082031.html)

An agent installed in each of the compute nodes of DBM. No agent need be installed on the storage cells.

Configured access to the cells from the Oracle Management Server console machine

Let’s examine each requirement in detail. The plugin is a jarfile you download and keep on the OMS server. The Grid Control agents are installed on the database compute nodes the same way you would install them on a normal database server. You can either push the agents from the Grid Control console, or download the agent software to the database compute nodes and install it there. During the installation you will be asked for the address and port number of the OMS server, which will complete the installation.

The storage plug-in is different. This is not an agent on the storage cells; rather it’s installed as a part of the OMS server. So you need to download the plugin (which is a file with .jar extension) to the machine on which you launched the browser (typically your desktop), not the storage cells. After that, fire up the Grid Control browser and follow the steps shown below.

1. Go to the Setup screen, shown below:
2. From the menu on the left, choose Management Plug-Ins.
3. You need to import the plug-in to OMS. Click on the Import button.
4. It will ask for a file name. Click on the button Browse, which will open the file explorer.
5. Select the jarfile you just downloaded and click OK.
6. The jarfile is actually an archive of different plug-ins inside. Choose the Exadata Storage Server Plugin.
7. Press the OK button.
8. You will be asked to enter preferred credentials. Make sure the userids and passwords are correct.
9. Click on the icon marked Deploy.
10. Click on the button Add Agents.
11. Click on Next and then Finish.
12. From the top bar menu (shown below) click on the hyperlink Agents.
13. Select the agent where you deployed the system management plug-in.
14. Go to the Management Agents page.
15. Select Oracle Exadata Storage Server target type from the drop down list.
16. Click on Add.
17. The screen will ask for the management IP of the storage server. Enter the IPs.
18. Click on the Test Connection button to ensure everything is working.
19. Click on the OK button.

That’s it; the storage servers are now ready to be monitored via Grid Control.

Basic Information

Once configured, your Exadata Database Machine shows up in Grid Control. The database compute nodes show up as normal database servers and the cluster shows up as a normal cluster database; there is nothing Exadata-specific in nature on those screens.

You may already be familiar with the normal Oracle RAC database management features on Grid Control, which apply to Exadata compute nodes as well. In this article we will assume you are familiar with database management via Grid Control.

The real benefit comes in looking at the storage cells as an alternative to CellCLI. To check storage cells, go to Homepage > Targets > All Targets. In the Search dropdown box choose Oracle Exadata Storage Server and press Go. The resulting screen is similar to the picture shown below. Note the names have been erased.

This is a full rack Exadata so there are 14 cells, or storage servers named with a prefix _Cell01 through _Cell14. Clicking on each of the hyperlinked cell names takes you to the information screen of the respective cell (also known as Storage Server). Let’s click on cell#2. It brings up the cell details similar to a screen shown below:

This screen shows you three basic but very important metrics:

CPU Utilization %-age

Memory Utilization %-age

Temperature of the server

The screen gives only a glimpse of these three metrics for the last 24 hours or so. Note that memory utilization has been constant, which is pretty typical. Since this is a storage cell, the only software that runs here is Exadata Storage Server, not the database sessions which tend to fluctuate in memory usage. The temperature has been pretty steady but spiked a little at around 6AM. The CPU however may show a wide fluctuation because of the demand on the Exadata Storage Server software as a result of cell offloading (described in detail in Part 1). If you click on CPU Utilization, you will see a dedicated screen for CPU utilization as shown below.

It shows a more detailed graph of CPU utilization on a more granular scale. You can choose the refresh frequency of the metric from the drop-down menu View Data near the top right corner. Here I have chosen "Real Time: 5 Minute Refresh." It gets the data in real time but refreshes only every 5 minutes. That is why the graph is not smooth. You can also choose a longer period such as “last 7 days” or even “last 31 days” but that information is no longer real time.

You can also choose the same information from Reports (described later in this installment). Clicking on the Temperature and Memory graphs will provide similar data.

Cell Information

Since the cell contains hard drives, which are mechanical devices and generate a lot of heat, it will fail if the temperature is too high. The cells contain fans to force the hot air out and to reduce the temperature. There are 12 fans in each cell. Are they working? There are two power supplies in the cell for redundancy. Are both working? To know the answer to these questions and get other information on the cell, click on the hyperlink View Configuration to bring up a screen like the following.

The information shown here is pretty similar to what you would have seen in the list cell command in CellCLI. Let’s examine some of the interesting information on this screen and why you would like to know about them:

Status – of course you want to know if this cell is even online and servicing traffic.

Fan Count – number of working fans

Power Count – the number of working power supplies (there should be two)

SMTP Server – the email server that is used to send emails. Very important for alerts.

IP Address – IP of the cell

Interconnect Count – the number of cell interconnects. Default is three.

CPU Count – the number of CPU cores.

This is just a summary of the cell. The objective of the cell is to provide a storage platform for the Exadata Database Machine. So far, we have not seen anything about the actual storage in the cell. To know about the storage in the cells, look further down the page. There are five sections:

Grid disks

Cell disks

LUNs

Physical disks

I/O Resource Manager

Note the set of 5 hyperlinks toward the top of the screen that will direct you to the appropriate section: 

Instead of looking at the sections in the way they have been laid out, I suggest you look in the order they are created. For instance a cell contains physical disks, from which LUNs are created, from which cell disks are created, from which grid disks are created. Finally, IORM (I/O Resource Manager) is used on the disks. Let’s look at them in the same logical order.

Physical Disks

To know about the physical disks available in the cell, this is the section to look at. Here is a partial screenshot.

If this looks familiar, it’s because this is formatted output from the CellCLI command LIST PHYSICALDISK DETAIL or LIST PHYSICALDISK ATTRIBUTES. The important columns here to note are:

LUN/s – the LUN which is created on this physical disk

Error Count – if there are any errors on these physical disks

Physical Interface – SAS for hard disks, ATA for flash disks. Some older Exadata Database Machines may have SATA hard disks and this column will show “sas”.

Size – The size in GB of the physical disks

Insert Time – the date and timestamp when these records were recorded in the repository.

LUNs

Next, let’s look at the LUNs in the cell by clicking on the Cell LUN Configuration hyperlink. 

Again, this is the graphical representation of the LIST LUN ATTRIBUTES command. Note that the flash drives are also listed here, toward the bottom of the output. They are named with a prefix FD in the Cell Disk, unlike “CD” in case of regular SAS hard disks. LUNs are created from hard disk partitions, which are presented to the host (the storage cell) as a device. The device ID shows up here, along with the cell disks created on the LUN. The two columns that you should pay attention to are “Status” and “Error Count”.

If you see just 10 LUNs, click on the dropdown box and select all 28.

Cell Disks

Cell disks are created from the physical disks. In CellCLI, you got the list by using the LIST CELLDISK command. In Grid Control, clicking on the hyperlink Celldisk Configuration will show the list of cell disks in a graphical manner, similar to the one shown below. As in the previous section, click on the drop-down list and choose “All 28” to show all 28 cell disks on the same page. Again, you should pay attention to the “Error Count” and “Status” columns.

Grid Disks

Grid Disks are created from the cell disks, and are used to build the ASM diskgroups. Obviously, these are the actual building blocks of the storage the ASM instance is aware of. Typically there is a one-to-one relationship between cell disk and grid disk, but if there is not one in your case, you may see less available space in your ASM diskgroup. Clicking on the Griddisk Configuration hyperlink brings up the screen similar to the following:

The key columns in this screen to look at are:

Status – if the grid disk is active. If inactive, it’s offline and ASM can’t use it.

Size – does the size sound right? Or was it created with less size from the cell disk? In that case the ASM instance will see that reduced size as well.

Error Count – have there been errors on this grid disk.

This brings us to the end of the basic building blocks of the storage cells, or storage servers. Using these graphical screens or the CellCLI commands you can get most of the information on these building blocks.

Now let’s get on to some of the more complex data needs: to gather the performance and management metrics on these components. For example, is the cell disk performing optimally, or are there any issues with the grid disk that may result in data loss?

Metrics

The best way to examine the metrics for these components is to explore them in the dedicated Metrics page in Grid Control. Note the set of hyperlinks toward the bottom of the page and click on All Metrics. It will bring up a screen as shown below.

Remember, metrics are gathered and stored in the OMS server and then presented as needed. Since they are stored in the OMS server, they affect the performance of the Exadata Database Machine little or not at all. The screen above shows the frequency at which these metrics are collected (15 minutes, 1 minute, etc.). Some of the metrics are real time (flash cache statistics for example); others do not have any schedule (such as error messages and alerts). The collection frequency is displayed under the column “Collection Schedule”.

When the metrics are collected, they are uploaded to the OMS server’s database. The screen above also shows how soon the metrics are uploaded after collection (the column “Upload Interval”), along with the date and time the last upload occurred (“Last Upload”). 

Metrics can also be used to trigger alerts. For instance, if the cell offload efficiency falls below a certain threshold, you can trigger an alert. We will see how these alerts are set up later but the alerting threshold is shown in this screen as well, under the column “Threshold”.

The screen shows the metrics collected by Grid Control and the frequency of the collection. If you want, you can see all the sub categories and specific metric names by clicking on Expand All.

Let’s focus on only one category – Category Statistics. Click on the “+” before the category to show the various metrics; you will see a screen similar to the following:

Click on Category Small IO Requests/Sec.

 

 Click on the eyeglass icon under the column “Details”.

 

Like all the other screens you saw earlier, you can choose a different refresh frequency by choosing the appropriate value from the drop-down menu “View Data”. Alternatively, you can choose historical records by selecting Last 24 hours, or Last 7 days, etc.

Reports

Reports are similar to screens reporting metrics but with some important differences. Some reports could be better for viewing certain metrics but the most valuable use of reports is to compare metrics across several components at a glance. 

From the main Enterprise Manager page, choose the Reports tab from the ribbon of tabs at the top, then scroll down several pages to a collection of reports named “Storage”, as shown below:

This section shows all the metrics you might have seen earlier, but with a major difference: The reports are organized as independent entities, which can report the data on any component -- not driven from the details screen of the component itself. For instance, click on the hyperlink Celldisk Performance. It brings up a screen asking you to choose the cell disk you are interested in rather than choosing the cell disk screen first and then going to the performance metrics of that cell disk. You can choose a cell disk from the list by clicking on the flashlight icon (the list of values) and it brings up a screen as shown below. 

If you click on one of the hyperlinked cell disks, you will see the data for that cell disk.

Grid Disk Performance

As you learned in Part 2 of this series, the grid disks are built from the cell disks and then used for ASM diskgroups. Thus the performance of the grid disks will directly affect the performance of the ASM diskgroups, which in turn will affect database performance. So it’s important to keep tabs on grid disk performance and examine it if you see any issues in the database. From the report menu shown earlier, go to Storage > CELL > Grid Disk Performance. It will ask you to pick a cell from the list. Choosing Cell 02 will bring up a screen similar to this.

The other important metric to examine is host interconnect performance:

Doing Comparisons

Sometimes you wonder how one component such as a cell disk is doing compared to other cell disks. The comparison is a very neat feature in the Grid Control. First get to the page of the metric you are interested in:

Click on the hyperlink just below Related Links shown as Compare Objects Celldisk Name/Cell Name/Realm Name.

Then click on the OK button. The comparison will appear in the screen similar to the one shown below:

The comparison shows that cell disk CD_00_cell01 has a higher write throughput compared to the other two.

Metrics on Realms

A realm is a set of components enclosed within a logical boundary. Realms are used to separate systems for different usage – for instance you may create a realm each for database A, database B, etc. By default there is a single realm. Since all the cells belong to the realm, metrics for a realm go across the cells and are not limited to the current cell alone.

Let’s see an example. If you pull up the LUN Configuration report, you will see the following:

It shows 28 records since there are 28 LUNs in this cell. The scope of this report is the current cell, not the others. When you pull up the Realm LUN Configuration, it will show the LUNs of all the cells, as shown below.

Note the difference: 392 rows instead of 28. All the LUNs from all the cells have been shown here. 
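The row count follows directly from the numbers given earlier: a full rack has 14 cells and each cell lists 28 LUNs. As a quick check in Python (the variable names are chosen for this illustration only):

```python
cells_in_full_rack = 14  # from the text: a full rack has 14 storage cells
luns_per_cell = 28       # from the text: each cell's report lists 28 LUNs

realm_luns = cells_in_full_rack * luns_per_cell
print(realm_luns)
```

The product matches the 392 rows in the Realm LUN Configuration report.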

Choosing realms for reports can allow you to get the picture for a specific realm immediately. If there is only one realm, then all the cells are pulled together in the report. Consider a very important report – Realm Performance, shown below.

It shows you the metrics across all the cells in that realm. This is an excellent way to quickly compare various cells to see if they are uniformly loaded.

On the same line of thought, you may want to compare performance of the components of the cells, not just the cells. Choosing Realm Celldisk Performance from the View Report dropdown menu shows the performance of the cell disks across all the cells, not just that cell.

However, with the sheer amount of data in the report (392 rows), it may not be as useful. A better option may be to compare Cell Disk #1 of all the cells. You can do that by using a filter in the Celldisk field.

This shows the performance of the first cell disks of all the cells. From the above output, it is clear that Cell 05 is seeing some action while the others are practically idle. Since you don’t operate at the cell level, it can be assumed that the data distribution may not be uniform. Cell #5 may happen to have the largest number of storage blocks.
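If the report data is exported, the same "which cell stands out?" check can be scripted. The Python sketch below flags any cell whose value is well above the median of all cells. The throughput figures, the cell names and the factor of 3 are invented for illustration and are not taken from the report; busy_cells is a hypothetical helper.

```python
from statistics import median

def busy_cells(throughput_by_cell, factor=3.0):
    """Return the cells whose value exceeds `factor` times the median across cells."""
    baseline = median(throughput_by_cell.values())
    return sorted(
        cell
        for cell, value in throughput_by_cell.items()
        if value > 0 and value > factor * baseline
    )

# Hypothetical per-cell figures for the first cell disk of each cell.
sample = {
    "cell01": 0.1,
    "cell02": 0.0,
    "cell03": 0.2,
    "cell04": 0.1,
    "cell05": 14.7,
    "cell06": 0.1,
}
print(busy_cells(sample))
```

The median is used as the baseline because a single very busy cell pulls an average upward, which would make the busy cell look less unusual than it is.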

Conclusion

In this installment you learned how to examine metrics from the storage cells using two different interfaces: the command-line tool CellCLI and Enterprise Manager Grid Control. Both have their own appeals. While the visual advantage of the Grid Control is undeniable, the command line tool may be useful in developing scripts and repeatable processes. The Exadata Database Machine comes with a lot of predefined alerts that can be configured via thresholds to make sure they trigger when the metric values fall outside the pre-specified boundary conditions.

This brings us to the end of this series. I hope you developed enough skills in this series to make the transition from DBA to DMA easily and quickly!

Arup Nanda has been an Oracle DBA for more than 14 years, handling all aspects of database administration, from performance tuning to security and disaster recovery. He is an Oracle ACE Director and was Oracle Magazine's DBA of the Year in 2003.