N1 Grid Engine 6 Administration Guide

Apr 09, 2018


arnuschky
  • 8/7/2019 N1 Grid Engine 6 Administration Guide

    1/220

    N1 Grid Engine 6 Administration

    Guide

    Sun Microsystems, Inc.
    4150 Network Circle
    Santa Clara, CA 95054
    U.S.A.

    Part No: 817-5677-20
    May 2005


    Copyright 2005 Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.

    This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.

    Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd.

    Sun, N1, Sun Microsystems, the Sun logo, docs.sun.com, AnswerBook, AnswerBook2, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.

    The OPEN LOOK and Sun Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun's licensees who implement OPEN LOOK GUIs and otherwise comply with Sun's written license agreements.

    U.S. Government Rights - Commercial software. Government users are subject to the Sun Microsystems, Inc. standard license agreement and applicable provisions of the FAR and its supplements.

    DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.



    Contents

    Preface 15

    1 Configuring Hosts and Clusters 19
        About Hosts and Daemons 20
        Changing the Master Host 21
        Configuring Shadow Master Hosts 21
        Shadow Master Host Requirements 22
        Shadow Master Hosts File 22
        Starting Shadow Master Hosts 23
        Configuring Shadow Master Hosts Environment Variables 23
        Configuring Hosts 24
        Configuring Execution Hosts With QMON 24
        Configuring Execution Hosts From the Command Line 30
        Configuring Administration Hosts With QMON 31
        Configuring Administration Hosts From the Command Line 32
        Configuring Submit Hosts With QMON 32
        Configuring Submit Hosts From the Command Line 34
        Configuring Host Groups With QMON 34
        Configuring Host Groups From the Command Line 36
        Monitoring Execution Hosts With qhost 37
        Invalid Host Names 38
        Killing Daemons From the Command Line 38
        Restarting Daemons From the Command Line 39
        Basic Cluster Configuration 40
        Displaying a Cluster Configuration With QMON 40
        Displaying the Global Cluster Configuration With QMON 41


        Adding and Modifying Global and Host Configurations With QMON 41
        Deleting a Cluster Configuration With QMON 42
        Displaying the Basic Cluster Configurations From the Command Line 43
        Modifying the Basic Cluster Configurations From the Command Line 43

    2 Configuring Queues and Queue Calendars 45
        Configuring Queues 45
        Configuring Queues With QMON 47
        Configuring General Parameters 49
        Configuring Execution Method Parameters 50
        Configuring the Checkpointing Parameters 51
        Configuring Parallel Environments 52
        Configuring Load and Suspend Thresholds 53
        Configuring Limits 55
        Configuring Complex Resource Attributes 56
        Configuring Subordinate Queues 57
        Configuring User Access Parameters 58
        Configuring Project Access Parameters 59
        Configuring Owners Parameters 60
        Configuring Queues From the Command Line 61
        Configuring Queue Calendars 63
        Configuring Queue Calendars With QMON 63
        Configuring Queue Calendars From the Command Line 65

    3 Configuring Complex Resource Attributes 67
        Complex Resource Attributes 67
        Configuring Complex Resource Attributes With QMON 68
        Assigning Resource Attributes to Queues, Hosts, and the Global Cluster 70
        Consumable Resources 74
        Configuring Complex Resource Attributes From the Command Line 86
        Load Parameters 87
        Default Load Parameters 87
        Adding Site-Specific Load Parameters 87
        Writing Your Own Load Sensors 88

    4 Managing User Access 93
        Setting Up a User 94


        Configuring User Access 95
        Configuring Manager Accounts 95
        Configuring Operator Accounts 97
        Configuring User Access Lists 98
        Configuring Users 101
        Defining Projects 103
        Defining Projects With QMON 104
        Defining Projects From the Command Line 106
        Using Path Aliasing 106
        Format of Path-Aliasing Files 107
        How Path-Aliasing Files Are Interpreted 108
        Configuring Default Requests 108
        Format of Default Request Files 109

    5 Managing Policies and the Scheduler 111
        Administering the Scheduler 111
        About Scheduling 112
        Scheduling Strategies 112
        Configuring the Scheduler 120
        Changing the Scheduler Configuration With QMON 123
        Administering Policies 127
        Configuring Policy-Based Resource Management With QMON 127
        Specifying Policy Priority 128
        Configuring the Urgency Policy 129
        Configuring Ticket-Based Policies 130
        Configuring the Share-Based Policy 135
        How to Create Project-Based Share-Tree Scheduling 144
        Configuring the Functional Policy 147
        How to Create User-Based, Project-Based, and Department-Based Functional Scheduling 150
        Configuring the Override Policy 151

    6 Managing Special Environments 155
        Configuring Parallel Environments 155
        Configuring Parallel Environments With QMON 156
        Configuring Parallel Environments From the Command Line 161
        Parallel Environment Startup Procedure 162
        Termination of the Parallel Environment 163


        Tight Integration of Parallel Environments and Grid Engine Software 164
        Configuring Checkpointing Environments 165
        About Checkpointing Environments 166
        Configuring Checkpointing Environments With QMON 166
        Configuring Checkpointing Environments From the Command Line 168

    7 Other Administrative Tasks 171
        Gathering Accounting and Reporting Statistics 171
        Report Statistics (ARCo) 171
        Accounting and Usage Statistics (qacct) 177
        Backing Up the Grid Engine System Configuration 178
        Using Files and Scripts for Administration Tasks 179
        Using Files to Add or Modify Objects 179
        Using Files to Modify Queues, Hosts, and Environments 180
        Using Files to Modify a Global Configuration or the Scheduler 184

    8 Fine Tuning, Error Messages, and Troubleshooting 187
        Fine-Tuning Your Grid Environment 187
        Scheduler Monitoring 187
        Finished Jobs 188
        Job Validation 188
        Load Thresholds and Suspend Thresholds 188
        Load Adjustments 189
        Immediate Scheduling 189
        Urgency Policy and Resource Reservation 189
        How the Grid Engine Software Retrieves Error Reports 190
        Consequences of Different Error or Exit Codes 191
        Running Grid Engine System Programs in Debug Mode 193
        Diagnosing Problems 195
        Pending Jobs Not Being Dispatched 195
        Job or Queue Reported in Error State E 196
        Troubleshooting Common Problems 197

    9 Configuring DBWriter 203
        Setup 203
        Database System 203
        Database Server 204


        Base Directory for Reporting Files 204
        Configuration 204
        Interval 204
        Pid 204
        PidCmd 204
        Continuous Mode 205
        Debug Level 205
        Reporting File 205
        Calculation of Derived Values 206

    Index 209


    Tables

    TABLE 8-1 Job-Related Error or Exit Codes 191
    TABLE 8-2 Parallel-Environment-Related Error or Exit Codes 191
    TABLE 8-3 Queue-Related Error or Exit Codes 192
    TABLE 8-4 Checkpointing-Related Error or Exit Codes 192


    Figures

    FIGURE 1-1 Execution Host Tab 25
    FIGURE 1-2 Attribute Selection Dialog Box 28
    FIGURE 1-3 Administration Host Tab 31
    FIGURE 1-4 Submit Host Tab 33
    FIGURE 1-5 Host Groups Tab 35
    FIGURE 1-6 Cluster Configuration Dialog Box 40
    FIGURE 2-1 Queue Configuration - General Configuration Tab 48
    FIGURE 3-1 Complex Configuration Dialog Box 69
    FIGURE 3-2 Complex Configuration Dialog Box: virtual_free 76
    FIGURE 3-3 Add/Modify Exec Host: virtual_free 76
    FIGURE 4-1 Userset Tab 99
    FIGURE 4-2 Access List Definition Dialog Box 99
    FIGURE 4-3 Project Configuration Dialog Box 104
    FIGURE 5-1 Policy Configuration Dialog Box 128


    Examples

    EXAMPLE 1-1 Sample qhost Output 38
    EXAMPLE 3-1 qconf -sc Sample Output 86
    EXAMPLE 3-2 Load Sensor - Bourne Shell Script 88
    EXAMPLE 4-1 Example of Path-Aliasing File 108
    EXAMPLE 4-2 Example of Default Request File 109
    EXAMPLE 5-1 Functional Policy Example 133
    EXAMPLE 5-2 Example A 143
    EXAMPLE 5-3 Example B 143
    EXAMPLE 7-1 Modifying the Migration Command of a Checkpoint Environment 180
    EXAMPLE 7-2 Changing the Queue Type 182
    EXAMPLE 7-3 Modifying the Queue Type and the Shell Start Behavior 182
    EXAMPLE 7-4 Adding Resource Attributes 182
    EXAMPLE 7-5 Attaching a Resource Attribute to a Host 182
    EXAMPLE 7-6 Changing a Resource Value 182
    EXAMPLE 7-7 Deleting a Resource Attribute 182
    EXAMPLE 7-8 Adding a Queue to the List of Queues for a Checkpointing Environment 182
    EXAMPLE 7-9 Changing the Number of Slots in a Parallel Environment 183
    EXAMPLE 7-10 Listing Queues 183
    EXAMPLE 7-11 Using qselect in qconf Commands 183
    EXAMPLE 7-12 Modifying the Schedule Interval 184


    - Chapter 5 provides full background information about the types of user policies that are available. The chapter provides instructions on how to match these policies to the computing environment. Chapter 5 also describes how to configure and modify the scheduler.

    - Chapter 6 describes how the grid engine system fits in with parallel environments, and provides detailed instructions on how to configure them. The chapter also describes how to set up and use checkpointing environments.

    - Chapter 7 describes how to gather reporting and accounting statistics, how to automatically back up your grid engine system configuration files, and how to use files and scripts to add or modify objects such as queues, hosts, and environments.

    - Chapter 8 describes some ways to fine-tune your grid engine system. It also explains how the grid engine system retrieves error messages and describes how to run the software in debug mode.

    - Chapter 9, DBWriter, describes how you can modify the DBWriter portion of the ARCo feature.

    Note: Some of the material in this guide appeared originally in the "How-To" section of the Sun Grid Engine project web site. Updated frequently, this web site is of special value to administrators of the grid engine system and is well worth consulting.

    Related Books

    Other books in the N1 Grid Engine 6 software documentation collection include:

    - N1 Grid Engine 6 User's Guide
    - N1 Grid Engine 6 Installation Guide
    - N1 Grid Engine 6 Release Notes

    Accessing Sun Documentation Online

    The docs.sun.com(SM) Web site enables you to access Sun technical documentation online. You can browse the docs.sun.com archive or search for a specific book title or subject. The URL is http://docs.sun.com.


    Ordering Sun Documentation

    Sun Microsystems offers select product documentation in print. For a list of documents and how to order them, see "Buy printed documentation" at http://docs.sun.com.

    Typographic Conventions

    The following table describes the typographic changes that are used in this book.

    TABLE P-1 Typographic Conventions

    Typeface or Symbol | Meaning | Example
    AaBbCc123 | The names of commands, files, and directories, and on-screen computer output | Edit your .login file. Use ls -a to list all files. machine_name% you have mail.
    AaBbCc123 | What you type, contrasted with on-screen computer output | machine_name% su Password:
    AaBbCc123 | Command-line placeholder: replace with a real name or value | The command to remove a file is rm filename.
    AaBbCc123 | Book titles, new terms, or terms to be emphasized | Read Chapter 6 in the User's Guide. Do not save the file.

    Shell Prompts in Command Examples

    The following table shows the default system prompt and superuser prompt for the C shell, Bourne shell, and Korn shell.


    TABLE P-2 Shell Prompts

    Shell | Prompt
    C shell prompt | machine_name%
    C shell superuser prompt | machine_name#
    Bourne shell and Korn shell prompt | $
    Bourne shell and Korn shell superuser prompt | #


    CHAPTER 1

    Configuring Hosts and Clusters

    This chapter provides background information about configuring various aspects of the grid engine system. This chapter includes instructions for the following tasks:

    - "Changing the Master Host" on page 21
    - "Configuring Shadow Master Hosts" on page 21
    - "Configuring Execution Hosts With QMON" on page 24
    - "Configuring Execution Hosts From the Command Line" on page 30
    - "Configuring Administration Hosts With QMON" on page 31
    - "Configuring Administration Hosts From the Command Line" on page 32
    - "Configuring Submit Hosts With QMON" on page 32
    - "Configuring Submit Hosts From the Command Line" on page 34
    - "Configuring Host Groups With QMON" on page 34
    - "Configuring Host Groups From the Command Line" on page 36
    - "Monitoring Execution Hosts With qhost" on page 37
    - "Killing Daemons From the Command Line" on page 38
    - "Restarting Daemons From the Command Line" on page 39
    - "Displaying a Cluster Configuration With QMON" on page 40
    - "Displaying the Global Cluster Configuration With QMON" on page 41
    - "Adding and Modifying Global and Host Configurations With QMON" on page 41
    - "Deleting a Cluster Configuration With QMON" on page 42
    - "Displaying the Basic Cluster Configurations From the Command Line" on page 43
    - "Modifying the Basic Cluster Configurations From the Command Line" on page 43


    About Hosts and Daemons

    Grid engine system hosts are classified into four groups, depending on which daemons are running on the system and on how the hosts are registered at sge_qmaster.

    - Master host. The master host is central for the overall cluster activity. The master host runs the master daemon sge_qmaster. sge_qmaster controls all grid engine system components such as queues and jobs. It also maintains tables about the status of the components, about user access permissions, and the like. The master host usually runs the scheduler sge_schedd. The master host requires no further configuration other than that performed by the installation procedure.

      For information about how to initially set up the master host, see "How to Install the Master Host" in N1 Grid Engine 6 Installation Guide. For information about how to configure dynamic changes to the master host, see "Configuring Shadow Master Hosts" on page 21.

    - Execution hosts. Execution hosts are nodes that have permission to run jobs. Therefore they host queue instances, and they run the execution daemon sge_execd. An execution host is initially set up by the installation procedure, as described in "How to Install Execution Hosts" in N1 Grid Engine 6 Installation Guide.

    - Administration hosts. Permission can be given to hosts other than the master host to carry out any kind of administrative activity. Administration hosts are set up with the following command:

          qconf -ah hostname

      See the qconf(1) man page for details.

    - Submit hosts. Submit hosts allow for submitting and controlling batch jobs only. In particular, a user who is logged in to a submit host can use qsub to submit jobs, can use qstat to monitor the job status, or can run the graphical user interface QMON. Submit hosts are set up using the following command:

          qconf -as hostname

      See the qconf(1) man page for details.

    Note: A host can belong to more than one class. The master host is by default an administration host and a submit host.
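The two registration commands above can be combined into a small helper. The following is a minimal sketch, not a procedure from this guide: the function names, the DRY_RUN switch, and the host name node01.example.com are all hypothetical. With DRY_RUN set to 1 (the default here), the qconf commands are only printed, so the script can be reviewed before anything is changed.

```shell
#!/bin/sh
# Sketch only: DRY_RUN, run(), and register_host() are hypothetical helpers.
# DRY_RUN=1 prints each command; DRY_RUN=0 executes it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Give one host both administration and submit permission.
register_host() {
    host=$1
    run qconf -ah "$host"   # add the host to the administration host list
    run qconf -as "$host"   # add the host to the submit host list
}

# Hypothetical host name, used for illustration only.
register_host node01.example.com
```

After a real run, qconf -sh and qconf -ss list the registered administration hosts and submit hosts, which is one way to confirm the result.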


    Changing the Master Host

    Because the spooling database cannot be located on an NFS-mounted file system, the following procedure requires that the Berkeley DB RPC server be used for spooling. If you configure spooling to a local file system, you must transfer the spooling database to a local file system on the new sge_qmaster host.

    To change the master host, do the following:

    1. On the current master host, stop the master daemon and the scheduler daemon by typing the following command:

           qconf -ks -km

    2. Edit the sge-root/cell/common/act_qmaster file according to the following guidelines:

       a. In the act_qmaster file, replace the current host name with the new master host's name. This name should be the same as the name returned by the gethostname utility. To get that name, type the following command on the new master host:

              sge-root/utilbin/$ARCH/gethostname

       b. Replace the old name in the act_qmaster file with the name returned by the gethostname utility.

    3. On the new master host, run the following script:

           sge-root/cell/common/sge5

       This starts up sge_qmaster and sge_schedd on the new master host.
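Step 2 of the procedure above amounts to rewriting a one-line file. The following sketch shows one cautious way to do that. The function name set_act_qmaster and the backup suffix .bak are hypothetical, and the usage comment assumes that the SGE_ROOT and SGE_CELL environment variables point at your sge-root and cell directories.

```shell
#!/bin/sh
# Sketch only: set_act_qmaster is a hypothetical helper, not a grid engine tool.
# It replaces the host name stored in an act_qmaster file and keeps a backup.
set_act_qmaster() {
    file=$1
    new_host=$2
    if [ -z "$new_host" ]; then
        echo "new master host name is empty" >&2
        return 1
    fi
    cp "$file" "$file.bak"                 # keep the previous master name
    printf '%s\n' "$new_host" > "$file"    # the file holds exactly one name
}

# Usage, after the daemons are stopped (step 1); new-master-name stands for
# the name that the gethostname utility reports on the new master host:
#   set_act_qmaster "$SGE_ROOT/$SGE_CELL/common/act_qmaster" new-master-name
```

Keeping a backup copy makes it easy to return to the old master host if the new sge_qmaster does not start.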

    Configuring Shadow Master Hosts

    Shadow master hosts are machines in the cluster that can detect a failure of the master daemon and take over its role as master host. When the shadow master daemon detects that the master daemon sge_qmaster has failed abnormally, it starts up a new sge_qmaster on the host where the shadow master daemon is running.


    Note: If the master daemon is shut down gracefully, the shadow master daemon does not start up. If you want the shadow master daemon to take over after you shut down the master daemon gracefully, remove the lock file that is located in the sge_qmaster spool directory. The default location of this spool directory is sge-root/cell/spool/qmaster.

    The automatic failover start of a sge_qmaster on a shadow master host takes approximately one minute. Meanwhile, you get an error message whenever a grid engine system command is run.

    Note: The file sge-root/cell/common/act_qmaster contains the name of the host actually running the sge_qmaster daemon.

    Shadow Master Host Requirements

    To prepare a host as a shadow master, the following requirements must be met:

    - The shadow master host must run sge_shadowd.

    - The shadow master host must share sge_qmaster's status information, job configuration, and queue configuration logged to disk. In particular, a shadow master host needs read/write root access to the master host's spool directory and to the directory sge-root/cell/common.

    - Either the Berkeley DB RPC server or classic grid engine system spooling must be used for sge_qmaster spooling. For more information, see "Database Server and Spooling Host" in N1 Grid Engine 6 Installation Guide.

    - The shadow-master-hostname file must contain a line that defines the host as shadow master host.

    As soon as these requirements are met, the shadow-master-host facility is activated for this host. No restart of grid engine system daemons is necessary to activate the feature.

    Shadow Master Hosts File

    The shadow master host name file, sge-root/cell/common/shadow_masters, contains the following:

    - The name of the primary master host, which is the machine where the master daemon sge_qmaster initially runs
    - The names of the shadow master hosts


    The format of the shadow master hostname file is as follows:

    - The first line of the file defines the primary master host
    - The following lines define the shadow master hosts, one host per line

    The order of the shadow master hosts is significant. The primary master host is the first line in the file. If the primary master host fails to proceed, the shadow master defined in the second line takes over. If this shadow master also fails, the shadow master defined in the third line takes over, and so forth.
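To make the file format concrete, the following sketch writes a sample shadow_masters file to a temporary location and prints the takeover order that the rules above imply. The host names and the list_failover_order function are hypothetical; a real file lives at sge-root/cell/common/shadow_masters.

```shell
#!/bin/sh
# Sketch only: hypothetical host names written to a temporary file.
shadow_file=$(mktemp)
cat > "$shadow_file" <<'EOF'
master1.example.com
shadow1.example.com
shadow2.example.com
EOF

# Print each host with its role: line 1 is the primary master,
# every later non-empty line is a shadow master in takeover order.
list_failover_order() {
    awk 'NF {
        n++
        if (n == 1) print "primary master: " $1
        else        print "shadow master " (n - 1) ": " $1
    }' "$1"
}

list_failover_order "$shadow_file"
rm -f "$shadow_file"
```

Listing the roles this way is a quick check that the hosts appear in the order in which you want them to take over.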

    Starting Shadow Master Hosts

    In order to start a shadow sge_qmaster, the system must be sure either that the old sge_qmaster has terminated, or that it will terminate without performing actions that interfere with the newly started shadow sge_qmaster.

    In very rare circumstances it might be impossible to determine that the old sge_qmaster has terminated or that it will terminate. In such cases, an error message is logged to the messages log file of the sge_shadowd daemons on the shadow master hosts. See Chapter 8. Also, any attempts to open a TCP connection to a sge_qmaster daemon permanently fail. If this occurs, make sure that no master daemon is running, and then restart sge_qmaster manually on any of the shadow master machines. See "Restarting Daemons From the Command Line" on page 39.

    Configuring Shadow Master Hosts Environment Variables

    There are three environment variables that affect the takeover time for a shadow master:

    - SGE_DELAY_TIME - This variable controls the interval in which sge_shadowd pauses if a takeover bid fails. This value is used only when there are multiple sge_shadowd instances and they are contending to be the master. The default is 600 seconds.

    - SGE_CHECK_INTERVAL - This variable controls the interval in which the sge_shadowd checks the heartbeat file. The default is 60 seconds.

    - SGE_GET_ACTIVE_INTERVAL - This variable controls the interval when a sge_shadowd instance tries to take over when the heartbeat file has not changed.

    These variables interact in the following way.

    1. The master host updates the heartbeat file every 30 seconds.

    2. The sge_shadowd daemon checks for changes to the heartbeat file every number of seconds defined by the SGE_CHECK_INTERVAL variable. So, this value must be greater than 30 seconds.


    3. If the sge_shadowd daemon notices that the heartbeat file has been updated, it starts waiting again until it is once more time to check the heartbeat file.

    4. If the sge_shadowd daemon notices that the heartbeat file has not been updated, it waits for the number of seconds defined by the SGE_CHECK_INTERVAL variable to expire. This step lets you make sure that the sge_shadowd daemon is not too aggressive in trying to take over and allows the master host some leeway in updating the heartbeat file.

    5. When the SGE_GET_ACTIVE_INTERVAL has expired, the sge_shadowd daemon takes over if the heartbeat file is still not updated.

    A reasonable configuration might be to set the SGE_CHECK_INTERVAL to 45 seconds and the SGE_GET_ACTIVE_INTERVAL to 90 seconds. So, after about 2 minutes, the takeover will occur. If you want to check the operation of the shadow host after you have configured these environment variables, you will have to pull out the master host's network cable to simulate a failure.
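The suggested values can be set as follows. This is a minimal sketch that assumes the variables are exported in the environment from which sge_shadowd is later started. The takeover_estimate variable is hypothetical and is only a rough figure: it adds one check interval to the takeover interval, which is one simple reading of the steps above.

```shell
#!/bin/sh
# Values taken from the suggested configuration in the text.
SGE_CHECK_INTERVAL=45
SGE_GET_ACTIVE_INTERVAL=90
export SGE_CHECK_INTERVAL SGE_GET_ACTIVE_INTERVAL

# The master updates the heartbeat file every 30 seconds, so a check
# interval of 30 seconds or less is flagged.
if [ "$SGE_CHECK_INTERVAL" -le 30 ]; then
    echo "warning: SGE_CHECK_INTERVAL must be greater than 30 seconds" >&2
fi

# Rough estimate only (hypothetical helper variable): one check interval
# plus the takeover interval, in seconds.
takeover_estimate=$((SGE_CHECK_INTERVAL + SGE_GET_ACTIVE_INTERVAL))
echo "estimated takeover delay: ${takeover_estimate}s"
```

With 45 and 90 seconds, the estimate is 135 seconds, which matches the figure of about 2 minutes given above.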

    Configuring Hosts

    N1 Grid Engine 6 software (grid engine software) maintains object lists for all types of hosts except for the master host. The lists of administration host objects and submit host objects indicate whether a host has administrative or submit permission. The execution host objects include other parameters. Among these parameters are the load information that is reported by the sge_execd running on the host, and the load parameter scaling factors that are defined by the administrator.

    You can configure host objects with QMON or from the command line.

    QMON provides a set of host configuration dialog boxes that are invoked by clicking the Host Configuration button on the QMON Main Control window. The Host Configuration dialog box has four tabs:

    - Administration Host tab. See Figure 1-3.
    - Submit Host tab. See Figure 1-4.
    - Host Groups tab. See Figure 1-5.
    - Execution Host tab. See Figure 1-1.

    The qconf command provides the command-line interface for managing host objects.

    Configuring Execution Hosts With QMON

    Before you configure an execution host, you must first install the software on the execution host as described in "How to Install Execution Hosts" in N1 Grid Engine 6 Installation Guide.


    To configure execution hosts, on the QMON Main Control window click the Host Configuration button, and then click the Execution Host tab. The Execution Host tab looks like the following figure:

    FIGURE 1-1 Execution Host Tab

    Note: Administrative or submit commands are allowed from execution hosts only if the execution hosts are also declared to be administration or submit hosts. See "Configuring Administration Hosts With QMON" on page 31 and "Configuring Submit Hosts With QMON" on page 32.

    The Hosts list displays the execution hosts that are already defined.

    The Load Scaling list displays the currently configured load-scaling factors for the selected execution host. See "Load Parameters" on page 87 for information about load parameters.

    The Access Attributes list displays access permissions. See Chapter 4 for information about access permissions.


    The Consumables/Fixed Attributes list displays resource availability for consumable and fixed resource attributes associated with the host. See "Complex Resource Attributes" on page 67 for information about resource attributes.

    The Reporting Variables list displays the variables that are written to the reporting file when a load report is received from an execution host. See "Defining Reporting Variables" on page 29 for information about reporting variables.

    The Usage Scaling list displays the current scaling factors for the individual usage metrics CPU, memory, and I/O for different machines. Resource usage is reported by sge_execd periodically for each currently running job. The scaling factors indicate the relative cost of resource usage on the particular machine for the user or project running a job. These factors could be used, for instance, to compare the cost of a second of CPU time on a 400 MHz processor to that of a 600 MHz CPU. Metrics that are not displayed in the Usage Scaling window have a scaling factor of 1.

    Adding or Modifying an Execution Host

    To add or modify an execution host, click Add or Modify. The Add/Modify Exec Host dialog box appears.

    The Add/Modify Exec Host dialog box enables you to modify all attributes associated with an execution host. The name of an existing execution host is displayed in the Host field.

    If you are adding a new execution host, type its name in the Host field.

    Defining Scaling Factors

    To define scaling factors, click the Scaling tab.


    The Load column of the Load Scaling table lists all available load parameters, and the Scale Factor column lists the corresponding definitions of the scaling. You can edit the Scale Factor column. Valid scaling factors are positive floating-point numbers in fixed-point notation or scientific notation.

    The Usage column of the Usage Scaling table lists the current scaling factors for the usage metrics CPU, memory, and I/O. The Scale Factor column lists the corresponding definitions of the scaling. You can edit the Scale Factor column. Valid scaling factors are positive floating-point numbers in fixed-point notation or scientific notation.
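The rule for valid scaling factors can be checked before a value is typed into QMON. The following sketch is an approximation: the function name is_valid_scale_factor is hypothetical, and the exact number syntax that QMON accepts is not specified in this guide, so the pattern below is an assumption about what "fixed-point or scientific notation" covers.

```shell
#!/bin/sh
# Sketch only: hypothetical check for a positive floating-point number in
# fixed-point notation (1.5, .25, 3) or scientific notation (2.5e-1, 1E3).
is_valid_scale_factor() {
    # Shape check: digits, optional fraction, optional exponent, no sign.
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$' || return 1
    # Value check: the number must be greater than zero.
    awk -v v="$1" 'BEGIN { exit !(v + 0 > 0) }'
}

for candidate in 1.5 2.5e-1 0 -3 abc; do
    if is_valid_scale_factor "$candidate"; then
        echo "$candidate: valid"
    else
        echo "$candidate: invalid"
    fi
done
```

The check is done in two stages so that a value such as 0, which has a valid shape but is not positive, is still rejected.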

    Defining Resource Attributes

    To define the resource attributes to associate with the host, click the Consumables/Fixed Attributes tab.

    The resource attributes associated with the host are listed in the Consumables/Fixed Attributes table.

    Use the Complex Configuration dialog box if you need more information about the current complex configuration, or if you want to modify it. For details about complex resource attributes, see "Complex Resource Attributes" on page 67.

    The Consumables/Fixed Attributes table lists all resource attributes for which a value is currently defined. You can enhance the list by clicking either the Name or the Value column name. The Attribute Selection dialog box appears, which includes all resource attributes that are defined in the complex.


    FIGURE 1-2 Attribute Selection Dialog Box

    To add an attribute to the Consumables/Fixed Attributes table, select the attribute, and then click OK.

    To modify an attribute value, double-click a Value field, and then type a value.

    To delete an attribute, select the attribute, and then press Control-D or click mouse button 3. Click OK to confirm that you want to delete the attribute.

    Defining Access Permissions

    To define user access permissions to the execution host based on previously configured user access lists, click the User Access tab.

    To define project access permissions to the execution host based on previously configured projects, click the Project Access tab.


    Defining Reporting Variables

    To define reporting variables, click the Reporting Variables tab.

    The Available list displays all the variables that can be written to the reporting file when a load report is received from the execution host.

    Select a reporting variable from the Available list, and then click the red right arrow to add the selected variable to the Selected list.

    To remove a reporting variable from the Selected list, select the variable, and then click the left red arrow.

    Deleting an Execution Host

    To delete an execution host, on the QMON Main Control window click the Host Configuration button, and then click the Execution Host tab.


    execution hosts, for example, in cron jobs, as the -Me option requires no manual interaction.

    • qconf -se hostname

      The -se option (show execution host) shows the configuration of the specified execution host as defined in host_conf.

    • qconf -sel

      The -sel option (show execution host list) displays a list of hosts that are configured as execution hosts.

    Configuring Administration Hosts With QMON

    On the QMON Main Control window, click the Host Configuration button. The Host Configuration dialog box appears, displaying the Administration Host tab. The Administration Host tab looks like the following figure:

    FIGURE 1-3 Administration Host Tab


    Note: The Administration Host tab is displayed by default when you click the Host Configuration button for the first time.

    Use the Administration Host tab to configure hosts on which administrative commands are allowed. The Host list displays the hosts that already have administrative permission.

    Adding an Administration Host

    To add a new administration host, type its name in the Host field, and then click Add, or press the Return key.

    Deleting an Administration Host

    To delete an administration host from the list, select the host, and then click Delete.

    Configuring Administration Hosts From the Command Line

    To configure administration hosts from the command line, type the following command with appropriate arguments:

    % qconf arguments

    Arguments to the qconf command and their consequences are as follows:

    • qconf -ah hostname

      The -ah option (add administration host) adds the specified host to the list of administration hosts.

    • qconf -dh hostname

      The -dh option (delete administration host) deletes the specified host from the list of administration hosts.

    • qconf -sh

      The -sh option (show administration hosts) displays a list of all currently configured administration hosts.

    Configuring Submit Hosts With QMON

    To configure submit hosts, on the QMON Main Control window click the Host Configuration button, and then click the Submit Host tab. The Submit Host tab is shown in the following figure.


    FIGURE 1-4 Submit Host Tab

    Use the Submit Host tab to declare the hosts from which jobs can be submitted, monitored, and controlled. The Host list displays the hosts that already have submit permission.

    No administrative commands are allowed from submit hosts unless the hosts are also declared to be administration hosts. See "Configuring Administration Hosts With QMON" on page 31 for more information.

    Adding a Submit Host

    To add a submit host, type its name in the Host field, and then click Add, or press the Return key.

    Deleting a Submit Host

    To delete a submit host, select it, and then click Delete.


    Configuring Submit Hosts From the Command Line

    To configure submit hosts from the command line, type the following command with appropriate arguments:

    % qconf arguments

    The following options are available:

    • qconf -as hostname

      The -as option (add submit host) adds the specified host to the list of submit hosts.

    • qconf -ds hostname

      The -ds option (delete submit host) deletes the specified host from the list of submit hosts.

    • qconf -ss

      The -ss option (show submit hosts) displays a list of the names of all currently configured submit hosts.
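When the list of submit hosts is maintained by a script, the three options above can be combined: compare the current list with the list you want, and then add or delete only the differences. The sketch below only builds the command lines and does not run them. The function name is hypothetical, and the current host list is assumed to have been collected beforehand, for example from the output of qconf -ss.

```python
def plan_submit_host_changes(current, desired):
    """Return the qconf command lines needed to turn `current` into `desired`.

    Each command is an argument list. Nothing is executed here; the caller
    decides whether and how to run the commands.
    """
    current, desired = set(current), set(desired)
    additions = [["qconf", "-as", host] for host in sorted(desired - current)]
    deletions = [["qconf", "-ds", host] for host in sorted(current - desired)]
    return additions + deletions


# Illustrative host names only.
for command in plan_submit_host_changes(["fangorn", "balrog"],
                                        ["fangorn", "arwen"]):
    print(" ".join(command))
```

The same pattern would apply to administration hosts with the -ah and -dh options.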

    Configuring Host Groups With QMON

    Host groups enable you to use a single name to refer to multiple hosts. You can group similar hosts together in a host group. A host group can include other host groups as well as multiple individual hosts. Host groups that are members of another host group are subgroups of that host group.

    For example, you might define a host group called @bigMachines. This host group includes the following members:

    @solaris64
    @solaris32
    fangorn
    balrog

    The initial @ sign indicates that the name is a host group. The host group @bigMachines includes all hosts that are members of the two subgroups @solaris64 and @solaris32. @bigMachines also includes two individual hosts, fangorn and balrog.
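The way a host group expands into individual hosts can be sketched as a short recursive function. The membership of @solaris64 and @solaris32 is not given above, so the hosts assigned to them here are hypothetical, as is the function name.

```python
def resolve_hosts(group, groups, _seen=None):
    """Return the set of individual hosts in `group`, following subgroups.

    `groups` maps each host group name to its list of members. Names that
    begin with '@' are host groups; all other names are individual hosts.
    """
    seen = set() if _seen is None else _seen
    if group in seen:          # guard against groups that include each other
        return set()
    seen.add(group)
    hosts = set()
    for member in groups[group]:
        if member.startswith("@"):
            hosts |= resolve_hosts(member, groups, seen)
        else:
            hosts.add(member)
    return hosts


groups = {
    "@bigMachines": ["@solaris64", "@solaris32", "fangorn", "balrog"],
    "@solaris64": ["eomer"],   # hypothetical membership
    "@solaris32": ["nori"],    # hypothetical membership
}

print(sorted(resolve_hosts("@bigMachines", groups)))
```

On a running system, the qconf -shgrp_resolved option described later in this chapter shows the resolved host list of a host group.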

    On the QMON Main Control window, click the Host Configuration button. The Host Configuration dialog box appears.

    Click the Host Groups tab. The Host Groups tab looks like the following figure.


    FIGURE 1-5 Host Groups Tab

    Use the Host Groups tab to configure host groups. The Hostgroup list displays the currently configured host groups. The Members list displays all the hosts that are members of the selected host group.

    Adding or Modifying a Host Group

    To add a host group, click Add. To modify a host group, click Modify. The Add/Modify Host Group dialog box appears.


    If you are adding a new host group, type a host group name in the Hostgroup field. The host group name must begin with an @ sign.

    If you are modifying an existing host group, the host group name is provided in the Hostgroup field.

    To add a host to the host group that you are configuring, type the host name in the Host field, and then click the red arrow to add the name to the Members list. To add a host group as a subgroup, select a host group name from the Defined Host Groups list, and then click the red arrow to add the name to the Members list.

    To remove a host or a host group from the Members list, select it, and then click the trash icon.

    Click OK to save your changes and close the dialog box. Click Cancel to close the dialog box without saving your changes.

    Deleting a Host Group

    To delete a host group, select it from the Hostgroup list, and then click Delete.

    Configuring Host Groups From the Command Line

    To configure host groups from the command line, type the following command with appropriate options:

    % qconf options


    The following options are available:

    • qconf -ahgrp [host-group-name]

      The -ahgrp option (add host group) adds a new host group to the list of host groups. See the hostgroup(5) man page for a detailed description of the configuration format.

    • qconf -Ahgrp [filename]

      The -Ahgrp option (add host group from file) displays an editor containing a host group configuration defined in filename. The editor is either the default vi editor or an editor corresponding to the EDITOR environment variable. The host group is configured by changing the configuration and saving to disk.

    • qconf -dhgrp host-group-name

      The -dhgrp option (delete host group) deletes the specified host group from the list of host groups. All entries in the host group configuration are lost.

    • qconf -mhgrp host-group-name

      The -mhgrp option (modify host group) displays an editor containing the configuration of the specified host group as template. The editor is either the default vi editor or an editor corresponding to the EDITOR environment variable. The host group configuration is modified by changing the template and saving to disk.

    • qconf -Mhgrp filename

      The -Mhgrp option (modify host group from file) uses the content of filename as host group configuration template. The configuration in the specified file must refer to an existing host group. The configuration of this host group is replaced by the file content.

    • qconf -shgrp host-group-name

      The -shgrp option (show host group) shows the configuration of the specified host group.

    • qconf -shgrp_tree host-group-name

      The -shgrp_tree option (show host group as tree) shows the configuration of the specified host group and its sub-hostgroups as a tree.

    • qconf -shgrp_resolved host-group-name

      The -shgrp_resolved option (show host group with resolved host list) shows the configuration of the specified host group with a resolved host list.

    • qconf -shgrpl

      The -shgrpl option (show host group list) displays a list of all host groups.
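The file-based options (-Ahgrp and -Mhgrp) need a host group configuration file. The sketch below writes such a configuration as text. It assumes that the format consists of a group_name line and a hostlist line, with NONE standing for an empty host list; check the hostgroup(5) man page for the authoritative format before relying on it. The function name is hypothetical.

```python
def render_hostgroup(name, members):
    """Return host group configuration text for the given name and members.

    The two-keyword layout (group_name, hostlist) is an assumption of this
    sketch; see hostgroup(5) for the definitive format.
    """
    if not name.startswith("@"):
        raise ValueError("a host group name must begin with an @ sign")
    hostlist = " ".join(members) if members else "NONE"
    return f"group_name {name}\nhostlist {hostlist}\n"


print(render_hostgroup("@bigMachines",
                       ["@solaris64", "@solaris32", "fangorn", "balrog"]),
      end="")
```

The resulting text could be saved to a file and passed as the filename argument.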

    Monitoring Execution Hosts With qhost

    Use the qhost command to retrieve a quick overview of the execution host status:

    % qhost


    This command produces output that is similar to the following example:

    EXAMPLE 1-1 Sample qhost Output

    HOSTNAME  ARCH        NCPU    LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
    ------------------------------------------------------------------
    global    -              -       -       -       -       -       -
    arwen     aix43          1       -       -       -       -       -
    baumbart  irix65         2    0.00    1.1G   91.5M  128.0M     0.0
    boromir   hp11           1       -  128.0M       -  256.0M       -
    carc      lx24-amd64     2    0.00    3.8G  989.8M    1.0G     0.0
    denethor  aix51          1   4.54G       -       -       -       -
    durin     lx24-x86       1    0.37  123.1M   46.5M  213.6M   26.6M
    eomer     sol-sparc64    1    0.13  256.0M  248.0M  513.0M   93.0M
    lolek     tru64          1    0.02    1.0G  790.0M    1.0G    8.0K
    mungo     lx22-alpha     1    1.00  248.9M   78.8M  129.8M    2.5M
    nori      sol-x86        2    0.38 1023.0M  372.0M  512.0M   37.0M
    pippin    darwin         1    0.00  640.0M  264.0M     0.0     0.0
    smeagol   hp11           1    0.35  512.0M  425.0M    1.0G   95.0M

    See the qhost(1) man page for a description of the output format and for more options.
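If you want to process qhost output in a script, the columns shown in the example can be split on white space. The following sketch parses a few rows copied from the example. The function and key names are hypothetical, and reading the K, M, and G suffixes as multiples of 1024 is an assumption; the qhost(1) man page describes the actual output format.

```python
UNIT_BYTES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}  # assumed multipliers


def parse_amount(text):
    """Convert a field such as '1.1G', '0.00', or '-' to a float, or None."""
    if text == "-":
        return None
    if text[-1] in UNIT_BYTES:
        return float(text[:-1]) * UNIT_BYTES[text[-1]]
    return float(text)


def parse_qhost(output):
    """Return one dictionary per host row of qhost output."""
    keys = ("load", "mem_total", "mem_used", "swap_total", "swap_used")
    hosts = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the header, the dashed rule, and any line without 8 columns.
        if len(fields) != 8 or fields[0] == "HOSTNAME":
            continue
        host = {"name": fields[0], "arch": fields[1],
                "ncpu": None if fields[2] == "-" else int(fields[2])}
        host.update(zip(keys, (parse_amount(f) for f in fields[3:])))
        hosts.append(host)
    return hosts


SAMPLE = """\
HOSTNAME  ARCH        NCPU    LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
------------------------------------------------------------------
global    -              -       -       -       -       -       -
carc      lx24-amd64     2    0.00    3.8G  989.8M    1.0G     0.0
lolek     tru64          1    0.02    1.0G  790.0M    1.0G    8.0K
"""

hosts = parse_qhost(SAMPLE)
for host in hosts:
    print(host["name"], host["ncpu"], host["load"])
```

Fields that qhost reports as a dash are returned as None, so that hosts without a current load report can be told apart from hosts with a load of zero.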

    Invalid Host Names

    The following is a list of host names that are invalid, reserved, or otherwise not allowed to be used:

    global
    template
    all
    default
    unknown
    none
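A script that registers hosts can reject these names before it calls qconf. The sketch below is one way to do that. The function name is hypothetical, and the comparison is made exactly as the names are written above; whether the software also rejects other capitalizations is not stated here.

```python
# Host names listed above as invalid, reserved, or otherwise not allowed.
RESERVED_HOST_NAMES = frozenset(
    {"global", "template", "all", "default", "unknown", "none"})


def is_reserved_host_name(name):
    """Return True if `name` is one of the reserved host names."""
    return name in RESERVED_HOST_NAMES


for candidate in ("fangorn", "template"):
    status = "reserved" if is_reserved_host_name(candidate) else "usable"
    print(candidate, status)
```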

    Killing Daemons From the Command Line

    To kill grid engine system daemons from the command line, use one of the following commands:

    % qconf -ke[j] {hostname,... | all}
    % qconf -ks
    % qconf -km


    You must have manager or operator privileges to use these commands. See Chapter 4 for more information about manager and operator privileges.

    • The qconf -ke command shuts down the execution daemons. However, it does not cancel active jobs. Jobs that finish while no sge_execd is running on a system are not reported to sge_qmaster until sge_execd is restarted. The job reports are not lost, however.

      The qconf -kej command kills all currently active jobs and brings down all execution daemons.

      Use a comma-separated list of the execution hosts you want to shut down, or specify all to shut down all execution hosts in the cluster.

    • The qconf -ks command shuts down the scheduler sge_schedd.

    • The qconf -km command forces the sge_qmaster process to terminate.

    If you want to wait for any active jobs to finish before you run the shutdown procedure, use the qmod -dq command for each cluster queue, queue instance, or queue domain before you run the qconf sequence described earlier. For information about cluster queues, queue instances, and queue domains, see "Configuring Queues" on page 45.

    % qmod -dq {cluster-queue | queue-instance | queue-domain}

    The qmod -dq command prevents new jobs from being scheduled to the disabled queue instances. You should then wait until no jobs are running in the queue instances before you kill the daemons.
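The order of a graceful shutdown can be written down as two phases: first disable the queues, and then, once no jobs are running, kill the daemons. The sketch below only assembles the command lines in that order and does not run anything. The function name and the queue name all.q are hypothetical, and waiting for running jobs to finish between the two phases is left to the administrator.

```python
def shutdown_plan(queues, kill_jobs=False):
    """Return (disable_commands, kill_commands) for a cluster shutdown.

    Run the first list, wait until no jobs are running in the disabled
    queue instances, and then run the second list.
    """
    disable_commands = [["qmod", "-dq", queue] for queue in queues]
    kill_commands = [
        # -kej also kills active jobs; -ke leaves them running.
        ["qconf", "-kej" if kill_jobs else "-ke", "all"],
        ["qconf", "-ks"],   # scheduler, sge_schedd
        ["qconf", "-km"],   # master daemon, sge_qmaster
    ]
    return disable_commands, kill_commands


disable, kill = shutdown_plan(["all.q"])
for command in disable + kill:
    print(" ".join(command))
```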

    Restarting Daemons From the Command Line

    Log in as root on the machine on which you want to restart grid engine system daemons.

    Type the following commands to run the startup scripts:

    % sge-root/cell/common/sgemaster
    % sge-root/cell/common/sgeexecd

    These scripts look for the daemons that normally run on this host and then start the corresponding ones.


    Basic Cluster Configuration

    The basic cluster configuration is a set of information that is configured to reflect site dependencies and to influence grid engine system behavior. Site dependencies include valid paths for programs such as mail or xterm. A global configuration is provided for the master host as well as for every host in the grid engine system pool. In addition, you can configure the system to use a configuration local to each host to override particular entries in the global configuration.

    The cluster administrator should adapt the global configuration and local host configurations to the site's needs immediately after the installation. The configurations should be kept up to date afterwards.

    The sge_conf(5) man page contains a detailed description of the configuration entries.
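The relationship between the global configuration and a local host configuration can be pictured as a simple override: every entry comes from the global configuration unless the local configuration defines the same entry. The sketch below illustrates this idea only. The function name, the entry names, and the paths are hypothetical examples, and the sketch does not model which entries may be overridden locally.

```python
def effective_config(global_conf, local_conf):
    """Return the configuration that applies on a host.

    Entries in `local_conf` replace entries of the same name in
    `global_conf`; all other global entries are kept unchanged.
    """
    merged = dict(global_conf)
    merged.update(local_conf)
    return merged


# Hypothetical entries and paths, for illustration only.
global_conf = {"mailer": "/usr/bin/mail", "xterm": "/usr/bin/X11/xterm"}
local_conf = {"xterm": "/usr/openwin/bin/xterm"}

print(effective_config(global_conf, local_conf))
```

In this example the host keeps the global mail program but uses its own path for xterm.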

    Displaying a Cluster Configuration With QMON

    On the QMON Main Control window, click the Cluster Configuration button. The Cluster Configuration dialog box appears.

    FIGURE 1-6 Cluster Configuration Dialog Box

    In the Host list, select the name of a host. The current configuration for the selected host is displayed under Configuration.


    Displaying the Global Cluster Configuration With QMON

    On the QMON Main Control window, click the Cluster Configuration button. In the Host list, select global.

    The configuration is displayed in the format that is described in the sge_conf(5) man page.

    Adding and Modifying Global and Host Configurations With QMON

    In the Cluster Configuration dialog box (Figure 1-6), select a host name or the name global, and then click Add or Modify. The Cluster Settings dialog box appears.

    The Cluster Settings dialog box enables you to change all parameters of a global configuration or a local host configuration.


    All fields of the dialog box are accessible only if you are modifying the global configuration. If you modify a local host, its configuration is reflected in the dialog box. You can modify only those parameters that are feasible for local host changes.

    If you are adding a new local host configuration, the dialog box fields are empty.

    The Advanced Settings tab shows a corresponding behavior, depending on whether you are modifying a configuration or are adding a new configuration. The Advanced Settings tab provides access to more rarely used cluster configuration parameters.

    When you finish making changes, click OK to save your changes and close the dialog box. Click Cancel to close the dialog box without saving changes.

    See the sge_conf(5) man page for a complete description of all cluster configuration parameters.

    Deleting a Cluster Configuration With QMON

    On the QMON Main Control window, click the Cluster Configuration button. In the Host list, select the name of a host whose configuration you want to delete, and then click Delete.


    Displaying the Basic Cluster Configurations From the Command Line

    To display the current cluster configuration, use the qconf -sconf command. See the qconf(1) man page for a detailed description.

    Type one of the following commands:

    % qconf -sconf
    % qconf -sconf global
    % qconf -sconf host

    • The qconf -sconf and qconf -sconf global commands are equivalent. They display the global configuration.

    • The qconf -sconf host command displays the specified local host's configuration.

    Modifying the Basic Cluster Configurations From the Command Line

    Note: You must be an administrator to use the qconf command to change cluster configurations.

    Type one of the following commands:

    % qconf -mconf global
    % qconf -mconf host

    • The qconf -mconf global command modifies the global configuration.

    • The qconf -mconf host command modifies the local configuration of the specified execution host or master host.

    The qconf commands that are described here are examples of the many available qconf commands. See the qconf(1) man page for others.


    CHAPTER 2

    Configuring Queues and Queue Calendars

    This chapter provides background information about configuring queues and queue calendars. It also includes instructions for how to configure them.

    The following is a list of specific tasks for which instructions are included in this chapter.

    • "Configuring Queues With QMON" on page 47
    • "Configuring Queues From the Command Line" on page 61
    • "Configuring Queue Calendars With QMON" on page 63
    • "Configuring Queue Calendars From the Command Line" on page 65

    Configuring Queues

    Queues are containers for different categories of jobs. Queues provide the corresponding resources for concurrent execution of multiple jobs that belong to the same category.

    In N1 Grid Engine 6, a queue can be associated with one host or with multiple hosts. Because queues can extend across multiple hosts, they are called cluster queues. Cluster queues enable you to manage a cluster of execution hosts by means of a single cluster queue configuration.

    Each host that is associated with a cluster queue receives an instance of that cluster queue, which resides on that host. This guide refers to these instances as queue instances. Within any cluster queue, you can configure each queue instance separately. By configuring individual queue instances, you can manage a heterogeneous cluster of execution hosts by means of a single cluster queue configuration.


    When you click Add, the Queue Configuration - Add dialog box appears. When you click Modify, the Modify queue-name dialog box appears. When the Queue Configuration dialog box appears for the first time, it displays the General Configuration tab.

    FIGURE 2-1 Queue Configuration - General Configuration Tab

    If you are modifying an existing queue, the name of the queue is displayed in the Queue Name field. The hosts where the queue instances reside are displayed in the Hostlist field.

    If you are adding a new cluster queue, you must specify a queue name and the names of the hosts on which the queue instances are to reside.

    In the Hostlist field, you can specify the names of individual hosts. You can also specify the names of previously defined host groups. Queue instances of this cluster queue will reside on all individual hosts and on all members of the host groups you specify, including all members of any host subgroups. For more information about host groups, see "Configuring Host Groups With QMON" on page 34.

    The following 11 tabs for specifying parameter sets are available to define a queue:

    • General Configuration: see "Configuring General Parameters" on page 49
    • Execution Method: see "Configuring Execution Method Parameters" on page 50
    • Checkpointing: see "Configuring the Checkpointing Parameters" on page 51
    • Parallel Environment: see "Configuring Parallel Environments" on page 52
    • Load/Suspend Thresholds: see "Configuring Load and Suspend Thresholds" on page 53
    • Limits: see "Configuring Limits" on page 55


    • Complex: see "Configuring Complex Resource Attributes" on page 56
    • Subordinates: see "Configuring Subordinate Queues" on page 57
    • User Access: see "Configuring User Access Parameters" on page 58
    • Project Access: see "Configuring Project Access Parameters" on page 59
    • Owners: see "Configuring Owners Parameters" on page 60

    To set default parameters for the cluster queue, select @/ in the Attributes for Host/Hostgroup list, and then click the tab containing the parameters that you want to set.

    Default parameters are set for all queue instances on all hosts listed under Hostlist. You can override the default parameter values on a host or a host group that you specify. To set override parameters for a host or a host group, first select the name from the Attributes for Host/Hostgroup list. Then click the tab containing the parameters that you want to set. The values of the parameters that you set override the cluster queue's default parameters on the selected host or host group.
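How a queue instance ends up with its parameter value can be sketched as a lookup with fallbacks. The sketch assumes that an override for an individual host takes precedence over an override for a host group, which in turn takes precedence over the cluster queue default; that precedence, the function name, and the host group membership shown are assumptions made for illustration.

```python
def instance_value(host, default, overrides, group_members):
    """Return the parameter value that applies to the queue instance on `host`.

    `overrides` maps host names and host group names (beginning with '@')
    to override values. `group_members` maps host group names to the sets
    of hosts they contain.
    """
    if host in overrides:                      # host-specific override
        return overrides[host]
    for name, value in overrides.items():      # host group override
        if name.startswith("@") and host in group_members.get(name, ()):
            return value
    return default                             # cluster queue default


# Hypothetical settings for a slots parameter.
overrides = {"@solaris64": 4, "fangorn": 2}
group_members = {"@solaris64": {"eomer"}}

for host in ("eomer", "fangorn", "balrog"):
    print(host, instance_value(host, 1, overrides, group_members))
```

Here the instance on eomer takes the host group value, the instance on fangorn takes its own value, and the instance on balrog falls back to the default.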

    To set a host-specific parameter, you must first enable the parameter for configuration. Click the lock icon at the left of the parameter you want to set, and then change the parameter's value.

    The Refresh button loads the settings of other objects that were modified while the Queue Configuration dialog box was open.

    Click OK to register all queue configuration changes with sge_qmaster and close the dialog box. Click Cancel to close the dialog box without saving your changes.

    Configuring General Parameters

    To configure general parameters, click the General Configuration tab. The General Configuration tab is shown in Figure 2-1.

    You can specify the following parameters:

    • Sequence Nr. The sequence number of the queue.

    • Processors. A specifier for the processor set to be used by the jobs running in that queue. For some operating system architectures, this specifier can be a range, such as 1-4,8,10, or just an integer identifier of the processor set. See the arc_depend_*.asc files in the doc directory of your N1 Grid Engine 6 software distribution for more information.

      Caution: Do not change this value unless you are certain that you need to change it.

    • tmp Directory. Temporary directory path.

    • Shell. Default command interpreter to use for running the job scripts.


    • Shell Start Mode. The mode in which to start the job script.

    • Initial State. The state in which a newly added queue comes up. Also, the state in which a queue instance is restored if the sge_execd running on the queue instance host gets restarted.

    • Rerun Jobs. The queue's default rerun policy to be enforced on jobs that were aborted, for example, due to system crashes. The user can override this policy using the qsub -r command or the Submit Job dialog box. See "Extended Job Example" in N1 Grid Engine 6 User's Guide.

    • Calendar. A calendar attached to the queue. This calendar defines on-duty and off-duty times for the queue.

    • Notify Time. The time to wait between delivery of SIGUSR1/SIGUSR2 notification signals and suspend or kill signals.

    • Jobs Nice. The nice value with which to start the jobs in this queue. 0 means use the system default.

    • Slots. The number of jobs that are allowed to run concurrently in the queue. Slots are also referred to as job slots.

    • Type. The type of the queue and of the jobs that are allowed to run in this queue. Type can be Batch, Interactive, or both.

    See the queue_conf(5) man page for detailed information about these parameters.
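The range form of the Processors specifier, such as 1-4,8,10, can be expanded into individual processor numbers as follows. This is only a sketch of how such a string reads; the function name is hypothetical, and how each operating system architecture actually interprets the value is described in the arc_depend_*.asc files.

```python
def parse_processor_set(spec):
    """Expand a specifier such as '1-4,8,10' into a sorted list of integers."""
    processors = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(number) for number in part.split("-", 1))
            if low > high:
                raise ValueError(f"descending range: {part!r}")
            processors.update(range(low, high + 1))
        else:
            processors.add(int(part))
    return sorted(processors)


print(parse_processor_set("1-4,8,10"))
```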

    Configuring Execution Method Parameters

    To configure execution method parameters, click the Execution Method tab. The Execution Method tab is shown in the following figure.


    You can specify the following parameters:

    • Prolog. A queue-specific prolog script. The prolog script is run with the same environment as the job before the job script is started.

    • Epilog. A queue-specific epilog script. The epilog script is run with the same environment as the job after the job is finished.

    • Starter Method, Suspend Method, Resume Method, Terminate Method. Use these fields to override the default methods for applying these actions to jobs.

    See the queue_conf(5) man page for detailed information about these parameters.

    Configuring the Checkpointing Parameters

    To configure the checkpointing parameters, click the Checkpointing tab. The Checkpointing tab is shown in the following figure.


    You can specify the following parameters:

    • MinCpuTime. The periodic checkpoint interval.

    • Referenced Ckpt Objects. A list of checkpointing environments associated with the queue.

    To reference a checkpointing environment from the queue, select the name of a checkpointing environment from the Available list, and then click the right arrow to add it to the Referenced list.

    To remove a checkpointing environment from the Referenced list, select it, and then click the left arrow.

    To add or modify checkpointing environments, click the button below the red arrows to open the Checkpointing Configuration dialog box. For more information, see "Configuring Checkpointing Environments With QMON" on page 166.

    See the queue_conf(5) man page for detailed information about these parameters.

    Configuring Parallel Environments

    To configure parallel environments, click the Parallel Environment tab. The Parallel Environment tab is shown in the following figure.


    You can specify the following parameter:

    • Referenced PE. A list of parallel environments associated with the queue.

    To reference a parallel environment from the queue, select the name of a parallel environment from the Available PEs list, and then click the right arrow to add it to the Referenced PEs list.

    To remove a parallel environment from the Referenced PEs list, select it, and then click the left arrow.

    To add or modify parallel environments, click the button below the red arrows to open the Parallel Environment Configuration dialog box. For more information, see "Configuring Parallel Environments With QMON" on page 156.

    See the queue_conf(5) man page for detailed information about this parameter.

    Configuring Load and Suspend Thresholds

    To configure load and suspend thresholds, click the Load/Suspend Thresholds tab. The Load/Suspend Thresholds tab is shown in the following figure.


    You can specify the following parameters:

    • The Load Thresholds and the Suspend Thresholds tables, which define overload thresholds for load parameters and consumable resource attributes. See "Complex Resource Attributes" on page 67.

      In the case of load thresholds, overload prevents the queue from receiving further jobs. In the case of suspend thresholds, overload suspends jobs in the queue in order to reduce the load.

      The tables display the currently configured thresholds.

      To change an existing threshold, select it, and then double-click the corresponding Value field.

      To add new thresholds, click Load or Value. A selection list appears with all valid attributes that are attached to the queue. The Attribute Selection dialog box is shown in Figure 1-2. To add an attribute to the Load column of the corresponding threshold table, select an attribute, and then click OK.

      To delete an existing threshold, select it, and then press Control-D or click mouse button 3. You are prompted to confirm that you want to delete the selection.

    • Suspend interval. The time interval between suspension of other jobs in case the suspend thresholds are still exceeded.

    • Jobs suspended per interval. The number of jobs to suspend per time interval in order to reduce the load on the system that is hosting the configured queue.

    See the queue_conf(5) man page for detailed information about these parameters.
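The difference between the two threshold tables can be sketched with a single comparison function applied to each table. The sketch assumes that a threshold counts as exceeded when the reported value is greater than or equal to it, and the attribute name, threshold values, and load figure are illustrative only; the function name is hypothetical.

```python
def exceeded(thresholds, load_values):
    """Return the names of thresholds that the reported values reach or pass.

    Treating 'greater than or equal' as overload is an assumption of this
    sketch. Attributes without a reported value are ignored.
    """
    return sorted(name for name, limit in thresholds.items()
                  if name in load_values and load_values[name] >= limit)


# Illustrative thresholds and load report.
load_thresholds = {"np_load_avg": 1.75}
suspend_thresholds = {"np_load_avg": 3.0}
reported = {"np_load_avg": 2.1}

accepts_new_jobs = not exceeded(load_thresholds, reported)
must_suspend_jobs = bool(exceeded(suspend_thresholds, reported))
print(accepts_new_jobs, must_suspend_jobs)
```

With these figures the queue receives no further jobs, but jobs that are already running in it are not suspended.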


    Configuring Limits

    To configure limits parameters, click the Limits tab. The Limits tab is shown in the following figure.

    You can specify the following parameters:

    • Hard Limit and Soft Limit. The hard limit and the soft limit to impose on the jobs that are running in the queue.

    To change a value of a limit, click the button at the right of the field whose value you want to change. A dialog box appears where you can type either Memory or Time limit values.


    See the queue_conf(5) and the setrlimit(2) man pages for detailed information about limit parameters and their interpretation for different operating system architectures.

    Configuring Complex Resource Attributes

    To configure resource attributes, click the Complex tab. The Complex tab is shown in the following figure.


    You can specify the following parameters:

    • Consumables/Fixed Attributes. Value definitions for selected attributes from the set of resource attributes that are available for this queue.

    The available resource attributes are assembled by default from the complex. Resource attributes are either consumable or fixed. The definition of a consumable value defines a capacity managed by the queue. The definition of a fixed value defines a queue-specific value. See "Complex Resource Attributes" on page 67 for further details.

    The attributes for which values are explicitly defined are displayed in the Consumable/Fixed Attributes table. To change an attribute, select it, and then double-click the corresponding Value field.

    To add new attribute definitions, click Load or Value. The Attribute Selection dialog box appears with a list of all valid attributes that are attached to the queue. The Attribute Selection dialog box is shown in Figure 1-2.

    To add an attribute to the Load column of the attribute table, select it, and then click OK.

    To delete an attribute, select it, and then press Control-D or click mouse button 3. You are prompted to confirm that you want to delete the attribute.

    See the queue_conf(5) man page for detailed information about these attributes.

    Use the Complex Configuration dialog box to check or modify the current complex configuration before you attach user-defined resource attributes to a queue or before you detach them from a queue. To access the Complex Configuration dialog box, click the Complex Configuration button on the QMON Main Control window. See Figure 3-1 for an example.

    Configuring Subordinate Queues

    To configure subordinate queues, click the Subordinates tab. The Subordinates tab is shown in the following figure.

    Use the subordinate queue facility to implement high priority and low priority queues as well as standalone queues.

    You can specify the following parameters:

    • Queue. A list of the queues that are subordinated to the configured queue. Subordinated queues are suspended if the configured queue becomes busy. Subordinated queues are resumed when the configured queue is no longer busy.

    • Max Slots. For any subordinated queue, you can configure the number of job slots that must be filled in the configured queue to trigger a suspension. If no maximum slot value is specified, all job slots must be filled to trigger suspension of the corresponding queue.

    See the queue_conf(5) man page for detailed information about these parameters.
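
    The Queue and Max Slots settings correspond to the subordinate_list attribute of the queue configuration. The following is a minimal sketch of setting it from the command line. The queue names high.q and low.q and the threshold of 2 are assumptions made for illustration, and qc is a helper introduced here that only prints the command when qconf is not installed. Check the qconf(1) and queue_conf(5) man pages before relying on the exact syntax.

```shell
# qc: run qconf if it is installed, otherwise only print the command line
qc() { if command -v qconf >/dev/null 2>&1; then qconf "$@"; else echo "qconf $*"; fi; }

# Suspend low.q once 2 job slots of high.q are filled.
# high.q and low.q are hypothetical queue names; 2 is the Max Slots value.
qc -aattr queue subordinate_list low.q=2 high.q

# Show the resulting configuration of the superordinate queue
qc -sq high.q
```

    Leaving out the "=2" part would correspond to an empty Max Slots field, in which case all job slots of high.q must be filled before low.q is suspended.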

    Configuring User Access Parameters

    To configure user access parameters, click the User Access tab. The User Access tab is shown in the following figure.

    You can specify the following parameters:

    • Available Access Lists. The user access lists that can be included in the Allow Access list or the Deny Access list of the queue.

    Users or user groups belonging to access lists that are included in the Allow Access list have access to the queue. Users who are included in the Deny Access list cannot access the queue. If the Allow Access list is empty, access is unrestricted unless explicitly stated otherwise in the Deny Access list.

    To add or modify user access lists, click the button between the Available Access Lists and the Allow Access and Deny Access lists to open the User Configuration dialog box. For more information, see "Configuring User Access Lists With QMON" on page 98.

    See the queue_conf(5) man page for detailed information about these parameters.
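
    The Allow Access and Deny Access lists are stored in the user_lists and xuser_lists attributes of the queue configuration. The sketch below shows one way to fill them from the command line. The user jdoe, the access lists deptA and guests, and the queue all.q are hypothetical names, and qc is a helper introduced here that only prints the command when qconf is not installed.

```shell
# qc: run qconf if it is installed, otherwise only print the command line
qc() { if command -v qconf >/dev/null 2>&1; then qconf "$@"; else echo "qconf $*"; fi; }

# Put the hypothetical user jdoe into the access list deptA
qc -au jdoe deptA

# Allow Access: members of deptA may use all.q
qc -aattr queue user_lists deptA all.q

# Deny Access: members of the hypothetical list guests may not use all.q
qc -aattr queue xuser_lists guests all.q
```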

    Configuring Project Access Parameters

    To configure project access parameters, click the Project Access tab. The Project Access tab is shown in the following figure.

    You can specify the following parameters:

    • Available Projects. The projects that are allowed access or denied access to the queue.

    Jobs submitted to a project belonging to the list of allowed projects have access to the queue. Jobs that are submitted to denied projects are not dispatched to the queue.

    To add or modify project access, click the button between the Available Projects list and the Allow Project Access and Deny Project Access lists to open the Project Configuration dialog box. For more information, see "Defining Projects With QMON" on page 104.

    See the queue_conf(5) man page for detailed information about these parameters.

    Configuring Owners Parameters

    To configure owners parameters, click the Owners tab. The Owners tab is shown in the following figure.

    # qconf options

    The qconf command has the following options:

    • qconf -aq [cluster-queue]

    The -aq option (add cluster queue) displays an editor containing a template for cluster queue configuration. The editor is either the default vi editor or an editor defined by the EDITOR environment variable. If cluster-queue is specified, the configuration of this cluster queue is used as template. Configure the cluster queue by changing the template and then saving it. See the queue_conf(5) man page for a detailed description of the template entries to change.

    • qconf -Aq filename

    The -Aq option (add cluster queue from file) uses the file filename to define a cluster queue. The definition file might have been produced by the qconf -sq queue command.

    • qconf -cq queue[,...]

    The -cq option (clean queue) cleans the status of the specified cluster queues, queue domains, or queue instances to be idle and free from running jobs. The status is reset without respect to the current status. This option is useful for eliminating error conditions, but you should not use it in normal operation mode.

    • qconf -dq cluster-queue[,...]

    The -dq option (delete cluster queue) deletes the cluster queues specified in the argument list from the list of available queues.

    • qconf -mq cluster-queue

    The -mq option (modify cluster queue) modifies the specified cluster queue. The -mq option displays an editor containing the configuration of the cluster queue to be changed. The editor is either the default vi editor or an editor defined by the EDITOR environment variable. Modify the cluster queue by changing the configuration and then saving your changes.

    • qconf -Mq filename

    The -Mq option (modify cluster queue from file) uses the file filename to define the modified cluster queue configuration. The definition file might have been produced by the qconf -sq queue command and subsequent modification.

    • qconf -sq [queue[,...]]

    The -sq option (show queue) without arguments displays the default template cluster queue, queue domain, or queue instance configuration. The -sq option with arguments displays the current configuration of the specified queues.

    • qconf -sql

    The -sql option (show cluster queue list) displays a list of all currently configured cluster queues.
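
    The -sq and -Mq options combine into a dump, edit, and reload cycle. The following is a minimal sketch of that cycle, assuming a cluster queue named all.q and a scratch file in the current directory. Both names are illustrative, and the qconf calls are skipped when the command is not installed.

```shell
queue="all.q"              # hypothetical cluster queue name
conf_file="$queue.conf"    # scratch file that receives the configuration

if command -v qconf >/dev/null 2>&1; then
    # Dump the current configuration of the queue to a file
    qconf -sq "$queue" > "$conf_file"
    # Edit the dumped configuration (vi unless EDITOR is set)
    ${EDITOR:-vi} "$conf_file"
    # Load the modified configuration back into the cluster queue
    qconf -Mq "$conf_file"
    # List all cluster queues as a final check
    qconf -sql
else
    echo "qconf not found; nothing was changed"
fi
```

    Keeping the dumped file under version control is one way to record what changed between two configurations of the same queue.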

    The qconf command provides the following set of options that you can use to change specific queue attributes:

    -aattr    Add attributes
    -Aattr    Add attributes from a file
    -dattr    Delete attributes
    -Dattr    Delete attributes listed in a file
    -mattr    Modify attributes
    -Mattr    Modify attributes from a file
    -rattr    Replace attributes
    -Rattr    Replace attributes from a file
    -sobjl    Show list of configuration objects

    For a description of how to use these options and for some examples of their use, see "Using Files to Modify Queues, Hosts, and Environments" on page 180. For detailed information about these options, see the qconf(1) man page.
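
    As a quick orientation, the sketch below applies three of these options to a queue. The queue name all.q, the limit value, and the permas capacities are assumptions chosen for illustration, and qc is a helper introduced here that only prints the command when qconf is not installed.

```shell
# qc: run qconf if it is installed, otherwise only print the command line
qc() { if command -v qconf >/dev/null 2>&1; then qconf "$@"; else echo "qconf $*"; fi; }

# -mattr: change a single-valued attribute, here the hard run-time limit
qc -mattr queue h_rt 24:00:00 all.q

# -aattr: add one entry to a list attribute, here a capacity of 10 for permas
qc -aattr queue complex_values permas=10 all.q

# -rattr: replace the whole list attribute with the given value
qc -rattr queue complex_values permas=5 all.q
```

    The difference between -aattr and -rattr matters for list attributes: the first extends the existing list, whereas the second discards whatever the list contained before.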

    Configuring Queue Calendars

    Queue calendars define the availability of queues according to the day of the year, the day of the week, or the time of day. You can configure queues to change their status at specified times. You can change the queue status to disabled, enabled, suspended, or resumed (unsuspended).

    The grid engine system enables you to define a site-specific set of calendars, each of which specifies status changes and the times at which the changes occur. These calendars can be associated with queues. Each queue can attach a single calendar, thereby adopting the availability profile defined in the attached calendar.

    The syntax of the calendar format is described in detail in the calendar_conf(5) man page. A few examples are given in the next sections, along with a description of the corresponding administration facilities.

    Configuring Queue Calendars With QMON

    In the QMON Main Control window, click the Calendar Configuration button. The Calendar Configuration dialog box appears.

    The Calendars list displays the available calendars.

    In the Calendars list, click the calendar configuration that you want to modify or delete.

    Do one of the following:

    • To delete the selected calendar, click Delete.
    • To modify the selected calendar, click Modify.
    • To add a calendar, click Add.

    In all cases, the Add/Modify Calendar dialog box appears.

    If you click Modify or Delete, the Calendar Name field displays the name of the selected calendar. If you click Add, type the name of the calendar you are defining.

    The Year and Week fields enable you to define the calendar events, using the syntax described in the calendar_conf(5) man page.

    The example of the calendar configuration shown in the previous figure is appropriate for queues that should be available outside office hours and on weekends. In addition, the Christmas holidays are defined to be handled like weekends.

    See the calendar_conf(5) man page for a detailed description of the syntax and for more examples.

    By attaching a calendar configuration to a queue, the availability profile defined by the calendar is set for the queue. Calendars are attached in the General Configuration tab of the Modify queue-name dialog box. The Calendar field contains the name of the calendar to attach. The button next to the Calendar field lists the currently configured calendars. See "Configuring Queues" on page 45 for more details about configuring queues.

    Configuring Queue Calendars From the Command Line

    To configure queue calendars from the command line, type the following command with appropriate options:

    % qconf options

    The following options are available:

    • qconf -acal calendar-name

    The -acal option (add calendar) adds a new calendar configuration named calendar-name to the cluster. An editor with a template configuration appears, enabling you to define the calendar.

    • qconf -Acal filename

    The -Acal option (add calendar from file) adds a new calendar configuration to the cluster. The added calendar is read from the specified file.

    • qconf -dcal calendar-name[,...]

    The -dcal option (delete calendar) deletes the specified calendar.

    • qconf -mcal calendar-name

    The -mcal option (modify calendar) modifies an existing calendar configuration named calendar-name. An editor opens calendar-name, enabling you to make changes to the definition.

    • qconf -Mcal filename

    The -Mcal option (modify calendar from file) modifies an existing calendar configuration. The calendar to modify is read from the specified file.

    • qconf -scal calendar-name

    The -scal option (show calendar) displays the configuration for calendar-name.

    • qconf -scall

    The -scall option (show calendar list) displays a list of all configured calendars.
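
    The options above can be combined into a short file-based workflow. The following is a sketch, assuming a calendar named night that keeps a queue off during weekday office hours and treats two December days like weekends. The file name, the dates, and the queue all.q are illustrative, and the calendar_conf(5) man page remains the reference for the syntax. The qconf calls are skipped when the command is not installed.

```shell
cal_file="night.cal"    # hypothetical file name for the calendar definition

# Calendar definition: the queue is off on weekdays from 6:00 to 20:00,
# except on the two listed days, which are on all day.
cat > "$cal_file" <<'EOF'
calendar_name  night
year           25.12.2005,26.12.2005=on
week           mon-fri=6-20
EOF

if command -v qconf >/dev/null 2>&1; then
    qconf -Acal "$cal_file"                    # add the calendar from the file
    qconf -scal night                          # show the stored calendar
    qconf -mattr queue calendar night all.q    # attach the calendar to a queue
fi
```

    A week entry without an explicit state, such as mon-fri=6-20, is read here as switching the queue off for that period, which is what makes the queue available only at night and on weekends.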

    CHAPTER 3

    Configuring Complex Resource Attributes

    This chapter describes how to configure resource attribute definitions. Resource attribute definitions are stored in an entity called the grid engine system complex. In addition to background information relating to the complex and its associated concepts, this chapter provides detailed instructions on how to accomplish the following tasks:

    • "Configuring Complex Resource Attributes With QMON" on page 68
    • "Setting Up Consumable Resources" on page 75
    • "Configuring Complex Resource Attributes From the Command Line" on page 86
    • "Writing Your Own Load Sensors" on page 88

    Complex Resource Attributes

    The complex configuration provides all pertinent information about the resource attributes users can request for jobs with the qsub -l or qalter -l commands. The complex configuration also provides information about how the grid engine system should interpret these resource attributes.

    The complex also builds the framework for the system's consumable resources facility. The resource attributes that are defined in the complex can be attached to the global cluster, to a host, or to a queue instance. The attached attribute identifies a resource with the associated capability. During the scheduling process, the availability of resources and the job requirements are taken into account. The grid engine system also performs the bookkeeping and the capacity planning that is required to prevent oversubscription of consumable resources.

    Typical consumable resource attributes include:

    • Available free memory
    • Unoccupied licenses of a software package

    • Free disk space
    • Available bandwidth on a network connection

    Attribute definitions in the grid engine complex define how resource attributes should be interpreted.

    The definition of a resource attribute includes the following:

    • Name of the attribute
    • Shortcut to reference the attribute name
    • Value type of the attribute, for example, STRING or TIME
    • Relational operator used by the scheduler
    • Requestable flag, which determines whether users can request the attribute for a job
    • Consumable flag, which identifies the attribute as a consumable resource
    • Default request value that is taken into account for consumable attributes if jobs do not explicitly specify a request for the attribute
    • Urgency value, which determines job priorities on a per resource basis
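
    To make the eight parts concrete, the sketch below takes one definition written as a single line and labels each field. The permas attribute with the pm shortcut is used as an illustrative license counter, and the column order is an assumption that follows the list above. See the complex(5) man page for the authoritative format.

```shell
# One attribute definition, in the order: name, shortcut, type,
# relational operator, requestable, consumable, default, urgency.
# The values are illustrative, not taken from a real cluster.
line="permas pm INT <= YES YES 0 0"

# Split the line on whitespace into the positional parameters $1..$8
set -- $line

# Label each field
printf 'name=%s shortcut=%s type=%s relop=%s\n' "$1" "$2" "$3" "$4"
printf 'requestable=%s consumable=%s default=%s urgency=%s\n' "$5" "$6" "$7" "$8"
```

    Read this way, the line says that jobs can request permas (or pm), that each request is subtracted from a configured capacity, and that a job that does not mention the attribute is counted with a default request of 0.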

    Use the QMON Complex Configuration dialog box, which is shown in Figure 3-1, to define complex resource attributes.

    Configuring Complex Resource Attributes With QMON

    In the QMON Main Control window, click the Complex Configuration button. The Complex Configuration dialog box appears.

    FIGURE 3-1 Complex Configuration Dialog Box

    The Complex Configuration dialog box enables you to add, modify, or delete complex resource attributes.

    To add a new attribute, first make sure that no line in the Attributes table is selected. In the fields above the Attributes table, type or select the values that you want, and then click Add.

    Note: If you want to add a new attribute and an existing attribute is selected, you must clear the selection. To deselect a highlighted attribute, hold down the Control key and click mouse button 1.

    You can add a new attribute by copying an existing attribute and then modifying it. Make sure that the attribute name and its shortcut are unique.

    To modify an attribute listed in the Attributes table, select it. The values of the selected attribute are displayed above the Attributes table. Change the attribute values, and then click Modify.

    To save configuration changes to a file, click Save. To load values from a file into the complex configuration, click Load, and then select the name of a file from the list that appears.

    To delete an attribute in the Attribute table, select it, and then click Delete.

    See the complex(5) man page for details about the meaning of the rows and columns in the table.

    To register your new or modified complex configuration with sge_qmaster, click Commit.

    Assigning Resource Attributes to Queues, Hosts, and the Global Cluster

    Resource attributes can be used in the following ways:

    • As queue resource attributes
    • As host resource attributes
    • As global resource attributes

    A set of default resource attributes is already attached to each queue and host. Default resource attributes are built in to the system and cannot be deleted, nor can their type be changed.

    User-defined resource attributes must first be defined in the complex before you can assign them to a queue instance, a host, or the global cluster. When you assign a resource attribute to one of these targets, you specify a value for the attribute.

    The following sections describe each attribute type in detail.

    Queue Resource Attributes

    Default queue resource attributes are a set of parameters that are defined in the queue configuration. These parameters are described in the queue_conf(5) man page.

    You can add new resource attributes to the default attributes. New attributes are attached only to the queue instances that you modify. When the configuration of a particular queue instance references a resource attribute that is defined in the complex, that queue configuration provides the values for the attribute definition. For details about queue configuration see "Configuring Queues" on page 45.

    For example, the queue configuration value h_vmem is used for the virtual memory size limit. This value limits the amount of total memory that each job can consume. An entry in the complex_values list of the queue configuration defines the total available amount of virtual memory on a host or assigned to a queue. For detailed information about consumable resources, see "Consumable Resources" on page 74.

    Host Resource Attributes

    Host resource attributes are parameters that are intended to be managed on a host basis.

    The default host-related attributes are load values. You can add new resource attributes to the default attributes, as described earlier in "Queue Resource Attributes" on page 70.

    Every sge_execd periodically reports load to sge_qmaster. The reported load values are either the standard load values such as the CPU load average, or the load values defined by the administrator, as described in "Load Parameters" on page 87.

    The definitions of the standard load values are part of the default host resource attributes, whereas administrator-defined load values require extending the host resource attributes.

    Host-related attributes are commonly extended to include nonstandard load parameters. Host-related attributes are also extended to manage host-related resources such as the number of software licenses that are assigned to a host, or the available disk space on a host's local file system.

    If host-related attributes are associated with a host or with a queue instance on that host, a concrete value for a particular host resource attribute is determined by one of the following items:

    • The queue configuration, if the attribute is also assigned to the queue configuration
    • A reported load value
    • The explicit definition of a value in the complex_values entry of the corresponding host configuration. For details, see "Configuring Hosts" on page 24.

    In some cases, none of these values are available. For example, say the value is supposed to be a load parameter, but sge_execd does not report a load value for the parameter. In such cases, the attribute is not defined, and the qstat -F command shows that the attribute is not applicable.

    For example, the total free virtual memory attribute h_vmem is defined in the queue configuration as a limit and is also reported as a standard load parameter. The total available amount of virtual memory on a host can be defined in the complex_values list of that host. The total available amount of virtual memory attached to a queue instance on that host can be defined in the complex_values list of that queue instance. By also defining h_vmem as a consumable resource, you can efficiently exploit the memory of a machine without risking memory oversubscription, which often reduces system performance because of swapping. For more information about consumable resources, see "Consumable Resources" on page 74.
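
    A command-line sketch of the host part of this setup follows. It assumes that h_vmem has already been marked as consumable in the complex, and it uses a hypothetical execution host named node01 with a capacity of 4G. qc is a helper introduced here that only prints the command when qconf is not installed.

```shell
# qc: run qconf if it is installed, otherwise only print the command line
qc() { if command -v qconf >/dev/null 2>&1; then qconf "$@"; else echo "qconf $*"; fi; }

host="node01"    # hypothetical execution host name

# Define 4G of virtual memory as the capacity of this host
qc -aattr exechost complex_values h_vmem=4G "$host"

# A job would then request part of that capacity when it is submitted:
#   qsub -l h_vmem=1G job.sh

# Show the host configuration, including its complex_values list
qc -se "$host"
```

    With this capacity in place, the scheduler can stop dispatching jobs to the host once the sum of the h_vmem requests of its running jobs reaches 4G.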

    Note: Only the Shortcut, Relation, Requestable, Consumable, and Default columns can be changed for the default resource attributes. No default attributes can be deleted.

    Global Resource Attributes

    Global resource attributes are cluster-wide resource attributes, such as available network bandwidth of a file server or the free disk space on a network-wide available file system.

    Global resource attributes can also be associated with load reports if the corresponding load report contains the GLOBAL identifier, as described in "Load Parameters" on page 87. Global load values can be reported from any host in the cluster. No global load values are reported by default, so there are no default global resource attributes.

    Concrete values for global resource attributes are determined by the following items:

    • Global load reports.
    • Explicit definition in the complex_values parameter of the global host configuration. See "Configuring Hosts" on page 24.
    • In association with a particular host or queue and an explicit definition in the corresponding complex_values lists.

    Sometimes none of these cases apply. For example, a load value might not yet be reported. In such cases, the attribute does not exist.

    Adding Resource Attributes to the Complex

    By adding resource attributes to the complex, the administrator can extend the set of attributes managed by the grid engine system. The administrator can also restrict the influence of user-defined attributes to particular queues, hosts, or both.

    User-defined attributes are a named collection of attributes with the corresponding definitions as to how the grid engine software is to handle these attributes. You can attach one or more user-defined attributes to a queue, to a host, or globally to all hosts in the cluster. Use the complex_values parameter for the queue configuration and the host configuration. For more information, see "Configuring Queues" on page 45 and "Configuring Hosts" on page 24. The attributes defined become available to the queue and to the host, respectively, in addition to the default resource attributes.

    The complex_values parameter in the queue configuration and the host configuration must set concrete values for user-defined attributes that are associated with queues and hosts.

    For example, say the user-defined resource attributes permas, pamcrash, and nastran, shown in the following figure, are defined.

    For one or more queues, add the resource attributes to the list of associated user-defined attributes as shown in the Complex tab of the Modify queue-name dialog box. For details on how to configure queues, see "Configuring Queues" on page 45 and its related sections.

    Then the displayed queue is configured to manage up to 10 licenses of the software package permas. Furthermore, the attribute permas becomes requestable for jobs, as expressed in the Available Resources list in the Requested Resources dialog box.

    For details about how to submit jobs, see Chapter 3, "Submitting Jobs," in N1 Grid Engine 6 User's Guide.

    Alternatively, the user could submit jobs from the command line and could request attributes as follows:

    % qsub -l pm=1 permas.sh

    Note: You can use the pm shortcut instead of the full attribute name permas.

    Consequently, the only eligible queues for these jobs are the queues that are associated with the user-defined resource attributes and that have permas licenses configured and available.

    Consumable Resources

    Consumable resources provide an efficient way to manage limited resources such as available memory, free space on a file system, network bandwidth, or floating software licenses. Consumable resources are also called consumables. The total available capacity of a consumable is defined by the administrator. The consumption of the corresponding resource is monitored by grid engine software internal bookkeeping. The grid engine system accounts for the consumption o