Memory restriction, limits and heterogeneous grids. A case study. Txema Heredia Or an example of how to adapt your policies to your needs


Dec 24, 2015


Memory restriction, limits and heterogeneous grids.

A case study.

Txema Heredia

Or an example of how to adapt your policies to your needs


DISCLAIMER

What I am going to present is neither a panacea nor something that has to fit, or immediately solve, your cluster's issues. It is just a brief description of the problems we faced and how we used different SGE options to handle them.

Also, no animal was harmed in the making of this PowerPoint.


Our story


“hey, let’s buy a cluster”

- my boss


What did we need?


What did we need?

•Users:

•biologists, not programmers

•Processes:

•user-made scripts

•single core biological software


What did we NOT need?

•Nopes:

•threads / parallel programming (mostly)

•GPUs

•Ayes:

•thousands of single-core jobs


And thus, our baby was born


Our cluster

•8 computing nodes

•8 cores

•8 GB RAM

•1 front-end


Our cluster

•NFS

•Rocks cluster (CentOS)

•SGE


First steps with SGE


First steps with SGE

•1st try:

•One queue to rule them all


First steps with SGE

•1st try:

•all.q queue

•free for all


First steps with SGE

•1st try - conclusions:

•chaos reigned

•constant conflicts between users (especially time-related)

•FIFO queuing

•swapping


2nd try

•2nd try

•round-robin-like scheduling

•share tree/functional tickets

•split cluster by time usage:

•3 queues: fast / medium / slow


2nd try

•fast:

•2 hours / 2 nodes

•medium:

•48 hours / 3 nodes

•slow:

•∞ hours / 3 nodes


2nd try

•Conclusions:

•↓ chaos

•↓ user conflicts

•Still swapping

•High undersubscription of the cluster


2nd try

•3 types of jobs

•Don’t need to coexist at the same time

•1 user → 1 type of job

•User knowledge

•Saturation of the unlimited queue


2nd try

•Queue tinkering:

•wallclock time

•number of hosts

•Better results, but not good enough:

•Waiting jobs & idle nodes


2nd try

•There are 2 wars here:

•memory / swap

•splitting leads to undersubscription


The memory war


Memory

•Buy more memory

•from 8 × 8 GB

•to 4 × 32 GB, 3 × 16 GB, 1 × 8 GB

•This reduces our problem, but doesn’t fix it


Swap

•Swapping in a cluster is the root of all evil


Swap

•Complex attribute “h_vmem”


h_core

h_rt ≠ h_cpu

h_fsize

h_rss

h_stack

h_data = h_vmem


h_vmem

•h_vmem

•SIGKILL

•s_vmem

•SIGXCPU

•You can combine both
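Combining both limits gives a job a chance to stop cleanly: the soft limit (s_vmem) delivers SIGXCPU first, and the hard limit (h_vmem) kills the job if it keeps growing. A minimal sketch of a job script that reacts to the soft limit could look like this. The flag variable, the handler name and the work loop are hypothetical, and it assumes the signal reaches the job's shell.

```shell
#!/bin/sh
# Hypothetical job script: stop cleanly once the s_vmem soft limit is hit.
SOFT_LIMIT_HIT=0

# Handler for SIGXCPU, the signal sent when s_vmem is exceeded.
handle_soft_limit() {
    SOFT_LIMIT_HIT=1
}
trap handle_soft_limit XCPU

# Placeholder work loop: handle each argument until done or until warned.
process_items() {
    for item in "$@"; do
        if [ "$SOFT_LIMIT_HIT" -eq 1 ]; then
            echo "soft memory limit reached, stopping before $item" >&2
            return 1
        fi
        echo "processing $item"
    done
}
```

Such a script would be submitted with both limits, for example `qsub -l s_vmem=2800M,h_vmem=3G job.sh`, leaving a margin between the warning and the kill.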


h_vmem

•Requestable by default

•We want them to be consumable

•qmon / qconf -mc


h_vmem


h_vmem

•requestable = YES

•consumable = YES / JOB

•default = whatever you want
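The same change can be scripted instead of edited by hand in `qconf -mc`. The sketch below rewrites the h_vmem row of a complex listing, assuming the usual column order of `qconf -sc` (name, shortcut, type, relop, requestable, consumable, default, urgency). The function name and the temporary file path are hypothetical.

```shell
# Rewrite the h_vmem row of a complex listing read on stdin.
# $1 = consumable value (YES or JOB), $2 = default request (e.g. 6G).
# All other rows, including comment lines, pass through unchanged.
make_hvmem_consumable() {
    awk -v cons="$1" -v def="$2" '
        $1 == "h_vmem" { $5 = "YES"; $6 = cons; $7 = def }
        { print }
    '
}

# Possible usage on the master host (not executed here):
#   qconf -sc | make_hvmem_consumable YES 6G > /tmp/complexes.new
#   qconf -Mc /tmp/complexes.new
```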


h_vmem

•Only for parallel environment jobs:

•consumable = YES

•sge_shepherd memory = h_vmem*slots

•consumable = JOB

•sge_shepherd memory = h_vmem
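The difference between the two settings is plain arithmetic. The sketch below follows the rule stated on the slide; the function name is hypothetical and memory is given in whole megabytes.

```shell
# debited_mb CONSUMABLE REQUEST_MB SLOTS
# Prints the memory (in MB) accounted for a parallel job.
debited_mb() {
    case "$1" in
        YES) echo $(( $2 * $3 )) ;;   # per-slot request times slots
        JOB) echo "$2" ;;             # request counted once per job
        *)   echo "unknown consumable setting: $1" >&2; return 1 ;;
    esac
}
```

For a 4-slot job requesting 2048 MB, `debited_mb YES 2048 4` prints 8192 and `debited_mb JOB 2048 4` prints 2048.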


h_vmem

•default = 100M

•“everything” dies

•default = 6G

•“everything” works


h_vmem

•Now we can limit the memory

•But we can still have swapping


h_vmem

•Define h_vmem in each host

•qmon / qconf -me hostname
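With more than a handful of nodes, the per-host value can be set non-interactively. The sketch below only prints the `qconf -mattr` commands so they can be reviewed before running. The function name and the host names are hypothetical.

```shell
# Print (do not run) the commands that set h_vmem on each execution host.
# Arguments are host=memory pairs, e.g. compute-0-0=32G.
print_host_vmem_cmds() {
    for pair in "$@"; do
        host=${pair%%=*}    # text before the first "="
        mem=${pair#*=}      # text after the first "="
        echo "qconf -mattr exechost complex_values h_vmem=$mem $host"
    done
}
```

For example, `print_host_vmem_cmds compute-0-0=32G compute-0-4=16G compute-0-7=8G | sh` would apply the values on the master host.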


h_vmem

•Exact memory:

•more secure

•Bigger memory:

•more margin


Memory

•From now on, any job submission must contain a memory request:

•qsub ... -l h_vmem=3G ...


No more swapping!!


Undersubscription


Undersubscription

•Dual restriction:

•8 jobs/slots per node

•32 / 16 / 8 GB mem per node

•The minimum of both will apply
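As a worked example of the dual restriction, the number of identical jobs a node can hold is the smaller of its slot count and the number of memory requests that fit. The function name below is hypothetical and sizes are whole gigabytes.

```shell
# jobs_per_node SLOTS NODE_GB JOB_GB
# Prints how many identical jobs fit on one node.
jobs_per_node() {
    by_mem=$(( $2 / $3 ))           # jobs that fit by memory (integer division)
    if [ "$by_mem" -lt "$1" ]; then
        echo "$by_mem"              # memory is the tighter limit
    else
        echo "$1"                   # slots are the tighter limit
    fi
}
```

A 32 GB node with 8 slots takes only four 8 GB jobs (`jobs_per_node 8 32 8`), while an 8 GB node takes eight 1 GB jobs (`jobs_per_node 8 8 1`).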


32 Gb node

8 Gb node


32 GB node: eight 1 GB jobs → 0 slots free, 24 GB free

8 GB node: one 8 GB job → 7 slots free, 0 GB free

Stupid scheduling


32 GB node: one 8 GB job → 7 slots free, 24 GB free

8 GB node: eight 1 GB jobs → 0 slots free, 0 GB free

Smart scheduling


Smart scheduling

•We want each job to go to the node where it fits best.


(another) DISCLAIMER

This is strictly for our case and needs. It may appeal to you, or some ideas may inspire you, but it is not intended to be a step-by-step solution for everyone.

It is just an example of “things that can be done”.


Smart scheduling

•Create 3 hostgroups:

•@32G, @16G and @8G

•Group nodes by memory
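A hostgroup is a two-line definition (group name and host list), so it can be generated from a script and loaded from a file. The sketch below prints such a definition. The function name, host names and file path are hypothetical.

```shell
# hostgroup_def NAME HOST...
# Prints a hostgroup definition in the format read by "qconf -Ahgrp".
hostgroup_def() {
    name=$1
    shift
    printf 'group_name %s\n' "$name"
    printf 'hostlist %s\n' "$*"     # remaining arguments, space-separated
}

# Possible usage on the master host (not executed here):
#   hostgroup_def @32G compute-0-0 compute-0-1 > /tmp/hg32
#   qconf -Ahgrp /tmp/hg32
```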


Smart scheduling

•Maximize the ratio memory/core:

•job ≤ 1 GB → 8 GB nodes

•1 GB < job ≤ 2 GB → 16 GB nodes

•job > 2 GB → 32 GB nodes


Smart scheduling

•3 different queues:

•all-32

•all-16

•all-8

•assign the corresponding hostgroup


Smart scheduling

•Same problem as before:

•Oversubscription of one queue

•Undersubscription of other queues


Sequence Numbers


Smart scheduling

•Preference for a given hostgroup


Smart scheduling

•all-32:

•@32G > @16G > @8G

•all-16:

•@16G > @32G > @8G

•all-8:

•@8G > @16G > @32G


Smart scheduling

•qmon → queue configuration → general configuration → Sequence Nr

•qconf -mq queuename


Smart scheduling

all-32 queue:

•@32G: Seq Nr = 0

•@16G: Seq Nr = 1

•@8G: Seq Nr = 2
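In the queue configuration, per-hostgroup sequence numbers are written as a default value followed by bracketed overrides. The sketch below builds that `seq_no` line from a list of hostgroups in order of preference. The function name is hypothetical, and the line still has to be pasted into `qconf -mq`. Sequence numbers only influence placement when the scheduler's `queue_sort_method` is set to `seqno`.

```shell
# seq_no_line HOSTGROUP...
# Prints a seq_no line where the first hostgroup gets 0, the next 1, etc.
seq_no_line() {
    n=0
    line="seq_no 0"                 # default for hosts not listed below
    for hg in "$@"; do
        line="$line,[$hg=$n]"
        n=$((n + 1))
    done
    printf '%s\n' "$line"
}
```

For the three queues described above, the calls would be `seq_no_line @32G @16G @8G` for all-32, `seq_no_line @16G @32G @8G` for all-16 and `seq_no_line @8G @16G @32G` for all-8.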


[Figure: jobs from the waiting queue are checked against the 32 GB, 16 GB and 8 GB queues; each check is marked ?, ✗ or ✓]


Are we done?


Qsub wrapper

•Users already choose the memory

•Why ask for a queue?

•We can let the system do it


Qsub wrapper

•Wrapper script around qsub

parse parameters, searching for queue or memory requests

if ( no memory ) { memory = default }

if ( no queues ) {

if (memory ≤ 1 GB) { queue = all-8 }

else if (memory ≤ 2 GB) { queue = all-16 }

else { queue = all-32 }

}

qsub -q $queue parameters
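The pseudocode above can be sketched as a small shell library. This is not the author's wrapper: the function names and the 1G default are hypothetical, only whole-number `M` and `G` values are parsed, and requests of exactly 1 GB or 2 GB are assumed to go to the smaller class. The command is printed rather than executed so it can be inspected.

```shell
DEFAULT_MEM=${DEFAULT_MEM:-1G}      # assumption: site-wide default request

# to_mb VALUE: convert e.g. 3G or 512M to whole megabytes.
to_mb() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported memory value: $1" >&2; return 1 ;;
    esac
}

# pick_queue VALUE: choose the queue from the memory request.
pick_queue() {
    mb=$(to_mb "$1") || return 1
    if [ "$mb" -le 1024 ]; then echo all-8
    elif [ "$mb" -le 2048 ]; then echo all-16
    else echo all-32
    fi
}

# build_qsub_cmd ARGS...: print the qsub command the wrapper would run.
build_qsub_cmd() {
    mem=""; queue=""; prev=""
    for arg in "$@"; do
        case "$prev" in
            -q) queue=$arg ;;
            -l) case "$arg" in
                    *h_vmem=*) mem=${arg#*h_vmem=}; mem=${mem%%,*} ;;
                esac ;;
        esac
        prev=$arg
    done
    extra=""
    if [ -z "$mem" ]; then          # no memory request: apply the default
        mem=$DEFAULT_MEM
        extra="-l h_vmem=$mem "
    fi
    if [ -z "$queue" ]; then        # no queue request: derive it from memory
        queue=$(pick_queue "$mem") || return 1
        extra="-q $queue $extra"
    fi
    echo "qsub $extra$*"
}
```

For example, `build_qsub_cmd -l h_vmem=3G job.sh` prints `qsub -q all-32 -l h_vmem=3G job.sh`, while an explicit `-q` from the user is passed through untouched.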


Qsub wrapper

•You can add whatever you need


Qsub wrapper

•“home-made” parameters

•--slow / --fast

•allow access to 2 kinds of special nodes

•instead of

•-q all-16@compute-1-*


Qsub wrapper

•One queue to rule them all

•but...

•No swap!!!

•No undersubscription!!!


Now the icing


Punishment

•System relies on “good behaviour”

•Teach users how to use it

•Prevent & “punish” bad usage


Punishment

•epilog script

•runs when the job finishes

•global: qconf -mconf

•or by queue

•/opt/gridengine/default/common/sge_epilog.sh


Punishment

•Check memory

•requested

•maxvmem

•log

•send an email


Punishment

•no memory

•teaches how to request it properly

•too much memory

•tells the user and gives advice

•reasonable memory

•no email
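The three cases can be sketched as a classification step inside the epilog. The slides do not say where "too much" starts, so the factor of 4 below is an assumption, as are the function name and the convention that a missing request is passed as 0. How the epilog obtains the requested and peak values is left out.

```shell
WASTE_FACTOR=4   # assumption: "too much" = request above 4x the peak usage

# classify_usage REQUESTED_MB MAXVMEM_MB
# Prints no_request, too_much or ok.
classify_usage() {
    if [ "$1" -eq 0 ]; then
        echo no_request             # user did not ask for memory
    elif [ "$1" -gt $(( $2 * WASTE_FACTOR )) ]; then
        echo too_much               # request far above what was used
    else
        echo ok                     # reasonable request: no email
    fi
}
```

Only the first two outcomes would trigger an email. All three can be appended to the epilog log for later processing.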


Punishment

•epilog writes a logfile

•cron process “punishes” or “rewards” users according to the previous day's memory usage


Punishment

•Modify user’s shared ticket policy

•For each “bad” job:

•-10 tickets

•For each “good” job:

•+5 tickets
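The daily adjustment is a sum over the epilog log. The sketch below assumes a hypothetical log format with one `user good` or `user bad` line per job and prints the net ticket change per user. Applying the change to the ticket policy is not shown.

```shell
# ticket_deltas: read "user good|bad" lines on stdin and print
# "user delta" per user, with +5 per good job and -10 per bad job.
ticket_deltas() {
    awk '
        $2 == "good" { delta[$1] += 5 }
        $2 == "bad"  { delta[$1] -= 10 }
        END { for (u in delta) print u, delta[u] }
    ' | sort
}
```

A user with one good and one bad job ends the day at -5, so a single bad job outweighs a single good one.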


Punishment

•“bad users”

•delayed scheduling

•“good users”

•more priority


Control other resources


Shared disk

•NFS shared disk

•avoid filling it

•suspend all jobs before it's too late


Shared disk

•New complex attribute: scratch_pct

•type = INT

•operation >=

•requestable = NO

•consumable = NO

•default = 0
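Written as a row of the complex configuration, the attribute above might look like the line printed by this sketch. The shortcut (same as the name) and the urgency (0) are not given on the slide and are assumptions, as are the function name and the file path.

```shell
# Print the complex row for scratch_pct.
# Columns: name shortcut type relop requestable consumable default urgency
scratch_pct_row() {
    printf '%s\n' "scratch_pct scratch_pct INT >= NO NO 0 0"
}

# Possible usage on the master host (not executed here):
#   qconf -sc > /tmp/complexes.new
#   scratch_pct_row >> /tmp/complexes.new
#   qconf -Mc /tmp/complexes.new
```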


Shared disk

•Load Report

• /opt/gridengine/default/common/sge_load_report.sh


Shared disk

infinite loop {

wait for a line on stdin; exit if it is "quit"

scratch=`df | grep scratch | awk '{print $4}' | grep % | sed 's/%//g'`

echo begin

echo "$myhost:scratch_pct:$scratch"

echo end

}
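A runnable variant of the loop above could look as follows. It is a sketch, not the author's script: the mount point `/scratch` and the function names are assumptions, and it parses `df -P` (one line per filesystem, usage in column 5) instead of the grep chain. A load sensor waits for a line on stdin before each report and stops when it reads `quit`.

```shell
#!/bin/sh
SCRATCH_MOUNT=${SCRATCH_MOUNT:-/scratch}   # assumption: shared disk mount point

# Extract the usage percentage (without "%") from "df -P" output on stdin.
use_pct_from_df() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# emit_report HOST PCT: print one load report block.
emit_report() {
    echo begin
    echo "$1:scratch_pct:$2"
    echo end
}

# Main loop: one report per line received, until "quit" arrives.
run_load_sensor() {
    myhost=$(hostname)
    while read -r line; do
        [ "$line" = "quit" ] && return 0
        pct=$(df -P "$SCRATCH_MOUNT" | use_pct_from_df)
        emit_report "$myhost" "$pct"
    done
}

# run_load_sensor   # uncomment when installing the script as a load sensor
```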


Shared disk


Shared disk

•whenever the disk gets to 97%

•all jobs freeze


Conclusions

•Combining SGE options gives access to much more powerful configurations


Questions?

Special thanks:

•Angel Carreño

•Carles Perarnau

•Marc Esteve

•Jordi Rambla

•Arcadi Navarro