> Solving Big problems with OS: Condor > Antonio Sanz ([email protected]) > 15 / Oct / 2012

Solving Big problems with Condor - II HPC Sysadmins Meeting

May 08, 2015


This is a long talk about the main features of Condor, and what tweaks we have added at I3A.
Page 1: Solving Big problems with Condor - II HPC Sysadmins Meeting

> Solving Big problems with OS: Condor > Antonio Sanz ([email protected]) > 15 / Oct / 2012

Page 2: Solving Big problems with Condor - II HPC Sysadmins Meeting

2

> Antonio Sanz > I3A System Manager

> HERMES HPC cluster sysadmin > [email protected] > @antoniosanzalc

Page 3: Solving Big problems with Condor - II HPC Sysadmins Meeting

3

Page 4: Solving Big problems with Condor - II HPC Sysadmins Meeting

4 I’m no SGE guy …

Page 5: Solving Big problems with Condor - II HPC Sysadmins Meeting

5

Condor – Main features

Page 6: Solving Big problems with Condor - II HPC Sysadmins Meeting

6

Page 7: Solving Big problems with Condor - II HPC Sysadmins Meeting

7

Healthy project

Page 8: Solving Big problems with Condor - II HPC Sysadmins Meeting

8 Condor Basics

Heterogeneous computing

Page 9: Solving Big problems with Condor - II HPC Sysadmins Meeting

9

Job Surveillance

Page 10: Solving Big problems with Condor - II HPC Sysadmins Meeting

10

Requirements

Page 11: Solving Big problems with Condor - II HPC Sysadmins Meeting

11 Condor Basics

Fair use of resources

3. Queue management systems: Condor

Page 12: Solving Big problems with Condor - II HPC Sysadmins Meeting

12

Checkpoints

Page 13: Solving Big problems with Condor - II HPC Sysadmins Meeting

13 Condor Basics

Nested jobs (DAG)
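As a rough illustration of nested jobs, the sketch below describes a small workflow as a dependency graph, computes a valid run order, and renders it in the `JOB` / `PARENT ... CHILD` syntax that DAGMan input files use. The node names and the one-submit-file-per-node naming (`prepare.sub`, etc.) are hypothetical and not taken from the talk.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: node name -> set of nodes it depends on.
dependencies = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "collect": {"simulate_a", "simulate_b"},
}

def run_order(deps):
    """Return one order in which every node runs after its parents."""
    return list(TopologicalSorter(deps).static_order())

def dagman_text(deps):
    """Render the graph using DAGMan's JOB / PARENT ... CHILD lines."""
    # Assumption: each node has a submit file named <node>.sub.
    lines = [f"JOB {node} {node}.sub" for node in sorted(deps)]
    for child in sorted(deps):
        for parent in sorted(deps[child]):
            lines.append(f"PARENT {parent} CHILD {child}")
    return "\n".join(lines) + "\n"

order = run_order(dependencies)
dag_file = dagman_text(dependencies)
print(order)
print(dag_file)
```

The generated text could be saved to a file and handed to `condor_submit_dag` on a machine where Condor is installed; the two `simulate_*` nodes have no dependency on each other, so they are free to run in parallel.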

Page 14: Solving Big problems with Condor - II HPC Sysadmins Meeting

14

Easy Licensing

Page 15: Solving Big problems with Condor - II HPC Sysadmins Meeting

15

… with Hadoop, MPI, OpenMP, GPU

Page 16: Solving Big problems with Condor - II HPC Sysadmins Meeting

16

Condor Flocking

Page 17: Solving Big problems with Condor - II HPC Sysadmins Meeting

17

Grid & Cloud Computing

Page 18: Solving Big problems with Condor - II HPC Sysadmins Meeting

18

VM Universe

Page 19: Solving Big problems with Condor - II HPC Sysadmins Meeting

19

Hooks & APIs

Page 20: Solving Big problems with Condor - II HPC Sysadmins Meeting

20 Condor Basics

Flexibility

Page 21: Solving Big problems with Condor - II HPC Sysadmins Meeting

21

How Condor works

Page 22: Solving Big problems with Condor - II HPC Sysadmins Meeting

22

Management

[Hello, Dave]

Page 23: Solving Big problems with Condor - II HPC Sysadmins Meeting

23

Compute

* Hey! I’m a 64K one!

Page 24: Solving Big problems with Condor - II HPC Sysadmins Meeting

24 Condor Basics

Job list ClassAd

Page 25: Solving Big problems with Condor - II HPC Sysadmins Meeting

25

Resource list ClassAd

Page 26: Solving Big problems with Condor - II HPC Sysadmins Meeting

26

Matchmaking
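The matchmaking idea on slides 24 to 26 can be sketched as a toy two-way match: a job ad and a resource ad each carry requirements about the other side, and a pair is matched only when both sets of requirements hold. This is a simplified illustration in Python, not Condor's ClassAd language or its negotiator algorithm; all attribute names, values, and the greedy assignment strategy are assumptions made for the example.

```python
# Toy illustration of two-way, ClassAd-style matchmaking.
# Attribute names and values are made up for this example.

def matches(job, machine):
    """A pair matches only if each side's requirements accept the other."""
    return job["requirements"](machine) and machine["requirements"](job)

def matchmake(jobs, machines):
    """Greedily give each job the free matching machine it ranks highest."""
    free = list(machines)
    assignments = {}
    for job in jobs:
        candidates = [m for m in free if matches(job, m)]
        if not candidates:
            continue  # no match: the job stays idle in the queue
        best = max(candidates, key=job["rank"])
        assignments[job["name"]] = best["name"]
        free.remove(best)
    return assignments

# Resource list: what each machine offers and whom it accepts.
machines = [
    {"name": "node1", "memory_mb": 2048, "arch": "X86_64",
     "requirements": lambda job: True},
    {"name": "node2", "memory_mb": 8192, "arch": "X86_64",
     "requirements": lambda job: job["owner"] != "guest"},
]

# Job list: what each job needs and how it ranks acceptable machines.
jobs = [
    {"name": "job_a", "owner": "alice",
     "requirements": lambda m: m["memory_mb"] >= 4096,
     "rank": lambda m: m["memory_mb"]},
    {"name": "job_b", "owner": "guest",
     "requirements": lambda m: m["arch"] == "X86_64",
     "rank": lambda m: m["memory_mb"]},
]

result = matchmake(jobs, machines)
print(result)
```

Here `job_a` needs at least 4096 MB, so only `node2` qualifies; `job_b` is then left with `node1`, which accepts any owner.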

Page 27: Solving Big problems with Condor - II HPC Sysadmins Meeting

27 Condor Basics

Priority Management

Page 28: Solving Big problems with Condor - II HPC Sysadmins Meeting

28 Data

Transfer

Page 29: Solving Big problems with Condor - II HPC Sysadmins Meeting

29 Condor Basics

Job running

Page 30: Solving Big problems with Condor - II HPC Sysadmins Meeting

30

Job Monitoring

Page 31: Solving Big problems with Condor - II HPC Sysadmins Meeting

31

Job End

Page 32: Solving Big problems with Condor - II HPC Sysadmins Meeting

32

Example

Page 33: Solving Big problems with Condor - II HPC Sysadmins Meeting

33

Hello, World !!

#!/bin/sh
# I’m hola.sh
echo Hola mundo desde `hostname`

#
# A Hello World .. In Condor!
#
# I’m hello.sub
Universe = vanilla
Executable = hola.sh
Log = hola.log
Output = hola.out
Error = hola.err
Queue

Page 34: Solving Big problems with Condor - II HPC Sysadmins Meeting

34 Launching the computation

condor_submit

4. Condor Basics – An easy computation

Page 35: Solving Big problems with Condor - II HPC Sysadmins Meeting

35 Launching the computation

condor_q
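For readers who want to script the submit-and-check cycle from slides 33 to 35, here is a minimal sketch. It builds a submit description equivalent to `hello.sub` and wraps the `condor_submit` and `condor_q` commands. The helper names `build_submit_description` and `submit_and_list` are hypothetical, and the wrapper assumes the Condor command-line tools are on the `PATH`; when they are not, it does nothing.

```python
import shutil
import subprocess

def build_submit_description(executable, basename):
    """Return a vanilla-universe submit description like hello.sub."""
    lines = [
        "Universe   = vanilla",
        f"Executable = {executable}",
        f"Log        = {basename}.log",
        f"Output     = {basename}.out",
        f"Error      = {basename}.err",
        "Queue",
    ]
    return "\n".join(lines) + "\n"

def submit_and_list(submit_file):
    """Run condor_submit, then condor_q; return None if Condor is absent."""
    if shutil.which("condor_submit") is None or shutil.which("condor_q") is None:
        return None
    subprocess.run(["condor_submit", str(submit_file)], check=True)
    listing = subprocess.run(["condor_q"], check=True,
                             capture_output=True, text=True)
    return listing.stdout

description = build_submit_description("hola.sh", "hola")
print(description)
```

On a submit node, one would write `description` to `hello.sub` in the directory containing `hola.sh` and then call `submit_and_list("hello.sub")` to queue the job and print the current queue.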

Page 36: Solving Big problems with Condor - II HPC Sysadmins Meeting

36

HERMES

I3A HPC cluster

Page 37: Solving Big problems with Condor - II HPC Sysadmins Meeting

37 Condor Basics

1500 executing jobs, 40000 in queue … Lookin’ good

Page 38: Solving Big problems with Condor - II HPC Sysadmins Meeting

38

Condor Tweaks

Page 39: Solving Big problems with Condor - II HPC Sysadmins Meeting

39

Proprietary Resources

Page 40: Solving Big problems with Condor - II HPC Sysadmins Meeting

40

Dynamic Partitioning

Page 41: Solving Big problems with Condor - II HPC Sysadmins Meeting

41 Condor Basics

Long Jobs

Page 42: Solving Big problems with Condor - II HPC Sysadmins Meeting

42 Condor Basics

Short Jobs

Page 43: Solving Big problems with Condor - II HPC Sysadmins Meeting

43 Condor Basics

Big Jobs

Page 44: Solving Big problems with Condor - II HPC Sysadmins Meeting

44

Advanced Accounting

Page 45: Solving Big problems with Condor - II HPC Sysadmins Meeting

45

Dynamic Checkpointing

Page 46: Solving Big problems with Condor - II HPC Sysadmins Meeting

46

Condor_ssh

Interactive Access

Page 47: Solving Big problems with Condor - II HPC Sysadmins Meeting

47 Condor Basics GPU Integration

Page 48: Solving Big problems with Condor - II HPC Sysadmins Meeting

48

Extra Bonus

Future (always work in progress)

Page 49: Solving Big problems with Condor - II HPC Sysadmins Meeting

49

HA

Page 50: Solving Big problems with Condor - II HPC Sysadmins Meeting

50

Cgroups Isolation

Page 51: Solving Big problems with Condor - II HPC Sysadmins Meeting

51 Condor Basics

Hadoop Integration

Page 52: Solving Big problems with Condor - II HPC Sysadmins Meeting

52

Green Computing

Page 53: Solving Big problems with Condor - II HPC Sysadmins Meeting

53 Condor Basics

Nobody’s perfect …

Page 54: Solving Big problems with Condor - II HPC Sysadmins Meeting

54

No MPI + Dynamic Partitioning. Job backfilling. Complicated HA.

No MPI + Dynamic Partitioning (yet)

No slot-wise preemption

HA tough as nails

Page 55: Solving Big problems with Condor - II HPC Sysadmins Meeting

55 Condor Basics

Page 56: Solving Big problems with Condor - II HPC Sysadmins Meeting

56 Condor Basics

> Conclusions

Page 57: Solving Big problems with Condor - II HPC Sysadmins Meeting

57

Example

Page 58: Solving Big problems with Condor - II HPC Sysadmins Meeting

58

Antonio Sanz [email protected] @antoniosanzalc

Slides here: http://slideshare.net/ansanz

Fly like a bird with Condor. Powerful. Flexible. Free.

Page 59: Solving Big problems with Condor - II HPC Sysadmins Meeting

59

Extra Bonus

Page 60: Solving Big problems with Condor - II HPC Sysadmins Meeting

60

I3A & Condor

Page 61: Solving Big problems with Condor - II HPC Sysadmins Meeting

61

Alzheimer & Dementia Diagnosis

Page 62: Solving Big problems with Condor - II HPC Sysadmins Meeting

62

Tissue Modelling

Page 63: Solving Big problems with Condor - II HPC Sysadmins Meeting

63

Rare Diseases

Page 64: Solving Big problems with Condor - II HPC Sysadmins Meeting

64

Crash test simulations

Page 65: Solving Big problems with Condor - II HPC Sysadmins Meeting

65

Complete heart simulation

Page 66: Solving Big problems with Condor - II HPC Sysadmins Meeting

66 Communication Systems

Page 67: Solving Big problems with Condor - II HPC Sysadmins Meeting

67

Dynamic gaming AI

Page 68: Solving Big problems with Condor - II HPC Sysadmins Meeting

68

Autonomous robots

Page 69: Solving Big problems with Condor - II HPC Sysadmins Meeting

69

Antonio Sanz [email protected] @antoniosanzalc

Slides here: http://slideshare.net/ansanz

Fly like a bird with Condor. Powerful. Flexible. Free.