The Condor DB Group Report

Jiansheng Huang, Ameet Kini, Shrinivas Lakshmikant, Erik Paulson, Christine Reilly, Eric Robinson, Srinath Shankar, David DeWitt, Jeff Naughton

Jan 07, 2016

Transcript
Page 1: The Condor DB Group Report

The Condor DB Group Report

Jiansheng Huang, Ameet Kini, Shrinivas Lakshmikant, Erik Paulson, Christine Reilly, Eric Robinson, Srinath Shankar, David DeWitt, Jeff Naughton

Page 2: The Condor DB Group Report


Overview

General overview of group projects (Naughton).

Quill (Paulson).

Page 3: The Condor DB Group Report

Condor DB Group

Overall task:
  Focus on data management aspects of Condor
  Deliver prototypes of useful technology
  Explore, develop and evaluate technology that may be useful to Condor down the road

Page 4: The Condor DB Group Report

Projects other than Quill

Provenance in a Condor system.
Statistical mining of log data to evaluate system health.
Interaction of user data placement, caching, and workflow job scheduling.
Job-machine matching in DB context.
Condor functionality based on App-Server technology.
Recency and consistency in captured data.

Page 5: The Condor DB Group Report

Provenance and Condor

Christine Reilly ([email protected])
Provenance: information on how data was produced.
Observation: for each user job, Condor can record:
  Which version of program(s) was used;
  Which version of data was used;
  When it was produced;
  What system it ran on (hardware, software).
Questions:
  How much information should we gather?
  How much burden should we place on the system designer, application programmer, or both?

Page 6: The Condor DB Group Report

Debugging through log mining

Shrinivas Lakshmikant ([email protected])
Idea:
  Record “events,” logically associated with entities. E.g., job entities start, get scheduled, run, terminate.
  Find which entities have infrequent events.
  Find which entities lack frequent events.
Can you use this to detect problems?
  Early results suggest yes: finds and pinpoints problems that might not be found otherwise.
How can you increase the accuracy and efficiency over naïve approaches?
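The two "find" steps can be illustrated with a small Python sketch. It is not the group's method: the function name `find_anomalies`, the event names, and the 25%/75% thresholds are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def find_anomalies(events, rare=0.25, common=0.75):
    """Flag entities that have rare events or lack common ones.

    events: iterable of (entity_id, event_type) pairs.
    rare/common: hypothetical thresholds on the fraction of entities
    that show an event type at least once.
    """
    seen = defaultdict(set)
    for entity, kind in events:
        seen[entity].add(kind)

    total = len(seen)
    # Number of entities that show each event type at least once.
    support = Counter(kind for kinds in seen.values() for kind in kinds)
    rare_kinds = {k for k, c in support.items() if c / total <= rare}
    common_kinds = {k for k, c in support.items() if c / total >= common}

    unusual = {e: kinds & rare_kinds for e, kinds in seen.items() if kinds & rare_kinds}
    missing = {e: common_kinds - kinds for e, kinds in seen.items() if common_kinds - kinds}
    return unusual, missing


# Four normal jobs, plus one that hits a made-up error event and never terminates.
log = [(job, ev) for job in range(1, 5) for ev in ("start", "scheduled", "run", "terminate")]
log += [(5, "start"), (5, "scheduled"), (5, "run"), (5, "shadow_exception")]
unusual, missing = find_anomalies(log)
print(unusual)  # {5: {'shadow_exception'}}
print(missing)  # {5: {'terminate'}}
```

A naïve pass like this treats every event type independently; the accuracy and efficiency question on the slide is about doing better than that.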

Page 7: The Condor DB Group Report

Caching, Scheduling, Workflow

Srinath Shankar ([email protected])
Idea:
  Cache input files and intermediate files on disks of pool machines;
  Record where these files are cached;
  Schedule tasks in a workflow to minimize data fetches/moves.
Result: potentially much greater throughput.
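A minimal Python sketch of the three steps, assuming a greedy policy that sends each task to the machine needing the fewest bytes fetched. The names `pick_machine` and `run_workflow`, the file names, and the sizes are hypothetical, and the sketch ignores machine load and disk capacity.

```python
def pick_machine(task_inputs, cache, file_sizes):
    """Return the machine that would have to fetch the fewest bytes."""
    def bytes_to_fetch(machine):
        return sum(file_sizes[f] for f in task_inputs - cache[machine])
    # sorted() makes tie-breaking deterministic (alphabetical by machine name).
    return min(sorted(cache), key=bytes_to_fetch)


def run_workflow(tasks, cache, file_sizes):
    """Place tasks given in dependency order; update the cache record as we go.

    tasks: list of (task_name, input_files, output_files).
    cache: dict mapping machine name -> set of file names cached there.
    """
    placement = {}
    for name, inputs, outputs in tasks:
        machine = pick_machine(set(inputs), cache, file_sizes)
        placement[name] = machine
        # Record where inputs and intermediate outputs are now cached.
        cache[machine] |= set(inputs) | set(outputs)
    return placement


cache = {"m1": {"a.dat"}, "m2": set()}
sizes = {"a.dat": 100, "b.dat": 10, "c.dat": 50}
tasks = [("t1", ["a.dat"], ["b.dat"]), ("t2", ["b.dat"], ["c.dat"])]
placement = run_workflow(tasks, cache, sizes)
print(placement)  # {'t1': 'm1', 't2': 'm1'}
```

Here `t2` follows `t1` to `m1` because the intermediate file `b.dat` is already on that machine's disk.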

Page 8: The Condor DB Group Report

Job Matching in a DBMS

Ameet Kini ([email protected])
Idea: matching looks a lot like a DBMS join.
If machine and job data are already stored in a DBMS, can we or should we use the DBMS to do the matching?
Answer: early results are promising but this is a non-trivial problem.
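To see why matching resembles a join, here is a toy sketch using Python's built-in `sqlite3` module. The tables, columns, and values are invented for illustration, and the two fixed conditions stand in for requirements that in Condor are arbitrary ClassAd expressions, which is part of what makes the real problem non-trivial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE machines (name TEXT, arch TEXT, memory_mb INTEGER);
CREATE TABLE jobs (job_id INTEGER, req_arch TEXT, req_memory_mb INTEGER);
INSERT INTO machines VALUES ('m1', 'INTEL', 512);
INSERT INTO machines VALUES ('m2', 'INTEL', 2048);
INSERT INTO machines VALUES ('m3', 'SUN4u', 1024);
INSERT INTO jobs VALUES (1, 'INTEL', 1024);
INSERT INTO jobs VALUES (2, 'SUN4u', 256);
INSERT INTO jobs VALUES (3, 'INTEL', 4096);
""")

# Each job is paired with every machine that satisfies its requirements.
matches = conn.execute("""
SELECT j.job_id, m.name
FROM jobs AS j
JOIN machines AS m
  ON m.arch = j.req_arch AND m.memory_mb >= j.req_memory_mb
ORDER BY j.job_id, m.name
""").fetchall()
print(matches)  # [(1, 'm2'), (2, 'm3')]
```

Job 3 matches nothing because no machine in the toy data has 4096 MB of memory.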

Page 9: The Condor DB Group Report

Recency of Quill Data

Jiansheng Huang ([email protected])
Problem: daemons report in at uncontrollable and unpredictable times.
Result: out-of-date and inconsistent data set.
Can we provide the user with a concise characterization of the recency of the sources relevant to a user query?
Note: surprisingly non-trivial to define what we mean by “relevant” in this setting.
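One possible shape for such a characterization, sketched in Python. It sidesteps the hard part by assuming the set of relevant sources is already known; the function name `recency_summary`, the source names, and the timestamps are all hypothetical.

```python
def recency_summary(last_report, relevant):
    """Summarize how fresh the data behind a query is.

    last_report: dict mapping source name -> time of its last report
                 (any comparable, subtractable value, e.g. epoch seconds).
    relevant: names of the sources the query depends on (assumed known).
    """
    times = {source: last_report[source] for source in relevant}
    stalest = min(sorted(times), key=times.get)
    freshest = max(times.values())
    return {
        "complete_as_of": times[stalest],   # every relevant source has reported at least this recently
        "stalest_source": stalest,
        "skew": freshest - times[stalest],  # gap between oldest and newest last report
    }


last_report = {"schedd@north": 1000, "schedd@south": 940, "startd@vm1": 700}
summary = recency_summary(last_report, ["schedd@north", "schedd@south"])
print(summary)
# {'complete_as_of': 940, 'stalest_source': 'schedd@south', 'skew': 60}
```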

Page 10: The Condor DB Group Report

App. Servers and Condor

Eric Robinson ([email protected])
Idea: application servers provide a lot of technology that appears useful in a Condor setting.
Approach: build prototype of some Condor functionality using these tools, evaluate the approach.

Page 11: The Condor DB Group Report


Moving on…

Further questions on these projects? Best bet is to contact student listed on each slide.

On to Quill portion of talk.

Page 12: The Condor DB Group Report


The Condor Quill

“Give me a condor's quill! Give me Vesuvius' crater for an ink stand. Friends, hold my arms! For in the mere act of penning my thoughts of this Leviathan, they weary me. . . To produce a mighty book you must choose a mighty theme.”

-Melville, Moby Dick

The Quill Developers

Page 13: The Condor DB Group Report

What is Quill?

A non-invasive method of storing a read-only version of the Condor operational data in a relational database.

Page 14: The Condor DB Group Report

Quill: In pictures

[Figure: two panels, “Without Quill” and “With Quill”. Components shown: SchedD, job queue transaction log (job_queue.log), Disk, QuillD, DBMS.]

Page 15: The Condor DB Group Report

Quill: Where we’ve been

First shipped in 6.7.11 (Sept 05)
Now “over the fence” – Condor Team is driving the 6.8 version
Response from users very helpful!
Lessons learned
  Passive collection good
  DBMSes are full of surprises

Page 16: The Condor DB Group Report

Quill: Where we’d like to be

Shared databases
Better job data
Data from non-job sources
More than just PostgreSQL DBMS
Examples of usage

Page 17: The Condor DB Group Report

Quill in Condor 6.9.3

Development effort mostly complete
Previous bullet points addressed
Migration path for historical job data
Out of the box changes for Quill users:
  Horizontal and vertical schema for active jobs
  Jobs from multiple schedds in one database
  By default, no new historical data stored

Page 18: The Condor DB Group Report

Example tables

Horizontal Job Table

ScheddName         Cluster  Proc  Owner     JobStatus  JobPrio  Universe
north.cs.wisc.edu  23       2     epaulson  IDLE       10       Vanilla
north.cs.wisc.edu  23       3     epaulson  IDLE       10       Vanilla
south.cs.wisc.edu  13       2     jhuang    RUN        5        Grid
north.cs.wisc.edu  13       2     miron     HELD       30       Standard

Vertical Job Table

ScheddName         Cluster  Proc  Attr    Value
north.cs.wisc.edu  23       2     WantIO  TRUE
north.cs.wisc.edu  23       2     Group   Database
north.cs.wisc.edu  23       3     Group   Condor
south.cs.wisc.edu  13       2     Group   Condor
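A sketch of how the two layouts might be queried together, using Python's `sqlite3` module and the rows from this slide. The table names `jobs_horizontal` and `jobs_vertical` and the lowercase column names are assumptions for the example, not necessarily the names in the Quill schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs_horizontal (
    scheddname TEXT, cluster INTEGER, proc INTEGER,
    owner TEXT, jobstatus TEXT, jobprio INTEGER, universe TEXT)""")
conn.execute("""CREATE TABLE jobs_vertical (
    scheddname TEXT, cluster INTEGER, proc INTEGER, attr TEXT, value TEXT)""")
conn.executemany("INSERT INTO jobs_horizontal VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("north.cs.wisc.edu", 23, 2, "epaulson", "IDLE", 10, "Vanilla"),
    ("north.cs.wisc.edu", 23, 3, "epaulson", "IDLE", 10, "Vanilla"),
    ("south.cs.wisc.edu", 13, 2, "jhuang", "RUN", 5, "Grid"),
    ("north.cs.wisc.edu", 13, 2, "miron", "HELD", 30, "Standard"),
])
conn.executemany("INSERT INTO jobs_vertical VALUES (?, ?, ?, ?, ?)", [
    ("north.cs.wisc.edu", 23, 2, "WantIO", "TRUE"),
    ("north.cs.wisc.edu", 23, 2, "Group", "Database"),
    ("north.cs.wisc.edu", 23, 3, "Group", "Condor"),
    ("south.cs.wisc.edu", 13, 2, "Group", "Condor"),
])

# Fixed attributes come from the horizontal table; ad hoc ones from the vertical table.
condor_group_jobs = conn.execute("""
SELECT h.scheddname, h.cluster, h.proc, h.owner
FROM jobs_horizontal AS h
JOIN jobs_vertical AS v
  ON v.scheddname = h.scheddname AND v.cluster = h.cluster AND v.proc = h.proc
WHERE v.attr = 'Group' AND v.value = 'Condor'
ORDER BY h.scheddname, h.cluster, h.proc
""").fetchall()
print(condor_group_jobs)
# [('north.cs.wisc.edu', 23, 3, 'epaulson'), ('south.cs.wisc.edu', 13, 2, 'jhuang')]
```

Including the schedd name in the join key is what lets jobs from multiple schedds share one database without their cluster and proc numbers colliding.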

Page 19: The Condor DB Group Report

More job information

The lifecycle of the job would be nice to have
  Events like those in the “user log”
But, need more info than what’s in the job queue
Passive data collection works

Page 20: The Condor DB Group Report

Quill 6.9.3 diagram

[Figure: components shown are SchedD, job queue log, event log (new), QuillD, DBMS, and Disk.]

The schedd writes events to the new “Event” log; the Quill daemon passively picks up the events and inserts them into the database.

For the schedd, the event log contains userlog events and job history events.
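The passive pick-up can be pictured as a reader that remembers how far it has read in the log and inserts only what was appended since. The Python sketch below is an illustration under assumptions, not Quill's implementation: the tab-separated line format, the `events` table, and the function `ingest_new_events` are all hypothetical.

```python
import os
import sqlite3
import tempfile


def ingest_new_events(log_path, conn, offset):
    """Insert events appended since `offset`; return the new offset."""
    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()
    # Consume complete lines only; a half-written line waits for the next pass.
    end = data.rfind(b"\n") + 1
    rows = [line.decode().split("\t") for line in data[:end].splitlines() if line]
    conn.executemany("INSERT INTO events (job, event) VALUES (?, ?)", rows)
    conn.commit()
    return offset + end


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (job TEXT, event TEXT)")
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "event.log")
    with open(path, "w") as f:
        f.write("23.2\tSUBMIT\n23.2\tEXEC")  # the writer is mid-line
    offset = ingest_new_events(path, conn, 0)
    with open(path, "a") as f:
        f.write("UTE\n")                     # the writer finishes the line
    offset = ingest_new_events(path, conn, offset)

events = conn.execute("SELECT job, event FROM events ORDER BY rowid").fetchall()
print(events)  # [('23.2', 'SUBMIT'), ('23.2', 'EXECUTE')]
```

The writer never waits on the reader or the database, which is the point of passive collection.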

Page 21: The Condor DB Group Report


Examples

“Show me all the jobs that exited with a segfault that at some point ran on this machine”

“When my jobs get preempted, how long until they get matched again?”

“What is the average runtime for jobs for each different type of input file?” (SQL “GROUP BY”)
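The last question maps directly onto a `GROUP BY`. A minimal sketch with Python's `sqlite3` module follows; the `job_history` table, its columns, and the data are invented for the example and are not taken from the Quill schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_history (owner TEXT, input_type TEXT, runtime_sec INTEGER)")
conn.executemany("INSERT INTO job_history VALUES (?, ?, ?)", [
    ("epaulson", "fasta", 100),
    ("epaulson", "fasta", 200),
    ("jhuang", "hdf5", 40),
])

# One output row per input file type, with the mean runtime of its jobs.
avg_runtime = conn.execute("""
SELECT input_type, AVG(runtime_sec)
FROM job_history
GROUP BY input_type
ORDER BY input_type
""").fetchall()
print(avg_runtime)  # [('fasta', 150.0), ('hdf5', 40.0)]
```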

Page 22: The Condor DB Group Report

Collecting non-job information

[Figure: components shown are SchedD, StartD, Negotiator, event log (new), QuillD, DBMS, and Disk.]

Page 23: The Condor DB Group Report

New information stored

StartD: Machine status
Negotiator: Matches made
Starter/Shadow: Files transferred
Collector: “Submitter” ads
All daemons: Generic Events, daemon ads

Page 24: The Condor DB Group Report

The DBMSD

New daemon responsible for database housekeeping
  Only one needed per DBMS
Purges old data
  Three classes, independent thresholds
    Resource: Machine classads
    Run: matches, job log events
    Job: condor_history information
Estimates size of database
  “Soft quota”, warn when exceeded
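The housekeeping policy can be sketched in Python as below. Only the three class names and the soft-quota idea come from the slide; the retention periods, the row layout, and the function names `purge` and `over_soft_quota` are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical retention thresholds, one per data class named on the slide.
RETENTION = {
    "resource": timedelta(days=7),   # machine ClassAds
    "run": timedelta(days=30),       # matches, job log events
    "job": timedelta(days=365),      # condor_history information
}


def purge(rows, now):
    """Keep only rows no older than the threshold for their class.

    rows: list of (data_class, timestamp, size_bytes) tuples.
    """
    return [row for row in rows if now - row[1] <= RETENTION[row[0]]]


def over_soft_quota(rows, quota_bytes):
    """Return True when the estimated size exceeds the quota (warn, do not delete)."""
    return sum(size for _, _, size in rows) > quota_bytes


now = datetime(2007, 5, 1)
rows = [
    ("resource", now - timedelta(days=10), 500),  # older than 7 days: purged
    ("run", now - timedelta(days=10), 300),       # within 30 days: kept
    ("job", now - timedelta(days=400), 200),      # older than 365 days: purged
    ("job", now - timedelta(days=20), 800),       # within 365 days: kept
]
kept = purge(rows, now)
warn = over_soft_quota(kept, quota_bytes=1000)
print([row[0] for row in kept], warn)  # ['run', 'job'] True
```

Because the quota is soft, exceeding it only produces a warning; purging is driven by the age thresholds alone.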

Page 25: The Condor DB Group Report

Multiple DBMS systems

Oracle supported
  Appears to need less maintenance
A nearly unified schema
  Main difference is large text fields
Same binaries, DBMS type selectable via configuration file

Page 26: The Condor DB Group Report

Example Usage

PHP web front end
  Good enough for some people
  Or, use as the basis for your own system
BoF on Thursday at 11:00am
  We’ll use the web front end to explain the information Quill now stores