ACAT 2019, 14.03.2019: Design Pattern for Analysis Automation on Interchangeable, Distributed Resources using Luigi Analysis Workflows. Marcel Rieger, Martin Erdmann. law - luigi analysis workflow

Marcel Rieger, Martin Erdmann

Apr 16, 2022

Page 1: Marcel Rieger, Martin Erdmann

ACAT 2019

14.03.2019

Design Pattern for Analysis Automation on Interchangeable,

Distributed Resources using Luigi Analysis Workflows .

Marcel Rieger, Martin Erdmann

law - luigi analysis workflow

Page 2: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 2 Motivational questions

● Portability: Does the analysis depend on ...

■ where it runs?

■ where it stores data?

▻ Execution/storage should not dictate code design!

● Reproducibility: When an M.Sc. / PhD / Postdoc leaves, ...

■ can someone else run the analysis?

■ is there a loss of information? Is a new framework required?

▻ Dependencies often only exist in the physicist's head!

● Preservation: After an analysis is published ...

■ are people investing time to preserve their work?

■ can it be repeated after O(years)?

▻ Daily working environment should provide preservation features out-of-the-box!


Page 3: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 3 Landscape of HEP analyses

● Scale: measure of resource consumption and amount of data

● Complexity: measure of granularity and inhomogeneity of workloads

● Future analyses likely to be large and complex,

bottlenecks:

■ Undocumented structure & requirements between workloads, only exists in the physicist’s head

■ Bookkeeping of data, revisions, …

■ Manual execution/steering of jobs

■ Error-prone & time-consuming

→ Analysis workflow management essential for future measurements!

[Figure: scale vs. complexity. Regions: single machine, single command; good scripts & code structure; computing infrastructures (WLCG); analysis workflow management.]

Page 4: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 4 Abstraction: analysis workflows

● Workflow, decomposable into particular workloads

● Workloads related to each other by common interface

■ In/outputs define directed acyclic graph (DAG)

● Alter default behavior via parameters

● Computing resources

■ Run location (CPU, GPU, WLCG, …)

■ Storage location (local, dCache, EOS, …)

● Software environment

● Collaborative development and processing

● Reproducible intermediate and final results

[Figure: example workflow. Nodes: Selection, Reconstruction, MVA Split, MVA Training, MVA Evaluation, Inference. Labels: Weights, CPU, GPU.]

→ Reads like a checklist for analysis workflow management
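The first items of this checklist can be made concrete with a small sketch. The snippet below uses only Python's standard library (`graphlib`, Python 3.9+) to encode the example workloads as a DAG and to derive one valid execution order. The edges are an assumption read off the workload names, not taken from the figure on the slide.

```python
from graphlib import TopologicalSorter

# Each workload maps to the workloads whose outputs it consumes.
# NOTE: these edges are assumed for illustration only.
WORKFLOW = {
    "Selection": set(),
    "Reconstruction": {"Selection"},
    "MVASplit": {"Reconstruction"},
    "MVATraining": {"MVASplit"},
    "MVAEvaluation": {"MVASplit", "MVATraining"},
    "Inference": {"MVAEvaluation"},
}


def execution_order(workflow):
    """Return one valid order in which the workloads can run."""
    return list(TopologicalSorter(workflow).static_order())


order = execution_order(WORKFLOW)
print(" -> ".join(order))
```

A cycle in the inputs and outputs would make `static_order()` raise `graphlib.CycleError`, which is one way to see why the graph has to be acyclic.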

Page 5: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 5 Example: ttbb cross section measurement

[Figure: task graph of the ttbb cross section measurement (vispa). Each node is a named task, e.g. crab_mc, crab_data_merge, mc_nominal_merge, mu_tuple, mu_chain, mu_bdt_train__ttbb_vs_all__nominal, mu_bdt_eval, mu_syst_jesUp_bdt, mu_syst_jes_hist, mu_qcd_fit, mu_plot, e_tuple, e_bdt_eval, e_syst_jes_hist, e_plot, sfb_coefficients, pu_weight, th_combined.]

Robert Fischer


Page 7: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 6

● Python package for building complex pipelines

● Development started at Spotify, now open-source and community-driven

1. Workloads defined as Task classes

2. Tasks require other tasks & output Targets

3. Parameters customize tasks and

control behavior

● Web interface, error handling, command line tools, task history, collaborative features, …

● github.com/spotify/luigi

Building blocks

Page 8: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 7 Luigi in a nutshell

> python reco.py Reconstruction --dataset ttJets
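The code listing of this slide is not preserved in the transcript. As a rough stand-in, the three building blocks (task classes, requirements and output targets, parameters) can be imitated in plain Python. This is not the luigi API: real luigi tasks subclass `luigi.Task` and declare `luigi.Parameter` attributes, whereas the `Task` base class, the file names and the scratch directory below are hypothetical.

```python
import os
import tempfile

OUT_DIR = tempfile.mkdtemp()  # scratch directory for this demonstration


class Task:
    """Hypothetical base class imitating the three building blocks (not luigi)."""

    def __init__(self, dataset):
        self.dataset = dataset  # 3. a parameter customizes the task

    def requires(self):
        return []  # 2. tasks this task depends on (none by default)

    def output(self):
        # 2. the target (here: a local file path) this task produces
        name = f"{type(self).__name__}_{self.dataset}.txt"
        return os.path.join(OUT_DIR, name)

    def complete(self):
        # a task counts as done once its output exists
        return os.path.exists(self.output())


class Selection(Task):  # 1. a workload is a Task class
    def run(self):
        with open(self.output(), "w") as f:
            f.write(f"selected events of {self.dataset}\n")


class Reconstruction(Task):
    def requires(self):
        return [Selection(self.dataset)]  # the parameter is passed upstream

    def run(self):
        with open(self.requires()[0].output()) as f:
            selected = f.read()
        with open(self.output(), "w") as f:
            f.write("reconstructed from: " + selected)


reco = Reconstruction(dataset="ttJets")
for task in reco.requires() + [reco]:
    if not task.complete():
        task.run()
print(reco.output(), reco.complete())
```

The `dataset="ttJets"` argument plays the role of `--dataset ttJets` in the command above.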

Page 9: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 8 make-like execution system

● Luigi’s execution model is make-like

1. Create dependency tree for triggered task

2. Determine tasks to actually run:

- Walk through tree (top-down)

- For each path, stop when all output targets of a task exist

● Only processes what is really necessary

● Error handling & automatic re-scheduling

● Clear & scalable through simple structure

[Figure legend: triggered task, required task, dependency]
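The two steps of this execution model can be sketched in a few lines of plain Python. The function below is an illustration of the idea, not luigi's scheduler: `requires` maps each task name to the tasks it depends on, and `done` is the set of tasks whose outputs already exist. All names are hypothetical.

```python
def plan(triggered, requires, done):
    """Return the tasks that actually need to run, dependencies first."""
    order = []
    visited = set()

    def visit(name):
        if name in visited or name in done:
            return  # outputs exist: stop here, do not look further down
        visited.add(name)
        for dep in requires.get(name, []):
            visit(dep)
        order.append(name)  # dependencies first, then the task itself

    visit(triggered)
    return order


REQUIRES = {
    "Inference": ["MVAEvaluation"],
    "MVAEvaluation": ["MVATraining", "Reconstruction"],
    "MVATraining": ["Reconstruction"],
    "Reconstruction": ["Selection"],
    "Selection": [],
}

# Reconstruction outputs exist, so Selection is never even inspected.
print(plan("Inference", REQUIRES, done={"Reconstruction"}))
```

Because the walk stops at the first complete task on each path, an upstream task with missing outputs is not re-run as long as everything that depends on it is already complete.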

Page 10: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 9 Example trees


Work of a B.Sc. student after 2 weeks ❗

Page 12: Marcel Rieger, Martin Erdmann

law - luigi analysis workflow

Page 13: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 11 law - luigi analysis workflow

● law: layer on top of luigi (i.e. it does not replace luigi)

● Software design follows 2 primary goals:

1. Scalability on HEP infrastructure (but not limited to it)

2. Decoupling of run locations, storage locations & software environments

▻ No fixation on dedicated resources

▻ All components interchangeable

● Provides a toolbox to follow an analysis design pattern

■ No constraint on language or data structures

→ Not a framework!

[Figure: law - luigi analysis workflow. Labels: Analysis, Run location, Storage location, Software environment.]

Page 14: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 12 law features (1)

1. Job submission

■ Idea: submission built into tasks, no need to write extra code

■ Currently supported job systems: HTCondor, LSF, gLite, ARC, (CRAB)

▻ Backend not hard-coded, selectable at runtime

■ Mandatory features

▻ Automatic resubmission, dashboard interface

■ From the htcondor_at_cern example:

lxplus129:law_test > law run CreateChars --version v1 --poll-interval 0.5 --workflow htcondor
INFO: [pid 30564] Worker Worker(host=lxplus129.cern.ch, username=mrieger) running CreateChars(branch=-1, start_branch=0, end_branch=26, version=v1)
going to submit 26 htcondor job(s)
submitted 1/26 job(s)
submitted 26/26 job(s)
14:35:40: all: 26, pending: 26 (+26), running: 0 (+0), finished: 0 (+0), retry: 0 (+0), failed: 0 (+0)
...
14:37:10: all: 26, pending: 0 (+0), running: 26 (+26), finished: 0 (+0), retry: 0 (+0), failed: 0 (+0)
14:37:40: all: 26, pending: 0 (+0), running: 10 (-16), finished: 16 (+16), retry: 0 (+0), failed: 0 (+0)
14:38:10: all: 26, pending: 0 (+0), running: 0 (+0), finished: 26 (+10), retry: 0 (+0), failed: 0 (+0)
INFO: [pid 30564] Worker Worker(host=lxplus129.cern.ch, username=mrieger) done!

lxplus129:law_test >
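Two of the ideas on this slide, a backend that is selected by name at runtime and automatic resubmission, can be sketched without any batch system. Everything below is hypothetical and synchronous: the "backend" is a function that fails at random, whereas a real submission is asynchronous and polled, as in the log above. None of the names are taken from law.

```python
import random


def fake_htcondor_submit(job_id, rng):
    """Hypothetical stand-in for a real job; fails at random."""
    return rng.random() > 0.3


# Registry of backends, looked up by name at runtime.
BACKENDS = {"htcondor": fake_htcondor_submit}


def run_jobs(n_jobs, workflow, retries=5, seed=0):
    submit = BACKENDS[workflow]  # backend chosen at runtime, not hard-coded
    rng = random.Random(seed)
    status = {"finished": 0, "retry": 0, "failed": 0}
    for job_id in range(n_jobs):
        for attempt in range(retries + 1):
            if submit(job_id, rng):
                status["finished"] += 1
                break
            if attempt < retries:
                status["retry"] += 1  # automatic resubmission
        else:
            status["failed"] += 1  # all attempts used up
    return status


print(run_jobs(26, "htcondor"))
```

Supporting another job system in this sketch means adding one entry to `BACKENDS`; the calling code does not change.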

Page 15: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 13 law features (2)

2. Remote targets

■ Idea: work with remote files as if they were local

■ Remote targets built on top of GFAL2 Python bindings

▻ Supports all WLCG protocols (dCache, XRootD, GridFTP, SRM, ...) + DropBox

▻ API identical to local targets

■ Mandatory features

▻ Automatic retries, local caching

■ Example: working with files on EOS


“FileSystem” configuration

● Base path prefixed to all paths using this “fs”

● Configurable per file operation (stat, listdir, ...)

● Protected against removal of directories above
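Two of the "FileSystem" properties listed here, the base path prefix and the protection against removing directories above it, can be illustrated with a small path-only sketch. The class, its method names and the example base path are hypothetical and do not reproduce law's configuration or API; nothing is actually deleted.

```python
import posixpath


class BaseFileSystem:
    """Hypothetical file system wrapper; not law's actual class."""

    def __init__(self, base):
        self.base = posixpath.normpath(base)

    def abspath(self, path):
        # Prefix the base to every path handled by this file system.
        return posixpath.normpath(posixpath.join(self.base, path.lstrip("/")))

    def check_removable(self, path):
        # Refuse to remove the base itself or anything above/outside it.
        full = self.abspath(path)
        if full == self.base or not full.startswith(self.base + "/"):
            raise ValueError(f"refusing to remove {full}")
        return full


fs = BaseFileSystem("/eos/user/x/example")  # made-up base path
print(fs.abspath("data/a.json"))
print(fs.check_removable("data"))
```

A call such as `fs.check_removable("../..")` raises `ValueError`, because the normalized path ends up above the base.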

Code examples shown step by step on this slide (listings not preserved in the transcript):

● Reading remote files (json)

● Conveniently reading remote files (json)

● Conveniently reading remote files

Page 21: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 14 law features (3)

3. Environment sandboxing

■ Diverging software requirements between typical workloads are a great feature / challenge / problem

■ Introduce sandboxing:

▻ Run entire task in different environment

■ Existing sandbox implementations:

▻ Sub-shell with init file

▻ Docker images

▻ Singularity images

[Figure: tasks running in different sandboxes, labeled docker::imgA, docker::imgB, shell::myEnv.sh and singularity::imgC]
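The sandbox labels on this slide follow a `type::argument` pattern. A minimal sketch of how such a key could be turned into a wrapping command is shown below. The command lines are assumptions about how each environment might be entered, not law's implementation, and the function only builds the argument list without executing anything.

```python
import shlex


def sandbox_command(sandbox, cmd):
    """Wrap `cmd` so that it would run inside the requested sandbox."""
    kind, _, arg = sandbox.partition("::")
    if not arg:
        raise ValueError(f"expected 'type::argument', got {sandbox!r}")
    if kind == "shell":
        # Sub-shell that first sources an init file.
        return ["bash", "-c", f"source {shlex.quote(arg)} && {cmd}"]
    if kind == "docker":
        return ["docker", "run", "--rm", arg, "bash", "-c", cmd]
    if kind == "singularity":
        return ["singularity", "exec", arg, "bash", "-c", cmd]
    raise ValueError(f"unknown sandbox type: {kind}")


for key in ["docker::imgA", "shell::myEnv.sh", "singularity::imgC"]:
    print(sandbox_command(key, "python reco.py"))
```

Keeping the sandbox as a plain string means the same task code can be pointed at a different environment without touching its logic.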

Page 22: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 15 law in action

> python reco.py Reconstruction --dataset ttJets

> law run Reconstruction --dataset ttJets

> law run Reconstruction --dataset ttJets --workflow htcondor

☐ luigi task ☐ law task ☐ Run on HTCondor ☐ Store on EOS ☐ Run in docker

Page 27: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 16 Successful applications

● ttH analysis at CMS (JHEP 03 (2019) 026)

■ Large-scale:

▻ ∼100 TB of storage, ∼500k tasks

■ Complex:

▻ DNNs/BDTs/MEM

▻ ∼80 systematic variations

■ Distributed:

▻ 7 CEs, (GPU) clusters, local machines

▻ 2 SEs (dCache), local disk, Dropbox, CERNBox

■ Clear separation of duties within group

■ Entire analysis operable by everyone at any time

● DeepCSV + DeepJet b-tagging scale factors at CMS

● Multiple theses

[Figure: page from one of the theses, reproduced below]

2.2 The tt̄H Process at Hadron Colliders

Under the assumption that the Higgs boson decay occurs perpendicular to the direction of its motion, the spatial angle between the two jets in the observer's reference frame at typical momenta of p_H = 100 GeV amounts to φ ≈ 103°. Therefore, one can estimate that in the majority of cases the two jets exhibit a sufficiently large spatial separation, allowing for their resolved measurement and identification based on features of displaced secondary vertices.

Signal processes in which the Higgs boson decays into particles other than a pair of bottom quarks, such as H → W⁺W⁻ and H → τ⁺τ⁻, are taken into account in the following. Despite their minor expected yield due to smaller branching ratios and different final-state signature, events of those processes can potentially pass phase space selection criteria and contribute to the total number of tt̄H signal events.

Moreover, the decay of the tt̄ system is considered in the single-lepton and dilepton decay channels (cf. Section 2.2.3). A corresponding leading-order Feynman diagram is presented in Fig. 2.9a. It should be noted that more diagrams exist to describe tt̄H production. An example is the production of a pair of top quarks (cf. Section 2.2.3) where one top quark emits a Higgs boson. Similarly to generic tt̄ production, gluon-initiated processes have the largest contribution to the total tt̄H cross section at √s = 13 TeV [33].

[Feynman diagrams (a) and (b) not preserved]

Figure 2.9: Feynman diagrams showing tt̄H (H → bb̄) (a) and tt̄+bb̄ production (b) in the single-lepton and dilepton tt̄ decay channels. Their final state is identical and despite different spin and color charge relations, the event topology is quite similar. It should be noted that more possible diagrams exist for both processes.

In total, the measurable final state consists of six jets and an isolated lepton in the single-lepton, and four jets and two isolated leptons with opposite charge in the dilepton channel, respectively. In both cases, four jets are supposed to originate from b-hadron decays and a significant amount of missing transverse energy is expected due to the non-detectable neutrinos. Given the high combinatorial complexity due to the number of jets and the typical detector resolution of jet observables, the full reconstruction of the event is rather challenging. The net cross section is

\sigma_{t\bar{t}H,\,b\bar{b},\,\mathrm{SL+DL}} = \sigma_{t\bar{t}H} \cdot \mathrm{BR}_{H \to b\bar{b}} \cdot \mathrm{BR}_{t\bar{t},\,\mathrm{SL+DL}} = 98.4^{+6.9}_{-9.9}\ \mathrm{fb}, \qquad (2.50)

which corresponds to ∼ 3500 produced events in the dataset recorded by the CMS detector in 2016 with an integrated luminosity of 35.9 fb⁻¹.

Page 28: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 17 Summary

● HEP analyses likely to increase in scale and complexity

■ Analysis workflow management essential

■ Need for toolbox providing a design pattern, not a framework

● Luigi is able to model even complex workflows

● Law adds convenience & scalability in the HEP context

● All information transparently encoded in tasks, targets & dependencies

● Aim for out-of-the-box preservation

● github.com/riga/law, law.readthedocs.io

[Figure: law - luigi analysis workflow, with run location, storage location and software environment. Labels: WLCG, Singularity.]

Page 29: Marcel Rieger, Martin Erdmann

Backup

Page 30: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 19 Links

● law - luigi analysis workflow

■ Repository ☞ github.com/riga/law

■ Paper ☞ arXiv:1706.00955 (CHEP16 proceedings)

■ Documentation ☞ law.readthedocs.io (in preparation)

■ Minimal example ☞ github.com/riga/law/tree/master/examples/loremipsum

■ HTCondor example ☞ github.com/riga/law/tree/master/examples/htcondor_at_cern

■ Contact ☞ Marcel Rieger

● luigi - Powerful Python pipelining package (by Spotify)

■ Repository ☞ github.com/spotify/luigi

■ Documentation ☞ luigi.readthedocs.io

■ “Hello world!” ☞ github.com/spotify/luigi/blob/master/examples/hello_world.py

● Technologies

■ GFAL2 ☞ dmc.web.cern.ch/projects/gfal-2/home

■ Docker ☞ docker.com

■ Singularity ☞ singularity.lbl.gov

Page 31: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 20 order: structure external HEP data

● Pythonic class collection to order “soft”, external HEP data

■ physics processes & cross sections

■ campaigns & datasets

■ channels & categories

■ variables & systematics

● Some data could be centrally managed, some is analysis specific

● Run the example:

● Use as data backend: > law run Reconstruction --dataset ttH125_bb --...

github.com/riga/order

Page 32: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 21 Thoughts on HEP analyses

● What is a framework?

→ Bash scripts, python tools, crab configs, CMSSW modules, magic

→ Connections mostly exist in the physicist's head

● Documentation?

→ Not the most beloved hobby in the physics community

● When an M.Sc. / PhD / Postdoc leaves ...

→ Can someone else run the analysis?

→ Is this information lost? Is a new framework required?

● Does execution dictate code design?

→ Does the analysis depend on where it runs?

● From my experience: ⅔ of time required for technicalities, ⅓ for physics

→ Physics output doubled if it was the other way round?

Page 33: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 22 Existing WMS: MC production

Generator Showering Simulation Digitization Reco

Tailored systems

● Structure known in advance

● Workflows static & recurring

● One-dimensional design

● Special infrastructures

● Homogeneous software requirements

Wishlist for end-user analyses

● Structure “iterative”, a-priori unknown

● Dynamic workflows, fast R&D cycles

● Tree design, arbitrary dependencies

● Incorporate existing infrastructure

● Use custom software, everywhere

→ Requirements for HEP analyses mostly orthogonal

Page 35: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 23 WMS comparison

→ Existing WMS highly specialized for designated use case

→ Requirements for HEP analyses mostly orthogonal

Comparison: existing WMS (e.g. MC management) vs. generic analysis WMS

● Development process

■ Existing WMS: final objective known in advance

■ Generic analysis WMS: iterative, final composition a priori unknown

● Workflow structure

■ Existing WMS: chain structure, mostly one-dimensional

■ Generic analysis WMS: tree structure, arbitrarily branched

● Evolution

■ Existing WMS: static over time, recurrent execution

■ Generic analysis WMS: dynamic, fast R&D cycles

● Infrastructure

■ Existing WMS: specially tailored, e.g. storage systems, DBs

■ Generic analysis WMS: incorporate existing, quickly adapt to changes

● Applicability

■ Existing WMS: tuned to particular use case

■ Generic analysis WMS: flexible, able to model every possible workflow

Page 36: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 24 Achievements

1. Toolbox providing building blocks for analyses

→ Design pattern, not a framework (no constraint on language or data structure)

→ Full decoupling of run locations, storage locations and software environments

2. All information transparently encoded in tasks, targets & dependencies

→ Results reproducible by developer, groups, collaboration, ...

→ Analysis preservation out-of-the-box

3. make-like execution across distributed resources

→ Reduces overhead of manual management

→ Improves cycle times & reduces error-proneness

→ Changed paradigm from executing to defining an analysis

→ Move focus back to physics

Page 37: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 25 A typical example: ML workflow with uncertainties

[Figure, built up over four slides: workflow with nodes Reconstruction, MVA Split, MVA Training, MVA Evaluation, Inference. Labels: train, test, evaluate, weights. Inputs added step by step:]

● Nominal MC

● Data: real data

● MC, Syst. I: MC with systematic derived from nominal sample

● MC, Syst. II: MC with systematic generated from new events

Page 41: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 26 luigi/law architecture

[Figure: luigi/law architecture. Components: User, Command-line Interface, Central Scheduler, Analysis & Task Classes, Task Tree (Workers), Input / Output Targets, Workers, Software & Images. Domains: Local, Network, Remote. Interactions: load dependencies, register tasks, next task?, read, write, load, submit as job, poll status.]

Page 42: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 27 Local caching

Scenario A: file not cached yet

[Figure: local machine with law/python process (PWD) and local cache (/tmp); remote storage (e.g. eos); arrows for local and remote requests]

1. Need to access file “a.root” (has unique, path-dep. hash X)

2. File “a.root” with hash X in cache with latest mtime? → no

3. Stat file “a.root”

4. Download “a.root”

5. Return local path in cache

6. Store “a.root” using hash X

7. Work with local file

8. Change mtime of file to value from stat (see step 3)

Page 43: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 28 Local caching

Scenario B: file already cached

[Figure: local machine with law/python process (PWD) and local cache (/tmp); remote storage (e.g. eos); arrows for local and remote requests]

1. Need to access file “a.root” (has unique, path-dep. hash X)

2. File “a.root” with hash X in cache with latest mtime? → yes

3. Stat file “a.root”

4. Return local path in cache

5. Work with local file
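Both caching scenarios can be reproduced with a small standard-library sketch. The "remote storage" here is just a second local directory and the "download" is a file copy, so the class below is a hypothetical illustration of the mtime-based check, not law's cache implementation.

```python
import hashlib
import os
import shutil
import tempfile


class CachedRemote:
    """Hypothetical cache in front of a 'remote' that is a local directory."""

    def __init__(self, remote_dir, cache_dir):
        self.remote_dir = remote_dir
        self.cache_dir = cache_dir
        self.downloads = 0  # counts simulated transfers

    def localize(self, path):
        remote = os.path.join(self.remote_dir, path)
        # Unique, path-dependent hash used as the cache file name.
        key = hashlib.sha1(path.encode()).hexdigest()
        cached = os.path.join(self.cache_dir, key)
        remote_mtime = os.stat(remote).st_mtime_ns  # "stat" the remote file
        up_to_date = (
            os.path.exists(cached)
            and os.stat(cached).st_mtime_ns == remote_mtime
        )
        if not up_to_date:
            shutil.copyfile(remote, cached)  # "download"
            # Adopt the remote mtime so the next check can compare them.
            os.utime(cached, ns=(remote_mtime, remote_mtime))
            self.downloads += 1
        return cached  # local path in the cache


remote_dir = tempfile.mkdtemp()  # plays the role of the remote storage
cache_dir = tempfile.mkdtemp()   # plays the role of the local cache
with open(os.path.join(remote_dir, "a.root"), "w") as f:
    f.write("payload")

cache = CachedRemote(remote_dir, cache_dir)
first = cache.localize("a.root")   # scenario A: not cached yet
second = cache.localize("a.root")  # scenario B: cached, mtime matches
print(first == second, cache.downloads)
```

Comparing modification times keeps the check cheap: a single stat call on the remote file decides whether a transfer is needed.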

Page 44: Marcel Rieger, Martin Erdmann

Marcel Rieger - 14.3.19 29 CLI