
Online Teaching - Washington University in St. Louis · lu/cse520s/slides/parallel.pdf · Online Teaching: Lectures are delivered live over Zoom at class time. Also recorded for offline

Mar 06, 2021


dariahiddleston

Demo II

• In class on 4/6 and 4/8
• 12 min per team
  ◦ 10-min presentation + 2-min Q&A
• Substantial progress towards final demo
• Submit on Canvas before the class of your presentation:
  ◦ Slides
  ◦ Video as backup


Parallel Real-Time Systems for Latency-Critical Applications

Chenyang Lu
CSE 520S


Cyber-Physical Systems (CPS)


• CPS are built from, and depend upon, the seamless integration of computational algorithms and physical components. [NSF]
• Since the application interacts with the physical world, its computation must be completed under a time constraint.

[Figure: Real-Time Hybrid Simulation (RTHS) spanning the cyber-physical boundary; Robert L. and Terry L. Bowen Large Scale Structures Laboratory at Purdue University]


Parallelism Improves RTHS Accuracy


An RTHS simulates a nine-story building with a first-story damper.
• Previously, sequential processing limited the simulation rate to 575 Hz.
• Parallel execution now allows a rate of 3000 Hz.


• Reduced error in acceleration and displacement
• Parallelism increases accuracy via faster actuation and sensing

[Plot: normalized error (%) vs. time (sec), sequential (575 Hz) vs. parallel (3000 Hz)]


Interactive Cloud Services (ICS)

Services need to respond within 100 ms for users to find them responsive.*

[Figure: "Search the web" pipeline with stages Query, Doc. index search, 2nd phase ranking, Snippet generator, Response]

* Jeff Dean et al. (Google), "The tail at scale," Communications of the ACM 56.2 (2013)


E.g., web search, online gaming, and stock trading.


Real-Time Systems

The performance of these systems depends not only on their functional aspects but also on their temporal aspects.

Real-time performance:

1) Provide hard guarantees that jobs meet their deadlines (e.g., CPS)
2) Optimize latency-related objectives for jobs (e.g., ICS)

[Figure: jobs (Job 1, Job 2, Job 3) scheduled on the cores of a single multi-core machine]


New Generation of Real-Time Systems

Characteristics:

• New classes of applications with complex functionalities
• Increasing computational demand of each application
• Consolidation of multiple applications onto a shared platform
• Rapid increase in the number of cores per chip

Demand: leverage parallelism within applications to improve real-time performance and system efficiency.

[Figure: jobs scheduled on the cores of a single multi-core machine]


State of the Art

• Real-time systems
  ◦ Schedule multiple sequential jobs on a single core
  ◦ Schedule multiple sequential jobs on multiple cores
• Parallel runtime systems
  ◦ Schedule a single parallel job
  ◦ Schedule multiple parallel jobs to optimize fairness or throughput
• New: parallel real-time systems for latency-critical applications


Challenges for Parallel Real-Time Systems


Develop provably good and practically efficient real-time systems for parallel applications.

• Theory: how to provide real-time performance for multiple parallel jobs?
• Systems: how to build parallel real-time systems that are efficient and scalable?


Parallel Job – Directed Acyclic Graph (DAG)


A DAG naturally captures programs written in parallel languages such as Cilk Plus, Threading Building Blocks, and OpenMP.

• Node: sequential computation
• Edge: dependence between nodes
• Work C_i: execution time on one core
• Span (critical-path length) L_i: execution time on ∞ cores

[Figure: example DAG with work C_i = 18 and span L_i = 9]
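A minimal Python sketch of how the two DAG quantities are computed: work is the sum of all node execution times, and span is the longest weighted path, found in one pass over the nodes in topological order. The node names and weights are a made-up fork-join DAG whose totals happen to match the slide's C_i = 18 and L_i = 9; it is not the DAG drawn in the figure.

```python
from graphlib import TopologicalSorter


def work_and_span(weights, preds):
    """Return (work, span) of a DAG.

    weights: node -> execution time of that sequential node
    preds:   node -> set of nodes it depends on (incoming edges)
    """
    work = sum(weights.values())  # C: execution time on one core
    finish = {}                   # earliest finish time on infinitely many cores
    for v in TopologicalSorter(preds).static_order():
        start = max((finish[u] for u in preds.get(v, ())), default=0)
        finish[v] = start + weights[v]
    span = max(finish.values())   # L: length of the longest (critical) path
    return work, span


# Hypothetical fork-join DAG: a -> {b, c, d, f} -> e.
weights = {"a": 2, "b": 5, "c": 4, "d": 3, "f": 2, "e": 2}
preds = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"a"}, "f": {"a"},
         "e": {"b", "c", "d", "f"}}
print(work_and_span(weights, preds))  # (18, 9)
```

The span here is the path a → b → e (2 + 5 + 2), so no number of cores can finish this job in less than 9 time units.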


Parallel Real-Time Task Model

A task periodically releases DAG jobs with deadlines.

[Figure: Task 1 periodically releases Job 1 and Job 2, each with deadline D_i = period = 12]


Task parameters: deadline D_i = period, worst-case span L_i, worst-case work C_i.


Multiple tasks are scheduled on a multi-core system.

Goal of the system: guarantee that all tasks meet all their deadlines.

[Figure: Task 1 (D_i = 12) and Task 2 (D_i = 9), each periodically releasing DAG jobs]


Federated Scheduling

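For parallel tasks, federated scheduling (FS) has the best bound in terms of schedulability.

• FS assigns n_i dedicated cores to each parallel task.
• n_i is the minimum number of cores the task needs to meet its deadline:

$$n_i = \left\lceil \frac{C_i - L_i}{D_i - L_i} \right\rceil$$

where D_i is the deadline (= period), L_i the worst-case span, and C_i the worst-case work.

[Figure: tasks, each assigned its own set of dedicated cores]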
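The core count can be sketched in Python. It follows from the standard bound for greedy schedulers: a job with work C and span L completes within (C − L)/n + L time on n dedicated cores, and requiring this to be at most D and solving for n gives the formula. The DagTask type, the helper names, and the second task are hypothetical. In the original federated scheduling analysis, the dedicated-core formula applies to high-utilization tasks (C_i > D_i), while low-utilization tasks share the remaining cores and run sequentially; this sketch follows the slide's simplification and gives every task at least one dedicated core.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class DagTask:
    work: int      # worst-case work C_i
    span: int      # worst-case span L_i
    deadline: int  # relative deadline D_i (= period)


def fs_cores(task):
    """Dedicated cores n_i = ceil((C_i - L_i) / (D_i - L_i)), at least 1."""
    if task.span >= task.deadline:
        # This sketch requires L_i < D_i; otherwise the formula is undefined.
        raise ValueError("span must be shorter than the deadline")
    ratio = Fraction(task.work - task.span, task.deadline - task.span)
    return max(1, math.ceil(ratio))


def greedy_bound(task, cores):
    """Response-time bound (C - L)/n + L of a greedy schedule on n cores."""
    return Fraction(task.work - task.span, cores) + task.span


def fs_schedulable(tasks, m):
    """Each task gets its own dedicated cores; together they must fit on m."""
    return sum(fs_cores(t) for t in tasks) <= m


tasks = [DagTask(work=18, span=9, deadline=12),  # the slide's C_i, L_i, D_i
         DagTask(work=10, span=4, deadline=6)]   # hypothetical second task
print([fs_cores(t) for t in tasks])              # [3, 3]
print(greedy_bound(tasks[0], 3))                 # 12
print(fs_schedulable(tasks, 16), fs_schedulable(tasks, 5))  # True False
```

For the first task, 3 cores give a bound of 9/3 + 9 = 12, exactly its deadline; 2 cores would give 13.5 and could miss it. Exact fractions avoid floating-point rounding inside the ceiling.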


Empirical Comparison

FS platform:
• Middleware platform providing FS service in Linux
• Works with the GNU OpenMP runtime system
• Runs OpenMP programs with minimal modification

Compared with our Global Earliest Deadline First (GEDF) platform.

Experimental setup:
• Linux kernel 3.10.5 with the LITMUS^RT patch
• 16-core machine with 2 Intel Xeon E5-2687W processors
• GCC version 4.6.3 with OpenMP
• Each data point has 100 task sets
• Each task is randomly generated with parallel for-loops


[Plot: fraction of task sets missing deadlines vs. normalized system utilization (0.2 to 0.8) for GEDF and FS, same experimental setup; higher utilization is harder to schedule, and a lower fraction is better performance]

Normalized system utilization = $$\frac{1}{m}\sum_i \frac{C_i}{D_i}$$, where m is the number of cores.
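A short sketch of the normalized system utilization used on the x-axis, computed exactly with fractions. The task parameters are hypothetical; m = 16 matches the experimental machine.

```python
from fractions import Fraction


def normalized_utilization(tasks, m):
    """(1/m) * sum_i C_i / D_i, with each task given as a (C_i, D_i) pair."""
    return sum(Fraction(c, d) for c, d in tasks) / m


# Hypothetical task set on a 16-core machine: 18/12 + 10/6 = 19/6 cores' worth.
u = normalized_utilization([(18, 12), (10, 6)], m=16)
print(u)  # 19/96
```

A value of 1 would mean the task set demands every cycle of all m cores; the plotted range stops well below that.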

[Plot: the same comparison, annotated: 52% of task sets become schedulable under FS]


Summary of Federated Scheduling

For parallel real-time systems that guarantee deadlines are met, federated scheduling has:
• the best theoretical bound in terms of schedulability
• better empirical performance than GEDF

RTHS has used the FS platform to improve system performance.


The End?


Issue with the Classic System Model

The classic system model uses the worst-case work for analysis, to guarantee that all tasks meet all deadlines in all cases.

The worst-case work is significantly larger than the average work → the average system utilization is very low in practice.

[Figure: a job on cores 1-3 over a 0-40 ms window; its work is 100 ms in very rare cases and 10 ms in most cases]
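The efficiency loss can be quantified with the same core formula that federated scheduling uses. This is a sketch under stated assumptions: the deadline is taken as the 40 ms window in the figure, and the 10 ms span is hypothetical because the slide does not give one.

```python
import math
from fractions import Fraction


def fs_cores(work, span, deadline):
    """n = ceil((C - L) / (D - L)), at least one core; requires L < D."""
    if span >= deadline:
        raise ValueError("span must be shorter than the deadline")
    return max(1, math.ceil(Fraction(work - span, deadline - span)))


DEADLINE = 40      # ms, the time window shown in the figure
SPAN = 10          # ms, hypothetical: the slide gives no span
WORST_WORK = 100   # ms, very rare cases
TYPICAL_WORK = 10  # ms, most cases

reserved = fs_cores(WORST_WORK, SPAN, DEADLINE)  # sized for the worst case
needed = fs_cores(TYPICAL_WORK, SPAN, DEADLINE)  # enough in the typical case
typical_util = Fraction(TYPICAL_WORK, reserved * DEADLINE)
print(reserved, needed, typical_util)  # 3 1 1/12
```

Under these assumptions the worst case forces three dedicated cores, yet in most cases the job uses only about 8% of the reserved core time.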


Mixed-Criticality in Cars

Features with different criticality levels:
• Safety-critical features
• Infotainment features

[Figure: display system with car navigation and infotainment]


Toy Example of a Mixed-Criticality (MC) System

• High-criticality task: deadline 40 ms
• Low-criticality task: deadline 40 ms

[Figure: the low-criticality task has most-case work 80 ms (cores 1-2, 0-40 ms); the high-criticality task (cores 1-3, 0-40 ms) has worst-case work 100 ms in very rare cases and most-case work 10 ms in most cases]


Most-Case vs. Worst-Case Scenarios

Single-criticality systems need to model the worst-case scenario.

[Figure: cores 1-5 over 0-40 ms, in very rare cases (work of 100 ms and 80 ms) and in most cases (work of 10 ms and 80 ms)]


MC Model Improves Resource Efficiency

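Mixed-criticality systems provide different levels of real-time guarantees:
• Very rare cases: guarantee only that high-criticality tasks meet their deadlines.
• Most cases: guarantee that both high- and low-criticality tasks meet their deadlines.

[Figure: cores 1-3; in most cases (0-40 ms) the high-criticality task does 10 ms of work and the low-criticality task 80 ms; in a rare later period (400-440 ms) the high-criticality task overruns to 100 ms]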


MCFS (Mixed-Criticality Federated Scheduling) Algorithm at a High Level


For each parallel task, calculate and assign:
(0) a virtual deadline
(1) dedicated cores in the typical state
(2) dedicated cores in the critical state

If a job has not completed by its virtual deadline, it transitions to the critical state.

[Figure: allocation of m cores to high- and low-criticality tasks in the typical state (most cases) and in the critical state (rare case), separated by the virtual deadline]

MCFS jointly assigns virtual deadlines and cores to maximize utilization while guaranteeing task deadlines.
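A toy model of the two-state core allocation, assuming the whole system switches to the critical state as soon as any high-criticality job passes its virtual deadline unfinished, and that low-criticality tasks receive no cores in the critical state. It does not compute virtual deadlines or core counts (MCFS computes them jointly); all classes, task names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HighCritTask:
    name: str
    virtual_deadline: float  # relative virtual deadline (hypothetical value)
    cores_typical: int       # dedicated cores in the typical state
    cores_critical: int      # dedicated cores in the critical state


@dataclass
class LowCritTask:
    name: str
    cores_typical: int       # assumption: no cores in the critical state


def allocation(hc_tasks, lc_tasks, state):
    """Return {task name: cores} for the 'typical' or 'critical' state."""
    if state == "typical":
        alloc = {t.name: t.cores_typical for t in hc_tasks}
        alloc.update({t.name: t.cores_typical for t in lc_tasks})
        return alloc
    if state == "critical":
        return {t.name: t.cores_critical for t in hc_tasks}
    raise ValueError(f"unknown state: {state}")


def fits(hc_tasks, lc_tasks, m):
    """The allocations of both states must fit on the m cores."""
    return all(sum(allocation(hc_tasks, lc_tasks, s).values()) <= m
               for s in ("typical", "critical"))


def next_state(state, hc_tasks, elapsed, finished):
    """Enter the critical state once any unfinished high-criticality job has
    run past its virtual deadline. This sketch never switches back."""
    if state == "critical":
        return state
    for t in hc_tasks:
        if not finished[t.name] and elapsed[t.name] >= t.virtual_deadline:
            return "critical"
    return "typical"


hc = [HighCritTask("h1", virtual_deadline=20, cores_typical=2, cores_critical=4),
      HighCritTask("h2", virtual_deadline=20, cores_typical=2, cores_critical=4)]
lc = [LowCritTask("l1", cores_typical=4)]
print(fits(hc, lc, m=8))  # True: typical 2 + 2 + 4 = 8, critical 4 + 4 = 8
print(next_state("typical", hc, {"h1": 25, "h2": 25},
                 {"h1": True, "h2": False}))  # critical
```

The cores that the low-criticality task uses in the typical state are the ones the high-criticality tasks reclaim in the critical state, which is where the resource saving over a single-criticality reservation comes from.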


MCFS Implementation

• In the typical state, MCFS assigns dedicated cores to all tasks.

[Figure: MCFS running on Linux and managing the cores; one low-criticality and two high-criticality tasks, each with its own OpenMP runtime; HC threads and LC threads]


• In the critical state, MCFS increases the number of cores assigned to high-criticality tasks.


• Additional HC threads are put to sleep at a higher priority than the LC threads, so they can preempt the LC threads as soon as they are woken in the critical state.


Empirical Evaluations

[Plot: y-axis 0% to 100%, x-axis 0 to 4]

• Linux with RT_PREEMPT patch, version 4.1.7-rt8
• 16-core machine with 2 Intel Xeon E5-2687W processors
• GCC version 4.6.3 with OpenMP
• Each data point has 100 task sets
• Each task is randomly generated with parallel for-loops


Issue with the Analysis of Parallel Jobs

[Figure: worker threads taking nodes from a centralized queue]

Centralized greedy scheduler
• Threads get work (nodes) from a centralized queue.

Implicit assumption of parallel real-time scheduling theory: when a thread (core) is allowed to work on a job, it must be able to find the available nodes immediately (within bounded time).

The centralized queue is a bottleneck for the scalability of large-scale systems.
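A sketch of the centralized greedy scheduler in Python, over unit-time nodes: all ready nodes sit in one shared queue, and in each step every worker takes a node if one is available. Taking a node costs nothing here, which is exactly the implicit assumption described above. The simulation also checks the standard greedy bound T ≤ (C − L)/n + L on a hypothetical fork-join DAG with C = 8 and L = 3.

```python
from collections import deque


def greedy_schedule_length(preds, num_workers):
    """Steps taken by a greedy scheduler with one centralized ready queue.

    preds maps each unit-time node to the set of nodes it depends on.
    In every step, each worker takes one ready node if any is available,
    so no worker idles while work is ready (the greedy property).
    """
    remaining = {v: len(ps) for v, ps in preds.items()}
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for u in ps:
            succs[u].append(v)
    ready = deque(v for v, count in remaining.items() if count == 0)
    steps = executed = 0
    while ready:
        steps += 1
        batch = [ready.popleft() for _ in range(min(num_workers, len(ready)))]
        for u in batch:                  # execute this step's nodes
            executed += 1
            for v in succs[u]:
                remaining[v] -= 1
                if remaining[v] == 0:
                    ready.append(v)      # becomes ready for the next step
    if executed != len(preds):
        raise ValueError("graph has a cycle")
    return steps


# Fork-join DAG a -> b0..b5 -> z with unit-time nodes: C = 8, L = 3.
dag = {"a": set(), **{f"b{i}": {"a"} for i in range(6)},
       "z": {f"b{i}" for i in range(6)}}
for n in (1, 2, 6):
    t = greedy_schedule_length(dag, n)
    print(n, t, t <= (8 - 3) / n + 3)
# prints: 1 8 True / 2 5 True / 6 3 True
```

In a real runtime every `popleft` on the shared queue is a synchronized operation, so its cost grows with contention as threads are added; the bound above silently treats it as free.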


Randomized work-stealing
• Threads usually get work from their local queues.
• If its local queue is empty, a thread steals randomly from another queue.

[Figure: worker threads with local queues that steal randomly, next to worker threads sharing a centralized queue]

               Centralized greedy      Randomized work-stealing
Predictable    Bounded worst-case      Unbounded worst-case
Scalable       Does not scale well     Good scalability
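A single-threaded Python sketch of the randomized work-stealing discipline: each worker takes the newest node from the bottom of its own deque, and an idle worker steals the oldest node from the top of a uniformly random victim. Production runtimes such as Cilk Plus and TBB use concurrent deques; this round-based toy only models the queue discipline, and the fork-tree workload and function name are hypothetical.

```python
import random
from collections import deque


def work_stealing_rounds(depth, num_workers, seed=0):
    """Toy round-based simulation of randomized work-stealing.

    The job is a complete binary fork tree: a node with value k > 0 spawns
    two children with value k - 1. Returns (nodes_executed, rounds).
    """
    rng = random.Random(seed)
    deques = [deque() for _ in range(num_workers)]
    deques[0].append(depth)                   # the root starts on worker 0
    executed = rounds = 0
    while any(deques):
        rounds += 1
        for w in range(num_workers):
            local = deques[w]
            if local:
                k = local.pop()               # owner: newest node, from the bottom
                executed += 1
                if k > 0:
                    local.extend((k - 1, k - 1))  # spawn two children locally
            elif num_workers > 1:
                victim = rng.randrange(num_workers - 1)
                if victim >= w:
                    victim += 1               # uniformly random victim other than w
                if deques[victim]:
                    # thief: oldest node from the top; it runs next round
                    local.append(deques[victim].popleft())
    return executed, rounds


for p in (1, 4, 16):
    print(p, work_stealing_rounds(depth=6, num_workers=p))
```

With one worker the run takes exactly 127 rounds, one node per round. With more workers the round count depends on which steals succeed, which is one source of variation in parallel execution times.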


Empirical Comparisons

Randomized work-stealing for large-scale soft real-time systems?

• FS implementations (with scheduling overheads incorporated):
  ◦ FSCG: centralized greedy scheduler in GNU OpenMP
  ◦ FSWS: randomized work-stealing in GNU Cilk Plus

Experimental setup:
• Linux with RT_PREEMPT patch, version r14
• 32-core machine with 4 Intel Xeon E5-4620 processors
• GCC 5.1 with OpenMP and Cilk Plus
• Each data point is one task set
• Each task is randomly generated using the benchmark program Heat


[Plot: deadline miss ratio vs. percentage of utilization (20% to 83%) for FSWS and FSCG (labeled RTWS and RTCG in the plot legend); higher utilization means higher load, and a lower miss ratio is better performance]

FSCG and FSWS:
• Same computation
• Same resources
• Only difference: internal scheduling of parallel tasks


The benefit of scalability in work-stealing dominates the increased variation in parallel execution times.


Outline

• Contributions
• System Guaranteed to Meet Deadlines for Parallel Jobs in CPS
• System Optimized to Meet Target Latency for ICS
• Future Work


System for Interactive Cloud Services

Online system: job arrival times are not known in advance.

Objective: optimize latency-related objectives for the service, e.g., average latency or maximum latency.


Specifically, the objective is to maximize the number of jobs that meet a target latency T.

[Figure: web search pipeline (Query, Doc. index search, 2nd phase ranking, Snippet generator) with aggregators]


Workload Distribution Has a Long Tail

[Plot: distribution of job sequential execution time (work, in ms) in the Bing search workload, with the target latency marked]

• Large jobs must run in parallel to meet the target latency.
• Should large jobs always run with full parallelism?


Parallelize Large Jobs According to Load

Tail-Control Strategy: when load is low, run all jobs in parallel; when load is high, run large jobs sequentially.

Latency = processing time + waiting time

• At low load, processing time dominates latency.
• At high load, waiting time dominates latency.

[Figure: two schedules of requests on cores 1-3 against a target latency; one misses 0 requests, the other misses 1]
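A toy sketch of the tail-control rule and of the objective, under loudly stated assumptions: load is approximated by the number of jobs currently in the system, "large" is a fixed work threshold, and both thresholds are hypothetical knobs rather than values from the slides. The slides do not say how small jobs run at high load; the sketch lets them keep full parallelism. The real policy lives inside the TBB work-stealing runtime and is not reproduced here.

```python
def tail_control_parallelism(work_ms, active_jobs, num_cores,
                             large_job_ms, high_load_jobs):
    """Cores a job may use under a toy tail-control policy.

    Hypothetical parameters: large_job_ms marks a job as large, and
    high_load_jobs is the number of jobs in the system counted as high load.
    """
    high_load = active_jobs >= high_load_jobs
    if high_load and work_ms >= large_job_ms:
        return 1          # high load: run large jobs sequentially
    return num_cores      # low load (or a small job): allow full parallelism


def jobs_meeting_target(jobs, target_ms):
    """Count jobs whose latency = waiting time + processing time <= target."""
    return sum(1 for wait, proc in jobs if wait + proc <= target_ms)


print(tail_control_parallelism(60, active_jobs=2, num_cores=8,
                               large_job_ms=40, high_load_jobs=6))  # 8
print(tail_control_parallelism(60, active_jobs=9, num_cores=8,
                               large_job_ms=40, high_load_jobs=6))  # 1
print(jobs_meeting_target([(0, 30), (20, 90), (50, 40)], target_ms=100))  # 2
```

The design follows the latency decomposition above: at high load, waiting time dominates, so serializing a large job frees cores for the many smaller jobs queued behind it, which raises the count of jobs that meet T.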


The Inner Workings of Tail-Control

We implement the tail-control algorithm in the runtime system of Intel Threading Building Blocks (TBB) and evaluate it on the Bing search workload.

[Figures: tail-control results relative to the target latency, compared with default work-stealing]


Conclusion

Exploit the untapped efficiency in parallel computing platforms and drastically improve the real-time performance of applications.

• System guaranteed to meet deadlines for CPS
  ◦ Develop provably good schedulers for parallel applications
  ◦ Incorporate real-time scheduling into parallel runtime systems
  ◦ Improve system efficiency by dealing with uncertainty in jobs
  ◦ Address system scalability issues due to internal scheduling
• System optimized to meet target latency for ICS
  ◦ Design and implement a strategy to optimize real-time performance


References

• J. Li, J.-J. Chen, K. Agrawal, C. Lu, C. D. Gill, and A. Saifullah, "Analysis of Federated and Global Scheduling for Parallel Real-Time Tasks," Euromicro Conference on Real-Time Systems (ECRTS), 2014.
• J. Li, S. Dinh, K. Kieselbach, K. Agrawal, C. Gill, and C. Lu, "Randomized Work Stealing for Large Scale Soft Real-Time Systems," IEEE Real-Time Systems Symposium (RTSS), 2016.
• J. Li, D. Ferry, S. Ahuja, K. Agrawal, C. Gill, and C. Lu, "Mixed-Criticality Federated Scheduling for Parallel Real-Time Tasks," IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), 2016.
• J. Li, Y. He, S. Elnikety, K. S. McKinley, K. Agrawal, A. Lee, and C. Lu, "Work Stealing for Interactive Services to Meet Target Latency," ACM Symposium on Principles and Practice of Parallel Programming (PPoPP), 2016.