Architecting Solutions for the Manycore Future
Talbott Crowell, ThirdM
May 10, 2015
This talk will focus solution architects on thinking about parallelism when designing applications and solutions:
Threads vs. Tasks using the TPL
LINQ vs. PLINQ
Object-Oriented vs. Functional Programming
This talk will also compare programming languages: how languages differ when dealing with manycore programming, and the different advantages of these languages.
Abstract
Patrick Gelsinger, Intel VP February 2001, San Francisco, CA
2001 IEEE International Solid-State Circuits Conference (ISSCC)
If scaling continues at its present pace, by 2005 high-speed processors would have the power density of a nuclear reactor; by 2010, of a rocket nozzle; and by 2015, of the surface of the sun.
Intel stock dropped 8% the next day.
“Business as usual will not work in the future.”
The Power Wall: CPU Clock Speed
From Katherine Yelick’s “Multicore: Fallout of a Hardware Revolution”
(chart: clock speed over time, showing the single core era giving way to multicore and then manycore)
In 1965, Gordon Moore predicted exponential growth in the number of transistors per chip, based on the trend from 1959 to 1965.
Clock frequencies continued to increase exponentially until they hit the power wall in 2004, at around 3 to 4 GHz:
1971, Intel 4004 (first single-chip CPU) – 740 kHz
1978, Intel 8086 (origin of x86) – 4.77 MHz
1985, Intel 80386DX – 16 MHz
1993, Pentium P5 – 66 MHz
1998, Pentium II – 450 MHz
2001, Pentium III (Tualatin) – 1.4 GHz
2004, Pentium 4F – 3.6 GHz
2008, Core i7 (Extreme) – 3.3 GHz
Intel is now doubling cores along with other improvements to continue to scale
Effect of the power wall: this trend continues even today.
The Power Wall
Enter Manycore
Manycore, What is it?
Manycore, Why should I care?
Manycore, What do we do about it?
 Frameworks: Task Parallel Library (Reactive Extensions and .NET 4)
 Languages, paradigms, and language extensions: F#, functional programming, LINQ, PLINQ
 Tools: Visual Studio 2010 Tools for Concurrency
Agenda: Manycore Future
What is Manycore?
Single core: 1 processor per chip die (1 socket)
 Many past consumer and server CPUs (and some current CPUs for lightweight, low-power devices)
 Including CPUs that support hyperthreading, though this is a grey area
Multicore: 2 to 8 cores per chip/socket
 AMD Athlon 64 X2 (first dual-core desktop CPU, released in 2005)
 Intel Core Duo, 2006 (32-bit, dual core, for laptops only); Core Solo was a dual-core chip with one core disabled
 Intel Core 2 (not a core count, but a brand for the 64-bit architecture): Core 2 Solo (1 core), Core 2 Duo (2 cores), Core 2 Quad (4 cores)
Manycore: more than 8 cores per chip
 Currently prototypes and R&D
Manycore, What is it?
High-end servers, 2001-2004
 2001: IBM POWER4, a PowerPC for the AS/400 and RS/6000, the "world's first non-embedded dual-core processor"
 2004: Sun UltraSPARC IV, the "first multicore SPARC processor"
Desktops/laptops, 2005-2007
 May 2005: AMD Athlon 64 X2 (Manchester), the "first dual-core desktop CPU"
 Jan 2006: Intel Core Duo
 Jan 2007: Intel Pentium (Allendale), dual core
Windows servers, 2005-2006
 Dec 2005: Intel Xeon (Paxville), dual core
 March 2006: AMD Opteron (Denmark), dual core
 July 2006: Intel Itanium 2 (Montecito), dual core
Gaming consoles
 2006: Sony PlayStation 3, with the 9-core Cell processor (only 8 cores operational); the Cell architecture was jointly developed by Sony, Toshiba, and IBM
Multicore trends from servers to gaming consoles
Power Mac G5, mid 2003: 2 x 1 core (single core) IBM PowerPC 970
Mac Pro, mid 2006: 2 x 2 core (dual core) Intel Xeon (Woodcrest)
Mac Pro, early 2008: 2 x 4 core (quad core) Intel Xeon (Harpertown)
In five years, the number of cores doubled twice on Apple's high-end graphics workstation: from 2 to 4 to 8.
Macintosh multicore trend
The chip is just designed for research efforts at the moment, according to an Intel spokesperson.
"There are no product plans for this chip. We will never sell it so there won't be a price for it," the Intel spokesperson noted in an e-mail. "We will give about a hundred or more to industry partners like Microsoft and academia to help us research software development and learn on a real piece of hardware, [of] which nothing of its kind exists today."
http://redmondmag.com/articles/2009/12/04/intel-unveils-48-core-cloud-computer-chip.aspx
Microsoft said it had already put SCC into its development pipeline so it could exploit it in the future. http://news.bbc.co.uk/2/hi/technology/8392392.stm
48 Core Single-chip Cloud Computer (SCC)
Why should I care? (about Manycore)
Hardware is changing
 Programming needs to change to take advantage of the new hardware
Concurrent programming is a paradigm shift
 In designing applications
 In developing applications
Manycore, Why should I care?
“The computer industry is once again at a crossroads. Hardware concurrency, in the form of new manycore processors, together with growing software complexity, will require that the technology industry fundamentally rethink both the architecture of modern computers and the resulting software development paradigms.” Craig Mundie
Chief Research and Strategy Officer, Microsoft Corporation, June 2008
First paragraph of the foreword of Joe Duffy's preeminent tome, "Concurrent Programming on Windows"
Concurrent Programming
Excerpt from Mark Reinhold's blog post, November 24, 2009:
The free lunch is over.
Multicore processors are not just coming—they’re here.
Leveraging multiple cores requires writing scalable parallel programs, which is incredibly hard.
Tools such as fork/join frameworks based on work-stealing algorithms make the task easier, but it still takes a fair bit of expertise and tuning.
Bulk-data APIs such as parallel arrays allow computations to be expressed in terms of higher-level, SQL-like operations (e.g., filter, map, and reduce) which can be mapped automatically onto the fork-join paradigm.
Working with parallel arrays in Java, unfortunately, requires lots of boilerplate code to solve even simple problems.
Closures can eliminate that boilerplate.
"It's time to add them to Java."
http://blogs.sun.com/mr/entry/closures
“There’s not a moment to lose!”
Herb Sutter, 2005
Programs are not doubling in speed every couple of years for free anymore.
We need to start writing code that takes advantage of many cores.
It is currently painful and problematic to take advantage of many cores because of shared memory, locking, and other imperative programming techniques.
“The Free Lunch Is Over”
Is this just hype?
Another Y2K scare?
Fact: CPU’s are changing
Programmers will learn to exploit new architectures
Will you be one of them?
Wait and see? You could just wait and let the tools catch up so you don't have to think about it. Will that strategy work?
Should you be concerned?
Tools or frameworks alone will not solve the manycore problem.
Imperative programming (C, C++, VB, Java, C#) by definition has limitations scaling in a parallel way:
 It requires locks and synchronization code to handle shared-memory reads and writes
 This is not trivial, and it is difficult to debug
Tools and frameworks may help, but really taking advantage of them will require a different approach to the problem (a different paradigm).
The Core Problem
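The shared-memory pain described above can be seen in a minimal sketch (a hypothetical example, not from the talk): parallel increments of a shared counter silently lose updates unless every access is synchronized, and the fix introduces contention and deadlock concerns.

```csharp
using System;
using System.Threading.Tasks;

class SharedStateDemo
{
    static void Main()
    {
        const int Iterations = 100000;
        int unsafeCount = 0, safeCount = 0;
        object gate = new object();

        // Without synchronization, parallel read-increment-write
        // sequences interleave and updates are silently lost.
        Parallel.For(0, Iterations, i => { unsafeCount++; });

        // With a lock, each read-modify-write is atomic, but now the
        // programmer owns contention costs and deadlock avoidance.
        Parallel.For(0, Iterations, i => { lock (gate) { safeCount++; } });

        // safeCount is always 100000; unsafeCount is usually less.
        Console.WriteLine("unsafe: " + unsafeCount + ", safe: " + safeCount);
    }
}
```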
Some frameworks are designed to be single threaded, such as ASP.NET.
Best practices for ASP.NET applications recommend avoiding spawning new threads:
 ASP.NET and IIS handle the multithreading and multiprocessing to take advantage of the many processors (and now many cores) on web servers and application servers.
Will this best practice remain true, even when server CPUs have hundreds or thousands of cores?
Will it affect all programmers?
What do we do about it?
(How do we prepare for Manycore)
Identify where the dependencies are
Identify where you can parallelize
Understand the tools, techniques, and approaches for solving the pieces
Put them together to understand overall performance
POC: Proof of Concept
Test, test, test
Performance goals up front
Understand Problem Domain
Frameworks
 Task Parallel Library (TPL)
  Previously called Parallel Extensions or PFx
  For .NET 3.5, available via the Reactive Extensions for .NET (Rx)
  Baked into .NET 4
Programming paradigms, languages, and language extensions
 Functional programming
F#
LINQ and PLINQ
Tools
 Visual Studio 2010 Tools for Concurrency
Manycore, What do we do about it?
Parallelism vs. Concurrency
Task vs. Data Parallelism
Parallel Programming Concepts
Concurrency, or concurrent computing
 Many independent requests
 Example: a web server works on a multi-threaded, single-core CPU
 Separate processes that may be executed in parallel
 More general than parallelism
Parallelism, or parallel computing
 Processes are executed in parallel, simultaneously
 Only possible with multiple processors or multiple cores
Yuan Lin compares the two to black-and-white vs. color photography: one is not a superset of the other
http://www.touchdreams.net/blog/2008/12/21/more-on-concurrency-vs-parallelism/
Parallelism vs. Concurrency
Task parallelism (aka function parallelism or control parallelism)
 Distributing execution processes (threads/functions/tasks) across different parallel computing nodes (cores)
 http://msdn.microsoft.com/en-us/library/dd537609(VS.100).aspx
Data parallelism (aka loop-level parallelism)
 Distributing data across different parallel computing nodes (cores)
 Executing the same command over every element in a data structure
 http://msdn.microsoft.com/en-us/library/dd537608(VS.100).aspx
Task vs. Data Parallelism
See MSDN for .NET 4, Parallel Programming, Data/Task Parallelism
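The distinction above can be sketched in a small example (illustrative only, not from the slides): task parallelism runs different operations at the same time, while data parallelism runs the same operation over every element of a collection.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class TaskVsDataParallelism
{
    static void Main()
    {
        // Task parallelism: two *different* operations run concurrently.
        Task<int> wordCount = Task.Factory.StartNew(
            () => "the quick brown fox".Split(' ').Length);
        Task<int> sum = Task.Factory.StartNew(
            () => Enumerable.Range(1, 100).Sum());
        Console.WriteLine("words: " + wordCount.Result + ", sum: " + sum.Result);

        // Data parallelism: the *same* operation runs over every element.
        int[] squares = new int[10];
        Parallel.For(0, squares.Length, i => { squares[i] = i * i; });
        Console.WriteLine(string.Join(",", squares));
    }
}
```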
Task Parallel Library
Parallel Programming in the .NET Framework 4 Beta 2 - TPL
Reference System.Threading
Use Visual Studio 2010 or .NET 4
For Visual Studio 2008, download the unsupported version for .NET 3.5 SP1 from the Reactive Extensions for .NET (Rx)
http://msdn.microsoft.com/en-us/devlabs/ee794896.aspx
Create a “Task”
How to use the TPL
FileStream fs = new FileStream(fileName, FileMode.CreateNew);
var task = Task.Factory.FromAsync(fs.BeginWrite, fs.EndWrite, bytes, 0, bytes.Length, null);
Use Task class
Task Parallelism with the TPL
// Create a task and supply a user delegate
// by using a lambda expression.
var taskA = new Task(() => Console.WriteLine("Hello from taskA."));

// Start the task.
taskA.Start();

// Output a message from the calling thread.
Console.WriteLine("Hello from the calling thread.");
Task<TResult>
Getting return value from a Task
Task<double>[] taskArray = new Task<double>[]
{
    Task<double>.Factory.StartNew(() => DoComputation1()),
    // May be written more conveniently like this:
    Task.Factory.StartNew(() => DoComputation2()),
    Task.Factory.StartNew(() => DoComputation3())
};

double[] results = new double[taskArray.Length];
for (int i = 0; i < taskArray.Length; i++)
    results[i] = taskArray[i].Result;
A Task resembles a new thread or a ThreadPool work item, but at a higher level of abstraction.
Tasks provide two primary benefits over threads:
 More efficient and scalable use of system resources
 More programmatic control than is possible with a thread or work item
Tasks vs. Threads
Behind the scenes, tasks are queued to the ThreadPool.
 The ThreadPool is now enhanced with algorithms (like hill-climbing) that determine and adjust the number of threads to maximize throughput.
Tasks are relatively lightweight.
 You can create many of them to enable fine-grained parallelism. To complement this, widely-known work-stealing algorithms are employed to provide load balancing.
Tasks and the framework built around them provide a rich set of APIs that support waiting, cancellation, continuations, robust exception handling, detailed status, custom scheduling, and more.
Tasks
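Two of those APIs, continuations and cooperative cancellation, can be sketched as follows (a hypothetical example, not from the slides):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class TaskApisDemo
{
    static void Main()
    {
        // Continuation: a follow-up task runs when the first completes.
        Task<int> compute = Task.Factory.StartNew(() => 6 * 7);
        Task<string> report = compute.ContinueWith(t => "answer = " + t.Result);
        Console.WriteLine(report.Result); // answer = 42

        // Cancellation: cooperative, via a shared token.
        var cts = new CancellationTokenSource();
        Task loop = Task.Factory.StartNew(() =>
        {
            while (true)
            {
                cts.Token.ThrowIfCancellationRequested();
                Thread.Sleep(10);
            }
        }, cts.Token);

        cts.Cancel();
        try { loop.Wait(); }
        catch (AggregateException) { /* contains a TaskCanceledException */ }
        Console.WriteLine("loop status: " + loop.Status); // Canceled
    }
}
```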
Instead of:

for (int i = 0; i < matARows; i++)
{
    for (int j = 0; j < matBCols; j++) { ... }
}

Use:

Parallel.For(0, matARows, i =>
{
    for (int j = 0; j < matBCols; j++) { ... }
}); // Parallel.For

Data Parallelism with the TPL
Use Tasks not Threads
Use Parallel.For in Data Parallelism scenarios
Or… use Async Workflows from F#, covered later
Use PLINQ, covered later
TPL Summary
Functional Programming
1930’s: lambda calculus (roots)
1956: IPL (Information Processing Language) “the first functional language”
1958: LISP “a functional flavored language”
1962: APL (A Programming Language)
1973: ML (Meta Language)
1983: SML (Standard ML)
1987: Caml (Categorical Abstract Machine Language) and Haskell
1996: OCaml (Objective Caml)
2005: F# introduced to public by Microsoft Research
2010: F# is “productized” in the form of Visual Studio 2010
Functional programming has been around a long time (over 50 years)
Most functional languages encourage programmers to avoid side effects.
Haskell (a "pure" functional language) restricts side effects with a static type system.
A side effect:
 Modifies some state
 Has an observable interaction with calling functions
 Has an observable interaction with the outside world
Example: a function or method with no return value is called only for its side effects.
Functional programming is safe
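One way to see why this matters for parallelism (an illustrative sketch, not from the talk): a side-effecting method mutates shared state and must run sequentially or under a lock, while a pure function can be run in any order, or in parallel, without changing the answer.

```csharp
using System;
using System.Linq;

class SideEffectDemo
{
    static int total; // shared mutable state

    // Impure: no return value; its only purpose is an observable effect.
    static void AddToTotal(int x) { total += x; }

    // Pure: the result depends only on the input, so calls can run
    // in any order, or in parallel, without changing the answer.
    static int Square(int x) { return x * x; }

    static void Main()
    {
        // The impure version must run sequentially (or be locked).
        foreach (int n in Enumerable.Range(1, 4)) AddToTotal(n);
        Console.WriteLine(total); // 10

        // The pure version parallelizes trivially with PLINQ.
        int sumOfSquares = Enumerable.Range(1, 4).AsParallel().Select(Square).Sum();
        Console.WriteLine(sumOfSquares); // 30
    }
}
```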
(chart: languages plotted by safety vs. usefulness. C#, VB, and Java are useful but unsafe; Haskell is safe but less useful; LINQ moves C# toward safety; "Nirvana" is both useful and safe)
Language Evolution (Simon Peyton Jones)
http://channel9.msdn.com/posts/Charles/Simon-Peyton-Jones-Towards-a-Programming-Language-Nirvana/
C#, VB, Java, and C are imperative programming languages. They are very useful, but they can change the state of the world at any time, creating side effects.
Nirvana! Useful and safe.
Haskell is very safe, but not very useful. It is used heavily in research and academia, but rarely in business.
F#
When a function changes the state of the program:
 Writing to a file (that may be read later)
 Writing to the screen
 Changing values of variables in memory (global variables or object state)
Side Effect
Compare SQL to your favorite imperative programming language: if you write a statement to store and query your data, you don't need to specify how the system will store the data at a low level (example: table partitioning).
LINQ is an example of bringing functional programming to C# and VB through language extensions.
Functional Programming
Use lots of processes
Avoid side effects
Avoid sequential bottlenecks
Write “small messages, big computations” code
Efficient Multicore Programming
Source: Joe Armstrong's "Programming Erlang: Software for a Concurrent World", Section 20.1, "How to Make Programs Run Efficiently on a Multicore CPU"
F#
Functional language developed by Microsoft Research
 By Don Syme and his team, who productized generics
Based on OCaml (influenced by C# and Haskell)
History
 2002: F# language design started
 January 2005: F# 1.0.1 released to the public
  Not a product; integrated with VS2003; works on .NET 1.0 through the .NET 2.0 beta, and on Mono
 November 2005: F# 1.1.5 with VS 2005 RTM support
 October 2009: VS2010 Beta 2, plus a CTP for VS2008 and non-Windows users
 2010: F# is "productized" and baked into VS 2010
What is F#
Multi-paradigm:
 Functional Programming
Imperative Programming
Object Oriented Programming
Language Oriented Programming
F# is not just Functional
Parallel Computing and PDC09

(diagram: the Microsoft parallel computing stack shown at PDC09)
 Native libraries: Parallel Pattern Library, Async Agents Library, data structures
 Native Concurrency Runtime: Resource Manager, Task Scheduler
 Managed libraries: Task Parallel Library, Parallel LINQ, data structures, Rx
 Managed Concurrency Runtime: ThreadPool
 Languages: managed languages, Visual F#
 Operating system: Threads, UMS Threads, HPC Server
 Tools: Visual Studio 2010 (Parallel Debugger, Profiler, Concurrency Analysis)
 Microsoft Research / incubation: Axum, DryadLINQ, Race Detection, Fuzzing
 Key: Research / Incubation; Visual Studio 2010 / .NET 4; Windows 7 / Server 2008 R2
Functional programming has been around a long time
 Not new; it has a long history
Functional programming is safe
 A concern as we head toward manycore and cloud computing
Functional programming is on the rise
Why another language?
“F# is, technically speaking, neutral with respect to concurrency - it allows the programmer to exploit the many different techniques for concurrency and distribution supported by the .NET platform” F# FAQ: http://bit.ly/FSharpFAQ
Functional programming is a primary technique for minimizing/isolating mutable state
Asynchronous workflows let you write parallel programs in a "natural and compositional style"
F# and Multi-Core Programming
Interactive scripting
 Good for prototyping
Succinct: less code
Type inference
 Strongly typed, strict (no dynamic typing)
 Automatic generalization (generics for free)
 Few type annotations
First-class functions (currying, lazy evaluation)
Pattern matching
Key Characteristics of F#
Concurrent Programming with F#
Luke Hoban at PDC 2009 (F# Program Manager)
http://microsoftpdc.com/Sessions/FT20
Demo – Imperative sumOfSquares
Difficult to turn existing sequential code into parallel code
 Must modify large portions of code to use threads explicitly
Using shared state and locks is difficult
 Must be careful to avoid race conditions and deadlocks
Two Problems Parallelizing Imperative Code
http://www.manning.com/petricek/petricek_meapch1.pdf
Demo – Recursive sumOfSquares
Declarative programming style
 Easier to introduce parallelism into existing code
Immutability by default
 Can't introduce race conditions
 Easier to write lock-free code
Functional Programming
Demo – Functional sumOfSquares
From Seq to PSeq: Matthew Podwysocki's blog
http://weblogs.asp.net/podwysocki/archive/2009/02/23/adding-parallel-extensions-to-f.aspx
Adding Parallel Extensions to F# for VS2010 Beta 2: Talbott Crowell's developer blog
http://talbottc.spaces.live.com/blog/cns!A6E0DA836D488CA6!396.entry
Parallel Extensions to F#
Demo – Parallel sumOfSquares
Asynchronous Workflows
Control.MailboxProcessor
Task Based Programming using TPL
Reactive Extensions “The Reactive Extensions can be used from any .NET language.
In F#, .NET events are first-class values that implement the IObservable<out T> interface. In addition, F# provides a basic set of functions for composing observable collections and F# developers can leverage Rx to get a richer set of operators for composing events and other observable collections. ”
S. Somasegar, Senior Vice President, Developer Division http://blogs.msdn.com/somasegar/archive/2009/11/18/reactive-extensions-for-net-rx.aspx
F# Parallel Programming Options
Problem: resize a ton of images
Demo of Image Processor
let files = Directory.GetFiles(@"C:\images\original")

for file in files do
    use image = Image.FromFile(file)
    use smallImage = ResizeImage(image)
    let destFileName = DestFileName("s1", file)
    smallImage.Save(destFileName)
Asynchronous Workflows

let FetchAsync(file:string) = async {
    use stream = File.OpenRead(file)
    let! bytes = stream.AsyncRead(int stream.Length)
    use memstream = new MemoryStream(bytes.Length)
    memstream.Write(bytes, 0, bytes.Length)
    use image = Image.FromStream(memstream)
    use smallImage = ResizeImage(image)
    let destFileName = DestFileName("s2", file)
    smallImage.Save(destFileName) }

let tasks = [for file in files -> FetchAsync(file)]
let parallelTasks = Async.Parallel tasks
Async.RunSynchronously parallelTasks
Tomas Petricek, Using Asynchronous Workflows
http://tomasp.net/blog/fsharp-webcast-async.aspx
LINQ: Language-Integrated Query
With LINQ you declaratively specify what you want done, not how you want it done.
var source = Enumerable.Range(1, 10000);
var evenNums = from num in source
               where Compute(num) > 0
               select num;

Versus:

var source = Enumerable.Range(1, 10000);
var evenNums = new List<int>();
foreach (var num in source)
    if (Compute(num) > 0)
        evenNums.Add(num);

LINQ
If I put a counter in Compute(num), what will happen?

var source = Enumerable.Range(1, 10000);
var evenNums = from num in source
               where Compute(num) > 0
               select num;

private static int Compute(int num)
{
    counter++;
    if (num % 2 == 0) return 1;
    return 0;
}
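The answer: nothing happens until the query is enumerated, because LINQ queries are lazily evaluated. A runnable sketch (wrapping the slide's snippet in a small program):

```csharp
using System;
using System.Linq;

class DeferredExecutionDemo
{
    static int counter;

    static int Compute(int num)
    {
        counter++;
        if (num % 2 == 0) return 1;
        return 0;
    }

    static void Main()
    {
        var source = Enumerable.Range(1, 10000);
        var evenNums = from num in source
                       where Compute(num) > 0
                       select num;

        // Defining the query does not execute it.
        Console.WriteLine(counter); // 0

        // Enumerating the query calls Compute once per element.
        var list = evenNums.ToList();
        Console.WriteLine(counter);    // 10000
        Console.WriteLine(list.Count); // 5000
    }
}
```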
PLINQ (Parallel LINQ)
Parallel Programming in the .NET Framework 4 Beta 2 - PLINQ
With LINQ you declaratively specify what you want done, not how you want it done.
With PLINQ you declaratively specify "as parallel"; under the hood, the framework implements "the how" using the TPL and threads.
PLINQ = Parallel LINQ
var source = Enumerable.Range(1, 10000);
var evenNums = from num in source where Compute(num) > 0 select num;
var source = Enumerable.Range(1, 10000);
var evenNums = from num in source.AsParallel() where Compute(num) > 0 select num;
ParallelEnumerable operators
AsParallel(): The entry point for PLINQ. Specifies that the rest of the query should be parallelized, if it is possible.
AsSequential(): Specifies that the rest of the query should be run sequentially, as a non-parallel LINQ query.
AsOrdered(): Specifies that PLINQ should preserve the ordering of the source sequence for the rest of the query, or until the ordering is changed, for example by the use of an orderby (Order By in Visual Basic) clause.
AsUnordered(): Specifies that PLINQ, for the rest of the query, is not required to preserve the ordering of the source sequence.
WithCancellation(): Specifies that PLINQ should periodically monitor the state of the provided cancellation token and cancel execution if it is requested.
WithDegreeOfParallelism(): Specifies the maximum number of processors that PLINQ should use to parallelize the query.
WithMergeOptions(): Provides a hint about how PLINQ should, if it is possible, merge parallel results back into just one sequence on the consuming thread.
WithExecutionMode(): Specifies whether PLINQ should parallelize the query even when the default behavior would be to run it sequentially.
ForAll(): A multithreaded enumeration method that, unlike iterating over the results of the query, enables results to be processed in parallel without first merging back to the consumer thread.
Aggregate() overload: An overload that is unique to PLINQ and enables intermediate aggregation over thread-local partitions, plus a final aggregation function to combine the results of all partitions.
System.Linq.ParallelEnumerable
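These operators compose onto an ordinary query. A small sketch (illustrative only) combining a few of them:

```csharp
using System;
using System.Linq;

class PlinqOperatorsDemo
{
    static void Main()
    {
        var source = Enumerable.Range(1, 20);

        // AsOrdered preserves source order in the merged results;
        // WithDegreeOfParallelism caps the number of cores PLINQ uses.
        int[] evens = source.AsParallel()
                            .AsOrdered()
                            .WithDegreeOfParallelism(2)
                            .Where(n => n % 2 == 0)
                            .ToArray();
        Console.WriteLine(string.Join(",", evens)); // 2,4,6,8,10,12,14,16,18,20

        // ForAll skips the merge step entirely: results are consumed on
        // the worker threads, so the output order is not guaranteed.
        source.AsParallel()
              .Where(n => n % 5 == 0)
              .ForAll(n => Console.WriteLine(n));
    }
}
```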
Visual Studio 2010 Tools for Concurrency
Stephen Toub at PDC 2009, Senior Program Manager on the Parallel Computing Platform
http://microsoftpdc.com/Sessions/P09-09
Views enable you to see how your multi-threaded application interacts with:
 Itself
 Hardware
 The operating system
 Other processes on the host computer
Provides graphical, tabular, and textual data
Shows the temporal relationships between the threads in your program and the system as a whole
Concurrency Visualizer in Visual Studio 2010
Performance bottlenecks
CPU underutilization
Thread contention
Thread migration
Synchronization delays
Areas of overlapped I/O
and other info…
Use Concurrency Visualizer to Locate
Concurrency Visualizer: high level of contention during Async
CPU View
Threads View (Parallel Performance)
Cores View
Views
CPU View
• Async uses more of the CPU(s)/cores
• Sync uses 1 CPU/core
Threads View
Full test
Close up of Sync
Close up of Async
Core View
Tomas Petricek - F# Webcast (III.) - Using Asynchronous Workflows http://tomasp.net/blog/fsharp-webcast-async.aspx
Luke Hoban - F# for Parallel and Asynchronous Programming http://microsoftpdc.com/Sessions/FT20
More info on Asynchronous Workflows
The Landscape of Parallel Computing Research: A View from Berkeley 2.0, by David Patterson
http://science.officeisp.net/ManycoreComputingWorkshop07/Presentations/David%20Patterson.pdf
Parallel Dwarfs http://paralleldwarfs.codeplex.com/
More Research
“The architect as we know him today is a product of the Renaissance.” (1)
"But the medieval architect was a master craftsman (usually a mason or a carpenter by trade), one who could build as well as design, or at least 'one trained in that craft even if he had ceased to ply his axe and chisel' (2)." (1)
"Not only is he hands-on, like the agile architect, but we also learn from Arnold that the great Gothic cathedrals of Europe were built not with BDUF, but with ENUF." (3)
(1). Dana Arnold, Reading Architectural History, 2002
(2). D. Knoop & G. P. Jones, The Medieval Mason, 1933
(3). Architects: Back to the future?, Ian Cooper 2008
The Architect
http://codebetter.com/blogs/ian_cooper/archive/2008/01/02/architects-back-to-the-future.aspx
Thank you. Questions?
Architecting Solutions for the
Manycore Future
Talbott Crowell
ThirdM.com
http://talbottc.spaces.live.com
Twitter: @Talbott and @fsug
visit us at http://fsug.org