Communicating State Machines


DESCRIPTION

A presentation given at the Programming Languages Meetup in San Francisco (June 10, 2014). Computation is about communicating state machines, but the message gets lost in the endless debates on threads vs. events and iterators vs. reactive approaches. Lightweight coroutine and thread options are available in all major mainstream languages, combining the ease of sequential thread programming with the performance of event-oriented code. You can have it all.

Transcript

Communicating State Machines

Sriram Srinivasan
sriram@malhar.net

www.malhar.net/sriram

Programming Languages Meetup, San Francisco, June 10, 2014

• Fundamental building block of computation

• Communicating State Machines model

• Synchronous and Asynchronous composition

• Hierarchical State Machines specification

• (Edward A. Lee and Pravin Varaiya, Structure and Interpretation of Signals and Systems, LeeVaraiya.org)

State machines

• Distributed Systems

• Hardware interfaces

• Components of a memory hierarchy

• Stream producers and consumers

• Parsers and Lexers

• Filesystem and tree walker

• Networking stack and Socket consumer

• Bidirectional communication

CSMs are Ubiquitous

C++ Boost.Statechart

struct Active : sc::simple_state< Active, StopWatch, Stopped > {
public:
    typedef sc::transition< EvReset, Active > reactions;

    Active() : elapsedTime_( 0.0 ) {}
    double ElapsedTime() const { return elapsedTime_; }
    double & ElapsedTime() { return elapsedTime_; }
private:
    double elapsedTime_;
};

struct Running : sc::simple_state< Running, Active > {
public:
    typedef sc::transition< EvStartStop, Stopped > reactions;

    Running() : startTime_( std::time( 0 ) ) {}
    ~Running() {
        context< Active >().ElapsedTime() +=
            std::difftime( std::time( 0 ), startTime_ );
    }
private:
    std::time_t startTime_;
};

Ugh!
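For contrast, here is the same stopwatch as a hand-rolled state machine in plain Java. This is a sketch for illustration only (the names are invented, not from the slides or any library); the point is that the machine itself is small, and the ceremony above comes from the framework.

// Hand-rolled stopwatch state machine: Stopped/Running states reacting
// to StartStop and Reset events, mirroring the Boost.Statechart example.
class StopWatch {
    enum State { STOPPED, RUNNING }
    enum Event { START_STOP, RESET }

    private State state = State.STOPPED;
    private long startNanos;         // set on entering RUNNING
    private double elapsedSecs = 0;  // accumulated across runs

    void handle(Event e) {
        switch (state) {
        case STOPPED:
            if (e == Event.START_STOP) {            // Stopped -> Running
                startNanos = System.nanoTime();
                state = State.RUNNING;
            } else if (e == Event.RESET) {
                elapsedSecs = 0;
            }
            break;
        case RUNNING:
            if (e == Event.START_STOP) {            // Running -> Stopped
                elapsedSecs += (System.nanoTime() - startNanos) / 1e9;
                state = State.STOPPED;
            } else if (e == Event.RESET) {
                elapsedSecs = 0;
                startNanos = System.nanoTime();
            }
            break;
        }
    }

    double elapsedTime() {           // like Active::ElapsedTime() above
        return state == State.RUNNING
                ? elapsedSecs + (System.nanoTime() - startNanos) / 1e9
                : elapsedSecs;
    }
}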

nesC

• Language used in TinyOS to program wireless motes

• Components with bidirectional interfaces

• Separate configuration to stitch together components

nesC Bidirectional Interfaces

interface StdControl {
    command result_t init();
}

interface Timer {
    command result_t start(char type, uint32_t interval);
    command result_t stop();
    event result_t fired();
}

interface Send {
    command result_t send(TOS_Msg *msg, uint16_t length);
    event result_t sendDone(TOS_Msg *msg, result_t success);
}

interface Device {
    command result_t getData();
    event result_t dataReady(uint16_t data);
}

nesC Implementation

module ChirpM {
    provides interface StdControl;
    uses interface Device;
    uses interface Timer;
    uses interface Send;
}
implementation {
    uint16_t sensorReading;

    command result_t StdControl.init() {
        return call Timer.start(TIMER_REPEAT, 1000);
    }

    event result_t Timer.fired() {
        call Device.getData();
        return SUCCESS;
    }

    event result_t Device.dataReady(uint16_t data) {
        sensorReading = data;
        ... send message with data in it ...
        return SUCCESS;
    }
}

[Diagram: the ChirpM component, providing StdControl and using Timer, Device, and Send]

nesC Configuration: ChirpC

configuration ChirpC {
    provides interface StdControl;
}
implementation {
    components ChirpM, HWTimer, BarometerC, RadioAnnouncerC;

    StdControl = ChirpM.StdControl;
    ChirpM.Timer -> HWTimer.Timer;
    ChirpM.Device -> BarometerC;
    ChirpM.Send -> RadioAnnouncerC;
}

[Diagram: ChirpC wiring — ChirpM.Timer to HWTimer, ChirpM.Device to BarometerC, ChirpM.Send to RadioAnnouncerC, with StdControl exported]

Why aren’t more systems structured this way?

Synchronous Communication

Stacks as State Machines

void readMsgs(socket) {
    numMsgsRead = 0
    while (true) {
        msg = readMsg(socket)
        dispatch(msg)
        log(numMsgsRead++)
    }
}

void readMsg(socket) {
    len := readLen(socket)
    return readBody(len)
}

void readLen(socket) {
    byte[4] len
    for i = 0 .. 4 {
        len[i] = readByte(socket)
    }
    return len
}

• Thread of control

• Control plane = Call Chain (each frame remembers its pc)

• Sequential flow of control defines hidden states

• Functions define major states

• Data plane = Vars local in each frame

• Blocking semantics == synchronous (lock-step) communication

• readByte and dispatch interact with network

• Easy API; that’s why POSIX and most DB calls are synchronous

State machine in Erlang

bark() ->
    io:format("Dog says: BARK! BARK!~n"),
    receive
        pet -> wag_tail();
        _ ->
            io:format("Dog is confused~n"),
            bark()
    after 2000 ->
        bark()
    end.

wag_tail() ->
    io:format("Dog wags its tail~n"),
    receive
        pet -> sit();
        _ ->
            io:format("Dog is confused~n"),
            wag_tail()
    after 30000 ->
        bark()
    end.

sit() ->
    io:format("Dog is sitting. Gooooood boy!~n"),
    receive
        squirrel -> bark();
        _ ->
            io:format("Dog is confused~n"),
            sit()
    end.

• Tail-call optimization renders state changes trivial

Credit: http://learnyousomeerlang.com/finite-state-machines

Leaves from a Tree

• Problem: Obtain leaves from a tree one at a time

• Two interacting state machines:

• Producer: tree, Consumer: user code that acts on the leaves.

• Pull solution: Iterators

• Convenient for clients

• for leaf in tree: print leaf.name

• Push solution: Functional approach

• Tree pushes data to visitors or user-defined functions

• tree.visit( myfunc )

• Ideally: Duals of each other

• In practice: Duel with each other


Pull Solution: Iterators

class Node:
    ...
    def __iter__(self):
        return Iter(self)

class Iter:
    def __init__(self, root):
        self.nxt = root.first_leaf()
        self.prev = None

    def next(self):
        nxt = self.nxt
        if nxt:  # first entry into the iterator
            self.nxt = None
            self.prev = nxt
            return nxt

        nxt = None
        prev = self.prev
        if prev.sibling:
            nxt = prev.sibling.first_leaf()
        else:
            # explore cousins: children of the parent's siblings
            parent = prev.parent
            while parent:
                uncle = parent.sibling
                if uncle:
                    nxt = uncle.first_leaf()
                    break
                else:
                    parent = parent.parent  # continue climbing
        if nxt:
            self.prev = nxt  # for the next iteration
            return nxt
        else:
            raise StopIteration

• Consumer code drives iteration

• Producer code (iterable) needs to save state between iterations

Push solution

class Node:
    ...
    def leaves(self, callback):
        if self.is_leaf():
            callback(self)
        else:
            for c in self.children:
                c.leaves(callback)

        if self.sibling:
            self.sibling.leaves(callback)

def cb(node):
    print node.name

tree.leaves(cb)

• Consumer side:

• Callback hell

• Visitor pattern is an abomination

• Does not have flow-control between events

• Producer side:

• drives iteration

• keeps recursive state on the call stack

• Allows async consumers to deliver events

[Diagram: Producer pushes events to Consumer]

Push: Consumer-side trouble

exports.processJob = function(options, next) {
    db.getUser(options.userId, function(err, user) {
        if (err) return next(err);
        db.updateAccount(user.accountId, options.total, function(err) {
            if (err) return next(err);
            http.post(options.url, function(err) {
                if (err) return next(err);
                next();
            });
        });
    });
};

def sameFringe(treeA, treeB):
    itreeA = iter(treeA)
    itreeB = iter(treeB)
    while 1:
        nodeA = itreeA.next()
        nodeB = itreeB.next()
        if nodeA.name != nodeB.name:
            return False
        ...

Callback Hell

• Sequential chain of events verbose to express

• Inversion of control

Concurrent Traversals trivial in Pull approach

Generators/Coroutines

Generators: Concurrent Stacks

def odds():
    i = 1
    while True:
        yield i
        i += 2

o = odds()

print o.next()
print o.next()
print o.next()

# Print infinite stream
for n in odds():
    print n

class Tree:
    def leaves(self):
        if self.is_leaf():
            yield self
        else:
            for c in self.children:
                for leaf in c.leaves():
                    yield leaf

for leaf in tree.leaves():
    print leaf.name

• Generators/coroutines are simply a compiler transformation of threaded code into event-driven code on the same kernel thread

• Flow of control alternates between consumer and producer

• Cheap user-level tasks with explicit cooperative scheduling (see the sketch below)

• Scheduler calls next()

• Task calls yield() whenever necessary

• Wrapped in an abstraction called Fiber

• Ruby: Fiber.yield, Javascript: function*, yield/yield*

• Symmetric vs Asymmetric coroutines

• Lazy streams — Infinite streams on demand

Generators
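To make the scheduler/task split concrete, here is a minimal round-robin scheduler sketched in plain Java (all names invented for illustration; Java has no native generators, so each task is a hand-written Iterator whose next() resumes it for one step). The explicit saved state in odds() is exactly the bookkeeping a real generator transform does for you.

import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.List;
import java.util.Queue;

// Round-robin cooperative scheduler: resuming a task = calling next();
// yielding = returning from next(). Exhausted tasks are dropped.
class RoundRobinScheduler {
    static <T> void run(List<Iterator<T>> tasks) {
        Queue<Iterator<T>> ready = new ArrayDeque<>(tasks);
        while (!ready.isEmpty()) {
            Iterator<T> task = ready.poll();
            if (task.hasNext()) {
                System.out.println(task.next()); // resume until the next "yield"
                ready.add(task);                 // cooperative: back of the queue
            }
        }
    }

    // Hand-rolled equivalent of the Python odds() generator above,
    // bounded to n values so the demo terminates.
    static Iterator<Integer> odds(final int n) {
        return new Iterator<Integer>() {
            int i = 1, emitted = 0; // state a generator would save for us
            public boolean hasNext() { return emitted < n; }
            public Integer next() { int v = i; i += 2; emitted++; return v; }
        };
    }

    public static void main(String[] args) {
        // Two tasks interleave, one step per resume: 1, 1, 3, 3, 5, 5
        run(List.of(odds(3), odds(3)));
    }
}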

Asynchrony and Multiprocessing

• All thread stacks have the same fixed size, set at creation time and usually sized for the worst case

• Kernel Thread context switching is expensive (in μs)

• Preemption can occur at any time ==> save all registers: 16 general-purpose registers, PC, SP, segment registers, 16 XMM registers, FP coprocessor state, the AVX registers, MSRs

• TLB flushes, cache invalidation, crossing kernel protection boundary

• Even cooperative yields are expensive.

• A kernel thread is a precious resource. Can’t block it.

• No, not for IO-bound code, says Paul Tyma

Why can’t we just use Kernel Threads?

> ulimit -s
8192

Threads vs Events Debate

• But horrible user-programming model

• libuv, libasync, EventMachine (Ruby), netty (Java)

• User code must not block, nor call blocking I/O operations

Event-driven I/O is faster

Netty inversion of control

io.netty.handler.codec.DecoderException: java.lang.RuntimeException: No packet with id 78
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:263)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:131)
    at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:337)
    at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:323)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:173)
    at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:337)
    at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:323)
    at io.netty.handler.codec.ByteToMessageDecoder.handlerRemoved(ByteToMessageDecoder.java:109)
    at io.netty.channel.DefaultChannelPipeline.callHandlerRemoved0(DefaultChannelPipeline.java:524)
    at io.netty.channel.DefaultChannelPipeline.callHandlerRemoved(DefaultChannelPipeline.java:518)
    at io.netty.channel.DefaultChannelPipeline.remove0(DefaultChannelPipeline.java:348)
    at io.netty.channel.DefaultChannelPipeline.remove(DefaultChannelPipeline.java:319)
    at io.netty.channel.DefaultChannelPipeline.remove(DefaultChannelPipeline.java:296)
    at org.spigotmc.netty.LegacyDecoder.decode(LegacyDecoder.java:38)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:232)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:131)
    at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:337)
    at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:323)
    at io.netty.handler.timeout.ReadTimeoutHandler.channelRead(ReadTimeoutHandler.java:149)
    at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:337)
    at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:323)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:100)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:478)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:447)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:341)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: No packet with id 78
    at org.spigotmc.netty.Protocol$ProtocolDirection.createPacket(Protocol.java:272)
    at org.spigotmc.netty.PacketDecoder.decode(PacketDecoder.java:44)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:232)

Let’s compromise. Let’s both be unhappy.

• All I/O handled by special I/O event loop in separate thread

• Can’t do I/O in callback

• Cannot block

• Handed off to a task on a separate thread pool

• Task cannot block there either; limited threads in thread pool

• Hand-rolled continuations

Current mainstream

Netty

public class WriteTimeOutHandler extends ChannelOutboundHandlerAdapter {
    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) {
        ctx.write(msg, promise);

        if (!promise.isDone()) {
            ctx.executor().schedule(new WriteTimeoutTask(promise), 30, TimeUnit.SECONDS);
        }
    }
}

Ugh.

Functional Reactive Programming

getDataFromNetwork()
    .skip(10)
    .take(5)
    .map({ s -> return s + " transformed" })
    .subscribe({ println "onNext => " + it })

• Reactive Extensions: .NET, RxJava, Scala

• Asynchronous stream. A chain of transformers ending with a callback.

• Effectively with the same kinds of restrictions:

• No blocking, worry about thread context (“can I write to a socket”)

• Pretty sequential code.

• Millions of Threads.

• Block when we want to.

• Receive and Send to other SMs anywhere.

• Receive from multiple sources

• Speed and lightness of Event-driven solutions

Can we have it all?

• kilim.malhar.net

• Bytecode transformer for coroutines/generators and lightweight tasks

• s/Thread/Task/

• s/run()/execute() throws Pausable/

• All functions that may block annotated as “throws Pausable”

• Use typed mailboxes to communicate (see the sketch below)

• Bytecode transformation of Java code.

• Offline or at class load time

Ta da! Kilim
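A minimal sketch of the programming model, assuming the Task/Mailbox API summarized above (the classes must first be run through Kilim's weaver, and the blocking getb() used from main is my recollection of the Mailbox API rather than something shown on the slides):

import kilim.Mailbox;
import kilim.Pausable;
import kilim.Task;

// Sketch: a producer task streaming odd numbers over a typed mailbox.
// execute() replaces run(); "throws Pausable" marks it for weaving, so a
// blocked task yields its kernel thread instead of parking it.
public class Odds extends Task {
    static Mailbox<Integer> mb = new Mailbox<Integer>();

    public void execute() throws Pausable {
        int i = 1;
        while (true) {
            mb.put(i); // a Pausable operation: may pause this task, never the kernel thread
            i += 2;
        }
    }

    public static void main(String[] args) {
        new Odds().start();
        for (int n = 0; n < 5; n++) {
            System.out.println(mb.getb()); // blocking get, for use from a plain thread
        }
        System.exit(0);
    }
}

The producer mirrors the Python odds() generator earlier; after weaving (offline or at class load time, as above), large numbers of such tasks multiplex over a handful of kernel threads.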

Kilim Performance vs. Threads

Kilim Server Performance vs. Jetty

• Lightweight threads — C layout, small dynamic stacks

• Multiplex on channel I/O — CSP’s alt operator.

• Fast context switching — three registers to save and restore (PC, SP and DX)

• Syntactic lightness

• Language and idioms fit in my L1 cache

• Closures, Duck-typing

• 0-sized channels == true synchronous lock-step

• What I want: Some aspects of Swift/Rust!

What I like about Go

Go

package main

func main() {
    ch := make(chan int)

    go func() { // producer
        i := 1
        for {
            ch <- i
            i += 2
        }
    }()

    for { // consumer
        println(<-ch)
    }
}

Go

func main() {
    // Listen-and-accept loop
    tcpaddr, err := net.ResolveTCPAddr("tcp", "localhost:9999")
    check(err)
    tcp_acceptor, err := net.ListenTCP("tcp", tcpaddr)
    check(err)
    fmt.Println("Listening on ", tcp_acceptor.Addr())

    for {
        tcp_conn, err := tcp_acceptor.AcceptTCP()
        check(err)
        go serve(tcp_conn)
    }
}

func serve(conn *net.TCPConn) {
    for {
        dec := gob.NewDecoder(conn)
        var data string
        err := dec.Decode(&data)
        check(err)
        println("Server: Rcvd ", data)
        ...
    }
}

• Compiler transformation of ‘go’ blocks into event-driven code

• All blocking calls must be made directly inside a go block

• Channel receives and sends cannot be made in a called function

• In general, approaches that rely only on compiler transformations leak abstractions; you need Go/Erlang-style deep runtime support

Clojure core.async

• Threaded style is easy to write and understand

• Actors are not internally concurrent; no internal data races.

• Undesirable combination: Aliasing + Mutability

• Either aliased + immutable — the Clojure approach

• Or unaliased + mutable — the Kilim, Rust, Go approach

• Isolate actor state, and exchange messages. Rust’s linear type system is wonderful.

• Go mantra: share by communicating, don’t communicate by sharing

• No more threads vs. events debates. You can have it all

• Erlang, Go, Rust, Kilim for Java, Akka for Scala, F#

Takeaways
