Communicating State Machines
Sriram Srinivasan
www.malhar.net/sriram
Programming Languages Meetup, San Francisco, June 10, 2014
Jun 23, 2015
• Fundamental building block of computation
• Communicating State Machines model
• Synchronous and Asynchronous composition
• Hierarchical State Machines specification
• (Edward A. Lee and Pravin Varaiya, Structure and Interpretation of Signals and Systems, LeeVaraiya.org)
State machines
• Distributed Systems
• Hardware interfaces
• Components of a memory hierarchy
• Stream producers and consumers
• Parsers and Lexers
• Filesystem and tree walker.
• Networking stack and Socket consumer
• Bidirectional communication
CSMs are Ubiquitous
C++ Boost.Statechart

struct Active : sc::simple_state< Active, StopWatch, Stopped >
{
  public:
    typedef sc::transition< EvReset, Active > reactions;

    Active() : elapsedTime_( 0.0 ) {}
    double ElapsedTime() const { return elapsedTime_; }
    double & ElapsedTime() { return elapsedTime_; }
  private:
    double elapsedTime_;
};

struct Running : sc::simple_state< Running, Active >
{
  public:
    typedef sc::transition< EvStartStop, Stopped > reactions;

    Running() : startTime_( std::time( 0 ) ) {}
    ~Running()
    {
      context< Active >().ElapsedTime() +=
        std::difftime( std::time( 0 ), startTime_ );
    }
  private:
    std::time_t startTime_;
};

Ugh!
• Language used in TinyOS to program wireless motes
nesC
• Components with bidirectional interfaces
• Separate configuration to stitch together components
nesC Bidirectional Interfaces

interface StdControl {
  command result_t init();
}

interface Timer {
  command result_t start(char type, uint32_t interval);
  command result_t stop();
  event result_t fired();
}

interface Send {
  command result_t send(TOS_Msg *msg, uint16_t length);
  event result_t sendDone(TOS_Msg *msg, result_t success);
}

interface Device {
  command result_t getData();
  event result_t dataReady(uint16_t data);
}
nesC Implementation

module ChirpM {
  provides interface StdControl;
  uses interface Device;
  uses interface Timer;
  uses interface Send;
}
implementation {
  uint16_t sensorReading;

  command result_t StdControl.init() {
    return call Timer.start(TIMER_REPEAT, 1000);
  }

  event result_t Timer.fired() {
    call Device.getData();
    return SUCCESS;
  }

  event result_t Device.dataReady(uint16_t data) {
    sensorReading = data;
    ... send message with data in it ...
    return SUCCESS;
  }
}
[Diagram: ChirpM provides StdControl and uses Timer, Device and Send]
nesC configuration

configuration ChirpC {
  provides interface StdControl;
}
implementation {
  components ChirpM, HWTimer, Barometer, RadioAnnouncerC;

  StdControl = ChirpM.StdControl;
  ChirpM.Timer -> HWTimer.Timer;
  ChirpM.Device -> Barometer;
  ChirpM.Send -> RadioAnnouncerC;
}
[Diagram: ChirpC exposes ChirpM's StdControl; ChirpM's Timer, Device and Send interfaces are wired to HWTimer, Barometer and RadioAnnouncerC]
Why aren’t more systems structured this way?
Synchronous Communication
Stacks as State Machines

void readMsgs(socket) {
    numMsgsRead = 0
    while (true) {
        msg = readMsg(socket)
        dispatch(msg)
        log(numMsgsRead++)
    }
}

msg readMsg(socket) {
    len = readLen(socket)
    return readBody(socket, len)
}

byte[4] readLen(socket) {
    byte[4] len
    for i = 0 .. 3 {
        len[i] = readByte(socket)
    }
    return len
}
• Thread of control
• Control plane = Call Chain (each frame remembers its pc)
• Sequential flow of control defines hidden states
• Functions define major states
• Data plane = Vars local in each frame
• Blocking semantics == synchronous (lock-step) communication
• readByte and dispatch interact with network
• Easy API; that’s why POSIX and most database calls are synchronous
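The blocking reader above can be sketched in runnable Python 3. This is only an illustration: the 4-byte big-endian length prefix and the helper names (read_exact, read_len, read_msg, read_msgs, demo) are assumptions, not part of the talk. Each function is a major state, and the position inside each loop is a hidden state held on the stack.

```python
import socket
import struct

def read_exact(sock, n):
    """Block until exactly n bytes have arrived; raise EOFError if the peer closes."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("connection closed")
        buf.extend(chunk)
    return bytes(buf)

def read_len(sock):
    # State "reading length": assumed framing is a 4-byte big-endian prefix.
    (length,) = struct.unpack(">I", read_exact(sock, 4))
    return length

def read_msg(sock):
    # State "reading body": the caller's frame remembers where to resume.
    length = read_len(sock)
    return read_exact(sock, length)

def read_msgs(sock, dispatch):
    num_read = 0
    while True:
        try:
            msg = read_msg(sock)
        except EOFError:
            return num_read
        dispatch(msg)
        num_read += 1

def demo():
    # A connected socket pair stands in for a real network connection.
    a, b = socket.socketpair()
    for payload in (b"hello", b"state machines"):
        a.sendall(struct.pack(">I", len(payload)) + payload)
    a.close()
    received = []
    count = read_msgs(b, received.append)
    b.close()
    return count, received
```

Nothing in this code names its states explicitly; the call chain and program counters carry them, which is what makes the synchronous style easy to write.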
State machine in Erlang

bark() ->
    io:format("Dog says: BARK! BARK!~n"),
    receive
        pet ->
            wag_tail();
        _ ->
            io:format("Dog is confused~n"),
            bark()
    after 2000 ->
        bark()
    end.

wag_tail() ->
    io:format("Dog wags its tail~n"),
    receive
        pet ->
            sit();
        _ ->
            io:format("Dog is confused~n"),
            wag_tail()
    after 30000 ->
        bark()
    end.

sit() ->
    io:format("Dog is sitting. Gooooood boy!~n"),
    receive
        squirrel ->
            bark();
        _ ->
            io:format("Dog is confused~n"),
            sit()
    end.
• Tail-call optimization renders state change trivial
Credit: http://learnyousomeerlang.com/finite-state-machines
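For comparison, here is a rough Python 3 sketch of the same function-per-state idea. Python has no tail-call optimization, so each state function returns the next state and a small driver loop makes the "call"; modelling timeouts as a "timeout" event and the run helper are assumptions made for this illustration.

```python
def bark(event):
    # Barking: only "pet" changes state; anything else (or a timeout) keeps barking.
    return wag_tail if event == "pet" else bark

def wag_tail(event):
    if event == "pet":
        return sit
    if event == "timeout":
        return bark
    return wag_tail

def sit(event):
    # Sitting: only a squirrel gets the dog barking again.
    return bark if event == "squirrel" else sit

def run(events, state=bark):
    """Feed events one at a time; return the names of the states visited."""
    trace = [state.__name__]
    for event in events:
        state = state(event)  # stands in for the Erlang tail call
        trace.append(state.__name__)
    return trace
```

For example, run(["pet", "pet", "squirrel"]) walks bark, wag_tail, sit and back to bark.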
• Problem: Obtain leaves from a tree one at a time
• Two interacting state machines:
• Producer: tree, Consumer: user code that acts on the leaves.
• Pull solution: Iterators
• Convenient for clients
• for leaf in tree: print leaf.name
• Push solution: Functional approach
• Tree pushes data to visitors or user-defined functions
• tree.visit( myfunc )
• Ideally: Duals of each other
• In practice: Duel with each other
Leaves from a Tree
Pull Solution: Iterators

class Node:
    …
    def __iter__(self):
        return Iter(self)

class Iter:
    def __init__(self, root):
        self.nxt = root.first_leaf()
        self.prev = None

    def next(self):
        nxt = self.nxt
        if nxt:
            # First time entry into iterator
            self.nxt = None
            self.prev = nxt
            return nxt

        prev = self.prev
        if prev.sibling:
            nxt = prev.sibling.first_leaf()
        else:
            # explore cousins .. children of parent's siblings
            parent = prev.parent
            while parent:
                uncle = parent.sibling
                if uncle:
                    nxt = uncle.first_leaf()
                    break
                else:
                    parent = parent.parent  # continue loop
        if nxt:
            self.prev = nxt  # for next iter
            return nxt
        else:
            raise StopIteration
• Consumer code drives iteration
• Producer code (iterable) needs to save state between iterations
Push solution
class Node:
    …
    def leaves(self, callback):
        if self.is_leaf():
            callback(self)
        else:
            # Iterating over all children already covers every sibling,
            # so no separate sibling recursion is needed.
            for c in self.children:
                c.leaves(callback)

def cb(node):
    print node.name

tree.leaves(cb)
• Consumer side:
• Callback hell
• Visitor pattern is an abomination
• Does not have flow-control between events
• Producer side:
• drives iteration
• stack for storing recursive state
• Allows async consumers to deliver events
Push: Consumer-side trouble

exports.processJob = function(options, next) {
  db.getUser(options.userId, function(err, user) {
    if (err) return next(err);
    db.updateAccount(user.accountId, options.total, function(err) {
      if (err) return next(err);
      http.post(options.url, function(err) {
        if (err) return next(err);
        next();
      });
    });
  });
};

Callback Hell
Sequential chain of events verbose to express
Inversion of control

def sameFringe(treeA, treeB):
    itreeA = iter(treeA)
    itreeB = iter(treeB)
    while 1:
        nodeA = itreeA.next()
        nodeB = itreeB.next()
        if nodeA.name != nodeB.name:
            return False
        ….

Concurrent Traversals trivial in Pull approach
Generators/Coroutines
Generators: Concurrent Stacks

def odds():
    i = 1
    while True:
        yield i
        i += 2

o = odds()

print o.next()
print o.next()
print o.next()

# Print infinite stream
for n in odds():
    print n
class Tree:
    def leaves(self):
        if self.is_leaf():
            yield self
        else:
            for c in self.children:
                for leaf in c.leaves():
                    yield leaf

for leaf in tree.leaves():
    print leaf.name
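Putting the generator-based leaves() together with the same-fringe comparison gives a compact, self-contained sketch in Python 3. The Tree constructor, is_leaf, and same_fringe below are illustrative assumptions; `yield from` replaces the nested for/yield loop of the slide, and zip_longest is used so that fringes of different lengths compare as unequal.

```python
from itertools import zip_longest

class Tree:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def is_leaf(self):
        return not self.children

    def leaves(self):
        # The generator's suspended frames hold the traversal state.
        if self.is_leaf():
            yield self
        else:
            for c in self.children:
                yield from c.leaves()

def same_fringe(tree_a, tree_b):
    """True if both trees have the same leaf names in the same order."""
    missing = object()  # sentinel marking an exhausted traversal
    pairs = zip_longest(tree_a.leaves(), tree_b.leaves(), fillvalue=missing)
    for a, b in pairs:
        if a is missing or b is missing or a.name != b.name:
            return False
    return True
```

The two traversals advance in lock-step, and the comparison stops at the first mismatch without walking the rest of either tree.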
• Generators/Coroutines are simply a compiler transformation of threaded to event-driven code on same kernel thread
• Flow of control alternates between consumer and producer
• Cheap user-level tasks with explicit cooperative scheduling
• Scheduler calls next()
• Task calls yield() whenever necessary
• Wrapped in an abstraction called Fiber
• Ruby: Fiber.yield, JavaScript: function*, yield/yield*
• Symmetric vs Asymmetric coroutines
• Lazy streams — Infinite streams on demand
Generators
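The "scheduler calls next(), task calls yield" arrangement can be sketched as a tiny round-robin scheduler in Python 3. The names task, run and demo are hypothetical, and a real scheduler would also handle I/O readiness and timers.

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task: does one unit of work, then yields to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # explicit cooperative scheduling point

def run(tasks):
    """Round-robin scheduler: resume each ready task until all have finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)  # resume the task until its next yield
        except StopIteration:
            continue  # task finished; drop it
        ready.append(current)

def demo():
    log = []
    run([task("a", 2, log), task("b", 3, log)])
    return log
```

All tasks share one kernel thread; a switch is just a generator resume, with no preemption and no kernel involvement.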
Asynchrony and Multiprocessing
• All threads have the same fixed stack size, set at creation time: usually sized for the worst case
• Kernel Thread context switching is expensive (in μs)
• Preemption at any time ==> Save all registers: 16 general purpose registers, PC, SP, segment registers, 16 XMM registers, FP coprocessor state, X AVX registers, all MSRs
• TLB flushes, cache invalidation, crossing kernel protection boundary
• Even cooperative yields are expensive.
• A kernel thread is a precious resource. Can’t block it.
• No, not for IO-bound code, says Paul Tyma
Why can’t we just use Kernel Threads?
> ulimit -s 8192
Threads vs Events Debate
• But horrible user-programming model
• libuv, libasync, EventMachine (Ruby), netty (Java)
• User code must not block, and must not call other blocking I/O operations
Event-driven I/O is faster
Netty inversion of control

io.netty.handler.codec.DecoderException: java.lang.RuntimeException: No packet with id 78
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:263)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:131)
    at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:337)
    at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:323)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:173)
    at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:337)
    at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:323)
    at io.netty.handler.codec.ByteToMessageDecoder.handlerRemoved(ByteToMessageDecoder.java:109)
    at io.netty.channel.DefaultChannelPipeline.callHandlerRemoved0(DefaultChannelPipeline.java:524)
    at io.netty.channel.DefaultChannelPipeline.callHandlerRemoved(DefaultChannelPipeline.java:518)
    at io.netty.channel.DefaultChannelPipeline.remove0(DefaultChannelPipeline.java:348)
    at io.netty.channel.DefaultChannelPipeline.remove(DefaultChannelPipeline.java:319)
    at io.netty.channel.DefaultChannelPipeline.remove(DefaultChannelPipeline.java:296)
    at org.spigotmc.netty.LegacyDecoder.decode(LegacyDecoder.java:38)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:232)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:131)
    at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:337)
    at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:323)
    at io.netty.handler.timeout.ReadTimeoutHandler.channelRead(ReadTimeoutHandler.java:149)
    at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:337)
    at io.netty.channel.DefaultChannelHandlerContext.fireChannelRead(DefaultChannelHandlerContext.java:323)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:785)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:100)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:478)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:447)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:341)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: No packet with id 78
    at org.spigotmc.netty.Protocol$ProtocolDirection.createPacket(Protocol.java:272)
    at org.spigotmc.netty.PacketDecoder.decode(PacketDecoder.java:44)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:232)
Let’s compromise. Let’s both be unhappy.
• All I/O handled by special I/O event loop in separate thread
• Can’t do I/O in callback
• Cannot block
• Handed off to a task on a separate thread pool
• Task cannot block there either; limited threads in thread pool
• Hand-rolled continuations
Current mainstream
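To make "hand-rolled continuations" concrete, here is a hedged Python 3 sketch of a length-prefixed message parser written in event-driven style. The FrameParser class, its field names and the 4-byte big-endian framing are assumptions for illustration. Because the callback may never block, the state a blocking reader would keep on its stack (which phase it is in, how many bytes it still needs) has to be stored in fields between calls.

```python
import struct

class FrameParser:
    """Event-driven parser: state lives in fields, not on a call stack."""

    def __init__(self, on_message):
        self.on_message = on_message
        self.buf = bytearray()
        self.in_header = True  # which "function" a blocking reader would be in
        self.need = 4          # bytes required to finish the current state

    def feed(self, data):
        # Called by an event loop whenever bytes arrive; must never block.
        self.buf.extend(data)
        while len(self.buf) >= self.need:
            chunk = bytes(self.buf[:self.need])
            del self.buf[:self.need]
            if self.in_header:
                # Header complete: switch to "reading body" state.
                (self.need,) = struct.unpack(">I", chunk)
                self.in_header = False
            else:
                # Body complete: deliver it and go back to "reading header".
                self.on_message(chunk)
                self.need = 4
                self.in_header = True
```

An event loop would call feed() with whatever bytes happen to be available, even one at a time; the parser resumes exactly where it left off.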
Netty
public class WriteTimeOutHandler extends ChannelOutboundHandlerAdapter {
    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) {
        ctx.write(msg, promise);

        if (!promise.isDone()) {
            ctx.executor().schedule(new WriteTimeoutTask(promise), 30, TimeUnit.SECONDS);
        }
    }
}
Ugh.
Functional Reactive Programming

getDataFromNetwork()
    .skip(10)
    .take(5)
    .map({ s -> return s + " transformed" })
    .subscribe({ println "onNext => " + it })
• Reactive Extensions (Rx) for .NET, RxJava, Scala
• Asynchronous stream. A chain of transformers ending with a callback.
• Effectively with the same kinds of restrictions:
• No blocking, worry about thread context (“can I write to a socket”)
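The shape of such a pipeline, a chain of transformers ending in a callback, can be sketched with a toy push-based stream in Python 3. This is not the Rx API: the Stream class and from_iterable are made up for illustration and leave out errors, completion and unsubscription.

```python
class Stream:
    """A tiny push-based stream: source(emit) pushes every item into emit."""

    def __init__(self, source):
        self._source = source

    def skip(self, n):
        def source(emit):
            seen = 0
            def on_next(item):
                nonlocal seen
                seen += 1
                if seen > n:
                    emit(item)
            self._source(on_next)
        return Stream(source)

    def take(self, n):
        def source(emit):
            taken = 0
            def on_next(item):
                nonlocal taken
                if taken < n:
                    taken += 1
                    emit(item)
            self._source(on_next)
        return Stream(source)

    def map(self, fn):
        def source(emit):
            self._source(lambda item: emit(fn(item)))
        return Stream(source)

    def subscribe(self, callback):
        # Nothing runs until someone subscribes; then the producer drives.
        self._source(callback)

def from_iterable(items):
    def source(emit):
        for item in items:
            emit(item)
    return Stream(source)
```

Note that take() in this sketch cannot tell the producer to stop, so the producer keeps pushing after the fifth item; the consumer has no flow control over the producer in a push design.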
• Pretty sequential code.
• Millions of Threads.
• Block when we want to.
• Receive and Send to other SMs anywhere.
• Receive from multiple sources
• Speed and lightness of Event-driven solutions
Can we have it all?
• kilim.malhar.net
• Bytecode transformer for coroutines/generators and lightweight tasks
• s/Thread/Task/
• s/run()/execute() throws Pausable/
• All functions that may block annotated as “throws Pausable”
• Use typed mailboxes to communicate
• Bytecode transformation of Java code.
• Offline or at class load time
Ta da! Kilim
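As a loose analogy only (this is Python 3 asyncio, not Kilim's API), the task-plus-mailbox style looks like the sketch below. `async def` plays a role similar to marking a method "throws Pausable", and asyncio.Queue stands in for a mailbox; the function names and the None end-of-stream marker are assumptions.

```python
import asyncio

async def producer(mailbox, n):
    for i in range(n):
        # put() pauses this task, not the kernel thread, when the mailbox is full.
        await mailbox.put(2 * i + 1)
    await mailbox.put(None)  # assumed end-of-stream marker

async def consumer(mailbox, out):
    while True:
        item = await mailbox.get()  # pauses until a message is available
        if item is None:
            return
        out.append(item)

async def main():
    mailbox = asyncio.Queue(maxsize=1)
    out = []
    await asyncio.gather(producer(mailbox, 5), consumer(mailbox, out))
    return out

def demo():
    return asyncio.run(main())
```

Both tasks read as straight-line, blocking code, yet they run on a single thread and switch only at the pausable calls.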
Kilim Performance vs. Threads
Kilim Server Performance vs. Jetty
• Lightweight threads — C layout, small dynamic stacks
• Multiplex on channel I/O — CSP’s alt operator.
• Fast context switching — three registers to save and restore (PC, SP and DX)
• Syntactic lightness
• Language and idioms fit in my L1 cache
• Closures, Duck-typing
• 0-sized channels == true synchronous lock-step
• What I want: Some aspects of Swift/Rust!
What I like about Go
Go

package main

func main() {
    ch := make(chan int)

    go func() { // producer
        i := 1
        for {
            ch <- i
            i += 2
        }
    }()

    for { // consumer
        println(<-ch)
    }
}
Go

func main() {
    // Listen and accept loop
    tcpaddr, err := net.ResolveTCPAddr("tcp", "localhost:9999")
    check(err)
    tcp_acceptor, err := net.ListenTCP("tcp", tcpaddr)
    check(err)
    fmt.Println("Listening on ", tcp_acceptor.Addr())

    for true {
        tcp_conn, err := tcp_acceptor.AcceptTCP()
        check(err)
        go serve(tcp_conn)
    }
}

func serve(conn *net.TCPConn) {
    for true {
        dec := gob.NewDecoder(conn)
        //var msg Msg
        var data string
        //err := dec.Decode(&msg)
        err := dec.Decode(&data)
        check(err)
        println("Server: Rcvd ", data)
        //println("Server: Rcvd ", msg.Data, "from", msg.From)
        ….
    }
}
• Compiler transformation of ‘go’ blocks into event-driven code
• All blocking calls must be made directly inside a go block
• Channel receives and sends cannot be made in a called function
• In general, all approaches relying only on compiler transformations leak abstractions. Need Go/Erlang like deep runtime support
Clojure core.async
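Python generators show a similar leak, which makes for a rough analogy (this is not core.async itself). A suspension point only works in the body of the generator, or in a helper that is explicitly delegated to with `yield from`; a plain call to the helper silently fails to suspend. All names below are hypothetical.

```python
def receive():
    # Helper that "blocks" by yielding a request to whoever drives the task.
    msg = yield "want-message"
    return msg

def task_direct(out):
    msg = yield "want-message"  # suspension point written inline: works
    out.append(msg)

def task_delegating(out):
    msg = yield from receive()  # helper works only via explicit delegation
    out.append(msg)

def task_plain_call(out):
    msg = receive()  # plain call: just builds a generator, never suspends
    out.append(msg)
    return
    yield  # unreachable; only makes this function a generator

def drive(make_task, message):
    """Run a task, answering its first request with `message`."""
    out = []
    t = make_task(out)
    try:
        next(t)          # run until the task asks for a message
        t.send(message)  # deliver the message and let the task finish
    except StopIteration:
        pass
    return out
```

The first two tasks receive the message; the third stores an unstarted generator object instead, because the transformation stops at the function boundary. Runtimes with deep support, as in Go or Erlang, let any called function block.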
• Threaded style is easy to write and understand
• Actors are not internally concurrent; no internal data races.
• Undesirable combination: Aliasing + Mutability
• Either aliased + immutable: the Clojure approach
• Or unaliased + mutable: the Kilim, Rust, Go approach
• Isolate actor state, and exchange messages. Rust’s linear type system is wonderful.
• Go mantra: share memory by communicating; don't communicate by sharing memory
• No more threads vs. events debates. You can have it all
• Erlang, Go, Rust, Kilim for Java, Akka for Scala, F#
Takeaways