
Multi-cellular paradigm

The molecular level can support self-replication (and self-repair).

But we also need cells that can be designed to fit the specific application and at the same time able to support bio-inspired mechanisms for self-replication and fault tolerance.

Cellular differentiation

Cells adapt their physical structure to fit the “application”

Can circuits/processors do the same?

Physically? No

Logically? Yes, but…

Can they do it easily (dare we say, automatically)?

Conventional processors

Fetch, decode and control unit

Instruction encoding

Instructions encode both the operation and the operands. The MIPS architecture is one example.
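As a small illustration of how one instruction word carries both the operation and the operands, the sketch below packs a MIPS R-type instruction into 32 bits. The field layout (opcode, rs, rt, rd, shamt, funct) and the register numbers are those of the standard MIPS32 encoding; the helper name `encode_r_type` is made up for this example.

```python
# Field layout of a MIPS R-type instruction (32 bits in total):
#   opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)

def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack the six R-type fields into one 32-bit instruction word."""
    fields = (("opcode", opcode, 6), ("rs", rs, 5), ("rt", rt, 5),
              ("rd", rd, 5), ("shamt", shamt, 5), ("funct", funct, 6))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)

# Register numbers for $t0, $t1, $t2 and the funct code of "add".
T0, T1, T2 = 8, 9, 10
ADD_FUNCT = 0x20

# add $t0, $t1, $t2   (rd = $t0, rs = $t1, rt = $t2)
word = encode_r_type(rs=T1, rt=T2, rd=T0, shamt=0, funct=ADD_FUNCT)
print(f"{word:#010x}")  # prints 0x012a4020
```

The operation is spread over two fields (opcode and funct) and the operands over three register fields, which is the kind of coupling between operation and operands that the slides contrast with the MOVE approach.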

Instruction encoding in “real life”

The Arithmetic and Logic Unit

Common processor components

State of the art computing

Bio-inspired processors

However, none of these “standard” architectures is quite flexible enough to implement many of the behaviours required for bio-inspired computing.

Needed: adaptable cellular architecture

That is, a processor architecture that is:

Customizable

Compact

Powerful

Easy to design and modify

Amenable to evolution and learning

Possible solution: MOVE architectures

The MOVE paradigm

One single instruction: move

Data displacements trigger operations

Architecture based around data, not operation-centric

Regular structure: functional units + data network

Scalable and modular architecture

Example: Sum of two values

Conventional architecture:

add R1, R2, R3;

MOVE architecture:

move O(Fxxx), I1(Fsum)

move O(Fyyy), I2(Fsum)

move O(Fsum), I(Fzzz)
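The three moves above can be mimicked with a toy simulator. This is only a sketch under stated assumptions: the slides do not say which port triggers the operation, so here writing `I2` of the adder is assumed to fire the sum, and `Fxxx`, `Fyyy` and `Fzzz` are modelled as simple storage units. The class names are hypothetical.

```python
class RegisterFU:
    """Hypothetical storage unit: a value written to I appears on O."""
    def __init__(self, value=0):
        self.ports = {"I": value, "O": value}

    def write(self, port, value):
        if port != "I":
            raise KeyError(f"RegisterFU has no input port {port!r}")
        self.ports["I"] = value
        self.ports["O"] = value

    def read(self, port):
        return self.ports[port]


class AdderFU:
    """Hypothetical adder: writing I2 (assumed trigger port) computes I1 + I2."""
    def __init__(self):
        self.ports = {"I1": 0, "I2": 0, "O": 0}

    def write(self, port, value):
        if port not in ("I1", "I2"):
            raise KeyError(f"AdderFU has no input port {port!r}")
        self.ports[port] = value
        if port == "I2":  # the data displacement triggers the operation
            self.ports["O"] = self.ports["I1"] + self.ports["I2"]

    def read(self, port):
        return self.ports[port]


def move(units, src, dst):
    """The single instruction: copy a source port to a destination port."""
    (src_unit, src_port), (dst_unit, dst_port) = src, dst
    units[dst_unit].write(dst_port, units[src_unit].read(src_port))


units = {"Fxxx": RegisterFU(3), "Fyyy": RegisterFU(4),
         "Fsum": AdderFU(), "Fzzz": RegisterFU()}

program = [
    (("Fxxx", "O"), ("Fsum", "I1")),  # move O(Fxxx), I1(Fsum)
    (("Fyyy", "O"), ("Fsum", "I2")),  # move O(Fyyy), I2(Fsum)
    (("Fsum", "O"), ("Fzzz", "I")),   # move O(Fsum), I(Fzzz)
]
for src, dst in program:
    move(units, src, dst)

print(units["Fzzz"].read("O"))  # prints 7
```

Note that the program itself never names an "add": the addition happens as a side effect of moving data into the adder's ports.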

Example – add operation

Cellular differentiation

Main features:

Only one instruction (OK, maybe two) that MOVEs data to and from the CUs and FUs (dataflow architecture)

Conventional fetch/decode mechanism – compatible with bio-inspired mechanisms

No pipeline: computation carried out in specialized functional units (FU)

Communication carried out in specialized communication units (CU)

Cellular differentiation

Main advantages:

Can be easily customized by introducing application-specific functional and communication units.

Perfectly fits the requirements of systolic arrays (arbitrarily complex communication patterns).

The introduction of custom components does not affect the assembler language, the code structure, the fetch and decode units, or the transport bus.
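The last point can be illustrated with a minimal sketch: adding an application-specific unit only extends the table of units, while the `move` interpreter stays untouched. Everything here is an assumption made for illustration: the `FunctionalUnit` class, the rule that writing `I2` triggers the operation, the use of integer immediates as move sources, and the saturating adder `Fsat` chosen as the custom unit.

```python
class FunctionalUnit:
    """Hypothetical two-input FU: writing I2 triggers op(I1, I2) onto O."""
    def __init__(self, op):
        self.op = op
        self.ports = {"I1": 0, "I2": 0, "O": 0}

    def write(self, port, value):
        self.ports[port] = value
        if port == "I2":  # assumed trigger port
            self.ports["O"] = self.op(self.ports["I1"], value)


def run(units, program):
    """Execute a list of moves; a source is an immediate or a (unit, port) pair."""
    for src, (dst_unit, dst_port) in program:
        value = src if isinstance(src, int) else units[src[0]].ports[src[1]]
        units[dst_unit].write(dst_port, value)


units = {"Fadd": FunctionalUnit(lambda a, b: a + b)}

# Customization step: one new entry, no change to run() or to the move format.
units["Fsat"] = FunctionalUnit(lambda a, b: min(a + b, 255))

program = [
    (200, ("Fadd", "I1")), (100, ("Fadd", "I2")),
    (200, ("Fsat", "I1")), (100, ("Fsat", "I2")),
]
run(units, program)

print(units["Fadd"].ports["O"])  # prints 300
print(units["Fsat"].ports["O"])  # prints 255
```

The same four-move program shape works for both units; only the unit named in the destination changes.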

Example – Automatic Synthesis

[Figure: three layers (phenotype, mapping, genotype), with the labels application-specific (parallel) functions, developmental algorithm and genetic code]

[Figure sequence: totipotent cell, programmable logic, cellular array]

What kind of applications can take advantage of this kind of system?

Complex "real-world" streaming applications:

computation is carried out sequentially

can be represented by a DAG of computation nodes

each node processes data locally, then forwards them to the next node in the graph
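A rough sketch of this model: each node is a local function, the edges form a DAG, and every incoming packet is pushed through the nodes in topological order. The node names and functions are invented for illustration; `graphlib.TopologicalSorter` is part of the Python standard library (3.9 and later).

```python
from graphlib import TopologicalSorter

# node name -> (local function, list of predecessor nodes); all hypothetical
graph = {
    "in":     (lambda x: x,        []),
    "double": (lambda x: 2 * x,    ["in"]),
    "square": (lambda x: x * x,    ["in"]),
    "sum":    (lambda a, b: a + b, ["double", "square"]),
}


def run_dag(graph, packet):
    """Push one packet through the graph, node by node."""
    predecessors = {node: preds for node, (_, preds) in graph.items()}
    results = {}
    for node in TopologicalSorter(predecessors).static_order():
        func, preds = graph[node]
        # Source nodes receive the packet; others receive their predecessors' outputs.
        inputs = [results[p] for p in preds] if preds else [packet]
        results[node] = func(*inputs)
    return results


stream = [1, 2, 3]
outputs = [run_dag(graph, packet)["sum"] for packet in stream]
print(outputs)  # prints [3, 8, 15]
```

In a hardware array each node would be a separate cell working on its own packet at the same time; the sequential loop here only shows the data dependencies.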

Applications

[Figure: computation graph from IN to OUT with nodes ×, +, ÷, ≠, FFT and DCT]

Example: JPEG

READ, DCT, QNTZ, CMPR, WRT

Specialized MOVE functional units can be designed for each of these steps

Context

[Figure: the computation graph (IN, ×, +, ÷, ≠, FFT, DCT, OUT) on a programmable substrate]

Problem: task or resource allocation – i.e. how do we map the graph nodes to the array?

Specifically: dynamic allocation

Self-Scaling Stream Processing

[Figure: stream-processing graph with Source, Funct A, Funct B, Funct C and Join nodes; later panels show additional copies of Funct A and Funct C]

SSSP

The MJPEG application consists of a four-stage computation pipeline. Each data packet to be compressed is 192 bytes, corresponding to an 8x8 array of pixels in 24-bit colour.

The maximum achievable rate (determined by the input rate) is 700 packets per second, roughly 1 Mbit/s. With a single pipeline, performance tops out at about 60 packets per second.

SSSP

When performance peaks, the average output rate is 675 packets per second (out of a maximum of 700): this technique multiplies the throughput by a factor of about 11 using 28 processors.
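The figures quoted for the MJPEG experiment can be cross-checked with a few lines of arithmetic. All inputs are taken from the slides; only the variable names are made up.

```python
# Packet size: an 8x8 block of pixels at 24 bits (3 bytes) per pixel.
pixels_per_packet = 8 * 8
bytes_per_pixel = 24 // 8
packet_bytes = pixels_per_packet * bytes_per_pixel

# Rates in packets per second, as quoted in the slides.
max_rate = 700         # limited by the input rate
single_pipeline = 60   # one four-stage pipeline
peak_rate = 675        # average output rate at peak, with 28 processors

max_bits_per_second = max_rate * packet_bytes * 8
speedup = peak_rate / single_pipeline
fraction_of_max = peak_rate / max_rate

print(packet_bytes)               # prints 192
print(max_bits_per_second)        # prints 1075200
print(speedup)                    # prints 11.25
print(round(fraction_of_max, 3))  # prints 0.964
```

So 700 packets per second is about 1.08 Mbit/s, and the peak output rate is about 96% of the input-limited maximum, consistent with the "roughly 1 Mbit/s" and "factor of 11" figures.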

Next lecture

What kind of tools have to be developed to implement a complete system?

How do we determine optimal FUs for a given application?

Idea: let’s use evolution!
