Multi-cellular paradigm
The molecular level can support self-replication (and self-repair).
But we also need cells that can be designed to fit the specific application while also supporting bio-inspired mechanisms for self-replication and fault tolerance.
Cellular differentiation
Cells adapt their physical structure to fit the “application”.
Can circuits/processors do the same?
Physically? No.
Logically? Yes, but…
Can they do it easily (dare we say, automatically)?
Conventional processors
Fetch, decode and control unit
Instruction encoding
Instructions encode both the operation and the operands. For example, in the MIPS architecture:
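The slide's encoding diagram is not reproduced here. As a minimal sketch, the standard 32-bit MIPS R-type format can be packed in Python: `opcode` and `funct` select the operation, while `rs`, `rt` and `rd` name the operand registers. The helper name `encode_r_type` and the small `REGS` table are illustrative, not part of any real toolchain.

```python
# MIPS R-type layout, bits 31..0:
# opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)
REGS = {"$zero": 0, "$t0": 8, "$t1": 9, "$t2": 10}  # small subset of the register file

ADD_FUNCT = 0x20  # 'add' uses opcode 0 and funct 0x20


def encode_r_type(funct: int, rd: int, rs: int, rt: int, shamt: int = 0, opcode: int = 0) -> int:
    """Pack the six R-type fields into one 32-bit instruction word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $t0, $t1, $t2   means   $t0 = $t1 + $t2
word = encode_r_type(ADD_FUNCT, rd=REGS["$t0"], rs=REGS["$t1"], rt=REGS["$t2"])
print(f"0x{word:08X}")  # 0x012A4020
```

The operation (opcode and funct) and the operands (rs, rt, rd) share the same word, which is exactly what ties the instruction format to a fixed set of operations.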
Instruction encoding in “real life”
The Arithmetic and Logic Unit
Common processor components
State of the art computing
Bio-inspired processors
However, none of these “standard” architectures is flexible enough to implement many of the behaviours required for bio-inspired computing.
Needed: an adaptable cellular architecture
That is, a processor architecture that is:
Customizable
Compact
Powerful
Easy to design and modify
Amenable to evolution and learning
Possible solution: MOVE architectures
The MOVE paradigm
One single instruction: move
Data displacements trigger operations
Architecture based around data (data-centric rather than operation-centric)
Regular structure: functional units + data network
Scalable and modular architecture
Example: Sum of two values
Conventional architecture: add R1, R2, R3;
MOVE architecture:
move O(Fxxx), I1(Fsum)
move O(Fyyy), I2(Fsum)
move O(Fsum), I(Fzzz)
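To make the trigger mechanism concrete, the three moves can be replayed in a small Python model. It assumes, as is common in transport-triggered designs, that writing the second input port I2 is what fires the addition; the initial values 3 and 4 and the `Register`, `SumUnit` and `move` names are illustrative assumptions, not part of the original architecture.

```python
class Register:
    """Storage unit: one input port I, one output port O."""

    def __init__(self, value: int = 0):
        self.value = value

    def write(self, port: str, value: int) -> None:
        if port != "I":
            raise ValueError(f"register has no input port {port}")
        self.value = value

    def read(self, port: str) -> int:
        if port != "O":
            raise ValueError(f"register has no output port {port}")
        return self.value


class SumUnit:
    """Adder FU: I1 latches an operand; writing I2 (assumed trigger port) starts the addition."""

    def __init__(self):
        self.i1 = 0
        self.o = 0

    def write(self, port: str, value: int) -> None:
        if port == "I1":
            self.i1 = value
        elif port == "I2":
            self.o = self.i1 + value  # the data move itself triggers the operation
        else:
            raise ValueError(f"Fsum has no input port {port}")

    def read(self, port: str) -> int:
        if port != "O":
            raise ValueError(f"Fsum has no output port {port}")
        return self.o


# Hypothetical initial contents of the source units Fxxx and Fyyy.
units = {"Fxxx": Register(3), "Fyyy": Register(4), "Fsum": SumUnit(), "Fzzz": Register()}


def move(src: str, dst: str) -> None:
    """'move O(Fxxx), I1(Fsum)' is written here as move("Fxxx.O", "Fsum.I1")."""
    src_unit, src_port = src.split(".")
    dst_unit, dst_port = dst.split(".")
    units[dst_unit].write(dst_port, units[src_unit].read(src_port))


move("Fxxx.O", "Fsum.I1")  # move O(Fxxx), I1(Fsum)
move("Fyyy.O", "Fsum.I2")  # move O(Fyyy), I2(Fsum): triggers the sum
move("Fsum.O", "Fzzz.I")   # move O(Fsum), I(Fzzz)
print(units["Fzzz"].read("O"))  # 7
```

Note that no instruction names the addition: the operation is implied by the destination unit, which is what makes the program a pure sequence of data transfers.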
Example – add operation
Cellular differentiation
Main features:
Only one instruction (OK, maybe two) that MOVEs data to and from the CUs and FUs (dataflow architecture)
Conventional fetch/decode mechanism, compatible with bio-inspired mechanisms
No pipeline: computation carried out in specialized functional units (FUs)
Communication carried out in specialized communication units (CUs)
Cellular differentiation
Main advantages:
Can be easily customized by introducing application-specific functional and communication units.
Perfectly fits the requirements of systolic arrays (arbitrarily complex communication patterns).
The introduction of custom components does not affect the assembler language, the code structure, the fetch and decode units, or the transport bus.
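The last point can be illustrated with a small Python model in which the "fetch/decode" loop `run` only knows how to execute a move. Adding an application-specific unit (here a hypothetical multiply-accumulate FU, `Fmac`) means registering one more entry in the unit table, with no change to `run` or to the instruction format. All unit names, and the convention that the highest-numbered input port triggers the operation, are assumptions made for this sketch.

```python
from typing import Callable, Dict, List, Tuple, Union

Move = Tuple[str, str]  # (source "Unit.Port", destination "Unit.Port")


class FunctionalUnit:
    """Generic FU: ports I1..I(n-1) latch operands, writing In triggers `op`."""

    def __init__(self, n_inputs: int, op: Callable[..., int]):
        self.inputs = [0] * n_inputs
        self.op = op
        self.out = 0

    def write(self, port: str, value: int) -> None:
        idx = int(port[1:]) - 1          # "I1" -> 0, "I2" -> 1, ...
        self.inputs[idx] = value
        if idx == len(self.inputs) - 1:  # last input port acts as the trigger
            self.out = self.op(*self.inputs)

    def read(self, port: str) -> int:
        return self.out                  # single output port O


class Register:
    """Storage unit: writes go to I, reads come from O."""

    def __init__(self, value: int = 0):
        self.value = value

    def write(self, port: str, value: int) -> None:
        self.value = value

    def read(self, port: str) -> int:
        return self.value


Unit = Union[FunctionalUnit, Register]


def run(program: List[Move], units: Dict[str, Unit]) -> None:
    """Fetch/decode/execute loop: every instruction is the same 'move'."""
    for src, dst in program:
        src_unit, src_port = src.split(".")
        dst_unit, dst_port = dst.split(".")
        units[dst_unit].write(dst_port, units[src_unit].read(src_port))


units: Dict[str, Unit] = {
    "Fa": Register(2), "Fb": Register(3), "Fc": Register(4), "Fz": Register(),
    "Fsum": FunctionalUnit(2, lambda a, b: a + b),
    # Hypothetical application-specific FU, added without touching run():
    "Fmac": FunctionalUnit(3, lambda a, b, c: a * b + c),
}

run([("Fa.O", "Fmac.I1"), ("Fb.O", "Fmac.I2"), ("Fc.O", "Fmac.I3"), ("Fmac.O", "Fz.I")], units)
print(units["Fz"].read("O"))  # 10
```

The same `run` loop would execute a program that uses `Fsum` instead; only the unit table grows as the architecture is specialized.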
Example – Automatic Synthesis
[Figure: three-layer model. Genotype Layer: genetic code. Mapping Layer: developmental algorithm. Phenotype Layer: application-specific (parallel) functions.]
[Figure: a Totipotent Cell is implemented on Programmable Logic and replicated to form a Cellular Array.]
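As a sketch of how a totipotent cell can become application-specific, assume an Embryonics-style scheme: every cell stores the complete genetic code, computes its own coordinates from its neighbours, and then expresses only the gene for its position. The genome contents (`Fread`, `Fdct`, ...) and the class and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]  # (x, y) position in the cellular array

# Hypothetical genetic code: which function each array position should express.
GENOME: Dict[Coord, str] = {
    (0, 0): "Fread", (1, 0): "Fdct", (2, 0): "Fqntz",
    (0, 1): "Fcmpr", (1, 1): "Fwrt", (2, 1): "Fspare",
}


class TotipotentCell:
    """Every cell holds the complete genome; only its coordinates decide what it becomes."""

    def __init__(self, genome: Dict[Coord, str]):
        self.genome = dict(genome)  # full copy of the genetic code in every cell
        self.coord: Coord = (0, 0)
        self.phenotype: Optional[str] = None

    def differentiate(self, west: Optional["TotipotentCell"], south: Optional["TotipotentCell"]) -> None:
        # Developmental step: derive coordinates from the neighbours, then express one gene.
        x = west.coord[0] + 1 if west else 0
        y = south.coord[1] + 1 if south else 0
        self.coord = (x, y)
        self.phenotype = self.genome[self.coord]


def develop(width: int, height: int, genome: Dict[Coord, str]) -> List[List[Optional[str]]]:
    """Grow an array of identical cells and let each one differentiate in place."""
    grid = [[TotipotentCell(genome) for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            west = grid[y][x - 1] if x > 0 else None
            south = grid[y - 1][x] if y > 0 else None
            grid[y][x].differentiate(west, south)
    return [[cell.phenotype for cell in row] for row in grid]


print(develop(3, 2, GENOME))
# [['Fread', 'Fdct', 'Fqntz'], ['Fcmpr', 'Fwrt', 'Fspare']]
```

In this style of design the same mechanism supports self-repair: a spare cell that receives shifted coordinates simply expresses a different gene from the copy it already holds.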
What kind of applications can take advantage of this kind of system?
Complex "real-world" streaming applications, where:
computation is carried out sequentially;
the computation can be represented by a DAG of computation nodes;
each node processes data locally, then forwards the results to the next node in the graph.
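A minimal sketch of such a streaming application, assuming a tiny made-up DAG (`IN`, `mul`, `add`, `sum`, `OUT`): each node computes locally and forwards its result to its successors, which are visited in topological order using Python's standard `graphlib` module (Python 3.9+). The graph and its operations are illustrative only.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, Iterator, Set

# Hypothetical DAG: each node lists the nodes it receives data from.
DEPS: Dict[str, Set[str]] = {
    "IN": set(),
    "mul": {"IN"},
    "add": {"IN"},
    "sum": {"mul", "add"},
    "OUT": {"sum"},
}
# Local computation performed by each node.
OPS: Dict[str, Callable[..., int]] = {
    "IN": lambda x: x,
    "mul": lambda x: 2 * x,
    "add": lambda x: x + 1,
    "sum": lambda a, b: a + b,
    "OUT": lambda x: x,
}
ORDER = list(TopologicalSorter(DEPS).static_order())  # producers before consumers


def process_packet(packet: int) -> int:
    results: Dict[str, int] = {}
    for node in ORDER:
        preds = sorted(DEPS[node])
        args = [results[p] for p in preds] if preds else [packet]
        results[node] = OPS[node](*args)  # compute locally; successors read this result
    return results["OUT"]


def stream(packets: Iterable[int]) -> Iterator[int]:
    for packet in packets:
        yield process_packet(packet)


print(list(stream([1, 2, 3])))  # [4, 7, 10]
```

Because each node only touches its own inputs and outputs, every node is a natural candidate for its own cell (or its own MOVE functional unit) in the array.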
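Applications
[Figure: an application DAG between IN and OUT, with nodes such as ×, +, ÷, ≠, FFT and DCT.]
Example: JPEG
[Figure: the JPEG pipeline READ → DCT → QNTZ → CMPR → WRT.]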
Specialized MOVE functional units can be designed for each of these steps
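For instance, the DCT stage computes, for every 8x8 block, the two-dimensional DCT-II used by JPEG. The straightforward quadruple loop below only shows what such a functional unit would have to compute; it is a reference sketch rather than a hardware design (a real FU would more likely use a fast factorization), and the usual JPEG level shift by 128 is left out.

```python
import math
from typing import List

N = 8  # JPEG works on 8x8 blocks


def dct_8x8(block: List[List[float]]) -> List[List[float]]:
    """2-D DCT-II with JPEG scaling: F(u,v) = 1/4 C(u) C(v) sum_x sum_y f(x,y) cos(..) cos(..)."""

    def c(k: int) -> float:
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[u][v] = 0.25 * c(u) * c(v) * s
    return coeffs


flat = [[100.0] * N for _ in range(N)]
F = dct_8x8(flat)
print(round(F[0][0], 6))  # 800.0
```

For a constant block every coefficient except the DC term F(0,0) vanishes, which makes a quick sanity check.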
Context
[Figure: the application DAG (×, +, ÷, ≠, FFT and DCT nodes between IN and OUT) next to the programmable substrate onto which it must be mapped.]
Problem: task or resource allocation, i.e. how do we map the graph nodes onto the array?
Specifically: dynamic allocation
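One simple way to think about dynamic allocation is a greedy placer that assigns each incoming task to the free cell closest (in Manhattan distance) to the cells already running its producers, and returns the cell to the pool when the task ends. This is a generic heuristic chosen for illustration, not the allocation scheme used in the lecture; `GreedyAllocator`, `allocate` and `release` are hypothetical names.

```python
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]  # (row, col) in the cellular array


class GreedyAllocator:
    """Hypothetical run-time allocator: place each task next to its producers."""

    def __init__(self, rows: int, cols: int):
        self.free = {(r, c) for r in range(rows) for c in range(cols)}
        self.placement: Dict[str, Cell] = {}

    def allocate(self, task: str, preds: Iterable[str] = ()) -> Cell:
        if not self.free:
            raise RuntimeError("cellular array is full")
        anchors = [self.placement[p] for p in preds]

        def cost(cell: Cell) -> int:
            # Total Manhattan distance to the cells hosting the task's producers.
            return sum(abs(cell[0] - a[0]) + abs(cell[1] - a[1]) for a in anchors)

        best = min(self.free, key=lambda cell: (cost(cell), cell))  # ties broken by position
        self.free.remove(best)
        self.placement[task] = best
        return best

    def release(self, task: str) -> None:
        """Free the cell of a finished task so it can be reused."""
        self.free.add(self.placement.pop(task))


alloc = GreedyAllocator(3, 3)
print(alloc.allocate("A"))          # (0, 0)
print(alloc.allocate("B", ["A"]))   # (0, 1)
print(alloc.allocate("C", ["B"]))   # (0, 2)
alloc.release("B")
print(alloc.allocate("D", ["A", "C"]))  # (0, 1)
```

A greedy rule like this is cheap enough to run while the application is executing, at the price of not guaranteeing an optimal mapping.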
Self-Scaling Stream Processing
[Figure: a Source feeds the pipeline Funct A → Funct B → Funct C; Funct A and Funct C are replicated into several parallel copies whose outputs are merged by a Join.]
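The replication idea in the figure can be sketched as a loop that keeps adding a copy of the current bottleneck stage until the pipeline keeps up with the input rate or the processor budget is exhausted. This assumes that throughput scales linearly with the number of replicas and ignores split/join overhead; the stage rates below are made-up numbers, and `self_scale` is a hypothetical name, not the SSSP implementation.

```python
from typing import Dict, Tuple


def self_scale(rates: Dict[str, float], input_rate: float, budget: int) -> Tuple[Dict[str, int], float]:
    """Greedily replicate the slowest stage until output matches input or processors run out.

    rates: packets/s that ONE replica of each stage can handle (assumed known).
    budget: total number of processors available.
    """
    replicas = {stage: 1 for stage in rates}  # start with one copy of each stage

    def capacity(stage: str) -> float:
        return replicas[stage] * rates[stage]  # assumes linear scaling per replica

    used = len(replicas)
    while min(capacity(s) for s in rates) < input_rate and used < budget:
        bottleneck = min(rates, key=capacity)  # first stage with the lowest capacity
        replicas[bottleneck] += 1
        used += 1
    throughput = min(input_rate, min(capacity(s) for s in rates))
    return replicas, throughput


# Hypothetical per-replica rates for a three-stage pipeline (packets/s).
print(self_scale({"A": 100, "B": 300, "C": 200}, input_rate=600, budget=12))
# ({'A': 6, 'B': 2, 'C': 3}, 600)
```

With a tighter budget the loop stops early and reports the throughput of the bottleneck stage instead.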
SSSP
The MJPEG application consists of a four-stage computation pipeline. Each packet of data to be compressed is 192 bytes long, corresponding to an 8x8 array of pixels in 24-bit colour.
The maximum achievable rate (determined by the input rate) is 700 packets per second, roughly 1 Mbit/s. With a single pipeline, performance tops out at about 60 packets per second.
SSSP
At peak performance, the average output rate is 675 packets per second (out of a maximum of 700): this technique multiplies the throughput by a factor of about 11 using 28 processors.
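The figures quoted above can be checked with a few lines of arithmetic, using Python as a calculator:

```python
pixels = 8 * 8                 # one 8x8 block
bytes_per_pixel = 24 // 8      # 24-bit colour
packet_bytes = pixels * bytes_per_pixel
max_rate = 700                 # packets/s, set by the input
bit_rate = max_rate * packet_bytes * 8
speedup = 675 / 60             # self-scaled vs single-pipeline output rate
fraction_of_max = 675 / max_rate

print(packet_bytes)               # 192
print(bit_rate)                   # 1075200  (about 1.08 Mbit/s)
print(speedup)                    # 11.25
print(round(fraction_of_max, 3))  # 0.964
```

So the self-scaled system delivers roughly 96% of the input rate, about 11 times the single-pipeline output.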
Next lecture
What kind of tools have to be developed to implement a complete system?
How do we determine optimal FUs for a given