Maseeh College of Engineering and Computer Science 06/16/22 1 ERA: Architectures for Inference Dan Hammerstrom Electrical And Computer Engineering
Dec 16, 2015
Intelligent Computing
In spite of the transistor bounty of Moore’s law, there is a large class of problems that computers still do not solve well
These problems involve the transformation of data across the boundary between the real world and the digital world
They occur wherever a computer is sampling and acting on real world data, which includes almost all embedded computing applications
Our lack of general solutions to these problems, outside of specialized niches, constitutes a significant barrier to computer usage and to huge potential markets
These are difficult problems that require computers to find complex structures and relationships through space and time in massive quantities of low precision, ambiguous, noisy data
AI pursued solutions in this area, but ran into scaling problems among other things
Artificial Neural Networks (ANN) extended computational intelligence in a number of important ways, primarily by adding the ability to incrementally learn and adapt
However, ANNs also had trouble scaling, and they were often difficult to apply to many problems
Traditional Rule Based Knowledge systems are now evolving into probabilistic structures where inference becomes the key computation, generally based on Bayes’ rule
Bayesian Networks
We now have Bayesian networks. A major contribution to this effort was the work of Judea Pearl:
Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988 & 1997
These systems are far less brittle and they also more faithfully model aspects of animal behavior
Animals learn from their surroundings and appear to do a kind of probabilistic inference from learned knowledge as they interact with their environment
Bayesian nets express structured, graphical representations of the probabilistic relationships between several random variables
The graph structure is an explicit representation of conditional dependence (encoded by network edges)
And the fundamental computation has become probabilistic inference
A Simple Bayesian Network
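[Figure: a four-node Bayesian network. A is the parent of B and C; B and C are the parents of D. The nodes are labeled P(a), P(b|a), P(c|a), and P(d|b,c).]
Each node has a CPT or “Conditional Probability Table”
The CPT for node D is shown here; there are similar tables for A, B, and C

P(d|b,c)   d1    d2
b1, c1     0.5   0.5
b2, c1     0.3   0.7
b1, c2     0.9   0.1
b2, c2     0.8   0.2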
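The graph structure implies the factorization P(a,b,c,d) = P(a) P(b|a) P(c|a) P(d|b,c). A minimal Python sketch of inference by brute-force enumeration over that factorization is given here. The table for D is the one on the slide; the tables for A, B, and C are hypothetical values chosen only for illustration, since the slides do not give them.

```python
from itertools import product

# Index 0 stands for state 1 (a1, b1, c1, d1) and index 1 for state 2.
# CPT for D, taken from the slide: P_D[(b, c)][d]
P_D = {
    (0, 0): [0.5, 0.5],  # b1, c1
    (1, 0): [0.3, 0.7],  # b2, c1
    (0, 1): [0.9, 0.1],  # b1, c2
    (1, 1): [0.8, 0.2],  # b2, c2
}
# Hypothetical tables (assumptions, not from the slides).
P_A = [0.6, 0.4]                      # P(a)
P_B = {0: [0.7, 0.3], 1: [0.2, 0.8]}  # P_B[a][b] = P(b | a)
P_C = {0: [0.9, 0.1], 1: [0.4, 0.6]}  # P_C[a][c] = P(c | a)


def joint(a, b, c, d):
    """Joint probability from the network factorization."""
    return P_A[a] * P_B[a][b] * P_C[a][c] * P_D[(b, c)][d]


def marginal_d():
    """P(d), obtained by summing the joint over a, b, and c."""
    totals = [0.0, 0.0]
    for a, b, c, d in product(range(2), repeat=4):
        totals[d] += joint(a, b, c, d)
    return totals


def posterior_a_given_d(d):
    """P(a | d): treat d as evidence and sum out the free variables b and c."""
    weights = [
        sum(joint(a, b, c, d) for b, c in product(range(2), repeat=2))
        for a in range(2)
    ]
    z = sum(weights)
    return [w / z for w in weights]
```

Enumeration visits every joint state, so its cost grows exponentially with the number of nodes; it is useful mainly as a reference against which faster methods can be checked.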
Inference, A Simple Example
[Figure: Original message y -> Encoder -> encoded message x -> Noisy Channel (noise added) -> received message x' -> Decoder -> decoded message y']
We need to “infer” the most likely original message given the data we received and our knowledge of the statistics of channel errors and the messages being generated
The Inference Problem: choose the most likely y, based on P[y|x']
\[
p(y \mid x') = \frac{p(x' \mid y)\, p(y)}{p(x')} = \frac{p(x' \mid y)\, p(y)}{\sum_{x} p(x' \mid x)\, p(x)}
\]
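As a rough illustration of this decoding rule, the sketch below assumes a one-bit message, a three-fold repetition encoder, and a channel that flips each bit independently with a fixed probability. None of these choices come from the slides; `FLIP`, `PRIOR`, and `encode` are hypothetical. Because the assumed encoder is deterministic, summing over messages y in the denominator is the same as summing over codewords x.

```python
FLIP = 0.1                # assumed channel bit-error probability
PRIOR = {0: 0.5, 1: 0.5}  # assumed message statistics p(y)


def encode(y):
    """Assumed encoder: repeat the message bit three times."""
    return (y, y, y)


def likelihood(received, y):
    """p(x' | y): every transmitted bit is flipped independently with probability FLIP."""
    p = 1.0
    for sent_bit, got_bit in zip(encode(y), received):
        p *= FLIP if sent_bit != got_bit else 1.0 - FLIP
    return p


def posterior(received):
    """p(y | x') by Bayes' rule; the normalizer is p(x')."""
    unnormalized = {y: likelihood(received, y) * PRIOR[y] for y in PRIOR}
    evidence = sum(unnormalized.values())
    return {y: w / evidence for y, w in unnormalized.items()}


def decode(received):
    """Choose the most likely original message given the received word."""
    post = posterior(received)
    return max(post, key=post.get)
```

For example, `decode((1, 0, 1))` picks the message 1, since under these assumptions a single flipped bit is far more likely than two.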
Assumption: Bayesian Inference As A Fundamental Computation In Future Systems
In mapping inference computations to hardware there are a number of issues to be considered, including: the type and degree of parallelism (multiple, independent threads versus data parallelism); arithmetic precision; inter-thread communication; local storage requirements; etc.
There are several variations on basic Bayesian techniques, over a number of different fields, from communication theory and pattern recognition, to computer vision, robotics, and speech recognition
However, for this review of inference as a computational model, three general families of algorithms are considered: 1) Inference by Analysis, 2) Inference by Random Sampling, and 3) Inference Using Distributed Representations
Analytic Inference
Analytic techniques constitute the most widely used approach for doing inference over Bayesian Networks
Most Bayesian Networks are evaluated using Bayesian Belief Propagation (BBP), developed by Pearl
Typically, data are input to the network by setting certain variables to known or observed values (“evidence”)
Bayesian Belief Propagation is then performed to find the probability distributions of the free variables
Analytic techniques require significant precision and dynamic range, generally in the form of floating point representations
This plus the limited parallelism makes them good candidates for multi-core architectures, but not necessarily for more advanced nano-scale computation
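To make the evidence-then-propagate procedure concrete, here is a minimal sketch of belief propagation on a three-node chain X1 -> X2 -> X3 with X3 observed. The network and all probability values are hypothetical. The names `lam` and `pi` follow Pearl's usual lambda (evidence) and pi (causal) messages, and a brute-force function is included only as a cross-check.

```python
# Hypothetical chain X1 -> X2 -> X3 with binary variables.
PRIOR = [0.6, 0.4]              # P(x1)
T12 = [[0.7, 0.3], [0.2, 0.8]]  # T12[i][j] = P(x2=j | x1=i)
T23 = [[0.9, 0.1], [0.4, 0.6]]  # T23[j][k] = P(x3=k | x2=j)


def normalize(values):
    z = sum(values)
    return [v / z for v in values]


def chain_beliefs(x3_observed):
    """Posterior distributions of the free variables X1 and X2 given X3."""
    # Lambda messages carry the evidence from X3 back toward the root.
    lam3 = [1.0 if k == x3_observed else 0.0 for k in range(2)]
    lam2 = [sum(T23[j][k] * lam3[k] for k in range(2)) for j in range(2)]
    lam1 = [sum(T12[i][j] * lam2[j] for j in range(2)) for i in range(2)]
    # Pi messages carry the prior forward from the root.
    pi1 = PRIOR
    pi2 = [sum(pi1[i] * T12[i][j] for i in range(2)) for j in range(2)]
    # Each belief is the normalized product of the two message types.
    bel1 = normalize([pi1[i] * lam1[i] for i in range(2)])
    bel2 = normalize([pi2[j] * lam2[j] for j in range(2)])
    return bel1, bel2


def brute_force_x1(x3_observed):
    """P(x1 | x3) by direct enumeration, used only to check the messages."""
    weights = [
        sum(PRIOR[i] * T12[i][j] * T23[j][x3_observed] for j in range(2))
        for i in range(2)
    ]
    return normalize(weights)
```

Each message is a sum of products of probabilities, which hints at why longer chains and larger tables push toward floating point: the unnormalized values shrink quickly as evidence accumulates.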
Random Sampling
Another approach to handling inference is via random sampling
There are a number of techniques, most of which fall under the general category of Monte Carlo simulation
Again, evidence is input by setting some nodes to known values
Then random samples of the free variables are generated
Two techniques that are commonly used are Adaptive Importance Sampling and Markov Chain Monte Carlo simulation
These techniques basically use adaptive sampling techniques to do an algorithmic search of the model’s vector space
For large complex Bayesian Structures, such random sampling can often be the best way to evaluate the network
However, random sampling suffers from the fact that as the size of the Bayesian Network increases, increasingly larger sample sets are required to obtain sufficiently accurate statistics
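The sampling idea can be sketched with likelihood weighting, a simple non-adaptive form of importance sampling (not the adaptive importance sampling or Markov Chain Monte Carlo methods named above). The sketch uses the four-node network from the earlier slide, with the slide's table for D and hypothetical tables for A, B, and C. The evidence node D is clamped, the free variables are sampled from their tables, and each sample is weighted by how well it explains the evidence.

```python
import random

# Index 0 stands for state 1 and index 1 for state 2.
P_A = [0.6, 0.4]                      # hypothetical P(a)
P_B = {0: [0.7, 0.3], 1: [0.2, 0.8]}  # hypothetical P(b | a)
P_C = {0: [0.9, 0.1], 1: [0.4, 0.6]}  # hypothetical P(c | a)
P_D = {                               # P(d | b, c), from the slide
    (0, 0): [0.5, 0.5],
    (1, 0): [0.3, 0.7],
    (0, 1): [0.9, 0.1],
    (1, 1): [0.8, 0.2],
}


def draw(dist, rng):
    """Sample index 0 or 1 from a two-entry distribution."""
    return 0 if rng.random() < dist[0] else 1


def estimate_a_given_d(d_observed, n_samples, seed=0):
    """Estimate P(a | d) by likelihood weighting."""
    rng = random.Random(seed)
    weighted_counts = [0.0, 0.0]
    for _ in range(n_samples):
        # Sample the free variables in topological order.
        a = draw(P_A, rng)
        b = draw(P_B[a], rng)
        c = draw(P_C[a], rng)
        # The evidence node is not sampled; it contributes a weight instead.
        weighted_counts[a] += P_D[(b, c)][d_observed]
    total = sum(weighted_counts)
    return [w / total for w in weighted_counts]
```

Every sample is independent of the others and needs only a few low-precision multiplications, which is the property that makes this family attractive for massively parallel hardware. The estimate's accuracy improves only with the square root of the number of samples.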
An example of hardware / device support for such a system is the work of Prof. Krishna Palem (Rice University) on probabilistic CMOS (pCMOS)
Prof. Palem has shown that using pCMOS can provide significant performance benefits in implementing Monte Carlo random sampling
pCMOS logic is used to accelerate the generation of random numbers, thereby accelerating the sampling process
Monte Carlo techniques are computationally intensive and so still tend to have scaling limitations
However, on the plus side, massively parallel evaluations are possible and arithmetic precision requirements are less constrained
So such techniques map cleanly to simpler, massively parallel, low precision computing structures
These techniques may also benefit from morphic cores with hardware accelerated random number generation.
Distributed Data Representation Networks
Bayesian Networks based on distributed data representations (DDR) are actually a different way to structure Bayesian networks, and although analytic and sampling techniques can be used on these structures, they also allow different kinds of massively parallel execution
The use of DDR is very promising, but is also the most limited in terms of demonstrations of real applications
Computing with DDRs can be thought of as the computational equivalent of spread spectrum communication
In a distributed representation, meaning is not represented by single symbolic units, but is the result of the interaction of a group of units typically configured in a network structure, and often each unit can participate in several representations
Representing data in this manner more easily allows incremental, integrative, decentralized adaptation
The computational and communication loads are spread more evenly across the system
Distributed representation also appears to be an important computational principle in neural systems
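One way to picture a distributed representation is a toy sketch in which each item is coded as a small random set of active units drawn from a shared population, and recall is done by overlap. This is only an illustration of the general idea, not a model taken from the slides; `N_UNITS`, `ACTIVE`, and all function names are hypothetical.

```python
import random

N_UNITS = 256  # assumed size of the shared population of units
ACTIVE = 16    # assumed number of active units per item


def make_codes(items, seed=0):
    """Assign each item a random set of active units from the shared population."""
    rng = random.Random(seed)
    return {item: frozenset(rng.sample(range(N_UNITS), ACTIVE)) for item in items}


def corrupt(code, n_dropped, seed=0):
    """Simulate noise by silencing some of the active units."""
    rng = random.Random(seed)
    kept = rng.sample(sorted(code), len(code) - n_dropped)
    return frozenset(kept)


def recall(pattern, codes):
    """Return the stored item whose code overlaps the pattern the most."""
    return max(codes, key=lambda item: len(pattern & codes[item]))
```

No single unit identifies an item, so a pattern with several units missing is still recalled correctly as long as its overlap with the right code exceeds its chance overlap with the others.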
Biological neural circuits perform inference over huge knowledge structures in fractions of a second
It is not clear exactly how they manage this incredible trick, especially since Bayesian inference is so computationally intense
There is some speculation that DDR plays an important role, but more theory and algorithm development is needed
This is an active area of research and no doubt much progress will be made by the time many Emerging Research Devices become commercially available.
One hypothesis is that hierarchical networks lead naturally to the distribution of representation, and subsequently to significant “factorization” of the inference process. This, coupled with massively parallel hardware, may enable entirely new levels of inference capabilities.
One such model was developed by Lee and Mumford, who proposed a hierarchical, Bayesian inference model of the primate visual cortex
[Figure: Lee and Mumford visual cortex model]
Another example is the work of Jeff Hawkins and Dileep George at Numenta, Inc.
Their model starts with an approximation to a general Bayesian module, which can then be combined into a hierarchy to form what they call a Hierarchical Temporal Memory (HTM)
Issues related to hardware architectures for Bayesian Inference and how they may be implemented with emerging devices are now being studied
Architecture
Mapping Bayesian Networks to a multi-core implementation is straightforward: just implement each node as a task. On a simple SMP-based multi-core machine, that would most certainly provide good performance
However, this approach breaks down as we scale to very large networks
Bayesian Networks tend to be storage intensive, so implementation issues such as data structure organization, memory management and cache utilization also become important
In fact, a potentially serious performance constraint may be access to primary memory, and it is not yet clear how effective caching will be in ameliorating this delay
However, as we scale to the very large networks required to solve complex problems, a variety of optimizations become possible and, in fact, necessary
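The node-as-task mapping can be sketched with a thread pool. The example assumes a small tree-shaped network (every node except the root has exactly one parent), in which a node's marginal depends only on its parent's marginal, so nodes at the same depth can run as independent tasks. The network, its tables, and the names `PARENT`, `CPT`, `LEVELS`, and `node_task` are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tree-shaped network with binary variables.
PARENT = {"A": None, "B": "A", "C": "A", "E": "B"}
ROOT_PRIOR = [0.6, 0.4]
CPT = {  # CPT[node][parent_state][node_state]
    "B": [[0.7, 0.3], [0.2, 0.8]],
    "C": [[0.9, 0.1], [0.4, 0.6]],
    "E": [[0.5, 0.5], [0.1, 0.9]],
}
LEVELS = [["A"], ["B", "C"], ["E"]]  # nodes in one level do not depend on each other


def node_task(node, parent_marginal):
    """One task: compute this node's marginal from its parent's marginal."""
    if parent_marginal is None:
        return ROOT_PRIOR
    table = CPT[node]
    return [sum(parent_marginal[i] * table[i][j] for i in range(2)) for j in range(2)]


def propagate(max_workers=4):
    """Run one task per node, level by level, and collect the marginals."""
    marginals = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for level in LEVELS:
            # Submit every node in the level before waiting on any result.
            futures = {
                node: pool.submit(node_task, node, marginals.get(PARENT[node]))
                for node in level
            }
            for node, future in futures.items():
                marginals[node] = future.result()
    return marginals
```

In CPython the threads here show the task structure rather than a real speedup, because pure-Python arithmetic is serialized by the global interpreter lock. The point of the sketch is that the per-node work is tiny compared with the table lookups, which is consistent with the concern above that memory access, not computation, may be the limiting factor.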
One promising massively parallel approach is that of associative processing, which has been shown to approximate Bayesian inference, and which has the potential for huge levels of parallelism
Using morphic cores for heterogeneous multi-core structures, such massively parallel implementations of Bayesian networks become relevant
Another interesting variation is to eliminate synchrony, where inter-module update messages arrive at random times and computation within a module proceeds at its own pace, updating its internal estimates when it receives update messages, otherwise continuing without them
More study is needed to explore radical new implementation technologies and how they may be used to do inference
Algorithm Family Summary
Technique                  Parallelism (Threads)  Inter-Thread Communication  Computational Precision  Storage/Node  State of the Art
Analytic                   Moderate               Moderate                    High                     Moderate      Mature
Random Sampling            High                   Low                         Moderate                 Moderate      Mature
Distributed / Hierarchies  High                   Low                         Low                      Low           Preliminary