An Object Detection/Recognition System using a
3-Dimensional Integration with Local and Global Wireless
Interconnections
Hiroshi Ando, Seiji Kameda, Nobuo Sasaki, Daisuke Arizono,
Kentaro Kimoto†, Norimitsu Fuchigami, Kouta Kaya, Mamoru Sasaki,
Takamaro Kikkawa† and Atsushi Iwata
Graduate School of Advanced Sciences of Matter, Hiroshima University
†Research Center for Nanodevices and Systems, Hiroshima University
Phone: +81-824-22-7358, E-mail: [email protected]
1. Introduction
To realize a hyper brain system that can recognize various objects
in real time in the real world, a number of chips with massively
parallel processing and wideband interconnection capabilities are
needed. Assembling these chips with low-power, Gbps-bandwidth
interconnections requires new integration techniques to replace
conventional System in Package techniques.
To solve this problem, we have proposed the 3-dimensional custom
stack system (3DCSS), which uses two kinds of wireless
interconnections: the inductive-coupling local wireless interconnect
(LWI) and the antenna-coupling global wireless interconnect (GWI)
[1]. In this system, LWI is used to transmit and receive 2D image
data between neighboring chips in parallel, and GWI is used to
transmit and receive global system clocks and serial data, such as
control signals or database contents, among all stacked chips. With
an asynchronous LWI scheme that needs no clocking, a high bit rate
of 1 Gbps/ch and a low power dissipation of 0.95 mW/ch have been
achieved in a 0.18 µm CMOS technology [2]. The generation of
ultra-short Gaussian monocycle pulses, the fundamental element for
implementing GWI, has also been demonstrated in the same technology
[3].
To implement the multi-object recognition system, a processing
algorithm and a system/chip architecture suited to the 3-dimensional
integration technique have to be developed. Although many algorithms
have been reported in the field of human face recognition [4], most
of them were developed for software realization and have not been
applied to LSI implementation because of their complex large-scale
calculations and huge memory requirements.
In this research, we have developed an architecture that adopts the
“Eigenfaces” method based on PCA (Principal Component Analysis), one
of the well-known face detection/recognition methods [5]. By
combining the Eigenfaces method with 3DCSS, we have proposed an
architecture for the multi-object recognition system [6]. We have
also implemented a prototype system by developing two types of chips
in a 0.18 µm CMOS technology. The chip design, which exploits the
advantages of LWI and GWI, is also described.
2. Object detection/recognition algorithm
The Eigenfaces method is well suited to an object recognition
hardware architecture because it has advantages in both recognition
performance and hardware implementation. Its recognition performance
is robust and equivalent to or higher than that of other recognition
algorithms [7]. Various kinds of objects can be detected and
recognized simply by preparing an individual database for each of
them, without changing the processing. In hardware, the method can
be implemented with a massively parallel circuit architecture and
conventional digital circuit techniques, without nonlinear
processing, and a single chip can serve both detection and
recognition without an increase in circuit area.
We briefly explain the fundamentals of the Eigenfaces algorithm. The
i-th face image, consisting of M pixels, is represented as a column
vector $\Gamma_i$. A preprocessed face $\Phi_i$ is defined by
$\Phi_i = \Gamma_i - \Psi$, where $\Psi$ represents the average face
of the N images in the DB (database), that is
$\Psi = \frac{1}{N} \sum_{n=1}^{N} \Gamma_n$.
The “eigenfaces” are calculated as the eigenvectors $a_k$ (in
ascending order, $k = 1, 2, \cdots, m$, $m \ll M$) of the covariance
matrix $C$ of the DB,
$C = \frac{1}{N} \sum_{n=1}^{N} \Phi_n \Phi_n^t$,
where $\Phi^t$ is the transpose of $\Phi$. A face image is
transformed into the so-called “eigen-space” $\omega_k$ by a simple
operation: $\omega_k = a_k^t \Phi_i$. The eigen-space components
$\omega_k$ form a vector
$\Omega = [\omega_1\ \omega_2\ \ldots\ \omega_m]$ that describes the
contribution of each eigenface to the face image.
Face detection is performed by a commonly used thresholding method.
A reconstructed image $\Phi_r$, defined by
$\Phi_r = \sum_{k=1}^{m} \omega_k a_k$, is used as the input of the
evaluation function for thresholding. For example, the Euclidean
distance $\varepsilon = \|\Phi_{in} - \Phi_r\|$ is often used as the
evaluation function, where $\Phi_{in}$ is the preprocessed unknown
input image. If $\varepsilon$ is lower than a threshold, the unknown
input image is classified as a human face. Face recognition is
achieved with the same calculations except for the evaluation
function. If the face-space vector $\Omega_{\mathrm{DB}_i}$ of the
i-th face image in the DB gives the minimum distance
$\varepsilon_{min} = \|\Omega - \Omega_{\mathrm{DB}_i}\|$, the input
face is identified as the i-th face.
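The calculations of this section can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the eigenfaces are taken as precomputed, orthonormal vectors, and all function names and the toy 4-pixel data are hypothetical and not part of the chip design.

```python
import math

def mean_vector(images):
    """Average face Psi of the N database images (lists of pixel values)."""
    n = len(images)
    return [sum(column) / n for column in zip(*images)]

def subtract(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))

def project(phi, eigenfaces):
    """omega_k = a_k^t Phi for every eigenface a_k."""
    return [dot(a, phi) for a in eigenfaces]

def reconstruct(omega, eigenfaces):
    """Phi_r = sum_k omega_k a_k."""
    size = len(eigenfaces[0])
    return [sum(w * a[p] for w, a in zip(omega, eigenfaces))
            for p in range(size)]

def detect(gamma, psi, eigenfaces, threshold):
    """Return (is_object, epsilon) with epsilon = ||Phi_in - Phi_r||."""
    phi = subtract(gamma, psi)
    phi_r = reconstruct(project(phi, eigenfaces), eigenfaces)
    eps = norm(subtract(phi, phi_r))
    return eps < threshold, eps

def recognize(gamma, psi, eigenfaces, db_omegas):
    """Return (index, distance) of the closest database vector Omega_DBi."""
    omega = project(subtract(gamma, psi), eigenfaces)
    dists = [norm(subtract(omega, w)) for w in db_omegas]
    best = min(range(len(dists)), key=dists.__getitem__)
    return best, dists[best]

# Toy data (hypothetical): 4-pixel images and two orthonormal eigenfaces.
psi = [1.0, 1.0, 1.0, 1.0]
eigenfaces = [[0.5, 0.5, -0.5, -0.5], [0.5, -0.5, 0.5, -0.5]]
is_face, eps = detect([3.0, 3.0, -1.0, -1.0], psi, eigenfaces, threshold=0.5)
```

In this toy case the input lies in the span of the eigenfaces, so the reconstruction error is zero and the image is accepted. An input such as [2, 2, 2, 2], whose preprocessed form is orthogonal to both eigenfaces, gives a reconstruction error of 2 and is rejected at the same threshold.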
3. Hardware implementation
A schematic of the proposed multi-object recognition system is shown
in Fig. 1. The system consists of three kinds of chips: the Visual
Processing chip (VP3D) [8], the Detection/Recognition chip (DR3D)
and the Reference Memory chip (RM3D). Each chip has 21×2ch LWIs,
which can transmit data to and receive data from neighboring chips
simultaneously, and 2ch GWIs for receiving the clock and binary
digital data. The RM3D has 2ch GWIs and transmitter circuits for the
clock and data.
We now explain how detection and recognition are performed by this
system. First, the original image data is stored in RM3D1 and
transmitted to the neighboring VP3D as 21-pixel-parallel PWM (pulse
width modulation) signals (LWI-1). The transmission rate is about
160 Mbps when the maximum bit width and the time resolution of the
PWM signal are 8 bits and 4 ns (250 MHz clock distributed by GWI-1),
respectively. Second, massively parallel image pre-processing is
carried out by several VP3Ds, and the resulting image data is
transmitted to RM3D2 over LWI-2 in the same way as over LWI-1.
Finally, the DR3D receives the processed image data and the object
database through LWI-3 (5.3 Gbps = 21 bit/4 ns), or after another
database has been stored to RM3D2 from RM3DN by GWI-2 (250 Mbps),
and detects and recognizes objects.
This system has a processing capability of 40 GOPS (giga operations
per second) at 250 MHz operation. We therefore expect to reach
160 GOPS at the maximum LWI operation rate (1 Gbps/ch at present).
[Figure 1 diagram labels: RM3D1 (image/database); LWI-1: 160 Mbps (input image data); VP3D (visual processing); LWI-2: 160 Mbps (preprocessed image data); RM3D2 (preprocessed image/database); LWI-3: 5.3 Gbps (preprocessed image data, object data and database); DR3D (object detection/recognition); RM3DN (database); GWI-1: 250 Mbps (CLK); GWI-2: 250 Mbps (database).]
Figure 1: Multi-object recognition system with 3DCSS.
3.1 Reference memory chip - RM3D
Figure 2 (a) shows a block diagram of the proposed RM3D, which
stores reference data for both VP3D and DR3D. The SRAM capacity is
56 kbits for the image data $\Gamma$, and 196 kbits and 123 kbits
for the database $\Psi$ and $a$, respectively. For communication
with VP3D, the binary digital image data is modulated into PWM
signals by the DPC (Digital-to-PWM Converter) and transmitted to
VP3D in 21-pixel parallel over LWI. The visually processed data is
received and stored after demodulation by the PDC (PWM-to-Digital
Converter). The 21-bit digital bus data for one pixel (8-bit
$\Gamma$, 8-bit $\Psi$ and 5-bit $a$) is transmitted to DR3D
pixel-serially over LWI. The clock signal generated by the VCO
(Voltage Controlled Oscillator) and the binary data stored in memory
are transmitted to all of the stacked chips. If a huge memory
capacity is needed for the database, it suffices to stack several
RM3Ds, because GWI provides wideband wireless communication among
them.
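The memory and bus figures above can be related to the image formats with a short calculation. The sketch assumes an 84×84 input image and a 32×32 object window, the sizes used for the measurements in Sec. 3.3, together with the bit widths of the 21-bit bus; the helper name is illustrative.

```python
def storage_bits(width, height, bits_per_pixel):
    """Bits needed to store one width x height array of pixels."""
    return width * height * bits_per_pixel

# Assumed formats: 84x84 input image, 32x32 object window.
image_bits = storage_bits(84, 84, 8)     # one 8-bit input image (Gamma)
average_bits = storage_bits(32, 32, 8)   # one 8-bit average vector (Psi)
eigen_bits = storage_bits(32, 32, 5)     # one 5-bit eigenvector (a_k)
bus_width = 8 + 8 + 5                    # Gamma + Psi + a bits per pixel
```

One 84×84 image of 8-bit pixels needs 56,448 bits, in line with the 56 kbits image SRAM. One average vector and one eigenvector for a 32×32 object need 8,192 and 5,120 bits, respectively, and the three fields per pixel add up to the 21-bit bus width.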
3.2 Detection/recognition chip - DR3D
A block diagram of the proposed DR3D, which implements the object
detection/recognition algorithm described in Sec. 2, is shown in
Fig. 2 (b). By exploiting the advantages of the Eigenfaces
algorithm, the DR3D performs both operation modes, object detection
and recognition, with common circuits and 32-pixel parallel
operation.
First, the 21-bit bus data are received by LWI, stored in the
respective 32×32 shift registers (32×32 being the pixel size of an
object), and converted to 32-pixel parallel data by the shift
registers. Second, in the reconstructed image generator,
$\Phi_i = \Gamma_i - \Psi$ is calculated by the subtracter,
$\omega_k = a_k^t \Phi_i$ is calculated by the multiplier, and
$\Phi_r = \sum_{k=1}^{m} \omega_k a_k$ is obtained by the
accumulator in 32-pixel parallel. Finally, the Manhattan distance
$\varepsilon_i = \|\Phi_i - \Phi_r\|$ is calculated with the
subtracter and compared in the winner-take-all circuits, which
completes the detection or recognition process.
Thus, the proposed multi-object recognition system can be
implemented by making the most of the advantages of LWI and GWI:
Gbps multi-channel communication enables parallel processing, and
long-distance wireless communication makes it possible to stack
several memory chips.
3.3 Fabrication and integration
Test chips of RM3D and DR3D fabricated in a 0.18 µm CMOS technology
are shown in Fig. 3. The chip size was 5×5 mm², and the supply
voltage and operating frequency were 1.8 V and 250 MHz,
respectively. The detection time was 580 µs and the one-object to
one-database recognition time was 4.2 µs for an 84×84 image and a
32×32 object size. For a QVGA image that includes about 30 objects,
with 100 database objects, we estimate a detection time of 20.6 ms
and a recognition time of 12.7 ms (30 fps).
The custom flexible printed circuit (FPC) shown in Fig. 4 was
developed for testing each chip. Note that most of the area around
the chip is unnecessary, because this FPC was used only for
preliminary measurements. We confirmed basic operations such as
memory read/write and control signal generation. A prototype 3DCSS
is now under development by stacking the measured chips.
4. Conclusion
The multi-object recognition system architecture was developed by
combining a recognition algorithm based on the Eigenfaces method
with the 3-D integration scheme (3DCSS), which uses two types of
wireless interconnections, LWI and GWI. The prototype system was
designed with three types of chips, for object detection/recognition,
reference data storage and image pre-processing. A processing
performance of 40 GOPS at 250 MHz was obtained by the chips in a
0.18 µm CMOS technology. The system achieved a detection time of
580 µs and a one-object to one-database recognition time of 4.2 µs.
References
[1] A. Iwata, et al., “A 3D-integration scheme utilizing wireless
interconnections for implementing hyper brains”, Digest of
ISSCC2005, pp. 262-263, Feb 6-10, 2005.
[2] M. Sasaki, et al., “A 0.95mW/1.0Gbps spiral-inductor based
wireless chip-interconnect with asynchronous communication scheme”,
Digest of Sympo. on VLSI Circuits, June 2005.
[3] N. Sasaki, et al., “A single-chip Gaussian monocycle pulse
transmitter using 0.18µm CMOS technology for intra/interchip UWB
communication”, Digest of Sympo. on VLSI Circuits, 2006.
[4] W. Zhao, et al., “Face Recognition: A Literature Survey”, ACM
CSUR archive, vol. 35, pp. 399-458, 2003.
[5] M. A. Turk, et al., “Eigenfaces for Recognition”, CVPR’91,
pp. 586-591, 1991.
[6] H. Ando, et al., “A prototype software system for multi-object
recognition and its FPGA implementation”, Proc. Third Hiroshima
International Workshop on NTIP, pp. 89-90, 2004.
[7] P. J. Phillips, et al., “The FERET evaluation methodology for
face-recognition algorithms”, IEEE Trans. Pattern Analysis and
Machine Intelligence, vol. 22, pp. 1090-1104, 2000.
[8] S. Kameda, et al., “A Brain-type vision system using a
3-Dimensional integration with local and global wireless
interconnections”, Proc. Fourth Hiroshima International Workshop on
NTIP, pp. 38-39, 2005.
[Figure 2 diagram labels: (a) RM3D: 56kbits SRAM (image Γi), 192kbits SRAM (average vector Ψ), 128kbits SRAM (eigen vector ak), selector, DPC, PDC, CTRL, VCO, LWI Tx/Rx (×21), GWI Tx/Rx (CLK, database). (b) DR3D: spiral inductor, dipole antenna, LWI Rx (×21), CLK, 25kbits buffer memory, 32×32 shift registers (image, ave, eigen), reconstructed image generator (subtracter, multiplier, accumulator; ωk, Φr), Manhattan distance subtracter (εi), winner-take-all, object detection/recognition.]
Figure 2: Block diagrams of RM3D and DR3D.
[Figure 3 photo labels: (a) LWI; 56kbits SRAM for image data; 123kbits SRAM for database; 196kbits SRAM for database; Tx/Rx antenna for database; Data Tx; Data Rx; CLK Tx; CLK Rx; Tx/Rx antenna for CLK. (b) LWI; Rx antenna for database; Data Rx; Rx antenna for CLK; CLK Rx; object detection/recognition.]
Figure 3: Microphotographs of RM3D (a) and DR3D (b).
[Figure 4 photo labels: 3DCSS chip (about 50 µm thickness); pattern for connecting board.]
Figure 4: FPC board.