
An Object Detection/Recognition System using a 3-Dimensional Integration with Local and Global Wireless Interconnections

Hiroshi Ando, Seiji Kameda, Nobuo Sasaki, Daisuke Arizono, Kentaro Kimoto†, Norimitsu Fuchigami, Kouta Kaya, Mamoru Sasaki, Takamaro Kikkawa† and Atsushi Iwata

Graduate School of Advanced Sciences of Matter, Hiroshima University
†Research Center for Nanodevices and Systems, Hiroshima University

Phone: +81-824-22-7358, E-mail: [email protected]

1. Introduction

To realize a hyper-brain system that can recognize various objects in real time and in the real world, a large number of chips with massively parallel processing and wideband interconnection capabilities is needed. Assembling these multiple chips with low-power, Gbps-bandwidth interconnections requires new integration techniques to replace conventional System in Package techniques.

To solve this problem, we have proposed the 3-dimensional custom stack system (3DCSS), which uses two kinds of wireless interconnections: the inductive-coupling local wireless interconnect (LWI) and the antenna-coupling global wireless interconnect (GWI) [1]. In this system, LWI transmits and receives 2D image data between neighboring chips in parallel, while GWI transmits and receives global system clocks and serial data, such as control signals or database contents, among all stacked chips. With the asynchronous, clockless LWI scheme, a high bit rate of 1 Gbps/ch and a low power dissipation of 0.95 mW/ch have been achieved in a 0.18 µm CMOS technology [2]. Generation of the ultra-short Gaussian monocycle pulse, the fundamental element for implementing GWI, has also been demonstrated in the same technology [3].
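As a quick cross-check of the LWI figures quoted above, the power and bit rate together give the energy spent per transmitted bit. The short calculation below uses only the two numbers from the text; the variable names are illustrative.

```python
# Energy per bit of the asynchronous LWI link, from the figures in the text:
# 0.95 mW per channel at 1 Gbps per channel.
power_per_channel_w = 0.95e-3   # 0.95 mW/ch
bit_rate_per_channel = 1.0e9    # 1 Gbps/ch

energy_per_bit_j = power_per_channel_w / bit_rate_per_channel
print(f"{energy_per_bit_j * 1e12:.2f} pJ/bit")  # 0.95 pJ/bit
```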

To implement the multi-object recognition system, a processing algorithm and a system/chip architecture suited to the 3-dimensional integration technique have to be developed. Although many algorithms have been reported in the field of human face recognition [4], most of them were developed for software realization and are not suited to LSI implementation because of their complex, large-scale calculations and huge memory requirements.

In this research, we have developed an architecture adopting the “Eigenfaces” method based on PCA (Principal Component Analysis), one of the well-known face detection/recognition methods [5]. By combining the Eigenfaces method with 3DCSS, we have proposed an architecture for the multi-object recognition system [6]. We have also implemented a prototype system by developing two types of chips in a 0.18 µm CMOS technology. The chip design, which exploits the advantages of LWI and GWI, is also described.

2. Object detection/recognition algorithm

The Eigenfaces method is well suited to object recognition hardware because of several advantages in both recognition performance and hardware implementation. Its recognition performance is comparable to or better than that of other recognition algorithms, and it is robust [7]. Various kinds of objects can be detected and recognized simply by preparing an individual database for each of them, without changing the processing. In hardware, the method can be implemented with a massively parallel circuit architecture using conventional digital circuit techniques, without nonlinear processing, and a single chip design can serve both detection and recognition without increasing circuit area.

We briefly explain the fundamentals of the Eigenfaces algorithm. The i-th face image, consisting of M pixels, is represented as an M-dimensional column vector $\Gamma_i$. A preprocessed face $\Phi_i$ is defined by $\Phi_i = \Gamma_i - \Psi$, where $\Psi$ is the average face of the N images in the database (DB):

$$\Psi = \frac{1}{N}\sum_{n=1}^{N}\Gamma_n.$$

The “eigenfaces” are the eigenvectors $a_k$ ($k = 1, 2, \ldots, m$, with $m \ll M$) of the covariance matrix $C$ of the DB that correspond to its $m$ largest eigenvalues, where

$$C = \frac{1}{N}\sum_{n=1}^{N}\Phi_n\Phi_n^t,$$

and $\Phi^t$ denotes the transpose of $\Phi$. A face image is transformed into the so-called “eigen-space” by the simple operation $\omega_k = a_k^t\Phi_i$. The coefficients $\omega_k$ form a vector $\Omega = [\omega_1\ \omega_2\ \cdots\ \omega_m]$ that describes the contribution of each eigenface to the face image.
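The Python sketch below walks through these definitions on a toy database. It illustrates the equations under stated assumptions and is not the authors' implementation. In the described system, the average face Ψ and the eigenfaces a_k are stored as a database in RM3D (Sec. 3.1), and we assume they are computed offline. The eigenvectors are found here by power iteration with deflation, a substituted textbook method chosen only because it needs no external libraries. The function names and the toy data are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def average_face(images: List[Vector]) -> Vector:
    """Psi = (1/N) * sum_n Gamma_n, computed pixel by pixel."""
    n = len(images)
    return [sum(pixel) / n for pixel in zip(*images)]


def preprocess(image: Vector, psi: Vector) -> Vector:
    """Phi_i = Gamma_i - Psi."""
    return [g - s for g, s in zip(image, psi)]


def covariance(phis: List[Vector]) -> Matrix:
    """C = (1/N) * sum_n Phi_n Phi_n^t, an M x M matrix."""
    n, m = len(phis), len(phis[0])
    return [[sum(p[r] * p[c] for p in phis) / n for c in range(m)] for r in range(m)]


def top_eigenvectors(c: Matrix, m: int, iters: int = 200, seed: int = 0) -> List[Vector]:
    """Power iteration with deflation: unit eigenvectors a_1..a_m of the
    symmetric matrix c, ordered by decreasing eigenvalue."""
    rng = random.Random(seed)
    work = [row[:] for row in c]
    eigvecs = []
    for _ in range(m):
        v = [rng.uniform(-1.0, 1.0) for _ in work]
        n0 = math.sqrt(dot(v, v))
        v = [x / n0 for x in v]
        for _ in range(iters):
            w = [dot(row, v) for row in work]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:  # remaining matrix is zero: no further components
                break
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in work])
        eigvecs.append(v)
        # Deflation: subtract lam * v v^t so the next pass finds the next eigenvector.
        size = len(v)
        work = [[work[r][k] - lam * v[r] * v[k] for k in range(size)] for r in range(size)]
    return eigvecs


def project(eigenfaces: List[Vector], phi: Vector) -> Vector:
    """Omega = [a_1^t Phi, ..., a_m^t Phi]."""
    return [dot(a_k, phi) for a_k in eigenfaces]


# Toy DB: N = 4 images of M = 3 "pixels", with most variance in pixel 0.
db = [[12.0, 5.0, 7.0], [8.0, 5.0, 7.0], [10.0, 6.0, 7.0], [10.0, 4.0, 7.0]]
psi = average_face(db)                     # [10.0, 5.0, 7.0]
phis = [preprocess(g, psi) for g in db]
eigenfaces = top_eigenvectors(covariance(phis), m=2)
omega = project(eigenfaces, phis[0])       # about [+/-2, 0]
```

Eigenvectors are only defined up to sign, so the components of Ω may come out with either sign; only distances between Ω vectors matter for detection and recognition.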

Face detection is performed with a commonly used thresholding method. A reconstructed image $\Phi_r$, defined by

$$\Phi_r = \sum_{k=1}^{m}\omega_k a_k,$$

is used as the input to an evaluation function for thresholding. For example, the Euclidean distance $\varepsilon = \|\Phi_{in} - \Phi_r\|$ is often used as the evaluation function, where $\Phi_{in}$ is a preprocessed unknown input image. If $\varepsilon$ is lower than a threshold, the unknown input image is classified as a human face. Face recognition uses the same calculations except for the evaluation function: if the face-space vector $\Omega_{DB_i}$ of the i-th face image in the DB yields the minimum distance $\varepsilon_{min} = \|\Omega - \Omega_{DB_i}\|$, the input face is identified as the i-th face.
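The detection and recognition rules above can be sketched in Python as follows. The eigenfaces, the database vectors Ω_DBi and the threshold are made-up toy values, the Euclidean distance follows the example given in the text, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def project(eigenfaces: List[Vector], phi: Vector) -> List[float]:
    """Omega = [a_1^t Phi, ..., a_m^t Phi]."""
    return [dot(a_k, phi) for a_k in eigenfaces]


def reconstruct(eigenfaces: List[Vector], omega: Vector) -> List[float]:
    """Phi_r = sum_k omega_k a_k."""
    return [sum(w * a_k[p] for w, a_k in zip(omega, eigenfaces))
            for p in range(len(eigenfaces[0]))]


def is_face(phi_in: Vector, eigenfaces: List[Vector], threshold: float) -> bool:
    """Detection: eps = ||Phi_in - Phi_r|| below the threshold means 'face'."""
    phi_r = reconstruct(eigenfaces, project(eigenfaces, phi_in))
    return euclidean(phi_in, phi_r) < threshold


def recognize(phi_in: Vector, eigenfaces: List[Vector],
              omega_db: List[Vector]) -> Tuple[int, float]:
    """Recognition: index i minimising ||Omega - Omega_DBi||, and that distance."""
    omega = project(eigenfaces, phi_in)
    dists = [euclidean(omega, o) for o in omega_db]
    best = min(range(len(dists)), key=dists.__getitem__)
    return best, dists[best]


# Toy eigen-space: two orthonormal eigenfaces in a 3-pixel image space.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
OMEGA_DB = [[3.0, 4.0], [-1.0, 2.0], [0.0, 0.0]]

print(is_face([3.0, 4.0, 0.0], A, threshold=1.0))  # True: lies in the eigen-space
print(is_face([0.0, 0.0, 5.0], A, threshold=1.0))  # False: orthogonal to it, eps = 5
print(recognize([0.0, 1.0, 0.0], A, OMEGA_DB))     # (2, 1.0)
```

Detection and recognition share `project`, and only the final evaluation function differs. This mirrors the point above that one datapath can serve both modes.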

3. Hardware implementation

A schematic of the proposed multi-object recognition system is shown in Fig. 1. The system consists of three kinds of chips: the Visual Processing chip (VP3D) [8], the Detection/Recognition chip (DR3D) and the Reference Memory chip (RM3D). Each chip has 21×2 ch LWIs, which can transmit data to and receive data from neighboring chips simultaneously, and 2 ch GWIs for receiving the clock and binary digital data. The RM3D has 2 ch GWIs with transmitter circuits for the clock and data.

We now explain how this system detects and recognizes objects. First, the original image data is stored in RM3D1 and transmitted to the neighboring VP3D as 21-pixel-parallel PWM (pulse width modulation) signals over LWI-1. The transmission rate is about 160 Mbps when the maximum bit width and time resolution of the PWM signal are 8 bits and 4 ns (a 250 MHz clock distributed by GWI-1), respectively. Second, massively parallel image preprocessing is performed by several VP3Ds, and the resulting image data is transmitted to RM3D2 over LWI-2 in the same way as over LWI-1. Finally, the DR3D receives the processed image data and the object database through LWI-3 (5.3 Gbps = 21 bit/4 ns), or after other database data has been stored into RM3D2 from RM3DN over GWI-2 (250 Mbps), and then detects and recognizes objects.
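The link rates quoted above can be reproduced with simple arithmetic. The sketch below assumes that a full-scale 8-bit PWM symbol occupies 2^8 time steps of 4 ns each, which is our reading of "maximum bit width". Under that assumption the 21-channel PWM link gives about 164 Mbps, consistent with the "about 160 Mbps" in the text. Treating GWI-2 as one bit per 4 ns clock is likewise an assumption that matches the quoted 250 Mbps.

```python
channels = 21          # pixels sent in parallel over LWI-1/LWI-2
pwm_bits = 8           # maximum PWM bit width
t_res_s = 4e-9         # PWM time resolution = 1 / 250 MHz (clock from GWI-1)

# Assumption: a full-scale 8-bit PWM symbol lasts 2**8 time steps.
symbol_s = (2 ** pwm_bits) * t_res_s              # 1.024 us
pwm_link_bps = channels * pwm_bits / symbol_s     # about 164 Mbps

lwi3_bps = 21 / t_res_s   # one 21-bit word (8-bit Gamma, 8-bit Psi, 5-bit a) per 4 ns
gwi_bps = 1 / t_res_s     # assumed serial GWI-2: one bit per 4 ns

print(f"LWI-1/2: {pwm_link_bps / 1e6:.0f} Mbps")  # 164 Mbps
print(f"LWI-3:   {lwi3_bps / 1e9:.2f} Gbps")      # 5.25 Gbps
print(f"GWI-2:   {gwi_bps / 1e6:.0f} Mbps")       # 250 Mbps
```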

This system achieves 40 GOPS (giga operations per second) at 250 MHz operation. We therefore expect 160 GOPS at the maximum LWI operating rate (currently 1 Gbps/ch).
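The projected figure is a linear frequency scaling of the measured one. A minimal calculation, assuming that the number of operations per cycle stays fixed and that the 1 Gbps/ch LWI rate corresponds to a 1 GHz effective clock (both assumptions on our part):

```python
measured_gops = 40.0
f_measured_hz = 250e6
f_target_hz = 1e9   # assumption: 1 Gbps/ch LWI taken as a 1 GHz effective clock

ops_per_cycle = measured_gops * 1e9 / f_measured_hz   # 160 operations per cycle
projected_gops = ops_per_cycle * f_target_hz / 1e9    # 160 GOPS
print(projected_gops)  # 160.0
```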

Figure 1: Multi-object recognition system with 3DCSS. The diagram shows RM3D1 (image/database), VP3D (visual processing), RM3D2 (preprocessed image/database), RM3DN (database) and DR3D (object detection/recognition), linked by LWI-1 (160 Mbps, input image data), LWI-2 (160 Mbps, preprocessed image data), LWI-3 (5.3 Gbps, preprocessed image data, object data and database), GWI-1 (250 Mbps, CLK) and GWI-2 (250 Mbps, database).

3.1 Reference memory chip - RM3D

Figure 2 (a) shows a block diagram of the proposed RM3D, which stores reference data for both VP3D and DR3D. The SRAM capacity is 56 kbits for the image data Γ, and 196 kbits and 123 kbits for the database Ψ and a, respectively. When communicating with VP3D, the binary digital image data is modulated into PWM signals by the DPC (Digital-to-PWM Converter) and transmitted to VP3D 21 pixels in parallel over LWI. The visually processed data is received and stored after demodulation by the PDC (PWM-to-Digital Converter). The 21-bit digital bus data for one pixel (8-bit Γ, 8-bit Ψ and 5-bit a) is transmitted to DR3D pixel-serially over LWI. The clock signal generated by a VCO (Voltage-Controlled Oscillator) and the binary data stored in memory are transmitted to all stacked chips. If a large database memory capacity is needed, several RM3Ds can simply be stacked, because GWI provides wideband wireless communication among all stacked chips.
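The 21-bit per-pixel word and the image SRAM size can be illustrated with a short Python sketch. The text specifies only the field widths (8-bit Γ, 8-bit Ψ, 5-bit a). The bit order used by the hypothetical `pack_pixel`/`unpack_pixel` helpers below, and the treatment of a as an unsigned 5-bit field, are assumptions. The last line checks that an 84×84 image (the test image size in Sec. 3.3) at 8 bits per pixel needs 56,448 bits, which is consistent with the 56 kbit image SRAM.

```python
from typing import Tuple

# Hypothetical layout of the 21-bit word: bits [20:13] Gamma, [12:5] Psi, [4:0] a.
# Both the field order and the unsigned encoding of a are assumptions.


def pack_pixel(gamma: int, psi: int, a: int) -> int:
    """Pack one pixel's 8-bit Gamma, 8-bit Psi and 5-bit a into a 21-bit word."""
    if not (0 <= gamma < 256 and 0 <= psi < 256 and 0 <= a < 32):
        raise ValueError("field out of range")
    return (gamma << 13) | (psi << 5) | a


def unpack_pixel(word: int) -> Tuple[int, int, int]:
    """Inverse of pack_pixel: return (Gamma, Psi, a)."""
    return (word >> 13) & 0xFF, (word >> 5) & 0xFF, word & 0x1F


word = pack_pixel(200, 17, 9)
print(unpack_pixel(word))   # (200, 17, 9)

# Image SRAM check: an 84x84 image at 8 bits per pixel.
image_bits = 84 * 84 * 8    # 56448 bits, i.e. about 56 kbits
```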

3.2 Detection/recognition chip - DR3D

Figure 2 (b) shows a block diagram of the proposed DR3D, which implements the object detection/recognition algorithm described in Sec. 2. By exploiting the advantages of the Eigenfaces algorithm, the DR3D performs both operation modes, object detection and recognition, with common circuits and processes 32 pixels in parallel.

First, the 21-bit bus data is received by LWI and stored in the 32×32 shift registers (for the image, the average and the eigenvector), since the object size is 32×32 pixels; the shift registers convert it into 32-pixel-parallel data. Second, in the reconstructed image generator, $\Phi_i = \Gamma_i - \Psi$ is calculated by the subtracter, $\omega_k = a_k^t\Phi_i$ by the multiplier, and $\Phi_r = \sum_{k=1}^{m}\omega_k a_k$ by the accumulator, 32 pixels in parallel. Finally, the Manhattan distance $\varepsilon_i = \|\Phi_i - \Phi_r\|_1$ is calculated by a subtracter, and the distances are compared in winner-take-all circuits, which completes the detection or recognition process.
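A behavioural Python model of this datapath, for one candidate at a time, might look like the following. It is a hypothetical sketch that uses plain integers and ignores the fixed-point scaling a real 5-bit eigenvector datapath would need. The function names are ours. As in the chip, the Manhattan (L1) distance replaces the Euclidean distance used as the example in Sec. 2.

```python
from typing import List, Sequence

LANES = 32  # DR3D processes 32 pixels (one row of a 32x32 object) in parallel


def rows(vec: Sequence[int], width: int = LANES) -> List[Sequence[int]]:
    """Split a flattened 32x32 object into 32-pixel rows, as the shift registers do."""
    return [vec[i:i + width] for i in range(0, len(vec), width)]


def manhattan_residual(gamma: Sequence[int], psi: Sequence[int],
                       eigs: List[Sequence[int]]) -> int:
    """Return eps = sum_p |Phi[p] - Phi_r[p]| for one candidate."""
    # Subtracter: Phi = Gamma - Psi, processed one 32-pixel row at a time.
    phi = [g - s
           for row_g, row_s in zip(rows(gamma), rows(psi))
           for g, s in zip(row_g, row_s)]
    # Multiplier + accumulator: omega_k = a_k^t Phi.
    omegas = [sum(a * p for a, p in zip(a_k, phi)) for a_k in eigs]
    # Accumulator: Phi_r = sum_k omega_k a_k.
    phi_r = [sum(w * a_k[p] for w, a_k in zip(omegas, eigs)) for p in range(len(phi))]
    # Subtracter + absolute value: Manhattan distance.
    return sum(abs(x - y) for x, y in zip(phi, phi_r))


def winner_take_all(scores: Sequence[int]) -> int:
    """Index of the smallest distance, as selected by the winner-take-all circuit."""
    return min(range(len(scores)), key=lambda i: scores[i])


# Toy example: one 32x32 candidate and a single unit eigenvector.
gamma = [1] * (32 * 32)
psi = [0] * (32 * 32)
eig = [1] + [0] * (32 * 32 - 1)
print(manhattan_residual(gamma, psi, [eig]))  # 1023
print(winner_take_all([5, 2, 9]))             # 1
```

The Manhattan distance needs only subtraction, absolute value and addition, which keeps it consistent with the text's point that the datapath avoids nonlinear processing.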

Thus, the proposed multi-object recognition system can be implemented by making the most of the advantages of LWI and GWI: Gbps multichannel communication enables parallel processing, and long-distance wireless communication makes it possible to stack several memory chips.

3.3 Fabrication and integration

Test chips of RM3D and DR3D fabricated in a 0.18 µm CMOS technology are shown in Fig. 3. The chip size was 5×5 mm², and the supply voltage and operating frequency were 1.8 V and 250 MHz, respectively. For an 84×84 image and a 32×32 object size, the detection time was 580 µs and the recognition time for one object against one database entry was 4.2 µs. Extrapolating to a QVGA image containing about 30 objects and 100 database objects, we estimate a detection time of 20.6 ms and a recognition time of 12.7 ms, which together allow 30 fps operation.
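The recognition estimate follows from the measured per-comparison time if recognition time scales linearly with the number of object/database comparisons; that scaling is our assumption. The 20.6 ms detection figure is taken from the text as given rather than derived. Summing the two gives the frame time:

```python
recog_per_pair_s = 4.2e-6   # measured: one object vs one database entry
objects_per_frame = 30      # QVGA scenario in the text
db_objects = 100

recog_s = objects_per_frame * db_objects * recog_per_pair_s  # assumed linear scaling
detect_s = 20.6e-3          # detection estimate quoted in the text (not derived here)

frame_s = detect_s + recog_s
print(f"recognition: {recog_s * 1e3:.1f} ms")   # 12.6 ms (text: 12.7 ms)
print(f"frame rate:  {1.0 / frame_s:.1f} fps")  # about 30 fps
```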

The custom flexible printed circuit (FPC) board shown in Fig. 4 was developed for testing each chip. Most of the area around the chip is unnecessary, because this FPC was used only for preliminary measurements. We confirmed basic operations such as memory read/write and control signal generation. A prototype 3DCSS built by stacking the measured chips is now under development.

4. Conclusion

A multi-object recognition system architecture was developed using a recognition algorithm based on the Eigenfaces method and the 3-D integration scheme (3DCSS) with two types of wireless interconnections, LWI and GWI. The prototype system was designed with three types of chips for object detection/recognition, reference data storage and image preprocessing. A processing performance of 40 GOPS at 250 MHz was obtained with chips in a 0.18 µm CMOS technology. The object detection and recognition system achieved a detection time of 580 µs and a one-object to one-database recognition time of 4.2 µs.

References

[1] A. Iwata, et al., “A 3D-integration scheme utilizing wireless interconnections for implementing hyper brains”, Digest of ISSCC 2005, pp. 262-263, Feb. 6-10, 2005.

[2] M. Sasaki, et al., “A 0.95mW/1.0Gbps spiral-inductor based wireless chip-interconnect with asynchronous communication scheme”, Digest of Symp. on VLSI Circuits, June 2005.

[3] N. Sasaki, et al., “A single-chip Gaussian monocycle pulse transmitter using 0.18µm CMOS technology for intra/interchip UWB communication”, Digest of Symp. on VLSI Circuits, 2006.

[4] W. Zhao, et al., “Face Recognition: A Literature Survey”, ACM Computing Surveys, vol. 35, pp. 399-458, 2003.

[5] M. A. Turk, et al., “Eigenfaces for Recognition”, CVPR’91, pp. 586-591, 1991.

[6] H. Ando, et al., “A prototype software system for multi-object recognition and its FPGA implementation”, Proc. Third Hiroshima International Workshop on NTIP, pp. 89-90, 2004.

[7] P. J. Phillips, et al., “The FERET evaluation methodology for face-recognition algorithms”, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 1090-1104, 2000.

[8] S. Kameda, et al., “A Brain-type vision system using a 3-Dimensional integration with local and global wireless interconnections”, Proc. Fourth Hiroshima International Workshop on NTIP, pp. 38-39, 2005.

Figure 2: Block diagrams of RM3D and DR3D. (a) RM3D: SRAMs for the image, the average vector and the eigenvectors, selector, DPC/PDC, ×21 LWI, VCO, and GWI Tx/Rx for CLK and database. (b) DR3D: GWI Rx for CLK, ×21 LWI, 25 kbit buffer memory, 32×32 shift registers (image, average, eigen), reconstructed image generator (subtracter, multiplier, accumulator), Manhattan-distance subtracter and winner-take-all circuit.

Figure 3: Microphotographs of RM3D (a) and DR3D (b). (a) RM3D: LWI, 56 kbit SRAM for image data, 196 kbit and 123 kbit SRAMs for database, data and CLK Tx/Rx, Tx/Rx antennas for database and CLK. (b) DR3D: LWI, object detection/recognition block, data and CLK Rx, Rx antennas for database and CLK.

Figure 4: FPC board (3DCSS chip of about 50 µm thickness; pattern for connecting the board).
